For learners, it is confusing to see nonlinear decision boundaries for linear and logistic regression. IMO a note about the feature expansion should be added.
Good point, I've updated my post. For linear and logistic regression the features get a cubic expansion (which is how they can fit curved problems). The relevant JavaScript code is on lines 91 and 96.
PS: It can be changed to "linear" or "quadratic" as well.
It's a Javascript library I put together a long time ago for dealing with datasets and machine learning algorithms. It was used for some of my own personal projects and hasn't been focused on for release in the wild (although I'm considering it now).
I'm intrigued also. Definitely some sort of Machine Learning related library anyway. Found something related to it but it doesn't really have any substantial information on it either:
Yes, k-nn is theoretically one of the best ML algorithms in the sense that it will find the closest items in the training set. For classification or finding similar-looking items it is great. However, it has pretty poor running times when evaluating unseen data (http://nlp.stanford.edu/IR-book/html/htmledition/time-comple...). This is in contrast to something like neural networks, which take a while to train but then evaluate very quickly. For real-world use the training times matter to an extent, but in a web app or real-time application the latency from k-nn is just impractical.
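To make the evaluation cost concrete, here is a minimal brute-force k-nn sketch. It is not taken from the demo; `knnPredict` and the toy points are hypothetical names and data chosen for illustration. The point is that every single prediction has to measure the distance to all training points, so query time grows with the size of the training set.

```javascript
// Hypothetical brute-force k-nn classifier for 2-D points (illustration only).
// Each query computes a distance to every training point and sorts them,
// which is why prediction gets slower as the training set grows.
function knnPredict(train, query, k) {
  const byDistance = train
    .map((p) => ({
      label: p.label,
      // Squared Euclidean distance is enough for ranking neighbours.
      d: (p.x - query.x) ** 2 + (p.y - query.y) ** 2,
    }))
    .sort((a, b) => a.d - b.d);

  // Majority vote among the k closest points.
  const votes = {};
  for (const n of byDistance.slice(0, k)) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }
  return Object.keys(votes).reduce((best, label) =>
    votes[label] > votes[best] ? label : best
  );
}

// Toy data: three "red" points near the origin, three "blue" points near (1, 1).
const toyTrain = [
  { x: 0.0, y: 0.1, label: "red" },
  { x: 0.1, y: 0.0, label: "red" },
  { x: 0.2, y: 0.2, label: "red" },
  { x: 0.9, y: 1.0, label: "blue" },
  { x: 1.0, y: 0.9, label: "blue" },
  { x: 0.8, y: 0.8, label: "blue" },
];

console.log(knnPredict(toyTrain, { x: 0.1, y: 0.1 }, 3));
```

Spatial indexes such as k-d trees can reduce this cost in low dimensions, but the basic trade-off (no training, expensive queries) stays the same.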
These visualisations are great but misleading regarding the performance of these classifiers. In practice you don't have a lot of data in a small number of dimensions (2 in this case). You have a little bit of data in zillions of dimensions. Think of classifying a 100x100 pixel image: that's 3x100x100=30000 dimensional data. You may not even have one training sample per class per dimension. Generalizing from comparatively little data to a very high dimensional space is the true difficulty of machine learning. Unfortunately you can't easily visualize that.
I'm a little surprised neural network comes up with a straight line and linear regression doesn't, which I thought by definition it would do. (e.g. on 2 normal groups)
Some discussion of methods, ie how many hidden layers/nodes for the neural network, would probably help make some sense of it.
Looking at the code (http://jsfiddle.net/wybiral/3bdkp5c0/light/) it seems they are expanding the features to include all second and third order terms (options.expansion = cubic). That's why linear regression does not come up with a straight line.
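For anyone wondering what such an expansion looks like, here is a rough sketch. It is an assumption about the general technique, not the demo's actual code: `expandFeatures`, `linearScore`, and `circleWeights` are hypothetical names. The model stays linear in the expanded features, but the boundary it draws in the original (x, y) plane can be curved.

```javascript
// Hypothetical helper: expand a 2-D point into all monomials x^i * y^j
// with 1 <= i + j <= degree. degree = 3 corresponds to a "cubic" expansion.
function expandFeatures(x, y, degree) {
  const features = [];
  for (let total = 1; total <= degree; total++) {
    for (let i = total; i >= 0; i--) {
      features.push(Math.pow(x, i) * Math.pow(y, total - i));
    }
  }
  return features;
}

// A plain linear model: weighted sum of the (expanded) features plus a bias.
function linearScore(weights, bias, features) {
  return features.reduce((sum, f, idx) => sum + weights[idx] * f, bias);
}

// Feature order for degree 3: [x, y, x^2, xy, y^2, x^3, x^2 y, x y^2, y^3]
console.log(expandFeatures(2, 3, 3)); // [2, 3, 4, 6, 9, 8, 12, 18, 27]

// Weights that pick out x^2 + y^2, with bias -1: the zero level set is the
// unit circle, so a "linear" model produces a circular decision boundary.
const circleWeights = [0, 0, 1, 0, 1, 0, 0, 0, 0];
console.log(linearScore(circleWeights, -1, expandFeatures(0, 0, 3)) < 0); // inside
console.log(linearScore(circleWeights, -1, expandFeatures(2, 0, 3)) > 0); // outside
```

With degree 1 the same function returns just [x, y], and the boundary goes back to being a straight line.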
It's using the X and Y location of the dots as training data. Each algorithm is being trained on (x,y)->color in an attempt to build up a rule for predicting what color an unseen (x,y) pair would be. The hypothesis it builds is then used to color the background so that you can see the decision boundary.
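A sketch of that background-coloring step, assuming the trained model is exposed as some function from (x, y) to a color. `decisionGrid` and `toyPredict` are hypothetical names, and the real demo may sample or draw differently.

```javascript
// Hypothetical sketch: evaluate a trained classifier at the centre of every
// cell in a cols x rows grid over the unit square. The resulting labels can
// be used to fill the background, which makes the decision boundary visible.
function decisionGrid(predict, cols, rows) {
  const grid = [];
  for (let r = 0; r < rows; r++) {
    const row = [];
    for (let c = 0; c < cols; c++) {
      const x = (c + 0.5) / cols; // cell centre, scaled to [0, 1]
      const y = (r + 0.5) / rows;
      row.push(predict(x, y));
    }
    grid.push(row);
  }
  return grid;
}

// Stand-in for a trained model: "red" below the line x + y = 1, "blue" above.
const toyPredict = (x, y) => (x + y < 1 ? "red" : "blue");

const regions = decisionGrid(toyPredict, 4, 4);
console.log(regions.map((row) => row.map((label) => label[0]).join(" ")).join("\n"));
```

A finer grid gives a smoother-looking boundary at the cost of more calls to the classifier, which is also where slow-to-evaluate models become noticeable.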
I accidentally left k means in there as an option and it doesn't make much sense in the context of this example. So, yeah, it's a bit of a bug. Realistically, linear regression doesn't make sense being included either but it still kinda works.
Any visualization of these algorithms in 2 dimensions (with cubic feature expansion!) is completely misleading if you intend to work on any real problem with many dimensions. Also, for those asking for execution times, these would be horribly misleading as well.
The dataset is quite small and you have a fast machine. On my laptop, a 7 year old Core 2, there's a slight delay when running some of the heavier algorithms (e.g. running neural net or svm on the island data set).
[+] [-] blt|10 years ago|reply
[+] [-] wybiral|10 years ago|reply
[+] [-] edwinksl|10 years ago|reply
[+] [-] mtw|10 years ago|reply
Also what is nerdy.js? I saw it was related to "Carl Edward Rasmussen" but couldn't find another reference on the net
[+] [-] wybiral|10 years ago|reply
The reference to Carl Edward Rasmussen is because I based my minimize function heavily off of this one: http://learning.eng.cam.ac.uk/carl/code/minimize/
[+] [-] adriancooney|10 years ago|reply
http://nerdyjs.appspot.com/
[+] [-] indubitably|10 years ago|reply
[+] [-] obmelvin|10 years ago|reply
[+] [-] jules|10 years ago|reply
[+] [-] darkmighty|10 years ago|reply
[+] [-] maurits|10 years ago|reply
[1]: http://mldemos.epfl.ch/
[+] [-] RockyMcNuts|10 years ago|reply
Random forest could be worth adding.
[+] [-] pedrosorio|10 years ago|reply
[+] [-] lottin|10 years ago|reply
[+] [-] narsil|10 years ago|reply
It's because of the browser blocking mixed content: The JS libraries are being loaded over HTTP but the JSFiddle is over HTTPS.
The version above loads the libraries over HTTPS via cdnjs.com
[+] [-] autoreleasepool|10 years ago|reply
[+] [-] revorad|10 years ago|reply
[+] [-] wybiral|10 years ago|reply
[+] [-] andrelaszlo|10 years ago|reply
Refresh, choose dataset: curved, algorithm: k means clustering. You get this:
http://imageshack.com/a/img633/7110/sfteaE.png
If you play around and select different algorithms before selecting k means clustering you can get very different results. :)
[+] [-] wybiral|10 years ago|reply
[+] [-] orliesaurus|10 years ago|reply
[+] [-] throwaway_bob|10 years ago|reply
[+] [-] heinrichhartman|10 years ago|reply
Are you aware of any reasonable high-dimensional "visualizations"? They can't be accurate, of course, but capturing the essential features would be nice.
E.g. here is a 4d cube: https://commons.wikimedia.org/wiki/File:8-cell.gif
[+] [-] chestervonwinch|10 years ago|reply
edit: I should also mention: this is very cool :)
[+] [-] joshvm|10 years ago|reply
[+] [-] p1esk|10 years ago|reply
[+] [-] alexanderb|10 years ago|reply
[+] [-] 0x99|10 years ago|reply
[+] [-] wybiral|10 years ago|reply