Article looks really good! Looking forward to a longer read tonight. Minor nit: I'd stick with idiomatic JS style/formatting. Looks like you mixed in styles from some other languages (tab indentation, braces, etc.). This is always a religious argument, but I write a lot of different languages, and I always just try to stick with the most popular idiomatic way, regardless of whether I care for it or not.
"2) it demystifies machine learning -- you have to write your ML from scratch, without the help of all those wonderful Python libs, and I think this exercise shows you that it's not so mysterious after all." - you can demystify machine learning in any better language.
For those who aren't familiar, Andrej Karpathy has done a lot of cool stuff with ML in JS. In particular, he has a CNN library -- deep learning comes to JS!
To those wondering why someone would want ML in JS, there are loads of reasons.
Node.js is for I/O. I know it has workarounds for long-running tasks (the thread pool), but it does not excel at that. Training your model, updating your model, validation, matrix factorization on large datasets, etc.: I just don't see how JavaScript helps here. Maybe just taking the HTTP request and dumping it onto a RabbitMQ queue to classify something, but you still have a whole host of other stuff to deal with.
So the input/output pairs are a linked list of objects? Which then contain vectors comprised of linked lists? I am not very into JavaScript, but that right there must preclude this from doing anything significant in a reasonable amount of time?
The distance to convert math to python is so much shorter than math to C or math to javascript.
I still view JS as a UI-oriented language, and I really don't know why you would want to implement processor-heavy algorithms, which need a lot of data and don't use the network, in a browser environment.
Clarity of syntax is a matter of opinion (personally, I agree that Python is clearer than JS... Java, not so much.)
I kicked around with some JS manifold learning stuff[1] a while back for essentially the same purpose: practice in writing things from scratch, while making it easier for other people to play with.
You cannot write asm.js by hand (in a sane way... it uses one big array for everything). It's meant to be generated by the Emscripten compiler (which is built on Clang/LLVM), so you can compile C/C++ code to asm.js.
[+] [-] bkanber|12 years ago|reply
Let me know if you have any questions. I do intend to keep up with this series, although my pace is pretty slow at about one article every three months or so.
There are already a couple of comments about running ML in JS and how JS and the browser environment aren't terribly suited for heavy calculations. First: you're totally correct; second, I chose JS because
1) it's accessible -- whether you're Python or Ruby or PHP on the backend, you're probably comfortable with JS, and
2) it demystifies machine learning -- you have to write your ML from scratch, without the help of all those wonderful Python libs, and I think this exercise shows you that it's not so mysterious after all.
Anyway, thanks for reading, and I'll poke in here throughout the day if you have questions.
[+] [-] xd|12 years ago|reply
I've been building a data management platform for the last 8 years, and we are now at the stage where we want to provide tools to help our customers get more from their data than just statistics. As my programming experience is mainly in PHP and JS, this set of articles is helping me grasp ML without having to wrap my head around a new language. I'm currently working on k-means clustering and re-implementing everything in PHP to get the best possible understanding I can. My aim after that is to see how well I can implement things at an SQL level.
[+] [-] d_j_b|12 years ago|reply
>all those wonderful Python libs
As a non-mathematician I have no understanding of how wonderful they really are, which is why this sort of thing is so valuable.
[+] [-] zenocon|12 years ago|reply
[+] [-] usamec|12 years ago|reply
JS and PHP are slow, crappy and bug prone. Sane languages (like Python, C++) have tools to make your job easier (like numpy, blas, eigen and other libraries). They provide fast and reliable math routines so you don't have to worry about some eigenvalue decomposition, matrix multiplication and other problems.
[+] [-] kajecounterhack|12 years ago|reply
http://cs.stanford.edu/people/karpathy/convnetjs/
http://cs.stanford.edu/people/karpathy/svmjs/demo/
Heather Arthur (npm libraries brain, classifier) has also done a bunch of cool stuff!
https://github.com/harthur
[+] [-] morganherlocker|12 years ago|reply
For starters, node.js, which makes most of the arguments regarding server/client moot.
Secondly, there are many client side applications for these types of algorithms as well. K-means clustering, for example, is already used by many mapping libraries to group together large numbers of points[1].
I personally use neural networks and affinity propagation in many of my applications for predictive analysis. This does not have to only be educational, or of a 'toy' nature.
[1] http://danzel.github.io/Leaflet.markercluster/example/marker...
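For anyone curious what the k-means idea looks like when applied to map points, here is a minimal TypeScript sketch. It only illustrates k-means itself (Lloyd's algorithm) and is not the implementation any particular mapping library uses. The names `Point`, `dist2`, and `kMeans` are hypothetical, and seeding the centroids with the first k points is an assumption made to keep the example deterministic.

```typescript
type Point = [number, number];

// Squared Euclidean distance between two 2D points.
function dist2(p: Point, q: Point): number {
  const dx = p[0] - q[0];
  const dy = p[1] - q[1];
  return dx * dx + dy * dy;
}

// Minimal k-means (Lloyd's algorithm).
// Assumption: the first k points are used as the initial centroids.
function kMeans(
  points: Point[],
  k: number,
  maxIter: number = 50
): { centroids: Point[]; labels: number[] } {
  let centroids: Point[] = points.slice(0, k).map((p): Point => [p[0], p[1]]);
  let labels: number[] = points.map(() => -1);

  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    const next = points.map((p) => {
      let best = 0;
      for (let c = 1; c < centroids.length; c++) {
        if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
      }
      return best;
    });

    // Stop once no point changes cluster.
    const changed = next.some((label, i) => label !== labels[i]);
    labels = next;
    if (!changed) break;

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c): Point => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's centroid
      const sx = members.reduce((s, p) => s + p[0], 0);
      const sy = members.reduce((s, p) => s + p[1], 0);
      return [sx / members.length, sy / members.length];
    });
  }
  return { centroids, labels };
}

// Two obvious groups of "markers", one near (0, 0) and one near (10, 10).
const markers: Point[] = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]];
const clustered = kMeans(markers, 2);
```

With real data you would want a better seeding strategy (random restarts or k-means++), since the result depends on the initial centroids.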
[+] [-] nashequilibrium|12 years ago|reply
[+] [-] Already__Taken|12 years ago|reply
I do hope this author writes more again; it has been quiet.
[+] [-] bkanber|12 years ago|reply
[+] [-] viana007|12 years ago|reply
https://github.com/harthur/brain
[+] [-] nightski|12 years ago|reply
[+] [-] frik|12 years ago|reply
For some reason it's somewhat hard to find C-style science code examples in some disciplines. Python feels a bit like a plague in this respect. Every time, I have to wrap my head around it while converting the code to a C-like language (C, C++, PHP, JS).
[+] [-] tlarkworthy|12 years ago|reply
You need something like NumPy to make working in JavaScript easier before there will be a proliferation of ML in JS.
I really love JS for its distribution and some of the visualizations are amazing. But the low level, numerically stable, matrix math primitives are sorely lacking.
[+] [-] Joe8Bit|12 years ago|reply
[0]: https://github.com/NaturalNode/natural
[+] [-] e12e|12 years ago|reply
> … well, most of the time. There are some things you really can’t do in PHP or Javascript, but those are the more advanced algorithms that require heavy matrix math.
Leaving out JavaScript (in the browser), it sounds like an odd statement to make about PHP -- after all, one of PHP's strengths is how easy it is to link with C libraries (or others with a C FFI)? Among other things I quickly found:
http://www.php.net/manual/en/intro.lapack.php
[+] [-] code_scrapping|12 years ago|reply
I would still stick to Python. Or Java. Or anything else which has a clear syntax and can run at a useful speed (I'm not mentioning C++ because of the coding overhead and dirty tricks, which make it a bit unfriendly for learning an algorithm).
[+] [-] dangoor|12 years ago|reply
Implying that JavaScript can't "run at a useful speed" is wrong for modern implementations. This is especially true for code that runs through lots of repetition, as the just-in-time compilers in the JS engines do a remarkable job.
Not to mention that viewing JS as a UI-oriented language seems a bit out of date given the 40k or so packages for Node.js that are in npm.
JavaScript of today is pretty different than JS of 2007, and there are more changes coming with generators, iterators, destructuring, class syntax, arrow functions, promises, etc.
[+] [-] xd|12 years ago|reply
[+] [-] perimo|12 years ago|reply
[1]: https://github.com/perimosocordiae/js_manifolds
[+] [-] ecesena|12 years ago|reply
[+] [-] frik|12 years ago|reply
But JavaScript engines like V8 with its JIT are way faster than Python. You can even use typed arrays, which give you almost native speed for such operations (e.g. matrix math). I am coding a 3D game in WebGL, and JS is as fast as Java when used in a modern fashion, and JS runs in every browser.
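As a rough illustration of the typed-array point, here is a small TypeScript sketch of a matrix multiply over `Float64Array` buffers in row-major layout. The helper name `matMul` and its parameter order are made up for this example, and no speed claim is being made for it; it only shows the flat, fixed-type storage that JIT compilers tend to handle well.

```typescript
// Hypothetical helper: multiply an n x m matrix by an m x p matrix.
// Both inputs are flat Float64Array buffers in row-major order.
function matMul(
  a: Float64Array,
  b: Float64Array,
  n: number,
  m: number,
  p: number
): Float64Array {
  const out = new Float64Array(n * p); // typed arrays start zero-filled
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < m; k++) {
      const aik = a[i * m + k];
      // The inner loop walks contiguous memory in both b and out.
      for (let j = 0; j < p; j++) {
        out[i * p + j] += aik * b[k * p + j];
      }
    }
  }
  return out;
}

const left = new Float64Array([1, 2, 3, 4]);  // [[1, 2], [3, 4]]
const right = new Float64Array([5, 6, 7, 8]); // [[5, 6], [7, 8]]
const product = matMul(left, right, 2, 2, 2); // [[19, 22], [43, 50]]
```

The i-k-j loop order is a design choice: it keeps the innermost accesses sequential, which is usually friendlier to the cache than the textbook i-j-k order.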
[+] [-] TeeWEE|12 years ago|reply
if all you have is a hammer, everything looks like a nail
[+] [-] bkanber|12 years ago|reply
[+] [-] LambdaAlmighty|12 years ago|reply
There's money to be made with this combination. The field is ripe.
Good write up too.