MLnick | 15 years ago

Also worth looking at for linear SVMs:

sofia-ml is a very fast C++ package for linear SVMs and classification. It supports PEGASOS as well as logistic regression and learning rankings. It has no bindings for other languages, which is a bit of a downside, but it is still a useful command-line tool.

http://code.google.com/p/sofia-ml/
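
For anyone unfamiliar with PEGASOS, here is a rough sketch of the basic update in plain Python. This is not sofia-ml's code: the function names, the sparse dict format and the default parameters are my own assumptions, and the optional projection step is left out. Each step picks one random example, uses the step size 1/(lambda * t), shrinks the weights, and adds the example only if it violates the margin.

```python
import random

def pegasos_train(examples, dim, lam=0.01, iterations=10000, seed=0):
    """Basic Pegasos sketch for a linear SVM.

    examples: list of (label, features), label in {-1, +1},
              features a sparse dict {index: value} (assumed format).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(1, iterations + 1):
        y, x = rng.choice(examples)          # one random example per step
        eta = 1.0 / (lam * t)                # Pegasos step size
        margin = y * sum(w[i] * v for i, v in x.items())
        shrink = 1.0 - eta * lam             # effect of the L2 regularizer
        w = [shrink * wi for wi in w]        # O(dim) per step; fine for a sketch
        if margin < 1.0:                     # hinge loss is active
            for i, v in x.items():
                w[i] += eta * y * v
    return w

def predict(w, x):
    """Sign of the linear score for a sparse example."""
    return 1 if sum(w[i] * v for i, v in x.items()) >= 0 else -1
```

A real implementation would avoid the dense shrink on every step (for example by keeping a separate scale factor), which is a large part of why these packages are fast on sparse data.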

It also includes a package for very fast mini-batch k-means (http://code.google.com/p/sofia-ml/wiki/SofiaKMeans). By combining the two approaches, one can effectively learn a "kernelized" model that is still linear and therefore very fast (at least this is the claim; I haven't tried it).
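
To make the idea concrete, here is a small sketch of how I read it, in plain Python. It is an illustration under my own assumptions, not sofia-kmeans itself: the mini-batch update uses a per-center learning rate of 1/count, and the RBF mapping (with a made-up `gamma` parameter) is just one plausible way to turn cluster centers into features that a linear model can then be trained on.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], x))

def mini_batch_kmeans(points, k, batch_size=10, iterations=100, seed=0):
    """Mini-batch k-means sketch with per-center learning rates."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # init from the data
    counts = [0] * k
    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch first, then apply the updates.
        assigned = [nearest(centers, x) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]            # step size decays per center
            centers[j] = [(1.0 - eta) * c + eta * xi
                          for c, xi in zip(centers[j], x)]
    return centers

def rbf_features(centers, x, gamma=0.1):
    """Map x to one similarity value per center (assumed RBF mapping)."""
    return [math.exp(-gamma * sq_dist(c, x)) for c in centers]
```

The output of `rbf_features` would then be fed to a linear learner in place of the raw features, so the final model stays linear in the mapped space while being nonlinear in the original one.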

I've used both the SVM and k-means packages and they work very well. For sparse datasets with >500 dimensions and >10 million rows, file I/O time was <15 sec and training time was <3 sec. K-means is slower but still orders of magnitude faster than standard batch k-means.

Finally, Vowpal Wabbit is a very fast package that also uses stochastic gradient descent as its workhorse. It also has a nice feature-hashing compression scheme that is being widely adopted (e.g. in Mahout, and in sofia-ml above).

https://github.com/JohnLangford/vowpal_wabbit/wiki
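
For reference, the feature-hashing idea is simple enough to sketch in a few lines of Python. This is only an illustration: the function name, the `num_bits` parameter and the use of CRC32 are my own choices, not what Vowpal Wabbit, Mahout or sofia-ml actually do internally. Each feature name is hashed straight to an index in a fixed-size vector, so no dictionary of feature names has to be kept in memory, and colliding features simply share a weight.

```python
import zlib

def hash_features(tokens, num_bits=18):
    """Hash string features into a sparse vector of size 2**num_bits.

    Returns {index: count}. CRC32 is used only because it is
    deterministic across runs (Python's built-in str hash is not).
    """
    size = 1 << num_bits
    vec = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % size
        vec[idx] = vec.get(idx, 0.0) + 1.0   # collisions just add up
    return vec
```

The resulting sparse dict can be passed directly to an SGD learner whose weight vector has 2**num_bits entries; more bits means fewer collisions at the cost of a larger weight vector.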
