(no title)
SeppoErviala | 12 years ago
http://radimrehurek.com/gensim/
It has good implementations of various algorithms, some of which support streaming or distribution, and it allows loading and dumping data in various formats.
I've used it for building a content-based recommender using tf-idf, LSI and a similarity index. After the index is built, queries to it are really fast. It can handle quite large corpora with little memory.
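To make the tf-idf plus similarity index idea concrete, here is a rough sketch using only the Python standard library. It is not gensim's API and not the code from the recommender described above: the function names (`tokenize`, `to_vector`, `build_index`, `query`), the idf formula and the toy documents are all assumptions chosen for illustration, and the LSI step is left out entirely because it needs an SVD.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def to_vector(tokens, idf):
    # Sparse tf-idf vector (dict of term -> weight), L2-normalised.
    # Terms that were not seen when the index was built are ignored.
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {t: w / norm for t, w in weights.items()}


def build_index(docs):
    # One-off cost: compute document frequencies, idf weights,
    # and one normalised vector per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed idf variant: log(N / df) + 1, so every seen term has weight > 0.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [to_vector(tokens, idf) for tokens in tokenized]
    return idf, vectors


def query(text, idf, vectors, top_n=3):
    # Cosine similarity reduces to a sparse dot product,
    # because both sides are already unit length.
    q = to_vector(tokenize(text), idf)
    scores = [
        (doc_id, sum(w * vec.get(t, 0.0) for t, w in q.items()))
        for doc_id, vec in enumerate(vectors)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy corpus, made up for this example.
docs = [
    "human machine interface for lab computer applications",
    "a survey of user opinion of computer system response time",
    "graph minors and trees a survey",
]
idf, vectors = build_index(docs)
results = query("graph trees", idf, vectors)
```

`results` is a list of `(document id, score)` pairs, best match first. All the expensive work happens in `build_index`, which is the same split that makes queries against a prebuilt index fast. A real library would add the dimensionality reduction and would avoid keeping every vector in memory at once.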
sbrother|12 years ago
The reason for that is a pretty epic list of dependencies (have fun explaining why the prod boxes need a fortran compiler), but in terms of efficiency and speed of development it's an obvious choice.
Radim|12 years ago
Hopefully the SciPy & BLAS dependencies will only get easier to install from now on... Continuum Analytics received shit loads of money and some of it is going towards better scientific Python packaging, I believe.
hnriot|12 years ago