Yes, I am using it on a fairly large dataset (roughly 1 million docs), and the resulting model is fairly efficient. I am using gensim with pre-trained word vectors. Vectors for new docs can be inferred via .infer_vector().

Overall, my approach is less automated than what I have seen in your codebase, so it’s likely a bigger investment. I am happy to share more.

julien040|2 years ago

It's very interesting. I may try it in the future.