
fbdab103 | 1 year ago

I might have a work use case for which this would be perfect.

Having no experience with word2vec, some reference performance numbers would be great. If I have one million PDF pages, how long is that going to take to encode? How long will it take to search? Is it CPU only or will I get a huge performance benefit if I have a GPU?
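For context on what "encoding" means here: with word2vec, a page vector is typically just an embedding-table lookup for each token followed by an average, so throughput is dominated by memory access rather than arithmetic. A minimal sketch, with made-up sizes (100k vocabulary, 300-dim vectors, ~500 tokens per page) that are not from the linked project:

```python
import numpy as np

# Hypothetical numbers for illustration only: 100k-word vocabulary,
# 300-dim word2vec vectors, ~500 tokens per PDF page.
rng = np.random.default_rng(0)
vocab_size, dim, tokens_per_page = 100_000, 300, 500
embeddings = rng.standard_normal((vocab_size, dim), dtype=np.float32)

def encode_page(token_ids: np.ndarray) -> np.ndarray:
    """Encode a page as the mean of its word vectors (a common word2vec scheme)."""
    return embeddings[token_ids].mean(axis=0)

page_tokens = rng.integers(0, vocab_size, tokens_per_page)
page_vector = encode_page(page_tokens)
```

Encoding a million pages is a million of these lookup-and-average passes, which is embarrassingly parallel across CPU cores.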

discuss

order

9dev | 1 year ago

As someone working extensively with word2vec: I would recommend setting up Elasticsearch. It has support for vector embeddings, so you can process your PDF documents once, write the word2vec embeddings and PDF metadata into an index, and later search it in milliseconds. Doing live vectorisation is neat for exploring data, but using Elasticsearch will be much more convenient in actual products!
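The index-once, search-many pipeline described above might look like this with Elasticsearch 8.x's `dense_vector` field and `knn` search option (field and index names here are made up for illustration; send these bodies through the official client against your own cluster):

```python
# Index mapping: one text field, one metadata field, one 300-dim vector
# field (dims must match your word2vec model). Names are hypothetical.
index_mapping = {
    "mappings": {
        "properties": {
            "page_text": {"type": "text"},
            "pdf_path": {"type": "keyword"},
            "embedding": {
                "type": "dense_vector",
                "dims": 300,
                "index": True,
                "similarity": "cosine",
            },
        }
    }
}

def knn_query(query_vector, k=10):
    """Approximate nearest-neighbour search body (Elasticsearch 8.x `knn`)."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": 10 * k,  # recall vs. latency trade-off
        },
        "_source": ["pdf_path"],
    }

query_body = knn_query([0.0] * 300)
```

`num_candidates` controls how many candidates the HNSW index examines per shard; raising it improves recall at the cost of latency.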

michaelmior | 1 year ago

I would personally vote for Postgres and one of the many vector indexing extensions over Elasticsearch, which I think can be more challenging to maintain. That's a matter of opinion, though; Elasticsearch is a very reasonable choice.

SkyPuncher | 1 year ago

Vector search doesn't take much compute. It's essentially the same computational cost as regular old text search.

kristopolous | 1 year ago

No. Look at the implementation. A GPU isn't going to give you huge gains in this one.
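To see why GPU gains are modest here: even brute-force cosine search is a single matrix-vector product over the corpus, which is memory-bandwidth bound rather than compute bound. A scaled-down sketch with hypothetical 300-dim page vectors (the full million-page case scales linearly):

```python
import numpy as np

# Brute-force cosine search over a hypothetical corpus of page vectors.
# Scaled to 200k pages to keep memory modest; 1M pages behaves the same.
rng = np.random.default_rng(0)
n_pages, dim = 200_000, 300
index = rng.standard_normal((n_pages, dim), dtype=np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)  # pre-normalise once

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most cosine-similar pages."""
    q = query / np.linalg.norm(query)
    scores = index @ q                       # one matvec over the whole corpus
    return np.argpartition(-scores, k)[:k]   # top-k without a full sort

hits = search(index[42])
```

Querying with a vector already in the index should return that page among the hits; on a laptop CPU this kind of matvec runs in tens of milliseconds.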