mrintellectual | 3 years ago

Great article. We used something very similar to help implement similarity search at Yahoo a couple of years back (https://yahooresearch.tumblr.com/post/158115871236/introduci...). We were using an indexing strategy called Locally Optimized Product Quantization, which worked great in terms of query times but required a training procedure that made successive inserts fairly inefficient.
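To make the training/insert trade-off concrete, here is a minimal sketch of *plain* product quantization in pure Python. It is not LOPQ and not Yahoo's code: LOPQ adds further locally trained components on top of this basic idea, and every name below (`ProductQuantizer`, `train`, `encode`, `distance`) is hypothetical. The sketch splits each vector into `num_subspaces` chunks, learns one k-means codebook per chunk (the training step), stores each vector as a short list of codeword ids, and ranks candidates with asymmetric distances.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, p):
    # Index of the centroid closest to p (first one wins on ties).
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j], p))

def kmeans(points, k, iters=20, seed=0):
    # Plain Lloyd's k-means; requires len(points) >= k.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(centroids, p)].append(p)
        for j, b in enumerate(buckets):
            if b:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(c) / len(b) for c in zip(*b)]
    return centroids

class ProductQuantizer:
    """Hypothetical plain-PQ index (illustration only, not LOPQ)."""

    def __init__(self, num_subspaces, codebook_size):
        self.m = num_subspaces
        self.k = codebook_size
        self.codebooks = None

    def _split(self, v):
        d = len(v) // self.m
        return [tuple(v[i * d:(i + 1) * d]) for i in range(self.m)]

    def train(self, vectors):
        # The training step: one k-means codebook per subspace.
        subs = [self._split(v) for v in vectors]
        self.codebooks = [kmeans([s[i] for s in subs], self.k, seed=i)
                          for i in range(self.m)]

    def encode(self, v):
        # Each vector is stored as m small integers (codeword ids).
        return [nearest(cb, s) for cb, s in zip(self.codebooks, self._split(v))]

    def distance(self, query, code):
        # Asymmetric distance: exact query chunks vs. decoded codewords.
        return sum(sq_dist(qs, cb[c])
                   for qs, cb, c in zip(self._split(query), self.codebooks, code))

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
pq = ProductQuantizer(num_subspaces=4, codebook_size=16)
pq.train(data)
codes = [pq.encode(v) for v in data]

query = data[0]
best = min(range(len(codes)), key=lambda i: pq.distance(query, codes[i]))
print(best)
```

In this sketch a new vector can be encoded cheaply against the frozen codebooks, but if the incoming data drifts away from the training set, keeping the codebooks representative means re-running `train()` and re-encoding every stored vector. That is one way a trained index gets in the way of incremental inserts.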

Thankfully, we have a much wider variety of indexing options these days (https://milvus.io/docs/index.md) in addition to powerful vector databases (https://zilliz.com/learn/what-is-vector-database). I'm glad to see the barrier to entry for semantic image retrieval becoming lower and lower as ML infrastructure matures.

[EDIT] Disclosure: I work at Zilliz.

gk1 | 3 years ago

If folks just want to get started with vector search faster, they can try https://www.pinecone.io.

Full disclosure: I work for Pinecone. It's important to disclose you work for a company if you're going to promote their links.

mrintellectual | 3 years ago

Good catch on the disclosure, I edited my original comment to reflect this fact.

On the topic of vector search, Milvus is another great vector database: it's open source, and we provide single-line startup scripts via `docker-compose` in addition to installation via apt and yum (https://milvus.io/docs/install_standalone-docker.md). There are also no restrictions on the number of vectors that users can store. Internally, we've successfully scaled Milvus to more than a billion vectors, and many of our users have stored hundreds of millions of vectors in production environments as well.