I've built vector indexes on CouchDB before, since it supports arrays as keys in its map-reduce implementation. It worked great for fast document similarity search.
The way vector indices typically work can make CRUD operations on them a real challenge. There is definitely novelty in supporting both ANN indexing and fast, high-throughput CRUD.
In addition, the R of CRUD is hard to combine with vector indices. Case in point: I am still waiting for Elasticsearch to support ANN and regular structured filtering together well.
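The tension between CRUD, ANN search, and structured filtering can be sketched with a toy store. This is a minimal illustration, assuming an exact (brute-force) cosine-similarity store in place of a real ANN index; `FlatVectorStore`, `upsert`, `search`, and `search_postfilter` are hypothetical names, not any product's API. Without an index structure, inserts and deletes are just dictionary operations. The sketch also contrasts pre-filtering with post-filtering.

```python
import math
from dataclasses import dataclass
from typing import Callable

Predicate = Callable[[dict], bool]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Record:
    vector: list[float]
    attrs: dict  # structured fields used for filtering


class FlatVectorStore:
    """Hypothetical exact (brute-force) store standing in for an ANN index.

    CRUD is trivial here because there is no index structure to maintain;
    an ANN index typically trades some of that ease away for query speed.
    """

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def upsert(self, key: str, vector: list[float], attrs: dict) -> None:
        self._records[key] = Record(list(vector), dict(attrs))

    def delete(self, key: str) -> None:
        self._records.pop(key, None)

    def _ranked(self, query, keep: Predicate):
        # Score every record that passes `keep`, best match first.
        scored = [(key, cosine(query, rec.vector))
                  for key, rec in self._records.items() if keep(rec.attrs)]
        return sorted(scored, key=lambda kv: kv[1], reverse=True)

    def search(self, query, k: int, predicate: Predicate = lambda _: True):
        # Pre-filtering: drop non-matching records, then rank the survivors.
        return self._ranked(query, predicate)[:k]

    def search_postfilter(self, query, k: int, predicate: Predicate):
        # Post-filtering: take the global top-k, then apply the filter.
        # This can return fewer than k hits even when k matches exist.
        top = self._ranked(query, lambda _: True)[:k]
        return [(key, s) for key, s in top
                if predicate(self._records[key].attrs)]
```

For example, with records `"a"` (English) and `"b"` (German) both close to the query, `search(q, 1, lambda a: a["lang"] == "de")` returns `"b"`, while `search_postfilter` with the same arguments returns nothing because the global top-1 is `"a"`. Pre-filtering avoids that failure mode, but at scale it means searching an arbitrary subset of the data, which many ANN index structures are not designed for.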
As mentioned in the article, I recommend Milvus (https://milvus.io): it's open source and cloud native, with standalone versions available. Alternatively, if you're looking for an open-source solution for generating embeddings, I recommend Towhee (https://github.com/towhee-io/towhee).
One downside of Milvus is that version 1 doesn't support filtering (necessary for most search applications), and version 2 is significantly slower.
Google's vector nearest neighbors offering, Weaviate, and Vespa are much better options if you expect to move to more realistic workloads.
You've posted several comments in this thread alone linking to your product, and it seems that the majority of your posts have been doing this for quite a while now. I'm sure it's excellent work, but can you please stop doing this?
It's fine to link to your own work occasionally, when it's particularly relevant, as part of a diverse mix of posts on unrelated things*. It's not ok to use HN primarily for promotion. See https://news.ycombinator.com/newsguidelines.html:
"Please don't use HN primarily for promotion. It's ok to post your own stuff occasionally, but the primary use of the site should be for curiosity."
When people do that we eventually start penalizing their accounts and sites, or in egregious cases, banning them. You're a good HN user, but this is still excessive. You're crossing the line at which the community starts to think of the word 'spam', and we inevitably start getting emails about it.
* I do get that your work is particularly relevant in a thread like this. What's missing is the 'diverse mix of posts on unrelated things'. In such a context, posting repeatedly about your own stuff starts to come across the wrong way.
Thank you for this. One approach I find missing from your blog is distance-based indexing, which indexes vectors according to their distances from vantage points chosen from within the data set. I've done some preliminary work on a system for images: phash.dev
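The vantage-point idea can be sketched as a pivot table. This is a minimal illustration, assuming Euclidean distance and randomly chosen pivots; `PivotIndex` and `range_query` are hypothetical names, and the comment doesn't say how phash.dev actually works. Because any metric obeys the triangle inequality, |d(q,v) - d(p,v)| <= d(q,p) for every vantage point v, so a point whose precomputed distance to some pivot differs from the query's by more than the search radius can be discarded without computing its true distance. The same pruning applies to other metrics, e.g. Hamming distance between perceptual hashes, via the `dist` parameter.

```python
import math
import random


def euclidean(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical pivot (vantage point) index for metric range queries."""

    def __init__(self, points, num_pivots=3, seed=0, dist=euclidean):
        self.dist = dist
        self.points = [tuple(p) for p in points]
        # Vantage points are drawn from the data set itself.
        self.pivots = random.Random(seed).sample(self.points, num_pivots)
        # Precompute every point's distance to every vantage point.
        self.table = [[dist(p, v) for v in self.pivots] for p in self.points]

    def range_query(self, query, radius):
        """Return (points within radius of query, exact distances computed)."""
        q_to_pivots = [self.dist(query, v) for v in self.pivots]
        hits, computed = [], 0
        for p, p_to_pivots in zip(self.points, self.table):
            # Triangle inequality: |d(q,v) - d(p,v)| <= d(q,p) for any pivot v,
            # so a gap larger than the radius proves p is out of range.
            if any(abs(qv - pv) > radius
                   for qv, pv in zip(q_to_pivots, p_to_pivots)):
                continue
            computed += 1
            if self.dist(query, p) <= radius:
                hits.append(p)
        return hits, computed
```

The pruning is exact (it never discards a true match), so results equal a brute-force scan; `computed` shows how many full distance evaluations were still needed. More pivots prune more aggressively at the cost of a larger precomputed table.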
splatcollision | 4 years ago
Brief writeup: http://splatcollision.com/page/fast-vector-similarity-querie...
joexner | 4 years ago
gk1 | 4 years ago
dontreact | 4 years ago
mrintellectual | 4 years ago
tabtab | 4 years ago
mrintellectual | 4 years ago
dtjohnnyb | 4 years ago
phenkdo | 4 years ago
[1] https://github.com/qdrant/qdrant
occupant | 4 years ago
cbsmith | 4 years ago
gk1 | 4 years ago
For anyone interested in going down this rabbit hole, we have an entire learning center about vector databases and vector search (https://www.pinecone.io/learn/) including the obligatory "What is a Vector Database" intro with example notebooks: https://www.pinecone.io/learn/vector-database/
dang | 4 years ago
starkd | 4 years ago
liminal | 4 years ago
krishnakatyal | 4 years ago