Amazon's OpenSearch (a fork of Elasticsearch) natively supports vector-based approximate k-NN (using https://github.com/nmslib/nmslib/), which is integrated with OpenSearch's native filtering functionality. Elasticsearch has similar functionality, but I don't know if its k-NN code scales quite as well.
OpenSearch only supports "pre-filtering" or "post-filtering," which leads to either high latency or incomplete results, as explained in the article.
This works by streaming results from the similarity search to the filter, in descending order of similarity. As soon as enough matches are found, the similarity search is terminated.
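For anyone who wants to see it concretely, here is a minimal sketch of the "stream nearest-first, stop once k items pass the filter" idea described above. It is not Pinecone's or Postgres's implementation: the names `nearest_first` and `filtered_top_k` are made up for illustration, and an exact heap-ordered scan stands in for a real approximate index.

```python
import heapq
import math


def nearest_first(query, items):
    """Lazily yield items in order of increasing distance to `query`.

    Hypothetical helper: an exact scan, used here only as a stand-in for
    an index that can stream candidates in descending order of similarity.
    """
    heap = [(math.dist(query, item["vec"]), i) for i, item in enumerate(items)]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        yield items[i]


def filtered_top_k(query, items, k, predicate):
    """Pull candidates from the stream and stop once k of them pass the filter."""
    results = []
    scanned = 0
    for item in nearest_first(query, items):
        scanned += 1
        if predicate(item):
            results.append(item)
            if len(results) == k:
                break  # early out: the rest of the stream is never consumed
    return results, scanned


# Toy data (assumed for illustration only).
items = [
    {"id": 0, "vec": (0.0, 0.0), "color": "red"},
    {"id": 1, "vec": (0.1, 0.0), "color": "blue"},
    {"id": 2, "vec": (0.2, 0.0), "color": "red"},
    {"id": 3, "vec": (0.9, 0.0), "color": "blue"},
    {"id": 4, "vec": (1.0, 0.0), "color": "red"},
]

matches, scanned = filtered_top_k(
    (0.0, 0.0), items, k=2, predicate=lambda item: item["color"] == "blue"
)
match_ids = [item["id"] for item in matches]
```

In this sketch `scanned` grows as the filter gets more selective. With a filter that matches almost nothing, the loop walks most of the data before it can stop.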
How is what you describe _not_ just an efficiently implemented post filter with early out?
[+] [-] tobrien6|4 years ago|reply
[+] [-] gk1|4 years ago|reply
This is why single-stage filtering was the most-requested feature for us.
From the OpenSearch docs:
> You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched.
> Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
> If you use the knn query alongside filters or other clauses (e.g. bool, must, match), you might receive fewer than k results.
(https://opensearch.org/docs/search-plugins/knn/approximate-k...)
I know Elasticsearch is working on introducing vector search but it is not yet available. I don't know how they will support filtering.
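To make the quoted "fewer than k results" caveat concrete, here is a toy sketch comparing filtering after the k-NN search with filtering before it. It does not use OpenSearch at all: the helper names are hypothetical, and an exact brute-force search stands in for the approximate graph search.

```python
import math


def top_k(query, items, k):
    """Exact top-k by Euclidean distance (stand-in for an approximate k-NN search)."""
    return sorted(items, key=lambda item: math.dist(query, item["vec"]))[:k]


def post_filter(query, items, k, predicate):
    """Search first, then filter: may return fewer than k results."""
    return [item for item in top_k(query, items, k) if predicate(item)]


def pre_filter(query, items, k, predicate):
    """Filter first, then search the survivors: complete, but it has to
    search over the filtered subset instead of the prebuilt index."""
    return top_k(query, [item for item in items if predicate(item)], k)


# Toy data (assumed for illustration only).
items = [
    {"id": 0, "vec": (0.0, 0.0), "color": "red"},
    {"id": 1, "vec": (0.1, 0.0), "color": "blue"},
    {"id": 2, "vec": (0.2, 0.0), "color": "red"},
    {"id": 3, "vec": (0.9, 0.0), "color": "blue"},
]
query = (0.0, 0.0)


def is_blue(item):
    return item["color"] == "blue"


post_ids = [item["id"] for item in post_filter(query, items, 2, is_blue)]
pre_ids = [item["id"] for item in pre_filter(query, items, 2, is_blue)]
```

With k = 2, the post-filtered query returns only one item because one of the two nearest neighbours is discarded by the filter, while the pre-filtered query returns both blue items.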
[+] [-] sysctl21|4 years ago|reply
[+] [-] jamesbriggs|4 years ago|reply
[+] [-] Noe2097|4 years ago|reply
I realize you are hinting at it as the subject of a later article, but could you share a little bit more about single-stage filtering :D ? How does it work? How can metadata and vector data "coexist" in the same index?
[+] [-] throwaway_pdp09|4 years ago|reply
Did the animation actually add anything significant anyway?
[+] [-] sysctl21|4 years ago|reply
[+] [-] jamesbriggs|4 years ago|reply
[+] [-] gk1|4 years ago|reply
[+] [-] TOMDM|4 years ago|reply
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot....
[+] [-] londons_explore|4 years ago|reply
Postgres does the same thing for many queries if you look at the query plan.
Not really worthy of a blog post - especially one that says "wait till the next blog post to find out how it works!!".
[+] [-] willvarfar|4 years ago|reply
If it turns out that’s all Pinecone have, then yeah, I’m gonna be disappointed. My mind is working overtime imagining prefixing vectors with their filter terms to root everything and other naive things…
[+] [-] sgt101|4 years ago|reply
[+] [-] WithinReason|4 years ago|reply
[+] [-] gk1|4 years ago|reply
This is not a trivial problem when dealing with vector similarity search, which Postgres does not have.