That's a nice diagram! Yeah, that's roughly it. We'll be adding support for more sources of truth in the future to expand coverage: the ones you mention, but also NoSQL databases like MongoDB.
1. Correct - we don't rely on pgvector. As a result, we're compatible with more existing managed Postgres services.
2. Probably the biggest differentiator between Vespa and Retake is the core architecture - Retake is built on top of OpenSearch. There's been quite a bit of debate about different search engines since Yahoo open-sourced Vespa - we leaned into OpenSearch because OpenSearch/Elasticsearch and their query language were already familiar to far more developers. Something coming soon to Retake is the ability to control how keyword and semantic scores are normalized and combined, which should give developers finer-grained control over their results.
3. In the short term, our support for models like SPLADE is constrained by OpenSearch, whose keyword scoring uses BM25. In the medium to long term, we would definitely consider modifying OpenSearch to support models like that.
4. We support both post-filtering and efficient kNN filtering, which takes place during the kNN search and guarantees that k results are returned. More details on the faiss kNN filter implementation can be found in the OpenSearch docs: https://opensearch.org/docs/latest/search-plugins/knn/filter...
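On point 2, the score normalization/combination control isn't shipped yet, but the usual recipe it alludes to is min-max normalizing each score list and then taking a weighted sum. A minimal sketch of that idea (function names and the toy scores are illustrative, not Retake's actual API):

```python
def min_max_normalize(scores):
    """Rescale raw scores to [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25, semantic, weight=0.5):
    """Combine keyword (BM25) and semantic (kNN) scores per document.

    weight=0.0 -> pure keyword ranking, weight=1.0 -> pure semantic.
    """
    nb = min_max_normalize(bm25)
    ns = min_max_normalize(semantic)
    return [(1 - weight) * b + weight * s for b, s in zip(nb, ns)]

# Toy example: three candidate documents scored by both retrieval paths.
bm25 = [12.3, 4.1, 8.7]        # raw BM25 scores (unbounded)
semantic = [0.91, 0.88, 0.12]  # cosine similarities from kNN (in [-1, 1])
combined = hybrid_scores(bm25, semantic, weight=0.5)
best = max(range(len(combined)), key=combined.__getitem__)
```

Normalizing first matters because BM25 scores are unbounded while cosine similarities live in a fixed range; without it, one signal silently dominates the blend.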
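To make the distinction in point 4 concrete, here is a toy brute-force sketch (not OpenSearch's faiss implementation, just the idea): post-filtering trims the filter *after* taking the top k, so it can return fewer than k hits, while filtering the candidate set before ranking returns k results whenever enough documents match.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def post_filter_knn(query, docs, k, pred):
    """Take top-k by similarity first, THEN filter: may return fewer than k."""
    top = sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)[:k]
    return [d for d in top if pred(d)]

def filtered_knn(query, docs, k, pred):
    """Filter DURING the search: returns k results when enough docs match."""
    candidates = [d for d in docs if pred(d)]
    return sorted(candidates, key=lambda d: cosine(query, d["vec"]), reverse=True)[:k]

docs = [
    {"id": 1, "color": "red",  "vec": [1.0, 0.0]},
    {"id": 2, "color": "blue", "vec": [0.9, 0.1]},
    {"id": 3, "color": "red",  "vec": [0.1, 0.9]},
    {"id": 4, "color": "red",  "vec": [0.0, 1.0]},
]
query = [1.0, 0.0]
is_red = lambda d: d["color"] == "red"

post = post_filter_knn(query, docs, 2, is_red)  # top-2 are ids 1 and 2, but 2 is blue
eff = filtered_knn(query, docs, 2, is_red)      # reds only, then rank: 2 results
```

In the toy data, post-filtering returns a single document (the blue near-neighbor gets discarded after ranking), while the efficient variant still fills k=2 from the matching reds.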
isaacfung|2 years ago
I think Vespa also supports hybrid search (it can also use late-interaction models like ColBERT). How does Retake compare to Vespa?
Will Retake support sparse vector models like SPLADE? (I heard they solve the vocab mismatch problem of keyword search.)
How do you guys implement filtering?