generall | 3 years ago
In addition to the indexing algorithm, there is the tokenizer (which depends on the language), the lemmatizer, synonyms, stop words, and so on. The ranking function itself may also be quite different and based on different rules; see how Meilisearch does it. Reducing full-text search to just an inverted index is a misconception.
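To illustrate how much sits around the index itself, here is a minimal sketch of a text-analysis pipeline feeding a BM25 ranker. It assumes a toy English stop-word list, a hypothetical synonym map, and crude plural stripping as a stand-in for a real lemmatizer; the scoring uses the standard Okapi BM25 formula with the commonly used defaults k1=1.2 and b=0.75. This is not how Meilisearch or any particular engine implements it.

```python
import math
import re
from collections import Counter, defaultdict

# Toy resources (assumptions): real engines ship language-specific versions.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}
SYNONYMS = {"car": "automobile", "auto": "automobile"}  # hypothetical map

def tokenize(text):
    # Language-dependent step; this regex only handles simple Latin-script text.
    return re.findall(r"[a-z0-9]+", text.lower())

def normalize(token):
    # Crude plural stripping as a stand-in for a real lemmatizer.
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def analyze(text):
    # tokenize -> drop stop words -> normalize -> map synonyms
    terms = []
    for tok in tokenize(text):
        if tok in STOP_WORDS:
            continue
        tok = normalize(tok)
        terms.append(SYNONYMS.get(tok, tok))
    return terms

class BM25Index:
    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_terms = [analyze(d) for d in docs]
        self.avgdl = sum(len(t) for t in self.doc_terms) / len(self.doc_terms)
        # The inverted index: term -> {doc_id: term frequency}
        self.postings = defaultdict(dict)
        for doc_id, terms in enumerate(self.doc_terms):
            for term, tf in Counter(terms).items():
                self.postings[term][doc_id] = tf

    def idf(self, term):
        n = len(self.doc_terms)
        df = len(self.postings.get(term, {}))
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query, k=10):
        # The ranking function is a separate layer on top of the postings.
        scores = defaultdict(float)
        for term in analyze(query):
            idf = self.idf(term)
            for doc_id, tf in self.postings.get(term, {}).items():
                dl = len(self.doc_terms[doc_id])
                denom = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                scores[doc_id] += idf * tf * (self.k1 + 1) / denom
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

index = BM25Index(["Used cars for sale", "The history of the automobile", "Bicycle repair guide"])
print(index.search("car"))
```

Here the query "car" matches "The history of the automobile" only because of the synonym map, and that shorter document ranks above "Used cars for sale" because of BM25 length normalization. Changing any analysis step changes the results even though the index structure stays the same.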
> Your criticism of the Weaviate ANN benchmarks isn't relevant to our discussion on Hybrid Search.
It is very much relevant. As I mentioned, when both searches run in parallel processes:
total_latency = max(BM25_latency, Vector_search_latency) + merge_overhead
My claim is that specialized tools will achieve lower BM25_latency and Vector_search_latency than a multi-purpose, all-in-one system can provide.
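A minimal sketch of the latency formula above, assuming two hypothetical search calls whose latencies are simulated with `time.sleep`, run concurrently with `concurrent.futures`. The merge step uses reciprocal rank fusion (k=60), which is one common way to combine ranked lists and not necessarily what either system in this discussion uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_bm25_search(query, latency=0.2):
    # Hypothetical stand-in for a call to a full-text engine.
    time.sleep(latency)
    return ["doc3", "doc1", "doc7"]

def fake_vector_search(query, latency=0.3):
    # Hypothetical stand-in for a call to a vector search engine.
    time.sleep(latency)
    return ["doc1", "doc5", "doc3"]

def reciprocal_rank_fusion(result_lists, k=60):
    # The merge step: each list contributes 1 / (k + rank) per document.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_search(query):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both searches run concurrently, so we wait for the slower one.
        bm25 = pool.submit(fake_bm25_search, query)
        vector = pool.submit(fake_vector_search, query)
        merged = reciprocal_rank_fusion([bm25.result(), vector.result()])
    elapsed = time.perf_counter() - start
    return merged, elapsed

merged, elapsed = hybrid_search("example query")
print(merged, round(elapsed, 2))
```

The measured time is close to the slower of the two simulated searches (0.3 s) plus a small merge cost, not their sum, which is why the latency of each individual search still dominates.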
> I have linked this to show that Weaviate has produced comparative benchmarks which was your original claim.
I don't see any comparisons with other engines in your benchmarks here: https://weaviate.io/developers/weaviate/benchmarks/ann
You only benchmarked yourselves, which is neither interesting nor helpful.
> I also agree that it would be interesting to run ANN recall tests on several hardware configurations.
That is not the point. In our benchmark, we run all engines on exactly the same machine to keep the comparison fair. On some cloud providers, the same instance configuration in different regions can already give very different performance.
CShorten | 3 years ago
2. Following on #1, what optimizations does BM25 require that justify an entirely separate tool, with the maintenance burden of two separate search systems? Having both searches in the same system also helps reduce the merge overhead.
3. Any company's report benchmarking itself against its competitors should be taken with a grain of salt; this is obviously bad practice. The purpose of these benchmarks is to compare, in this case, HNSW hyperparameters and, in future work building on the BEIR numbers I provided earlier, Hybrid Search performance.
4. Again, no company can seriously benchmark itself against its competitors due to obvious conflicts of interest. Maybe this competition could be hosted here: https://big-ann-benchmarks.com/index.html#organizers.
generall | 3 years ago
Benchmarks like https://big-ann-benchmarks.com/index.html#organizers are good for comparing algorithms, but not engines. They focus on a single usage scenario and do not cover the variety of possible applications, for example, how filtering affects performance.
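One way filtering interacts with vector search can be sketched with exact brute-force search, assuming random 8-dimensional vectors and an arbitrary filter that matches 5% of the points (no ANN index is involved, so this only shows the result-quality side, not speed). Filtering after taking the top-k can leave fewer than k hits when the filter is selective, while filtering first keeps k hits but changes how much work the search has to do.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn(query, vectors, k, allowed=None):
    # Exact top-k by cosine similarity, optionally restricted to `allowed` ids.
    candidates = [i for i in range(len(vectors)) if allowed is None or i in allowed]
    candidates.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    return candidates[:k]

def post_filtered_knn(query, vectors, k, allowed):
    # Search first, filter afterwards: a selective filter leaves fewer than k hits.
    return [i for i in knn(query, vectors, k) if i in allowed]

def pre_filtered_knn(query, vectors, k, allowed):
    # Filter first, then search only the matching subset.
    return knn(query, vectors, k, allowed)

random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
query = [random.gauss(0, 1) for _ in range(8)]
allowed = {i for i in range(len(vectors)) if i % 20 == 0}  # filter matching 5% of points

post = post_filtered_knn(query, vectors, 10, allowed)
pre = pre_filtered_knn(query, vectors, 10, allowed)
print(len(post), len(pre))
```

With an approximate index such as HNSW, the trade-offs between these strategies become more involved, which is the kind of engine-level behavior a single-scenario algorithm benchmark does not capture.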