top | item 40982016

(no title)

yingfeng | 1 year ago

Hi, I'm one of the creators of Infinity. The article mentions sparse vectors vs. BM25. While sparse vectors perform well in some evaluations, they are produced by a trained model, which means they can't fully represent all of a user's keywords/tokens: those that don't appear in the training set are truncated. That has a very big impact in many enterprise vertical scenarios. BM25 doesn't have this limitation.
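
A toy sketch of the point above, not how Infinity or any real sparse-embedding model works. `MODEL_VOCAB` and `toy_sparse_encode` are hypothetical stand-ins for a learned model with a fixed output vocabulary, and the BM25 scorer is a plain textbook version with common default parameters (`k1=1.2`, `b=0.75`). It only shows why a token the model never saw (here a made-up part number) can vanish from the sparse representation while BM25 still matches it.

```python
import math
from collections import Counter

# Hypothetical fixed vocabulary standing in for a learned sparse model's output space.
MODEL_VOCAB = {"error", "pump", "valve", "pressure"}


def toy_sparse_encode(text):
    """Toy 'learned' sparse vector: tokens outside the fixed vocabulary are dropped."""
    counts = Counter(t for t in text.lower().split() if t in MODEL_VOCAB)
    return dict(counts)


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Textbook BM25 over whitespace tokens; the vocabulary comes from the corpus itself."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each token.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["pump error on valve xk-9920b", "pressure valve error"]
query = "xk-9920b"  # made-up part number, absent from MODEL_VOCAB

sparse_query = toy_sparse_encode(query)  # nothing survives the fixed vocabulary
scores = bm25_scores(query, docs)        # only the first document matches
```

In this toy setup the sparse encoding of the query is empty, so it cannot match anything, while BM25 gives the first document a positive score because it indexes whatever tokens the corpus contains.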

philippemnoel|1 year ago

BM25 is indeed way more important than these vector DBs will claim. At ParadeDB, we've observed significant use cases where customers need both BM25 and vector search.