top | item 39142482

I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that with even slightly out-of-domain language (in e-commerce, things like "RD TSHRT XS"), embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would kick up above lexical search… and if you combined lexical _and_ vector search, things were great.

Also, a member of our team developed an amazing RNN-based model that still beats the pants off most embedding models today when it comes to speed, and is no slouch on CPU either…

(* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times)
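
For anyone wondering what "combining lexical and vector search" can look like, here is a minimal sketch. It is not how Fusion does it; reciprocal rank fusion (RRF) is just one common way to merge two rankings, swapped in here for illustration. The toy documents, the `bm25_scores` and `rrf` helpers, and the hard-coded `vector_ranking` (a stand-in for whatever an embedding model would return) are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against the tokenized query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over every ranked list."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return [doc_id for doc_id, _ in sorted(fused.items(), key=lambda x: (-x[1], x[0]))]

# Toy catalog (hypothetical), already tokenized.
docs = [
    "rd tshrt xs".split(),
    "red t-shirt extra small".split(),
    "blue jeans xs".split(),
    "green socks".split(),
]
query = "rd tshrt xs".split()

lexical_scores = bm25_scores(query, docs)
lexical_ranking = sorted(range(len(docs)), key=lambda i: -lexical_scores[i])

# Assumption: pretend an embedding model ranked the docs like this.
vector_ranking = [1, 0, 2, 3]

hybrid = rrf([lexical_ranking, vector_ranking])
print(lexical_ranking, vector_ranking, hybrid)
```

The point of fusing on ranks rather than raw scores is that BM25 scores and vector similarities live on different scales, so you don't have to normalize either one.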

softwaredoug|2 years ago

Heh. A lot of what search people have known for a while is suddenly being re-learned by the population at large, in the context of RAG, etc. :)

mvkel|2 years ago

The thing with tech is, if you're too early, it's not like you eventually get discovered and adopted.

When the time is finally right, people just "invent" what you made all over again.

data_maan|2 years ago

Sorry, what is it that people in search _have_ known?

I know nothing about search, but a bit about ML, so I'm curious.

az226|2 years ago

What’s the model?