I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that even with slightly out-of-band use of language (e.g. e-commerce queries like "RD TSHRT XS"), embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would jump above lexical search…and if you combined lexical _and_ vector search, things were great.

Also, a member of our team developed an amazing RNN-based model that still beats the pants off most embedding models when it comes to speed, and it's no slouch on CPU either…
(* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times)
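The comment doesn't say how Fusion actually combines lexical and vector results. One common, simple way to merge a BM25 result list with a vector-search result list is reciprocal rank fusion (RRF). Below is a minimal sketch, assuming each retriever already returns doc ids ordered best-first. The function name and the product ids are hypothetical, and k=60 is just the constant usually quoted for RRF, not anything Fusion-specific.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of doc ids into one ranking (RRF sketch).

    Each doc earns 1 / (k + rank) from every list it appears in; docs that
    rank well in both the lexical and the vector list float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for the query "RD TSHRT XS".
bm25_hits = ["red-tshirt-xs", "red-hoodie", "blue-tshirt-xs"]
vector_hits = ["red-tshirt-xs", "maroon-tee-s", "red-hoodie"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['red-tshirt-xs', 'red-hoodie', 'maroon-tee-s', 'blue-tshirt-xs']
```

RRF only looks at ranks, so you don't have to reconcile BM25 scores with cosine similarities, which live on very different scales. That's the main reason it's a popular first pass at hybrid search.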
softwaredoug|2 years ago
mvkel|2 years ago
When the time is finally right, people just "invent" what you made all over again.
data_maan|2 years ago
I know nothing about search, but I know a bit about ML, so I'm curious.
az226|2 years ago