Vector Similarity Beyond Search

39 points | generall | 2 years ago | qdrant.tech

11 comments

llogiq|2 years ago

Very nice diagrams, they make the article really easy to follow! This really drives home the point that vector search isn't just a quantitative evolutionary step (as in faster), but a qualitative one.

Makes one wonder what other use cases are lurking that would need just another small modification and haven't even been thought of yet because they used to be impossible to implement.

weinzierl|2 years ago

I like their documentation in general and learned a lot from it. Especially since - unlike Pinecone (which has good documentation too) - they don't focus primarily on their commercial offerings. I feel it's really written with the intention to inform first and to sell second.

brianjking|2 years ago

All the diagrams are dead images for me :(

DrScientist|2 years ago

The shift to vector search approaches over exact text searches has, for me at least, made googling harder.

If I'm searching for something whose words have a more common meaning outside the context I care about, then exact matching (of my carefully crafted search term) performs much better than vector search.

Not every query is looking for the average result.

generall|2 years ago

I would say that text search and vector search are orthogonal. Some scenarios are better with one, others with a combination. But fitting vector search into an interface designed for text limits vector search's potential.

jillesvangurp|2 years ago

Vector search can be a useful tool for building a good search experience, but people usually start at the wrong end: they assume they need it, then take a few shortcuts to just pick some technology and rush something out. This rarely leads to good results.

What I've seen:

- Vector search without good models does not tend to perform that well. I've seen comparisons where off-the-shelf free models struggle to keep up with simple text search and some manually tuned queries. Many companies might use those as a starting point but end up investing in their own models. BM25 (a text ranking algorithm) provides a pretty solid baseline performance for a lot of things.

- Building good models is typically left as an exercise to the reader by those who provide vector search engines. These solutions are great for comparing vectors once you have them. However, getting good vectors is a bit of a dark art. And getting those is actually the hard part of the problem. Using a vector search engine is easy, getting good vectors isn't.

- Building good models to get good vectors requires a lot of expertise and skill. And not just technical skills. For example, understanding and building good processes for evaluating your search performance is not something they teach people in universities. I know some people and companies that can do this; they are not cheap (or bored). Cutting corners here leads to predictably meh results.

- The other thing you need is lots of data. The free open stuff that everybody else trains on as well is nice as a start but generally not good enough. That's why the likes of Google and other big tech companies are so casual about sharing algorithms. They are worthless without data. And they're mostly not sharing data.

- Implementing vector search can be expensive. Basically it's a function of hardware, people and time. It takes ages to train models, and it requires people who understand how to do that. You can speed it up with really expensive hardware. If your people make mistakes (because they don't know what they are doing), you'll burn a lot of time and hardware cost.

- Most startups or smaller companies don't really have the level of funding needed to do a proper job. Hence a lot of startups being a bit hand-wavy about doing something something AI bla bla bla vector search on some beautiful slides. When you scrutinize these companies, that usually means they have a (very) junior data "scientist" fresh out of college who has heard a thing or two about how these things might actually work and not a whole lot else.

I've seen some companies doing this stuff properly. Some startups even. But not a lot. Sometimes you get the right mix of people and knowledge and ideas.
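
For anyone wondering what the BM25 baseline mentioned above looks like in practice, here is a minimal sketch in plain Python. It assumes whitespace tokenization, the commonly used defaults k1=1.5 and b=0.75, and a smoothed IDF variant; the function names and the toy documents are made up for illustration and are not from the article or any particular engine.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real systems use proper analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF, always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus: "python" has two meanings here.
docs = [
    "python snake habitat",
    "python programming language tutorial",
    "cooking pasta recipe",
]
scores = bm25_scores("python programming", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The point of a baseline like this is to have something cheap to compare against: if an embedding model cannot beat it on your own evaluation queries, the vectors are probably the problem, not the search engine.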