top | item 46383029

markerz | 2 months ago

The problem with LLMs using full-text search is that they're very slow compared to a vector search query. I'll admit the results are impressive, but often that's because I kick off an agent query and step away for five minutes.

On the other hand, generating and regenerating embeddings for all your documents can be time-consuming and costly, depending on how often you need to reindex.


leobg | 2 months ago

Not an apples to apples comparison. Vector search is only fast after you have built an index. The same is true for full text search. That too, will be blazing fast once you have built an index (like Google pre-transformer).

markerz | 2 months ago

LLMs will always have the tool-call overhead, which I find to be quite expensive (seconds) on most models. Using vector databases directly, without the LLM interface, gets you a lot of the semantic search ability without the multi-second latency, which is pretty nice for querying documents on a website: e.g., finding relevant pages on a documentation site, or showing related pages. The same approach can be applied to GitHub Issues to deduplicate them, or to surface existing issues that match what the user is about to report.

There are plenty of places where "cheap and fast" is better and an LLM interface just gets in the way. I think this is a lot of the unsqueezed juice in our industry.
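The direct-lookup idea above can be sketched in a few lines: given precomputed embeddings, a "related pages" query is just a nearest-neighbor search, with no LLM in the loop. The page names and vectors here are made up for illustration; in practice the embeddings would come from an embedding model and live in a vector database rather than a dict.

```python
import math

# Hypothetical precomputed embeddings for three documentation pages.
# In a real system these would be produced by an embedding model at
# index time; the 3-dimensional vectors here are purely illustrative.
DOC_EMBEDDINGS = {
    "getting-started": [0.9, 0.1, 0.1],
    "api-reference":   [0.1, 0.9, 0.2],
    "troubleshooting": [0.2, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def related_pages(query_embedding, top_k=2):
    """Rank pages by cosine similarity to the query embedding."""
    ranked = sorted(
        DOC_EMBEDDINGS.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

This runs in microseconds per query at small scale; the same shape (embed once, compare at query time) is what dedicated vector databases optimize with approximate-nearest-neighbor indexes once the corpus gets large.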