tomomomo | 6 months ago

Yeah, I agree it’s not something new, since humans also do this kind of retrieval. It’s just a way to generate a table of contents for an LLM. I’m wondering, when LLMs become stronger, will we still need vector-based retrieval? Or will we need a retrieval method that’s more like how humans do it?

dragonwriter | 5 months ago

> I’m wondering, when LLMs become stronger, will we still need vector-based retrieval? Or will we need a retrieval method that’s more like how humans do it?

If we knew how humans do it well enough to reproduce it, we'd probably skip straight to that. Everything in AI, though, is basically throwing ideas at the wall for how you might get closer to it: we start from very little knowledge of the mechanism, with lots of anecdotes and subjective impressions, but very little structured understanding, of even the behavior we want to mimic.

sdesol | 6 months ago

> will we still need vector-based retrieval

I think for most use cases, it doesn't make much sense to use vector DBs. When I started designing my AI Search feature, I researched chunking a lot, and the general consensus was that you can lose context if you don't chunk the right way, and there wasn't really a right way to chunk. That's why I decided to take the approach I'm using today, which I talk about in another comment.

With input costs for very good models at $0.30/1M tokens for Gemini 2.5 Flash ($0.15/1M at bulk rates), feeding the LLM thousands of documents to generate summaries would probably cost 5 dollars or less at bulk pricing. With input costs that low, and with most SOTA LLMs able to handle 50k tokens of context with no apparent loss in reasoning, I really don't see the reason for vector DBs anymore, especially if it means potentially less accurate results.
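The arithmetic behind that "$5 or less" claim is easy to check. A minimal sketch, where the per-million-token rates come from the comment above and the corpus size and average document length are illustrative assumptions:

```python
# Back-of-the-envelope cost for summarizing a whole corpus with a cheap LLM
# instead of chunking it into a vector DB. Rates are from the comment above;
# the document count and average length are assumed for illustration.

def summarization_cost(num_docs: int, avg_tokens_per_doc: int,
                       price_per_million: float) -> float:
    """Input-token cost in dollars for feeding every document to the LLM."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million

# Assume 5,000 documents averaging ~6k tokens each.
standard = summarization_cost(5_000, 6_000, 0.30)  # $0.30/1M input tokens
bulk = summarization_cost(5_000, 6_000, 0.15)      # $0.15/1M bulk rate

print(f"standard: ${standard:.2f}, bulk: ${bulk:.2f}")
# standard: $9.00, bulk: $4.50
```

At bulk rates that's $4.50 for 30M input tokens, which lines up with the "5 dollars or less" estimate; output-token costs for the summaries would add to this but summaries are short.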

CuriouslyC | 6 months ago

Actually, chunking isn't such a bad problem with code: it chunks itself, and code embeddings produce better results. The problem is that RAG is fiddly, and people try to just copy a basic template or use a batteries-included lib that's tuned for QA, which isn't going to produce good results.
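The "code chunks itself" point can be made concrete: source files have natural boundaries (function and class definitions), so you can split on those instead of using a heuristic sliding-window chunker. A minimal sketch using Python's stdlib `ast` module, assuming one chunk per top-level definition:

```python
import ast

def chunk_python_source(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function/class.

    Unlike prose, code carries its own structure, so each chunk is a
    complete, self-contained unit for embedding or retrieval.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact source text of the node
            chunks.append(ast.get_source_segment(source, node))
    return chunks

example = '''
def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
'''

for chunk in chunk_python_source(example):
    print(chunk)
    print("---")
```

This yields two chunks (the function and the class), each a syntactically complete unit; a real pipeline would also want to recurse into large classes and attach file paths as metadata.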