cmenge | 5 months ago
Still, that single tender can be on the order of a billion tokens. Even if the LLM supported such an insane context window, that's roughly 4 GB of data that has to be moved, and at current LLM prices inference would cost thousands of dollars. I detailed this a bit more at https://www.tenderstrike.com/en/blog/billion-token-tender-ra...
And that's just one (though granted, a very large) tender.
For the corpus of a larger company, you'd probably be looking at trillions of tokens.
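A rough back-of-envelope version of that scale argument, for anyone who wants to check the numbers; the ~4 bytes of text per token and the $3 per million input tokens are assumptions, not figures from this thread:

```python
# Back-of-envelope cost of stuffing one giant tender into a context window.
# Assumed figures: ~4 bytes of UTF-8 text per token, and a hypothetical
# price of $3 per million input tokens -- adjust for your actual provider.
tokens_per_tender = 1_000_000_000
bytes_per_token = 4                 # rough average for English text
price_per_million = 3.00            # USD, hypothetical

payload_gb = tokens_per_tender * bytes_per_token / 1e9
cost_usd = tokens_per_tender / 1_000_000 * price_per_million

print(f"payload to move: ~{payload_gb:.0f} GB")        # ~4 GB
print(f"one full-context pass: ~${cost_usd:,.0f}")     # thousands of dollars
```

And that's the cost of a single inference pass; every follow-up question over the same corpus pays it again unless the provider caches the context.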
While I agree that delivering tiny, chopped-up pieces of context to the LLM may no longer be a good strategy, sending thousands of ultimately irrelevant pages isn't one either, and embeddings definitely give you a far better search experience than classic BM25 text search alone.
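For illustration, a minimal hybrid-retrieval sketch along those lines, blending BM25 scores with embedding similarity rather than relying on either alone. The rank_bm25 and sentence-transformers packages, the all-MiniLM-L6-v2 model, and the 50/50 weighting are my assumptions, not anything from this thread:

```python
# Minimal hybrid retrieval: blend lexical (BM25) and semantic (embedding)
# scores so that neither signal is used on its own.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "contract terms for steel delivery",
    "appendix: environmental compliance requirements",
    "pricing schedule and payment terms",
]
query = "payment conditions"

# Lexical scores: BM25 over simple whitespace tokens.
bm25 = BM25Okapi([d.split() for d in docs])
lex = bm25.get_scores(query.split())

# Semantic scores: cosine similarity of normalized embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)
q_vec = model.encode(query, normalize_embeddings=True)
sem = doc_vecs @ q_vec

def minmax(x):
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

# Equal weighting is an arbitrary starting point; tune per corpus.
hybrid = 0.5 * minmax(lex) + 0.5 * minmax(sem)
for i in np.argsort(-hybrid):
    print(f"{hybrid[i]:.3f}  {docs[i]}")
```

Min-max normalization before blending is just one simple choice; reciprocal-rank fusion is a common alternative that avoids comparing raw scores across the two systems.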
elliotto | 5 months ago
Embeddings had some context-size limitations in our case; we were looking at large technical manuals. Gemini was the first to offer a 1M-token context window, but for some reason its embedding window is tiny. I suspect the embeddings might start to break down when there's too much information.
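A minimal sketch of the usual workaround for that mismatch: split the manual into overlapping windows small enough for the embedding model, even when the chat model's context is huge. This assumes tiktoken for tokenization, and the 512-token window, 64-token overlap, and file name are made-up illustration values, not Gemini's actual limits:

```python
# Chunk a large manual into overlapping token windows that fit a small
# embedding limit, since embedding models often accept far less context
# than the chat model that will eventually read the retrieved chunks.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_for_embedding(text: str, max_tokens: int = 512, overlap: int = 64):
    ids = enc.encode(text)
    step = max_tokens - overlap
    for start in range(0, len(ids), step):
        yield enc.decode(ids[start:start + max_tokens])
        if start + max_tokens >= len(ids):
            break

manual = open("technical_manual.txt").read()   # hypothetical input file
chunks = list(chunk_for_embedding(manual))
print(f"{len(chunks)} chunks, each within the embedding model's window")
```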
codyb | 5 months ago