Embedding-based RAG will only ever be OK at best. It is useful for small parts of a chain or for tech demos, but in real-world use it will always falter.
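For reference, the embedding-based pipeline being criticized is roughly: chunk the corpus, embed each chunk, then at query time return the chunks whose vectors sit closest to the query vector. A minimal sketch in Python, where embed() is a stand-in for whatever embedding model or API you actually call:

    import numpy as np

    def embed(texts):
        # Placeholder: a real system would call an embedding model here.
        # Returns one fixed-size vector per input string.
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 384))

    def top_k_chunks(query, chunks, k=3):
        # Rank chunks by cosine similarity to the query embedding.
        q = embed([query])[0]
        c = embed(chunks)
        sims = (c @ q) / (np.linalg.norm(c, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(-sims)[:k]]

    chunks = ["chunk about billing", "chunk about auth", "chunk about deploys"]
    print(top_k_chunks("how do refunds work?", chunks, k=2))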
The difference is that this feature explicitly isn't designed to do very much, which is still the best way to build most LLM-based products: keep the LLM's job narrow and sandwich it between non-LLM components.
Most of my ChatGPT queries use RAG (based on the query, ChatGPT decides whether it needs to search the web) to get up-to-date information about the world. In real life it's effective, and it's why every large provider supports it.
RAG will be pronounced dead again and again, but it has its use cases. We moved to agentic search with RAG as one tool, while other retrieval strategies we added use real-time search over the sources, often skipping ingested and chunked sources entirely. Large context windows now allow putting almost whole documents into a single request.
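A rough sketch of that shape, with all names and the token budget invented for illustration: retrieval is just one callable handed to the loop, small sources are placed into the context window whole rather than chunked, and a live search covers anything that was never ingested.

    # Hypothetical routing loop; vector_search, live_search and call_llm
    # are placeholders for whatever retrieval tools and model you use.
    MAX_CONTEXT_TOKENS = 100_000

    def answer(question, source_texts, vector_search, live_search, call_llm):
        parts = []
        for text in source_texts:
            if len(text) // 4 < MAX_CONTEXT_TOKENS:   # crude token estimate
                parts.append(text)                    # fits: skip chunked RAG entirely
            else:
                parts.extend(vector_search(question, text, k=5))  # fall back to chunks
        parts.extend(live_search(question))           # real-time search over fresh sources
        prompt = "\n\n".join(parts) + "\n\nQuestion: " + question
        return call_llm(prompt)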
To give a real-world example: the way Claude Code works versus the way Cursor's embedding database works.
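In other words (a simplified sketch, not either product's actual implementation): one style searches the files on disk with plain text matching at the moment the model asks for it, so nothing can go stale, while the other answers out of a pre-built embedding index like the one sketched above.

    import os, re

    def grep_style_search(pattern, root=".", exts=(".py", ".ts", ".md")):
        # Agentic-search style: scan the live files at request time and
        # return (path, line number, line) for every match.
        hits = []
        for dirpath, _, files in os.walk(root):
            for name in files:
                if not name.endswith(exts):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="ignore") as f:
                        for lineno, line in enumerate(f, 1):
                            if re.search(pattern, line):
                                hits.append((path, lineno, line.strip()))
                except OSError:
                    continue
        return hits

    print(grep_style_search(r"def top_k_chunks")[:5])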