(no title)
mif | 1 year ago
From the video in this IBM post [0], I understand that it is a way for the LLM to check the source and recency of its information. Based on that, it could, in principle, say “I don’t know” instead of “hallucinating” an answer. RAG is a way to implement this for LLMs.
[0] https://research.ibm.com/blog/retrieval-augmented-generation...
simonw | 1 year ago
The art of implementing RAG is deciding what text should be pasted into the prompt in order to get the best possible results.
A popular way to implement RAG is using similarity search via vector search indexes against embeddings (which I explained at length here: https://simonwillison.net/2023/Oct/23/embeddings/). The idea is to find the content that is semantically most similar to the user's question (or the likely answer to their question) and include extracts from that in the prompt.
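A minimal sketch of that similarity-search idea. This toy uses a bag-of-words vector in place of real learned embeddings (and plain cosine similarity in place of a vector index), so the `embed` and `retrieve` functions here are illustrative stand-ins, not any particular library's API:

```python
import math
from collections import Counter

def embed(text, vocab):
    # Toy "embedding": word counts over a shared vocabulary.
    # Real systems use a learned embedding model instead.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, docs, k=1):
    # Rank documents by similarity to the question, return the top k.
    vocab = sorted({w for d in docs + [question] for w in d.lower().split()})
    qv = embed(question, vocab)
    ranked = sorted(docs, key=lambda d: cosine(qv, embed(d, vocab)),
                    reverse=True)
    return ranked[:k]

docs = [
    "Embeddings map text to vectors in a semantic space.",
    "The 2024 Olympics were held in Paris.",
]
context = retrieve("How do text embeddings work?", docs, k=1)
# Paste the most similar extract into the prompt.
prompt = f"Context:\n{context[0]}\n\nQuestion: How do text embeddings work?"
```

In production you would precompute document embeddings once, store them in a vector index, and embed only the question at query time.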
But you don't actually need vector indexes or embeddings at all to implement RAG.
Another approach is to take the user's question, extract some search terms from it (often by asking an LLM to invent some searches relating to the question), run those searches against a regular full-text search engine and then paste results from those searches back into the prompt.
Bing, Perplexity, and Google Gemini are all examples of systems that use this trick.
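A sketch of that full-text variant. The term-extraction step is the part those systems delegate to an LLM; here a stopword filter stands in for it, and the scoring is a crude substring count rather than a real search engine, so treat all the names and scoring here as illustrative assumptions:

```python
STOPWORDS = {"the", "a", "an", "is", "are", "what", "how", "of", "in", "do"}

def extract_terms(question):
    # Placeholder for the real step: asking an LLM to invent
    # search queries relating to the question.
    words = (w.strip("?.,!").lower() for w in question.split())
    return [w for w in words if w and w not in STOPWORDS]

def full_text_search(terms, docs, k=2):
    # Stand-in for a real full-text engine: score each document
    # by how many search terms it contains.
    def score(doc):
        text = doc.lower()
        return sum(term in text for term in terms)
    return sorted(docs, key=score, reverse=True)[:k]

docs = [
    "Paris is the capital of France.",
    "Python uses indentation to delimit blocks.",
]
terms = extract_terms("What is the capital of France?")
results = full_text_search(terms, docs, k=1)
# Paste the search results back into the prompt.
prompt = (f"Answer using this context:\n{results[0]}\n\n"
          "Question: What is the capital of France?")
```

The structure is the same as the embedding approach: retrieve, then paste into the prompt; only the retrieval mechanism differs.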
petervandijck | 1 year ago
There are many tricks to get better context to send to your LLM, and that’s a large part of making the system give good answers.