dongecko | 2 years ago
You can improve on the retrieved documents in many ways, e.g.:
- better chunking,
- better embedding,
- embedding several rephrased versions of the query,
- embedding a hypothetical answer to the prompt,
- hybrid retrieval (vector similarity + keyword/TF-IDF/BM25 search),
- heavily incorporating metadata,
- introducing additional (or hierarchical) summaries of the documents,
- returning not only the matching chunks but also adjacent text,
- re-ranking the candidate documents,
- fine-tuning the LLM, and much, much more.
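To make the hybrid-retrieval point concrete, here is a minimal sketch that fuses a vector-similarity ranking with a BM25-style keyword ranking using reciprocal rank fusion (the doc IDs and the k=60 constant are just illustrative; the two input rankings are assumed to come from your own retrievers):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # A document ranked highly in any list gets a large contribution.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: the two retrievers mostly disagree, but both rank "doc_b" highly.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])  # "doc_b" comes out first
```

The nice thing about rank fusion (as opposed to mixing raw scores) is that you never have to normalize cosine similarities against BM25 scores, which live on completely different scales.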
However, at the end of the day a RAG system usually still has a hard time answering questions that require an overview of your data. Example questions are:
- "What are the key differences between the new and the old version of document X?"
- "Which documents can I ask you questions about?"
- "How do the regulations differ between case A and case B?"
In these cases it is really helpful to use LLMs to decide how to process the prompt. This can be something simple like query routing, or rephrasing/enhancing the original prompt until something useful comes up. But it can also mean agents that come up with sub-queries and a plan for combining the partial answers. You can also build a network of agents with different roles (coordinator/planner, reviewer, retriever, ...) to come up with an answer.
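The query-routing idea can be sketched in a few lines. In a real system the classify() step would be an LLM call with a constrained output format; the keyword heuristic and handler names below are hypothetical stand-ins just to show the control flow:

```python
# Cues that suggest the question needs an overview of the corpus rather
# than an ordinary chunk lookup (illustrative, not exhaustive).
OVERVIEW_CUES = ("differences between", "which documents", "compare", "overview")

def classify(prompt: str) -> str:
    """Stand-in for an LLM-based router: pick a processing strategy."""
    p = prompt.lower()
    return "overview" if any(cue in p for cue in OVERVIEW_CUES) else "lookup"

def route(prompt: str, handlers: dict):
    # Dispatch the prompt to the handler chosen by the classifier.
    return handlers[classify(prompt)](prompt)

handlers = {
    "overview": lambda p: f"[hierarchical summaries] {p}",
    "lookup": lambda p: f"[chunk retrieval] {p}",
}
answer = route("What are the key differences between the new and old version?", handlers)
```

The overview-style example questions above would go down the summaries path, while ordinary lookups hit the normal chunk retriever.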
* edited the formatting
CharlieDigital | 2 years ago
My experience has been that they are far too unpredictable to be of use.
In my testing with agent networks, it was a challenge to force them to provide a response, even an imperfect one. If there was a "reviewer" in the pool, it tended to keep the cycle going, with no clear way of forcing it to break out.
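One blunt way to guarantee the cycle breaks out is a hard cap on review rounds, returning the last draft regardless of the reviewer's verdict. A minimal sketch, where generate() and review() are hypothetical stand-ins for the LLM calls:

```python
def answer_with_reviewer(question, generate, review, max_rounds=3):
    """Run a generator/reviewer loop, but never more than max_rounds times."""
    draft = generate(question, feedback=None)
    for _ in range(max_rounds):
        feedback = review(question, draft)
        if feedback is None:  # reviewer is satisfied
            return draft
        draft = generate(question, feedback=feedback)
    # Hard stop: return the imperfect draft instead of cycling forever.
    return draft

# Toy stand-ins: this reviewer rejects everything, so only the cap ends the loop.
calls = []
gen = lambda q, feedback: calls.append("gen") or f"draft {len(calls)}"
rev = lambda q, d: "be more specific"
result = answer_with_reviewer("Why?", gen, rev)  # returns the 4th (still imperfect) draft
```

The trade-off is that you sometimes ship a draft the reviewer would have rejected, but that beats an unbounded loop that never responds at all.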
GPT-3.5 actually worked better than GPT-4 because it ran out of context sooner.
I am certain that I could have tuned it to get it to work, but at the end of the day it felt easier and more deterministic to do a few steps of old-fashioned data processing and then hand the data to the LLM.
dongecko | 2 years ago
Maybe my use case is narrow enough that, in combination with a rather strict and constraining system message, an answer is easy to find.
Second, I have lately played a lot with locally running LLMs. Their answers often break the formatting required for the agent to proceed automatically. So maybe I just don't see the spiraling into oblivion because I run into errors early ;)
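That failure mode is easy to illustrate: the agent framework expects strict JSON, and a local model that wraps its answer in prose fails validation immediately instead of feeding another loop iteration. A sketch, where parse_action() is a hypothetical validation step (the "action" field name is illustrative):

```python
import json

def parse_action(llm_output: str) -> dict:
    """Reject any model output that is not a JSON object with an 'action' key."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        # The model broke the format: fail fast rather than keep cycling.
        raise ValueError(f"model broke the format: {llm_output!r}") from exc
    if not isinstance(data, dict) or "action" not in data:
        raise ValueError("missing 'action' field")
    return data

ok = parse_action('{"action": "retrieve", "query": "case A vs case B"}')
```

A larger model that always emits valid JSON sails past this check, which is exactly how a network can keep cycling unnoticed.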
chenxi9649 | 2 years ago
It also feels like we are at a bottleneck when it comes to the knowledge retrieval problem. I wonder if the "solution" to all of these is just a smarter foundation model, which will come out of 100x more compute, which will cost approximately 7 trillion dollars.
dongecko | 2 years ago
In particular, I wonder if RAG systems will soon be a thing of the past, because end-to-end-trained gigantic networks with longer attention spans, compression of knowledge, or hierarchical attention will at some point outperform retrieval. On the other hand, I can also see a completely different direction, where we develop architectures that, like operating systems, deal with memory management, scheduling, and so on.