top | item 46899162

graphitout | 24 days ago

optimal chunk size is strongly query-dependent - very true.

Faced similar issues. Ended up adding an agentic tool-call layer on top to retrieve the nearby chunks, to handle cases where a relevant answer was only partially available in a chunk (like a 7-step instruction of which only 4 steps landed in one chunk). It worked ok.


Djamba | 24 days ago

Interesting. Can you elaborate a bit more, please?

graphitout | 24 days ago

The RAG was set up over a bunch of documents; most of them were manuals containing steps for measuring, troubleshooting, and replacing components of industrial machines.

The issue was that most of these procedures were long (above 512 tokens), so the typical chunk window wouldn't capture the full set of steps. We added a tool-calling capability by which the LLM can request the chunks adjacent to a given chunk. This worked well in practice, but burned more $$.
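A minimal sketch of what such a tool might look like, assuming chunks are stored in document order and keyed by integer index (the names and storage layout here are hypothetical, not the commenter's actual setup):

```python
# Hypothetical "get nearby chunks" tool for an agentic RAG loop.
# Assumes chunks were produced in document order and are keyed by
# their integer position, so neighbors are just adjacent indices.

chunks = {i: f"chunk {i} text..." for i in range(10)}  # stand-in chunk store

def get_nearby_chunks(chunk_id: int, window: int = 2) -> list[str]:
    """Return the chunks within `window` positions of chunk_id, in order.

    Indices that fall off either end of the document are skipped.
    """
    ids = range(chunk_id - window, chunk_id + window + 1)
    return [chunks[i] for i in ids if i in chunks]

# The model would be given this as a tool and call it when a retrieved
# chunk looks truncated, e.g. a numbered procedure that ends mid-step:
neighborhood = get_nearby_chunks(4, window=1)  # chunks 3, 4, 5
```

Each extra tool call feeds more chunks back into the context, which is where the added cost comes from.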