I have usually seen people recommend chunking by sentences, by paragraphs, or by a fixed number of characters. IMO, these are suggested mainly because they are easy to code, but in reality the length of a meaningful chunk depends entirely on the data. How we chunk an FAQ document is different from how we chunk a PRD. Based on this assumption, I have a couple of questions:
1. Is chunking the most significant factor in RAG quality?
2. If there were no limitations, would humans who are experts in that dataset be the best people to create the chunks?
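
For concreteness, here is a minimal sketch of the three "easy to code" strategies I mean. The function names are my own placeholders, and the sentence splitter is a deliberately naive regex (it will break on abbreviations like "e.g."), not a real tokenizer.

```python
import re


def chunk_fixed(text: str, size: int) -> list[str]:
    """Cut the text every `size` characters, ignoring meaning entirely."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_sentences(text: str) -> list[str]:
    """Naive sentence split: break on whitespace that follows ., ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


# Hypothetical FAQ-style text, made up for illustration.
faq = "Q: What is RAG? A: Retrieval plus generation.\n\nQ: Why chunk? A: To fit context."

print(chunk_fixed(faq, 30))
print(chunk_sentences(faq))
print(chunk_paragraphs(faq))
```

On the FAQ-style text, only the paragraph split keeps each question together with its answer. The sentence split separates them and the fixed-length split cuts mid-sentence, which is the kind of data dependence I am asking about.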