Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it (medium.com)
2 points | metawake | 1 month ago | 3 comments
metawake|1 month ago
Author here. I built RagTune to stop guessing at RAG configs.
Surprising findings:
1. On legal text (CaseHOLD), 1024-size chunks scored WORST (0.618), while the "small" 256-size chunks won (0.664). That's a ~7% relative swing (4.6 points absolute).
2. On Wikipedia text? All chunk sizes hit ~99%. No difference.
3. Plot twist: At 5K docs, optimal chunk size FLIPPED from 256→1024. Scale changes everything.
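For anyone who wants to see the shape of this kind of sweep, here's a minimal, hypothetical sketch. It is not RagTune's code or API. It assumes chunk size is counted in whitespace-separated words (the post doesn't say tokens vs. characters), uses a toy word-overlap ranker in place of an embedding retriever, and judges relevance per source document. It scores Recall@k and MRR per chunk size and also checks the relative-swing arithmetic for the CaseHOLD numbers.

```python
# Hypothetical sketch, not RagTune's code: sweep chunk sizes and score
# retrieval with Recall@k and MRR. Assumptions: chunk size = whitespace
# words, a toy word-overlap ranker stands in for embeddings, and
# relevance is judged per source document.
from collections import Counter


def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve_doc_ids(query: str, chunks: list[tuple[str, str]]) -> list[str]:
    """Rank (doc_id, chunk) pairs by query-word overlap; return unique doc IDs, best first."""
    q = Counter(query.lower().split())

    def score(item: tuple[str, str]) -> int:
        # Count shared words (multiset intersection) between query and chunk.
        return sum((q & Counter(item[1].lower().split())).values())

    ranked: list[str] = []
    for doc_id, _ in sorted(chunks, key=score, reverse=True):
        if doc_id not in ranked:  # keep each source doc once, at its best chunk's rank
            ranked.append(doc_id)
    return ranked


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant doc IDs found in the top-k results."""
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / (1-based rank of the first relevant doc), or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(corpus: dict[str, str], queries: list[tuple[str, set[str]]],
             size: int, k: int = 5) -> tuple[float, float]:
    """Chunk every document at `size` words, then return (mean Recall@k, MRR)."""
    chunks = [(doc_id, c) for doc_id, text in corpus.items() for c in chunk_words(text, size)]
    recalls, rrs = [], []
    for query, relevant in queries:
        ranked = retrieve_doc_ids(query, chunks)
        recalls.append(recall_at_k(ranked, relevant, k))
        rrs.append(reciprocal_rank(ranked, relevant))
    return sum(recalls) / len(recalls), sum(rrs) / len(rrs)


def relative_swing(worse: float, better: float) -> float:
    """Relative improvement of `better` over `worse`."""
    return (better - worse) / worse


if __name__ == "__main__":
    # Toy corpus and query, made up for illustration only.
    corpus = {"a": "the court held the contract void", "b": "the river flows north"}
    queries = [("contract void court", {"a"})]
    for size in (2, 3, 256):
        print(size, evaluate(corpus, queries, size))
    print(f"{relative_swing(0.618, 0.664):.1%}")  # the CaseHOLD scores quoted above
```

Swapping `retrieve_doc_ids` for a real embedding retriever and running `evaluate` at each chunk size and each corpus size is one way to check whether the 256→1024 flip reproduces on your own data.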
Code is MIT-licensed: github.com/metawake/ragtune
Happy to discuss methodology.
patrakov|1 month ago