cle | 4 months ago
To solve this generally you need to chunk not by page but by semantic unit, keeping each chunk under the model's information density threshold for the task at hand.
This is not a trivial problem at all. Sometimes there is no naive way to chunk a document so that every element fits within the information density limit. A really simple example is a table that spans hundreds of pages. Solving that generally is an open problem.
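To make that concrete, here is a rough sketch of budgeted chunking over pre-split semantic units (paragraphs, sections, tables). Everything in it is an assumption for illustration: `estimate_cost` uses a whitespace word count as a crude stand-in for "information density", and `chunk_by_units` and `budget` are hypothetical names, not anything from a real library. The point is only that units which exceed the budget on their own, like the giant table, cannot be packed and have to be set aside for some other strategy.

```python
from typing import Callable, List, Tuple


def estimate_cost(unit: str) -> int:
    # Assumption: whitespace word count as a crude proxy for information density.
    # A real system would use a tokenizer or a task-specific measure.
    return len(unit.split())


def chunk_by_units(
    units: List[str],
    budget: int,
    cost_fn: Callable[[str], int] = estimate_cost,
) -> Tuple[List[List[str]], List[str]]:
    """Greedily pack semantic units into chunks whose total cost stays <= budget.

    Returns (chunks, oversized), where oversized holds units that cannot fit
    in any chunk by themselves and need separate handling.
    """
    chunks: List[List[str]] = []
    oversized: List[str] = []
    current: List[str] = []
    current_cost = 0

    for unit in units:
        cost = cost_fn(unit)
        if cost > budget:
            # The unit alone exceeds the limit: close the open chunk so
            # document order is preserved, then set the unit aside.
            if current:
                chunks.append(current)
                current, current_cost = [], 0
            oversized.append(unit)
            continue
        if current and current_cost + cost > budget:
            # Adding this unit would overflow the chunk, so start a new one.
            chunks.append(current)
            current, current_cost = [], 0
        current.append(unit)
        current_cost += cost

    if current:
        chunks.append(current)
    return chunks, oversized


if __name__ == "__main__":
    huge_table = " ".join(["cell"] * 20)  # stands in for a table spanning many pages
    units = ["a b c", "d e", "f g h i", huge_table, "j"]
    chunks, oversized = chunk_by_units(units, budget=5)
    print(len(chunks), "chunks,", len(oversized), "oversized unit(s)")
```

Note that the sketch does nothing useful with the oversized units. Splitting a table by rows, for example, can cut it off from its header or from cross-row context, which is exactly the part that has no general solution.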