Tesseract does not understand layout. It’s fine for character recognition, but if I still have to pipe the output to a LLM to make sense of the layout and fix common transcription errors, I might as well use a single model. It’s also easier for a visual LLM to extract figures and tables in one pass.
chaps|18 days ago
kergonath|18 days ago
My documents have one or two-column layouts, often inconsistently across pages or even within a page (which tripped older layout detection methods). Most models seem to understand that well enough so they are good enough for my use case.
fudged71|18 days ago