top | item 46977346

I tested this pretty extensively and it has a common failure mode that prevents me from using it: it fails to extract footnotes and similar material from the full text of academic works. For some reason, many of these models are trained in a way that results in these being excluded, despite these document sections often containing important details and context. Both versions of DeepseekOCR have the same problem. Of the others I’ve tested, dot-ocr in layout mode works best (but is slow), followed by datalab’s chandra model (which is larger and has bad license constraints).

sgc|18 days ago

I can get multiple sets of footnotes (critical + content notes) reliably recognized and categorized using gemini-3-flash-preview. It took me 15-20 hours of iterating on my prompt for a specific format; without that, it would not produce good enough results. It was a slow process because results from batch mode did not mirror what I was getting from chat mode, and you have to wait for batch results while analyzing the previous set. There was also a bit of debugging of the batch protocol going on at the same time. Flash is also surprisingly affordable for the results I am getting, 4-5x cheaper than I had anticipated. I gave up on gemini-3-pro pretty quickly because it overthinks and messes things up.

droidjj|18 days ago

I have been looking for an OCR model that can accurately handle footnotes. It’s essential for processing legal texts in particular, which often have footnotes that break across pages. Sadly I’ve yet to encounter a good solution.

kergonath|18 days ago

I found Mathpix to be quite good with this type of document, including footnotes, but to be fair my documents did not have that many. It’s also proprietary.