yoran|4 months ago

How does an LLM approach to OCR compare to say Azure AI Document Intelligence (https://learn.microsoft.com/en-us/azure/ai-services/document...) or Google's Vision API (https://cloud.google.com/vision?hl=en)?

ozgune|4 months ago

OmniAI has a benchmark that compares LLMs to cloud OCR services.

https://getomni.ai/blog/ocr-benchmark (Feb 2025)

Please note that LLMs have progressed rapidly since February. We see much better results with the Qwen3-VL family, particularly Qwen3-VL-235B-A22B-Instruct, for our use case.

cheema33|4 months ago

The Omni OCR team says that, according to their own benchmark, the best OCR is Omni OCR. I am quite surprised.

CaptainOfCoit|4 months ago

Magistral-Small-2509 is pretty neat as well for its size. It has reasoning and multimodality, which helps in cases where the context isn't immediately clear or there are a few missing spots.

daemonologist|4 months ago

My base expectation is that the proprietary OCR models will continue to win on real-world documents, and my guess is that this is because they have access to a lot of good private training data. These public models are trained on arXiv papers, e-books and stuff, which doesn't necessarily translate to typical business documents.

As mentioned though, the LLMs are usually better at avoiding character substitutions, but worse at consistency across the entire page. (Just like a non-OCR LLM, they can and will go completely off the rails.)
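One way to catch an LLM going off the rails, assuming you also have a classical OCR transcript of the same page, is to cross-check the two line by line and flag lines where they disagree a lot. Nobody in the thread described doing this; the sketch below is a hypothetical approach. It uses character error rate (edit distance divided by reference length), and the 0.3 threshold is an arbitrary assumption. Since neither transcript is ground truth, a flag only means "send this line to a human", not "the LLM is wrong".

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """CER = edit distance / length of the reference string."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(hypothesis, reference) / len(reference)


def flag_divergent_lines(llm_lines, ocr_lines, threshold=0.3):
    """Indices of lines where the LLM output diverges from the classical OCR output.

    The threshold is a hypothetical default, not a tuned value.
    """
    flagged = []
    for idx, (llm, ocr) in enumerate(zip(llm_lines, ocr_lines)):
        if char_error_rate(llm, ocr) > threshold:
            flagged.append(idx)
    return flagged


llm = ["Invoice total: 1,250.00", "Thank you for your business!"]
ocr = ["Invoice tota1: 1,250.00", "Payment due in 30 days"]
print(flag_divergent_lines(llm, ocr))  # [1]
```

A single character substitution (line 0) stays well under the threshold, while a line the LLM invented wholesale (line 1) is flagged. That asymmetry fits the pattern described above: classical OCR tends to make small local errors, while an LLM tends to produce fluent text that may not be on the page.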

numpad0|4 months ago

Classical OCR probably still makes undesirable su6stıtutìons in CJK because there are far too many similar-looking characters, some absurdly so, distinguishable only under a microscope or by looking at their binary representations. LLMs are better constrained to valid sequences of characters, so they should be more accurate.

Or at least that kind of thing would motivate them to re-implement OCR with LLMs.

fluoridation|4 months ago

Huh... Would it work to have some kind of error checking model that corrected common OCR errors? That seems like it should be relatively easy.
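The simplest version of such an error checker is not a model at all: map each commonly confused glyph to the letters it is usually mistaken for, generate candidate spellings, and keep the first one found in a dictionary. This is a minimal sketch under stated assumptions: the `CONFUSIONS` table and the lexicon are hypothetical illustrations, and a real system would rank candidates by frequency or a language model instead of taking the first match. It also relies on whitespace-delimited words and a dictionary, which don't carry over directly to CJK text.

```python
from itertools import product

# Hypothetical confusion table: OCR glyph -> letters it may stand for,
# with the preferred correction listed first.
CONFUSIONS = {
    "6": ["b", "6"],
    "0": ["o", "0"],
    "1": ["l", "i", "1"],
    "5": ["s", "5"],
    "ı": ["i"],  # dotless i
    "ì": ["i"],  # i with grave accent
}


def candidates(word: str):
    """Yield every spelling reachable by swapping confusable glyphs.

    The number of candidates grows exponentially with the number of
    confusable glyphs, which is acceptable only for short words.
    """
    options = [CONFUSIONS.get(ch, [ch]) for ch in word]
    for combo in product(*options):
        yield "".join(combo)


def correct_word(word: str, lexicon: set) -> str:
    """Return the first candidate found in the (lowercase) lexicon, else the word unchanged."""
    if word.lower() in lexicon:
        return word
    for cand in candidates(word):
        if cand.lower() in lexicon:
            return cand
    return word


def correct_text(text: str, lexicon: set) -> str:
    """Correct each whitespace-separated word; rejoins with single spaces."""
    return " ".join(correct_word(w, lexicon) for w in text.split())


lexicon = {"undesirable", "substitutions", "in", "cjk"}
print(correct_text("undesirable su6stıtutìons in CJK", lexicon))
# undesirable substitutions in CJK
```

Words already in the lexicon are left alone, and anything that can't be mapped to a known word passes through unchanged rather than being guessed at.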

junto|4 months ago

Not sure how it compares, but we did some trials with Azure AI Document Intelligence and were very surprised at how good it was. One test document was a poor, heavily skewed photograph, and, to our surprise, it also detected the customer’s human-legible signature and extracted their name from it.

stopyellingatme|4 months ago

Not sure about the others, but we use Azure AI Document Intelligence and it's working well for our resume parsing system. It took a good bit of tuning, but we haven't had to touch it for almost a year now.

make3|4 months ago

Aren't all of these multimodal LLM approaches, just open vs. closed ones?

sandblast|4 months ago

Not sure why you're being downvoted, I'm also curious.