Show HN: OCR Arena – A playground for OCR models
216 points | kbyatnal | 3 months ago | ocrarena.ai
Upload any doc, measure accuracy, and (optionally) vote for the models on a public leaderboard.
It currently has Gemini 3, dots.ocr, DeepSeek, GPT-5, olmOCR 2, Qwen, and a few others. If there are any others you'd like included, let me know!
ArcaneMoose|3 months ago
I didn't expect IBM to be making relevant AI models, but this thing is priced at $1 per 4,000,000 output tokens... I'm using it to transcribe handwritten input text and it works very well and super fast.
irjustin|3 months ago
Super nice if it worked for our use case to simply get full output.
molf|3 months ago
Some results look plausible but are just plain wrong. That is worse than useless.
Example: the "Table" sample document contains chemical substances and their properties. How many numbers did the LLM output and associate correctly? That is all that matters. There is no "preference" aspect that is relevant until the data is correct. Nicely formatted incorrect data is still incorrect.
I reviewed the output from Qwen3-VL-8B on this document. It mixes up the rows, resulting in many values associated with the wrong substance. I presume using its output for any real purpose would be incredibly dangerous. This model should not be used for such a purpose. There is no winning aspect to it. Does another model produce worse results? Then both models should be avoided at all costs.
Are there models available that are accurate enough for this purpose? I don't know. It is very time consuming to evaluate. This particular table seems pretty legible. A real production-grade OCR solution should probably need a 100% score on this example before it can be adopted. The output of such a table is not something humans are good at reviewing. It is difficult to spot errors. It either needs to be entirely correct, or the OCR has failed completely.
I am confident we'll reach a point where a mix of traditional OCR and LLM models can produce correct and usable output. I would welcome a benchmark where (objective) correctness is rated separately from the (subjective) output structure.
Edit: Just checked a few other models for errors on this example.
* GPT 5.1 is confused by the column labelled "C4" and mismatches the last 4 columns entirely. And almost all of the numbers in the last column are wrong.
* olmOCR 2 omits the single value in column "C4" from the table.
* Gemini 3 produces "1.001E-04" instead of "1.001E-11" as viscosity at T_max for Argon. Off by 7 orders of magnitude! There is zero ambiguity in the original table. On the second try it got it right. Which is interesting! I want to see this in a benchmark!
There might be more errors! I don't know, I'd like to see them!
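For anyone who wants to automate this kind of check rather than eyeball it: here is a minimal sketch of cell-level correctness scoring against a hand-made ground truth. The substance names and values below are placeholders for illustration, not taken from the actual sample document.

```python
# Cell-level correctness: compare extracted table cells against ground truth.
# Keys are (row_label, column_label); values are the cell contents as strings.
def cell_accuracy(ground_truth, extracted):
    """Fraction of ground-truth cells reproduced exactly, plus the mismatches."""
    errors = {
        key: (want, extracted.get(key))
        for key, want in ground_truth.items()
        if extracted.get(key) != want
    }
    score = 1 - len(errors) / len(ground_truth)
    return score, errors

# Placeholder data: two substances, two properties each.
truth = {
    ("Argon", "viscosity_Tmax"): "1.001E-11",
    ("Argon", "C4"): "0.0",
    ("Neon", "viscosity_Tmax"): "2.5E-10",
    ("Neon", "C4"): "0.0",
}
ocr_output = dict(truth)
ocr_output[("Argon", "viscosity_Tmax")] = "1.001E-04"  # an exponent error, as above

score, errors = cell_accuracy(truth, ocr_output)
print(score)   # 0.75 -- one of four cells wrong
print(errors)  # {('Argon', 'viscosity_Tmax'): ('1.001E-11', '1.001E-04')}
```

Exact string match is deliberately strict: for data like this, a near-miss is exactly the dangerous case.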
daemonologist|3 months ago
Also, some of the models are prone to infinite loops and I suspect this is not being punished appropriately; the frontend seems to get into a bad state after around 50k characters, which prevents the user from selecting a winner. Probably would be beneficial to make sure every model has an output length limit.
Still, a really cool resource - I'm looking forward to more models being added.
kbyatnal|3 months ago
We had Mistral previously but had to remove it because their hosted API for OCR was super unstable and returned a lot of garbage results unfortunately.
Paddle, Nanonets, and Chandra being added shortly!
poulpy123|3 months ago
I noticed that some models resist fabricating data better than others. In particular, for a sentence cut off by the edge of the document, GPT-5 invented the end of the sentence, while Opus correctly showed it as truncated.
I didn't try with my writing but in the playground there is one example and some models read it better than me.
I wish the output would show the confidence of the model on each part. I think it would help immensely.
Note that sometimes a model gets stuck in a loop, preventing you from voting and from seeing which model is which.
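On surfacing confidence: many serving APIs can return per-token log-probabilities, which could be mapped to a rough per-span confidence in the output. A sketch assuming you already have (token, logprob) pairs from whatever backend you use; `flag_low_confidence` is a made-up helper, not part of any API:

```python
import math

def flag_low_confidence(token_logprobs, threshold=0.9):
    """Wrap tokens whose probability falls below `threshold` in [[...]]
    so a reviewer can see where the model was unsure."""
    parts = []
    for token, logprob in token_logprobs:
        prob = math.exp(logprob)
        parts.append(f"[[{token}]]" if prob < threshold else token)
    return "".join(parts)

# Placeholder logprobs: exponent digits are exactly where OCR output gets shaky.
sample = [("1.001", -0.01), ("E-", -0.02), ("11", -1.2)]
print(flag_low_confidence(sample))  # 1.001E-[[11]]
```

The caveat is that token probability is only a proxy for transcription correctness, but highlighting the shaky spans would at least tell a reviewer where to look.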
est|3 months ago
Working on a hobby project that interacts with user handwriting on <canvas>. Tried some CNN models for digits but had trouble with characters.
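In case it helps: a common preprocessing step for canvas handwriting is rasterizing the raw stroke points into a small fixed-size grid before feeding a model. A minimal sketch in Python (the canvas side would ship the (x, y) points; grid size and canvas size are assumptions):

```python
def rasterize(strokes, size=28, canvas=280):
    """Turn lists of (x, y) stroke points on a `canvas`-pixel square
    into a size x size binary grid (MNIST-style input)."""
    grid = [[0] * size for _ in range(size)]
    scale = size / canvas
    for stroke in strokes:
        for x, y in stroke:
            col = min(int(x * scale), size - 1)
            row = min(int(y * scale), size - 1)
            grid[row][col] = 1
    return grid

# A single diagonal stroke across a 280x280 canvas.
stroke = [(i, i) for i in range(0, 280, 10)]
grid = rasterize([stroke])
print(sum(map(sum, grid)))  # 28 -- one pixel lit per grid row
```

A real version would also interpolate between sampled points so fast strokes don't leave gaps, but this is the basic shape of the digits-to-characters pipeline.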
yorwba|3 months ago
I don't know what the state of the art is, but an old model for digitizer pens might not do so bad either.
tensor|3 months ago
Note that I haven't tried any of them, but tesseract is still likely the leading open-source OCR engine that runs on CPU.
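One nice thing about tesseract if anyone goes that route: its TSV output (`tesseract img.png out tsv`, or `pytesseract.image_to_data`) includes a per-word confidence column. A sketch of filtering it; the sample rows below are made up for illustration, only the column layout follows tesseract's format:

```python
import csv, io

def low_conf_words(tsv_text, min_conf=80.0):
    """Return (word, conf) pairs below min_conf from tesseract TSV output.
    Rows with conf == -1 are structural (blocks/lines), not words."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["text"], float(row["conf"]))
        for row in reader
        if 0 <= float(row["conf"]) < min_conf
    ]

# Made-up rows in tesseract's TSV column layout.
sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "5\t1\t1\t1\t1\t1\t10\t10\t40\t12\t96.5\tViscosity\n"
    "5\t1\t1\t1\t1\t2\t60\t10\t40\t12\t41.2\t1.001E-11\n"
)
print(low_conf_words(sample))  # [('1.001E-11', 41.2)]
```

That per-word confidence is something most of the LLM-based entrants here don't expose at all.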
zzleeper|3 months ago
But still, this is incredibly useful!
ComputerGuru|3 months ago
UX on mobile isn’t great. It wasn’t obvious to me where the second model output was and I was thrown off even more so because the option to vote for model 1 output was presented without ever even seeing model two output.
Second suggestion would be to install a MathJax plugin so one can properly rate mathematical equations and formulas. Raw LaTeX is easy to misread, and it makes comparing between LaTeX and Unicode outputs hard.
hakunin|3 months ago
I’ve had great results locally, though you need macOS >=13 for this.
tethys|3 months ago
Just this morning I came across HunyuanOCR which sounded very promising. https://huggingface.co/tencent/HunyuanOCR
dang|3 months ago
[see https://news.ycombinator.com/item?id=45988611 for explanation]