item 46976755

rdos | 19 days ago

Is it possible for such a small model to outperform Gemini 3, or is this a case of benchmarks not reflecting reality? I would love to be hopeful, but so far no open-source model has been better than a closed one, even when the benchmarks suggested it was.

amluto | 19 days ago

Off the top of my head: for a lot of OCR tasks, it's kind of worse for the model to be smart. I don't want my OCR to make stuff up or answer questions; I want it to recognize what is actually on the page.

retrac | 19 days ago

Sometimes what is on the page is ambiguous. Imagine a scan where the dot over the i is missing in a word like "this". What's on the page is "thls" but to transcribe it that way would be an error outside of forensic contexts.

I am reminded that it's basically impossible to read cursive writing in a language you don't know, even when it uses the same alphabet.

vintermann | 18 days ago

Yes, but that's context-specific. If your goal with OCR is to make text indexable and searchable with regular text search, then transcribing "lesser" as "lesfer" is bad. And handwriting can often be so bad that you need context to decide what the scribbles are actually trying to say.

Evaluation methods, too, are bad, because they don't consider what the downstream task actually is. Word Error Rate and Character Error Rate are terrible metrics for most historical HTR, yet they're what people use, out of habit.

It's a bit like how, for a long time, BLEU was the metric for translation quality. BLEU is based on n-gram similarity to a reference translation, so naturally translation methods built around n-gram similarity (e.g. pre-NN Google Translate) scored well and looked much better than they actually were.
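For concreteness, Character Error Rate is usually defined as the Levenshtein edit distance between hypothesis and reference, divided by the reference length. A minimal sketch in Python, reusing the "lesfer"/"lesser" example from above, shows why it punishes a search-irrelevant error:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(hypothesis: str, reference: str) -> float:
    # Character Error Rate: edits normalized by reference length.
    return levenshtein(hypothesis, reference) / len(reference)

# One substituted character out of six, even though for search
# purposes the transcription is effectively wrong in a way CER
# weights no differently than any other single-character slip.
print(cer("lesfer", "lesser"))  # 1/6 ≈ 0.167
```

The metric treats every character equally, so a long-s misread that breaks search scores the same as a harmless one; that's the mismatch with the downstream task.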

rdos | 19 days ago

Interesting. Won't tasks like entity extraction suffer, especially in multilingual use cases? My worry is that a smaller model might not realize some text is actually a person's name because it is very unusual.