top | item 46995964

(no title)

rob-wagner | 17 days ago

I’ve been using Gemini 3 Pro on a historical document archiving project for an old club. One of the guys had been working on scanning old handwritten minutes books written in German that were challenging to read (1885 through 1974). Anyways, I was getting decent results on a first pass with 50 page chunks but ended up doing 1 page at a time (accuracy probably 95%). For each page, I submit the page for a transcription pass followed by a translation of the returned transcription. About 2370 pages and sitting at about $50 in Gemini API billing. The output will need manual review, but the time savings is impressive.

discuss

order

energy123|17 days ago

Suggestion: run the identical prompt N times (2 identical calls to Gemini 3.0 Pro + 2 identical calls to GPT 5.2 Thinking), then running some basic text post-processing to see where the 4 responses agree vs disagree. The disagreements (substrings that aren't identical matches) are where scrutiny is needed. But if all 4 agree on some substring it's almost certainly a correct transcription. Wouldn't be too hard to get codex to vibe code all this.

matjet|17 days ago

Look what they need to mimic a fraction of [the power of having the logit probabilities exposed so you can actually see where the model is uncertain]

dubeye|17 days ago

It sounds like a job where one pass might also be a viable option. Until you do the manual review you won't have a full sense of the time savings involved.

rob-wagner|17 days ago

Good idea. I’ll try modifying the prompt to transcribe, identify the language, and translate if not English, and then return a structured result. In my spot checks, most of the errors are in people’s names and if the handwriting trails into margins (especially into the fold of the binding). Even with the data still needing review, the translations from it has revealed a lot of interesting characters as well as this little anecdote from the minutes of the June 6, 1941 Annual Meeting:

It had already rained at the beginning of the meeting. During the same, however, a heavy thunderstorm set in, whereby our electric light line was put out of operation. Wax candles with beer bottles as light holders provided the lighting. In the meantime the rain had fallen in a cloudburst-like manner, so that one needed help to get one's automobile going. In some streets the water stood so high that one could reach one's home only by detours. In this night 9.65 inches of rain had fallen.

websap|17 days ago

They could likely increase their budget slightly and run an LLM-based judge.

kaveh_h|17 days ago

Have you tried providing multiple pages at a time to the model? It might do better transcription as it have bigger context to work with.

netdur|17 days ago

Gemini 3 long context is not good as Gemini 2.5