item 43213626

chad1n | 1 year ago

These "OCR" tools that are actually multimodal models are interesting because they can do more than plain text extraction, but their biggest flaws are hallucinations and their overall nondeterminism. Lately, I've been using Gemini to turn my notebooks into LaTeX documents, so I can see a pretty nice use case for this project, but it's not for "important" papers or papers that need 100% accuracy.

thelittleone | 1 year ago

How about building a tool that indexes OCR chunks/tokens along with a confidence grade? Set a tolerance level and define actions for any token or chunk(s) that fall below it. Actions could include automated verification using another model or, as a last resort, a human.
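A minimal sketch of the tolerance-and-escalate idea, assuming each chunk already carries a confidence score in [0, 1] (how that score is computed is left open here). `Chunk`, `Result`, `route_chunks`, and the `verify` callback are hypothetical names, not an existing library; `verify` stands in for the second model and returns `None` when it cannot confirm the text, which triggers the human fallback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chunk:
    index: int         # position of the chunk on the page
    text: str          # OCR output for this chunk
    confidence: float  # assumed score in [0, 1]; its source is not specified here


@dataclass
class Result:
    chunk: Chunk
    action: str          # "accepted", "verified", or "human_review"
    text: Optional[str]  # final text, or None while waiting on a human


def route_chunks(
    chunks: List[Chunk],
    tolerance: float,
    verify: Callable[[Chunk], Optional[str]],
) -> List[Result]:
    """Accept chunks at or above `tolerance`; send the rest to a second
    model via `verify`, and escalate to a human if it returns None."""
    results = []
    for chunk in chunks:
        if chunk.confidence >= tolerance:
            results.append(Result(chunk, "accepted", chunk.text))
            continue
        # Below tolerance: ask the (hypothetical) second model.
        second_opinion = verify(chunk)
        if second_opinion is not None:
            results.append(Result(chunk, "verified", second_opinion))
        else:
            # Last resort: queue for human review.
            results.append(Result(chunk, "human_review", None))
    return results


if __name__ == "__main__":
    # Toy verifier standing in for another model: it only "confirms" digits.
    def toy_verifier(chunk: Chunk) -> Optional[str]:
        return chunk.text if chunk.text.isdigit() else None

    page = [Chunk(0, "Theorem", 0.98), Chunk(1, "42", 0.61), Chunk(2, "x^2", 0.40)]
    for r in route_chunks(page, tolerance=0.9, verify=toy_verifier):
        print(r.chunk.index, r.action, r.text)
```

Keeping the verifier as a plain callback means the escalation chain (second model, then human) can be swapped without touching the thresholding logic.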

Eisenstein | 1 year ago

How would you calculate the confidence? LLMs are notoriously bad at grading their own output.