I think I'm missing something. Why would I pay to OCR the images when I can do it locally for free? Tesseract runs pretty well on just a CPU; it wouldn't even need something crazy powerful.
Tesseract works great for pure label-the-characters OCR, which is sufficient for books and other sources with straightforward layouts, but it doesn't handle weird layouts (tables, columns, tables with columns in each cell, etc.). People will do absolutely depraved stuff with Word and PDF documents, and you often need semantic understanding to decipher it.
That said, sometimes no amount of understanding will improve the OCR output because a structure in a document cannot be converted to a one-dimensional string (short of using HTML/CSS or something). Maybe we'll get image -> HTML models eventually.
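To make the layout problem concrete, here is a toy sketch in plain Python (no OCR library involved). It assumes the characters were already recognized perfectly and only the reading order is in question. The word boxes, pixel coordinates, and the `gutter_x` split are all made up for illustration; a real engine has to infer the column boundary itself.

```python
from typing import List, Tuple

# Hypothetical word box: (text, x, y), where x is the left edge and y the top edge.
Box = Tuple[str, int, int]


def read_line_by_line(boxes: List[Box]) -> str:
    """Naive linearization: sort top-to-bottom, then left-to-right."""
    ordered = sorted(boxes, key=lambda b: (b[2], b[1]))
    return " ".join(text for text, _, _ in ordered)


def read_column_by_column(boxes: List[Box], gutter_x: int) -> str:
    """Column-aware linearization: read everything left of the gutter first."""
    left = [b for b in boxes if b[1] < gutter_x]
    right = [b for b in boxes if b[1] >= gutter_x]
    return read_line_by_line(left) + " " + read_line_by_line(right)


# A two-column page: the left column is one sentence, the right column another.
page: List[Box] = [
    ("The", 10, 0), ("cat", 60, 0), ("Dogs", 300, 0), ("bark", 360, 0),
    ("sat", 10, 20), ("down.", 60, 20), ("at", 300, 20), ("night.", 340, 20),
]

print(read_line_by_line(page))
# The cat Dogs bark sat down. at night.
print(read_column_by_column(page, gutter_x=200))
# The cat sat down. Dogs bark at night.
```

Every word is "correct" in both outputs; only the second one is readable. Nested tables make this worse, since there is no single gutter to split on.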
I would be extremely surprised if that's the case. There are "open-source" multimodal LLMs that can extract text from images, as a proof that the idea works.
Probably the model is hallucinating and adding "Hungarian language is not installed for Tesseract" to the response.
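For anyone running Tesseract locally, a missing language pack is easy to check for. A rough sketch, assuming the `tesseract` binary is on your PATH: it shells out to `tesseract --list-langs` and looks for `hun`, the Hungarian traineddata code. The helper names `parse_list_langs` and `installed_languages` are made up here, and the parsing is naive: it assumes one language code per line after a "List of available languages" header, which may not hold for every version.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` style output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available"):
            continue
        langs.append(line)
    return langs


def installed_languages() -> Optional[List[str]]:
    """Return installed language codes, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Assumption: some versions print the list to stderr, so read both streams.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


langs = installed_languages()
if langs is None:
    print("tesseract not found on PATH")
else:
    print("hun installed:", "hun" in langs)
```

If `hun` is missing, installing the Hungarian traineddata file and passing `-l hun` on the command line is the usual fix.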
daemonologist|1 year ago
gregolo|1 year ago
s5ma6n|1 year ago