I’ve done a lot of OCR work, and Tesseract is nearly a decade out of date at this point. It is not a serious technology for anything requiring good accuracy or even minor complexity. From what I’ve seen, GPT-4V completely smokes Tesseract, but then again, so do most modern OCR systems. If you want fast and fairly powerful OCR, check out PaddleOCR. If you want slower but more accurate OCR, check out transformer-based models such as TrOCR.
nh2|1 year ago
https://news.ycombinator.com/item?id=32077375
authorfly|1 year ago
However, as you note, Tesseract is still quite far behind, even with v5.