Icko|3 years ago
Last time I compared them (1-2 years ago), Google OCR was much, much better than Tesseract and supported more languages. There was also an OCR in OpenCV, which was slightly better than Tesseract but not good enough to be useful.
rafram|3 years ago
Icko|3 years ago
lasagna_coder|3 years ago
spi|3 years ago
Source: I work on developing a competing OCR service and we keep an eye on the competition (aside from Google: solutions by Azure, Amazon, Abbyy, Nuance, Cloudmersive, etc., as well as our internal product of course, which is not available externally), and they are (almost) all significantly better than Tesseract.
The only domain where Tesseract is competitive is perfect "black text on white paper". It performs pretty poorly on colored or distorted text, or even on strong page structure effects (tables, etc.).
When I say "pretty poor" I mean "with respect to the state of the art". Of course it's still enormously better than what was the state of the art before deep learning came into the picture, roughly a decade ago. And for things like "search contents of a book" it's basically perfect already.
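To make the "black text on white paper" point concrete: a common preprocessing step is to binarize a scan before handing it to an OCR engine, so the input gets closer to that ideal case. Below is a minimal sketch using Otsu's method in plain Python. It is an illustration only, not something taken from any of the services mentioned; the function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be an 8-bit grayscale image given as a list of rows. It does nothing for distortion or table structure.

```python
from typing import List


def otsu_threshold(gray: List[List[int]]) -> int:
    """Pick the threshold t that maximizes between-class variance (Otsu's method).

    Assumes 8-bit grayscale values in 0..255. Pixels <= t are the dark class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # light class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]]) -> List[List[int]]:
    """Map each pixel to 0 (ink) or 255 (paper) using the Otsu threshold."""
    t = otsu_threshold(gray)
    return [[255 if v > t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # Tiny made-up "scan": dark strokes (40-60) on light paper (200-220).
    sample = [
        [200, 220, 40],
        [60, 210, 50],
    ]
    for row in binarize(sample):
        print(row)
```

A global threshold like this only helps with low-contrast or grayish scans; unevenly lit or colored pages usually need something more adaptive.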
holbue|3 years ago
Back in the day, Cuneiform got close to Tesseract's performance, but AFAIK it wasn't developed further...
Does anyone know of other promising open-source OCR engines?