driscoll42 | 1 year ago
Model | Overall | Handwritten | Typed
Google Vision | 98.80% | 93.29% | 99.37%
Amazon Textract | 98.80% | 95.37% | 99.15%
surya | 97.41% | 87.16% | 98.48%
azure | 96.09% | 92.83% | 96.46%
trocr | 95.92% | 79.04% | 97.65%
paddleocr | 92.96% | 52.16% | 97.23%
tesseract | 92.38% | 42.56% | 97.59%
nougat | 92.37% | 89.25% | 92.77%
easy_ocr | 89.91% | 35.13% | 95.62%
keras_ocr | 89.70% | 41.34% | 94.71%
Overall is a weighted average of Handwritten and Typed. I also computed Jaccard and Levenshtein distance, but the results were similar enough that I'm leaving them out for the sake of space.

Overall, if you want the best: if you're an enterprise, just use whatever of AWS/GCP/Azure you're already on; if you're an individual, pick between those. While some of the open-source solutions do quite well, surya took 188 seconds to process 88 pages on my RTX 3080, while the cloud ones took only a few seconds to upload the docs and download the results. But if you do want open source, seriously consider surya, tesseract, and nougat depending on your needs. Surya is the best overall, while nougat was pretty good at handwriting. Tesseract is just blazingly fast: 121-200 seconds depending on whether you use tessdata-fast or tessdata-best, but that's CPU-based and trivially parallelizable, and on my 5950X using all the cores it took only 10 seconds to run through all 88 pages.
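For reference, the Levenshtein- and Jaccard-based similarity metrics mentioned above can be computed in a few lines of stdlib Python. This is a minimal sketch of the standard definitions (function names are mine, not from the study, and the normalization choice is one common convention among several):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute),
    # keeping only the previous row to stay O(min memory).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    # Normalize distance to a 0-1 similarity, comparable to accuracy figures.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def jaccard_similarity(a: str, b: str) -> float:
    # Token-level Jaccard: |intersection| / |union| of the word sets.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0
```

Character-level Levenshtein rewards near-misses (e.g. one wrong letter per word), while token-level Jaccard treats any misspelled word as fully wrong, which is why the two can diverge on noisy handwriting output.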
But really, you need to generate some of your own sample test data/examples and run them through the models to see what's best. Given, frankly, how little this paper tested, I really should redo my study, add VLMs, and write a small blog post/paper; I've been meaning to for years now.
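Running your own samples through a set of models boils down to scoring each model's transcript against a known-good one and ranking the results. A minimal sketch of that harness (model names and outputs here are made up for illustration; `SequenceMatcher.ratio()` stands in for whichever metric you prefer):

```python
from difflib import SequenceMatcher

def score_models(ground_truth: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank OCR models by similarity of their output to a reference transcript.

    difflib.SequenceMatcher.ratio() gives a 0-1 similarity from the stdlib;
    swap in a Levenshtein or Jaccard metric for results comparable to the
    accuracy table above.
    """
    scores = [(name, SequenceMatcher(None, ground_truth, text).ratio())
              for name, text in outputs.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)

# Hypothetical example: two made-up model outputs for one test page.
ranking = score_models(
    "The quick brown fox jumps over the lazy dog",
    {"model_a": "The quick brown fox jumps over the lazy dog",
     "model_b": "Tne qu1ck brown f0x jumps over the 1azy dog"},
)
```

Averaging these per-page scores, optionally weighted by handwritten vs. typed page counts, reproduces the kind of overall figures shown in the table.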