driscoll42 | 1 year ago
Model | Overall | Handwritten | Typed
Google Vision | 98.80% | 93.29% | 99.37%
Amazon Textract | 98.80% | 95.37% | 99.15%
surya | 97.41% | 87.16% | 98.48%
azure | 96.09% | 92.83% | 96.46%
trocr | 95.92% | 79.04% | 97.65%
paddleocr | 92.96% | 52.16% | 97.23%
tesseract | 92.38% | 42.56% | 97.59%
nougat | 92.37% | 89.25% | 92.77%
easy_ocr | 89.91% | 35.13% | 95.62%
keras_ocr | 89.70% | 41.34% | 94.71%
Overall is a weighted average of Handwritten and Typed. I also computed Jaccard and Levenshtein distance, but the results were similar enough that I'm leaving them out for the sake of space.

Overall, if you want the best: if you're an enterprise, just use whatever of AWS/GCP/Azure you're already on; if you're an individual, pick between those. While some of the open-source solutions do quite well, surya took 188 seconds to process 88 pages on my RTX 3080, while the cloud ones took only a few seconds to upload the docs and download the results. But if you do want open source, seriously consider surya, tesseract, and nougat depending on your needs. Surya is the best overall, while nougat was pretty good at handwriting. Tesseract is just blazingly fast: 121-200 seconds depending on whether you use tessdata-fast or tessdata-best, but that's CPU-based and trivially parallelizable, and on my 5950X using all the cores it took only 10 seconds to run through all 88 pages.
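For reference, the Levenshtein- and Jaccard-based similarity metrics mentioned above can be computed in a few lines of stdlib Python. This is a minimal sketch of the standard definitions (function names are mine, not from the study, and the normalization choice is one common convention among several):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute),
    # keeping only the previous row to stay O(min memory).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    # Normalize distance to a 0-1 similarity, comparable to accuracy figures.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def jaccard_similarity(a: str, b: str) -> float:
    # Token-level Jaccard: |intersection| / |union| of the word sets.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0
```

Character-level Levenshtein rewards near-misses (e.g. one wrong letter per word), while token-level Jaccard treats any misspelled word as fully wrong, which is why the two can diverge on noisy handwriting output.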
But really, you need to generate some of your own sample test data/examples and run them through the models to see what's best. Given, frankly, how little this paper tested, I really should redo my study, add VLMs, and write a small blog post/paper; I've been meaning to for years now.
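Running your own samples through a set of models boils down to scoring each model's transcript against a known-good one and ranking the results. A minimal sketch of that harness (model names and outputs here are made up for illustration; `SequenceMatcher.ratio()` stands in for whichever metric you prefer):

```python
from difflib import SequenceMatcher

def score_models(ground_truth: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank OCR models by similarity of their output to a reference transcript.

    difflib.SequenceMatcher.ratio() gives a 0-1 similarity from the stdlib;
    swap in a Levenshtein or Jaccard metric for results comparable to the
    accuracy table above.
    """
    scores = [(name, SequenceMatcher(None, ground_truth, text).ratio())
              for name, text in outputs.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)

# Hypothetical example: two made-up model outputs for one test page.
ranking = score_models(
    "The quick brown fox jumps over the lazy dog",
    {"model_a": "The quick brown fox jumps over the lazy dog",
     "model_b": "Tne qu1ck brown f0x jumps over the 1azy dog"},
)
```

Averaging these per-page scores, optionally weighted by handwritten vs. typed page counts, reproduces the kind of overall figures shown in the table.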