Huh, I tried with the version from pip (instead of my package manager) and it completes in 22s. Output on the only page I tested is considerably worse than Tesseract's, particularly with punctuation. The paragraph detection did not seem to work at all, rendering the entire thing on a single line.
Even worse for my uses, Tesseract had two mistakes on this page (part of why I picked it), and neither of them was read correctly by EasyOCR.
Partial list of mistakes:
1. Missed several full-stops at the end of sentences
2. Rendered two full-stops as colons
3. Rendered two commas as semicolons
4. Misrendered every single em-dash in various ways (e.g. "_~")
5. Missed 4 double-quotes
6. Missed 3 apostrophes, including rendering "I'll" as "Il"
7. All 5 exclamation points were rendered as a lowercase-ell ("l"). Tesseract got 4 correct and missed one.
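For anyone wanting to tally this kind of thing rather than eyeball it, here is a minimal sketch that compares punctuation counts between a hand-checked reference transcription and an OCR result. It uses only the standard library and does not call EasyOCR or Tesseract; the function names and the sample strings are hypothetical, made up for illustration, and are not the page I tested.

```python
from collections import Counter
import string

# Marks to track. string.punctuation is ASCII-only, so the em-dash and
# curly quotes are added explicitly (an assumption about what matters here).
PUNCT = set(string.punctuation) | {"\u2014", "\u201c", "\u201d", "\u2019"}


def punctuation_counts(text):
    """Count how often each tracked punctuation mark occurs in text."""
    return Counter(ch for ch in text if ch in PUNCT)


def punctuation_diff(reference, ocr_output):
    """Return {mark: (expected, got)} for every mark whose count differs."""
    ref = punctuation_counts(reference)
    got = punctuation_counts(ocr_output)
    return {
        mark: (ref[mark], got[mark])
        for mark in sorted(set(ref) | set(got))
        if ref[mark] != got[mark]
    }


# Hypothetical sample strings imitating the error types listed above.
reference = "I'll go! Wait, stop."
ocr_output = "Il gol Wait; stop:"

diff = punctuation_diff(reference, ocr_output)
for mark, (expected, got) in diff.items():
    print(f"{mark!r}: expected {expected}, got {got}")
```

This only compares totals per mark, so it will not notice a comma dropped in one place and a spurious one added elsewhere; an aligned diff would be needed for that.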
aidenn0|1 year ago