aliosm|1 year ago
As a result, I developed a Python package called tahweel (https://github.com/ieasybooks/tahweel), which leverages Google Cloud Platform service accounts to run OCR and produces page-level output. With the default settings, it processes about a page per second. Although the underlying OCR service isn't open-source, it outperforms the other solutions by a significant margin.
For example, OCRing a PDF file with Surya on a machine with a 3060 GPU takes about as long as the tool I mentioned, but it consumes more power and hardware resources while delivering worse results. That has been my experience with Arabic OCR specifically; I'm not sure whether English OCR faces the same challenges.
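"Worse results" can be made concrete by measuring character error rate (CER) against a ground-truth transcription. A minimal, self-contained sketch (the sample strings are hypothetical; a real comparison would use full ground-truth pages):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # Character error rate: edit operations divided by reference length.
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# "كتاب" vs. "كتب": one deleted character out of four -> CER 0.25.
print(cer("كتاب", "كتب"))
```

Running both OCR engines over the same pages and averaging their CERs gives a per-engine quality number instead of an eyeball judgment.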
aliosm|1 year ago
For example, I'm deploying tahweel in one of my web apps to let a limited number of users run OCR on PDF files. I'm using a small CPU machine for this; deploying Surya would not be nearly as simple, and I think you are facing similar issues with https://www.datalab.to.
aliosm|1 year ago
I'm currently planning to develop a tool to correct Arabic ASR and OCR outputs. It will function like spell-correction, but focused specifically on those two areas. Perhaps you could start something similar for Japanese? English (and Latin-script languages in general) performs at a different level across multiple tasks, to be honest...
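The simplest version of such a corrector is dictionary-based: snap each out-of-vocabulary token to its closest lexicon entry. A minimal sketch using `difflib.get_close_matches` (the tiny `LEXICON` here is a hypothetical stand-in; a real tool would use a large Arabic word list and language-model scoring rather than plain string similarity):

```python
from difflib import get_close_matches

# Hypothetical mini-lexicon for illustration only.
LEXICON = ["الكتاب", "المدرسة", "القرآن", "التاريخ"]

def correct_token(token: str, lexicon=LEXICON) -> str:
    # Keep the token if it is already in the lexicon; otherwise replace it
    # with the most similar lexicon entry, falling back to the token itself
    # when nothing is close enough.
    if token in lexicon:
        return token
    matches = get_close_matches(token, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else token

print(correct_token("الكتب"))   # snapped to the closest entry, "الكتاب"
print(correct_token("xyz"))     # no close match, returned unchanged
```

OCR and ASR errors are not uniform (OCR confuses visually similar letter shapes, ASR confuses phonetically similar ones), so a production corrector would weight the distance metric per error channel instead of using generic similarity.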