vikp | 1 year ago

Hi, I'm the author of surya (https://github.com/VikParuchuri/surya). I'm working on improving speed and accuracy now, and I'm happy to collaborate if you have specific page types it's not working on. For modern/clean documents it benchmarks very similarly to Google Cloud, but I'm still working on supporting older documents better.

aliosm|1 year ago

Hello Vik, and thanks for your work on Surya. I really liked it once I found it, but my main issue now is the latency and hardware requirements, since accuracy can be fixed over time for different page types.

For example, I'm deploying tahweel to one of my webapps to allow a limited number of users to run OCR on PDF files. I'm using a small CPU machine for this; deploying Surya would not be the same, and I think you are facing similar issues at https://www.datalab.to.

fred123|1 year ago

It seems to struggle a lot with German text (umlauts, etc.).