top | item 22477608

(no title)

equasar | 6 years ago

Any recommended library for .NET to extract text by coordinates?

discuss

order

Rury|6 years ago

There's itext7 (also for java). Not sure how it compares with other libraries, but it will parse text along with coordinates. You just need to write your own execution strategy to parse how you want.

From my experience, it seems to grab text just fine, the tricky part is identifying & grabbing what you want, and ignoring what you don't want... (for reasons mentioned in the article)

https://github.com/itext/itext7-dotnet

https://itextpdf.com/en/resources/examples/itext-7/parsing-p...

bob1029|6 years ago

I don't know that this could exist for all PDFs.

Sounds like you are in need of OCR if you want to be able to use arbitrary screen coords as a lookup constraint.