top | item 18555784


BasHamer | 7 years ago

If this can get me tables out of PDFs generated by Crystal Reports, it would be a godsend for testing. This has been a nightmare to try to solve. The best option so far has been Adobe cloud, but they don't offer an API for that. I'm excited to try it out.


mjt58|7 years ago

BasHamer|7 years ago

https://pdftables.com failed the test file. It was pretty good, but it interpreted rows inconsistently: sometimes it split a cell, sometimes it did not. Tabula failed to detect multi-line rows; after I manually adjusted the table, it did better than pdftables.com at splitting cells. Both failed on the non-printable whitespace characters, which produced garbled output in Excel. The other one would take some time to rig up.
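
For the non-printable whitespace problem, a post-processing pass over the extracted cells may help. This is only a rough sketch: `clean_cell` is a hypothetical helper, not part of pdftables.com or Tabula, and it assumes the garbling comes from Unicode space, control, and format characters in the cell text, which this thread does not confirm.

```python
import unicodedata


def clean_cell(text: str) -> str:
    """Normalize odd whitespace in one extracted table cell.

    Assumption: the garbled output is caused by non-breaking spaces,
    control characters, and zero-width/format characters.
    """
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf":
            # Format characters (zero-width space, soft hyphen): drop them.
            continue
        if category in ("Zs", "Zl", "Zp", "Cc"):
            # Any kind of space, line/paragraph separator, or control
            # character (tab, form feed, ...) becomes a plain space.
            out.append(" ")
        else:
            out.append(ch)
    # Collapse runs of spaces and trim both ends.
    return " ".join("".join(out).split())
```

Running every cell through something like this before writing the spreadsheet keeps the visible text and discards the invisible characters. Whether that is safe depends on the report, since a tab or line break inside a cell may carry meaning.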

RandomBookmarks|7 years ago

How about https://ocr.space/tablerecognition

It returns table data line by line.

BasHamer|7 years ago

It handled the non-printable whitespace but butchered the multi-line table headers. Rebuilding the headers is rough: the output is line by line, so you need to know which words go together, and the structure has been lost.
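
If word positions were available alongside the text, the headers could in principle be regrouped by column. The sketch below assumes exactly that: each header word comes with an `(x, y)` position, and the left edge of each column is known. Neither is something the line-by-line output described here provides, and `rebuild_headers` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import bisect
from collections import defaultdict


def rebuild_headers(words, column_starts):
    """Regroup multi-line header words into one string per column.

    words: list of (x, y, text) tuples for the header area (assumed input).
    column_starts: sorted left x-coordinate of each column (assumed known).
    """
    columns = defaultdict(list)
    for x, y, text in words:
        # Find the last column whose left edge is at or before this word.
        idx = max(bisect.bisect_right(column_starts, x) - 1, 0)
        columns[idx].append((y, x, text))
    # Within a column, read top to bottom, then left to right.
    return [
        " ".join(text for _, _, text in sorted(columns[i]))
        for i in range(len(column_starts))
    ]


words = [
    (10, 0, "Invoice"), (100, 0, "Amount"),
    (12, 10, "Number"), (101, 10, "Due"), (200, 5, "Date"),
]
headers = rebuild_headers(words, [0, 90, 190])
```

Without coordinates there is nothing to bucket on, which is why pure line-by-line text makes the header reconstruction so painful.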