RandomBookmarks | 7 years ago

How about https://ocr.space/tablerecognition

It returns table data line by line.

BasHamer|7 years ago

It handled the non-printed whitespace but butchered the multi-line table headers. Rebuilding the headers is rough: the output is line by line, so you need to know which words go together, and the structure has been lost.
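To make the header problem concrete, here is a rough sketch of one way to regroup header words, assuming the OCR output also gives a left x-coordinate for each word and that the column start positions can be taken from a clean data row. The function name `rebuild_headers`, the `(word, x_left)` tuple layout, and the `tolerance` parameter are all hypothetical and not part of any particular OCR tool's output.

```python
from bisect import bisect_right


def rebuild_headers(header_lines, column_starts, tolerance=5):
    """Merge multi-line header words into one label per column.

    header_lines:  list of lines, each a list of (word, x_left) tuples
                   (hypothetical layout; adapt to your OCR output).
    column_starts: sorted x positions where each column begins,
                   e.g. measured from a data row.
    tolerance:     slack in the same units as x_left, so a word that
                   starts slightly left of its column still lands in it.
    """
    columns = [[] for _ in column_starts]
    for line in header_lines:
        # Process words left to right so labels keep reading order.
        for word, x_left in sorted(line, key=lambda w: w[1]):
            # Index of the last column start at or before the word.
            idx = max(bisect_right(column_starts, x_left + tolerance) - 1, 0)
            columns[idx].append(word)
    return [" ".join(words) for words in columns]


header_lines = [
    [("Invoice", 0), ("Amount", 100)],
    [("Number", 0), ("Due", 98), ("Date", 200)],
]
headers = rebuild_headers(header_lines, column_starts=[0, 100, 200])
print(headers)
```

This only works when positions survive extraction; with plain line-by-line text and no coordinates, there is nothing to group on, which is exactly the lost structure described above.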

cdolan|7 years ago

Can you send me a copy of what you are trying to extract? We use proprietary stuff (we're in the business of extracting data and performing analysis on invoices for waste, recycling, cellular, etc., the stuff that gets "lost" in the AP department).

Happy to see if our tools can help. I've tried everything on the market (DocParser, MediusFlow, KOFAX, Ephesoft, etc.) and none work well enough, in my opinion.