Parsing PDFs (and PowerPoints) and breaking them into "askable" chunks is definitely something we've been looking into and are keen to roll out. If you'd like to talk more about your use case, feel free to chuck us an email via the "reach out" address on the page!
giovannibonetti|3 years ago
The hard part would be parsing tables and other layout-dependent semantics. You usually start with text coordinates (like HTML elements with absolute position) and have to work backwards from that. I worked for some years on a project for a client that was full of edge cases, because whenever the input PDF (from a government agency) had a slight layout change, the parser would break. It took multiple iterations to make it robust enough.
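To make the "work backwards from coordinates" idea concrete, here is a rough sketch, not the parser from that project. It assumes the PDF library has already given you words with a left x position and a baseline y position; the `Word` type, the function names and the tolerance values are all made up for illustration.

```python
from typing import NamedTuple


class Word(NamedTuple):
    # Hypothetical shape of one extracted word: left x, baseline y, text.
    x: float
    y: float
    text: str


def group_rows(words, y_tol=2.0):
    """Group words whose y positions are within y_tol of the row's first word."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(rows[-1][0].y - w.y) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Order each row left to right.
    return [sorted(r, key=lambda w: w.x) for r in rows]


def column_starts(rows, x_tol=5.0):
    """Guess column left edges by clustering the x positions of all words."""
    starts = []
    for x in sorted(w.x for row in rows for w in row):
        if not starts or x - starts[-1] > x_tol:
            starts.append(x)
    return starts


def to_table(words, y_tol=2.0, x_tol=5.0):
    """Rebuild a grid of cell strings from positioned words."""
    rows = group_rows(words, y_tol)
    starts = column_starts(rows, x_tol)
    table = []
    for row in rows:
        cells = [""] * len(starts)
        for w in row:
            # Assign the word to the column whose left edge is nearest.
            col = min(range(len(starts)), key=lambda i: abs(starts[i] - w.x))
            cells[col] = (cells[col] + " " + w.text).strip()
        table.append(cells)
    return table


words = [
    Word(10, 10, "Name"), Word(100, 10.5, "Amount"),
    Word(10, 30, "Alice"), Word(101, 30, "42"),
    Word(10, 50, "Bob"),
]
table = to_table(words)
```

The fixed tolerances are exactly where this kind of thing gets brittle: shift a column by a few points or change the line spacing and the rows and cells regroup differently, and multi-word cells land in the wrong column because each word is matched by its own left edge.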
james-revisoai|3 years ago
Would love to chat with you if you're up for it - you can try a demo run of our tool and contact us through the interface.