Maybe my imagination is limited or our documents aren't complex enough, but are we talking about realistic written documents? I'm sure you can take a screenshot of a very complex spreadsheet and it fails, but in that case you already have the data in structured form anyway, no?
Now if someone mails or faxes you that spreadsheet? You're screwed.
Spreadsheets are not the biggest problem though, as they have a reliable 2-dimensional grid - at worst some cells will be combined. The form layouts and n-dimensional table structures you can find on medical and insurance documents are truly unhinged. I've seen documents that I struggled to interpret.
bobsmooth|4 months ago
https://chatgpt.com/share/68f5f9ba-d448-8005-86d2-c3fbae028b...
Edit: Just caught a mistake, transcribed one of the prices incorrectly.
kbumsik|4 months ago
pietz|4 months ago
kbumsik|4 months ago
Just get a DEF 14A (Annual meeting) filing of a company from SEC EDGAR.
I have seen so many mistakes when looking at the result closely.
Here is a DEF 14A filing from Salesforce. You can print it to a PDF and then try converting.
https://www.sec.gov/Archives/edgar/data/1108524/000110852425...
daemonologist|4 months ago