item 43179472

benedictevans | 1 year ago

Have a look at the previous essay. I couldn't get ChatGPT 4o to give me a number in a PDF correctly even when I gave it the PDF, the page number, and the row and column.

https://www.ben-evans.com/benedictevans/2025/1/the-problem-w...


simonw | 1 year ago

I have a hunch that's a problem unique to the way ChatGPT web edition handles PDFs.

Claude gets that question right: https://claude.ai/share/7bafaeab-5c40-434f-b849-bc51ed03e85c

ChatGPT treats a PDF upload as a data extraction problem: it first pulls out all of the embedded textual content in the PDF and feeds that into the model.

This fails for PDFs that contain images of scanned documents, since ChatGPT isn't tapping its vision abilities to extract that information.

Claude and Gemini both apply their vision capabilities to PDF content, so they can "see" the data.
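The extraction-first pipeline described above can be sketched in a few lines. This is a hypothetical illustration, not OpenAI's actual code; the page structure and function names are invented. The point is that when the embedded text layer of a scanned page is empty, the model receives nothing from the document and can only answer from prior knowledge:

```python
# Illustrative sketch of a text-extraction-first PDF pipeline.
# Each "page" is modelled as a dict with an embedded text layer and an
# optional page image (which this pipeline never looks at).

def extract_embedded_text(pages: list[dict]) -> str:
    """Pull only the embedded text layer, ignoring any page images."""
    return "\n".join(p.get("text", "") for p in pages)

def build_prompt(pages: list[dict], question: str) -> str:
    text = extract_embedded_text(pages).strip()
    if not text:
        # Nothing was extracted (e.g. a scanned document): the model sees
        # only the question, so any "answer about the document" is
        # effectively a hallucination from training data.
        return question
    return f"Document:\n{text}\n\nQuestion: {question}"

# A digitally-created PDF has an embedded text layer...
digital_pdf = [{"text": "Revenue 2024: $12m", "image": None}]
# ...while a scanned PDF may be pure images with an empty text layer.
scanned_pdf = [{"text": "", "image": b"<page scan bytes>"}]
```

With `digital_pdf` the prompt contains the document text; with `scanned_pdf` the prompt collapses to the bare question, which is the failure mode described here. A vision-capable pipeline would instead pass the page images to the model alongside the question.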

I talked about this problem here: https://simonwillison.net/2024/Jun/27/ai-worlds-fair/#slide....

So my hunch is that ChatGPT couldn't extract useful information from the PDF you provided and instead fell back on whatever was in its training data, effectively hallucinating a response and pretending it came from the document.

That's a huge failure on OpenAI's part, but it's not illustrative of models being unable to interpret documents: it's illustrative of OpenAI's ChatGPT PDF feature being unable to extract non-textual image content (and then hallucinating on top of that inability).

benedictevans | 1 year ago

Interesting, thanks. I think the higher-level problem is that (1) I have no way to know about this failure mode when using the product, and (2) I don't really know whether I can rely on Claude to get this right every single time either, or what else it would fail at instead.