
bondolo | 1 year ago

Such a shame that PDF doesn’t just, like, include the semantic structure of the document by default. It is brilliant that we standardized on an archival document format that doesn’t include direct access to the document text or structure as a core intrinsic default feature.

I say this with great anger as someone who works in accessibility and has had PDF as a thorn in my side for 30 years.

NeutralForest|1 year ago

I agree with this so much. I've sometimes tried to push friends and family toward text formats (at the very least, I've sent them things like Markdown, which is very easy to render in the browser anyway). But often you have to fall back to PDF, which I dislike very much. So much content, like books and papers, is in PDF as well. Why did we pick a binary blob as shareable format again?

meatmanek|1 year ago

> Why did we pick a binary blob as shareable format again?

PDF was created to solve the problem of being able to render a document the same way on different computers, and it mostly achieved that goal. Editable formats like .doc, .html, .rtf were unreliable -- different software would produce different results, and even if two computers have the exact same version of Microsoft Word, they might render differently because they have different fonts available. PDFs embed the fonts needed for the document, and specify exactly where each character goes, so they're fully self-contained.
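To make "specify exactly where each character goes" concrete, here is a minimal sketch that writes a one-page PDF by hand using only the standard library. The helper name `build_minimal_pdf` is hypothetical. Each string is drawn with the `Tm` (set text matrix) and `Tj` (show text) operators at absolute page coordinates. Nothing in the file records that one line is a heading and the next is a paragraph. For brevity it references the standard Helvetica font instead of embedding one, so it does not demonstrate the font embedding mentioned above, and it assumes Latin-1 text.

```python
def build_minimal_pdf(lines):
    """Build a one-page PDF. lines: list of (x, y, text), in points, origin bottom-left."""
    # Content stream: every string is placed at an absolute position.
    # There is no notion of "heading", "paragraph", or "table cell" here.
    ops = ["BT", "/F1 12 Tf"]
    for x, y, text in lines:
        # Escape the characters that are special inside PDF literal strings.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        ops.append(f"1 0 0 1 {x} {y} Tm ({escaped}) Tj")
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1")  # assumes Latin-1 text

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        # Standard Type 1 font referenced by name, NOT embedded (simplification).
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: fixed 20-byte entries pointing at each object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

For example, `build_minimal_pdf([(72, 720, "Quarterly report"), (72, 690, "Revenue grew.")])` returns bytes that, saved to a `.pdf` file, should open as a page with two lines of text, yet the file itself holds only strings and coordinates.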

After Acrobat Reader became free with version 2 in 1994, everybody with a computer ended up downloading it after running across a PDF they needed to view. As it became more common for people to be able to view PDFs, it became more convenient to produce PDFs when you needed everybody to be able to view your document consistently. Eventually, the ability to produce PDFs became free (with e.g. Office 2007 or Mac OS X's ability to print to PDF), which cemented PDF's popularity.

Notably, the original goals of PDF had nothing to do with being able to copy text out of them -- the goal was simply to produce a perfect reproduction of the document on screen/paper. That wasn't enough of an inconvenience to prevent PDF from becoming popular. (Some people saw the inability for people to easily copy text from them as a benefit -- basically a weak form of text DRM.)

cess11|1 year ago

PDF is pretty strictly modeled on printed documents and their mainstream typography at the time PostScript and the like were invented.

Printed documents do not have any structure beyond the paper and placement of ink on them.

lukasb|1 year ago

Even assuming you could get people to do the work (probably the real issue here), could a single schema syntax capture the semantics of the universe of documents that exist as PDFs? PDFs succeeded because they could reproduce anything.

andai|1 year ago

Tables? I regularly run into PDFs where even the body text is mangled!
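Because a PDF page can be nothing more than positioned strings, an extractor has to guess reading order. This minimal sketch uses made-up `(x, y, text)` fragments from a hypothetical two-column page. A naive top-to-bottom, left-to-right sort interleaves the two columns, while splitting at an assumed, hand-picked gutter position recovers this particular page. The function names are hypothetical; real extraction tools use far more elaborate layout analysis, and tables make the guessing harder still.

```python
from typing import List, Tuple

# (x, y, text) in PDF points; y grows upward, as in PDF user space.
Fragment = Tuple[float, float, str]


def naive_reading_order(frags: List[Fragment]) -> str:
    """Sort top-to-bottom, then left-to-right, and join the fragments."""
    ordered = sorted(frags, key=lambda f: (-f[1], f[0]))
    return " ".join(text for _, _, text in ordered)


def column_reading_order(frags: List[Fragment], gutter_x: float) -> str:
    """Read the left column fully, then the right column.

    gutter_x is an assumed, hand-picked x coordinate between the columns;
    a real extractor would have to infer it from the glyph positions.
    """
    left = [f for f in frags if f[0] < gutter_x]
    right = [f for f in frags if f[0] >= gutter_x]
    return " ".join(naive_reading_order(col) for col in (left, right) if col)


# Hypothetical two-column page: two lines per column.
page: List[Fragment] = [
    (72, 700, "PDF stores"),
    (320, 700, "Extractors must"),
    (72, 686, "glyph positions."),
    (320, 686, "guess the order."),
]

print(naive_reading_order(page))
# → PDF stores Extractors must glyph positions. guess the order.
print(column_reading_order(page, gutter_x=300))
# → PDF stores glyph positions. Extractors must guess the order.
```

The column split only works here because the gutter was known in advance; on a real page with headers, footnotes, figures, or tables, a fixed threshold like this breaks quickly.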