top | item 21590317

The Editable PDF Initiative

152 points | ddb | 6 years ago | editablepdf.org

159 comments

laurent123456 | 6 years ago
> PDF has long become the de facto format for exchanging print-oriented documents on the Web, and for a good reason: it works, and reliably so!

Perhaps it's that way because it's read-only? If PDF files had to be generated in such a way that they can later be edited, things would get a lot more complex and probably less reliable.

Also it feels like the wrong way to go about it, because no matter what, PDF editing will never be as powerful as a proper text editor. So it would be the wrong tool for collaborating on a document (as soon as you want to do something more advanced with layout, images, etc., you probably can't). Maybe it's good if you want to quickly amend a contract before sending it, but then you need to remember that your .doc is no longer the latest version.

Basically a PDF document shouldn't be the source of truth for document editing as that would lock you to the wrong format.

pge | 6 years ago
While this is mostly true, it should not be relied on. If the PDF is produced from a text document or report generator, then the text as well as the charts are easy to edit with any text editor (it only requires decompressing the PDF first). Obviously it's different if the document is a scanned image, but just saving a Word doc as PDF does not make it a read-only file. The benefit of PDF is that it is (as the name suggests) portable, and one knows that the recipient will see exactly what was sent. With a Word or PowerPoint file, formatting can show up differently on different machines, fonts may not be available, etc.
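A rough illustration of the "decompress first" step: content streams in PDFs generated from text are commonly compressed with the FlateDecode filter, which is plain zlib/deflate data. This is a minimal sketch that assumes you have already pulled the raw stream bytes out of the file; the sample stream and the `edit_flate_stream` helper are made up for the example.

```python
import zlib

def edit_flate_stream(compressed: bytes, old: bytes, new: bytes) -> bytes:
    """Inflate a FlateDecode stream, swap a text literal, and deflate it again."""
    content = zlib.decompress(compressed)  # FlateDecode data is a zlib stream
    edited = content.replace(old, new)     # naive byte-level substitution
    return zlib.compress(edited)

# Hypothetical content stream: font F1 at 12 pt, move to (72, 720), show a string.
original = b"BT /F1 12 Tf 72 720 Td (Total: 100.00) Tj ET"
stream = zlib.compress(original)

patched = edit_flate_stream(stream, b"(Total: 100.00)", b"(Total: 900.00)")
print(zlib.decompress(patched).decode())
```

In a real file you would also have to update the stream's /Length entry and the cross-reference offsets, since the recompressed data will usually differ in size.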
blunte | 6 years ago
Your argument is similar to security through obscurity. Obviously semi-technical people know how to "modify" (add objects that obscure other objects, then flatten and cleanse the output).

Really the issue is that we need a non-MS-Word editable document format that includes hash/signature features to ensure the edit/publish state of the document.

kccqzy | 6 years ago
Meh. Not being easily editable is a feature, not a bug.

Sometimes you want to send out a finalized document and want to make 99% of the users unable to edit them. That's what PDFs are for. Imagine lawyers needing to send out a finalized contract. Or a graphic designer sending out the finalized design. Or an electronic book that has gone through the work of the author, the editor, and the publisher and needs no more changes. PDFs give an air of permanency and stability when so many other digital formats are malleable.

klint | 6 years ago
That objection, and others, are addressed in the project FAQ[1]. It's already possible to open and edit a PDF in various applications.

[1] https://editablepdf.org/faq/

>But isn’t the whole point of PDF that you can’t edit it? No. The fact that standard PDFs are difficult to edit is more of an accident than a feature, as PDF’s roots are in printing, where only final-form documents needed to be transmitted. Many people believe PDF to be “impossible to edit,” but beware: minor edits in PDFs, such as swapping figures on an invoice, are trivial — therefore you need other technologies, such as digital signatures, to verify that your PDFs have not been tampered with. More extensive edits, however, are more difficult, as they require the document’s logical structure to be automatically detected, and this is an error-prone task.

[...]

>Are you sure we need such an editable PDF format? I believe one of the most important benefits of PDF is its concrete, solid state. The idea of Editable PDF stems from a real-world need to improve the efficiency in the way that we work with documents. Today, the only editable file formats are those native to the applications that generated documents, and none of these formats guarantees the layout to be preserved in the same way as PDF. Furthermore, despite improvements in compatibility, using a native file format still often requires the recipient to be using the same software (and often the same version) of the application, which may not be available.

PDF’s largest asset, its rock-solid visual presentation, will remain, and editable PDFs will be backwardly compatible with the current installed base of PDF viewers such as Adobe Reader and Preview.

hnick | 6 years ago
Absolutely agreed. My day job is in the print industry so I deal with a lot of PDFs and they are perfect for print production for this reason.

Their main complaint seems to be structural metadata (this text is a heading, this text is the same font as that text on another page, so if you can change one the other should change, etc). I don't think at that point it's worth keeping PDF in the name, it'll confuse people. I certainly don't want to receive files built like that since printers tend to have memory issues with bloated files.

You can already do minor edits anyway with some knowledge (the spec is pretty easy to read) and some programming or a hex editor. The only issue I have is fonts; they are very complicated.

vbezhenar | 6 years ago
This sounds like security through obscurity.
chungy | 6 years ago
Agreed, 100%.

Literally the only reason I make and send out PDFs is because they're effectively read-only. (It's not really perfectly so, but nobody should claim it's a tamper-proof document...)

MaxBarraclough | 6 years ago
> Sometimes you want to send out a finalized document and want to make 99% of the users unable to edit them. That's what PDFs are for. Imagine lawyers needing to send out a finalized contract.

That is not what PDFs are for. PDF is, well, a Portable Document Format. It is not convenient to modify a PDF, but PDF is not securely resistant to modification (discounting its cryptographic features [0]). Its resistance to modification is a side-effect of its design, not a primary goal.

An attacker will be able to modify your PDF. This gets easier every year, as we'd expect. That doesn't matter, though, as an attacker can always recreate the document, with whichever changes they wish. (Again, neither of these attacks will work if you use cryptographic signing.)

If you want secure assurance of authenticity, you use cryptographic signing. No excuses. If you're a lawyer, I'd hope you aren't placing any stock at all in the inconvenience of modifying a PDF.

[0] https://acrobat.adobe.com/uk/en/sign/capabilities/digital-si...

AnIdiotOnTheNet | 6 years ago
Anyone else remember all the times a government released a censored document only to have someone discover that they could just remove the black bar layer and see the original text with hardly any difficulty at all?
pdpi | 6 years ago
PDF is a publishing, rather than an editing format. It belongs in the same bucket as .mp3 and .jpg, rather than the .doc and .psd bucket.

This is not about how easy or how hard it is to modify a pdf, it's about the intended purpose. The fact that it's meant for publishing means we get to optimise it as such, both in terms of simplicity of the format itself, and in terms of the tools that interact with it. This makes consistent-ish rendering much easier. The features that would enable the format to be "editable" are also the sort of features that make consistency hard.

DannyB2 | 6 years ago
I think of PDF like PostScript but not Turing complete.

I think of PostScript as an "ink on paper" format.

While you can take apart the PDF/PS format, dictionaries, etc., it's not a high-level representation format like a word processor's. It's a way of specifying how to draw vector shapes onto "paper".

social_quotient | 6 years ago
Totally agree here. It would be nice to edit pdf the same way it would be nice to edit a jpg. But it’s the wrong tier to operate on. What people really need is more consistent access to source files aka design files. The only time I feel stuck needing to change a pdf or jpg is when I don’t have access to the “design file”.

If reports that come out of systems need to be edited they should dump to excel or word and not pdf.

While the P in PDF means portable I think it’s better thought of as “published” as in “published document file”.

wolrah | 6 years ago
Exactly. PDF is digital paper. The "print to PDF" metaphor is perfect. If you've printed something and decide you want to change it, you don't grab the white-out; you open the source file, change it, then print again.

If editing PDFs is something you find yourself needing to do regularly, something is very wrong with the process that's leading to this. It may not be your fault, it may be an upstream party who should be providing you with the source material, but either way making PDFs easier to edit is not the correct solution.

blunte | 6 years ago
Consistency can be solved with hashing and cryptographic signatures.

The user story here is that PDFs get sent around as forms to be filled out, and that poses a problem for non-Mac users or users without sufficient technical skill.

And since you reference mp3 and jpg, you surely know that both formats can be modified in ways that many people will not recognize as modifications. It just pushes the skill level up a bit. But there's always a technically capable person available for hire to modify one of the "permanent" formats you mention.
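A minimal sketch of the hashing half of the consistency idea, using a plain SHA-256 digest: the publisher records the digest of the final file, and anyone can recompute it to detect changes. A bare digest only helps if you obtained it through a trusted channel; binding it to a signer's key is what an actual digital signature adds. The function names and file bytes below are placeholders for illustration.

```python
import hashlib

def pdf_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()

def is_unchanged(data: bytes, published_digest: str) -> bool:
    """True if the bytes still hash to the digest the publisher announced."""
    return pdf_digest(data) == published_digest

# Placeholder bytes standing in for a finalized PDF form.
final = b"%PDF-1.7 placeholder body of the finalized form"
published = pdf_digest(final)

tampered = final.replace(b"finalized", b"amended")
print(is_unchanged(final, published))     # True
print(is_unchanged(tampered, published))  # False
```

Any change to the bytes, even one invisible when rendered, produces a different digest, which is why this works for mp3 and jpg files just as well as for PDFs.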

m-p-3 | 6 years ago
Agreed, and I prefer the LibreOffice approach of embedding the original file within the PDF if the author decides to.

This doesn't break the simplicity of the PDF, while making it easy to edit.

ropiwqefjnpoa | 6 years ago
I'm not going to say the PDF format is perfect, but I do like sending out documents knowing that in general they are going to be opened by a reader in a presentation format. I'd rather they not be opened by an editor, where my viewers are immediately invited to start making changes like a word doc. I suppose having them open initially in a non-editable mode would work, Acrobat functions that way.
klodolph | 6 years ago
It’s phenomenal for so many use cases it’s absurd. I have a huge stash of PDF files—tons of articles that I’ve saved so I can refer to them later. Meanwhile, half the links I have to blog posts are dead, if not more. I know that I’ll be able to read these PDFs 10 years from now, or 20 years from now. Plenty of them are 10 or 20 years old.

PDF is great for:

- Archiving. It’s self-contained and will work 20 years down the line.

- Math. Anything with equations.

- Printing.

I’ve tried various techniques to archive web pages with varying degrees of success. With PDFs I don’t need to think about it.

taneq | 6 years ago
Exactly. Even if it's now fairly easy to edit a PDF (although the obvious approaches like opening them in Word still often visibly mess with the formatting), sending something as a PDF is a signal to the recipient that it's not meant to be changed.
IvanK_net | 6 years ago
I have been working on a PDF editor for several years. It is available inside my photo editor https://www.Photopea.com (press File - Open - choose a PDF file). People open 7 000 PDF files in it every day.

Often, a PDF contains just a single raster bitmap with the whole content rasterized. Also, text is often converted to vector shapes, which also makes it non-editable (as text). But it can open / save PDFs from Google Docs and other editors quite well.

piadodjanho | 6 years ago
The PDF file format is anachronistic.

When the format was created, computers only had a few KBs of RAM. Yet the format had to be capable of editing documents with thousands of pages. The format solves this issue by delegating memory management to the user.

Also, the file was made with the assumption that it was supposed to be printed, not shared. It is easier to hide parts of the document than to remove the data.

A fun bit of trivia: a PDF is supposed to be read from the end of the file. That's why some documents need to be fully downloaded before they can display the first page. Of course, nowadays most PDFs are linearized and load at least the first page right away.
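To make "read from the end" concrete: a PDF ends with a trailer containing the `startxref` keyword, the byte offset of the cross-reference section, and `%%EOF`, so a reader looks at the tail first. This is a minimal heuristic sketch; the helper names are made up, and the tiny sample is hand-made and not a complete valid PDF. The linearization check simply looks for the `/Linearized` key that such files place in their first object.

```python
import re

def find_startxref(pdf: bytes, tail_size: int = 1024) -> int:
    """Return the xref offset recorded after the last 'startxref' keyword."""
    tail = pdf[-tail_size:]                    # readers start at the end of the file
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("no startxref found in file tail")
    return int(matches[-1])                    # with incremental updates, the last one is newest

def looks_linearized(pdf: bytes, head_size: int = 1024) -> bool:
    # Linearized ("fast web view") files announce themselves near the start.
    return b"/Linearized" in pdf[:head_size]

sample = (b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
          b"xref\n0 2\n...\ntrailer\n<< /Size 2 >>\nstartxref\n30\n%%EOF\n")
print(find_startxref(sample))     # 30
print(looks_linearized(sample))   # False
```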

Over the years the specification got so complex that it became very hard to implement a minimal editor, viewer, parser or generator. If the format were simpler, it would be possible to make "save as PDF" more accessible.

I have other issues with the typesetting and the way color is handled (it has a printer-first approach), but I think this post is too long already. I just want to point out that the spec supports so many pointless features, such as drawing in 3D space, movies, audio, HTML support, etc.

Finally, I don't understand why most people are against a revision of the PDF format despite clearly having very little knowledge of how it works. I think multi-person editing of the same document with some version control could be useful. By the way, the format kinda lets many people edit the document at once, as long as they are not working on the same part.

kccqzy | 6 years ago
> When the format was created, computers only had a few KBs of RAM. Yet the format had to be capable of editing documents with thousands of pages. The format solves this issue by delegating memory management to the user.

That's a good decision. Make the file format versatile and powerful. Don't constrain it by the limitations of contemporary hardware.

> Also, the file was made with the assumption it was supposed to be printed, not shared. It is easier to hide parts of the document instead of removing the data.

I agree it's made with the assumption of being printed, but that's part of the appeal—preserving visual fidelity of how the document looks. You can't send people a docx and expect them to see the exact same thing on their screen down to every detail.

And no, it's not difficult to remove data. If you know exactly what to remove, it is quite easy. To remove text, find the Tj or TJ operators and remove them along with their arguments. To remove an image, find the Do operator (occasionally BI, ID, EI for inline images) and remove it. You might have to decompress the stream before doing that. For images, you might have to run another pass to delete the referenced object. All of this is easily automated.
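A naive sketch of automating the text removal described above. It assumes an already-decompressed content stream whose strings contain no escaped or nested parentheses and no hex strings; a real tool would tokenize the stream properly rather than use regular expressions. The helper name is hypothetical.

```python
import re

# "(literal) Tj" and "[ ... ] TJ"; simplified patterns under the assumptions above
TJ_SINGLE = re.compile(rb"\([^()]*\)\s*Tj")
TJ_ARRAY = re.compile(rb"\[[^\]]*\]\s*TJ")

def strip_text_operators(content: bytes) -> bytes:
    """Drop Tj/TJ operators and their operands from a decompressed content stream."""
    content = TJ_SINGLE.sub(b"", content)
    return TJ_ARRAY.sub(b"", content)

stream = (b"BT /F1 12 Tf 72 720 Td (Secret name) Tj ET\n"
          b"BT /F1 12 Tf 72 700 Td [(Acc) -20 (ount 42)] TJ ET\n"
          b"0 0 1 rg 72 600 200 50 re f")
print(strip_text_operators(stream).decode())
```

The rectangle drawn by `re f` survives, while both pieces of text disappear. Note that PDF also has the `'` and `"` text-showing operators, which this sketch ignores.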

> Over the years specification got so complex it became very hard to implement a minimal editor, viewer, parser or generator. If the format was simpler, it would be possible to make "save as PDF" more accessible.

The reason "save to PDF" is difficult to implement from scratch is not its complicated specification. Indeed, parsers are quite easy to write. The real reason "save to PDF" is difficult to implement is that PDF wants visual fidelity; that comes at the price of specifying exactly where text should be placed, all the way from how paragraphs are flowed to how kerning between letters is handled. Most applications do not care about these details. Most developers hardly have any interest in understanding line-breaking algorithms or interpreting font files to produce the right offsets and glyphs (think ligatures). These things are, rightfully, way beyond the business domain of typical applications and beyond the knowledge of typical developers.
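To illustrate that the file container itself is the easy part, here is a minimal sketch of a one-page PDF writer using the built-in Helvetica font. It places a single line of ASCII text at a fixed position and computes the cross-reference offsets; everything described above as hard (line breaking, kerning, font embedding, ligatures) is deliberately skipped. The function name is made up for the example.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF showing a single line of ASCII text in Helvetica."""
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                   # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off           # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

pdf = minimal_pdf("Hello, PDF")  # write these bytes to a .pdf file to open it in a viewer
```

The whole structure fits in a few dozen lines; the layout work a word processor does before emitting `72 720 Td` is where the real effort lies.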

tonyedgecombe | 6 years ago
Microsoft's XPS format solved many of the issues with PDF. Unfortunately it was too late and came from the wrong people so didn't succeed.

As it's a zip file it also needs to be read from the end although it can be linearised as well.
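For comparison, this is roughly what "read from the end" means for a ZIP container such as XPS: the end of central directory record sits at the tail and tells the reader where the directory of entries begins. A minimal sketch with the standard library, assuming no ZIP64 extensions and no archive comment that happens to contain the signature; `zipfile` is only used to build a test archive, and the part names are illustrative.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end of central directory signature
EOCD_FMT = "<4sHHHHIIH"    # 22-byte fixed part of the record

def read_eocd(data: bytes) -> tuple[int, int]:
    """Return (total_entries, central_directory_offset) by scanning from the end."""
    pos = data.rfind(EOCD_SIG)             # the record is at the tail, before any comment
    if pos < 0:
        raise ValueError("no end of central directory record found")
    (_sig, _disk, _cd_disk, _n_disk, total, _cd_size,
     cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, data, pos)
    return total, cd_offset

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Documents/1/Pages/1.fpage", "<FixedPage/>")
    zf.writestr("[Content_Types].xml", "<Types/>")

total, cd_offset = read_eocd(buf.getvalue())
print(total, buf.getvalue()[cd_offset:cd_offset + 4])   # 2 b'PK\x01\x02'
```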

rusk | 6 years ago
> computers only had a few KBs of RAM.

I don't think this is right. PostScript maybe... but PDF in my experience came about in the 90s, when computers typically had between 4 and 16MB of RAM...

gpvos | 6 years ago
The main reason making a "save as PDF" is hard is the font handling. The rest is fairly straightforward.
BEEdwards | 6 years ago
I think it's funny that the top two comments of this post are diametrically opposed, yet I kind of agree with both of them.

The PDF is a terrible format, yet if I'm sending an email with an attachment and I want you to see exactly how it looks on my computer, I'm exporting to PDF.

However if your book is only available as a pdf I'm probably going to skip it.

PDF is good for short things, a contract maybe. The best use case is forms, which this doesn't really talk about but seems to address. The web has basically solved forms, but there are times you want to send people a form to fill out whose formatting you don't want to go wacky, but which still needs to be editable.

PDF can do this but isn't good at it, this seems to take that not good and make it good.

enriquto | 6 years ago
> However if your book is only available as a pdf I'm probably going to skip it.

Wait, what format do you expect a book to be in? I mostly skip any book that is not in PDF.

mung | 6 years ago
If you find PDF painful because you can't reflow text or edit it, newsflash: you are using the wrong format. Industries that use PDF extensively: legal and printing. Neither of them want to be able to change documents.

To preempt: I work within printing, yes there are tools to hack into PDFs and make certain alterations or fixes, but it's to get you out of a bind only, it's not a normal healthy workflow.

9nGQluzmnq3M | 6 years ago
I'm going to add an unsolicited plug for PDFEscape, which effectively lets you "edit" any PDF: https://www.pdfescape.com/

It's an online service that lets you upload PDFs, then edit fields, add text, upload and paste images like your signature, etc. Perfect for filling out tedious paper application forms without having to deal with printing & scanning.

I have no connection other than as a satisfied user, and in fact I have no idea how they make money, since the free mode features suffice for every use case I've had.

scrollaway | 6 years ago
I use Master PDF Editor (https://code-industry.net/masterpdfeditor/). It's not free, but it's not terribly expensive either and you can probably get it expensed depending on your job.

It also does PDF editing perfectly. I really hope there will be some open source version of it at some point. Or that someone's working on one.

calvinmorrison | 6 years ago
Shout-out to Apple's PDF viewer. I don't own a Mac, but I use my coworker's. Write your signature on a piece of paper, then the webcam will scan it, clean it up, and add your signature to a doc!
lxgr | 6 years ago
The proposed way to achieve editability sounds like it is inherently at odds with the page description model of PDF, which is in turn exactly what gives it its stable output on different platforms.

A PDF renderer basically needs to be able to rasterize fonts and paint glyphs on a page/screen – that's it. Layout, spacing and even kerning are left to the producing application.
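A small sketch of what "left to the producing application" means for kerning: the producer decides the adjustments and bakes them into a `TJ` array, where each number is subtracted from the horizontal position in thousandths of a text-space unit (so positive values pull the next glyph closer). The helper and the kerning pairs and values below are invented for illustration, not taken from any real font.

```python
def kerned_tj(text: str, kerning: dict[tuple[str, str], int]) -> str:
    """Build a PDF TJ operation, inserting an adjustment between kerned pairs."""
    parts: list[str] = []
    run = ""
    for i, ch in enumerate(text):
        # escape characters that are special inside PDF literal strings
        run += ch.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        nxt = text[i + 1] if i + 1 < len(text) else None
        adjust = kerning.get((ch, nxt), 0) if nxt else 0
        if adjust:
            parts.append(f"({run}) {adjust}")   # close the run, then the adjustment
            run = ""
    if run:
        parts.append(f"({run})")
    return "[" + " ".join(parts) + "] TJ"

# Hypothetical kerning table: tighten "AV" and "To".
pairs = {("A", "V"): 80, ("T", "o"): 60}
print(kerned_tj("AVTo", pairs))   # [(A) 80 (VT) 60 (o)] TJ
```

The renderer just applies these numbers; it never needs to know why they were chosen.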

The project mentions the lack of robustness inherent to web-based document formats, but I'm afraid that any alternative would either be severely limited in the range of achievable output documents or would end up reinventing the wheel.

As an analogy: SVG has been around for a while, and yet we still use PNGs and I don't see them going away anytime soon.

Maybe what we really need is just more widespread support of ePub, and maybe some extensions for more "document-like" (instead of book-like) functionality in editors for it, and potentially support for an embedded rendered PDF for layout stability?

burtonator | 6 years ago
The fact that PDF is immutable is a huge advantage.

In Polar we have taken the perspective that immutability is an advantage and is going to be the basis for our group collaboration around documents.

We ended up building out annotations on top of PDF including text highlights and area highlights which can then be commented on:

https://getpolarized.io/docs/annotation-sidebar.html

Some of our users keep asking for editable documentation, and I think the main win here could just be using Markdown, which I'm thinking about adding.

The biggest thing that's needed, though, for scientific use, is LaTeX. Fortunately, there are plenty of Markdown implementations with LaTeX support.

PDF is amazingly good for printing documents but honestly 90% of the complex printing requirements aren't needed for regular use.

bloak | 6 years ago
It sounds like what they want is a bit like what you get with a word processor provided that everyone is using the same version of the same program on identical systems, so you don't have the current situation of the layout getting completely broken because different people have different fonts, different paper sizes, and so on. In which case it's an interesting idea, but they shouldn't call it "PDF".

Although it's an interesting idea, I suspect it will never work in practice because word processing is just too complex. There are just too many complex features that people expect to have available. Different implementations will never be sufficiently compatible. Perhaps the solution is to bundle your document with a WebAssembly binary of a particular version of LibreOffice? OK, maybe you could separate the rendering functionality from the UI stuff, but it's hard to see how in practice you could get documents to be editable and rendered in the same way everywhere except by having everyone run the same binary to do the rendering, and there will inevitably be a hundred versions of that binary in use as new features get added.

lars-b2018 | 6 years ago
PDF is great because of its ability to present a print oriented view of any type of information, packed in a document container in an efficient manner. This is the design goal of the format. It is the source application's responsibility to manage the semantics of the document scope, where edits to the represented information can potentially cascade across the document in non-trivial ways (think Excel for example). PDFs CAN be edited today, but those edits are made by tools that just change the visual layout vs. the information structures represented by the document. It's a rather difficult problem to overcome if the PDF format now must contain rules about the underlying information structure itself in order to maintain a consistent representation in the document.
cm-t | 6 years ago
As far as I know, LibreOffice (Draw, if I remember correctly) allows you to graphically edit PDFs (Xournal too, but it's not as rich as LibreOffice).
dwheeler | 6 years ago
LibreOffice does let you create editable PDFs. They do this with a very elegant solution: they embed the OpenDocument file within the PDF. This takes very little additional space, because the OpenDocument format is compressed. I think the LibreOffice solution is quite elegant; OpenDocument Format is already a standard, so we don't need to create another one. And most important, it works today, right now.

It would be a lot of effort to create a document format with the kind of richness that PDF supports. I am dubious it would be worth it.

I think most people do not need an editable PDF in the first place, so this is a minority problem. If you do want this, for most people there is already a working solution: just store OpenDocument Format within PDFs.

nxpnsv | 6 years ago
I prefer my PDFs static; my meticulously edited LaTeX would be ruined by sticky fingers. However, something I would very much like is better copy-to-clipboard from PDF. Non-trivial input with tables and line breaks turns into indecipherable alphabet soup...
diegof79 | 6 years ago
Adobe Illustrator files (.ai) are PDF-compatible files, so you can view them with a PDF reader like Preview. The file still contains all the data needed to edit it in Illustrator. I guess that means the PDF format is already designed to hold extra data that can be used for editing. But since PDF has many use cases, I don't think that will change much for editing: you still need a tool compatible with the original editor. However, it would be interesting if docx files, like .ai files, could be displayed in a PDF viewer; it would save a lot of time spent exporting/saving as PDF.
VvR-Ox | 6 years ago
Wow this is an awesome idea!

Editing PDFs on Linux has always been painful for me, but I had no joy using plain macOS for this either. The Preview app can do some things, but not others that matter.

I wanted to copy some text just yesterday - while I could select and copy it I could not insert it as a text again in the same application.

Having to use some extremely overpriced Adobe product for occasional tasks like this is overkill and really unnecessary.

To all the people who like PDF because you cannot edit it like you want: This is the "obfuscation argument" because anyone who has the right tool or googles for 10 min. can somehow edit PDF - it is just a real pain to do so most of the time and the result may look like the patched overhead transparencies we saw back in school in the earlier days.

superkuh | 6 years ago
> If anyone constructed a PDF, which was itself blank but, via embedded JavaScript, loaded parts of itself from a remote server, people would rightly balk and wonder what on earth the creator of this PDF was thinking — yet this is precisely the design of many “websites”. To put it simply, websites and webapps are not the same thing, nor should they be. Yet the conflation of a platform for hypertext and a platform for applications has confused thinking, and led developers with prodigious aptitude for JavaScript to mistakenly see mere websites of text as a like nail to their applications hammer.

This quote was supposed to be an absurd hypothetical. But I guess we'll live to see it in reality.

Meph504 | 6 years ago
I think this effort is misguided, they are attempting to take something that has a specific purpose and does it well, and subvert it into something that other applications and formats do well.

PDFs aren't promoted as a portable editable format, but as a portable, sharable, and archival format.

Why promote PDF over ODF? Are document reflow and editability such big issues that they need to develop a new set of tools and change the structure of PDF to resolve them? If so, it seems they could contribute to resolving those issues in ODF instead.