top | item 46717507

ISO PDF spec is getting Brotli – ~20 % smaller documents with no quality loss

169 points| whizzx | 1 month ago |pdfa.org

111 comments

order

ericpauley|1 month ago

Some real cognitive dissonance in this article…

“The PDF Association operates under a strict principle—any new feature must work seamlessly with existing readers” followed by introducing compression as a breaking change in the same paragraph.

All this for brotli… on a read-many format like pdf zstd’s decompression speed is a much better fit.

xxs|1 month ago

yup, zstd is better. Overall use zstd for pretty much anything that can benefit from a general purpose compression. It's a beyond excellent library, tool, and an algorithm (set of).

Brotli w/o a custom dictionary is a weird choice to begin with.

mmooss|1 month ago

Note the language: "You're not creating broken files—you're creating files that are ahead of their time."

Imagine a sales meeting where someone pitched that to you. They have to be joking, right?

I have no objection to adding Brotli, but I hope they take the compatability more seriously. You may need readers to deploy it for a long time - ten years? - before you deploy it in PDF creation tools.

spider-mario|1 month ago

> on a read-many format like pdf zstd’s decompression speed is a much better fit.

brotli decompression is already plenty fast. For PDFs, zstd’s advantage in decompression speed is academic.

bhouston|1 month ago

Are they using a custom dictionary with Brotli designed for PDFs? I am not sure if it would help or not, but it seems like one of those cases it may help?

Something like this:

https://developer.chrome.com/blog/shared-dictionary-compress...

In my applications, in the area of 3D, I've been moving away from Brotli because it is just so slow for large files. I prefer zstd, because it is like 10x faster for both compression and decompression.

whizzx|1 month ago

The pdf association is still running experiments on whether or not to support custom dictionaries based on real life workloads gains.

So it might land in the spec once it has proven if offers enough value

Proclus|1 month ago

It seems they're using the standard dictionary, which is utterly bizzare.

The standard Brotli dictionary bakes in a ton of assumptions about what the Web looked like in 2015, including not just which HTML tags were particularly common but also such things as which swear words were trendy.

It doesn't seem reasonable to think that PDFs have symbol probabilities remotely similar to the web corpus Google used to come up with that dictionary.

On top of that, it seems utterly daft to be baking that into a format which is expected to fit archival use cases and thus impose that 2015 dictionary on PDF readers for a century to come.

I too would strongly prefer that they use zstd.

bobpaw|1 month ago

How can iText claim that adding Brotli is not a backward incompatible change (in the "Why keep encoding separate" table)? In the first section the author states that any new feature must work seamlessly with existing readers. New documents created that include this compression would be unintelligible to any reader that only supports Deflate.

Am I missing something? Adoption will take a long time if you can't be confident the receiver of a document or viewers of a publication will be able to open the file.

whizzx|1 month ago

It's prototypish work to support it before it land's in the official specification. But it will indeed take some adoption time.

Because I'm doing the work to patch in support across different viewers to help adoption grow. And once the big opensource ones ship it pdfjs, poppler, pdfium, adoption can quickly rise.

nialse|1 month ago

Who is responsible for the terrible decision? In the pro vs con analysis, saving 20% size occasionally vs updating ALL pdf libraries/apps/viewers ever built SHOULD be a no-brainer.

superkuh|1 month ago

This is nice, but PDF jumped the shark already. It's no longer a document format that always looks the same everywhere. The inclusion of "Dynamic XFA (XML Form Architecture) PDF" in the spec made it so PDF is an unreliable format. The aformentioned is a PDF without content that pulls down all it's content from the web. It even still, ostensibly, supports Flash (swf) animations. In practice these "PDF"s are just empty white pages with an error message like,

>"Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document. You can upgrade to the latest version of Adobe Reader for Windows®, Mac, or Linux® by visiting http://www.adobe.com/go/reader_download. For more assistance with Adobe Reader visit http://www.adobe.com/go/acrreader. Windows is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries. Mac is a trademark of Apple Inc., registered in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries."

kayodelycaon|1 month ago

Fortunately, XFA is deprecated. I haven’t seen one of those for a very long time.

ndriscoll|1 month ago

What is the point of using a generic compression algorithm in a file format? Does this actually get you much over turning on filesystem and transport compression, which can transparently swap the generic algorithm (e.g. my files are already all zstd compressed. HTTP can already negotiate brotli or zstd)? If it's not tuned to the application, it seems like it's better to leave it uncompressed and let the user decide what they want (e.g. people noting tradeoffs with bro vs zstd; let the person who has to live with the tradeoff decide it, not the original file author).

wongarsu|1 month ago

Few people enable file system compression, and even if they do it's usually with fast algorithms like lz4 or zstd -1. When authoring a document you have very different tradeoffs and can afford the cost of high compression levels of zstd or brotli.

Someone|1 month ago

- inside the file, the compressor can be varied according to the file content. For example, images can use jpeg, but that isn’t useful for compressing text

- when jumping from page to page, you won’t have to decompress the entire file

eru|1 month ago

Well, if sanity had prevailed, we would have likely stuck to .ps.gz (or you favourite compression format), instead of ending up with PDF.

Though we might still want to restrict the subset of PostScript that we allow. The full language might be a bit too general to take from untrusted third parties.

ksec|1 month ago

Why not zstd?

HackerThemAll|1 month ago

I think this was the main reason (from the linked article) LOL:

"Brotli is a compression algorithm developed by Google."

They have no idea about Zstandard nor ANS/FSE comparing it with LZ77.

Sheer incompetence.

gcr|1 month ago

If we're making breaking changes to PDFs, I'd love if the committee added a modern image format like JPEG-XL. In my experience, most disk usage of PDFs comes from images, not streams.

I keep a bunch of comics in PDF but JPEG-XL is by far the best way to enjoy them in terms of disk space.

Bolwin|1 month ago

Odd you should say that, as that's exactly what they've been discussing

whinvik|1 month ago

I am often frustrated by PDF issues such as how complicated it is to create one.

But reading the article I realized PDFs have become ubiquitous because of its insistence on backwards compatibility. Maybe for some things it's good to move this slow.

jhealy|1 month ago

The article is wrong, the PDF spec has introduced breaking changes plenty of times. It’s done slowly and conservatively though, particularly now that the format is an ISO spec.

The PDF format is versioned, and in the past new versions have introduced things like new types of encryption. It’s quite probable that a v1.7 compliant PDF won’t open on a reader app written when v1.3 was the latest standard.

nbevans|1 month ago

This is a really really bad idea. Don't break backwards compat. for 20% of gains. Internet connection speeds and storage capacities only go up. In a few years time, 20% of gains will seem crazy to have broken back-compat for.

cess11|1 month ago

'Your PDF:s will open slower because we decided that the CDN providers are more important than you'.

If size was important to users then it wouldn't be so common that systems providers crap out huge PDF files consisting mainly of layout junk 'sophistication' with rounded borders and whatnot.

The PDF/A stuff I've built stays under 1 MB for hundreds of pages of information, because it's text placed in a typographically sensible manner.

noname120|1 month ago

Ridiculous statement. CDN providers can already use filesystem compression and standard HTTP Accept-Encoding compression for transfers (which includes brotli by the way). This ISO provides virtually no benefit to them

avalys|1 month ago

This article is AI slop.

delfinom|1 month ago

tl;dr Commerical entity is paying to have the ISO altered to "legalize" their SDK they are pushing which is incompatible with standard PDF readers.

ISO is pay to play so :shrug:

whizzx|1 month ago

No this feature is coming straight from the PDF association itself and we just added experimental support before it's officially in the spec to help testing between different sdk processors.

So your comment is a falsehood

lmz|1 month ago

It's not even clear that they were the ones suggesting inclusion. They're just saying their library now supports the new thing.

https://pdfa.org/brotli-compression-coming-to-pdf/

> As of March 2025, the current development version of MuPDF now supports reading PDF files with Brotli compression. The source is available from github.com/ArtifexSoftware/mupdf, and will be included as an experimental feature in the upcoming 1.26.0 release.

> Similarly, the latest development version of Ghostscript can now read PDF files with Brotli compression. File creation functionality is underway. The next official Ghostscript release is scheduled for August this year, but the source is available now from github.com/ArtifexSoftware/Ghostpdl.

bhouston|1 month ago

I'm no fan of Adobe, but it is not that hard to add brotli support given that it is open. Probably can be added by AI without much difficulty - it is a simple feature. I think compared to the ton of other complex features PDF has, this is an easy one.

vgtftf|1 month ago

[deleted]