“The PDF Association operates under a strict principle—any new feature must work seamlessly with existing readers” followed by introducing compression as a breaking change in the same paragraph.
All this for Brotli… On a read-many format like PDF, zstd’s decompression speed is a much better fit.
Yup, zstd is better. Overall, use zstd for pretty much anything that can benefit from general-purpose compression. It's a beyond-excellent library, tool, and algorithm (family of algorithms, really).
Brotli w/o a custom dictionary is a weird choice to begin with.
Note the language: "You're not creating broken files—you're creating files that are ahead of their time."
Imagine a sales meeting where someone pitched that to you. They have to be joking, right?
I have no objection to adding Brotli, but I hope they take the compatibility more seriously. You may need readers to deploy it for a long time - ten years? - before you deploy it in PDF creation tools.
Are they using a custom dictionary with Brotli designed for PDFs? I am not sure whether it would help, but it seems like one of those cases where it might?
In my applications, in the area of 3D, I've been moving away from Brotli because it is just so slow for large files. I prefer zstd, because it is like 10x faster for both compression and decompression.
It seems they're using the standard dictionary, which is utterly bizarre.
The standard Brotli dictionary bakes in a ton of assumptions about what the Web looked like in 2015, including not just which HTML tags were particularly common but also such things as which swear words were trendy.
It doesn't seem reasonable to think that PDFs have symbol probabilities remotely similar to the web corpus Google used to come up with that dictionary.
On top of that, it seems utterly daft to be baking that into a format which is expected to fit archival use cases and thus impose that 2015 dictionary on PDF readers for a century to come.
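To make the dictionary point concrete: DEFLATE (already in every PDF reader) supports preset dictionaries too, via zlib's zdict parameter. Here's a minimal sketch of the idea — the "PDF dictionary" and sample below are made up for illustration, not anything from the spec:

```python
import zlib

# A preset dictionary seeds the compressor with byte strings it expects to
# see, so short inputs that reuse them compress much better. zlib's zdict
# is the same mechanism as Brotli's built-in dictionary, except we choose
# the contents ourselves.
pdf_dict = b"/Type /Page /Parent /Contents /Filter /FlateDecode /MediaBox"
sample = b"<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>"

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    c = zlib.compressobj(zdict=zdict)
    return c.compress(data) + c.flush()

plain = zlib.compress(sample)
primed = compress_with_dict(sample, pdf_dict)

# The decompressor must be handed the same dictionary.
d = zlib.decompressobj(zdict=pdf_dict)
assert d.decompress(primed) == sample
assert len(primed) < len(plain)  # dictionary pays off on short, formulaic data
```

The catch is exactly the archival concern above: whatever dictionary you pick is frozen into the format, and every future decoder must carry it forever.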
How can iText claim that adding Brotli is not a backward incompatible change (in the "Why keep encoding separate" table)? In the first section the author states that any new feature must work seamlessly with existing readers. New documents created that include this compression would be unintelligible to any reader that only supports Deflate.
Am I missing something? Adoption will take a long time if you can't be confident the receiver of a document or viewers of a publication will be able to open the file.
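For anyone unfamiliar with why this breaks old readers: every PDF stream names the filter used to encode it, and a reader that doesn't recognize the name simply cannot decode the bytes. A toy sketch of the dispatch (the "/BrotliDecode" name here is illustrative; I haven't checked what the draft actually calls it):

```python
import zlib

# How a reader picks a decoder from a stream's /Filter entry (toy version).
DECODERS = {
    "/FlateDecode": zlib.decompress,  # supported by effectively every reader
    # A new "/BrotliDecode" entry would have to be added to every viewer,
    # library, and archival tool before such files open reliably.
}

def decode_stream(filter_name: str, data: bytes) -> bytes:
    if filter_name not in DECODERS:
        raise ValueError(f"unsupported filter: {filter_name}")
    return DECODERS[filter_name](data)

payload = b"BT /F1 12 Tf (Hello) Tj ET"
assert decode_stream("/FlateDecode", zlib.compress(payload)) == payload
```

An existing reader hits the equivalent of that ValueError on a Brotli-compressed stream and shows nothing.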
It's prototypish work to support it before it lands in the official specification.
But it will indeed take some adoption time.
Because I'm doing the work to patch in support across different viewers to help adoption grow. And once the big open-source ones (pdf.js, Poppler, PDFium) ship it, adoption can rise quickly.
Who is responsible for the terrible decision? In the pro vs con analysis, saving 20% size occasionally vs updating ALL pdf libraries/apps/viewers ever built SHOULD be a no-brainer.
This is nice, but PDF jumped the shark already. It's no longer a document format that always looks the same everywhere. The inclusion of "Dynamic XFA (XML Form Architecture) PDF" in the spec made PDF an unreliable format. The aforementioned is a PDF without content that pulls all its content down from the web. It even still, ostensibly, supports Flash (SWF) animations. In practice these "PDF"s are just empty white pages with an error message like,
>"Please wait... If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document. You can upgrade to the latest version of Adobe Reader for Windows®, Mac, or Linux® by visiting http://www.adobe.com/go/reader_download. For more assistance with Adobe Reader visit http://www.adobe.com/go/acrreader. Windows is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries. Mac is a trademark of Apple Inc., registered in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries."
What is the point of using a generic compression algorithm in a file format? Does this actually get you much over turning on filesystem and transport compression, which can transparently swap the generic algorithm (e.g. my files are already all zstd compressed; HTTP can already negotiate brotli or zstd)? If it's not tuned to the application, it seems like it's better to leave it uncompressed and let the user decide what they want (e.g. people noting tradeoffs with brotli vs zstd; let the person who has to live with the tradeoff decide it, not the original file author).
Few people enable file system compression, and even if they do it's usually with fast algorithms like lz4 or zstd -1. When authoring a document you have very different tradeoffs and can afford the cost of high compression levels of zstd or brotli.
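The asymmetry is easy to demonstrate even with plain DEFLATE: the author pays the cost of a high compression level once, while decompression stays cheap. A quick stdlib sketch (the input is made up, and the numbers will vary by machine):

```python
import time
import zlib

# Repetitive text stands in for a PDF content stream.
data = b"PDF content streams are full of repetitive operators. " * 2000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    dt = (time.perf_counter() - t0) * 1000
    print(f"level {level}: {len(out):6d} bytes, {dt:.2f} ms to compress")

# Decompression speed is essentially independent of the level used.
assert zlib.decompress(zlib.compress(data, 9)) == data
```

The same shape holds for zstd and brotli, just with much wider level ranges.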
- inside the file, the compressor can be varied according to the file content. For example, images can use jpeg, but that isn’t useful for compressing text
- when jumping from page to page, you won’t have to decompress the entire file
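That second point is worth illustrating: because each content stream is compressed independently, a viewer can inflate just the object it needs. A rough stdlib sketch of the idea:

```python
import zlib

# Each page's content stream is a separately compressed object in the file,
# so a viewer can seek to one stream and inflate only that.
pages = [b"Page 1 content " * 100, b"Page 2 content " * 100]
streams = [zlib.compress(p) for p in pages]

# Jump straight to page 2 without touching page 1's bytes.
assert zlib.decompress(streams[1]) == pages[1]
```

Whole-file compression (filesystem or transport level) can't give you that random access.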
Well, if sanity had prevailed, we would have likely stuck to .ps.gz (or your favourite compression format), instead of ending up with PDF.
Though we might still want to restrict the subset of PostScript that we allow. The full language might be a bit too general to take from untrusted third parties.
If we're making breaking changes to PDFs, I'd love if the committee added a modern image format like JPEG-XL. In my experience, most disk usage of PDFs comes from images, not streams.
I keep a bunch of comics in PDF but JPEG-XL is by far the best way to enjoy them in terms of disk space.
I am often frustrated by PDF issues such as how complicated it is to create one.
But reading the article I realized PDFs have become ubiquitous because of the format's insistence on backwards compatibility. Maybe for some things it's good to move this slowly.
The article is wrong, the PDF spec has introduced breaking changes plenty of times. It’s done slowly and conservatively though, particularly now that the format is an ISO spec.
The PDF format is versioned, and in the past new versions have introduced things like new types of encryption. It’s quite probable that a v1.7 compliant PDF won’t open on a reader app written when v1.3 was the latest standard.
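You can see that versioning right in the file header — the first line of every PDF declares the spec version it targets. A tiny sketch of reading it (with the caveat that since PDF 1.4 a /Version entry in the catalog can override the header, so this is only the nominal version):

```python
def pdf_version(first_bytes: bytes) -> str:
    # Every PDF begins with b"%PDF-M.N" (e.g. %PDF-1.7); both digits are
    # single characters for all published versions (1.0 through 2.0).
    line = first_bytes.split(b"\n", 1)[0]
    if not line.startswith(b"%PDF-"):
        raise ValueError("not a PDF header")
    return line[5:8].decode("ascii")

assert pdf_version(b"%PDF-1.7\n% binary marker follows\n") == "1.7"
```

Old readers typically warn on (or refuse) versions newer than they know, which is the conservative breaking-change mechanism the parent describes.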
This is a really, really bad idea. Don't break backwards compatibility for 20% gains. Internet connection speeds and storage capacities only go up. In a few years' time, a 20% gain will seem a crazy reason to have broken backwards compatibility.
'Your PDFs will open slower because we decided that the CDN providers are more important than you.'
If size was important to users then it wouldn't be so common that systems providers crap out huge PDF files consisting mainly of layout junk 'sophistication' with rounded borders and whatnot.
The PDF/A stuff I've built stays under 1 MB for hundreds of pages of information, because it's text placed in a typographically sensible manner.
Ridiculous statement. CDN providers can already use filesystem compression and standard HTTP Accept-Encoding compression for transfers (which includes brotli, by the way). This ISO change provides virtually no benefit to them.
No this feature is coming straight from the PDF association itself and we just added experimental support before it's officially in the spec to help testing between different sdk processors.
> As of March 2025, the current development version of MuPDF now supports reading PDF files with Brotli compression. The source is available from github.com/ArtifexSoftware/mupdf, and will be included as an experimental feature in the upcoming 1.26.0 release.
> Similarly, the latest development version of Ghostscript can now read PDF files with Brotli compression. File creation functionality is underway. The next official Ghostscript release is scheduled for August this year, but the source is available now from github.com/ArtifexSoftware/Ghostpdl.
I'm no fan of Adobe, but it is not that hard to add brotli support given that it is open. Probably can be added by AI without much difficulty - it is a simple feature. I think compared to the ton of other complex features PDF has, this is an easy one.
spider-mario|1 month ago
brotli decompression is already plenty fast. For PDFs, zstd’s advantage in decompression speed is academic.
deepsun|1 month ago
Here's a discussion by Brotli's and zstd's authors:
https://news.ycombinator.com/item?id=19678985
bhouston|1 month ago
Something like this:
https://developer.chrome.com/blog/shared-dictionary-compress...
whizzx|1 month ago
So it might land in the spec once it has proven it offers enough value.
Proclus|1 month ago
I too would strongly prefer that they use zstd.
HackerThemAll|1 month ago
> "Brotli is a compression algorithm developed by Google."

They have no idea about Zstandard or ANS/FSE, comparing it only with LZ77. Sheer incompetence.
delfinom|1 month ago
ISO is pay to play so :shrug:
whizzx|1 month ago
So your comment is a falsehood.
lmz|1 month ago
https://pdfa.org/brotli-compression-coming-to-pdf/