I heard of pigz in the discussions following my interview of Yann Collet, creator of LZ4 and zstd.
If you'll excuse the plug, here is the LZ4 story:
Yann was bored and working as a project manager. So he started working on a game for his old HP 48 graphing calculator.
Eventually, this hobby led him to revolutionize the field of data compression, releasing LZ4, ZStandard, and Finite State Entropy coders.
His code ended up everywhere: in games, databases, file systems, and the Linux kernel, because Yann built the world's fastest compression algorithms. And he got started just making a fun game for a graphing calculator he'd had since high school.
https://corecursive.com/data-compression-yann-collet/
Side note: In the 1990s, everyone in my engineering school had an HP48 calculator. There was a healthy selection of pretty decent games available.
One fine day, I finished my physics exam an hour early, so I opened up an enjoyable game on my calculator. 45 minutes went by, and then I went up and handed in my paper. It was at this point that the professor asked, “Were you planning on leaving the second page blank?”
Oh.
One wild thing is how much performance was still on the table compared to zlib. Pigz is parallel, but what if you just had a better way to compress and decompress than DEFLATE?
When zstd came out – and Brotli before it to a certain extent – they were 3x faster than zlib with a slightly higher compression ratio. You'd think that such performance jumps in something as well explored as data compression would be hard to come by. It turns out we weren't that close to the efficiency frontier.
Just listened to that episode, what a great story. The dry way he tells how he unexpectedly and almost accidentally transitioned from a project manager to a software engineer is really a treat. Thanks for your podcast!
Fastest open source compression algorithms, that is. RAD Game Tools has proprietary ones that are faster and have better compression ratios, but since you have to pay for a license, they will never be widespread.
This episode was fascinating. I had heard of LZ4 but not Zstd. It spurred me to make changes to our system at work that are reducing file sizes by as much as 25%. It’s great to have a podcast in which I learn practical stuff!
GORDON BELL, ADAM! That was a great episode. The most amazing thing to me was how this guy was just messing around, a compression hobbyist, if you will - and then he is being courted by FAANG companies. He just walked into it, almost by accident.
I work in VMware Fusion on a Mac, in a Mint guest OS, and zipping these huge instances for backup would take forever with a single core. Pigz punishes all 12 cores on my Mac mini and saves me a ton of time.
I loved this episode, it was very engaging to the very end, I wish there were more episodes! (I already listened to all of them so far)
Thank you for doing this podcast!
That was a great read! Very inspiring to hear about a near-middle-age person keeping the flame stoked on a little side hobby, and having it turn into something world-changing. So cool!
If you are interested in optimizing parallel decompression and you happen to have a suitable NVIDIA GPU, GDeflate [1] is interesting. The target market for this is PC games using DirectStorage to quickly load game assets. The graph in [1] shows DirectStorage maxing out the throughput of a PCIe Gen 3 drive at about 3 GiB/s when compression is not used. When GPU GDeflate is used, the effective rate hits 12 GiB/s.
If you have suitable hardware running Windows, you can try this out for yourself using Microsoft's DirectStorage GPU decompression benchmark [2].
A reference implementation of a single-threaded compressor and a multi-threaded (CPU) decompressor can be found at [3]. It is Apache-2 licensed.
1. https://developer.nvidia.com/blog/accelerating-load-times-fo...
2. https://github.com/microsoft/DirectStorage/tree/main/Samples...
3. https://github.com/microsoft/DirectStorage/blob/main/GDeflat...
Disclaimer: I work for NVIDIA, have nothing to do with this, and am not speaking for NVIDIA.
I assume this is for decompressing multiple independent deflate streams in parallel?
What's the throughput if you only have a single stream? I realise this is the unhappy-case for GPU acceleration, hence my question! (I've been thinking about some approaches to parallelize decompression of a single stream, it's not easy)
John Carmack just had a tweet today on this problem:
>I started a tar with bzip command on a big directory, and it has been running for two days. Of course, it is only using 1.07 cores out of the 128 available. The Unix pipeline tool philosophy often isn’t aligned with parallel performance.
https://twitter.com/ID_AA_Carmack/status/1656708636570271768...
I wasn't aware the Unix philosophy was to not use multithreading on large jobs that can be parallelized.
You can complain about philosophies, but this is just using the wrong tool for the job. Complain about bzip if you feel the bzip authors should have made a multithreaded implementation for you.
With all respect to John Carmack (and it is really a lot of respect!), I'm surprised he seems unaware of pbzip2. It's a parallel implementation that scales almost linearly with the number of cores, and it has been around since ~2010; it's not yet old enough to drive, but anyone dealing with bzip2'ing large amounts of data should have discovered it long ago.
And yes, use zstandard (or xz, where the default binary in your distro is already multithreaded) where you can.
But pigz shows that the unix pipeline philosophy works just fine. (of course compressing before tarring is probably better than compressing the tarred file, but that should be pipelinable as well)
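For instance, a minimal sketch of that pipeline (directory and file names are hypothetical). Pigz emits standard gzip output, so plain gzip can read it on the other end:

```shell
# Archive and compress in one pass: tar streams to stdout, and pigz
# spreads the DEFLATE work across cores (-p sets the thread count).
tar -cf - somedir | pigz -p "$(nproc)" > somedir.tar.gz

# The result is ordinary gzip, so either pigz -d or plain gzip -d unpacks it:
pigz -dc somedir.tar.gz | tar -xf -
```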
How big is that file... I have 2TB files compressed down to ~300GB and gunzip'ing them takes ~2-3 hours. Granted, that's still a long ass time, but not 2-3 days.
If anything, I wonder what kind of hard drive John has. If you're reading them off a network drive backed by tape drums it's probably going to take a while ;P
Of course, the problem there is that `tar` is outputting a single stream. You might, in similar situations, start multiple `tar` processes on subsets of the input, and the pipelines then become fully parallel again.
Unless the recipient of whatever you are compressing absolutely requires gzip, you should not use gzip or pigz.
Instead you should use zstd, as it compresses faster, decompresses faster, and yields smaller files. It also supports parallelism (via “-T”), which supplants the pigz use case. There are literally no trade-offs; it is better in every objective way.
In 2023, friends don’t let friends use gzip.
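As a sketch (file names hypothetical): -T0 tells zstd to use one thread per detected core.

```shell
# Multithreaded compression at a fairly high level; -T0 auto-detects cores.
zstd -T0 -19 big.log -o big.log.zst

# Decompress (zstd keeps the input file by default):
zstd -d big.log.zst -o big.log.out

# As a drop-in for a gzip'd tar pipeline:
tar -cf - somedir | zstd -T0 > somedir.tar.zst
```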
I’ve integrated pigz into different build and CI pipelines a few times. Don’t expect wonders since some steps still need to run serially, but a few seconds here and there might still add up to a few minutes on a large build.
Am I reading correctly that Docker just automatically uses pigz if it’s in the system path? I’ve used both for years and had no idea. I’m definitely going to make sure it’s installed in CI pipelines going forward, I know of some bloated image builds it will definitely help with!
Still blows my mind that people still use gzip. 20 years ago I expected that by this point there would be lots of effort put into increasing compression ratios and then making that fast; instead it's been a push for speed. It makes sense with how the internet has changed. These days gzip isn't even in the top 100 as far as compression goes; even something like RAR or 7-Zip is far behind the best.
Take something like enwik8 (100 megs): gzip will get that down to 36 megs, LZMA down to ~24-25. The top-of-the-line stuff will get it down to the ~15 meg range. That's a huge difference.
I remember moving a HUGE mysql table (>500GB) with a pipe chain of mysqldump > pigz > scp (compression disabled) > pigz > mysql
If you've ever screwed around with mysqldump -> tar -> scp -> untar -> mysql, you'll appreciate the speedup on this. In cases where you're setting up a slave and want to have the freshest possible data before kicking off binlog replication, this is the best.
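A hedged sketch of that chain (host and database names are made up); ssh's own compression is turned off so the stream isn't compressed twice:

```shell
# Dump, compress on all local cores, ship over ssh, then decompress
# and load on the replica as the bytes arrive. No temp files on disk.
mysqldump --single-transaction bigdb \
  | pigz \
  | ssh -o Compression=no replica-host 'unpigz | mysql bigdb'
```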
I update/upgrade/switch over to zstd (from older compressors) wherever I'm updating or revamping any of my data pipelines. Looks like a win^3 for me:
1) It's probably either in the top-X or #1 in any of the usual compression metrics size/speed/convenience/ease etc.
2) Can do --rsyncable and create rsync friendly files at tiny size cost.
3) On the rare occasion I need it, there's: $ zstd -c file1 > file.zst; zstd -c file2 >> file.zst. Then $ zstd -dc file.zst produces the same output as $ cat file{1,2}.
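That concatenation property is easy to verify; zstd treats a file of back-to-back frames as one stream (file names hypothetical):

```shell
printf 'hello ' > file1
printf 'world\n' > file2

# Compress each file to its own frame, appended into one .zst file:
zstd -q -c file1 >  file.zst
zstd -q -c file2 >> file.zst

# Decompressing the combined file yields the same bytes as cat file1 file2:
zstd -q -dc file.zst
```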
Shortly after submitting a PR the code went through major surgery, and my patch then needed a similar amount of surgery. Oracle then whacked most of the Solaris org, and I don’t think this ever got updated to work with the current pigz.
> exploits multiple processors and multiple cores to the hilt when compressing data
As a side note, this isn't always desirable for this class of coders. In some scenarios (like a web server) you might want to favor throughput over response time.
Similarly, for zipping files in JS, I added the ability to compress zip entries on several cores in zip.js [1]. The approach is simpler, as it consists of compressing the entries in parallel. It still offers a significant performance gain when compressing multiple files into a zip file, which is often the nominal case.
[1] https://github.com/gildas-lormeau/zip.js
While pigz is great as a general replacement for gzip, for most purposes nowadays either LZ4 or zstd are better choices for fast compression+decompression.
I always install aria2c and set my package manager and wget to use it for any system file downloads. Basically, it opens multiple connections per download based on the file size, which gives a pretty notable speedup on slow single-connection package repos or download URLs. For reference, it can cut 2-3 minutes off an Ubuntu dist-upgrade, and even more if you're on a fast-but-far connection.
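A sketch of such a setup, assuming aria2 is installed. These are real aria2.conf option names, but the values here are just examples, and the package-manager wiring (e.g. an apt-fast style wrapper) is left out:

```
# ~/.aria2/aria2.conf
max-connection-per-server=8   # open up to 8 connections per host
split=8                       # split each download into up to 8 pieces
min-split-size=1M             # only split files larger than 1 MiB
```

With that in place, a plain `aria2c URL` downloads with multiple connections where the server allows it.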
http://compression.great-site.net/pbzip2/ should solve the 'my cores are idle' issue.
https://linux.die.net/man/1/pbzip2
zstd & xz support the "-T" argument for setting thread count. If you pass "-T 0" it will attempt to detect and use a thread per physical core.
This is of course more a problem of the gz format than of pigz, although last time I looked, hacks to parallelize decompression were possible.
Hopefully my tweet response was the one to tip you off! ;P Though in all likelihood I'm quite sure a number of people commented pointing at pigz.
Hats off to all who write extraordinarily performant multithreaded versions of originally-slow-at-scale UNIX system tools.
I am only disappointed with this one: "It is not pronounced like the plural of pig."
My colleagues and I always pronounced it like the plural of pig, "die Schweine" (German for "the pigs") - and it was so much fun!