I heard of pigz in the discussions following my interview of Yann Collet, creator of LZ4 and zstd.
If you'll excuse the plug, here is the LZ4 story:
Yann was bored and working as a project manager. So he started working on a game for his old HP 48 graphing calculator.
Eventually, this hobby led him to revolutionize the field of data compression, releasing LZ4, ZStandard, and Finite State Entropy coders.
His code ended up everywhere: in games, databases, file systems, and the Linux kernel, because Yann built the world's fastest compression algorithms. And he got started just making a fun game for a graphing calculator he'd had since high school.
https://corecursive.com/data-compression-yann-collet/
Side note: In the 1990s, everyone in my engineering school had an HP48 calculator. There was a healthy selection of pretty decent games available.
One fine day, I finished my physics exam an hour early, so I opened up an enjoyable game on my calculator. 45 minutes went by, and then I went up and handed in my paper. It was at this point that the professor asked, “Were you planning on leaving the second page blank?”
Oh.
One wild thing is how much performance was still on the table compared to zlib. Pigz is parallel, but what if you just had a better way to compress and decompress than DEFLATE?
When zstd came out – and Brotli before it to a certain extent – they were 3x faster than zlib with a slightly higher compression ratio. You'd think that such performance jumps in something as well explored as data compression would be hard to come by. It turns out we weren't that close to the efficiency frontier.
Just listened to that episode, what a great story. The dry way he tells how he unexpectedly and almost accidentally transitioned from a project manager to a software engineer is really a treat. Thanks for your podcast!
Fastest open source compression algorithms, that is. RAD Game Tools has proprietary ones that are faster and have better compression ratios, but since you have to pay for a license, they will never be widespread.
This episode was fascinating. I had heard of LZ4 but not Zstd. It spurred me to make changes to our system at work that are reducing file sizes by as much as 25%. It’s great to have a podcast in which I learn practical stuff!
GORDON BELL, ADAM! That was a great episode. The most amazing thing to me was how this guy was just messing around, a compression hobbyist, if you will - and then he is being courted by FAANG companies. He just walked into it, almost by accident.
I work in VMware Fusion on a Mac, in a Mint guest OS, and zipping these huge instances for backup would take forever with a single core. Pigz punishes all 12 cores on my Mac mini and saves me a ton of time.
I loved this episode, it was very engaging to the very end, I wish there were more episodes! (I already listened to all of them so far)
Thank you for doing this podcast!
That was a great read! Very inspiring to hear about a near-middle-age person keeping the flame stoked on a little side hobby, and having it turn into something world-changing. So cool!
If you are interested in optimizing parallel decompression and you happen to have a suitable NVIDIA GPU, GDeflate [1] is interesting. The target market for this is PC games using DirectStorage to quickly load game assets. The graph in [1] shows DirectStorage maxing out the throughput of a PCIe Gen 3 drive at about 3 GiB/s when compression is not used. When GPU GDeflate is used, the effective rate hits 12 GiB/s.
If you have suitable hardware running Windows, you can try this out for yourself using Microsoft's DirectStorage GPU decompression benchmark [2].
A reference implementation of a single-threaded compressor and a multi-threaded (CPU) decompressor can be found at [3]. It is Apache-2 licensed.
1. https://developer.nvidia.com/blog/accelerating-load-times-fo...
2. https://github.com/microsoft/DirectStorage/tree/main/Samples...
3. https://github.com/microsoft/DirectStorage/blob/main/GDeflat...
Disclaimer: I work for NVIDIA, have nothing to do with this, and am not speaking for NVIDIA.
I assume this is for decompressing multiple independent deflate streams in parallel?
What's the throughput if you only have a single stream? I realise this is the unhappy-case for GPU acceleration, hence my question! (I've been thinking about some approaches to parallelize decompression of a single stream, it's not easy)
John Carmack just had a tweet today on this problem:
>I started a tar with bzip command on a big directory, and it has been running for two days. Of course, it is only using 1.07 cores out of the 128 available. The Unix pipeline tool philosophy often isn’t aligned with parallel performance.
https://twitter.com/ID_AA_Carmack/status/1656708636570271768...
I wasn't aware the Unix philosophy was to not use multithreading on large jobs that can be parallelized.
You can complain about philosophies, but this is just using the wrong tool for the job. Complain about bzip if you feel the bzip authors should have made a multithreaded implementation for you.
With all respect to John Carmack (and it is really a lot of respect!), I'm surprised he seems unaware of pbzip2. It's a parallel implementation that scales almost linearly with the number of cores, and it has been around since ~2010; it's not yet old enough to drive, but anyone dealing with bzip2'ing large amounts of data should have discovered it long ago.
And yes, use zstandard (or xz, where the default binary in your distro is already multithreaded) where you can.
But pigz shows that the unix pipeline philosophy works just fine. (of course compressing before tarring is probably better than compressing the tarred file, but that should be pipelinable as well)
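For instance, a minimal sketch of that pipeline (directory and file names are hypothetical). Pigz emits standard gzip output, so plain gzip can read it on the other end:

```shell
# Archive and compress in one pass: tar streams to stdout, and pigz
# spreads the DEFLATE work across cores (-p sets the thread count).
tar -cf - somedir | pigz -p "$(nproc)" > somedir.tar.gz

# The result is ordinary gzip, so either pigz -d or plain gzip -d unpacks it:
pigz -dc somedir.tar.gz | tar -xf -
```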
How big is that file... I have 2TB files compressed down to ~300GB and gunzip'ing them takes ~2-3 hours. Granted, that's still a long ass time, but not 2-3 days.
If anything, I wonder what kind of hard drive John has. If you're reading them off a network drive backed by tape drums it's probably going to take a while ;P
Of course, the problem there is that `tar` is outputting a single stream. You might, in similar situations, start multiple `tar` processes on subsets of the input, and the pipelines then become fully parallel again.
Unless the recipient of whatever you are compressing absolutely requires gzip, you should not use gzip or pigz.
Instead you should use zstd, as it compresses faster, decompresses faster, and yields smaller files. It also supports parallelism (via “-T”), which supplants the pigz use case. There are literally no trade-offs; it is better in every objective way.
In 2023, friends don’t let friends use gzip.
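As a sketch (file names hypothetical): -T0 tells zstd to use one thread per detected core.

```shell
# Multithreaded compression at a fairly high level; -T0 auto-detects cores.
zstd -T0 -19 big.log -o big.log.zst

# Decompress (zstd keeps the input file by default):
zstd -d big.log.zst -o big.log.out

# As a drop-in for a gzip'd tar pipeline:
tar -cf - somedir | zstd -T0 > somedir.tar.zst
```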
I’ve integrated pigz into different build and CI pipelines a few times. Don’t expect wonders since some steps still need to run serially, but a few seconds here and there might still add up to a few minutes on a large build.
Am I reading correctly that Docker just automatically uses pigz if it’s in the system path? I’ve used both for years and had no idea. I’m definitely going to make sure it’s installed in CI pipelines going forward, I know of some bloated image builds it will definitely help with!
Still blows my mind that people still use gzip. 20 years ago I expected that by this point there would be lots of effort put into increasing compression ratios and then making that fast; instead it's been a push for speed. It makes sense with how the internet has changed. These days gzip isn't even in the top 100 as far as compression goes; even something like RAR or 7-Zip is far behind the best.
Take something like enwik8 (100 megs): gzip will get that down to 36 megs, LZMA down to ~24-25. The top-of-the-line stuff will get it down to the ~15 meg range. That's a huge difference.
I remember moving a HUGE mysql table (>500GB) with a pipe chain of mysqldump > pigz > scp (compression disabled) > pigz > mysql
If you've ever screwed around with mysqldump -> tar -> scp -> untar -> mysql, you'll appreciate the speedup on this. In cases where you're setting up a slave and want to have the freshest possible data before kicking off binlog replication, this is the best.
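A hedged sketch of that chain (host and database names are made up); ssh's own compression is turned off so the stream isn't compressed twice:

```shell
# Dump, compress on all local cores, ship over ssh, then decompress
# and load on the replica as the bytes arrive. No temp files on disk.
mysqldump --single-transaction bigdb \
  | pigz \
  | ssh -o Compression=no replica-host 'unpigz | mysql bigdb'
```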
I update/upgrade/switch over to zstd (from older compressors) wherever I'm updating or revamping any of my data pipelines. Looks like a win^3 for me:
1) It's probably either in the top-X or #1 in any of the usual compression metrics size/speed/convenience/ease etc.
2) Can do --rsyncable and create rsync friendly files at tiny size cost.
3) On the rare occasion I need it, there's: $ zstd -c file1 > file.zst; zstd -c file2 >> file.zst. Then $ zstd -dc file.zst produces the same output as $ cat file{1,2}.
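That concatenation property is easy to verify; zstd treats a file of back-to-back frames as one stream (file names hypothetical):

```shell
printf 'hello ' > file1
printf 'world\n' > file2

# Compress each file to its own frame, appended into one .zst file:
zstd -q -c file1 >  file.zst
zstd -q -c file2 >> file.zst

# Decompressing the combined file yields the same bytes as cat file1 file2:
zstd -q -dc file.zst
```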
Shortly after submitting a PR the code went through major surgery, and my patch then needed a similar amount of surgery. Oracle then whacked most of the Solaris org, and I don’t think this ever got updated to work with the current pigz.
> exploits multiple processors and multiple cores to the hilt when compressing data
As a side note, this isn't always desirable for this class of coders. In some scenarios (like a web server) you might want to favor throughput over response time.
Similarly, for zipping files in JS, I added the ability to compress zip entries on several cores in zip.js [1]. The approach is simpler, as it consists of compressing the entries in parallel. It still offers a significant performance gain when compressing multiple files into a zip file, which is often the nominal case.
[1] https://github.com/gildas-lormeau/zip.js
While pigz is great as a general replacement for gzip, for most purposes nowadays either LZ4 or zstd are better choices for fast compression+decompression.
I always install aria2c and set my package manager and wget to use it for any system file downloads. Basically, it opens multiple connections per download based on the file size, which gives a pretty notable speedup on slow single-connection package repos or download URLs. For reference, it can cut 2-3 minutes off an Ubuntu dist-upgrade, and even more if you're on a fast-but-far connection.
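A sketch of such a setup, assuming aria2 is installed. These are real aria2.conf option names, but the values here are just examples, and the package-manager wiring (e.g. an apt-fast style wrapper) is left out:

```
# ~/.aria2/aria2.conf
max-connection-per-server=8   # open up to 8 connections per host
split=8                       # split each download into up to 8 pieces
min-split-size=1M             # only split files larger than 1 MiB
```

With that in place, a plain `aria2c URL` downloads with multiple connections where the server allows it.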
http://compression.great-site.net/pbzip2/ should solve the 'my cores are idle' issue.
https://linux.die.net/man/1/pbzip2
zstd & xz support the "-T" argument for setting thread count. If you pass "-T 0" it will attempt to detect and use a thread per physical core.
This is of course more a problem of the gz format than of pigz, although last time I looked, hacks to parallelize decompression were possible.
Hopefully my tweet response was the one to tip you off! ;P Though in all likelihood I'm quite sure a number of people commented pointing at pigz.
Hats off to all who write extraordinarily performant multithreaded versions of originally-slow-at-scale UNIX system tools.
I am only disappointed with this one: "It is not pronounced like the plural of pig."
My colleagues and I always pronounced it like the plural of pig, "die Schweine" (German for "the pigs") - and it was so much fun!