What a great historical summary. Compression has moved on now but having grown up marveling at PKZip and maximizing usable space on very early computers, as well as compression in modems (v42bis ftw!), this field has always seemed magical.
These days it generally is better to prefer Zstandard to zlib/gzip for many reasons. And if you need a seekable format, consider squashfs as a reasonable choice. These stand on the shoulders of the giants of zlib and zip but do indeed stand much higher in the modern world.
I had forgotten about modem compression. Back in the BBS days when you had to upload files to get new files, you usually had a ratio (20 bytes downloaded for every byte you uploaded). I would always use the PKZIP no-compression option for the archive to upload because Z-Modem would take care of compression over the wire. So I didn't burn my daily time limit by uploading a large file, and I got more credit toward my download ratio.
> These days it generally is better to prefer Zstandard to zlib/gzip for many reasons.
I'd agree for new applications, but just like MP3, .gz files (and by extension .tar.gz/.tgz) and zlib streams will probably be around for a long time for compatibility reasons.
I think zlib/gzip still has its place these days. It's still a decent choice for most use cases. If you don't know what usage patterns your program will see, zlib still might be a good choice. Plus, it's supported virtually everywhere, which makes it interesting for long-term storage. Often, using one of the modern alternatives is not worth the hassle.
Imo all file formats should be concatenable when possible. Thankfully ZStandard purposefully also supports this, which is a huge boon for combining files.
Fun fact: tar files are also (semi-)concatenable, you'll just need to pass `-i` when extracting. This also means compressed (using gz/zstd) tar files are (semi-)concatenable!
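To make the concatenation point concrete, here is a minimal sketch. It assumes GNU tar (where `-i` is the short form of `--ignore-zeros`) plus the usual gzip tools; the file and directory names are made up for illustration.

```shell
work=$(mktemp -d)

# Two independent gzip members, concatenated byte-for-byte.
printf 'hello\n' | gzip > "$work/a.gz"
printf 'world\n' | gzip > "$work/b.gz"
cat "$work/a.gz" "$work/b.gz" > "$work/both.gz"
gz_out=$(gunzip -c "$work/both.gz")   # gunzip decodes each member in turn

# Two independent tar archives, concatenated the same way.
mkdir "$work/one" "$work/two"
echo 1 > "$work/one/a.txt"
echo 2 > "$work/two/b.txt"
tar -C "$work" -cf "$work/one.tar" one
tar -C "$work" -cf "$work/two.tar" two
cat "$work/one.tar" "$work/two.tar" > "$work/both.tar"

# Without --ignore-zeros, tar stops at the end-of-archive marker of one.tar.
# With it, the listing continues into the entries of two.tar.
tar_out=$(tar --ignore-zeros -tf "$work/both.tar")

printf '%s\n' "$gz_out" "$tar_out"
```

The "semi" part is visible here: gzip needs no flag at all, while tar has to be told to read past the zero blocks that mark the end of the first archive.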
WARC files (used by the Internet Archive to power the Wayback Machine, among others) use this trick too, to get a compressed file format that is seekable to individual HTTP request/response records.
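A rough sketch of that idea, not of the WARC format itself: each record is compressed as its own gzip member, and a side index of byte offsets and lengths is enough to jump to one record. The record contents and variable names below are hypothetical, and `head -c` is assumed to be available (it is common but not strictly POSIX).

```shell
work=$(mktemp -d)

# Each "record" is compressed on its own, so it forms a complete gzip member.
printf 'record one\n' | gzip > "$work/r1.gz"
printf 'record two\n' | gzip > "$work/r2.gz"
cat "$work/r1.gz" "$work/r2.gz" > "$work/records.gz"

# A side index only needs the byte offset and length of each member.
# The arithmetic expansion strips any padding that wc may print.
off2=$(( $(wc -c < "$work/r1.gz") ))
len2=$(( $(wc -c < "$work/r2.gz") ))

# Seek straight to the second record: skip off2 bytes, read len2 bytes,
# and decompress only that slice.
second=$(tail -c +"$((off2 + 1))" "$work/records.gz" | head -c "$len2" | gunzip -c)
echo "$second"
```

The whole file still decompresses normally with `gunzip -c`, so readers that know nothing about the index keep working.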
Is there a limit in the default gunzip implementation? I'm aware of the concept of ZIP/tar bombs, but I wouldn't have expected gunzip to ever produce more than one output file, at least when invoked without options.
Interesting -- I did not realize that the zip format supports lzma, bzip2, and zstd. What software supports those compression methods? Can Windows Explorer read zip files produced with those compression methods?
(I have been using 7zip for about 15 years to produce archive files that have an index and can quickly extract a single file and can use multiple cores for compression, but I would love to have an alternative, if one exists).
What’s even sadder is that the SO community has since destroyed SO as the home for this type of info. This post would now be considered off topic as it’s “not a good format for a Q&A site”. You’d never see it happen today. Truly sad.
That's disallowed on Wikipedia. There, you must reference some "source". That "source" doesn't need to be reliable or correct, it just needs to be some random website that's not the actual person. First sources are disallowed.
Is there an archive format that supports appending diffs of an existing file, so that multiple versions of the same file are stored? PKZIP has a proprietary extension (supposedly), but I couldn't find any open version of that.
(I was thinking of creating a version control system whose .git directory equivalent is basically an archive file that can easily be emailed, etc.)
ctur|2 years ago
michaelrpeskin|2 years ago
I was a silly kid.
lxgr|2 years ago
pvorb|2 years ago
emmelaich|2 years ago
koolba|2 years ago
It can be very useful: https://github.com/google/crfs#introducing-stargz
ericpauley|2 years ago
billyhoffman|2 years ago
lxgr|2 years ago
cout|2 years ago
ForkMeOnTinder|2 years ago
pixl97|2 years ago
melagonster|2 years ago
dcow|2 years ago
miyuru|2 years ago
https://stackexchange.com/users/1136690/mark-adler#top-answe...
stavros|2 years ago
dustypotato|2 years ago
> This post is packed with so much history and information that I feel like some citations need be added
> I am the reference
(extracted a part of the conversation)
tyingq|2 years ago
https://en.wikipedia.org/wiki/Mark_Adler
gmgmgmgmgm|2 years ago
whalesalad|2 years ago
FartyMcFarter|2 years ago
HexDecOctBin|2 years ago
pizza|2 years ago
kissgyorgy|2 years ago
- https://github.com/onekey-sec/unblob/blob/main/unblob/handle...
- https://github.com/onekey-sec/unblob/blob/main/unblob/handle...
- https://github.com/onekey-sec/unblob/blob/main/unblob/handle...
wiredfool|2 years ago
o11c|2 years ago
exposition|2 years ago
Disclaimer: I'm the author.
Dwedit|2 years ago
raggi|2 years ago
Salty form: They're all quite slow compared to modern competitors.
levzettelin|2 years ago
readyplayernull|2 years ago
FOO=$(tar cf - folderToCompress | gzip | base64)
echo "$FOO" | base64 -d | zcat | tar xf -
encom|2 years ago
eYrKEC2|2 years ago
[deleted]