top | item 27707834

Compressing JSON: Gzip vs. Zstd

29 points | mcraiha | 4 years ago | lemire.me | reply

11 comments

[+] IvanK_net|4 years ago|reply
Fun fact: 101arrowz (https://github.com/101arrowz) recently created a ZSTD decompressor in JavaScript (without any dependencies) that is only 7.1 kB (3.6 kB zipped) - here it is: https://unpkg.com/[email protected]/umd/index.js (and it is quite fast, too).
[+] pimterry|4 years ago|reply
Thanks! This looks really useful. I've been using https://www.npmjs.com/package/zstd-codec which simply compiles the official implementation to WASM and wraps it up for JS use. It does include the compressor too, but the end result is quite dramatically larger (3MB unzipped).
[+] PaulHoule|4 years ago|reply
The whole point of gzip was to make the patented "compress" algorithm obsolete, and you can't make an algorithm obsolete unless you beat it in both time and space.

Most of gzip's competitors beat it in time (LZ4) or in space (LZMA or bzip2), but zstd's aim was always to replace gzip not by extending the time-space tradeoff curve but by moving the whole curve.

[+] st_goliath|4 years ago|reply
Judging from this and other benchmarks, Zstd does seem to achieve that aim pretty well. Although the library API is IMO a bit weird in some aspects.

I actually ran a simple SquashFS benchmark myself a while back[1][2] (including Zstd vs. plain zlib, which SquashFS calls "gzip") and came to the same conclusion as the article: Zstd compressed somewhat faster, decompressed more than twice as fast, and achieved higher data density. It would be interesting to see how much the better-optimized zlib mentioned in the article could gain on Zstd, at least in terms of speed.

Interestingly, in my benchmark Zstd also clearly beat LZO, in addition to zlib, in both speed and size.

[1] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...

[2] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...

[+] chrismorgan|4 years ago|reply
Summary: on Twitter API JSON, zstd is 6% smaller than gzip, and decompresses twice as fast.
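
[Editor's note: the kind of ratio measurement the article and this summary describe is easy to reproduce. The sketch below uses only Python's standard-library gzip on a made-up repetitive JSON payload; zstd itself needs a third-party binding (e.g. the `zstandard` package), so only the gzip side is shown here.]

```python
import gzip
import json

# Build a repetitive JSON payload, loosely mimicking API responses:
# the field names repeat for every record, which compresses well.
records = [{"id": i, "user": f"user{i % 100}", "active": i % 2 == 0}
           for i in range(5000)]
raw = json.dumps(records).encode("utf-8")

compressed = gzip.compress(raw, compresslevel=6)
ratio = len(raw) / len(compressed)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio: {ratio:.1f}x")
```

On key-heavy JSON like this, gzip ratios well above 5x are typical, which is why the absolute percentage gap between gzip and zstd output sizes looks small even when the decompression-speed gap is large.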
[+] formerly_proven|4 years ago|reply
In a tool I wrote zstd is used for essentially all data exchange (inside an SSH connection; SSH itself does compression, but that doesn't support zstd [yet], and connection-level compression is not necessarily a good idea in the first place). It's pretty phenomenal. For stuff like backend JSON API calls I regularly see compression ratios of around 25-30x (for 20-50 MB of JSON) and for bulk data transfers it often exceeds line-rate by quite a bit.

gzip, in comparison, delivered respectable but only half-as-good ratios of around 15x. For the "bulk transfer" case gzip was fairly slow and couldn't come near gigabit rates - it still transferred less data, but the transfer took significantly longer than sending the data uncompressed, and longer still than with zstd.

gzip slows down a lot on incompressible data (not as bad as LZMA though), so it's not a good choice as an "always-on" compression. zstd on the other hand handles incompressible data very well. Unless the I/O is quite a bit faster than 100 MB/s, using zstd will very likely only improve things.
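
[Editor's note: the incompressible-data behavior is easy to see with stdlib zlib, the algorithm behind gzip. The payload sizes and contents below are made up for the demo; random bytes stand in for already-compressed data.]

```python
import os
import time
import zlib

def bench(label, data, level=6):
    # Time one zlib (deflate, the algorithm behind gzip) compression pass.
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"{label}: {len(data)} -> {len(out)} bytes "
          f"({elapsed * 1000:.1f} ms)")
    return out

compressible = b'{"status": "ok", "value": 42}\n' * 100_000
incompressible = os.urandom(len(compressible))  # stand-in for pre-compressed data

small = bench("json", compressible)
big = bench("random", incompressible)
```

The random input comes out slightly *larger* than it went in (deflate falls back to stored blocks plus framing overhead), and the compressor still burns CPU discovering that nothing matches - which is the cost the comment describes for "always-on" gzip.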

[+] borramakot|4 years ago|reply
If I recall correctly, the zstd binary has a mode that basically means "adapt the compression level so that compression isn't the bottleneck", but I didn't have a chance to try it. Have you used that?
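
[Editor's note: the mode in question is the zstd CLI's `--adapt` flag, which raises or lowers the compression level depending on whether the compressor or the I/O is the bottleneck. Below is a toy version of that feedback loop, sketched with stdlib zlib since zstd has no stdlib binding; the backlog thresholds and the fake queue depth are invented for illustration.]

```python
import zlib

def adaptive_level(current_level, backlog, low=2, high=8):
    # Toy feedback rule: if output is queuing up (I/O is the bottleneck),
    # spend the spare time on a higher level; if the backlog is empty
    # (the compressor is the bottleneck), drop to a cheaper level.
    if backlog > high:
        return max(1, current_level - 1)
    if backlog < low:
        return min(9, current_level + 1)
    return current_level

level = 5
chunks_out = []
data = b'{"event": "ping"}\n' * 20_000
for i in range(0, len(data), 4096):
    backlog = len(chunks_out) % 12  # fake queue depth, for the demo only
    level = adaptive_level(level, backlog)
    chunks_out.append(zlib.compress(data[i:i + 4096], level))
```

The real `--adapt` measures how full zstd's internal output buffers are rather than a chunk count, but the shape of the control loop is the same.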
[+] toastal|4 years ago|reply
With Brotli supported in all major browsers since 2017, it seems odd that it's missing from the comparison here.
[+] borramakot|4 years ago|reply
The parent article doesn't seem to be trying to be a fully fleshed-out survey of compressors, but I found a separate comparison of zstd and Brotli: https://peazip.github.io/fast-compression-benchmark-brotli-z...

tldr: zstd seems to be somewhat more flexible, generally provides slightly better performance across almost all metrics, and decompresses much more quickly than Brotli.