item 34609093

WASM compression benchmarks and the cost of missing compression APIs

91 points | comagoosie | 3 years ago | nickb.dev

21 comments


miohtama|3 years ago

Related to compressing data before storing on SSD:

Blosc - faster than memcpy()

https://github.com/Blosc/c-blosc

Under the right circumstances Blosc is so fast that it even speeds up reading data from RAM (read less, decompress in the L1 and L2 caches).

guiriduro|3 years ago

With RAM sizes not keeping up with Moore's Law, it would make sense to have superfast compression as a pervasive, transparent and hardware-accelerated feature of a modern OS. SIMD provides hw acceleration to some extent, and is clearly well used by the compression algos. It's time that compression became 'just part of the furniture': we're now at the stage where compression is fast enough that it can couple with reduced latency (smaller compressed blocks to load into caches reduce total memory latency) to deliver multiplicative speedups.

dorfsmay|3 years ago

Note that both the x and the y axis for Chrome vs Firefox are significantly different.

bufferoverflow|3 years ago

Also note that it's not clear if a larger bubble is better compression or worse.

modeless|3 years ago

The web platform could really use zstd everywhere. As a content encoding for HTTP, and as an API available to JS/wasm. It's really clumsy to use a wasm version of zstd in a JS application because it's hard to get data in and out of wasm efficiently.
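
To make the "clumsy" part concrete, here is a minimal sketch of the copy-in/copy-out dance a wasm codec forces on you. The module below is a hand-assembled one that only exports a 64 KiB linear memory; the compressor export at the end is hypothetical (a real zstd build would also export an allocator and entry points, with its own names):

```javascript
// Minimal wasm module: just `(module (memory (export "mem") 1))`.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x05, 0x03, 0x01, 0x00, 0x01,                          // memory section: 1 page
  0x07, 0x07, 0x01, 0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00, // export "mem"
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes));
const mem = new Uint8Array(instance.exports.mem.buffer);

// Copy the input into wasm linear memory. A real module would hand us an
// offset from its own allocator instead of the hard-coded 0 used here.
const input = new TextEncoder().encode('payload');
mem.set(input, 0);

// ...then call the (hypothetical) exported compressor and copy the result
// back out -- every call pays these two memcpy-style copies:
// const outLen = instance.exports.compress(0, input.length, outPtr);
// const output = mem.slice(outPtr, outPtr + outLen);
```

A native content encoding or JS API would avoid both copies, which is the efficiency point being made here.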

capableweb|3 years ago

I'd be happy if just CompressionStream could be available as an API in all browsers already, so I could at least use gzip... Once available, I'm guessing zstd could be easier to add in later, or lz4.
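
For reference, the CompressionStream/DecompressionStream API being asked for looks like this where it is available (a sketch of a gzip round trip; it also runs on recent Node versions, which expose the same globals):

```javascript
// Round-trip a string through the native CompressionStream and
// DecompressionStream APIs using the 'gzip' format.
async function gzipRoundTrip(text) {
  const input = new TextEncoder().encode(text);

  // Compress: pipe the input bytes through a gzip CompressionStream.
  const compressed = await new Response(
    new Blob([input]).stream().pipeThrough(new CompressionStream('gzip'))
  ).arrayBuffer();

  // Decompress the result back to the original bytes.
  const decompressed = await new Response(
    new Blob([compressed]).stream().pipeThrough(new DecompressionStream('gzip'))
  ).arrayBuffer();

  return new TextDecoder().decode(decompressed);
}
```

The supported formats are currently limited to 'gzip', 'deflate' and 'deflate-raw', which is exactly why comments here want zstd or lz4 added later.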

creatonez|3 years ago

Seconded. I wanted a way to stream packets of some really repetitive data to JavaScript in the browser, and zstd-compressed JSON with a preset dictionary really would have been the best way. Ran into a lot of problems getting zstd ports working nicely in the browser, so I ended up just using a handcrafted binary format.

royjacobs|3 years ago

Just to add a data point, I've written a tiny Rust library [0] that compiles to WASM and is quite effective for smaller payloads. It is based on PAQ, so memory usage explodes a bit once you start compressing large files, but on smaller files it is super competitive.

As far as I can tell the blog author didn't include the 91MB file they used to test the compressors with, so I couldn't give it a try to see how it holds up. I guess 91MB would be too big anyway.

[0] https://github.com/datatrash/mashi

zokier|3 years ago

just for scale it would have been fun to see zstd/lz4 native (=non-wasm) performance

Const-me|3 years ago

I wonder how this compares to the OS built-in NTFS compression?

Windows shell has a "Compress contents to save disk space" checkbox in folder properties. Usually, that compressed flag is inherited by new files created in a folder with that checkbox. OP could probably set the flag on the Default\IndexedDB or Default\Service Worker folder and see whether this changes the results of that IO benchmark.

binarycrusader|3 years ago

The built-in NTFS filesystem compression is fairly limited in that it optimizes for performance over compression ratio; the more capable compression scheme that's built in is WOF:

https://devblogs.microsoft.com/oldnewthing/20190618-00/
https://learn.microsoft.com/en-us/windows/win32/api/wofapi/n...
https://learn.microsoft.com/en-us/windows-hardware/manufactu...

It's not transparent like the filesystem compression, but it offers far more potentially beneficial compression algorithms such as LZX.

HyperSane|3 years ago

I have run SSD benchmarks with and without it and it usually makes reading faster and writing slower.

Jap2-0|3 years ago

A (hopefully) intriguing tangent: at the linked https://bench.nickb.dev/, a 10x increase in iterations on the allocation benchmark results in a change from 28ms to 280ms in Firefox but 44ms to 2800ms in Edge.

mkesper|3 years ago

Note to author: 120ms is not 66% less than 200ms but 40% ((200 − 120) / 200 = 0.4).