jhj|1 year ago
https://userweb.cs.txstate.edu/~burtscher/ https://computing.llnl.gov/projects/floating-point-compressi...
but it tends to be very application specific: it works best when there is high correlation / small deltas between neighboring values in a 2d/3d/4d/etc floating point array (e.g., you are compressing neighboring temperature grid points in a PDE weather simulation model; temperatures in neighboring cells won't differ by much).
In a lot of other cases (e.g., machine learning) the floating point significand bits (and sometimes the sign bit) tend to be incompressible noise. The exponent is the only thing that is really compressible, and the xor trick does not help as much because neighboring values can still vary in exponent. An entropy encoder works well there instead (it encodes closer to the actual underlying data distribution/entropy), and it doesn't depend on neighboring floats having similar exponents.
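A minimal CPU sketch of that split, using zlib as a stand-in for the ANS entropy coder (the field names and data here are illustrative, not dietgpu's API):

```python
import zlib
import numpy as np

def split_float32(a: np.ndarray):
    """Split a float32 array into its sign, exponent, and mantissa fields."""
    bits = a.view(np.uint32)
    sign = (bits >> 31).astype(np.uint8)               # 1 bit per value
    exponent = ((bits >> 23) & 0xFF).astype(np.uint8)  # 8 bits per value
    mantissa = bits & 0x7FFFFF                         # 23 bits per value
    return sign, exponent, mantissa

# ML-like data: mantissas are effectively noise, exponents cluster.
rng = np.random.default_rng(0)
a = rng.normal(size=100_000).astype(np.float32)
sign, exponent, mantissa = split_float32(a)

# The exponent stream is highly compressible; the mantissa stream is not
# (mantissas sit in 32-bit words here, so every 4th byte is zero, but the
# 23 payload bits themselves are near-random).
exp_ratio = len(zlib.compress(exponent.tobytes())) / exponent.nbytes
man_ratio = len(zlib.compress(mantissa.tobytes())) / mantissa.nbytes
print(f"exponents: {exp_ratio:.2f}, mantissas: {man_ratio:.2f}")
```

The same split-and-entropy-code idea works regardless of whether neighboring values are correlated, which is the point being made above.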
In 2022, I created dietgpu, a library to losslessly compress/decompress floating point data at up to 400 GB/s on an A100. It uses a general-purpose asymmetric numeral system encoder/decoder on GPU (the first such implementation of general ANS on GPU, predating nvCOMP) for exponent compression.
We have used this to losslessly compress floating point data between GPUs (e.g., over Infiniband/NVLink/ethernet/etc) in training massive ML models to speed up overall wall clock time of training across 100s/1000s of GPUs without changing anything about how the training works (it's lossless compression, it computes the same thing that it did before).
dekhn|1 year ago
One of the key innovations in the AMBER MD engine that made it work OK on cheaper systems was lossless floating point compression. It still impresses me that you can compress floats, send them over MPI, and decompress them, all with lower latency than the transport can send the uncompressed data.
jhj|1 year ago
Interconnects are in general improving at a slower rate than CPU/GPU compute, and that gap can be exploited.
mandarax8|1 year ago
dragontamer|1 year ago
I've noticed that array-of-structs form is harder to compress than struct-of-arrays form. This is relevant because a floating-point number is really a struct containing 3 values: 1-bit sign, 8-bit exponent, and 23-bit mantissa.
--------
If you have 1000 floats, I would expect the 1000 sign bits grouped together to be more compressible (whatever compression algorithm you use, it can better pick up the pattern in the signs).
Then the 8000 bits of exponents would also be more compressible, because the exponents would likely follow their own pattern.
Finally, the 23 bits of mantissa per float would probably be difficult or impossible to compress. But "extracting them out" before running a compression algorithm gives the signs + exponents more opportunity for compression.
--------
So yeah, just think about your data. And favor structure-of-arrays form for better compression.
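A cheap way to try this on any float array is a byte-level shuffle (the same idea as HDF5's "shuffle" filter: struct-of-arrays at byte granularity). A sketch with made-up, temperature-like data:

```python
import zlib
import numpy as np

def byte_shuffle(a: np.ndarray) -> bytes:
    """Group byte 0 of every element together, then byte 1, etc.
    This is struct-of-arrays applied at the byte level."""
    return a.view(np.uint8).reshape(-1, a.itemsize).T.copy().tobytes()

# Smooth signal plus small noise: signs and exponents barely change,
# so the high byte-planes are nearly constant after shuffling, while
# the low byte-planes absorb all the noise.
rng = np.random.default_rng(0)
base = 20.0 + 5.0 * np.sin(np.linspace(0.0, 6.28, 100_000))
x = (base + rng.normal(0.0, 1e-3, 100_000)).astype(np.float32)

raw_c = zlib.compress(x.tobytes())         # array-of-structs layout
shuf_c = zlib.compress(byte_shuffle(x))    # struct-of-arrays layout
print(len(raw_c), len(shuf_c))
```

The shuffled stream usually compresses noticeably better on data like this, for exactly the reason described above: each byte-plane gets to follow its own pattern.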
dahart|1 year ago
It’s worth noting that the algorithm in the article, and its cousin, delta encoding, both already take advantage of any cases where the sign & exponent bits don’t change with every sample, and/or where the mantissa ULPs are constant for a while at a time. It’d be interesting to compare the explicit SoA float compression idea to XOR or delta compression, but I think it’s also interesting in a meta kind of way that these two seemingly different ideas have similar outcomes.
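For intuition on the XOR side: when neighboring floats share their sign, exponent, and high mantissa bits, the XOR of their bit patterns has many leading zeros, and those are exactly the bits the encoder skips. A tiny sketch on a slowly varying made-up series:

```python
import numpy as np

# Slowly varying series: neighbors share sign, exponent, and most
# high mantissa bits.
t = np.cumsum(np.full(1000, 0.001, dtype=np.float32))

bits = t.view(np.uint32)
x = bits[1:] ^ bits[:-1]  # XOR of consecutive bit patterns

# Leading zeros per XOR word = high bits that cancelled out and
# never need to be stored explicitly.
lz = [32 - int(v).bit_length() for v in x]
print(sum(lz) / len(lz))  # large average: most high bits cancel
```

An explicit SoA split reaches a similar place by a different route: it puts those shared sign/exponent bits into their own dense, highly compressible streams.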
gcr|1 year ago
I was able to get all 220MB of the default SRB2kart maps down into 12.9MB, and could even get down to 5MB with heavy texture compression. LZMA alone could only do 37MB, so I was pretty pumped.
One of the harder parts is compressing the vertex data and adjacency information (which linedefs connect which vertices). I never figured out a satisfying way of doing that, but this makes me want to try again...
throwiforgtnlzy|1 year ago
Sounds like something Numba should include perhaps.
I retired from HPC around the era of iPython, Open MPI, Infiniband, OFED, GPFS/Panasas/Lustre, Rocks Cluster, and Condor/PBS/SGE.
tines|1 year ago