superjared | 3 years ago
ClickHouse supports Gorilla and some others[2] that might also be of use.
[1]: https://www.vldb.org/pvldb/vol8/p1816-teller.pdf
[2]: https://altinity.com/blog/2019/7/new-encodings-to-improve-cl...
gopalv | 3 years ago
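Gorilla is XOR compression, which works best for time series where the metric changes smoothly from one point to the next, because it just XORs each value against the previous one. When consecutive values are close, most bits of the XOR result are zero, and those runs of zeros are cheap to encode.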
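To make the XOR step concrete, here is a minimal sketch in Python. It only shows the "XOR against the previous value" part, not Gorilla's actual bit-packing of leading and trailing zeros, and the helper names (`float_to_bits`, `xor_deltas`, `undo_xor_deltas`) are made up for illustration.

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a float64 as its 64-bit unsigned integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_float(bits: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def xor_deltas(values):
    """XOR each value's bit pattern with the previous one's.

    The first value is XORed with 0, so it is stored unchanged.
    """
    out = []
    prev = 0
    for v in values:
        bits = float_to_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out

def undo_xor_deltas(deltas):
    """Rebuild the original floats from the XOR stream."""
    out = []
    prev = 0
    for d in deltas:
        prev ^= d
        out.append(bits_to_float(prev))
    return out

if __name__ == "__main__":
    series = [20.0, 20.0, 20.5, 21.0]
    for d in xor_deltas(series):
        # Repeated or nearby values leave long runs of zero bits.
        print(f"{d:064b}")
```

A repeated value XORs to exactly zero, and a small change only flips a few bits, which is what a real encoder then exploits.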
Floats should really not be thought of as byte streams; they are 3 bit fields in a single word. Sign, exponent, and mantissa split up into 3 streams compress way better than all of them together. At that point you are just dealing with "how to compress integers", which is a much simpler problem.
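A rough sketch of that split for IEEE 754 float64 (1 sign bit, 11 exponent bits, 52 mantissa bits), in Python. The function names `split_fields` and `join_fields` are hypothetical, and how each integer stream is then compressed is left open.

```python
import struct

MANTISSA_MASK = (1 << 52) - 1  # low 52 bits of a float64

def split_fields(values):
    """Split float64 values into three integer streams: sign, exponent, mantissa."""
    signs, exponents, mantissas = [], [], []
    for v in values:
        bits = struct.unpack(">Q", struct.pack(">d", v))[0]
        signs.append(bits >> 63)                # 1 bit
        exponents.append((bits >> 52) & 0x7FF)  # 11 bits, biased by 1023
        mantissas.append(bits & MANTISSA_MASK)  # 52 bits
    return signs, exponents, mantissas

def join_fields(signs, exponents, mantissas):
    """Reassemble the float64 values from the three streams."""
    out = []
    for s, e, m in zip(signs, exponents, mantissas):
        bits = (s << 63) | (e << 52) | m
        out.append(struct.unpack(">d", struct.pack(">Q", bits))[0])
    return out

if __name__ == "__main__":
    s, e, m = split_fields([1.0, 1.5, -2.0])
    print(s, e, m)
```

For a smooth series the sign and exponent streams barely change, so they can be handled with ordinary integer techniques such as run-length or delta coding.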
I played with zstd, and it compresses way better if you take 8 float64 values and shuffle the bits sideways. This is a trick that blosc popularized [1].
Adding a shuffle filter ahead of zlib or zstd worked way better for reducing the size of the data when dealing with float streams. This groups the bits in a similar fashion to splitting the floats into components, but is much simpler on the decode path with SIMD.
[1] - https://www.slideshare.net/PyData/blosc-py-data-2014/17
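For anyone who wants to try the shuffle idea, here is a minimal sketch in Python using only the standard library. It assumes a byte-level shuffle (byte 0 of every value, then byte 1, and so on) rather than a bit-level one, uses zlib because it ships with Python, and is not blosc's implementation; `shuffle_bytes` and `unshuffle_bytes` are names invented for this example.

```python
import struct
import zlib

def shuffle_bytes(data: bytes, width: int = 8) -> bytes:
    """Transpose the buffer: byte 0 of every element, then byte 1, and so on."""
    if len(data) % width:
        raise ValueError("buffer length must be a multiple of the element width")
    return b"".join(data[i::width] for i in range(width))

def unshuffle_bytes(data: bytes, width: int = 8) -> bytes:
    """Inverse of shuffle_bytes."""
    if len(data) % width:
        raise ValueError("buffer length must be a multiple of the element width")
    n = len(data) // width  # number of elements
    # Byte k of element j lives at position k * n + j in the shuffled buffer.
    return bytes(data[(i % width) * n + i // width] for i in range(len(data)))

if __name__ == "__main__":
    # A smooth, made-up series; real data will behave differently.
    values = [100.0 + 0.25 * i for i in range(4096)]
    raw = struct.pack(f"<{len(values)}d", *values)

    plain = zlib.compress(raw)
    shuffled = zlib.compress(shuffle_bytes(raw))
    print("raw:", len(raw), "zlib:", len(plain), "shuffle+zlib:", len(shuffled))

    # The filter is lossless: decompress, unshuffle, and compare.
    assert unshuffle_bytes(zlib.decompress(shuffled)) == raw
```

The shuffle puts the sign/exponent bytes of all values next to each other, so the compressor sees long runs of near-identical bytes instead of one such byte every eight.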