Rarebox | 4 months ago
Compared to Huff0[1] (used by Zstd), my AVX512 code is currently ~40% faster at both compression and decompression. This requires using 32 data streams instead of the 4 used by Huff0.
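To illustrate why more streams help, here is a minimal scalar sketch of multi-stream coding. It is not the AVX512 code described above: the toy prefix code `CODE`, the contiguous split, and the function names are all assumptions made for the example. The point is only that each stream keeps its own bit position, so one decode step per stream can proceed independently, which is what maps onto SIMD lanes.

```python
# Hypothetical toy prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
INVERSE = {bits: sym for sym, bits in CODE.items()}

def encode_streams(text, n_streams):
    """Split text into n_streams contiguous chunks and encode each one separately."""
    chunk = -(-len(text) // n_streams)  # ceiling division
    parts = [text[i * chunk:(i + 1) * chunk] for i in range(n_streams)]
    # Each stream is (bit string, number of symbols it holds).
    return [("".join(CODE[ch] for ch in part), len(part)) for part in parts]

def decode_streams(streams):
    """Decode all streams in lockstep: one symbol per stream per outer step."""
    pos = [0] * len(streams)            # independent bit cursor per stream
    out = [[] for _ in streams]
    longest = max(count for _, count in streams)
    for _ in range(longest):            # one "vector step"
        for lane, (bits, count) in enumerate(streams):
            if len(out[lane]) == count:
                continue                # this lane has already finished
            cur = ""
            while cur not in INVERSE:   # read bits until a codeword completes
                cur += bits[pos[lane]]
                pos[lane] += 1
            out[lane].append(INVERSE[cur])
    return "".join("".join(lane) for lane in out)
```

For example, `decode_streams(encode_streams(text, 32))` round-trips `text` for any string over the four symbols. The inner loop over lanes has no dependency between iterations, whereas within one stream each symbol's start position depends on the previous symbol's length.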
camel-cdr | 4 months ago
For decode, do you use AVX512 to speed up decoding by caching the decode of small codewords?
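To make "caching the decode" concrete, here is a minimal scalar sketch of a table-driven decoder. The toy code `CODE` and the helper names are assumptions for illustration, and the sketch assumes every codeword fits in the table width; a real decoder would need a fallback path for longer codewords.

```python
# Hypothetical toy prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
WIDTH = max(len(bits) for bits in CODE.values())  # table index width in bits

def build_table(code, width):
    """Map every width-bit window to (symbol, length) of the codeword it starts with."""
    table = [None] * (1 << width)
    for sym, bits in code.items():
        pad = width - len(bits)
        base = int(bits, 2) << pad
        for low in range(1 << pad):     # all possible trailing bits after the codeword
            table[base + low] = (sym, len(bits))
    return table

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bitstring, n_symbols, table, width):
    """Decode n_symbols with one table lookup per symbol."""
    padded = bitstring + "0" * width    # zero padding so the last window is full
    pos, out = 0, []
    for _ in range(n_symbols):
        sym, length = table[int(padded[pos:pos + width], 2)]
        out.append(sym)
        pos += length                   # advance by the matched codeword's length
    return "".join(out)
```

With `table = build_table(CODE, WIDTH)`, calling `decode(encode("abacabad", CODE), 8, table, WIDTH)` returns the original string. In a SIMD setting the lookup would presumably become a gather or in-register permute, but that part is not shown here.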
Do you decode serially, or do you use the self-synchronizing nature of Huffman codes to decode the stream from multiple offsets in parallel? I haven't seen the latter done in SIMD before.
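As a small illustration of the self-synchronizing property, the sketch below starts a greedy decoder at an arbitrary bit offset and reports where its codeword boundaries first agree with the true ones. The toy code `CODE` and the function names are assumptions for the example. Not every prefix code resynchronizes (a fixed-length code never does), so `resync_point` can return `None`.

```python
# Hypothetical toy prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

def boundaries(bits, start, code):
    """Greedy-decode from bit `start`; return the bit positions where codewords end."""
    inverse = {v: k for k, v in code.items()}
    ends, cur = [], ""
    for i in range(start, len(bits)):
        cur += bits[i]
        if cur in inverse:
            ends.append(i + 1)
            cur = ""
    return ends

def resync_point(bits, start, code):
    """First codeword end seen from `start` that is also a true codeword end, or None."""
    true_ends = set(boundaries(bits, 0, code))
    for end in boundaries(bits, start, code):
        if end in true_ends:
            return end
    return None
```

For the bit string `"1111110"` (the symbols `d`, `d`, `a`), a decoder started at bit 1 first reads a bogus `111` ending at bit 4, then `110` ending at bit 7, which is a true boundary. From that point on it decodes exactly what a decoder started at bit 0 would, so only the symbols before the resync point need to be fixed up.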
Are there any new SIMD instructions you'd like to see in future ISA extensions?
OpenPower has proposed a scalar instruction to speed up prefix-code decoding: https://libre-soc.org/openpower/prefix_codes/