I assume you wanted to link to TurboBench and not that particular issue, which for some reason also contains a link for some car listing?
Secondly, did you not see my answer to ebiggers under this comment you replied to? Yes, for Silesia, libdeflate is faster, I can confirm, but there are at least two cases for which igzip is faster and one for which igzip is twice as fast. But yes, it heavily depends on the input data.
Edit: I was then wondering why I could not find any igzip benchmarks on the repository's ReadMe and then found https://github.com/powturbo/TurboBench/issues/43 , so I guess this is the one you wanted to link to and the 3 got cut off.
Well, proper benchmarking is not done with special-case data, but with datasets that represent a wide range of distributions. Examples of such datasets are enwik8/9 for text and Silesia for a mixed dataset.
For example, RLE-compressible data is a corner case and is not representative for benchmarking compression libraries.
If you provide a link to a dataset of 10-100 MB, I can verify your claims, because I'm not aware of a dataset where igzip is 2 times faster than libdeflate. In TurboBench there is no I/O or other overhead involved, and it's single-threaded.
It's also possible that you're comparing two different CLI programs, one (igzip) I/O-optimized and the other just a simple CLI.
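To make the "no I/O, single-threaded" point concrete, here is a minimal sketch of an in-memory decompression timing loop. It is not TurboBench code and it does not use igzip or libdeflate: Python's standard `zlib` module stands in for the decompressor, and the function name `time_decompress` and the sample input are hypothetical.

```python
import time
import zlib


def time_decompress(compressed: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` in-memory runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.decompress(compressed)  # input and output stay in memory, one thread
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder input only. This repetitive sample is exactly the kind of corner
# case that is NOT representative; a real run would load enwik8 or Silesia
# into memory before the timing loop starts.
data = b"some sample text for benchmarking " * 50_000
compressed = zlib.compress(data, 6)

seconds = time_decompress(compressed)
throughput_mb_s = len(data) / seconds / 1e6
print(f"{throughput_mb_s:.0f} MB/s")
```

Taking the best of several runs and keeping file reads outside the timed region is what separates a library measurement from a CLI measurement, where I/O strategy can dominate.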
EDIT: I've seen that the file you're referencing is 4Gi-Base64. This file is not very compressible (75% with gzip). It's possible that igzip is simply storing the file, or some parts of it, without compression. This would explain why it can be faster than libdeflate, because in that case igzip is just using memcpy during decompression.
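Both parts of that guess can be checked on a small scale. The sketch below assumes the file is Base64-encoded random data (the name suggests it, but that is an assumption) and uses Python's `zlib` rather than igzip. Each Base64 character carries 6 bits in an 8-bit byte, so a ratio near 6/8 is the best a Deflate compressor can do. The helper `first_block_type` reads the 2-bit BTYPE field of the first Deflate block (RFC 1951), where 0 means a stored, uncompressed block. It looks at the first block only, so it is a quick hint, not a full stream analysis.

```python
import base64
import os
import zlib


def raw_deflate(data: bytes, level: int) -> bytes:
    """Compress to a raw Deflate stream (no zlib/gzip header), so byte 0 is a block header."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def first_block_type(raw: bytes) -> int:
    """Return BTYPE of the first block: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman."""
    # Bit 0 of the first byte is BFINAL, bits 1-2 are BTYPE.
    return (raw[0] >> 1) & 0b11


# Assumption: 1 MiB of random bytes, Base64-encoded, as a small stand-in for 4Gi-Base64.
sample = base64.b64encode(os.urandom(1 << 20))

compressed = raw_deflate(sample, 6)
ratio = len(compressed) / len(sample)
stored = raw_deflate(sample, 0)  # level 0 makes zlib emit stored blocks only

print(f"compressed/original: {ratio:.3f}")
print("first block type at level 6:", first_block_type(compressed))
print("first block type at level 0:", first_block_type(stored))
```

Running the same BTYPE check on the Deflate payload that igzip actually produced for the file would confirm or rule out the stored-block explanation.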
mxmlnkn|2 years ago
powturbo|2 years ago
powturbo|2 years ago