top | item 39872775

(no title)

tex0 | 1 year ago

Thank you for reposting. I don't want to start bashing XZ, but I honestly wonder why it's been picked up so much despite the valid criticism.

As if the compression ratio increase at medium levels were so significant over bzip2 or gzip.


adeon|1 year ago

To piggyback off one of the commenters on the post from 11 hours ago: https://news.ycombinator.com/item?id=39868810#39869769

> I think none of these issues really matter.

I think that's probably it, i.e. it's a nobody-cares issue. Looking at the older posts, there are some serious issues with lzip too (looks like corruption-related) that got a less than stellar response from the author.

Just quickly tested lzip vs zstd -9 on a 100 megabyte text file. zstd is almost as good but many times faster. I wonder if the lzip author's work has become obsolete.
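For anyone who wants to repeat this kind of quick test, here is a rough sketch using only the Python standard library. It is not the test described above: zstd and lzip are not in the stdlib on most Python versions, so zlib and bz2 stand in as the fast codecs and the `lzma` module (which writes .xz by default) stands in for the LZMA family. The function name `compare_codecs` and the sample data are made up for illustration.

```python
import bz2
import lzma
import time
import zlib


def compare_codecs(data: bytes) -> dict:
    """Return {codec name: (compressed size in bytes, seconds taken)}."""
    # Each entry maps a label to a one-shot compression call.
    codecs = {
        "zlib -9": lambda d: zlib.compress(d, 9),
        "bz2 -9": lambda d: bz2.compress(d, 9),
        "lzma preset 6": lambda d: lzma.compress(d, preset=6),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        results[name] = (len(packed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Hypothetical sample; replace with the contents of a real file.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for name, (size, seconds) in compare_codecs(sample).items():
        print(f"{name:14s} {size:8d} bytes  {seconds:.3f} s")
```

Highly repetitive sample data like the above flatters every codec, so numbers only mean something on a real corpus.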

I skimmed the manual of lzip a bit. The author likes to talk a lot about how everything is done correctly in lzip, and there's this line: "The lzip format specification has been reviewed carefully and is believed to be free from design errors." If you type lzip --help it talks about how it's better than bzip2 or gzip.

Maybe they are right, but ugh, it comes off real arrogant.

Bulat_Ziganshin|1 year ago

1. both lzip and xz are using lzma compression library internally, so there is no difference in their compression ratio/speed

2. lzma compression is LZ + markov chains, while zstd is LZ + order-0 entropy coder (similar to zlib, rar and many other popular algorithms)

markov chains are higher-order entropy coding, i.e. one using context of previous data. it's slower, but sometimes gives better compression. but text files don't get any benefit from it. OTOH, various binary formats, like executables or databases, get significantly better compression ratio. in my tests lzma-compressed binary files are ~10% smaller on average.
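To make the order-0 vs higher-order distinction concrete, here is a toy sketch that estimates how many bits per byte an ideal coder would need with no context (order-0) and with the previous byte as context (order-1). It only illustrates the general idea of context; it is not how lzma or zstd actually model data, and the function names are made up for this example.

```python
import math
from collections import Counter, defaultdict


def order0_entropy(data: bytes) -> float:
    """Bits per byte if every byte is coded independently of its neighbours."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def order1_entropy(data: bytes) -> float:
    """Bits per byte if each byte is coded given the byte before it.

    Assumes len(data) >= 2; the first byte has no context and is skipped.
    """
    contexts = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        contexts[prev][cur] += 1
    bits = 0.0
    for counts in contexts.values():
        n = sum(counts.values())
        bits -= sum(c * math.log2(c / n) for c in counts.values())
    return bits / (len(data) - 1)


if __name__ == "__main__":
    sample = b"ab" * 400  # strictly alternating bytes
    print(order0_entropy(sample))  # 1 bit per byte: 'a' and 'b' equally likely
    print(order1_entropy(sample))  # 0 bits per byte: previous byte fixes the next
```

On data where the previous byte says a lot about the next one, the order-1 figure drops well below the order-0 figure; where it doesn't, the two stay close and the extra modelling cost buys nothing.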

so, many claims that zstd and lzma provide the same compression ratio are based on testing on text or other files that don't benefit from higher-order entropy coding. of course, I assume maximum-compression settings and equal dictionary size, in order to make a fair comparison of compression formats rather than particular implementations.

(I'm the author of freearc and several LZ-based compressors, so more or less an expert in this area :)

Maxious|1 year ago

Phil Katz was real arrogant but also real right

wongarsu|1 year ago

Because xz achieves the best compression ratios. If you care about archive size above all else it's the best choice. If you use "medium levels" I agree there's no point, zstd is superior in that regime (achieving faster compression and decompression for these compression ratios).

lifthrasiir|1 year ago

Any compressed format is okay as long as you have a working compressor and decompressor, where the compressor can reliably compress an input and the decompressor can reliably decompress the compressed input. Everything else is not as relevant, including the exact file format (because you have the known-good decompressor).
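The "reliably compress, reliably decompress" criterion is easy to check mechanically. A minimal sketch, assuming the Python stdlib codecs as stand-ins (the `CODECS` table and `round_trips` name are made up here), just compresses, decompresses, and compares against the original:

```python
import bz2
import lzma
import zlib

# Each codec is a (compress, decompress) pair from the standard library.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def round_trips(data: bytes) -> dict:
    """Return {codec name: True if decompress(compress(data)) == data}."""
    return {
        name: decompress(compress(data)) == data
        for name, (compress, decompress) in CODECS.items()
    }


if __name__ == "__main__":
    payload = bytes(range(256)) * 100
    print(round_trips(payload))
```

A round trip on intact data says nothing about how a format behaves when the compressed bytes get damaged, which is a separate question from the one raised here.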

JoshTriplett|1 year ago

xz is not worth using at medium levels. At its highest levels, it sometimes just barely has an edge over the highest levels of zstd, and it existed before zstd did.

These days, zstd is the obvious choice.