Lossless decompression and the generation of random samples

44 points | joelg236 | 12 years ago | blog.sigfpe.com

6 comments

[+] DanBC | 12 years ago | reply
> Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.
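The claim in the quote is easy to check empirically: entropy-coded output has no exploitable byte-level patterns left. A minimal sketch (my own illustration, not from the post; it uses zlib as a convenient entropy-coding compressor and a made-up skewed four-letter source):

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data):
    """Empirical entropy of a byte string, in bits per byte (max 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

random.seed(0)
# A skewed, fixed, known distribution over four symbols -- the setting
# the quote describes for Huffman coding. (Alphabet and weights are
# arbitrary choices for the demo.)
data = bytes(random.choices(b"abcd", weights=[0.6, 0.2, 0.15, 0.05], k=200_000))

compressed = zlib.compress(data, 9)
print(byte_entropy(data))        # well below 8 bits/byte
print(byte_entropy(compressed))  # close to 8 bits/byte: looks uniform
```

The source sits around 1.5 bits/byte, while the compressed stream measures close to the 8-bit maximum, i.e. it looks like uniformly distributed bits, just as the quote predicts.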

This can be genuinely confusing when you're mixing different compression systems (bits vs. bytes):

(https://groups.google.com/d/topic/lz4c/DcN5SgFywwk/discussio...)

Someone was compressing very large log files. When they compressed the compressed output again, they got a further reduction in size.

> The fundamental reason is that these highly repetitive byte sequences, with very small and regular differences, produce repetitive compressed sequences, which can therefore be compressed further. - Yann Collet
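Collet's point applies to byte-oriented compressors like LZ4, which find repeated sequences but skip the entropy-coding stage, so their output can still contain byte-level patterns. A toy sketch of the effect (my own illustration, not LZ4's actual format; the log lines are made up, chosen to have small, regular differences):

```python
import zlib

def toy_lz(data, window=65535, min_match=4, max_match=255):
    """A deliberately simple LZ77-style encoder with NO entropy-coding
    stage -- a loose stand-in for byte-oriented formats like LZ4.
    Tokens: 0x00 + literal byte, or 0x01 + 2-byte offset + 1-byte length."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        j = -1
        if i + min_match <= n:
            # most recent earlier occurrence of the next min_match bytes
            j = data.rfind(data[i:i + min_match], max(0, i - window), i)
        if j >= 0:
            length = min_match
            while (length < max_match and j + length < i
                   and i + length < n and data[j + length] == data[i + length]):
                length += 1
            off = i - j
            out += bytes((1, off >> 8, off & 0xFF, length))
            i += length
        else:
            out += bytes((0, data[i]))
            i += 1
    return bytes(out)

def toy_unlz(blob):
    """Inverse of toy_lz: replay literals and back-reference copies."""
    out = bytearray()
    k = 0
    while k < len(blob):
        if blob[k] == 0:
            out.append(blob[k + 1])
            k += 2
        else:
            off = (blob[k + 1] << 8) | blob[k + 2]
            length = blob[k + 3]
            start = len(out) - off
            out += out[start:start + length]
            k += 4
    return bytes(out)

# Hypothetical log data: highly repetitive lines with small, regular differences.
logs = b"".join(b"2013-08-10 12:%02d:%02d INFO worker-7 heartbeat ok\n" % (m, s)
                for m in range(10) for s in range(60))

once = toy_lz(logs)            # first pass: matches, but no entropy coding
twice = zlib.compress(once)    # second pass compresses the repetitive tokens
assert toy_unlz(once) == logs  # round-trips losslessly
print(len(logs), len(once), len(twice))
```

Because the first pass emits near-identical token sequences line after line, its output is itself compressible, and the second pass shrinks it substantially, which is exactly the behavior reported in the linked thread.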

[+] Chris2048 | 12 years ago | reply
I simply don't understand this; I don't think I have the right background. Any good starting places for learning more about this topic?
[+] susi22 | 12 years ago | reply
I'd suggest diving into discrete probability for a few hours; that should be enough to understand this post. Huffman coding is a little more advanced, but you can understand the algorithm (if not the theory) with just that probability background.