top | item 17660574


aengvs | 7 years ago

>In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input

I don't think this is true. If it were, lossless compression would be useless in a lot of applications. It's pretty easy to come up with a counterexample.

E.g.

(a simple Huffman code off the top of my head, not optimal)

symbol -> code

"00" -> "0"

"01" -> "10"

"10" -> "110"

"11" -> "111"

If "00" appears 99.999% of the time and the other three symbols together make up the remaining 0.001%, then the output will, on average, be slightly more than half the length of the input.
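The prefix code above is easy to check directly. Here's a minimal sketch (function and variable names are my own) that encodes and decodes a bit string dominated by "00" and shows the output landing at just over half the input length:

```python
# Prefix-free code from the comment above (not claimed optimal)
CODE = {"00": "0", "01": "10", "10": "110", "11": "111"}
DECODE = {v: k for k, v in CODE.items()}

def encode(bits):
    # Split the input into 2-bit symbols and concatenate their codewords.
    assert len(bits) % 2 == 0
    return "".join(CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(coded):
    # Because the code is prefix-free, the first buffered match is the symbol.
    out, buf = [], ""
    for b in coded:
        buf += b
        if buf in DECODE:
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

# Input heavily skewed toward "00", as in the example distribution.
data = "00" * 9999 + "01"
packed = encode(data)
assert decode(packed) == data
print(len(data), len(packed))  # 20000 vs 10001
```

The round trip confirms the code is lossless, and the length ratio shows the "slightly more than half" behavior.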


OscarCunningham | 7 years ago

Sure, I'm assuming that (a) you are trying to encode all strings of length at most n and (b) you have the uniform distribution over those strings. This makes sense in the original context of encoding random data.
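Under that uniform assumption the counting (pigeonhole) argument bites: an injective map from all 2^n strings of length n can't shorten them all, so the average output can't beat the input. You can see the same effect empirically with a real compressor on uniformly random bytes (a sketch using Python's zlib; the fixed seed is just for reproducibility):

```python
import random
import zlib

random.seed(0)
# 10 KiB of uniformly random bytes: no structure for the coder to exploit.
data = bytes(random.randrange(256) for _ in range(10_000))

compressed = zlib.compress(data, level=9)
# Framing overhead makes the output strictly longer than the input.
print(len(data), len(compressed))
assert len(compressed) > len(data)
assert zlib.decompress(compressed) == data  # still lossless, just not smaller
```

This doesn't contradict the Huffman example upthread; it only shows that the "at least as long on average" claim holds once the distribution really is uniform.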

aengvs | 7 years ago

>you have the uniform distribution over those strings. This makes sense in the original context of encoding random data.

Lossless compression is nothing more than taking advantage of prior knowledge of the distribution of the data you are compressing.

Random data isn't always (or even often) uniformly distributed. Everything we compress is "random" (in the context of information theory), so I disagree that it makes sense to assume uniformly distributed data.