dhdc | 3 years ago

The theorem you are looking for is Shannon's Source Coding Theorem[1]. It states that no encoding scheme can losslessly compress data below the data's Shannon entropy, on average.
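A quick sketch of what that bound looks like in practice (the function name and example string are just illustrative): the empirical entropy of a string, in bits per symbol, is a lower bound on the average length any lossless code can achieve over that symbol distribution.

```python
from collections import Counter
from math import log2

def shannon_entropy(data: str) -> float:
    """Empirical Shannon entropy of a string, in bits per symbol."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

msg = "aaaabbc"  # 4 a's, 2 b's, 1 c
h = shannon_entropy(msg)
# The source coding theorem: no lossless code can average fewer
# than h bits per symbol over this symbol distribution.
print(f"{h:.3f} bits/symbol, so at least {h * len(msg):.1f} bits total")
```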

[1]: https://en.m.wikipedia.org/wiki/Shannon's_source_coding_theo...

motohagiography | 3 years ago

Amazing, thank you, it had to be something like that. It makes much more sense as an encoding limitation than as something as abstract as the "shortest possible program" via Kolmogorov complexity. I was wondering whether KC might just be an inconsistently defined thought experiment compared to the concrete ideas of information theory, and, if it's not, how much it necessarily has in common with the coding theorem, since it seems like they're talking about the same thing.

chriswarbo | 3 years ago

What you're after is "minimum description length": choose some representation, e.g. gzip, and try to find the shortest encoding of our input in that representation.

The simplest representation is just raw data. There's only one way to represent our input in this format: just write it out verbatim. Hence its minimum description length is identical to the input size.

A more powerful representation is run-length encoding, where runs of identical symbols can be stored as the symbol and the number of repetitions; everything else is stored raw. Note that there are multiple ways to represent an input in this format: e.g. "50 zeros" is the same as "25 zeros 25 zeros", "00000 45 zeros", etc. yet the latter aren't minimum descriptions.
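A toy run-length encoder makes the "multiple representations" point concrete (function names are illustrative): the greedy encoder always emits maximal runs, which is the minimum description in this format, but longer descriptions like two runs of 25 zeros decode to the same string.

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Greedy run-length encoding: each maximal run -> (symbol, count)."""
    return [(sym, len(list(run))) for sym, run in groupby(s)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    return "".join(sym * n for sym, n in pairs)

# The minimum description of fifty zeros is a single run...
assert rle_encode("0" * 50) == [("0", 50)]
# ...but non-minimal descriptions of the same string also decode correctly:
assert rle_decode([("0", 25), ("0", 25)]) == "0" * 50
```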

A more powerful representation is to use back-references: this lets us compress runs of multiple repeated symbols (run-length encoding only handles runs of 1 repeated symbol), and re-use commonly occurring 'phrases'. This is essentially how LZ works. Again, there are multiple ways to represent an input in this format; finding a minimum description is incredibly expensive, so most compressors don't bother, and instead have a configurable 'effort' level (e.g. 1 to 9).
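A minimal, greedy LZ77-style sketch of back-references (a toy, not how any real LZ implementation is tuned): each step emits (offset, length, next_char), where offset 0 means a literal. Note it is greedy, so it finds *a* description, not necessarily the minimum one.

```python
def lz_encode(s: str, window: int = 255) -> list[tuple[int, int, str]]:
    """Greedy LZ77-style encoding into (offset, length, next_char) triples.
    offset == 0 means a plain literal with no back-reference."""
    out, i = [], 0
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Extend the match; stop while a next_char still remains.
            while i + k < len(s) - 1 and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        out.append((best_off, best_len, s[i + best_len]))
        i += best_len + 1
    return out

def lz_decode(triples: list[tuple[int, int, str]]) -> str:
    out: list[str] = []
    for off, length, nxt in triples:
        if off:
            start = len(out) - off
            for k in range(length):      # copy one symbol at a time, so
                out.append(out[start + k])  # overlapping matches work
        out.append(nxt)
    return "".join(out)
```

Matches are allowed to overlap the position being encoded (e.g. "aaaa" encodes as a literal 'a' followed by one back-reference), which is what lets a single triple cover a long run.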

Adding more power to a representation makes it able to compress more complex patterns, but also makes it harder to find a minimum description. The most powerful representation is a Universal Turing Machine, which can represent any computable pattern; its minimum description length is the Kolmogorov complexity, and finding it is uncomputable in general.

Note that both Shannon information and algorithmic information (i.e. Kolmogorov complexity) can be gamed if we focus on a single message; for example, I can define an encoding like this:

- To encode a snapshot of Wikipedia as of 2022-06-30T14:00:00Z, output a single `1`

- To encode any other data, run it through `zip` and prepend a `0`

- To decode data which begins with a `1`, output a snapshot of Wikipedia as of 2022-06-30T14:00:00Z

- To decode data which begins with a `0`, send everything after that `0` through `unzip`
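The gamed codec above is easy to write down (using zlib in place of `zip`, and a stand-in byte string for the actual Wikipedia snapshot, both assumptions for illustration):

```python
import zlib

# Stand-in for the agreed-upon snapshot; any fixed blob both sides share works.
SNAPSHOT = b"<pretend this is the full 2022-06-30T14:00:00Z Wikipedia dump>"

def encode(data: bytes) -> bytes:
    if data == SNAPSHOT:
        return b"\x01"                    # the whole snapshot in one byte
    return b"\x00" + zlib.compress(data)  # everyone else pays one extra byte

def decode(msg: bytes) -> bytes:
    if msg[:1] == b"\x01":
        return SNAPSHOT
    return zlib.decompress(msg[1:])
```

The scheme is perfectly lossless, and it "compresses" one chosen message to a single byte, at the cost of one extra byte on every other message; averaged over the distribution of messages, nothing has been gained.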

Hence we need to focus on the distribution of possible messages, and the big-O behaviour after many messages have been sent, rather than individual messages themselves.

Shannon information is essentially a frequentist approach, based on counting and expectations. Algorithmic information is essentially Bayesian, where the choice of Universal Turing Machine defines a prior, the bits of a message (AKA a program) are evidence, and decoding that message (AKA running the program) updates that prior.

dhdc | 3 years ago

In a sense, Kolmogorov complexity and Shannon entropy can be considered equivalent concepts: both define the minimum amount of information required to fully specify a given piece of data.