"If someone tells you a fact you already know, they’ve essentially told you nothing at all."
I think it's interesting to consider how this is and is not true in a real-life context.
If someone tells you a fact you already know, then it could be communicating several different things.
1. It might mean *they think* you don't know it.
2. It might mean they want to make sure *you know* they know you know it.
3. It might mean they believe you know it and want to communicate they agree.
4. It might mean they are hoping for you to (or not to) contradict/argue/refute it.
If, however, you can't tell which of those (or some other alternative) is intended, then no information was communicated at all.
Except... there is a message communicated: that they have some interest or concern regarding the topic, which is different from remaining silent.
Whenever someone says anything, whether true, false, nonsense, or a lie, they are communicating a true and possibly significant fact - that they chose to say that thing, in whatever circumstance.
I am thinking of "Gödel, Escher, Bach", if you can't tell.
> "If someone tells you a fact you already know, they’ve essentially told you nothing at all."
The reality of zero information transfer is more subtle than this quote makes it sound, and covers the point you raise. In reality, you must not just know the thing you are being told, but you also have to know in advance that it is the thing you are going to be told. So if I tell you "the sun is shining" when you already know it is, there is still information being transferred if I could also have said "yesterday it was raining". Only if "the sun is shining" is absolutely the only thing I could have said is it a zero information statement.
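That dependence on what else could have been said is Shannon's self-information ("surprisal"). A minimal sketch, with probabilities invented for illustration:

```python
import math

# Self-information ("surprisal") of a message with probability p, in bits.
# The fewer (and likelier) the alternatives, the less a message tells you.
def surprisal_bits(p):
    return math.log2(1 / p)

# "The sun is shining" as the only possible message (p = 1) carries nothing:
print(surprisal_bits(1.0))  # 0.0
# As one of two equally likely reports, it carries exactly one bit:
print(surprisal_bits(0.5))  # 1.0
```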
> If someone tells you a fact you already know, then it could be communicating several different things.
You are alluding to the "meaning" of communication, while information and communication theory is strictly about bits that are stored somewhere and/or sent over a channel. If you know with 100% certainty that the next bit will be zero, then receiving that bit gives you no information beyond what you already knew. Of course, if your certainty is less than 100%, then the bit delivers some fractional quantity of information. Mathematically, a communication channel couples a random process X at the transmitter with a random process Y at the receiver. The game is to describe X while observing only Y. Mutual information is a quantitative measure of how much Y "knows" about X. There is no human meaning attached to it whatsoever.
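To make the X/Y coupling concrete, here is a small sketch (my own illustration, not from the comment above) of the mutual information of a binary symmetric channel: uniform input X, each bit flipped with probability e to produce Y.

```python
import math

def h2(p):
    # Binary entropy in bits.
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(e):
    # For uniform input, I(X; Y) = H(Y) - H(Y|X) = 1 - h2(e).
    return 1.0 - h2(e)

print(bsc_mutual_information(0.0))  # 1.0: noiseless, Y knows X completely
print(bsc_mutual_information(0.5))  # 0.0: pure noise, Y knows nothing about X
print(bsc_mutual_information(0.1))  # a fractional number of bits per channel use
```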
> That you can zip a large movie file, for example, owes to the fact that pixel colors have a statistical pattern, the way English words do. Engineers can build probabilistic models for patterns of pixel colors from one frame to the next.
This is, in fact, generally incorrect. Movie files are generally compressed via _lossy_ compression, which escapes the Shannon entropy limit. Entropy-based encoding, as described in this article, is generally only the final stage in a video "codec" (an algorithm to compress and decompress video). The first stage relies on finding information in the visual field whose loss humans are unlikely to notice, and discarding it. The second stage does interframe (between-frames) compression by searching for blocks of pixels that match. Entropy coding then runs as the final stage over the result.
This, by the way, is why zipping a video file can make it... larger.
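This effect is easy to reproduce. The sketch below uses random bytes as a stand-in for an already-compressed video stream (an assumption for illustration; real containers add their own structure):

```python
import os
import zlib

# Stand-in for an already-compressed stream: high-entropy random bytes.
stream = os.urandom(100_000)
zipped = zlib.compress(stream, 9)

# DEFLATE finds no redundancy, falls back to stored blocks, and the
# block/checksum overhead makes the output slightly larger than the input.
print(len(stream), len(zipped))
```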
This is a pretty unkind reading. The author almost surely was using zip as a synonym for compress. Some people use "hoover" as a proxy for vacuum, even though they're not literally using a Hoover vacuum to clean.
Movies aren't "escaping" the Shannon entropy limit, they're falling well within the Shannon entropy limit for the appropriate domain of human vision.
Whether it's throwing out high-frequency noise at one stage or using motion blocks and keeping delta changes at another, there's an underlying assumption about what the "codeword" space is versus the source space. Even by the time the signal reaches the codec, it's already been "compressed" by throwing out almost all the spectra not visible to us. These are implicit assumptions about the domain that you've dismissed.
Saying that lossy compression "escapes" the Shannon entropy limit doesn't seem like an accurate way of putting it. Lossy compression throws information away, so there is less information to compress. The entropy limit is still in place.
I agree with the overall point, though, that movie files aren't "zipped" up, but are compressed using lossy compression (with few exceptions).
> Engineers can build probabilistic models for patterns of pixel colors from one frame to the next. The models make it possible to calculate the Shannon entropy [...] That value tells you the limit of “lossless” compression — the absolute most the movie can be compressed before you start to lose information about its contents.
While this is generally true in practice, strictly speaking the shortest possible encoding of a sequence of bits could be smaller than this would suggest, since it's possible a pattern exists that has escaped notice.
For example: the first 10 billion digits of pi pass statistical tests for randomness, but can be generated by a short program which calculates them until the desired length is reached, in effect "compressing" them to the length of the generator program.
Because of considerations like this, Algorithmic Information Theory[1] equates the information content of a bit sequence with the length of the shortest program that will generate it - an intriguingly different paradigm, though one with the drawback of being generally uncomputable.
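One way to see the gap between statistical and algorithmic compressibility is the sketch below (my own illustration), which uses a seeded PRNG as a stand-in for the pi-digit generator: a generic compressor sees patternless bytes, yet the whole sequence is recoverable from a few lines of code plus a seed.

```python
import random
import zlib

# A few lines plus a seed: the "short program" of algorithmic
# information theory, standing in for a pi-digit generator.
def generate(n, seed=2718):
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

data = generate(100_000)

# A generic entropy coder sees statistically patternless bytes and gives up...
print(len(zlib.compress(data, 9)) / len(data))  # ratio ~1.0, no compression

# ...yet the entire sequence is reproducible from the tiny program above.
assert generate(100_000) == data
```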
I think your comments on encoding a sequence of bits/pi are a little misleading. For example, in the case of the first 10 billion digits of pi, you do not even need a program to calculate it. One can just use a lookup table which maps the digits 0 through 9 to the frequencies with which they appear in 10 billion digits of pi, which is <40 bytes of data. You are confusing the entropy of the digits of pi, for which the order does not matter, with the entropy of the sequence of pi, for which the order does matter. In the latter, you would want to compare the probability of observing the exact sequence of pi against that of other sequences of numbers. This is an entirely different question. Because the sequence of digits of pi is deterministic, its entropy is 0, i.e., if one had a machine from which you could sample "10-billion digit sequences of pi" you would get the same sequence every time.
>strictly speaking the shortest possible encoding of a sequence of bits could be smaller than this would suggest, since it's possible a pattern exists that has escaped notice.
If it is a deterministic pattern, then the entropy of the sequence is actually 0. Otherwise, the existence of patterns is something that the entropy computation already takes into account - that is, in fact, the entire point. In practice, finite sample sizes introduce noise and mean that we can only ever estimate the true entropy, and Jensen's inequality implies that all such estimates are higher than the true entropy of the system, so we observe even deterministic sequences through noisy filters. This is perhaps the clearer version of your point.
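A small sketch of that last point (my own illustration, using a unigram plug-in estimator as the pattern-blind "filter"): a fully deterministic sequence has true entropy 0, but a frequency-based estimate that cannot see the ordering reports the maximum.

```python
import math
from collections import Counter

# Plug-in (frequency-based) entropy estimate, blind to any ordering pattern.
def plugin_entropy(seq):
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

# A completely deterministic sequence: 0,1,...,9,0,1,... Its true entropy
# as a process is 0, but the symbol-frequency estimate reports the
# maximum, log2(10) ≈ 3.32 bits, because it cannot see the pattern.
seq = [i % 10 for i in range(10_000)]
print(plugin_entropy(seq))
```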
You're not considering the trade-off between space and time. A short program may be able to compute the digits of pi to the accuracy you want, but you're discounting the runtime and the resources it subsequently uses to compute those digits.
Readers could be left with the feeling that somehow you can get information for free, which is not the case. I can't start a program that prints all strings and claim I've achieved ideal compression just because the desired output is among the strings printed.
It's not just incomputability but the resources involved.
I'm only barely familiar with Kolmogorov complexity and, as you point out, it looks like this is what Algorithmic Information Theory is designed for, but even in those scenarios there's clearly a push and pull between runtime and memory. As a heuristic, I tend to think of a sort of conservation law at work, where sacrificing space will increase runtime and vice versa.
I’ll ask it here since I’ve not seen it mentioned anywhere: the Kolmogorov work is interesting, but doesn’t it exclude the information necessary to produce a team of educated humans, a universal computer, a programming language, and a compiler/interpreter, at least? In a brief reading AIT seems to assume a universal Turing machine should exist by natural law.
Could a smaller program be written that generates the code that generates the digits of pi? Would the code that writes the code then be the smallest information content? I wonder how many times this could recurse before there is a limit...
For anyone interested in these ideas, I'd highly recommend reading Shannon's original 1948 paper (linked in the article). While technical and precise it is actually quite digestible for someone with a bit of a quantitative background.
Information theory is fascinating to anyone who likes the mathy underpinnings of seemingly non-mathy things, and it's everywhere. Cool stuff.
Speaking of information and decompression, my brain initially interpreted the title as a breaking news headline about a female supervillain who's also a theory nerd.
It was only the added context of the domain name that fixed the error.
I'm more interested in the problem of sending strings out into the world where more strings may "stick" to them, possibly producing more strings. You know, like life. Or software. Given a particular string, can you determine its age? Can you determine which discrete strings came into contact with it along its journey, and when? And so on. I'm not sure if this is interesting to anyone else, but it sure is interesting to me! Consider how these are the boundary conditions that justify software design decisions so primordial we don't even consider them decisions anymore.
One example I like to use (when talking about entropy):
A four-digit numeric PIN (that we know) has 0 bits of entropy. There is no uncertainty about what the PIN actually is. A randomly selected one (that we do not know) has just over 13 bits:
import math
print(math.log(10) / math.log(2) * 4)  # 13.28771237954945
The more entropy, the more uncertain we are.
However, humans are not random. We use the year we were born, some keyboard sequence, or some other predictable number as our PIN. We don't know exactly how much entropy these PINs have (there is some degree of uncertainty), but we do know it is significantly less than 13 bits.
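As a rough sketch of how a skew toward predictable PINs lowers entropy (the percentages below are made up purely for illustration):

```python
import math

# Shannon entropy of a PIN distribution, in bits.
def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform over all 10,000 four-digit PINs: the full ~13.29 bits.
uniform = [1 / 10_000] * 10_000
print(entropy_bits(uniform))

# A toy skew with made-up numbers: suppose 10% of people pick "1234",
# 5% pick "0000", and the rest spread evenly over the remaining PINs.
skewed = [0.10, 0.05] + [0.85 / 9_998] * 9_998
print(entropy_bits(skewed))  # noticeably below 13.29 bits
```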
I wrote a blog post [1] with an interactive widget where you can provide an encoding for a random decimal digit and see how close you can get to the theoretical log₂(10) ≈ 3.32 bits.
If using my birthday reduces the entropy of my PIN, what does it do to its entropy if I happen to have the same birthday as one of the most famous people in the world? Does it matter if I am aware or not? Does it matter what they use for their PIN?
For the sake of argument, I'm thinking month and day, not year.
Information theory is so cool. I feel like having a solid grasp of the fundamentals is critical for achieving mastery in software development. Closely related would be digital signal processing and frequency domain techniques, both also extremely useful in many branches of software.
The actual information content of a thing has always surprised me, especially when factoring in slightly imperfect (but passable) representations such as JPEG or MP3. To lose some information seems entirely acceptable in many cases.
1: see https://en.wikipedia.org/wiki/Algorithmic_information_theory
Claude Shannon was a titan. He's also known for publishing what some call the most important masters thesis of all time, https://en.wikipedia.org/wiki/A_Symbolic_Analysis_of_Relay_a....
direct link to PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=677...
[1]: https://blog.kardas.org/post/entropy/ (Average Code Length section)
http://www-formal.stanford.edu/jmc/slides/dartmouth/dartmout...