Hi HN, I'm the author of this piece (Ethan Weinberger). I wrote this originally as a set of notes for myself when brushing up on concepts in information theory the past couple of weeks. I found the presentations I was reading of the material to be a little dry for my taste, so I tried to incorporate more visuals and really emphasize the intuition behind the concepts. Glad to see others are finding it useful/interesting! :)
spinningslate|5 years ago
An observation/suggestion. The intro is accessible to many people; that drops off a steep cliff when you hit the maths. Now, I'm not complaining about that: it's instructive and necessary to formalise things. Where I struggle is in reading the equations in my head when I don't know what words to use for the symbols. For example, that very first `X ~ p(x)`. I didn't know what to say for the tilde character, so couldn't verbalise the statement. I do know that $\in$ (the rounded 'E') means 'is a member of' so I could read the next statement. The problem gets even more confusing for a non-mathematician as the same symbol is used with different meaning in different branches of maths/science (e.g. $\Pi$).
I get that writing out every equation in English isn't feasible (or, at least, is asking a lot of the writer). But I wonder if there's a middle way, e.g. through hyperlinking?
As I say: not a criticism and I don't have a good solution. Just an observation from a non-mathematician. Enjoyed the piece anyway.
jessriedel|5 years ago
Are you sure it's a matter of knowing what to say (in your head) vs knowing the definition of the notation in the first place? I am pretty familiar with this notation, but I rarely verbalize it mentally. I can tell because I read and understand it quickly without problem, but on the rare occasion when I have to read it aloud I realize I'm not sure how I should pronounce it.
windsignaling|5 years ago
Something like X ~ p(x) appears all over the place in probability, stats, ML, and related courses such as info theory, detection and estimation, etc. By the time someone is interested in info theory, this notation will likely be permanently etched into their mind. So for this article it is very "audience appropriate".
> not a criticism and I don't have a good solution
Having a mental map of how different subjects fit together (without actually having to study them in depth) is a good start.
I've seen so many people crash and burn with machine learning because they were unaware that it depends on linear algebra, calculus, and probability.
With a mental map there is less "surprise" and it's more a matter of simply understanding that they didn't have the right dependencies.
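For what it's worth, `X ~ p(x)` is usually read aloud as "X is distributed according to p" (or "X is drawn from p"). A minimal sketch of what that means operationally, using a made-up four-outcome distribution for illustration:

```python
import random
from collections import Counter

# "X ~ p(x)" reads: the random variable X is distributed according to p.
# A toy distribution over four outcomes (values chosen for illustration):
outcomes = ["a", "b", "c", "d"]
p = [1/2, 1/4, 1/8, 1/8]

random.seed(0)
draws = random.choices(outcomes, weights=p, k=100_000)

# Empirical frequencies approximate p for a large number of draws.
freq = {x: n / len(draws) for x, n in Counter(draws).items()}
```

Here `freq["a"]` comes out near 0.5, `freq["b"]` near 0.25, and so on.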
canjobear|5 years ago
In particular, the condition isn't only that it's the shortest code; it's the shortest self-delimiting code. In your example with probabilities {1/2, 1/4, 1/8, 1/8}, someone could come in and say let's code it as {0, 1, 01, 00}, which would appear to encode the latter two outcomes in 2 bits rather than 3. The problem, of course, is that {0, 1, 01, 00} is not a self-delimiting code: after you receive the bit 0, you don't know if you're done or if you should wait for another bit to form 01. But the code {0, 10, 110, 111} is self-delimiting, because after you get a 0 or a 10, etc., you know you're done.
I've found that when I teach this material, if I don't mention the self-delimiting condition, then a clever student in the class always thinks of the {0,1,01,00}-type code. (This can be a good way to identify clever students in an intro information theory class!)
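A quick sketch of why the self-delimiting (prefix-free) property matters: greedy left-to-right decoding works without any separators, and for this particular distribution the codeword lengths equal -log2 p(x), so the expected length hits the entropy exactly. (The symbol names A-D are just placeholders for the four outcomes.)

```python
import math

# A prefix-free (self-delimiting) code for probabilities {1/2, 1/4, 1/8, 1/8}:
# no codeword is a prefix of any other, so decoding is unambiguous.
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}

def decode(bits, codebook):
    """Greedy left-to-right decoding; correct because the code is prefix-free."""
    inv = {cw: sym for sym, cw in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:          # a complete codeword ends here -- emit it
            out.append(inv[buf])
            buf = ""
    return "".join(out)

# The concatenated stream 0|10|110|111 decodes with no delimiters needed.
message = decode("010110111", code)

# Codeword lengths are -log2 p(x), so expected length equals the entropy.
avg_len = sum(p[s] * len(cw) for s, cw in code.items())
entropy = -sum(q * math.log2(q) for q in p.values())
```

With the ambiguous {0, 1, 01, 00} code, the same greedy decoder would misread the stream: after seeing a 0 it can't tell whether to emit a symbol or keep reading.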