top | item 23875220


ethanweinberger | 5 years ago

Hi HN, I'm the author of this piece (Ethan Weinberger). I wrote this originally as a set of notes for myself when brushing up on concepts in information theory the past couple of weeks. I found the presentations I was reading of the material to be a little dry for my taste, so I tried to incorporate more visuals and really emphasize the intuition behind the concepts. Glad to see others are finding it useful/interesting! :)


spinningslate|5 years ago

Thanks, I enjoyed reading. As an electronic engineering student, I remember grappling with information theory in the abstract: it was a weather example very similar to yours that gave me the intuition I was missing.

An observation/suggestion. The intro is accessible to many people; that drops off a steep cliff when you hit the maths. Now, I'm not complaining about that: it's instructive and necessary to formalise things. Where I struggle is in reading the equations in my head when I don't know what words to use for the symbols. For example, that very first `X ~ p(x)`. I didn't know what to say for the tilde character, so couldn't verbalise the statement. I do know that $\in$ (the rounded 'E') means 'is a member of' so I could read the next statement. The problem gets even more confusing for a non-mathematician as the same symbol is used with different meaning in different branches of maths/science (e.g. $\Pi$).

I get that writing out every equation in English isn't feasible (or, at least, is asking a lot of the writer). But I wonder if there's a middle way, e.g. through hyperlinking?

As I say: not a criticism and I don't have a good solution. Just an observation from a non-mathematician. Enjoyed the piece anyway.

jessriedel|5 years ago

"X ~ p(x)" means "X is a random variable drawn from the probability distribution p(x)" or maybe "X is drawn from p(x)" for short.

Are you sure it's a matter of knowing what to say (in your head) vs knowing the definition of the notation in the first place? I am pretty familiar with this notation, but I rarely verbalize it mentally. I can tell because I read and understand it quickly without problem, but on the rare occasion when I have to read it aloud I realize I'm not sure how I should pronounce it.

windsignaling|5 years ago

I don't think a piece on information theory should necessarily be "accessible to many people". It's a topic which is normally taught in grad school.

Something like X ~ p(x) would be seen all over the place in probability, stats, ML, and related courses such as info theory, detection and estimation, etc. Likely by the time someone is interested in info theory this notation would be permanently etched into their minds. So for this article it is very "audience appropriate".

> not a criticism and I don't have a good solution

Having a mental map of how different subjects fit together (without actually having to study them in depth) is a good start.

I've seen so many people crash and burn with machine learning because they were unaware that it depends on linear algebra, calculus, and probability.

With a mental map there is less "surprise" and it's more a matter of simply understanding that they didn't have the right dependencies.

canjobear|5 years ago

I read X ~ p(x) as "X is distributed as p of x"

canjobear|5 years ago

You might want to give more conditions for the claim that the self-information gives the length of the shortest possible code.

In particular, the condition isn't just that it's the shortest code; it's the shortest self-delimiting code. In your example with probabilities {1/2, 1/4, 1/8, 1/8}, someone could come in and say let's code it as {0,1,01,00}, which would appear to encode the latter two outcomes in 2 bits rather than 3. The problem, of course, is that {0,1,01,00} is not a self-delimiting code: after you receive the bit 0, you don't know if you're done or if you should wait for another bit to form 01. But the code {0,10,110,111} is self-delimiting, because after you get a 0 or a 10, etc., you know you're done.
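The failure mode above can be sketched in a few lines (a toy greedy decoder, not anything from the article): with the non-self-delimiting code, the decoded stream doesn't match what was encoded, while the self-delimiting code round-trips, and its codeword lengths equal the self-information -log2 p(x).

```python
import math

# Two candidate codes for outcomes with probabilities {1/2, 1/4, 1/8, 1/8}.
ambiguous = {"a": "0", "b": "1", "c": "01", "d": "00"}
prefix_free = {"a": "0", "b": "10", "c": "110", "d": "111"}
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Greedy left-to-right decode: emit a symbol as soon as the buffer matches a codeword."""
    inv = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inv:
            out.append(inv[buf])
            buf = ""
    return out

msg = ["c", "d"]
# Non-self-delimiting: "0100" greedily decodes to ['a', 'b', 'a', 'a'] -- wrong.
print(decode(encode(msg, ambiguous), ambiguous))
# Self-delimiting: "110111" decodes back to ['c', 'd'].
print(decode(encode(msg, prefix_free), prefix_free))

# Codeword lengths in the self-delimiting code equal -log2 p(x).
for s in p:
    assert len(prefix_free[s]) == -math.log2(p[s])
```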

I've found that when I teach this material, if I don't mention the self-delimiting condition, then a clever student in the class always thinks of the {0,1,01,00}-type code. (This can be a good way to identify clever students in an intro information theory class!)

onurcel|5 years ago

Thank you for the great article. I believe there is a typo in "we assign a value of 0 to p(x) log p(x) when x=0", it should be "when p(x) = 0".
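The convention being corrected here (p(x) log p(x) taken as 0 when p(x) = 0) can be sketched in code; this is a generic entropy helper, not code from the article:

```python
import math

def entropy(ps):
    """Shannon entropy in bits, using the convention p * log2(p) = 0 when p = 0."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
print(entropy([0.5, 0.5, 0.0]))            # 1.0 -- the zero-probability term contributes nothing
```

Skipping the p = 0 terms avoids evaluating log2(0), which would otherwise raise a math domain error.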