
Think Bayes: Bayesian Statistics Made Simple (2012)

404 points | mycat | 8 years ago | greenteapress.com

56 comments

[+] _0w8t|8 years ago|reply
For me, the best book so far on Bayesian probability has been "Probability Theory: The Logic of Science: Principles and Elementary Applications" by E. T. Jaynes.

The book starts by deriving Bayes' theorem from the first principles of logic and shows its applications to a wide range of topics. There is a thorough discussion of various "paradoxes", and the author sharply criticizes frequentist statistics. In addition, there are a lot of historical references.

[+] vowelless|8 years ago|reply
Great book. Unfortunately Jaynes passed away before he could finish it. Still, a great take on probability theory.
[+] xelxebar|8 years ago|reply
I came here to recommend this book too! It's a text that definitely allows one to go as deep as they wish: very fleshed-out references, a good appendix, and lots of comments on directions that can be explored more deeply.

It's a book I'm happy to have in dead tree form on my shelf.

[+] boostedsignal|8 years ago|reply
For those unclear on the concrete (rather than philosophical) difference between Bayesian and frequentist statistics in the first place, I hope it's not inappropriate for me to share this 5-minute example that I wrote a while back: https://news.ycombinator.com/item?id=11096129
[+] wodenokoto|8 years ago|reply
You write that the frequentist doesn't answer the question, but it does. It answers

    P(H') = (H / (H + T))^H'
You also write that the frequentist solution fails to give an error estimate, yet you don't show that the Bayesian solution does give one.

If the goal of the article is to show that the Bayesian approach is more correct than the frequentist one, then it leaves the reader unconvinced. If the goal is to show three ways of finding a probability, you should either say each is fine under its own paradigm, or argue why only one paradigm is correct.
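To make the comparison concrete, here is a minimal sketch (my own illustration with made-up numbers, not taken from the linked post) of both answers for a coin that has shown H heads and T tails, asking for the probability of H' more heads in a row. The Bayesian version assumes a uniform Beta(1, 1) prior:

```python
from fractions import Fraction

def freq_prob(H, T, Hp):
    """Frequentist plug-in: point estimate p = H/(H+T), then p**Hp."""
    p = Fraction(H, H + T)
    return p ** Hp

def bayes_prob(H, T, Hp):
    """Posterior predictive under a uniform Beta(1, 1) prior:
    P(Hp more heads | data) = prod_{i=0}^{Hp-1} (H + 1 + i) / (H + T + 2 + i)."""
    prob = Fraction(1)
    for i in range(Hp):
        prob *= Fraction(H + 1 + i, H + T + 2 + i)
    return prob

# 7 heads and 3 tails observed; probability of 2 more heads
print(freq_prob(7, 3, 2))   # 49/100
print(bayes_prob(7, 3, 2))  # 6/13
```

Unlike the point estimate, the full Beta posterior over p also yields a credible interval, which is the kind of error estimate being debated above.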

[+] WhitneyLand|8 years ago|reply
I hope people agree it’s totally appropriate, and appreciated, thank you for reposting it.

This is most of the reason I come here, because people show the good will to share bits of knowledge and experience.

Another benefit is that when people are willing to do this, their contribution might be critiqued or corrected, which can then sharpen or polish your knowledge and thinking, even in areas where you might be very qualified.

For some people this would be a nightmare, if they can easily feel angry or hurt when their intellect is challenged, especially when they are an “expert” on the subject.

But I suspect most people here feel the opposite. You found a flaw in my results or reasoning? Fucking awesome, you have just made me stronger.

edit: I don’t know many other online forums where this dynamic exists, so if anyone does please don’t keep it a secret.

[+] mycat|8 years ago|reply
For me, this book's first chapters explained ML, MAP, and Bayesian estimation nicely, using real computer vision problems. The author included helpful visual aids (Gaussian plots, contour plots, filter outputs, etc.): http://www.computervisionmodels.com

This is a rather unusual book in that it gives a primer on probabilistic methods that is actually applicable to non-computer-vision problems. It is Bayesian-heavy and rarely touches neural networks; the book was released in 2012, the year the deep learning boom started.

[+] yalph|8 years ago|reply
This is great, thanks for posting.
[+] baxtr|8 years ago|reply
I regularly forget how Bayes works. Every time that happens, I go back to this page: https://www.bayestheorem.net/

I love the way it’s explained there.

[+] jules|8 years ago|reply
You can also think about Bayes' theorem as follows. Suppose we have a logical robot trying to learn about the world. The robot has a collection of hypotheses in its brain. Every time it observes a new fact, it deletes all hypotheses that are incompatible with that fact.

For example, suppose it is thinking about the hair colour and eye colour of Joe. It starts with these hypotheses about Joe's (eye colour, hair colour):

    (eye colour, hair colour)
    =========================
    (blue, blond)
    (blue, black)
    (brown, blond)
    (brown, black)
Suppose that it learns that blue-eyed people have blond hair. It deletes the hypothesis (blue, black), which is incompatible with that fact, and keeps only the compatible hypotheses:

    (blue, blond)
    (brown, blond)
    (brown, black)
Suppose it now learns that Joe has blue eyes. It keeps only the hypothesis compatible with it:

    (blue, blond)
So it has now learned the hair colour.

In reality it is not true that all blue-eyed people have blond hair. We change the robot's brain and give a weight to each hypothesis indicating how likely it is. Equivalently, we could insert multiple copies of each hypothesis, and the likelihood of a hypothesis is equal to the number of copies of the hypothesis.

    (blue, blond):  10
    (blue, black):  2
    (brown, blond): 9
    (brown, black): 8
Blue-eyed people are more likely to be blond. Those are our hypotheses about the attributes of Joe. Suppose we now learn that Joe has blue eyes. The robot keeps only the hypotheses compatible with it:

    (blue, blond):  10
    (blue, black):  2
So P(blond hair) = 10/12 and P(black hair) = 2/12. This is all Bayes' theorem is: you have a set of weighted hypotheses, and you delete hypotheses incompatible with the observed evidence. The extra factor in Bayes' theorem is only there to re-normalise the weights so that they sum to 1.
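The steps above map almost directly onto code. A minimal sketch (the names are my own) of the weighted-hypothesis view: delete what is incompatible with the evidence, then re-normalise:

```python
from fractions import Fraction

# Weighted hypotheses about Joe's (eye colour, hair colour)
hypotheses = {
    ("blue",  "blond"): 10,
    ("blue",  "black"): 2,
    ("brown", "blond"): 9,
    ("brown", "black"): 8,
}

def condition(hyps, is_compatible):
    """Keep only hypotheses compatible with the evidence, then renormalise
    the weights so they sum to 1 (the 'extra factor' in Bayes' theorem)."""
    kept = {h: w for h, w in hyps.items() if is_compatible(h)}
    total = sum(kept.values())
    return {h: Fraction(w, total) for h, w in kept.items()}

# Evidence: Joe has blue eyes
posterior = condition(hypotheses, lambda h: h[0] == "blue")
print(posterior[("blue", "blond")])  # 5/6, i.e. 10/12
print(posterior[("blue", "black")])  # 1/6, i.e. 2/12
```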
[+] Gravityloss|8 years ago|reply
How Bayes kinda works, or how I see it.

Conditional probability (with some caveats that someone in the comments can fill in):

    P(a,b) = P(b,a)
    P(a|b) * P(b) = P(b|a) * P(a)
    P(a|b) = P(b|a) * P(a) / P(b)
a can be model and b can be data so it becomes

    P(model | data) =
    P(data | model) * P(model) / P(data)
We have or can estimate the things on the right side. We want to ultimately get the thing on the left side.
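As a toy sketch (my own numbers, not from the comment above): score each candidate model by P(data | model) * P(model), and P(data) is just the normalising sum that makes the posteriors add up to 1:

```python
# Two candidate models for a coin: fair, or biased towards heads
priors = {"fair": 0.5, "biased": 0.5}                 # P(model)
likelihoods = {"fair": 0.5 ** 3, "biased": 0.9 ** 3}  # P(3 heads in a row | model)

# P(data) = sum over models of P(data | model) * P(model)
p_data = sum(likelihoods[m] * priors[m] for m in priors)

# P(model | data) = P(data | model) * P(model) / P(data)
posterior = {m: likelihoods[m] * priors[m] / p_data for m in priors}
print(posterior)  # biased is now ~0.85 probable, fair ~0.15
```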
[+] submeta|8 years ago|reply
Excellent! Thanks for sharing.
[+] epalmer|8 years ago|reply
My youngest has Allen Downey as a professor this year. She says he is crazy, and she means this in the best way possible. He is prolific, having written Think Java in 13 days. He memorized pictures and bios of all 90 students in the first-year class at Olin College of Engineering.

Edit typo

[+] keerthiko|8 years ago|reply
Allen's classes were always some of the most over-enrolled at Olin, for as long as I can remember =).
[+] gwern|8 years ago|reply
> He memorized pictures and bios of all 90 students in the first year class at Olin College of Engineering.

Do you know if he was using spaced repetition to do that? I know some teachers have tried that to speed up learning their students.

[+] jabretti|8 years ago|reply
>He memorized pictures and bios of all 90 students in the first year class at Olin College of Engineering.

It's impressive not so much that he did that, but that he bothered to try.

Most lecturers (myself included) will try very hard not to learn anything about their students, because they consider actually dealing with undergrads (particularly first-years!) on an individual level to be beneath them.

[+] partycoder|8 years ago|reply
Thanks for posting this. The Jupyter notebooks (and the fact that GitHub has built-in support for them) really help illustrate the concepts.

The book I've used so far to study is "Probability and Statistics: The Science of Uncertainty", by Michael J. Evans and Jeffrey S. Rosenthal. This book is no longer being published and is available for free in PDF form.

[+] innocentoldguy|8 years ago|reply
“I broke this rule because I developed some of the code while I was a Visiting Scientist at Google, so I followed the Google style guide, which deviates from PEP 8 in a few places. Once I got used to Google style, I found that I liked it. And at this point, it would be too much trouble to change.”

Why would you write a book that targets the Python community and ignore PEP8 styling, inconveniencing an entire community, simply because it would be too much trouble for you to change?

“Also on the topic of style, I write “Bayes’s theorem” with an s after the apostrophe, which is preferred in some style guides and deprecated in others.”

It is deprecated in all modern style guides and should not be used. You’ll get dinged in college English and writing classes for using this outdated and redundant style.

I’m sure this book is great, but, as a point of constructive criticism, I would suggest the author do a better job of adhering to the styles of code and English expected by his target audience, rather than what is comfortable for him.

[+] leephillips|8 years ago|reply
From the PEP8 style guide:

"Many projects have their own coding style guidelines. In the event of any conflicts, such project-specific guides take precedence for that project."

and

"A Foolish Consistency is the Hobgoblin of Little Minds".

And, throughout, PEP8 makes it clear that it is a set of recommendations, and that if a project or community already has an established style, it need not be changed.

[+] bubblesocks|8 years ago|reply
I'm not sure why you were down-voted. This is a valid point and as a college professor and author, I'm sure Downey would appreciate any feedback that would make his book better.
[+] emerged|8 years ago|reply
My introduction to Bayesian probability was accidentally reinventing it while trying to invent my own AI system. It followed naturally from constructing a network of information that could be queried to get back whatever had been fed into it, and to perform deduction/induction.
[+] folksinger|8 years ago|reply
Let's take a recent election as an example:

A Bayesian pollster began with a certain set of prior probabilities. That the college educated were more likely to vote in previous elections, for example, informed the sample population, because it wouldn't make much sense to ask the opinions of those who would stay home.

Thus, based on priors that were updated with new empirical data, a new set of probabilities emerged that gave a certain candidate a high probability of victory.

Members of the voting public, aware of this high probability, decided that this meant with certainty that this candidate would win and therefore decided to stay home on election day.

In reality the Bayesian models were incorrect as, amongst other factors, a much higher number of non-college-educated individuals decided to vote, and to vote for the other candidate.

As it is with Bayesian intelligence, shared as much by pollsters as machine learning algorithms:

  Real-time heads up display
  Keeps the danger away
  But only for the things that already ruined your day.
[+] raister|8 years ago|reply
There should also be one called "Think Markov Chain Monte Carlo": even the simplest references are intractable, and the others begin very simply but end up incomprehensible enough to put one off the subject altogether.