
Think Bayes - Bayesian Statistics Made Simple

192 points | SkyMarshal | 13 years ago | greenteapress.com | reply

46 comments

[+] equark|13 years ago|reply
My problem with books like this is that they have almost no connection to why Bayesian statistics is successful: Bayesian statistics provides a unified recipe to tackle complex data analysis problems. Arguably the only known unified recipe.

The Bayesian book I want should emphasize how Bayes is a recipe for studying complex problems and teach a broad range of model ingredients. Learning Bayesian statistics is about becoming fluent in describing scientific problems in probabilistic language. This requires knowing how to express and compose traditional models and build new ones based on first principles.

An unfortunate reality is that you still need to know computational methods too, but that should change soon enough.

[+] AllenDowney|13 years ago|reply
Yes, that's exactly what the objective of this book is! I am not using computation out of necessity, but rather because I think it provides leverage for understanding the concepts, and learning to (as you say) compose traditional models and build new ones.

As the book comes along, I am finding that many ideas that are hard to explain and understand mathematically can be very easy to express computationally, especially using discrete approximations to continuous distributions.

For example, I just posted a section on ABC

http://www.greenteapress.com/thinkbayes/html/thinkbayes008.h...

that (I think) really demonstrates the strength of this approach.

Of course, my premise only applies for people who are as comfortable with programming as with math, or more so.
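
The "discrete approximation" idea can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book: the function names and the coin-bias example are mine.

```python
# Grid approximation: represent a continuous parameter (here, a coin's
# probability of heads) by a finite grid of hypotheses.
def grid_update(prior, likelihood, data):
    """Multiply each hypothesis's prior by its likelihood, then renormalize."""
    posterior = {h: p * likelihood(h, data) for h, p in prior.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

def binom_like(h, data):
    """Likelihood of observing (heads, tails) if the bias is h."""
    heads, tails = data
    return h ** heads * (1 - h) ** tails

hypos = [i / 100 for i in range(101)]        # 101-point grid on [0, 1]
prior = {h: 1 / len(hypos) for h in hypos}   # uniform prior
posterior = grid_update(prior, binom_like, (140, 110))
mean = sum(h * p for h, p in posterior.items())  # close to 140/250 = 0.56
```

The same three steps (multiply by the likelihood, renormalize, summarize) carry over unchanged when the hypotheses describe a more complicated model, which is the leverage being claimed here.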

[+] loup-vaillant|13 years ago|reply
E.T. Jaynes's book, "Probability Theory: The Logic of Science", may come close to what you want. It emphasizes that there are rules of thought, which lead to Bayesian statistics. As such, Bayesian statistics isn't just a recipe, but the law.

Now, I can only personally vouch for the first 2 chapters, as I haven't read the rest yet.

[+] nowarninglabel|13 years ago|reply
So, I'm going to counter here and say I don't find this to be a good intro. I started reading, had not heard of the "Girl named Florida" problem, and went to the linked blog post http://allendowney.blogspot.com/2011/11/girl-named-florida-s...

The way he explains it I found confusing and counter-intuitive. I've taken basic stats in college and learned some of the associated problems, though not this one, and not in this particular way. I have to agree wholeheartedly with the commenter on that post, "JeffJo", who stipulates why it's an ineffectual way to present the material. Furthermore, I found the author's dismissal of the valid criticism enough to make me not want to read further.
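
Part of what makes the problem contentious is that the answer hinges on how you formalize it. Here is one common formalization (the assumptions are mine and are exactly the contested part): each child is a boy or girl with probability 1/2, and each girl is named Florida with probability f, all independently; condition on "at least one child is a girl named Florida" and ask for the probability both children are girls.

```python
from itertools import product

def p_two_girls_given_florida(f):
    """P(both girls | at least one girl named Florida), girl-naming prob f."""
    # Per-child states: (is_girl, named_florida, probability)
    states = [(False, False, 0.5),
              (True, True, 0.5 * f),
              (True, False, 0.5 * (1 - f))]
    num = den = 0.0
    for c1, c2 in product(states, repeat=2):
        p = c1[2] * c2[2]
        if c1[1] or c2[1]:        # at least one girl named Florida
            den += p
            if c1[0] and c2[0]:   # both girls
                num += p
    return num / den
```

This works out to (2-f)/(4-f): exactly 1/3 when every girl is named Florida (the classic two-girls answer), approaching 1/2 as the name becomes rare, which is the part most people find counter-intuitive.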

[+] AllenDowney|13 years ago|reply
I am coming around to the conclusion that this example is more trouble than it's worth. I think it's kind of fun, but it does seem to annoy people.

This kind of feedback is exactly why I like to post drafts early. Expect this example to magically disappear very soon :)

[+] spin|13 years ago|reply
I agree that his first example, "The Girl Named Florida", was confusing.

I feel pretty comfortable with Bayesian statistics, and I thought the other examples that I saw were pretty clear. But his very first example jumps you out to another webpage, and then he mixes it with "the red-haired problem". It was irritating.

His next example, "The Cookie Problem" is the classic intro-to-Bayes example, IMO.
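
For reference, the Cookie Problem as usually stated: Bowl 1 holds 30 vanilla and 10 chocolate cookies, Bowl 2 holds 20 of each; you pick a bowl at random, draw a cookie at random, and it's vanilla. Which bowl did it come from? The whole update fits in a few lines (illustrative code, not the book's):

```python
# Prior: each bowl equally likely.  Likelihood: fraction of vanilla in each.
prior = {"Bowl 1": 0.5, "Bowl 2": 0.5}
like_vanilla = {"Bowl 1": 30 / 40, "Bowl 2": 20 / 40}

# Bayes: multiply prior by likelihood, then renormalize.
unnorm = {b: prior[b] * like_vanilla[b] for b in prior}
total = sum(unnorm.values())
posterior = {b: p / total for b, p in unnorm.items()}
# posterior["Bowl 1"] -> 0.6
```

It makes a good first example precisely because every quantity is discrete and the arithmetic is small enough to check by hand.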

[+] jeffjo|13 years ago|reply
Oops, I only cut-and-pasted half of what I wanted. This comes after my other reply.

Yes, these sorts of problems can be confusing. But the confusion is propagated by educators who refuse to recognize that what they asked is not what they intended to ask, and so they provide inconsistent answers.

Say you are on a game show, and pick Door #1. The host opens door #3 to show that it does not have the prize, and offers to let you switch to door #2. Should you? Most people will initially reason that door #3 is prize-less 2/3 of the time, evenly split between cases where the prize is behind door #1 and door #2. So it would be pointless to switch. But that is wrong. Few educators will explain why by solving the problem rigorously. They will use an analogy like pointing out how the original choice is right only 1/3 of the time, and since the host can always open a prize-less door, that can’t change.

People don’t believe these educators because their 1/2 answer is indeed more rigorous than the analogy. It just makes a mistake. The probabilities to use are not the probabilities that the cases exist, but the probabilities that the observed result would occur. The existence probabilities are the same, but the probability of the observed result when the initial door was correct is half of what it is when the initial choice was incorrect.

[+] udit99|13 years ago|reply
I'm really interested in knowing the prereqs I should have before picking up a book like this. Coming from a weak math background I find these books highly appealing but mildly intimidating. Also, could someone advise me on the preferred order of tackling the following Books?

1. Think Bayes

2. Think Stats

3. Programming Collective Intelligence by T.Segaran

[+] jey|13 years ago|reply
3, 1, 2.

3: Get (back) into the swing of thinking about mathematics and algorithms.

1: Bayesian statistics is a principled, coherent, consistent, intuitive, complete framework for reasoning about uncertainty. A good foundation.

2: Traditional statistics is more random and ad hoc, but can be more practical than Bayesian methods. (Bayesian models are well-motivated, but computing exact answers can be impractical, so you'll have to switch to approximation techniques, some of which are simple/universal/slow, while others get fairly complex.)

[+] AllenDowney|13 years ago|reply
I need to write a preface to answer this question, but the most important prereq is Python programming. The premise of the series is that if you can program (in any language) you can use that skill as leverage to learn about other topics.

So I would recommend

1. Think Python

2. Think Stats

3. Think Bayes

[+] spin|13 years ago|reply
If you are strong in Python but weak in math, then I would also recommend #3 (Collective Intelligence, by Segaran).

[+] pmjordan|13 years ago|reply
So this is all very well and good; I've had about 5 intros to Bayesian statistics. But those are a fair bit away from actually applying that knowledge in practice in software.

Let's say we have N different kinds of events with unknown probabilities and unknown dependence or independence between them. The naive approach to gathering data on the probability of event n occurring after an occurrence of event m would require O(N²) space. Let's say N ≈ 10⁹–10¹⁰. Storing that much data as a raw matrix isn't practical in most cases, so we have to find a more efficient data structure, in terms of both space and the operations we need to perform (and taking into account the characteristics of the storage medium, i.e. memory, disk, or a combination). What happens if the probabilistic properties of the system change over time?

Are there any introductory books or other resources on modeling this kind of problem? Clearly this has been tackled before, but I'm having a hard time making the leap from theory to practice - and I don't mean import the data into R or SPSS or whatever and let that grind out a solution, but coming up with approximations when you have runtime and space constraints that make that approach impractical.

[+] trhtrsh|13 years ago|reply
I think the general approach is dimensionality reduction: start measuring, and round down to 0 for the low-correlation pairs of events.

Do you actually have a stream of more than N^2 observations to process? If not, then most of your correlations are in fact 0, and sparse-matrix techniques apply.
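
A sketch of that sparse approach (names are hypothetical, and an in-memory Counter stands in for whatever key-value store the scale actually demands): only pairs you have observed take any space, so storage grows with the number of distinct observed pairs rather than N².

```python
from collections import Counter

pair_counts = Counter()   # (m, n) -> times event n followed event m
event_counts = Counter()  # m -> times m appeared as the preceding event

def observe(prev_event, event):
    """Record one transition from the event stream."""
    event_counts[prev_event] += 1
    pair_counts[(prev_event, event)] += 1

def p_next(m, n):
    """Estimated P(next event is n | current event is m); 0 if never seen."""
    if event_counts[m] == 0:
        return 0.0
    return pair_counts[(m, n)] / event_counts[m]

stream = ["a", "b", "a", "b", "a", "c"]
for prev, cur in zip(stream, stream[1:]):
    observe(prev, cur)
# p_next("a", "b") -> 2/3
```

Counter returns 0 for missing keys, so unobserved pairs cost nothing, which is the whole point when most of the N² matrix is zero.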

[+] hessenwolf|13 years ago|reply
Sounds like you need a Bayesian network - the junction tree algorithm.

You can send the cheque in the mail. ;)

[+] PostOnce|13 years ago|reply
I'm a big fan of the author's other books, Think Python and Think Complexity (I haven't had time for Think Stats); I found them more understandable than most other books that purport to teach readers at the same skill level.

I'm hoping this one will be as good, but all the negative comments here leave me skeptical. Perhaps this is the crowd that would enjoy K&R C more than Think Python; the former is more of a reference than an introductory tome to me. Or perhaps everyone here is just better at math than I am.

[+] AllenDowney|13 years ago|reply
Ignore the haters -- Think Bayes is going to be awesome!

Just kidding (mostly), but your point is correct: there is no book that is right for all audiences. But if you can program, and the mathematical approach to this material doesn't do it for you, this book might.

[+] hessenwolf|13 years ago|reply
Bayesian is cool because you can make arbitrarily complex models, and when you have the parameters estimated it is really easy to calculate all the cool things you want to.

Bayesian is not cool because estimating the parameters takes bloody ages on a supercomputer, unless you spend ages being really careful to specify your model.

Frequentist statistics is cool because it is a massive big bag of tricks to estimate all sorts of stuff, and pretty much all of the tricks are already in R.

Frequentist statistics is not so cool because calculating all the specific things you want to can be a pain in the ass.

Once either quantum computers kick in or a better algorithm than MCMC is created, Bayesian will win.

There are some philosophical arguments about the objectivity of the prior in Bayesian statistics, but these wash out in a decision theoretic framework because of the subjectivity of the utility function at the other end of the process.

Also, less than 5% of people reporting p-values really know what a p-value is.
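
One footnote on the "takes bloody ages unless you're careful" point: if you can be careful enough to pick a conjugate prior, the posterior is closed-form and no MCMC is needed at all. The beta-binomial case is the classic example (numbers here are mine, for illustration):

```python
# Conjugate update: a Beta(alpha, beta) prior on a binomial success rate
# yields a Beta(alpha + successes, beta + failures) posterior.
def beta_binomial_update(alpha, beta, successes, failures):
    return alpha + successes, beta + failures

# Uniform prior Beta(1, 1), then observe 7 successes and 3 failures.
a, b = beta_binomial_update(1, 1, 7, 3)
posterior_mean = a / (a + b)  # 8 / 12
```

The catch, of course, is that conjugacy only covers a small family of models, which is why MCMC dominates in practice.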

[+] zmjones|13 years ago|reply
Definitely agree. Do check out Stan though. HMC is pretty fast compared to BUGS/JAGS.

[+] trhtrsh|13 years ago|reply
Bayesian statistics gives you a subjective answer to your question (conditioned on the prior you choose).

Frequentist statistics gives you an objective answer to a question that has the same words as the one you asked, but arranged differently.

[+] juanfatas|13 years ago|reply
I found Udacity's classes (CS373, ST101) and the 2011 AI class, taught by Sebastian Thrun, also explained Bayesian methods very well.

[+] Groxx|13 years ago|reply
>This HTML version of the book is provided for convenience, but it is not the best format for the book. In particular, some of the symbols are not rendered correctly.

I would actually recommend the opposite: the HTML has ASCII versions of the symbols that e.g. Chrome might not render correctly, and all I checked looked fine. The PDF, meanwhile, yields this when you copy out the (not clickable) link to the "girl named Florida" article:

  ❤tt♣✿✴✴❛❧❧❡♥❞♦✇♥❡②✳❜❧♦❣s♣♦t✳❝♦♠✴✷✵✶✶✴✶✶✴❣✐r❧✲♥❛♠❡❞✲❢❧♦r✐❞❛✲s♦❧✉t✐♦♥s✳❤t♠❧
Also, the sections are linked in the HTML version but not in the PDF, which seems like a simple oversight (one that infects the vast majority of PDFs, sadly).

[+] ninetax|13 years ago|reply
Has anyone used the Think Stats book? Is it a good intro to stats?

[+] stdbrouw|13 years ago|reply
It's really, really good, especially if you're interested in the why and not just the "give me the damn test I need to run in SPSS and what number to look at". Plus, because you spend a lot of time coding, it's more fun and less dry than most stats books.

[+] signa11|13 years ago|reply
thank you! should help in coursera's pgm course somewhat i guess