
Bayesian updating of Probability Distributions

45 points | darkxanthos | 12 years ago | databozo.com

25 comments

[+] bluecalm | 12 years ago
Once you figure out simple examples you slowly start thinking this way about the world. It's beautiful. Take people for example:

Someone with an open mind has a prior that assigns at least slight probability to hypotheses that are (for them!) unlikely. Very religious people, on the other hand, have a 0 in their prior for the possibility of their religion being made up, so they are forced to ignore evidence to the contrary (Bayesian updating breaks for them due to division by zero, and denial is the mind's way of signalling this exception). In general, someone with a lot of weight on a given hypothesis is "stubborn" or just very convinced, and someone with a uniform or near-uniform distribution just doesn't know anything about the given problem.

Someone unable to build heavily weighted distributions is a conspiracy theorist, someone reluctant to is a sceptic, and someone too eager is a fanatic. Someone with very bad priors is un- or badly educated (in the given domain), or biased, or maybe just stupid; someone with good priors is an expert. It's possible to combine expert with sceptic, or expert with fanatic, or (all too often) stupid with fanatic: very bad, very heavily weighted priors with possible 0's on some options.

Once you start thinking this way you start expressing yourself differently; you start adding probability qualifiers to your sentences: "I am very sure it's the way to go", "My intuition tells me this but I am not really sure", "I am very convinced and it's not worth discussing" (yes, that can be a rational and good attitude), or "I would do X but I need more evidence to be reasonably sure".

It's all there in people's minds, language, and interactions. Once you start thinking this way, it's a whole new world of perspective and understanding.

[+] darkxanthos | 12 years ago
Yup, it's true. I can seem like a Bayes nut at work, since I mention updating my priors during debates and when discussing hypotheses for split tests.
[+] yafujifide | 12 years ago
Completely agreed. Your comment is a really good advertisement for normal people to learn Bayesian probability theory. Not because it's useful in science or engineering, but because it's useful in life.
[+] spicyj | 12 years ago
A question for people who know more about this than I do:

Why was the uniform distribution on [0, 1] chosen initially? Choosing a different distribution would give a different result. (And it doesn't make much sense to say, "Always choose the uniform distribution!" because the choice of variable affects the meaning of the distribution -- if instead we wonder about the value of p^2 and choose a uniform distribution for it on [0, 1], won't we get a completely different result?)
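As a quick sanity check (a sketch of my own, not from the article): a uniform prior on p and a uniform prior on p^2 really do imply different beliefs about p, which you can see by sampling each one.

```python
import random

# If q = p^2 is uniform on [0, 1], then p = sqrt(q), which is NOT
# uniform on [0, 1]. A Monte Carlo comparison makes this concrete.
random.seed(0)
N = 100_000

# Prior 1: p itself is uniform on [0, 1].
p_direct = [random.random() for _ in range(N)]

# Prior 2: p^2 is uniform on [0, 1], so draw q uniformly and take sqrt.
p_from_sq = [random.random() ** 0.5 for _ in range(N)]

# Probability mass below 0.5 under each prior:
frac_below_half_direct = sum(p < 0.5 for p in p_direct) / N    # ~0.50
frac_below_half_from_sq = sum(p < 0.5 for p in p_from_sq) / N  # ~0.25
print(frac_below_half_direct, frac_below_half_from_sq)
```

Under the second prior, P(p < 0.5) = P(q < 0.25) = 0.25, so the two "uniform" choices encode genuinely different beliefs about the coin.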

[+] christopheraden | 12 years ago
We're interested in the probability of a coin flip yielding heads before we flip any coins. Uniform is just a really common prior to choose in this situation for a few reasons:

-It's a special case of the beta distribution, which is the conjugate prior for binomial problems. This means that the distribution of the probability of getting heads given the coin flips is in the same family as the prior itself (i.e. beta priors with binomial likelihoods yield beta posteriors).

-The uniform (for this problem at least) is an "objective prior", which expresses that we don't have much information about whether the flip is biased. The example you give (modeling p^2 instead of p) is a great example of when the uniform would be a bad choice. The reason the uniform doesn't work in this case is because for binomial data (coin flips), a uniform prior is not invariant to reparametrization.

If choosing priors were as simple as always going with the uniform, there'd be little reason to go with Bayes! The choice of prior sometimes makes a radical difference in the posterior (especially with small samples), and there are many things to consider when you choose priors (computational convenience, uninformative versus informative priors, hierarchical modeling, etc.).

http://en.wikipedia.org/wiki/Jeffreys_prior
http://en.wikipedia.org/wiki/Beta_distribution
http://en.wikipedia.org/wiki/Conjugate_prior
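The conjugacy claim above can be checked numerically. This is a minimal sketch (mine, not from the linked article) that compares brute-force grid updating against the closed-form Beta(a + h, b + t) posterior:

```python
# Prior: Beta(a, b); data: h heads, t tails. Conjugacy says the posterior
# is Beta(a + h, b + t). Check it by brute-force updating on a grid of p.
a, b = 2.0, 2.0          # a mildly informative Beta(2, 2) prior
h, t = 7, 3              # observed flips (hypothetical)

grid = [i / 1000 for i in range(1, 1000)]  # avoid the endpoints 0 and 1

def beta_pdf_unnorm(p, a, b):
    # Unnormalized Beta(a, b) density at p.
    return p ** (a - 1) * (1 - p) ** (b - 1)

# Brute force: prior(p) * likelihood(p), then normalize over the grid.
post = [beta_pdf_unnorm(p, a, b) * p ** h * (1 - p) ** t for p in grid]
z = sum(post)
post = [w / z for w in post]

# Conjugate answer: Beta(a + h, b + t) evaluated on the same grid.
conj = [beta_pdf_unnorm(p, a + h, b + t) for p in grid]
zc = sum(conj)
conj = [w / zc for w in conj]

max_diff = max(abs(x - y) for x, y in zip(post, conj))
print(max_diff)  # essentially zero, up to floating point
```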

[+] bjterry | 12 years ago
If you were holding a physical coin in your hand, you would have to be crazy to select a uniform distribution unless it was shaped like a sphere. Using coin-flipping as an example for these things is a really minor pet peeve of mine. If it's even vaguely coin-like, even the most ridiculous distortion (maybe it's made of uranium on one side and aluminum on the other) probably couldn't bring the true probability past 70% or something.
[+] bluecalm | 12 years ago
We would! Choosing priors is difficult, and different people with different experience and views will have different (but still reasonable) priors. You may take a personal view (just use all your experience) or an objective one (try to come up with some scheme for assigning priors to types of situations); work has been done in both directions. It's more of a philosophical problem than a math problem, although in many cases you want something specific: "I have no idea what the outcome might be, but I think it's either extremely to the right or to the left, so give me a prior which converges fast to one or the other." Or, in the coin example: "it's almost surely more or less fair, so I don't want to change my views after several unlucky flips."
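One way to encode that last attitude (my own numbers, just a sketch) is a Beta(50, 50) prior: mean 0.5 and tightly concentrated. After five straight tails its mean barely moves, while a uniform prior swings wildly:

```python
# Strong "almost surely fair" prior: Beta(50, 50).
a, b = 50, 50
prior_mean = a / (a + b)            # 0.5

# Conjugate update on 0 heads, 5 tails: Beta(a + 0, b + 5).
a2, b2 = a + 0, b + 5
post_mean = a2 / (a2 + b2)          # 50/105, still close to 0.5

# Same evidence under the uniform Beta(1, 1) prior:
u_mean = (1 + 0) / (1 + 1 + 5)      # 1/7, a dramatic shift
print(prior_mean, post_mean, u_mean)
```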
[+] darkxanthos | 12 years ago
Great question!

With just the few updates I've given, you're probably right that it would affect things significantly. However, the more data you have, the less the prior matters. This is known as swamping the prior.

In this case a uniform prior isn't incorrect, but you could definitely say it's suboptimal, and that I could make my examples much more accurate by choosing a prior that best represents my initial beliefs (like the one-head, one-tail histogram, for example).
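A minimal illustration of swamping (with a made-up sample, not from the article): two quite different Beta priors land on nearly the same posterior mean once the data pile up.

```python
# Hypothetical large sample: a coin biased towards heads.
heads, tails = 620, 380

def posterior_mean(a, b, h, t):
    # Mean of the Beta(a + h, b + t) posterior.
    return (a + h) / (a + b + h + t)

m_uniform = posterior_mean(1, 1, heads, tails)   # Beta(1, 1): uniform prior
m_skewed = posterior_mean(20, 5, heads, tails)   # strongly heads-biased prior
print(m_uniform, m_skewed)  # both land near 0.62: the data dominate
```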

[+] tomrod | 12 years ago
I love seeing people figure this out for the first time. Bayesian methods just start to make more sense after working through something like this.
[+] skybrian | 12 years ago
This code has no memory other than the prior probability distribution. It seems like if you had previously flipped a coin a thousand times to generate it, your prior beliefs should be more strongly held than if you had just made up some numbers. Shouldn't the number of previous trials be accounted for somehow?
[+] conjectures | 12 years ago
A better way of thinking about this problem, instead of adding N observations in a batch, is adding them one at a time. The Beta(a, b) distribution does this; uniform is Beta(1, 1). IIRC, adding observations just ticks up a or b one at a time depending on whether the outcome was heads or tails. This applies to exchangeable observations (if you flipped some other coin previously, it doesn't work so simply in this model).
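A sketch of that one-at-a-time update (hypothetical flips): start from Beta(1, 1) and tick up a on heads, b on tails. For exchangeable flips the order doesn't matter.

```python
def update(params, flip):
    # One Bayesian update step: heads increments a, tails increments b.
    a, b = params
    return (a + 1, b) if flip == "H" else (a, b + 1)

flips = ["H", "T", "H", "H", "T"]
params = (1, 1)
for f in flips:
    params = update(params, f)
print(params)  # (4, 3): Beta(1 + 3 heads, 1 + 2 tails)

# Same flips in a different order give the same posterior (exchangeability).
params2 = (1, 1)
for f in reversed(flips):
    params2 = update(params2, f)
assert params2 == params
```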
[+] dthunt | 12 years ago
In fact, it is accounted for. You'll notice the negative exponent on the unlikely hypotheses is getting pretty extreme after a handful of flips.
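To see the exponent growing, here's a rough sketch (my own numbers, not the article's code) of the base-10 log-likelihood of an unlikely hypothesis, p = 0.1, as heads-heavy flips accumulate:

```python
from math import log10

def log10_likelihood(p, heads, tails):
    # log10 of the binomial likelihood p^heads * (1-p)^tails
    # (dropping the constant binomial coefficient).
    return heads * log10(p) + tails * log10(1 - p)

# Data that are 80% heads make p = 0.1 look worse and worse:
for n in (10, 50, 100):
    h = int(0.8 * n)
    t = n - h
    print(n, round(log10_likelihood(0.1, h, t), 1))
# exponents: roughly -8.1, -40.5, -80.9 -- linear in the trial count,
# so the number of previous trials is implicitly encoded in the weights.
```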