
Bayesian updating of Probability Distributions

45 points | darkxanthos | 12 years ago | databozo.com

25 comments

[+] bluecalm | 12 years ago
Once you figure out simple examples you slowly start thinking this way about the world. It's beautiful. Take people for example:

Someone with an open mind has a prior that assigns at least slight probability to hypotheses that are (for them!) unlikely. Very religious people, on the other hand, have a 0 in their prior for the possibility of their religion being made up, so they are forced to ignore evidence to the contrary (Bayesian updating breaks for them due to division by zero, and denial is the mind's way of signalling this exception). In general, someone with a lot of weight on a given hypothesis is "stubborn" or just very convinced, and someone with a uniform or near-uniform distribution just doesn't know anything about the given problem.

Someone unable to build heavily weighted distributions is a conspiracy theorist, someone reluctant to is a sceptic, and someone too eager is a fanatic. Someone with very bad priors is un- or badly educated (in the given domain), or biased, or maybe just stupid; someone with good priors is an expert. It's possible to combine expert with sceptic, or expert with fanatic, or (all too often) stupid with fanatic: very bad, very heavily weighted priors with possible 0's on some options.

Once you start thinking this way you start expressing yourself differently; you start adding probability qualifiers to your sentences: "I am very sure it's the way to go", "My intuition tells me this but I am not really sure", "I am very convinced and it's not worth discussing" (yes, that can be a rational and good attitude), or "I would do X but I need more evidence to be reasonably sure".

It's all there in people's minds, language, and interactions. Once you start thinking this way, it's a whole new world of perspective and understanding.

[+] darkxanthos | 12 years ago
Yup, it's true. I can seem like a Bayes nut at work, since I mention updating my priors during debates and when discussing hypotheses for split tests.
[+] yafujifide | 12 years ago
Completely agreed. Your comment is a really good advertisement for normal people to learn Bayesian probability theory. Not because it's useful in science or engineering, but because it's useful in life.
[+] spicyj | 12 years ago
A question for people who know more about this than I do:

Why was the uniform distribution on [0, 1] chosen initially? Choosing a different distribution would give a different result. (And it doesn't make much sense to say, "Always choose the uniform distribution!" because the choice of variable affects the meaning of the distribution -- if instead we wonder about the value of p^2 and choose a uniform distribution for it on [0, 1], won't we get a completely different result?)
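As a quick sanity check (a sketch of my own, not from the article): a uniform prior on p and a uniform prior on p^2 really do imply different beliefs about p, which you can see by sampling each one.

```python
import random

# If q = p^2 is uniform on [0, 1], then p = sqrt(q), which is NOT
# uniform on [0, 1]. A Monte Carlo comparison makes this concrete.
random.seed(0)
N = 100_000

# Prior 1: p itself is uniform on [0, 1].
p_direct = [random.random() for _ in range(N)]

# Prior 2: p^2 is uniform on [0, 1], so draw q uniformly and take sqrt.
p_from_sq = [random.random() ** 0.5 for _ in range(N)]

# Probability mass below 0.5 under each prior:
frac_below_half_direct = sum(p < 0.5 for p in p_direct) / N    # ~0.50
frac_below_half_from_sq = sum(p < 0.5 for p in p_from_sq) / N  # ~0.25
print(frac_below_half_direct, frac_below_half_from_sq)
```

Under the second prior, P(p < 0.5) = P(q < 0.25) = 0.25, so the two "uniform" choices encode genuinely different beliefs about the coin.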

[+] christopheraden | 12 years ago
We're interested in the probability of a coin flip yielding heads before we flip any coins. Uniform is just a really common prior to choose in this situation for a few reasons:

-It's a special case of the beta distribution, which is the conjugate prior for binomial problems. This means that the distribution of the probability of getting heads given the coin flips is in the same family as the prior itself (i.e. beta priors with binomial likelihoods yield beta posteriors).

-The uniform (for this problem at least) is an "objective prior", which expresses that we don't have much information about whether the flip is biased. The example you give (modeling p^2 instead of p) is a great example of when the uniform would be a bad choice. The reason the uniform doesn't work in this case is because for binomial data (coin flips), a uniform prior is not invariant to reparametrization.

If choosing priors were as simple as always going with the uniform, there'd be little reason to go with Bayes! The choice of prior sometimes makes a radical difference in the posterior (especially with small samples), and there are many things to consider when you choose priors (computational convenience, uninformative versus informative priors, hierarchical modeling, etc.).

http://en.wikipedia.org/wiki/Jeffreys_prior
http://en.wikipedia.org/wiki/Beta_distribution
http://en.wikipedia.org/wiki/Conjugate_prior
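The conjugacy claim above can be checked numerically. This is a minimal sketch (mine, not from the linked article) that compares brute-force grid updating against the closed-form Beta(a + h, b + t) posterior:

```python
# Prior: Beta(a, b); data: h heads, t tails. Conjugacy says the posterior
# is Beta(a + h, b + t). Check it by brute-force updating on a grid of p.
a, b = 2.0, 2.0          # a mildly informative Beta(2, 2) prior
h, t = 7, 3              # observed flips (hypothetical)

grid = [i / 1000 for i in range(1, 1000)]  # avoid the endpoints 0 and 1

def beta_pdf_unnorm(p, a, b):
    # Unnormalized Beta(a, b) density at p.
    return p ** (a - 1) * (1 - p) ** (b - 1)

# Brute force: prior(p) * likelihood(p), then normalize over the grid.
post = [beta_pdf_unnorm(p, a, b) * p ** h * (1 - p) ** t for p in grid]
z = sum(post)
post = [w / z for w in post]

# Conjugate answer: Beta(a + h, b + t) evaluated on the same grid.
conj = [beta_pdf_unnorm(p, a + h, b + t) for p in grid]
zc = sum(conj)
conj = [w / zc for w in conj]

max_diff = max(abs(x - y) for x, y in zip(post, conj))
print(max_diff)  # essentially zero, up to floating point
```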

[+] bjterry | 12 years ago
If you were holding a physical coin in your hand, you would have to be crazy to select a uniform distribution unless it was shaped like a sphere. Using coin-flipping as an example for these things is a really minor pet peeve of mine. If it's even vaguely coin-like, even the most ridiculous distortion (maybe it's made of uranium on one side and aluminum on the other) probably couldn't bring the true probability past 70% or something.
[+] bluecalm | 12 years ago
We would! Choosing priors is difficult, and different people with different experience and views will have different (but still reasonable) priors. You may take a personal view (just use all your experience) or an objective one (try to come up with some scheme for assigning priors to types of situations); work has been done in both directions. It's more of a philosophical problem than a math problem, although in many cases you want something specific: "I have no idea what the outcome might be, but I think it's either extremely to the right or to the left, so give me a prior which converges fast to one or the other." Or, in the coin example: "it's almost surely more or less fair, so I don't want to change my views after several unlucky flips."
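One way to encode that last attitude (my own numbers, just a sketch) is a Beta(50, 50) prior: mean 0.5 and tightly concentrated. After five straight tails its mean barely moves, while a uniform prior swings wildly:

```python
# Strong "almost surely fair" prior: Beta(50, 50).
a, b = 50, 50
prior_mean = a / (a + b)            # 0.5

# Conjugate update on 0 heads, 5 tails: Beta(a + 0, b + 5).
a2, b2 = a + 0, b + 5
post_mean = a2 / (a2 + b2)          # 50/105, still close to 0.5

# Same evidence under the uniform Beta(1, 1) prior:
u_mean = (1 + 0) / (1 + 1 + 5)      # 1/7, a dramatic shift
print(prior_mean, post_mean, u_mean)
```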
[+] darkxanthos | 12 years ago
Great question!

With just the few updates I've given, you're probably right that it would affect things significantly. However, the more data you have, the less the prior matters. This is known as swamping the prior.

In this case a uniform prior isn't incorrect, but you could definitely say it's suboptimal, and that I could make my examples much more accurate by choosing a prior that best represents my initial beliefs (like the one-head, one-tail histogram, for example).
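A minimal illustration of swamping (with a made-up sample, not from the article): two quite different Beta priors land on nearly the same posterior mean once the data pile up.

```python
# Hypothetical large sample: a coin biased towards heads.
heads, tails = 620, 380

def posterior_mean(a, b, h, t):
    # Mean of the Beta(a + h, b + t) posterior.
    return (a + h) / (a + b + h + t)

m_uniform = posterior_mean(1, 1, heads, tails)   # Beta(1, 1): uniform prior
m_skewed = posterior_mean(20, 5, heads, tails)   # strongly heads-biased prior
print(m_uniform, m_skewed)  # both land near 0.62: the data dominate
```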

[+] tomrod | 12 years ago
I love seeing people figure this out for the first time. Bayesian methods just start to make more sense after working through something like this.
[+] skybrian | 12 years ago
This code has no memory other than the prior probability distribution. It seems like if you had previously flipped a coin a thousand times to generate it, your prior beliefs should be more strongly held than if you had just made up some numbers. Shouldn't the number of previous trials be accounted for somehow?
[+] conjectures | 12 years ago
A better way of thinking about this problem, instead of adding N observations in a batch, is adding them one at a time. The Beta(a, b) distribution does this; uniform is Beta(1, 1). IIRC, adding observations just ticks up a or b one at a time depending on whether the outcome was heads or tails. This applies to exchangeable observations (if you flipped some other coin previously, it doesn't work so simply in this model).
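A sketch of that one-at-a-time update (hypothetical flips): start from Beta(1, 1) and tick up a on heads, b on tails. For exchangeable flips the order doesn't matter.

```python
def update(params, flip):
    # One Bayesian update step: heads increments a, tails increments b.
    a, b = params
    return (a + 1, b) if flip == "H" else (a, b + 1)

flips = ["H", "T", "H", "H", "T"]
params = (1, 1)
for f in flips:
    params = update(params, f)
print(params)  # (4, 3): Beta(1 + 3 heads, 1 + 2 tails)

# Same flips in a different order give the same posterior (exchangeability).
params2 = (1, 1)
for f in reversed(flips):
    params2 = update(params2, f)
assert params2 == params
```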
[+] dthunt | 12 years ago
In fact, it is accounted for. You'll notice the negative exponent on the unlikely hypotheses is getting pretty extreme after a handful of flips.
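To see the exponent growing, here's a rough sketch (my own numbers, not the article's code) of the base-10 log-likelihood of an unlikely hypothesis, p = 0.1, as heads-heavy flips accumulate:

```python
from math import log10

def log10_likelihood(p, heads, tails):
    # log10 of the binomial likelihood p^heads * (1-p)^tails
    # (dropping the constant binomial coefficient).
    return heads * log10(p) + tails * log10(1 - p)

# Data that are 80% heads make p = 0.1 look worse and worse:
for n in (10, 50, 100):
    h = int(0.8 * n)
    t = n - h
    print(n, round(log10_likelihood(0.1, h, t), 1))
# exponents: roughly -8.1, -40.5, -80.9 -- linear in the trial count,
# so the number of previous trials is implicitly encoded in the weights.
```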