> Bayesian inference is a way to get sharper predictions from your data.
Funny, if I had to summarize it in one sentence I'd describe it in the opposite way: Bayesian inference is a way of making less sharp predictions from your data, with quantified uncertainty.
Sometimes you are very certain prior to updating your beliefs (in the form of a posterior), which can lead to very sharp predictions from (and possibly in spite of) your data.
If your data have noise/randomness (and they most likely do), and you use your statistics to reduce the variance of that noise, then the predictions would be sharper, would they not?
That's a very strong statement. There are domains where machine learning tends to have better predictive performance than (Bayesian) statistics, but the converse is true in many other domains.
I would summarize it as Bayesian methods work best in areas where there's often not enough data, there exists significant expert knowledge, and you can properly specify a model. And yes, they do quantify uncertainty.
There are lots of discussions and explanations of what it means to be "Bayesian," but I think the best thing to do is jump in and start building models. That is how I came to understand the utility of Bayes.
There are Stan implementations in R, Python, and Julia, or you can use it from C++ directly, since Stan itself is written in C++. I think this has greater potential to change how we deal with the unknown than AI or other machine learning.
>"There are lots of discussions and explanations of what it means to be "Bayesian," but I think the best thing to do is jump in and start building models."
I highly agree: just play with Stan or JAGS and you will figure it out. The prose descriptions just cannot convey the power and flexibility of Bayesian stats.
PS, you shouldn't be trying to do a "bayesian t-test" or anything like that. That whole way of thinking about research (asking "is there an effect?") is flawed and can't go away soon enough.
At least one guy that I know did this for risk scoring in a payment gateway. He had no clue it even had a name, it just seemed the most obvious way to solve the problem.
Loosely speaking, I've seen Bayesian inference described as a way to update your knowledge when you receive new evidence. In that sense, it's been re-invented since the time of the ancient Greeks.
OT question:
I am merging (calculating the mean of) 16 short exposures of a night photo with high ISO in order to remove noise and get wonderful night shots.
Now I'm just averaging:
pixelvalue = (photo1.pixelvalue + photo2.pixelvalue + ... + photoN.pixelvalue) / numPhotos
Is there a way to make this smarter with a bayesian approach? I'm thinking it could make a smarter guess at what the actual pixelvalue should be rather than just the average. Any ideas would be appreciated!
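One minimal Bayesian sketch of this, assuming (hypothetically) Gaussian sensor noise with known standard deviation and a Gaussian prior on the true pixel value; the variable names and numbers are illustrative, not from any real pipeline. With a vague prior this reduces to the plain average, while a confident prior pulls the estimate toward it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an unknown true pixel value and 16 noisy exposures.
true_value = 0.30                      # unknown in practice
noise_sd = 0.10                        # sensor noise level (assumed known)
photos = true_value + rng.normal(0.0, noise_sd, size=16)

# Prior belief about the pixel value, e.g. from a blurred reference frame
# or from neighbouring pixels: Normal(mu0, sd0^2).
mu0, sd0 = 0.5, 0.2

# Conjugate Normal-Normal update: the posterior is also Normal, with a
# precision-weighted mean between the prior mean and the sample mean.
n = len(photos)
post_precision = 1.0 / sd0**2 + n / noise_sd**2
post_mean = (mu0 / sd0**2 + photos.sum() / noise_sd**2) / post_precision
post_sd = post_precision ** -0.5

print(post_mean, post_sd)
```

The payoff over the plain average is the posterior standard deviation: you get an explicit uncertainty for each pixel, and the prior only matters much when the number of exposures is small.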
But how do I incorporate my level of confidence in my prior? I haven't seen any treatment of this question, even though it is quite an essential one: priors that you are not so sure about should be given less weight than priors that you are very certain about.
This is handled by choosing a smoother, higher entropy prior. If you have uncertainty about your prior then basic factorization tells us it's equivalent to integrating over your various priors with respect to the probability you assign each of them.
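That equivalence can be checked numerically. A sketch with two hypothetical candidate priors for a coin's bias on a grid: mixing the priors first and then updating gives the same posterior as updating each prior separately and reweighting by its marginal likelihood.

```python
import numpy as np

# Grid over the coin's bias theta.
theta = np.linspace(0.001, 0.999, 999)

# Two candidate priors we're unsure between (unnormalised Beta shapes):
prior_a = theta**49 * (1 - theta)**49    # ~ Beta(50, 50): "fair-ish"
prior_b = theta**7 * (1 - theta)**1      # ~ Beta(8, 2):  "biased high"
prior_a /= prior_a.sum()
prior_b /= prior_b.sum()

# 70% credence in prior A, 30% in prior B -> one mixture prior.
mixture = 0.7 * prior_a + 0.3 * prior_b

# Data: 6 heads in 10 flips. Binomial likelihood on the grid.
lik = theta**6 * (1 - theta)**4

# Posterior from the mixture prior...
post = mixture * lik
post /= post.sum()

# ...equals the evidence-reweighted combination of the two posteriors.
post_a = prior_a * lik; ev_a = post_a.sum(); post_a /= ev_a
post_b = prior_b * lik; ev_b = post_b.sum(); post_b /= ev_b
w_a = 0.7 * ev_a / (0.7 * ev_a + 0.3 * ev_b)
combined = w_a * post_a + (1 - w_a) * post_b
assert np.allclose(post, combined)
```

Note that the data themselves shift the weight between the two priors: the prior that explains the observations better (higher marginal likelihood) ends up with more than its initial 70/30 share.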
OK, but this "inference" is not a valid substitute for a logical inference because it produces a different type of result - probabilistic, not certain.
The crucial difference is that statistical inference does not consider any causation; its domain is observations only, and observations alone cannot establish causation in principle.
Correlation is not causation. Substituting a Bayesian inference for a logical inference should result in a Type Error (where are all these static typing zealots when we need them?).
This is, by the way, one of the most important principles - the universe exists; probabilities and numbers do not. Every causation in the universe is due to its laws and related structures and processes. Causation has nothing to do with numbers or observations. This is why so much of modern "science" is a non-reproducible pile of crap.
Any observer is a product of the universe. The Bayesian sect is trying to make it the other way around. Mathematical Tantras of the digital age.
Logical inference is just a special case of more general bayesian inference. Anything you can do with logical inference you can do with bayesian inference. Just imagine the probabilities are 0 and 1. Here's an entire book on the subject: http://bayes.wustl.edu/etj/prob/book.pdf
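The 0-and-1 special case is easy to check directly. A toy sketch (the helper function is illustrative, not from the book): with certain premises, probabilistic updating reproduces modus ponens, and softening a premise softens the conclusion by exactly that amount.

```python
def update(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) via the law of total probability over A and not-A."""
    return p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Modus ponens as degenerate probabilistic inference:
# P(B|A) = 1 ("A implies B") and P(A) = 1 ("A holds") force P(B) = 1.
assert update(1.0, 1.0, 0.0) == 1.0

# Soften the premise and the conclusion softens with it.
assert abs(update(0.9, 1.0, 0.0) - 0.9) < 1e-12
```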
But true logical inference doesn't exist in the real world. Because you can never be 100% sure of anything, not even mathematical facts. It's just an approximation of superior bayesian inference: http://lesswrong.com/lw/mp/0_and_1_are_not_probabilities/
Bayesian inference is just an extension of logical inference to probability distributions. The "correlation is not causation" reaction is off base here.
What you actually want in this context is some code that generates random deviates of probability distributions chosen randomly and a "guesser agent" that tries to guess which distribution was chosen. Then you can ask questions like,
> given some condition on a distribution of distributions, when do we feel that a guesser is taking too long to make a choice?
This is like waiting for a person who is taking too long to identify a color, or for a baby to decide what kind of food it wants. For a certain interval it makes sense, but past a point it becomes pathological.
So for example if we have two distributions,
> uniform distribution on the unit interval [0,1]; uniform distribution on the interval [1,2]
then we get impatient with a guesser who takes longer than a single guess, since we know (with probability 1) that a single guess will do.
Now, if we have two distributions that overlap, say the uniform distributions on [1,3] and [0,2], then we can quantify how long it is likely to take before we know the choice with probability 1, but we cannot say for sure how many observations will be required before any agent capable of processing positive feedback in a neural network can say for certain which one it is. As soon as an observation falls outside the interval (1,2), the guesser can state the answer.
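A minimal simulation of this overlapping-uniforms case (a Python sketch; the distribution pair and stopping rule are taken from the example above). On the overlap (1,2) both densities equal 1/2, so those draws carry no information at all; the first draw outside the overlap settles the question.

```python
import random

random.seed(1)

def guess_source(true_dist):
    """Draw from the secretly chosen uniform until the guesser is certain.

    Candidates: U[1,3] vs U[0,2]. A draw below 1 is impossible under
    U[1,3]; a draw above 2 is impossible under U[0,2]. Either one
    identifies the source with probability 1.
    """
    lo, hi = (1.0, 3.0) if true_dist == "U[1,3]" else (0.0, 2.0)
    n_obs = 0
    while True:
        n_obs += 1
        x = random.uniform(lo, hi)
        if x < 1.0:
            return "U[0,2]", n_obs
        if x > 2.0:
            return "U[1,3]", n_obs

guess, n = guess_source("U[1,3]")
```

Each draw leaves the overlap with probability 1/2, so the number of observations needed is geometric with mean 2 - finite on average, but unbounded in the worst case, which is exactly the "we can't say for sure how many observations" point above.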
Now, things can get more interesting when the distributions are arranged in a hierarchy, say the uniform distribution on finite disjoint unions of disjoint intervals (a,b) where a < b are two dyadic rationals with the same denominator when written in lowest terms.
If a guesser is forced to guess early, before becoming certain of the result, then we can compare ways to guess by computing how often they get the right answer.
Observations now give two types of information: certain distributions can be eliminated with complete confidence (because there exists a positive epsilon such that the probability of obtaining an observation in the epsilon ball is zero) while for the others, Bayes theorem can be used to update a distribution of distributions or several distributions of distributions that are used to drive a guessing algorithm. A guess is a statement of the form "all observations are taken from the uniform distribution on subset ___ of the unit interval".
Example: take the distributions on the unit interval given by the probability density functions 2x and 2-2x. Given a sequence of observations, we can ask: what is the probability that the first distribution was chosen?
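For this last example the posterior is available in closed form from Bayes' theorem (a small sketch; equal prior odds on the two densities are assumed):

```python
import math

def prob_first(xs, prior=0.5):
    """Posterior probability that f1(x) = 2x (rather than f2(x) = 2 - 2x)
    generated the observations xs in [0, 1], via Bayes' theorem."""
    l1 = math.prod(2 * x for x in xs)        # likelihood under f1
    l2 = math.prod(2 - 2 * x for x in xs)    # likelihood under f2
    return prior * l1 / (prior * l1 + (1 - prior) * l2)

# A single observation at 0.5 is equally likely under both densities.
assert prob_first([0.5]) == 0.5

# Observations near 1 strongly favour f1(x) = 2x.
assert prob_first([0.9, 0.8]) > 0.9
```

Unlike the disjoint-support cases above, here no finite sample ever gives certainty; the posterior only converges to 0 or 1 as observations accumulate.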
The answers to these questions can be found in a book like Probability: Theory and Examples.
ptero | 8 years ago
Bayesian inference is an efficient way to track your estimates and uncertainties as you accumulate data.
robterrin | 8 years ago
If you're looking for a place to start I'd go to Andrew Gelman's introduction for the Stan Language: https://www.youtube.com/watch?v=T1gYvX5c2sM
balnaphone | 8 years ago
This paper discusses exactly the scenario you describe.
Houshalter | 8 years ago
But how would you use a Bayesian approach? What exactly are you trying to predict? What are the inputs? What is the model?
jamii | 8 years ago
https://arxiv.org/pdf/1701.02434.pdf
hudibras | 8 years ago
https://www.amazon.com/Bayesian-Analysis-Chapman-Statistical...
Houshalter | 8 years ago
You can even deduce causation from purely observational data. And here's how: http://lesswrong.com/lw/ev3/causal_diagrams_and_causal_model...