> Bayesian inference is a way to get sharper predictions from your data.
Funny, if I had to summarize it in one sentence I'd describe it in the opposite way: Bayesian inference is a way of making less sharp predictions from your data, with quantified uncertainty.
Sometimes you are very certain prior to updating your beliefs (in the form of a posterior), which can lead to very sharp predictions from (and possibly in spite of) your data.
If your data have noise/randomness (and they most likely do), and you use your statistics to reduce the variance of that noise, then the predictions would be sharper, would they not?
That's a very strong statement. There are domains where machine learning tends to have better predictive performance than (Bayesian) statistics, but the converse is true in many other domains.
I would summarize it as Bayesian methods work best in areas where there's often not enough data, there exists significant expert knowledge, and you can properly specify a model. And yes, they do quantify uncertainty.
There are lots of discussions and explanations of what it means to be "Bayesian," but I think the best thing to do is jump in and start building models. That is how I came to understand the utility of Bayes.
There are Stan implementations in R, Python, and Julia, or you can use it from C++ directly, since Stan itself is written in C++. I think this has greater potential to change how we deal with the unknown than AI or other machine learning.
>"There are lots of discussions and explanations of what it means to be "Bayesian," but I think the best thing to do is jump in and start building models."
I highly agree: just play with Stan or JAGS and you will figure it out. The prose descriptions just cannot convey the power and flexibility of Bayesian stats.
PS, you shouldn't be trying to do a "bayesian t-test" or anything like that. That whole way of thinking about research (asking "is there an effect?") is flawed and can't go away soon enough.
At least one guy that I know did this for risk scoring in a payment gateway. He had no clue it even had a name, it just seemed the most obvious way to solve the problem.
Loosely speaking, I've seen Bayesian inference described as a way to update your knowledge when you receive new evidence. In that sense, it's been re-invented since the time of the ancient Greeks.
OT question:
I am merging (calculating the mean of) 16 short exposures of a night photo with high ISO in order to remove noise and get wonderful night shots.
Now I'm just averaging:
pixelvalue = (photo1.pixelvalue + photo2.pixelvalue + ... + photoN.pixelvalue) / numPhotos
Is there a way to make this smarter with a bayesian approach? I'm thinking it could make a smarter guess at what the actual pixelvalue should be rather than just the average. Any ideas would be appreciated!
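One minimal Bayesian sketch of this, assuming (hypothetically) Gaussian sensor noise with known standard deviation and a Gaussian prior on the true pixel value; the variable names and numbers are illustrative, not from any real pipeline. With a vague prior this reduces to the plain average, while a confident prior pulls the estimate toward it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: an unknown true pixel value and 16 noisy exposures.
true_value = 0.30                      # unknown in practice
noise_sd = 0.10                        # sensor noise level (assumed known)
photos = true_value + rng.normal(0.0, noise_sd, size=16)

# Prior belief about the pixel value, e.g. from a blurred reference frame
# or from neighbouring pixels: Normal(mu0, sd0^2).
mu0, sd0 = 0.5, 0.2

# Conjugate Normal-Normal update: the posterior is also Normal, with a
# precision-weighted mean between the prior mean and the sample mean.
n = len(photos)
post_precision = 1.0 / sd0**2 + n / noise_sd**2
post_mean = (mu0 / sd0**2 + photos.sum() / noise_sd**2) / post_precision
post_sd = post_precision ** -0.5

print(post_mean, post_sd)
```

The payoff over the plain average is the posterior standard deviation: you get an explicit uncertainty for each pixel, and the prior only matters much when the number of exposures is small.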
But how do I incorporate my level of confidence in my prior? I haven't seen any treatment of this question, even though it is quite an essential one: priors that you are not so sure about should be given less weight than priors that you are very certain about.
This is handled by choosing a smoother, higher entropy prior. If you have uncertainty about your prior then basic factorization tells us it's equivalent to integrating over your various priors with respect to the probability you assign each of them.
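That equivalence can be checked numerically. A sketch with two hypothetical candidate priors for a coin's bias on a grid: mixing the priors first and then updating gives the same posterior as updating each prior separately and reweighting by its marginal likelihood.

```python
import numpy as np

# Grid over the coin's bias theta.
theta = np.linspace(0.001, 0.999, 999)

# Two candidate priors we're unsure between (unnormalised Beta shapes):
prior_a = theta**49 * (1 - theta)**49    # ~ Beta(50, 50): "fair-ish"
prior_b = theta**7 * (1 - theta)**1      # ~ Beta(8, 2):  "biased high"
prior_a /= prior_a.sum()
prior_b /= prior_b.sum()

# 70% credence in prior A, 30% in prior B -> one mixture prior.
mixture = 0.7 * prior_a + 0.3 * prior_b

# Data: 6 heads in 10 flips. Binomial likelihood on the grid.
lik = theta**6 * (1 - theta)**4

# Posterior from the mixture prior...
post = mixture * lik
post /= post.sum()

# ...equals the evidence-reweighted combination of the two posteriors.
post_a = prior_a * lik; ev_a = post_a.sum(); post_a /= ev_a
post_b = prior_b * lik; ev_b = post_b.sum(); post_b /= ev_b
w_a = 0.7 * ev_a / (0.7 * ev_a + 0.3 * ev_b)
combined = w_a * post_a + (1 - w_a) * post_b
assert np.allclose(post, combined)
```

Note that the data themselves shift the weight between the two priors: the prior that explains the observations better (higher marginal likelihood) ends up with more than its initial 70/30 share.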
OK, but this "inference" is not a valid substitute for a logical inference because it produces a different type of result - probabilistic, not certain.
The crucial difference is that statistical inference does not consider any causation; its domain is observations only, and observations alone cannot establish causation in principle.
Correlation is not causation. Substituting a Bayesian inference for a logical inference should result in a Type Error (where are all these static typing zealots when we need them?).
This is, by the way, one of the most important principles - the universe exists; probabilities and numbers do not. Every causation in the universe is due to its laws and related structures and processes. Causation has nothing to do with numbers or observations. This is why so much of modern "science" is a non-reproducible pile of crap.
Any observer is a product of the universe. The Bayesian sect is trying to make it the other way around. Mathematical Tantras of the digital age.
Logical inference is just a special case of more general bayesian inference. Anything you can do with logical inference you can do with bayesian inference. Just imagine the probabilities are 0 and 1. Here's an entire book on the subject: http://bayes.wustl.edu/etj/prob/book.pdf
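The 0-and-1 special case is easy to check directly. A toy sketch (the helper function is illustrative, not from the book): with certain premises, probabilistic updating reproduces modus ponens, and softening a premise softens the conclusion by exactly that amount.

```python
def update(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) via the law of total probability over A and not-A."""
    return p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Modus ponens as degenerate probabilistic inference:
# P(B|A) = 1 ("A implies B") and P(A) = 1 ("A holds") force P(B) = 1.
assert update(1.0, 1.0, 0.0) == 1.0

# Soften the premise and the conclusion softens with it.
assert abs(update(0.9, 1.0, 0.0) - 0.9) < 1e-12
```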
But true logical inference doesn't exist in the real world. Because you can never be 100% sure of anything, not even mathematical facts. It's just an approximation of superior bayesian inference: http://lesswrong.com/lw/mp/0_and_1_are_not_probabilities/
Bayesian inference is just an extension of logical inference to probability distributions. The "correlation is not causation" reaction is off base here.
What you actually want in this context is some code that generates random deviates of probability distributions chosen randomly and a "guesser agent" that tries to guess which distribution was chosen. Then you can ask questions like,
> given some condition on a distribution of distributions, when do we feel that a guesser is taking too long to make a choice?
This is like waiting for a person who is taking too long to identify a color, or for a baby to decide what kind of food it wants. For a certain interval it makes sense, but past a point it becomes pathological.
So for example if we have two distributions,
> uniform distribution on the unit interval [0,1]; uniform distribution on the interval [1,2]
then we get impatient with a guesser who takes longer than a single guess, since we know (with probability 1) that a single guess will do.
Now, if we have two distributions that overlap, say the uniform distributions on [1,3] and [0,2], then we can quantify how long it is likely to take before we know the choice with probability 1, but we cannot say for sure how many observations will be required before any agent capable of processing positive feedback in a neural network can say for certain which one it is. As soon as an observation falls outside the interval (1,2), the guesser can state the answer.
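A minimal simulation of this overlapping-uniforms case (a Python sketch; the distribution pair and stopping rule are taken from the example above). On the overlap (1,2) both densities equal 1/2, so those draws carry no information at all; the first draw outside the overlap settles the question.

```python
import random

random.seed(1)

def guess_source(true_dist):
    """Draw from the secretly chosen uniform until the guesser is certain.

    Candidates: U[1,3] vs U[0,2]. A draw below 1 is impossible under
    U[1,3]; a draw above 2 is impossible under U[0,2]. Either one
    identifies the source with probability 1.
    """
    lo, hi = (1.0, 3.0) if true_dist == "U[1,3]" else (0.0, 2.0)
    n_obs = 0
    while True:
        n_obs += 1
        x = random.uniform(lo, hi)
        if x < 1.0:
            return "U[0,2]", n_obs
        if x > 2.0:
            return "U[1,3]", n_obs

guess, n = guess_source("U[1,3]")
```

Each draw leaves the overlap with probability 1/2, so the number of observations needed is geometric with mean 2 - finite on average, but unbounded in the worst case, which is exactly the "we can't say for sure how many observations" point above.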
Now, things can get more interesting when the distributions are arranged in a hierarchy, say the uniform distribution on finite disjoint unions of disjoint intervals (a,b) where a < b are two dyadic rationals with the same denominator when written in lowest terms.
If a guesser is forced to guess early, before becoming certain of the result, then we can compare ways to guess by computing how often they get the right answer.
Observations now give two types of information: certain distributions can be eliminated with complete confidence (because there exists a positive epsilon such that the probability of obtaining an observation in the epsilon ball is zero) while for the others, Bayes theorem can be used to update a distribution of distributions or several distributions of distributions that are used to drive a guessing algorithm. A guess is a statement of the form "all observations are taken from the uniform distribution on subset ___ of the unit interval".
Example: take the distributions on the unit interval given by the probability density functions 2x and 2-2x. Given a sequence of observations, we can ask: what is the probability that the first distribution was chosen?
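For this last example the posterior is available in closed form from Bayes' theorem (a small sketch; equal prior odds on the two densities are assumed):

```python
import math

def prob_first(xs, prior=0.5):
    """Posterior probability that f1(x) = 2x (rather than f2(x) = 2 - 2x)
    generated the observations xs in [0, 1], via Bayes' theorem."""
    l1 = math.prod(2 * x for x in xs)        # likelihood under f1
    l2 = math.prod(2 - 2 * x for x in xs)    # likelihood under f2
    return prior * l1 / (prior * l1 + (1 - prior) * l2)

# A single observation at 0.5 is equally likely under both densities.
assert prob_first([0.5]) == 0.5

# Observations near 1 strongly favour f1(x) = 2x.
assert prob_first([0.9, 0.8]) > 0.9
```

Unlike the disjoint-support cases above, here no finite sample ever gives certainty; the posterior only converges to 0 or 1 as observations accumulate.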
The answers to these questions can be found in a book like Probability: Theory and Examples.
ptero | 8 years ago
Bayesian inference is an efficient way to track your estimates and uncertainties as you accumulate data.
robterrin | 8 years ago
If you're looking for a place to start I'd go to Andrew Gelman's introduction for the Stan Language: https://www.youtube.com/watch?v=T1gYvX5c2sM
balnaphone | 8 years ago
This paper discusses exactly the scenario you describe.
Houshalter | 8 years ago
But how would you use a Bayesian approach? What exactly are you trying to predict? What are the inputs? What is the model?
jamii | 8 years ago
https://arxiv.org/pdf/1701.02434.pdf
hudibras | 8 years ago
https://www.amazon.com/Bayesian-Analysis-Chapman-Statistical...
Houshalter | 8 years ago
You can even deduce causation from purely observational data. And here's how: http://lesswrong.com/lw/ev3/causal_diagrams_and_causal_model...