Confidence intervals are weird because of their very minimal definition. My favorite confidence interval procedure for iid data demonstrates why you need a tighter definition for a useful interval.
For an 80% confidence interval, draw 5 points (iid, from a continuous distribution). If the last four are all greater than the first one, your CI is the empty set; otherwise it's the whole real number line. The empty-set event is "the first point is the smallest of five iid draws," which has probability exactly 1/5, so the procedure covers the true parameter exactly 80% of the time, no matter what the parameter is.
Once you draw some actual data and get a specific interval, you want to ask about some degree of belief that your specific interval contains the actual parameter. In the case that your CI is all numbers, you know for a fact that it contains the true parameter value. In the case that your CI is the empty set, you know for a fact that it doesn’t contain the true parameter value.
I like this CI procedure because it demonstrates two things. 1) The kind of reasoning that goes forward from an unknown parameter to a random interval is very different from the reasoning that works backwards from a specific interval to the parameter. That asymmetry can be WEIRD. 2) The weirdness is possible if you limit yourself to only the CI definition, meaning that if you want a useful interval, you need something that rules out weird shit like my example.
The properties of specific CI procedures people actually use are generally much much better than what is allowed by the definition of a CI. If you want useful reasoning backwards from the interval, don’t try to reason solely from the definition of a CI.
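A quick Monte Carlo check makes the coverage claim concrete. This is a minimal sketch of the procedure above, with a standard-normal population and a true mean chosen only for illustration; the point is that the 80% coverage holds for any parameter value, even though any specific realized interval is either certainly right or certainly wrong.

```python
import numpy as np

def all_or_nothing_ci(sample):
    """Empty set if the last four points all exceed the first,
    otherwise the whole real line."""
    if np.all(sample[1:] > sample[0]):
        return None                     # empty set: certainly misses theta
    return (-np.inf, np.inf)            # whole line: certainly contains theta

rng = np.random.default_rng(0)
theta = 3.7                             # arbitrary true parameter
n_trials = 100_000
covered = 0
for _ in range(n_trials):
    sample = rng.normal(theta, 1.0, size=5)
    ci = all_or_nothing_ci(sample)
    if ci is not None and ci[0] <= theta <= ci[1]:
        covered += 1

# P(first point is the minimum of 5 iid draws) = 1/5, so coverage = 4/5.
print(covered / n_trials)               # ~0.80, for any theta whatsoever
```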
I'm having trouble understanding your example. What value is the 80% confidence interval for? Is it for the population mean? If so, why does the sample order influence the result?
I posted the following as a comment on the article, but it's stuck in moderation there...
I'm not clear on what it is that you [the post's author] don't understand about interpretation of Bayesian credible intervals.
Both "objective" and "subjective" Bayesians interpret them as degrees of belief - that, for instance, one would use to make bets (supposing, of course, that you have no moral objection to gambling, etc.).
The difference is that "objective" Bayesians think that one can formalize "what one knows" and then create an "objective" prior on that basis, which everyone "with the same knowledge" would agree is correct. I don't buy this. Formalizing "what one knows" by any means other than specifying a prior (which would defeat the point) seems impossible. And supposing one did, there is disagreement about what an "objective" prior based on it would be. To joke: "The best thing about objective priors is that there are so many of them to choose from!"
Many simple examples illustrate that the objective Bayesian framework just isn't going to work. One example is the one-way random effects model, where the prior on the variance of the random effects will sometimes have a large influence on the inference (e.g., on the posterior probability that the overall mean is positive), but where there is no sensible "objective" prior; you just have to subjectively specify how likely it is that the variance is very close to zero. Another, even simpler, example is inference for theta given an observation x ~ N(theta, 1), when it is known (with certainty) that theta is non-negative, and the observed x is -1. There's just no alternative to subjectively deciding how likely a priori it is that theta is close to zero.
Frequentist methods also don't give sensible answers in these examples. Subjective Bayesianism is the only way.
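To see how sharp that second example is, here is a minimal numerical sketch. The two priors below are my own illustrative choices, not the commenter's; both look "uninformative" on [0, inf), yet they give noticeably different posterior probabilities that theta is near zero, which is exactly the subjective judgment that can't be avoided.

```python
import numpy as np
from scipy.stats import norm

x = -1.0                                  # observed value from N(theta, 1)
theta = np.linspace(0.0, 10.0, 20_001)    # theta is known to be non-negative
likelihood = norm.pdf(x, loc=theta, scale=1.0)

# Two subjective priors on [0, inf): both look vague, but they disagree
# about how much mass sits near zero.
priors = {
    "flat on [0, 10]":     np.ones_like(theta),
    "exponential, mean 1": np.exp(-theta),
}

for name, prior in priors.items():
    post = likelihood * prior
    post /= np.trapz(post, theta)         # normalize numerically
    mask = theta <= 0.5
    p_small = np.trapz(post[mask], theta[mask])
    print(f"P(theta <= 0.5 | x = -1), {name}: {p_small:.3f}")
    # ~0.58 under the flat prior vs ~0.73 under the exponential prior
```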
It appears to me that Bayesian probability is somewhat elusive because of its metaphysical underpinning: what do we mean by probability?
If we live in a materialist deterministic world - which many would cite as an axiom for simplicity - then there really is no probability. Everything happens with 100% certainty.
Then, what is probability? If everything will happen with 100% certainty, but probability certainly appears to exist, then probability must reflect something about our information about something occurring.
The author refers to two foundational approaches to our state of knowledge. The first is the objectivist approach, which states that everyone who has the same state of knowledge about a system will evaluate the same probability of something occurring. The second is the subjectivist approach, which states that a given individual with a certain state of knowledge will evaluate some probability of something occurring. To me, these appear to be the same thing except insofar as the former requires a consensus of many while the latter a consensus of one.
The author asks how we might actually define Bayesian probability without resorting to the frequentist approach (i.e. hypothetically simulating many trials of the same event, however infrequent in reality it may be).
First, he says this would mean "interpreting [the credible interval] like a confidence interval". I am no statistician, but is that necessarily true? I don't see why confidence intervals would suddenly emerge in order to interpret a credible interval.
Second, I am not sure the frequentist interpretation is so problematic. When we interpret the plain-English definition of a probability, are we not mentally simulating repeated trials in order to evaluate something's occurrence? What else could a probability imply? If something has a 20% chance of occurring, then it does not occur 80% of the time, and so we must envision 80% of universes (part of the hypothetical trials) where it does not occur. I don't see any other way around this, frequentist or not.
(Note: I am not a statistician, while the author is, and the above is simply my layman's understanding of the article.)
There's no definition of probability that doesn't involve philosophy or metaphysics [0]. Calling frequentist stats "objective" really bugs me. There is no such thing. Every inference procedure involves subjective choices.
Of course, frequentist and Bayesian stats are completely mathematically equivalent. The choice just affects our mental patterns.

[0] https://en.wikipedia.org/wiki/Probability_interpretations#Ph...
It seems a Bayesian interpretation of probability is more general. The frequency of events over an infinite number of trials is one way of interpreting probability for things that can be repeated. But that doesn't make sense for an election that is only going to happen once, and yet one still wants to quantify uncertainty in such situations.
You’re correct that probability still works in a hypothesized deterministic universe. It’s a point that’s all too often forgotten, causing discussions to go down unnecessary rabbit holes about the foundations of quantum mechanics when discussing the roll of a six-sided die.
Statisticians and mathematicians have gone very far down the path you’re discussing, and you might be interested in some sets of axioms that have come up around probability and relaxations of true/false logic.
The Kolmogorov axioms [0] are the “standard” probability axioms, and are phrased in terms of set theory and measure theory (not requiring any mention of physics or a physical universe!).
There are other ways to quantify degree of belief, however, and they are very interesting. Apparently Cox’s theorem [1] justifies a popular probability framework for Bayesians. But there are many more interesting ways to do degree of belief, like Dempster-Shafer theory [2], which I understand to be a plausibility calculus.
Everybody seems to find a single system and decide it’s the only one out there.

[0] https://en.wikipedia.org/wiki/Probability_axioms
[1] https://en.wikipedia.org/wiki/Cox's_theorem
[2] https://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory
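For a taste of what a plausibility calculus looks like in practice, here is a minimal sketch of Dempster's rule of combination; the weather frame and the mass numbers are invented purely for illustration. Unlike a Kolmogorov probability, belief mass can sit on non-singleton sets, expressing ignorance directly.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal elements
    are frozensets, renormalizing away the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, w1), (b, w2) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2           # mass that lands on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Frame of discernment: {rain, sun}.
rain, sun = frozenset({"rain"}), frozenset({"sun"})
either = rain | sun
source1 = {rain: 0.6, either: 0.4}        # the 0.4 is uncommitted belief
source2 = {sun: 0.3, either: 0.7}

print(combine(source1, source2))
# roughly: rain 0.51, sun 0.15, {rain, sun} 0.34
```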
> If we live in a materialist deterministic world - which many would cite as an axiom for simplicity - then there really is no probability. Everything happens with 100% certainty.
I don't know who "the many" are - but I thought determinism had already been disproved.
I am not a physicist so I will not go into quantum mechanics - but I will take a simple example from Science Fiction, and that is the Temporal Paradox. https://en.wikipedia.org/wiki/Temporal_paradox
There seems to be a nice duality between Bayesian and Frequentist inference [1]:
Assume that both the system state and the observation are drawn from some joint probability distribution.
There is some function γ of the system state which we seek to estimate. The experimenter applies some decision procedure d to the observation to get their result.
A Frequentist will analyze the situation by conditioning on the model parameter θ. As a result, we get a single target value γ and probability distributions for the observation and decision, depending on θ.
If d results in an interval, the Frequentist calculates the confidence level as the probability that the decision procedure d produces an interval containing γ, under worst-case assumptions for θ. Unbiasedness of the decision procedure means that γ is the function d estimates best: d is not a better estimator for any other function γ'(θ).
A Bayesian, on the other hand, will condition the joint distribution on the observation. Consequently, γ is a random variable, while the observation and decision are known.
If d is an interval, its credibility is the probability that γ is within this interval, given the observation. Optimality of the decision procedure means that no other estimator d' produces better results.
[1] S. Noorbaloochi, Unbiasedness and Bayes Estimators, users.stat.umn.edu/~gmeeden/papers/bayunb.pdf
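A small simulation shows the two conditionings side by side. The conjugate normal-normal model and every number below are my own illustrative choices, not from the linked paper: fixing θ and averaging over data gives the frequentist coverage of an interval, while fixing the observation and using the posterior gives its Bayesian credibility, and the two evaluations of the very same interval need not agree.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
tau = 2.0    # joint model: theta ~ N(0, tau^2), x | theta ~ N(theta, 1)

# Frequentist conditioning: fix theta; the data (hence the interval) is random.
theta_fixed = 1.3
x = rng.normal(theta_fixed, 1.0, size=100_000)
covered = (x - 1.96 <= theta_fixed) & (theta_fixed <= x + 1.96)
print("frequentist coverage:", covered.mean())       # ~0.95 for every theta

# Bayesian conditioning: fix the observation; theta is random (posterior).
x_obs = 0.7
post_var = 1.0 / (1.0 + 1.0 / tau**2)                # conjugate normal update
post_mean = post_var * x_obs
post = norm(post_mean, np.sqrt(post_var))
# Credibility of the *same* interval x_obs +/- 1.96, as a posterior probability:
print("Bayesian credibility:", post.cdf(x_obs + 1.96) - post.cdf(x_obs - 1.96))
# ~0.97 here: one interval, two conditionings, two different numbers.
```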
> Unbiasedness of the decision procedure means that γ is the function d estimates best: d is not a better estimator for any other function γ'(θ).
Are you suggesting that unbiased estimators are necessarily better than biased ones? If so, check out Stein’s phenomenon for a counterexample. It’s common for biased estimators to dominate unbiased ones in terms of error rates. That’s where the bias-variance trade-off in ML comes from.

https://en.wikipedia.org/wiki/Stein's_example
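Here is a minimal sketch of Stein's phenomenon itself; the dimension, seed, and shrinkage toward the origin are my own choices. In 10 dimensions the biased James-Stein estimator has strictly lower total squared error than the unbiased MLE, and this holds for any fixed true mean vector.

```python
import numpy as np

rng = np.random.default_rng(42)
d, n_trials = 10, 50_000
theta = rng.normal(0.0, 1.0, size=d)      # some fixed true mean vector

sse_mle, sse_js = 0.0, 0.0
for _ in range(n_trials):
    x = theta + rng.normal(0.0, 1.0, size=d)         # one draw of N(theta, I)
    # MLE (unbiased): x itself. James-Stein: shrink x toward the origin.
    shrink = max(0.0, 1.0 - (d - 2) / np.dot(x, x))  # positive-part variant
    js = shrink * x
    sse_mle += np.sum((x - theta) ** 2)
    sse_js += np.sum((js - theta) ** 2)

print("MLE risk:        ", sse_mle / n_trials)       # ~ d = 10
print("James-Stein risk:", sse_js / n_trials)        # strictly smaller
```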
Is this at all related to the debate around 538's use of probability in their forecasts? I've been peeking at some of that debate and I'm curious how it will turn out.
This article kind of helps in establishing that it is a hard question to answer. Clearly harder with intervals.
I can't help but think much of this gets overcomplicated because we don't take everything in intervals. In large part because it is hard, yes; but we should be more comfortable with things not being known to an exact value.