top | item 38749833

vismwasm | 2 years ago

I never really got Bayesian statistics to be honest.

- When the sample size grows, frequentist and Bayesian point estimates (if the prior is not too restrictive) seem to converge to each other anyway

- The distribution of your point estimate (frequentist) vs. the estimated posterior distribution (Bayesian) doesn't seem to differ much either

- When the sample size is small the Bayesian prior dominates

- Interestingly, when I see Bayesians simulate random data (to introduce the concepts on this data) they usually assume a true parameter value. E.g. when sampling from Y = a + b * X + e, they'll assume fixed, true values of a and b and not random variables - which is a frequentist assumption! So far I've never seen e.g. b being sampled from Normal(mu=2, sigma=1) instead of just setting b=2.

- The frequentist assumption of a true population value which we try to estimate just makes sense to me. For example there is a true mean income over the working population. It's not a random variable but a fixed value which can be computed if we just asked every single working person for their income and then compute the mean over all values.

I tried getting into Bayesian stats but honestly it just seems overkill for most cases. For a simple regression, computing b_hat = inv(X'X)X'Y is just faster and easier than numerically sampling traces. Bayesian stats forces you to think about the data generating process - I appreciate that, but you need to do the same with frequentist stats, it's just a little less obvious.
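The convergence point in the first bullet can be sketched numerically. A minimal example (all numbers made up): estimating a normal mean with a deliberately off-center conjugate prior. For small n the prior drags the Bayesian estimate away from the sample mean; for large n the two agree.

```python
import random
import statistics

random.seed(0)
true_mu, sigma = 5.0, 2.0          # true population mean and (known) sd
prior_mu, prior_sd = 0.0, 10.0     # weak prior, deliberately off-center

for n in (5, 50, 5000):
    data = [random.gauss(true_mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(data)  # frequentist point estimate

    # conjugate normal-normal posterior mean (known sigma):
    # a precision-weighted average of the sample mean and the prior mean
    w = (n / sigma**2) / (n / sigma**2 + 1 / prior_sd**2)
    post_mu = w * xbar + (1 - w) * prior_mu

    print(n, round(xbar, 3), round(post_mu, 3))
```

With n=5 the posterior mean sits noticeably below the sample mean (the prior is centered at 0); by n=5000 the weight w is essentially 1 and the two estimates coincide.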


enriquto|2 years ago

> When sample size grows, frequentist and bayesian [...] estimates seem to converge to each other anyway

Yes. And so? Bayesians would argue (and I quote) that "the interesting limit in statistics is when the number of samples tends to one. The limit when the number of samples tends to infinity is completely useless."

> I tried getting into Bayesian stats but honestly it just seems overkill for most cases.

There are 3 black balls and 7 white balls in an opaque bag. How likely is it to pick a black ball? Bayesian statistics gives a straightforward answer (you just assume an uninformative prior and perform a computation). But frequentist statistics starts to argue about an infinite number of replicas of your own universe and other nonsensical constructions. Not sure that the Bayesian approach is overkill in that case...
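The bag example can be made concrete. A tiny sketch (numbers from the comment) contrasting the direct probability computation with the long-run-frequency reading:

```python
import random
from fractions import Fraction

random.seed(1)
bag = ["black"] * 3 + ["white"] * 7

# Direct computation from the known composition of the bag
p_black = Fraction(bag.count("black"), len(bag))
print(p_black)  # 3/10

# The long-run-frequency reading: repeat the draw many times
draws = [random.choice(bag) for _ in range(100_000)]
print(draws.count("black") / len(draws))  # close to 0.3
```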

Dylan16807|2 years ago

> Yes. And so? Bayesians would argue (and I quote) that "the interesting limit in statistics is when the number of samples tends to one. The limit when the number of samples tends to infinity is completely useless."

The "and so?" is answered right after that. The prior dominates, which is a bad thing.

nextos|2 years ago

Bayesian statistics, the way Andrew Gelman practices it, comes naturally when you are interested in generative models of data. You can still use maximum likelihood estimates, but these become fragile when you have hierarchical / multilevel models.

Multilevel models are fantastic to address a problem that is often ignored by frequentist approaches, the need for shrinkage and information sharing. This pops up all the time in modern statistics. For example, if you test 1000 hypotheses, calculating p-values and adjusting these with some multiplicity correction scheme is not sufficient.

You should borrow information across random variables with a multilevel model to avoid estimating exaggerated effects in tests whose outcome is deemed to be significant. Andrew Gelman's post is concerned with this topic.

Another point is that Gelman et al. use weakly informative hyperpriors. These are not really subjective. If anything, they usually regularize solutions by pushing effects towards zero. Plus, in multilevel models, priors are only needed on hyperparameters.
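A rough sketch of the shrinkage idea (all numbers hypothetical): empirical-Bayes partial pooling of 1000 noisy effect estimates toward zero, which reins in the exaggerated top effects:

```python
import random
import statistics

random.seed(2)
n_tests, se = 1000, 1.0
# hypothetical setup: true effects are mostly small, observations are noisy
true = [random.gauss(0, 0.5) for _ in range(n_tests)]
obs = [t + random.gauss(0, se) for t in true]

# empirical-Bayes estimate of the between-test variance tau^2
tau2 = max(statistics.pvariance(obs) - se**2, 0.0)
shrink = tau2 / (tau2 + se**2)  # pooling factor in (0, 1)
shrunk = [shrink * x for x in obs]

# the largest observed effect is exaggerated; shrinkage pulls it back
big = max(range(n_tests), key=lambda i: abs(obs[i]))
print(round(obs[big], 2), round(shrunk[big], 2), round(true[big], 2))
```

This is the "winner's curse" correction the post is about: the test with the most extreme observed effect almost certainly got there partly by luck, and borrowing information across all 1000 tests deflates it accordingly.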

SubiculumCode|2 years ago

I use mixed level models for longitudinal analysis pretty regularly. There the point has been to account for correlated dependent observations (e.g. repeated measures within a participant).

However, it seems that you are suggesting another use. If I have 10 cognitive measures, each measured once in my subjects, the default has been to do a multiple comparison correction, either FDR or FWER, on 10 tests. We know that the 10 tests are not truly independent, so Bonferroni is probably too conservative.

It seems here you suggest running this with test as a random effect. I've seen this approach with item-level data in a task, but I didn't really think to do it when the tests are not from the same battery or construct. And more to the point, the fixed effect in this model would be of no particular interest, while random effect CIs are difficult to estimate. So I am left a bit confused.

im3w1l|2 years ago

I think the attraction of Bayesianism is kind of philosophical / aesthetic: it is a principled, sound, and beautiful approach. It's kinda nice that it extends and translates Occam's razor into numbers.

Yes, frequentist statistics works very well in practice, but it's a bit ad hoc and suffers from various problems - like, say, if you estimate velocity and then estimate kinetic energy, you get values that are incompatible with each other, which is ugly and non-intuitive and makes you want to dig deeper into how such a thing happened.

Bayesianism has the answers.

Also sometimes it really does matter like in medicine, where some conditions have a very low prior probability.
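The medical case is just Bayes' rule. A minimal sketch with made-up numbers (1-in-1000 prevalence, a 99% sensitive and 99% specific test):

```python
# hypothetical numbers: rare condition, very good test
prior = 0.001        # prevalence: P(disease)
sens, spec = 0.99, 0.99

# total probability of a positive test: true positives + false positives
p_pos = sens * prior + (1 - spec) * (1 - prior)

# Bayes' rule: P(disease | positive test)
posterior = sens * prior / p_pos
print(round(posterior, 3))  # ~0.09: most positives are false positives
```

Even with a 99%-accurate test, a positive result only raises the probability of disease to about 9%, because the low prior dominates.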

jgalt212|2 years ago

> but it's a bit adhoc

That's how many people feel about Bayesian methods when trying to pick an initial prior.

pks016|2 years ago

> The distribution of your point estimate (frequentist) vs. the estimated distribution (bayesian)

Ideally one should use the whole posterior distribution of your model parameters, which point estimates don't give you.

>So far I've never seen

Because people are lazy.

Bayesian works great if you have great knowledge of your field and you can fine-tune everything. Frequentist stats just works and is easily interpretable, but it's easy to make mistakes, especially when starting out.

ivansavz|2 years ago

> Ideally one should use the whole posterior distribution of your model parameters which is not the case for point estimates.

This is a historical issue, due to some hard-headed frequentist founders, but nowadays the frequentist concept of a confidence distribution is gaining acceptance. It is the proper frequentist equivalent of the posterior, so this distinction between Bayesian and frequentist is disappearing.

Rather than giving specific point estimates or interval estimates, calculating a frequentist confidence distribution allows you to compute confidence intervals for all possible confidence levels, just like the posterior does. See this excellent review paper for more info on this: https://statweb.rutgers.edu/mxie/RCPapers/insr.12000.pdf

The key insight is that a confidence distribution is an estimator for the parameter of interest, rather than an inherent distribution of the parameter.
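A minimal sketch of the idea (hypothetical data, treating the sd as known and using a normal confidence distribution for the mean): one distribution object yields an interval at every confidence level, much like a posterior would.

```python
from statistics import NormalDist, fmean, stdev

# hypothetical sample
data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.2, 4.7]
n, xbar, s = len(data), fmean(data), stdev(data)

# confidence distribution for the mean: N(xbar, s / sqrt(n))
cd = NormalDist(mu=xbar, sigma=s / n**0.5)

# intervals at any confidence level come from the same object's quantiles
for level in (0.50, 0.90, 0.99):
    a = (1 - level) / 2
    print(level, round(cd.inv_cdf(a), 3), round(cd.inv_cdf(1 - a), 3))
```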

KRAKRISMOTT|2 years ago

The Bayesian formula is dreadfully useful in machine learning, in the modeling of generative problems. However, because the required integrals are often computationally intractable, we usually have to use approximations instead of exact Bayesian inference.
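One common workaround is to replace the intractable normalizing integral with something discrete. A minimal sketch using grid approximation for a Bernoulli parameter (made-up data, uniform prior):

```python
# hypothetical data: 7 successes in 10 Bernoulli trials, uniform prior on p
k, n = 7, 10
grid = [i / 1000 for i in range(1, 1000)]

# unnormalized posterior = likelihood * prior; the normalizing integral
# is replaced by a discrete sum over the grid
unnorm = [p**k * (1 - p)**(n - k) for p in grid]
z = sum(unnorm)
post = [u / z for u in unnorm]

post_mean = sum(p * w for p, w in zip(grid, post))
print(round(post_mean, 3))  # close to the exact answer (k+1)/(n+2) = 8/12
```

Here the exact posterior happens to be known (a Beta distribution), which makes it easy to check that the approximation works; in real generative models no such closed form exists, which is exactly why MCMC and variational approximations are the norm.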

clircle|2 years ago

> Interestingly, when I see Bayesians simulate random data (to introduce the concepts on this data) they usually assume a true parameter value. E.g. when sampling from Y = a + b * X + e, they'll assume fixed, true values of a and b and not random variables - which is a frequentist assumption! So far I've never seen e.g. b being sampled from Normal(mu=2, sigma=1) instead of just setting b=2.

The Bayesian philosophy of "random parameters" does not mean that Bayesian methods cannot be assessed for frequentist properties or compared against frequentist procedures.
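One way to see this: simulate under a fixed "true" parameter, exactly as the parent describes, and check the frequentist coverage of a Bayesian credible interval. A rough sketch (all settings made up, grid-based posterior under a uniform prior):

```python
import random

def credible_interval(k, n, level=0.90, m=1000):
    """Equal-tailed credible interval for a Bernoulli p, uniform prior,
    computed from a grid approximation of the posterior."""
    grid = [(i + 0.5) / m for i in range(m)]
    w = [p**k * (1 - p)**(n - k) for p in grid]
    z = sum(w)
    lo = hi = None
    acc = 0.0
    for p, wi in zip(grid, w):
        acc += wi / z
        if lo is None and acc >= (1 - level) / 2:
            lo = p
        if hi is None and acc >= 1 - (1 - level) / 2:
            hi = p
    return lo, hi if hi is not None else grid[-1]

random.seed(3)
true_p, n, reps = 0.3, 50, 1000    # fixed "true" parameter, as in the comment
hits = 0
for _ in range(reps):
    k = sum(random.random() < true_p for _ in range(n))
    lo, hi = credible_interval(k, n)
    hits += lo <= true_p <= hi
print(hits / reps)  # frequentist coverage, close to the nominal 0.90
```

The credible interval is a Bayesian object, but "how often does it contain the fixed true value across repeated samples?" is a purely frequentist question, and it's a perfectly reasonable way to evaluate the Bayesian procedure.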