top | item 39485160

jawarner | 2 years ago

Isn't that Edwin T. Jaynes example just p-hacking? If only 1 out of 100 experiments produces a statistically significant result, and you only report the one, I would intuitively consider that evidence to be worth less. Can someone more versed in Bayesian statistics better explain the example?

skulk|2 years ago

I find the original discussion to be far more interesting than whatever I just read in TFA: https://books.google.com.mx/books?id=sLz0CAAAQBAJ&pg=PA13&lp...

abeppu|2 years ago

> One who thinks that the important question is: "Which quantities are random?" is then in this situation. For the first researcher, n was a fixed constant, r was a random variable with a certain sampling distribution. For the second researcher, r/n was a fixed constant (approximately), and n was the random variable, with a very different sampling distribution. Orthodox practice will then analyze the two experiments in different ways, and will in general draw different conclusions about the efficacy of the treatment from them.

But so then the data _are_ different between the two experiments, because they were observing different random variables -- so why is it concerning if they arrive at different conclusions? In fact, the _fact that the 2nd experiment finished_ is also an observation on its own (e.g. if the treatment was in fact a dangerous poison, perhaps it would have been infeasible for the 2nd researcher to reach their stopping criteria).
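This is the crux of Jaynes's example: researcher 1 has a binomial sampling distribution (n fixed, r random), researcher 2 a negative binomial one (r fixed, n random), yet the two likelihood functions differ only by a constant that doesn't depend on theta. A minimal sketch, with made-up numbers (n = 100 trials, r = 70 cures for both researchers):

```python
from math import comb

# Hypothetical data shared by both researchers: 70 cures in 100 trials.
n, r = 100, 70

def binom_lik(theta):
    # Researcher 1: n fixed in advance, r random (binomial sampling distribution).
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def negbinom_lik(theta):
    # Researcher 2: stopped on the r-th cure, which happened on trial n
    # (negative binomial sampling distribution).
    return comb(n - 1, r - 1) * theta**r * (1 - theta)**(n - r)

# The two likelihoods differ only by a constant factor, independent of theta,
# so every likelihood ratio -- and hence every Bayesian posterior -- is identical.
print(binom_lik(0.5) / negbinom_lik(0.5))  # same constant...
print(binom_lik(0.8) / negbinom_lik(0.8))  # ...for every theta
```

Frequentist tail-area calculations, by contrast, sum over *unobserved* data points, and those hypothetical tails genuinely differ between the two sampling distributions, which is how the same observed data can be "significant" under one stopping rule and not the other.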

usgroup|2 years ago

Yeah, generally Jaynes's book is very nice and easy to read for this sort of material.

Terr_|2 years ago

I think the point is that the different planned stopping rules of each researcher--their subjective thoughts--should not affect what we consider the objective or mathematical significance of their otherwise-identical process and results. (Not unless humans have psychic powers.)

It's illogical to deride one of those two result-sets as telling us less about the objective universe just because the researcher had a different private intent (e.g. "p-hacking") for stopping at n=100.

_________________

> According to old-fashioned statistical procedure [...] It’s quite possible that the first experiment will be “statistically significant,” the second not. [...]

> But the likelihood of a given state of Nature producing the data we have seen, has nothing to do with the researcher’s private intentions. So whatever our hypotheses about Nature, the likelihood ratio is the same, and the evidential impact is the same, and the posterior belief should be the same, between the two experiments. At least one of the two Old Style methods must discard relevant information—or simply do the wrong calculation—for the two methods to arrive at different answers.

lalaithion|2 years ago

If you have two researchers, and one is "trying" to p-hack by repeating an experiment with different parameters, and one is trying to avoid p-hacking by preregistering their parameters, you might expect the paper published by the latter one to be more reliable.

However, if you know that the first researcher just happened to get a positive result on their first try (and therefore didn't actually have to modify parameters), Bayesian math says that their intentions didn't matter, only their result. If, however, they did 100 experiments and chose the best one, then their intentions... still don't matter! but their behavior does matter, and so we can discount their paper.

Now, if you _only_ know their intentions but not their final behavior (because they didn't say how many experiments they did before publishing), then their intentions matter because we can predict their behavior based on their intentions. But once you know their behavior (how many experiments they attempted), you no longer care about their intentions; the data speaks for itself.
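The "100 experiments, report the best one" behavior is easy to quantify. A toy simulation (under the standard assumption that p-values are uniform on [0, 1] when the null is true):

```python
import random

random.seed(0)
ALPHA = 0.05
TRIALS = 10_000

def experiment_p_value():
    # Stand-in for one null experiment: under H0 the p-value is Uniform(0, 1).
    return random.random()

# Preregistered researcher: one experiment, reported no matter what.
honest_hits = sum(experiment_p_value() < ALPHA for _ in range(TRIALS))

# Best-of-100 researcher: runs 100 experiments, reports only the smallest p.
hacked_hits = sum(
    min(experiment_p_value() for _ in range(100)) < ALPHA
    for _ in range(TRIALS)
)

print(honest_hits / TRIALS)  # ~0.05, the nominal false-positive rate
print(hacked_hits / TRIALS)  # ~0.99, i.e. 1 - 0.95**100
```

This matches the comment's point: once you condition on the *behavior* (100 attempts, best reported), the discounting falls out of the math directly; the private intention adds nothing further.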

usgroup|2 years ago

Well, no, because it's talking about either a fixed sample size or stopping when a given percentage of cures is reached. Neither necessarily implies a favourable p-value.

I think the author means that the two methods incidentally collected equivalent data yet may draw different conclusions because of their initial assumptions. The question is how to make coherent sense of that.

At level 1 depth it’s insightful.

At level 2 depth it’s a straw man.

At level 3 depth, just keep drinking until you’re back at level 1 depth.

tech_ken|2 years ago

> The other ... decided he would not stop until he had data indicating a rate of cures definitely greater than 60%

I believe that "definitely greater than 60%" is supposed to mean the researcher stops once the p-value for their alternative HA (theta >= 60%) drops below alpha, so an optional stopping (i.e. "p-hacking") situation.
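Under that reading, the inflation from optional stopping is easy to demonstrate. A sketch, assuming a fair coin (so the null is true) and a one-sided normal-approximation test checked after every flip, with arbitrary choices of a minimum n of 30 and a cap of 1000 flips:

```python
import random
from math import sqrt, erf

random.seed(1)

def p_value(heads, n):
    # One-sided normal approximation for H0: theta = 0.5.
    z = (heads - n / 2) / sqrt(n / 4)
    return 0.5 * (1 - erf(z / sqrt(2)))  # P(Z > z)

def optional_stopping_trial(max_n=1000, alpha=0.05):
    # H0 is true (fair coin); stop as soon as the running test looks significant.
    heads = 0
    for n in range(1, max_n + 1):
        heads += random.random() < 0.5
        if n >= 30 and p_value(heads, n) < alpha:
            return True  # declared "significant" despite H0 being true
    return False

rate = sum(optional_stopping_trial() for _ in range(2000)) / 2000
print(rate)  # well above the nominal 0.05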