It would be interesting to know how reproducible the reproducibility test turns out to be.
Edit: Just to expand a bit --
Suppose I flip a fair coin 100 times, and publish each result as its own god-given Truth. Now somebody comes along and questions my 100 Truths, so they flip a coin 100 times; the expectation value is that 50 of my Truths reproduce. Naturally there is a sigma (sqrt(100 × 0.5 × 0.5) = 5), so the reproducibility study shows 50 ± 5.
The linked reproducibility study's 39 is about two sigma below that expectation. I realize that's numerology, but kinda funny. Nevertheless, my original point: the reproducibility study should be run several times to understand the random nature of the results. (Do the same studies reproduce their results? Or is it a different 50-ish set each time?)
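For what it's worth, here's a quick simulation of that fair-coin baseline in Python (the model and numbers are purely illustrative, not anything from the actual study):

    # Baseline: if each "finding" were a fair coin flip, how many of 100
    # would a second set of 100 flips "reproduce" by chance?
    import numpy as np

    rng = np.random.default_rng(0)
    trials = 10_000

    original = rng.integers(0, 2, size=(trials, 100))     # my 100 published flips
    replication = rng.integers(0, 2, size=(trials, 100))  # the skeptic's flips
    matches = (original == replication).sum(axis=1)       # count of "reproduced" Truths

    # Binomial(100, 0.5): mean 50, sd = sqrt(100 * 0.25) = 5
    print(matches.mean(), matches.std())  # ~50.0, ~5.0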
What you're asking is basically the concept of statistical power. Assume the original study found an effect size E, and we take that as the truth. How likely is the replication attempt to find a statistically significant effect?
The Reproducibility Project calculated the sample sizes necessary in advance, so if the effects are the size the original researchers claim, they'd have good power to detect them.
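As a sketch of that kind of pre-study calculation (using statsmodels, with effect size, alpha, and power levels of my own choosing, not necessarily theirs):

    # Per-group sample size needed to detect a claimed effect of d = 0.5
    # with 80% power at alpha = 0.05, for a two-sample t-test.
    from statsmodels.stats.power import TTestIndPower

    n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(round(n))  # ~64 participants per group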
Their power could be worse than they expect, though; pioneering studies tend to overestimate effect sizes, because their sample sizes are too small and they filter for statistical significance. I call the problem truth inflation: http://www.statisticsdonewrong.com/regression.html#truth-inf...
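A minimal simulation of that inflation effect (the true effect size and sample size here are assumptions for illustration):

    # "Truth inflation": small studies filtered for p < 0.05 systematically
    # overestimate the effect they set out to measure.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    true_effect, n = 0.3, 20   # modest true effect, small per-group sample

    published = []
    for _ in range(10_000):
        a = rng.normal(true_effect, 1.0, n)
        b = rng.normal(0.0, 1.0, n)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:                       # only significant results "publish"
            published.append(a.mean() - b.mean())

    print(np.mean(published))  # well above 0.3: the published effects are inflated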
Anyway: understanding the random nature of results is exactly the job of statistics, and the Reproducibility Project researchers are being very careful with their statistics.
tl;dr: Efforts to reproduce 100 psychological findings: only 39 were reproducible; 61 were not.
“A lot of working scientists assume that if it’s published, it’s right,” says Hal Pashler. “This makes it hard to dismiss that there are still a lot of false positives in the literature.”
* "Of the 61 non-replicated studies, scientists classed 24 as producing findings at least “moderately similar” to those of the original experiments, even though they did not meet pre-established criteria"
* "Daniele Fanelli, who studies bias and scientific misconduct at Stanford University in California, says ... psychology does not necessarily lag behind ... other sciences. ... earlier studies have suggested that reproducibility rates in cancer biology and drug discovery could be even lower"
There seem to be two factors at play here that are creating almost nonsensical results.
1. The results, when filtered through the replicability guidelines, rendered a clear verdict: 61 to 39 against.
2. The question of "how closely did the findings resemble the original study?" flips the verdict. Counting the 24 "moderately similar" non-replications alongside the 39 replications, similar findings are the majority: 63 to 37 in the other direction.
How can a study have "virtually identical" findings and yet not replicate the original?
It should be pointed out that just as publishing 100 findings doesn't mean all 100 are true, failing to reproduce 61 doesn't mean all 61 are false.
Thanks for that. That's Psychology in a nutshell for you folks, this amazing "science". Now let's wait for a similar study for economics, the other amazing "science".
Great to see psychology going after this in an open forum. The current publishing systems are likely to generate some false positives even assuming no bad actors. Replication by independent 3rd parties is a great way to confirm important results. I wish the nutrition community would do this for diets and nutrition before changing the guidelines all the time.
They've been transparent from the beginning, by posting study protocols and analysis plans. You can think of this like posting a physics preprint on the arXiv before peer review. Nothing wrong with that.