Statistics Done Wrong – The woefully complete guide

331 points| bowyakka | 12 years ago |refsmmat.com | reply

70 comments

[+] capnrefsmmat|12 years ago|reply
Hey everyone, I'm the author of this guide. It's come full circle -- I posted it a week ago in a "what are you working on?" Ask HN post, someone posted it to Metafilter and reddit, and it made its way to Boing Boing and Daily Kos before coming back here.

I'm currently working on expanding the guide to book length, and considering options for publication (self-publishing, commercial publishers, etc.). It seems like a broad spectrum of people find it useful. I'd appreciate any suggestions from the HN crowd.

(A few folks have already emailed me with tips and suggestions. Thanks!)

(Also, I'm sure glad I added that email signup a couple weeks ago)

[+] craigyk|12 years ago|reply
As a scientist I think you are addressing a very important problem with this book. I've taken two statistics classes, one at graduate level, and even I am plagued with doubt as to whether the statistics I've used have all been applied and interpreted "correctly". That said, I think the recent spate of "a majority of science publications are wrong" stories is incredible hyperbole. Is it the raw data that is wrong (fabricated)? The main conclusions? One or two minor side points? What if the broad strokes are right but the statistics are sloppy?

People also need to realize that while the Discussion and Conclusion sections of publications may often read like statements of truth, they're usually just a huge lump of spinoff hypotheses in prose form. Despite my frequent frustrations with the ways science could be better, the overall arrow of progress points in the right direction. Science isn't a process whose goal is to ensure that 100% of what gets published is correct, but one whereby previous assertions can be refuted and corrected.

Edit:

To be more specific, I think the statement in your Introduction is overly critical: "The problem isn’t fraud but poor statistical education – poor enough that some scientists conclude that most published research findings are probably false". I would change it to say: "conclude that most published research findings contain (significant) errors", or something along those lines.

[+] edwintorok|12 years ago|reply
I like how you present various ways that you can make mistakes with statistics.

One thing that's missing is a Summary/Checklist chapter that tells you what you SHOULD do in a few common scenarios to avoid all the mistakes presented in the previous chapters. I know it's not that simple, and it depends a lot on how you are testing and what you're actually trying to achieve, but a few examples wouldn't hurt.

For example: I have two sets of measurements, and I want to know:

* is there a statistically significant difference between them?
* if yes, how big is the difference?

A somewhat simplistic way I'd do that is a two-sample t-test for question #1, and computing statistics for the difference between the samples (mean, median, confidence intervals) for question #2. But doing just that, I might've already committed some of the mistakes that your site warns about; for example, I completely disregarded the power of the test.
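
Concretely, a sketch of what I mean (invented sample numbers, assuming SciPy and statsmodels are available), including the power check that's easy to forget:

```python
# Sketch of the two-question workflow: (1) is there a significant
# difference, (2) how big is it -- plus a retrospective power check.
# The sample data here is made up purely for illustration.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

a = np.array([5.1, 4.9, 5.0, 5.2, 4.8])
b = np.array([5.5, 5.7, 5.6, 5.4, 5.8])

# Question 1: Welch's t-test (doesn't assume equal variances)
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)

# Question 2: effect size (difference in means) with an approximate 95% CI
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
ci = (diff - 1.96 * se, diff + 1.96 * se)

# The easy-to-forget step: what power did this design actually have?
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd
power = TTestIndPower().solve_power(effect_size=cohens_d,
                                    nobs1=len(a), alpha=0.05)
print(p_value, diff, ci, power)
```
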

FWIW I like parts of this book, which focuses on statistics in the domain of computer systems / networks, although it is rather too long: http://perfeval.epfl.ch/

[+] mey|12 years ago|reply
Is there any way to print the entire book as a document? Do you accept donations?
[+] chris_va|12 years ago|reply
I've been reading this on/off for the last day. One random UX suggestion: Don't use black for your text, use something close (e.g. #333).

There are a million other UX tips that you can probably get from a real expert, but the black one I noticed.

[+] chris_va|12 years ago|reply
Now if you could just send a copy of the book to every newspaper in the US...

I would donate to that goal.

[+] jrochkind1|12 years ago|reply
awesome, thanks for writing this, and for whoever posted it here.
[+] Tycho|12 years ago|reply
If I was a billionaire, I would set up some sort of screening lab for scientific/academic/research papers. There would be a statistics division for evaluating the application of statistical methods being used; a replication division for checking that experiments do actually replicate; and a corruption division for investigating suspicious influences on the research. It would be tempting to then generate some sort of credibility rating for each institution based on the papers they're publishing, but that would probably invite too much trouble, so best just to publish the results and leave it at that.

Arguably this would be a greater benefit to humanity than all the millions poured charitably into cancer research etc.

[+] davmre|12 years ago|reply
Something like that idea has actually already been the inspiration for at least one startup: MetaMed (http://en.wikipedia.org/wiki/MetaMed, http://nymag.com/health/bestdoctors/2013/metamed-personalize...) does meta-level analysis of the medical literature to determine which treatments seem effective for rare conditions, taking into account the sample size, statistical methodology, funding sources, etc. of each study.

Of course, medicine might be unique as a domain in which individuals are willing to pay vast sums of money to obtain slightly more trustworthy research conclusions, and the profit motive has obvious conflicts with "benefit to humanity" (if someone pays you to research a treatment for their disease, do you post the findings when done? Or hold them privately for the next person with the same problem?). But maybe there are other domains in which the market could support a (non-billionaire's) project for better-validated research.

[+] neoterics|12 years ago|reply
Doesn't the Cochrane Collaboration already do something similar?

They perform meta-analyses of studies and talk about the validity of their statistical methods. http://summaries.cochrane.org/

Read about it in the book Bad Science by Ben Goldacre.

[+] bjoernd|12 years ago|reply
SIGMOD (a database research conference) has set up a reproducibility committee [1]. Their goal is to ensure that the results can be reproduced by someone from the outside. If they succeed, you get an additional label for your graphs saying "Approved by the SIGMOD reproducibility committee."

Notably, this is easier in computer science as you don't need to wait for hundreds of patients to turn up having a certain condition.

[1] http://www.sigmod.org/2012/reproducibility.shtml

[+] mathattack|12 years ago|reply
I think you'd get very depressed just by the statistics, let alone the reproduction. Especially if you included journals of econometrics.
[+] anaphor|12 years ago|reply
Fuck yes, I would love to help do something like that. I'm not a statistician though, so I'm probably not very qualified.
[+] daughart|12 years ago|reply
As a graduate student in the life sciences, I was required to take a course on ethical conduct of science. This gave me the tools to find ethical solutions to complex issues like advisor relations, plagiarism, authorship, etc. We were also taught to keep good notes and use ethical data management practices - don't throw out data, use the proper tests, etc. Unfortunately, we weren't really taught how to do statistics "the right way." It seems like this is equally important to ethical conduct of science. Ignorance is no excuse for using bad statistical practices - it's still unethical. By the way, this is at (what is considered to be) one of the best academic institutions in the world.
[+] dbaupp|12 years ago|reply
> Unfortunately, we weren't really taught how to do statistics "the right way."

Learning the right way takes a lot of work; there are a lot of ways to analyse things, each one wrong or right in different situations. (Even teaching something as "simple" as the correct interpretation of a p-value is hard.)
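
A quick seeded simulation (Python sketch, assuming NumPy/SciPy) makes the correct interpretation concrete: under a true null, p < 0.05 happens about 5% of the time -- a p-value is about the data given the null, not the probability the null is true.

```python
# Under a true null hypothesis, p-values are (roughly) uniform:
# a 5%-level test rejects about 5% of the time.
# A p-value is P(data this extreme | null), not P(null | data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 2000
false_positives = 0
for _ in range(n_experiments):
    # Both groups drawn from the SAME distribution: the null is true.
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

rate = false_positives / n_experiments
print(rate)  # close to the nominal 0.05
```
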

[+] jimmar|12 years ago|reply
One of the many challenges in science is that there is no publication outlet for experiments that just didn't pan out. If you do an experiment and don't find statistical significance, there aren't many journals that want to publish your work. That alone helps contribute to a bias toward publishing results that might have been found by chance. If 20 independent researchers test the same hypothesis, and there is no real effect, 1 might find statistical significance. That 1 researcher will get published. The 19 just move on.
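
The arithmetic behind that scenario, as a quick sketch (hypothetical numbers matching the comment, not any particular study):

```python
# If 20 independent labs test a true null hypothesis at alpha = 0.05,
# each has a 5% chance of a false positive. How many "successes"
# should we expect, and how likely is at least one publishable result?
alpha = 0.05
labs = 20

expected_false_positives = labs * alpha      # 1.0 -- the "1 in 20"
p_at_least_one = 1 - (1 - alpha) ** labs     # chance someone "finds" an effect

print(expected_false_positives, round(p_at_least_one, 2))
```

So even with no real effect, there is roughly a 64% chance that at least one of the 20 labs gets a publishable false positive.
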
[+] RogerL|12 years ago|reply
This is called the "file drawer effect".
[+] jmatt|12 years ago|reply
Norvig's "Warning Signs in Experimental Design and Interpretation" is also worth reading and covers the higher level problem of bad research and results. Including mentioning bad statistics.

http://norvig.com/experiment-design.html

[+] Paradigma11|12 years ago|reply
Quite a few years ago I devised an ambitious method to achieve significance while sitting through another braindead thesis presentation (psychology):

If you are interested in the difference of a metric scaled quantity between two groups do the following:

1.) Add 4-5 plausible control variables that you do not document in advance (questionnaire, sex, age...).

2.) Write an R script that helps you do the following: whenever you have tested a person, add that person's result to your dataset and run a:

t-test

u-test

ordinal logistic regression over some possible bucket combinations.

3.) Do this over all permutations of the control variables. Have the script ring a loud bell when significance is achieved so data collection is stopped immediately. An added bonus is that you will likely get a significant result with a small n, which enables you to do a reversed power analysis.

Now you can report that your theoretical research implied a strong effect size so you choose an appropriate small n which, as expected, yielded a significant result ;)
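
For onlookers: a seeded simulation sketch (Python rather than R, with invented parameters) of how badly step 3 alone -- stopping the moment p dips below 0.05 -- inflates the false positive rate, even without the control-variable permutations:

```python
# Optional stopping: test after every new pair of subjects and stop as
# soon as p < 0.05. Even with NO real effect, the false positive rate
# climbs well past the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_min, n_max = 500, 5, 50
stopped_early = 0
for _ in range(n_sims):
    # Both groups come from the same distribution: no true effect.
    a = list(rng.normal(0, 1, n_min))
    b = list(rng.normal(0, 1, n_min))
    for n in range(n_min, n_max + 1):
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:           # ring the loud bell, stop collecting
            stopped_early += 1
            break
        a.append(rng.normal(0, 1))
        b.append(rng.normal(0, 1))

rate = stopped_early / n_sims
print(rate)  # far above the nominal 0.05
```
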

[+] ultrasaurus|12 years ago|reply
One thing that constantly saddens me about statistics is that a large amount of energy is expended using it almost correctly to "prove" something that was already the gut feel. Even unbiased practitioners can be led astray [1], but standards on how not to intentionally lie with statistics are very useful.

[1] http://euri.ca/2012/youre-probably-polluting-your-statistics...

[+] onion2k|12 years ago|reply
There's no way to tell whether or not that "gut feel" is accurate without proof. Often it's right, but occasionally it's very, very wrong (cancer risk and Bayes' theorem provide a good illustration: http://betterexplained.com/articles/an-intuitive-and-short-e...). Consequently it's still worthwhile proving things even when they're seemingly obvious.
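
Worked through with the classic illustrative screening numbers (standard textbook figures, not from any particular study):

```python
# Why gut feel fails on rare conditions: a test that catches 80% of
# cases still produces mostly false alarms when the condition itself
# is rare. Illustrative numbers only.
prevalence = 0.01            # 1% of the population has the condition
sensitivity = 0.80           # P(positive | disease)
false_positive_rate = 0.096  # P(positive | no disease)

# Bayes' theorem: P(disease | positive)
p_positive = (prevalence * sensitivity
              + (1 - prevalence) * false_positive_rate)
p_disease_given_positive = prevalence * sensitivity / p_positive

print(round(p_disease_given_positive, 3))  # ~0.078, not 0.8
```
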
[+] Homunculiheaded|12 years ago|reply
One of the functions of the prior in Bayesian analysis is to incorporate this "gut feel" into your calculations. Given that you have a strong prior belief and weak data (i.e. not much data), your belief will strongly influence the posterior. As you collect more data, your belief will be increasingly overridden by the data.
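
A minimal conjugate-prior sketch of that point (hypothetical coin-bias example): with little data the prior dominates, and with lots of data the data wins.

```python
# Beta-Binomial updating (hypothetical coin example): a Beta(20, 20)
# prior encodes a strong "gut feel" that the coin is fair. As flips
# from a coin that actually lands heads 70% of the time accumulate,
# the posterior mean is pulled from the prior toward the data.
def posterior_mean(prior_a, prior_b, heads, flips):
    # Conjugate update: Beta(a, b) + data -> Beta(a + heads, b + tails)
    return (prior_a + heads) / (prior_a + prior_b + flips)

prior_a = prior_b = 20  # prior mean 0.5, worth 40 pseudo-flips

for flips in (10, 100, 1000):
    heads = 7 * flips // 10  # the coin's true bias is 0.7
    print(flips, round(posterior_mean(prior_a, prior_b, heads, flips), 3))
# 10 flips: 0.54 (prior dominates); 1000 flips: 0.692 (data dominates)
```
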
[+] tokenadult|12 years ago|reply
I see the author of this interesting site is active in this thread. You may already know about this, but for onlookers I will mention that Uri Simonsohn and his colleagues

http://opim.wharton.upenn.edu/~uws/

have published a lot of interesting papers advising psychology researchers how to avoid statistical errors (and also how to detect statistical errors, up to and including fraud, by using statistical techniques on published data).

[+] capnrefsmmat|12 years ago|reply
Thanks. I had seen some of his work, but browsing his list of publications I found a few more interesting papers. I've already worked one into my draft.
[+] glutamate|12 years ago|reply
One way to do statistics less wrong is to move from statistical testing to statistical modelling. This is what we are trying to support with BayesHive at https://bayeshive.com

Other ways of doing this include JAGS (http://mcmc-jags.sourceforge.net/) and Stan (http://mc-stan.org/)

The advantage of statistical modelling is that it makes your assumptions very explicit, and there is more of an emphasis on effect size estimation and less on reaching arbitrary significance thresholds.

[+] thenomad|12 years ago|reply
BayesHive is very interesting! I couldn't find any details on pricing, though?
[+] mathattack|12 years ago|reply
I like that he references Huff's "How to lie with statistics" in the first sentence of the intro. That was the book that came to mind when I saw the subject. Also reminds me of the Twain quote, "There are three types of lies: Lies, Damned Lies, and Statistics."

But despite this, statistics done well are very powerful.

[+] neuralk|12 years ago|reply
With respect to that Twain/Disraeli quote, my friend who is a professor of statistics tells me that he cannot go to a party and say what he does for a living without someone repeating it smirkingly.
[+] WettowelReactor|12 years ago|reply
What is puzzling to me is that many of the statistical errors showing up in the science literature are well understood. The problem is not all the junk science being generated but that the current tools and culture are not readily naming and shaming these awful studies. Just as we have basic standards in other fields, such as GAAP in finance, why can't we have an agreed-upon standard for the collection and analysis of scientific data?
[+] Aloisius|12 years ago|reply
If you want to see truly egregious uses of statistics, take a look at any paper on diet or nutrition. Be prepared to be angry.

At this point, if someone published a study stating that we needed to eat not to die, I'd be skeptical of it.

[+] capnrefsmmat|12 years ago|reply
You might enjoy this:

Schoenfeld, J. D., & Ioannidis, J. P. A. (2013). Is everything we eat associated with cancer? A systematic cookbook review. American Journal of Clinical Nutrition, 97(1), 127–134. doi:10.3945/ajcn.112.047142

They did a review of cookbook ingredients and found that most of them had studies showing they increased your risk of cancer, while also having studies showing they decrease your risk of cancer.

I think bacon was a notable exception -- everyone agreed that it increases your cancer risk.

[+] ambiate|12 years ago|reply
The greatest problem of statistical analysis is throwing out observations that don't fit the hypothesis. All analyses should be thoroughly documented, with postmortems.
[+] VladRussian2|12 years ago|reply
Whenever there is discussion about the role of statistics in science (sometimes even going as far as crossing into how science is statistics), I always remember this:

http://en.wikipedia.org/wiki/Oil_drop_experiment#Fraud_alleg...

[+] evacuationdrill|12 years ago|reply
More revealing than the fraud allegations is the next section, discussing the way subsequent results were pulled toward his and other experiments' values for years, delaying our arrival at a more precise measurement. It reads as though it wasn't so much malice as self-doubt that led to the scientists' actions.
[+] knassy|12 years ago|reply
That was an excellent read. Thank you. I'll admit I'm often reluctant to read too much into the data I deal with daily (web analytics), as I'm unsure of how to measure its significance accurately. I'm going to dive in and learn more about this.
[+] Anon84|12 years ago|reply
"Statistical significance does not mean your result has any practical significance."