
Congratulations, Your Study Went Nowhere

106 points | cpeterso | 7 years ago | nytimes.com

61 comments

[+] carlmr|7 years ago|reply
>Researchers should embrace negative results instead of accentuating the positive

The problem starts here. Most researchers would love to show their negative results; they're well aware of the problem. But they need to publish, they need grants, they need money that is outcome independent. They need an alternative reality where negative results and reproducibility studies make money.

As it stands, you get money for publishing praise; a lot of industrially sponsored medical research is just "9/10 doctors recommend x" advertising.

We need to institute a rule that only preregistered studies can be published, or at least used by the FDA as a basis for decisions.

Right now you can conduct 1000 studies until you find a handful which randomly show the result you want.

[+] 08-15|7 years ago|reply
> We need to institute a rule that only preregistered studies can be published, or at least used by the FDA as a basis for decisions.

That won't help, not by itself. Preregistration helps with "p hacking", the practice of moving the goalposts until some result becomes significant. A bigger problem remains.

Medicine and Biology accepted the framework of Statistical Hypothesis Inference Testing, where a null hypothesis is rejected at some p-value, usually 0.05. Ignoring many faults of this framework (for example that the alternative hypothesis is not tested at all, that it is a bad caricature of the bad statistics R. A. Fisher introduced, that the numerical outcomes depend on the probability of events that didn't happen, etc.), the logic is that you limit the false positive rate to below the p-value threshold.

Unfortunately, journals consistently reject submissions unless there is a significant p-value somewhere in there. ("Highly significant" is better, even though the term makes no sense.) So if plenty of researchers investigate some random nonsense where the null hypothesis is actually true, 100% of published studies will be spurious results, preregistration or not.
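
To see it concretely, here is a toy simulation of that filter (Python with numpy and scipy; my own sketch with made-up numbers, not from the article): every study tests a true null, journals accept only p < 0.05, and so every published result is a false positive.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  n_studies, n_subjects, alpha = 10_000, 30, 0.05

  published = 0
  for _ in range(n_studies):
      # The null is true: treatment and control are drawn
      # from the same distribution.
      treatment = rng.normal(0, 1, n_subjects)
      control = rng.normal(0, 1, n_subjects)
      if stats.ttest_ind(treatment, control).pvalue < alpha:
          published += 1  # only "significant" results get accepted

  print(published, "of", n_studies, "null studies came out 'significant'")
  # roughly 5% of all studies, but 100% of the published record is spurious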

To make progress, this kind of statistics has to go. Journals would have to change their policies, which probably means peer reviewers have to change theirs. I have no idea how to make that happen.

[+] the6threplicant|7 years ago|reply
The discussions I have had with staff in mathematics and computer science about the need for such journals would fill years. Pretty much everyone wants them, because they know that being able to say "hey, I already did this and we got nowhere - try something else" is crucial for science and mathematics.
[+] denzil_correa|7 years ago|reply
> The problem starts here. Most researchers would love to show their negative results

There's another core issue that most people miss. In science, you need to publish results AND explain WHY. That WHY is crucial. The amount of work required to prove a negative - to explain WHY - is very, very high and in many cases close to impossible.

[+] maxxxxx|7 years ago|reply
"We need to instate the rule that only preregistered studies can be published or at least used by the FDA as for decisions."

I like that idea a lot. And if the study finds something unexpected, there should be a thorough explanation of why the researchers deviated from the original plan.

[+] snarf21|7 years ago|reply
Like all human systems, you get what you optimize for.

Universities want paid grants for money and prestige, so that's what the researchers go find. I've said this before on here, but the solution to this problem lies with universities like Harvard and foundations like RWJ, which have near-infinite money and don't need to operate like this. They need to push forward real research that is reproduced (maybe several times) and shows whatever the data really shows. Researchers who create reproducible studies get bonuses. Again, you get what you optimize for.

[+] petercooper|7 years ago|reply
I got put off science in high school when I'd get marked down for having "wrong" but truthful results in physics experiments, so I ended up going around and averaging other people's results (not their conclusions or write-ups, just the raw numerical measurements), which yielded better marks. I wonder if an element of that is at play in corners of the broader scientific community.
[+] Leszek|7 years ago|reply
High school physics experiments are somewhat of a different case, as "wrong but truthful" results are probably evidence of an incorrectly performed experiment, and performing it correctly is part of what one is assessed on. That, or bad luck, but presumably the experiments are simple enough that, performed correctly, they all but guarantee a "correct" result.
[+] air7|7 years ago|reply
I've had an idea about this problem that I'd be happy to hear your thoughts about:

Simply put, a promise of an independent future reproduction study should be part of the published paper.

Once a researcher achieves a publishable result, she looks for a peer researcher who will commit to performing a pre-determined reproduction study in the near future. This promise is written into the original paper.

This ensures that a negative reproduction would definitely be published. It incentivises the original researcher not to mess too much with the data post hoc, and to be as helpful as possible to their reproducing peer. The peer gets a citation before even writing their paper, plus all the help they'd want to get the study done as quickly and easily as possible (Q&A, analysis code, etc.).

[+] Bartweiss|7 years ago|reply
This is a really interesting idea.

Adversarial collaboration has produced some interesting results, and is probably our best bet for settling arguments on topics where results are consistently rejected over methodology. It's done good work on ESP, and shows some promise on priming if anyone will actually sign on.

But that basically requires finding fields with conflicting viewpoints and well-understood methodology spats, which means established debates. This idea would get the same effects - experiment design that's accessible and verifiable - on untested topics, while simultaneously baking replication attempts into initial publication.

The more I think about this, the more impressed I am. It guarantees replications, it guarantees data and methodology availability, and it makes replications a publication-worthy step by making them part of the initial 'success'. It doesn't solve the file drawer or salami slicing problems, but it does huge work to sidestep them by forcing another p<.05 which isn't subject to them. And wildest of all, it might even be acceptable to journals in a way that "publish replications and null results" isn't. Thanks for the most creative approach to the replication crisis I've heard in ages!

[+] GauntletWizard|7 years ago|reply
This will significantly bias the second study towards getting the same result. Providing help and assistance beyond the paper itself will create experimental controls that are undocumented - assumptions that were shared between the researchers but never published in the papers themselves, because humans are bad at that. Further, any sort of relationship between the experimenter and the reproducer will taint the results - people are friendly and want to see their friends "succeed", and they will taint their own work out of empathy.
[+] Balgair|7 years ago|reply
Let's look at an example to ferret out some stuff. Assume that the experiment is looking at some disease, like gastroesophageal reflux disease (GERD) [0]. It's not a glamorous disease, but it's worth looking into. Say you are looking at the way the nervous system interacts with the sphincter to un/control it. Further, you are looking at some specific synapse. Say, also, that you have a mouse model for this as a test bed (incredibly unlikely, btw). You induce GERD in the mice, wait for erosion of the esophagus, sac the mice, study the synapses, and get the results, negative or positive.

How long do you think it'll take a grad student to do this, to get to that point?

The 1st year is basically wasted, research-wise, as you've got classes and exams. The 2nd year is a little better, as you finally understand all the issues with GERD in mice vs. humans, the vagal nerves, etc. Still, you may be taking classes, and you are prepping for your quals. The 3rd year is finally when you can get into the nitty-gritty of the experiment. You finally learn how to sac the mice properly, dissect, etc. You fail, a LOT. Let's say, research-wise, you finally have 1 year's worth of 'real' research under your belt.

The 4th-7th years are ~80% research work, 20% funding search (at ~90 hrs/wk of work). The mice cooperate alright, but there are breeding issues, random infections in the vivarium, minimal genetic drift, etc. Finally, you can get 'clean' data that other people may actually believe. You've controlled for sex, for food intake, for stomach acidity, for the phase of the moon, for how much coffee is in the room, etc. You've gone half mad getting controls done. You've learned how to dissect the mice carefully, that you can't have coffee for 3 days beforehand to calm your hands down, how to properly position and craft the micro-pipette tips, how to run all the custom software you've badly hacked together. You've learned a LOT about this tiny portion of a mouse and how to uncover its secrets. A lot of what you have learned is only applicable to your hands in your lab with your mice.

After, say, 5 years of 'real' research, you find that the neurons that command that particular sphincter in those mice (experimental controls to hell and back) have something to do with GERD. But only to p=0.0482. A positive result, but juuuuust barely. You find something to write about and publish. 7 years of your life, and it's only there if you squint at the data a lot. You get out of grad school with a PhD and open a bakery. The research, what you found, is largely forgotten all the same.

Mind you, this is in a stable lab with a grad student with a stable life. No pregnancies, no marriages, no advisor conflict, no sexual harassment, no funding issues, etc.

Now, the parent comment suggests that we should have some other lab come along and replicate the experiment, do the same thing, just to be certain that those 7 years of that grad student's life really found something and weren't a statistical fluke. How is that actually going to be accomplished?

Say, yeah, you have the funding already set-up via the NIH, fine. Who is going to do that second round of experiments? What grad student is going to get trained up to do someone else's experiments again? How long will it take them to get to that same point, can they even do it the same way? What about their own original research and their career path? Will their advisors be similarly alright with this? Will their life in grad school be just as stable? Who will teach them how to position the electrodes, to dissect the mice, to deal with vivarium issues, to pass their quals?

It's not that scientists don't want to re-run experiments, it's that doing them is HARD. Technically, yeah, science is not easy. But logistically? Man, even getting one student through grad school is an accomplishment. If you want to re-run each experiment and paper, you somehow have to deal with re-running all the extra flotsam and jetsam too, and I don't think we have any idea how to do that successfully.

[0]https://www.mayoclinic.org/diseases-conditions/gerd/symptoms...

[+] afpx|7 years ago|reply
Furthermore, it would be extremely useful if scientific research were transparent from inception to outcome.

Scientific ‘papers’ are archaic forms of knowledge transfer suited for a time when physical paper was the only way. These days, science would serve us better if we could see everything involved in the process. I don’t want to see only the results, I want to see the whole notebook and all the hurdles along the way. Why can’t we follow the researchers and the accumulation of evidence in real time?

[+] radarsat1|7 years ago|reply
Sounds nice, but as a scientist I can say that it would be a huge overhead to have to constantly prepare half finished work for public consumption throughout the entire process of a study. Scientific papers make sense because the point is to summarize your results, discuss what they may imply, e.g what theories they support or deny, and describe what you did in a reproducible way so that the work can be verified. The writing of a paper is an act of communication. It's important that it is an explicit thing that researchers are trained to do properly, not something that they just shit out at the end of a poorly done study that didn't work, or worse, in the middle of something where they can't draw any conclusions yet. E.g. journalists already pick up on wrong or hype-inducing interpretations of poorly written press releases of published papers. Imagine if they could also pick up on half-finished work. "Study being done could mean something amazing, maybe, if it works!" You think fake news is a problem now? Imagine the world you propose.

Now, if researchers were encouraged to describe their process in more detail - everything that went right and wrong throughout a study, along with data and algorithms and everything, perhaps as appendices or in supplemental material like blogs or videos - and I find this is what has been happening lately, that would be fine, and a nice ideal to strive for.

But realize that scientists are already required to not only perform the study but write about it, and convince every skeptic that they are right, go to conferences and get an article accepted by a journal which can take a year or more. And now add to this that they are required to prepare the data and software for public consumption, make videos and blog posts that describe everything, answer all questions that the public has. Think about all that overhead you are demanding that goes so far above and beyond doing the actual science. It's not small. And they are not paid extra for it, in fact their paycheck is probably half what they could make doing closed science for a for-profit company. Meanwhile their job as an academic is only to explore new ideas and convince their peers of their worth. Why should they go the extra mile, for free, for every member of the public who demands answers and transparency? Sorry, but it's too. much. work.

[+] jhauris|7 years ago|reply
A growing number of people seem to agree with this ([0],[1]). There are a couple of places ([2],[3]) where one can preregister a hypothesis and experiment before collecting data. There are also some journals that are starting to encourage preregistration ([4],[5]). It would be nice if this phenomenon spread beyond the few fields it's started in.

I don't know about needing to see all the intermediate data, that seems extreme if the methodology is properly described and the raw data is available.

0: https://www.bitss.org/2014/06/13/preregistration-controversy...

1: https://www.apa.org/science/about/psa/2015/08/pre-registrati...

2: https://aspredicted.org/

3: https://osf.io/

4: https://www.psychologicalscience.org/publications/psychologi...

5: https://www.journals.elsevier.com/cortex/news/registered-rep...

[+] samontar|7 years ago|reply
Because they rightly fear that someone will scoop them. The Nash equilibrium is to wait till you’re done to publish.
[+] anonytrary|7 years ago|reply
I fear that science is slowly going awry, particularly in fields where outcomes have grave and immediate implications for businesses. In this respect, physics is simpler than social science. Not getting the results you expect is fine and often teaches you something.
[+] grandmczeb|7 years ago|reply
I don't think it's going awry but rather that certain fields have always been suspect and we're just now realizing how bad it is. In mathematics, there are proofs that have stood up for thousands of years; in physics there are models that have remained useful for hundreds. How many fields can claim that kind of longevity?
[+] grigjd3|7 years ago|reply
Physics has all the same hackery as anything else, oftentimes in a less robust fashion. I saw plenty of dirty behaviour in the physics world.
[+] evandijk70|7 years ago|reply
Posts like this are often put on Hacker News. No one doubts that cherry-picking of hypotheses is real. The same goes for 'spinning' negative results.

However, the solution - "pre-register trials and only do what you say you are going to do" - oversimplifies things a lot. Testing and rejecting your hypothesis is a very real part of doing science. But it's also a scientist's job to come up with a new hypothesis that explains the data better. I think the real problem here is that writing something up and saying "our initial hypothesis was wrong; we suggest this and this factor is at play" is not the way science is currently done.

[+] empath75|7 years ago|reply
If your study took a lot of wrong turns, include that in an appendix at least.
[+] wsy|7 years ago|reply
It is not 'oversimplification'; it is pointing out the traps and pitfalls of empirical validation.

If you gain new hypotheses in the process of validating and falsifying your old ones, you need a second, independent study to validate the new ones. You always run the danger that the result you found is a random artefact of your particular group of study participants. If you validate the new insights on the same group, this risk becomes uncontrollable (e.g., confidence intervals become meaningless in such a setting).
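
A rough simulation of that double-dipping risk (Python with numpy and scipy; my own sketch, all numbers arbitrary): derive the "most promising" subgroup from null data, then "validate" it on the same participants versus on an independent sample.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  n, n_subgroups, n_sims, alpha = 50, 10, 2000, 0.05

  same, fresh = 0, 0
  for _ in range(n_sims):
      # Null data: no subgroup truly differs from zero.
      data = rng.normal(0, 1, (n_subgroups, n))
      # New hypothesis suggested by the data: the subgroup
      # with the largest apparent effect.
      best = np.argmax(np.abs(data.mean(axis=1)))
      # "Validate" on the same participants:
      if stats.ttest_1samp(data[best], 0).pvalue < alpha:
          same += 1
      # Validate on an independent replication sample:
      if stats.ttest_1samp(rng.normal(0, 1, n), 0).pvalue < alpha:
          fresh += 1

  print("false positive rate, same group: ", same / n_sims)   # well above 0.05
  print("false positive rate, fresh group:", fresh / n_sims)  # about 0.05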

[+] tells|7 years ago|reply
I've been kinda thinking this for more than a decade, after working at one of the big pharmas. I witnessed several trials with subpar results that would not go on to be published. I think all studies should undergo a simple national pre-registration and require a summary at the end of each study. One of the things that makes humans special is our ability to store information and pass it on to future generations, and just throwing away unwanted results is not helping anyone.
[+] qubax|7 years ago|reply
The problem with "studies/research" today is that most of it cannot be reproduced, not that it went nowhere. It's not really a matter of "cleaning up" the research to make it "positive". In other words, most science today isn't real science.

Throw in the issues of funding (government: political issues; private: corporate issues) and there is very little incentive for real research. And with the current academic environment at leading institutions like Yale, scientists are probably too afraid to do honest research on sensitive topics.

Also, isn't this just a rehash of another nytimes article from last year?

https://www.nytimes.com/2017/05/29/upshot/science-needs-a-so...

There are 3 or 4 nytimes articles on the frontpage. At this rate, how long before the entire frontpage is just nytimes? Just redirect hn to nytimes and be done with it?

[+] thrower123|7 years ago|reply
The NY Times just rotates through the same double-handful of subjects on something like a six week timer. It does get a bit tedious, because there's very little that's actually new to be said, and everyone just rehashes the same tired arguments over and over and over.
[+] Rainymood|7 years ago|reply
I still think that science should be automated in some way, shape or form. I'm imagining something like this: you have a dataset, and you have to upload that dataset to some third party that checks it for validity. Then you write exactly WHAT you are going to do with the data and send the proposed "routines" in; this should produce automatically generated output and an automated report of what was done and why. Of course, this is a completely silly idea, but I'd love to know if someone has any tangentially related thoughts on this.
[+] MaxBarraclough|7 years ago|reply
> I still think that science should be automated in some way, shape or form.

Meaning what?

> I'm imagining something like this: you have a dataset, and you have to upload that dataset to some third party that checks it for validity.

The results are what they are. What is 'validity' meant to mean?

> Of course, this is a completely silly idea, but I'd love to know if someone has any tangentially related thoughts on this.

Quantitative studies are already published with the proper analyses, which are invariably produced 'automatically' using software, not manual methods.

I imagine there might be some value in publishing raw data, though. There may sometimes be questions like privacy, but I don't imagine they'll always be show-stoppers.

[+] shoo|7 years ago|reply
ignoring the automation aspect,

> Then you write exactly WHAT you are going to do with the data and then you send the proposed "routines"

this roughly matches up with the idea of "preregistration" of research, i.e. you define and share what your experimental method is going to be before you start looking at the data and performing analysis, to help guard against some unconscious or conscious decisions to adjust the method after you have observed the experimental data.
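
a toy illustration of that commit-before-you-look idea (my own sketch in Python; real registries like the ones linked below do this properly): publish a digest of your analysis plan before you see any data, so that any later deviation from the plan is detectable.

  import hashlib

  # hypothetical analysis plan, written before data collection
  analysis_plan = b"""
  primary outcome: symptom score at 8 weeks
  test: two-sided two-sample t-test, alpha = 0.05
  exclusions: participants with < 80% adherence
  no interim looks, no covariates beyond age and sex
  """

  # post this digest publicly (e.g. on a registry) before the
  # data exists; anyone can later verify the plan wasn't changed
  print(hashlib.sha256(analysis_plan).hexdigest())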

i've never heard of cos.io before but they're a top hit for me when searching for "preregistration" https://cos.io/prereg/

Andrew Gelman has written quite a lot about related topics in the past (the "replication crisis" - especially in the social sciences, "p hacking", "the garden of forking paths" http://www.stat.columbia.edu/~gelman/research/unpublished/p_... , https://andrewgelman.com/2017/03/09/preregistration-like-ran... )

I quite like how Gelman theoretically frames this in his "forking paths" paper:

  Consider the following testing procedures:

  1. Simple classical test based on a unique test
  statistic, T, which when applied to the observed
  data [ y ] yields T(y).
  
  2. Classical test pre-chosen from a set of
  possible tests:  thus, T(y; phi), with
  preregistered phi. For example, phi might
  correspond to choices of control variables in a
  regression, transformations, and data coding and
  excluding rules, as well as the decision of
  which main effect or interaction to focus on.
  
  3. Researcher degrees of freedom without fishing:
  computing a single test based on the data, but in
  an environment where a different test would have
  been performed given different data; thus
  T(y; phi(y)), where the function phi(.) is
  observed in the observed case.
  
  4. "Fishing": computing T(y;phi_j) for j=1,...,J:
  that is, performing J tests and then reporting
  the best results given the data, thus
  T(y; phi^{best}(y)).
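
Case 4 is easy to put numbers on (my own back-of-envelope arithmetic, assuming the J tests are independent): the chance that at least one of them comes out "significant" under a true null is 1 - (1 - alpha)^J.

  alpha = 0.05
  for J in (1, 5, 10, 20, 50):
      # probability that at least one of J independent tests
      # reaches p < alpha when the null is true everywhere
      print(J, "tests ->", round(1 - (1 - alpha) ** J, 2))
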
[+] Miltnoid|7 years ago|reply
My research area has actually started doing that. Sure it's a CS discipline, but yeah we have a second component of our conferences where we automate our benchmarks, and other people run those benchmarks and validate the results.
[+] interfixus|7 years ago|reply
If only Newton, Ørsted, Becquerel, Planck, Fleming, Feynman, and so many others of their ilk had known to do that.
[+] tuxt|7 years ago|reply
So, who is paying the bills?
[+] MrEfficiency|7 years ago|reply
For Engineers- Capitalism/Customers

For Academics- Whoever tells you to run the study.

Which one do you think is more often corrupted for centuries at a time?

[+] julienreszka|7 years ago|reply
Can't learn from failure. Stop wasting people's time