
No publication without confirmation

225 points | ktamiola | 9 years ago | nature.com | reply

111 comments

[+] beloch|9 years ago|reply
"Confirmatory labs would be less dependent on positive results than the original researchers, a situation that should promote the publication of null and negative results. They would be rewarded by authorship on published papers, service fees, or both. They would also be more motivated to build a reputation for quality and competence than to achieve a particular finding."

Sounds great, but how would this actually work? Nobody is going to get juicy grants from existing funding agencies for being a "confirmatory" lab. Nature sure as hell isn't going to pay for this. Most researchers probably can't afford to pay an outside lab to duplicate their research. Is Nature going to suddenly start refusing papers whose results haven't been reproduced elsewhere? That's basically suicide for their journal: researchers are frequently in a race with other researchers to publish first, so why publish with a journal that requires you to double your budget to pay a confirmatory lab and wait months or years for them to do the job? The pressure to publish elsewhere first will be intense.

I have a simpler solution.

Don't just slap the names of confirmatory lab authors onto other papers. Publish confirmatory papers with the same prominence as the original papers. Hell, devote a portion of Nature to doing just that. Currently, if you want to publish a paper confirming someone else's original findings, not even a third-rate journal will touch it unless you put at least some kind of novel-sounding spin on it. Nature should use all that scummy impact-factor gaming it does to make confirmatory papers respectable. Only when the work of reproducing results gains labs respect will funding agencies start supporting "confirmation labs". At present, such "unoriginal", "hack" work is not respected at all, and Nature is a big part of the reason why.

[+] setrofim_|9 years ago|reply
> Most researchers probably can't afford to pay an outside lab to duplicate their research.

Even if they could, we probably don't want researchers paying for their own results to be duplicated. This would create perverse incentives, similar to what happened with investment banks and credit rating agencies. If the original researchers must get their results confirmed in order to get published, and they are the ones paying for the confirmation, they will naturally tend to choose confirmatory labs that are more likely to confirm their findings. Since the labs would then rely on the researchers for funding, that would create pressure on the confirmatory labs to adapt their methodologies in ways that make confirmation more likely (even when the original study may not warrant it).

We want confirmatory labs to have no special interest in confirming or disproving any particular study, only an interest in improving the overall quality of research.

Since a journal's reputation depends (at least in part) on the quality of the research it publishes, the journals would seem to be the natural candidates to fund confirmatory labs. Whether they'd actually be willing to do it is another matter...

[+] susi22|9 years ago|reply
That won't get you a PhD, though. PhD programs require original research, and PhD students are usually the ones doing these experiments.
[+] neltnerb|9 years ago|reply
I think the best balance would be something like:

"You must include a confirmatory study by an independent lab in order to publish this research, BUT we will give you an accept/reject decision on your paper prior to doing that study."

That way, there's much less bias in the confirmatory results, since the paper gets published either way. And if the paper would get rejected even when the confirmation is successful, that's a ton of wasted effort and money, which is a major disincentive to trying at all.

As it should be: science is about experiments and predictive results, not about whether outcomes are "desirable". Let's incentivise that.

[+] forkandwait|9 years ago|reply
> Most researchers probably can't afford to pay an outside lab to duplicate their research.

This cost would be built into the grant that funds the lab, and I bet everyone would get used to it after a few years of complaining.

[+] q_revert|9 years ago|reply
Anonymous messages are sometimes very powerful; measurements that are easy to make are not necessarily the right ones to make.

Sometimes you get snowed in, often you get snowed under.

[+] caseysoftware|9 years ago|reply
I think we've conflated terms.

The lay public thinks "peer reviewed" means that others have tried it and validated the results. What it really tends to mean is that a peer looked at the procedures and results, decided they pass the "sniff test", and found no glaring errors.

The more subtle problem is that in some circles, it isn't even that. Since fewer and fewer people want to be the person who damages someone else's work and/or career, it's a blanket pass.

We're drifting away from scientific study and critical thinking toward "reasonable" approaches and not upsetting doctrine and/or your superiors. That looks less and less like science and more like religion.

[+] hackuser|9 years ago|reply
Here's a leading scientist's description of peer review:

Peer review works superbly to separate valid science from nonsense, or, in [Thomas] Kuhnian terms, to ensure that the current paradigm has been respected. It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources ... sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their toughest competitor is rigorously honest in the reporting of scientific results ... It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.

From: Reference Manual on Scientific Evidence [for U.S. federal judges], Third Edition; "How Science Works" section by David Goodstein, Caltech physics professor and former provost; published by National Academies Press (2011)

https://www.nap.edu/catalog/13163/reference-manual-on-scient...

[+] mattkrause|9 years ago|reply
> It's a blanket pass.

This has not been even close to true in my experience. Reviews are usually unsigned, so there's very little social pressure to "let things slide". On the other hand, a non-trivial number of reviewers seem to think the review process is an opportunity to "rough up the competition" instead of an opportunity to offer constructive feedback.

[+] jMyles|9 years ago|reply
> What it really tends to mean is that a peer looked at the procedures and results, decided they pass the "sniff test", and found no glaring errors.

It sometimes means that.

But there are studies that fail the smell test like a refuse heap and still somehow pass a "rigorous" peer-review process.

Remember when George Ricaurte, who by the way was already pretty obviously a charlatan ONDCP whore at the time, injected baboons with what he said was a normal dose of MDMA (2 mg/kg) and found severe neurotoxicity? [0]

Yeah, well, two of the five baboons died. I remember literally the day that study was published - in effing Science. It was all over the news, including the front page of the NYT.

But plenty of us in the drug policy reform movement (and, for that matter, those of us who had used MDMA a few times) knew immediately (and said so) that this study was obviously flawed, because people don't die from a normal dose of MDMA. Sure enough, it later turned out that Ricaurte had injected those poor baboons with a 2 mg/kg dose of methamphetamine, not MDMA. He said there had been a "labeling error," which his supplier denied.

There are examples like this every day.

The peer review process is only as good as the political will toward righteous honesty; the state has muscled out-and-out deceit through this system often enough to make any thinking person doubt its capacity even as an effective "sniff test."

0: https://erowid.org/chemicals/mdma/references/journal/2002_ri...

[+] thaw13579|9 years ago|reply
> The lay public thinks "peer reviewed" means that others have tried it and validated the results. What it really tends to mean is that a peer looked at the procedures and results, decided they pass the "sniff test", and found no glaring errors.

> The more subtle problem is that in some circles, it isn't even that. Since fewer and fewer people want to be the person who damages someone else's work and/or career, it's a blanket pass.

From my experience in the biomedical review process, I would characterize the process as brutal, at least for top venues and federal grants.

> We're drifting away from scientific study and critical thinking toward "reasonable" approaches and not upsetting doctrine and/or your superiors. That looks less and less like science and more like religion.

I mostly agree that there is friction with established doctrine/superiors, but hasn't this always been there? It seems hard to find a major scientific discovery that didn't have some established concept (and proponents) to push against.

[+] pyrale|9 years ago|reply
I reject the idea of a religion-like science. I would say science has become what it is now because of the economic view society has adopted to manage it, rather than because of irrational thinking.

Apparently, science production doesn't scale well, because scientists, when forced to compete for their livelihood, find it easier to fool their managers than to produce legitimate science.

[+] golergka|9 years ago|reply
Isn't the approach different in different fields? I remember reading about recent advanced papers in mathematics that were published but then left "in the void" for a while, because it took such a long time for peers to actually read, understand, and try to challenge the proofs.
[+] StClaire|9 years ago|reply
I have an idea: if a research study doesn't go the way you thought it would, put it out there.

We need a central repository like arXiv where we dump the experiments that didn't work out, so that we can quickly compare a "successful" one to ones done before. That gives us a better idea of whether the data is just a fluke.

The papers wouldn't have to be super involved. What did you do? What were the statistical conclusions? Give an upper-level undergraduate or an early master's student some experience writing up a procedure. It shouldn't take more than a couple of hours, but it could save a lot of time dealing with publication bias.

[+] beambot|9 years ago|reply
> put it out there.

To write a good, technical blog post takes 10+ hours. To write a complete academic paper with references, related work, methods, graphics, etc. can take days, weeks, or even months (e.g., for thesis work).

It's not just that negative results largely go unvalued by the academic status quo... it's also often just not worth the effort to write them up beyond simple documentation in your lab notebook. As researchers, the lasting impact we care about is (simply put) in the positive results, and in being the first to achieve them.

[+] untilHellbanned|9 years ago|reply
Not going to happen. You don't understand the forces at play when biomedical researchers (I'm one too) do the work. A major reason, specific to this type of researcher, for holding back data is that it might be repurposed for later goals. It's capital. Acquiring the kinds of negative data the article talks about (mouse experiments) takes time and big dollars. I'm not going to throw that away in an archive. It's like donating to a Goodwill bin some really expensive clothes you saved up to buy, only to realize after you bought them that they're the wrong size. You're going to try to make them work, often for years, before you give up on them.

If you take issue with my comment and have ever worked for a startup that didn't pan out (that should be many of us here on HN), think of the scenario where someone told you, "hey, why don't you just be a good person, open source your app, and give all your customers to XYZ?" (Setting aside the armchair-quarterback guilt imposed on you.) There are a few who do that, but if you've spent any length of time on your business, you're going to think of ways to repurpose the investment you made before you just go dumping it in an archive like GitHub or whatever.

[+] djsumdog|9 years ago|reply
Negative results are one of the biggest things missing from publications. I highly recommend the book "The Antidote" by Burkeman. There are several chapters on our current Western view of failure and how our attempt to distance ourselves from it can hinder research, product development, and even personal development.

I think there was even an article posted here a few months back on how the lack of negative results in science really hurts the community. People may work on something for half a year, consider it a failure, and just dump it; and other people go down the exact same path with the same methods. (If you publish a negative result, someone may pick it up, ask "I wonder if they tried x or y", and attempt the experiment again.)

[+] jonlucc|9 years ago|reply
BioRxiv [1] attempts to be this, but it isn't as widely used as arXiv is in the physics/math/CS world.

[1] http://biorxiv.org/

[+] chrisseaton|9 years ago|reply
How can an experiment 'not work out'? Do you mean a negative result? Not getting evidence for your hypothesis is not 'not working'; that's a crazy way to approach science. It's more information with which to adjust your hypothesis. Or do you mean a failure such as broken equipment or an infected sample, meaning you have no data? Well then, what would you put in the paper?
[+] arthur_pryor|9 years ago|reply
A friend of mine (who just finished a clinical psych PhD) had exactly this idea a couple of years ago: a null-results archive, so that experiments that don't "work out" don't just get thrown away.
[+] mtdewcmu|9 years ago|reply
What if you basically take your lab notes and blog them?
[+] tdaltonc|9 years ago|reply
The obvious question is "who's going to do the confirmation work?"

I think master's/bachelor's students should be able to handle that work. A new grant mechanism for master's/bachelor's training grants that fund replication would get the job done, with a lot of nice side effects.

[+] untilHellbanned|9 years ago|reply
As a middle American, I'm actually serious when I say this could be a way Middle America gets on its feet again. As a biomedical researcher at a university hospital, I've watched the hospital proliferate with all types of trainees. What's holding back the basic research side? It's an assembly line, in a way the Midwest is familiar with from the automobile industry. However, universities aren't good businesses, and what's missing is big pharma companies paying employees fair wages to do this work. Then again, companies saving the day won't work either, because putting people to work would mean they won't get sick, which is bad for business. So, oh well, on to the next approach.
[+] pcrh|9 years ago|reply
The article refers chiefly to repeating mouse or rat experiments.

The obstacle there isn't what level of training a researcher has (as long as it's sufficient), but who is going to pay for it.

At the scale proposed (a 6-fold greater number of mice per experiment than is usual), the cost of testing only the core hypothesis is easily over $100K. On top of that there is the time involved, which can range from months to years depending on the experiment.
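
For a rough sense of how the arithmetic scales, here is a minimal power-calculation sketch (Python with statsmodels; the effect size and target power are assumptions for illustration, not numbers from the article):

    # Rough, illustrative power calculation: how per-group sample size
    # grows as the significance threshold tightens. The effect size and
    # target power below are assumptions, not figures from the article.
    from statsmodels.stats.power import TTestIndPower

    solver = TTestIndPower()
    for alpha in (0.05, 0.01):
        n = solver.solve_power(effect_size=0.8, alpha=alpha, power=0.8)
        print(f"alpha = {alpha}: ~{n:.0f} animals per group")
    # Roughly 26 per group at alpha = 0.05 and 38 at alpha = 0.01; multiply
    # by several experimental groups and endpoints and costs climb fast.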

[+] untilHellbanned|9 years ago|reply
No thanks. Papers involving animals are already back-breakingly slow compared with cell-based or in vitro work. I know because I've been lapped by colleagues using simpler systems as I slog through a paper that got rejected from Nature because the reviewers suggested another three years' worth of experiments. Yep, year 5 into this single project, whose outcome we knew four years ago. Not excited about this proposal at all.

Look, I'm all for rigor, but how about the people trying to make money off the deal pay for all the work and keep people like me out of it? Or don't allow the people trying to make money to interpret the results of such preliminary studies so liberally. It's like the education system: scientists, like teachers, don't make much money, do all the labor, and don't want more hoops to jump through.

[+] jessriedel|9 years ago|reply
Sorry, which parts of the proposal are you responding to? The article is more specific than "more rigor". Are you objecting to the stricter p-value threshold? The independent confirmation?

The author argues that a single, higher-quality confirmatory experiment can replace gathering lots of statistics for exploratory experiments:

> Unlike clinical studies, most preclinical research papers describe a long chain of experiments, all incrementally building support for the same hypothesis. Such papers often include more than a dozen separate in vitro and animal experiments, with each one required to reach statistical significance. We argue that, as long as there is a final, impeccable study that confirms the hypothesis, the earlier experiments in this chain do not need to be held to the same rigid statistical standard.

Do you disagree?

[+] tonto|9 years ago|reply
I don't think they want this to apply to all papers, just a certain class of papers. But maybe you have already considered that (as your reviewers are obviously already pinging you about more testing) and you still disagree?
[+] bloaf|9 years ago|reply
I don't think this is a good idea, because it would increase the politicking in scientific publication. Specifically, no one is going to want to do the reproduction work, so reproduction work will be seen as a favor from one scientist to another. Moreover, in specialized fields, scientists see each other as competitors just as often as collaborators. I strongly suspect there would be a lot of gamesmanship, with scientists refusing to do (or dragging their feet on) reproduction work for new studies that threaten to disrupt the status quo that has made them successful.
[+] disgruntledphd2|9 years ago|reply
I would absolutely kill to get a job doing replication.

What I always hated about science was that things were never allowed to not work out. Even if you find something directly opposed to your hypothesis, you are somehow supposed to pretend that it worked out "just as planned".

It's toxic, boring and leads to bad science.

And so, for me: I would absolutely adore being in a place where I got to run well-powered studies and aim to just figure out the right answer, rather than build my career on a bunch of unrepeatable statistical flukes.

That being said, my PhD is in Psychology, so they probably won't be hiring me to run animal-model studies.

I really like this idea, as long as Nature puts its space where its mouth is (which it won't; it runs at least one of these articles per year, and they don't appear to have made any impact).

[+] Fomite|9 years ago|reply
Pretty much this. I can't help but see a replication requirement like this making things even more political. A young new investigator's lab, with very little to trade, is going to struggle, while someone who can conjure a postdoc out of thin air for a colleague's student will probably always find a taker.
[+] dorianm|9 years ago|reply
I applaud the p < 0.01.

There are too many non-reproducible results with real-life harm: http://infoproc.blogspot.com/2017/02/perverse-incentives-and...

[+] feral|9 years ago|reply
Just to note, there's a tradeoff here: not publishing work until you are massively certain of it would also cause real-life harm. Reducing the p-value threshold doesn't automatically reduce harm.

Physicists require extremely low p-values before confirming that a discovery has been made, but that's different from requiring them before publishing.

The problem is people interpreting published work as if, once it's published, it's completely certain.

Maybe each publication should come with a headline 'confidence' stat beside the title. I guess this is a step in that direction.

[+] mattkrause|9 years ago|reply
You really think so?

I'd argue that the fetishization of a specific p-value threshold, be it p<0.05, p<0.01, or even p<0.0001, is a much bigger problem. There is an excellent quote from Rosnow and Rosenthal:

    [D]ichotomous significance testing has no ontological basis. That is,
    we want to underscore that, surely, God loves the .06 nearly as much
    as the .05. Can there be any doubt that God views the strength of
    evidence for or against the null as a fairly continuous function of
    the magnitude of p?

Wouldn't you prefer a few correct but tentative studies that hint at an effect, paving the way for larger, more expensive replications, over a scenario where the data is sliced, diced, and tortured to hit some arbitrary p-value threshold?
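
To make the continuity point concrete, here is a toy calculation (Python with scipy; the summary statistics are fabricated purely for illustration):

    # Two hypothetical experiments whose effects differ by a hair, yet one
    # lands just under p = 0.05 and the other just over. The summary stats
    # are invented; ttest_ind_from_stats makes the comparison deterministic.
    from scipy import stats

    p_a = stats.ttest_ind_from_stats(mean1=1.00, std1=1.0, nobs1=20,
                                     mean2=0.35, std2=1.0, nobs2=20).pvalue
    p_b = stats.ttest_ind_from_stats(mean1=1.00, std1=1.0, nobs1=20,
                                     mean2=0.38, std2=1.0, nobs2=20).pvalue
    print(f"A: p = {p_a:.3f}  B: p = {p_b:.3f}")  # roughly 0.047 vs 0.057
    # A dichotomous threshold calls A a discovery and B a failure, even
    # though the evidence in the two experiments is nearly identical.
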
[+] rdlecler1|9 years ago|reply
The problem here is that publishing a single paper is often the product of months, if not years, of work. Saying "now add more work, without more grant money" is going to be difficult to swallow. Even worse, departments would need to hire even more PhDs who are unemployable after they graduate.
[+] ramblenode|9 years ago|reply
Pre-registration is nice, larger samples/greater power are necessary, and tightening the p-value threshold may indirectly filter out some false positives, but it kind of misses the underlying issue of p-hacking, some of which would be solved by pre-registration.

The authors' suggestions are preventative in nature, but what I would like to see above all else is requiring researchers to publish their raw data and to make their statistical analyses minimally reproducible--something that could be satisfied by publishing scripts or Excel macros along with instructions for any non-automated data stitching. Experiments frequently implode at the analysis phase, which then gets intentionally or unintentionally masked in ambiguous, poorly written methods sections. Giving others access to the data allows errors to be spotted sooner after publication, and alternative hypotheses and analyses to be tested against the published results. It's also sometimes the only way of spotting abnormalities in the data collection process itself. Again, this is not a means of preventing errors, but a low-friction way of discovering them. Maybe having everything in the open would also light a fire under some researchers to be more thorough.
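
As a sketch of what "minimally reproducible" could look like in practice (the file and column names here are invented, and pandas/scipy are just one possible toolchain):

    # Hypothetical skeleton of an analysis script shipped with a paper:
    # raw data in, every reported statistic out, no manual steps between.
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("raw_measurements.csv")  # published raw data (invented name)

    control = df.loc[df["group"] == "control", "response"]
    treated = df.loc[df["group"] == "treated", "response"]

    t, p = stats.ttest_ind(treated, control)
    print(f"n = {len(control)}/{len(treated)}, t = {t:.3f}, p = {p:.4f}")
    # Anyone with the CSV can regenerate the paper's numbers exactly, or
    # swap in their own analysis to test alternative hypotheses.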

[+] lngnmn|9 years ago|reply
Again, statistics applied to partially observable and partially understood phenomena yields nonsense. If not all the variables are controlled, or not all possible causes have been taken into account, the result will be a mere aggregation of observations.

What's true for coins and dice is not applicable to partially observable environments with multiple causation and as-yet-unknown control mechanisms.

Statistics is not applicable to imaginary models based on unproven assumptions or premises.

[+] jeffdavis|9 years ago|reply
Dumb outsider question: why not just mark studies that have been reproduced versus ones that still have not been reproduced?

The way I see it, there would be two tiers of publication: cutting-edge (not yet reproduced) versus independently reproduced.

[+] untilHellbanned|9 years ago|reply
Figuring out which is which is easy enough. The real issue is people not over-interpreting the results.
[+] lutusp|9 years ago|reply
Quote: "Our proposal is a new type of paper for animal studies of disease therapies or preventions: one that incorporates an independent, statistically rigorous confirmation of a researcher's central hypothesis."

This probably won't happen right away, but it's a terrific and necessary idea that we need in order to move forward. It will revolutionize biology and medicine, and it will end the field of psychology as we know it.

http://arachnoid.com/psychology_and_alchemy

[+] mattkrause|9 years ago|reply
Why would this end psychology?

It might make it better, but we're nowhere near being able to describe (e.g.) group behavior from first principles or ion-channel kinetics. Despite your link, there is a lot of solid psych research. There are (obviously) discredited theories and cranks too, but psychologists characterized rods and cones well before biologists found them, for one example.

[+] misnome|9 years ago|reply
How about always having a professional statistician from outside the field on the review panel?

Non-reproducibility should probably be interpreted as criticism of the reported certainty of the results.

[+] clamprecht|9 years ago|reply
The parallel between blockchains (which require confirmations by peers) and this discussion is interesting.
[+] fiatjaf|9 years ago|reply
We need less published stuff, much less. Much much less.