top | item 27241715

A new replication crisis: Research that is less likely to be true is cited more

602 points | hhs | 4 years ago | ucsdnews.ucsd.edu | reply

319 comments

[+] eob|4 years ago|reply
I will never forget the day a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted. The primary author is now a hot shot professor.

My whole perception of academia and peer review changed that day.

Edit to elaborate: like many of our institutions, peer review is an effective system in many ways but was designed assuming good faith. Reviewers accept the author’s results on faith and largely just check to make sure you didn’t forget any obvious angles to cover and that the import of the work is worth flagging for the whole community to read. Since there’s no actual verification of results, it’s vulnerable to attack by dishonesty.

[+] kqr|4 years ago|reply
When I was in school pre-university, this type of "crap, we can't get what we wanted to happen, so let's just fiddle around with it until it seems about right" was very common. I was convinced this was how children learned, so that as adults they wouldn't have to do things that way.

When I got into university and started alternating studying and work, I realised just how incredibly clueless even adults are. The "let's just try something and hope nothing bad happens" attitude permeates everything.

It's really a miracle that civilisation works as well as it does.

The upshot is that if something seems stupid, it probably is and can be improved.

[+] kashyapc|4 years ago|reply
I've just finished a 3-hour reading session of an evolutionary psychology book by one of the leading scientists in the field. The book is extremely competently written and is awash with statistics on almost every page: "70% of men this; 30% of women that ... on and on". And much to my solace, the scientist was super-careful to distinguish studies that were replicable from those that were not.

Still, reading your comment makes me despair. It plants a nagging doubt in my mind: "how many of these zillion studies cited are actually replicable?" This doubt remains despite knowing that the scientist is one of the leading experts in the field, and very down-to-earth.

What are the solutions here? A big incentive shift to reward replication more? Public shaming of misleading studies? Influential conferences giving more air-time to talks about "studies that did not replicate"? I know some of these happen at a smaller scale [1], but I wonder about the "scaling" aspect (to use a very HN-esque term).

PS: Since I read Behave by Sapolsky — where he says "your prefrontal cortex [which plays a critical role in cognition, emotional regulation, and control of impulsive behavior] doesn't come online until you are 24" — I tend to take all studies done on university campuses with students younger than 24 with a good spoonful of salt. ;-)

[1] https://replicationindex.com/about/

[+] Rygian|4 years ago|reply
Honest question: could you go ahead and publish an article titled "Failure to replicate 'Top Institution's Best Paper Award'"?
[+] jhgb|4 years ago|reply
> a postdoc in my lab told me not to continue wasting time trying (and failing) to reproduce [Top Institution]’s “Best Paper Award” results from the year prior. He had been there when the work was done and said they manipulated the dataset until they got the numbers they wanted.

Isn't that the moment where you try even harder to falsify the claims in that paper? You already know you'll succeed, so the effort wouldn't be wasted.

[+] andi999|4 years ago|reply
Peer review cannot protect against fraud, even in principle, and it was never intended to. And this is OK. Usually, if a result is very important and forged, other groups try to replicate it and fail; after some time the original dataset (which needs to be kept for 10 years, I think) will be requested, and things unravel from there.

Not assuming good faith would make peer review far more involved: the only way would probably be for the reviewer to go to the lab and be shown live measurements. Then check the equipment...

[+] fallingknife|4 years ago|reply
I wonder if it's a better system to just hire smart professors and give them tenure immediately. The lazy ones in it just for the status won't do any work, but the good ones will. Sure, there will be dead weight that gets salaries for life, but I feel like that's a lesser problem than incentivizing bad research.
[+] JumpCrisscross|4 years ago|reply
Honest question: how do we fix this? The obvious solution, prosecuting academics, has an awful precedent attached to it.
[+] Wowfunhappy|4 years ago|reply
I can understand why journals don’t publish studies which don’t find anything. But they really should publish studies that are unable to replicate previous findings. If the original finding was a big deal, its potential nullification should be equally noteworthy.
[+] dekhn|4 years ago|reply
this was exactly my experience and I remember the paper that I read that finally convinced me. It turns out the author had intentionally omitted a key step that made it impossible to reproduce the results, and only extremely careful reading and some clever guessing found the right step.

There are several levels of peer review. I've definitely been a reviewer on papers where the reviewers requested everything required and reproduced the experiment. That's extremely rare.

[+] Bellamy|4 years ago|reply
Why are you so afraid to reveal the name and institution?
[+] DoreenMichele|4 years ago|reply
From what I have read, peer review was a system that worked when academia and the scientific world were much smaller and much more like "a small town." It seems to me like growth has caused sheer numbers to make that system game-able and no longer reliable in the way it once was.
[+] amvalo|4 years ago|reply
Why not just name the paper :)
[+] MichaelMoser123|4 years ago|reply
May I ask what field of knowledge the manipulated paper was from? Your page lists CS/NLP, so the field may also be linguistics or neurology (linguistics would be easier to swallow for me) https://scholar.google.com/citations?user=FMScFbwAAAAJ&hl=en

Some wider questions would be: are there similar problems in mathematics/physics versus the life sciences/social sciences? Are there the same kinds of problems across different fields of study?

Also, I wonder if replication issues would be less severe if there were a requirement to publish the software and raw data that any study is based on as open source / open data. A change in this direction might make it more difficult to manipulate results (after all, it's the public who paid for the research, in most cases).

[+] achillesheels|4 years ago|reply
Frankly, sir, it is the very reason you wish your anecdote to remain anonymous that lets such perfidy survive. If these traitors to human reason and to the public's faith that their interests serve the general welfare — after all, who is the one feeding them? — became more public, perhaps there would be less fraudulence? But I suppose you have too much to lose? If so, why do you surround yourself in the company of bad men?
[+] lasfter|4 years ago|reply
The issue is that the authors of bad papers still participate in the peer-review process. If they are the only expert reviewers and you do not pay proper respect to their work, they will squash your submission. Because everyone avoids this, papers can propagate mistakes for a long time.

Personally, I'm always very careful to cite and praise work by "competing" researchers even when that work has well-known errors, because I know that those researchers will review my paper and if there aren't other experts on the review committee the paper won't make it. I wish I didn't have to, but my supervisor wants to get tenured and I want to finish grad school, and for that we need to publish papers.

Lots of science is completely inaccessible for non-experts as a result of this sort of politics. There is no guarantee that the work you hear praised/cited in papers is actually any good; it may have been inserted just to appease someone.

I thought that this was something specific to my field, but apparently not. Leaves me very jaded about the scientific community.

[+] LetThereBeLight|4 years ago|reply
More specifically, this paper is focused on the social sciences. That's not to say the problem isn't present in the basic sciences as well.

But one other thing to note here is that these headlines about a "replication crisis" seem to imply that this is a new phenomenon. Let's not forget the history of the electron charge. As Feynman said:

"We have learned a lot from experience about how to handle some of the ways we fool ourselves. One example: Millikan measured the charge on an electron by an experiment with falling oil drops, and got an answer which we now know not to be quite right. It's a little bit off because he had the incorrect value for the viscosity of air. It's interesting to look at the history of measurements of the charge of an electron, after Millikan. If you plot them as a function of time, you find that one is a little bit bigger than Millikan's, and the next one's a little bit bigger than that, and the next one's a little bit bigger than that, until finally they settle down to a number which is higher. Why didn't they discover the new number was higher right away? It's a thing that scientists are ashamed of—this history—because it's apparent that people did things like this: When they got a number that was too high above Millikan's, they thought something must be wrong—and they would look for and find a reason why something might be wrong. When they got a number close to Millikan's value they didn't look so hard. And so they eliminated the numbers that were too far off, and did other things like that ..."

https://en.wikipedia.org/wiki/Oil_drop_experiment#Millikan.2...

[+] analog31|4 years ago|reply
Something that I think the physical sciences benefit from is the ability to look at a problem from more than one angle. For instance, the stuff that we think is the most important, such as the most general laws, is supported by many different kinds of measurements, plus the parallel investigations of theoreticians. A few scattered experiments could bite the dust, like unplugging one node in a mesh network, and it could either be ignored or repaired.

The social sciences face the problem of not having so many different possible angles, such as quantitative theories or even a clear idea of what is being tested. Much of the research is engaged in the collection of isolated factoids. Hopefully something like a quantitative theory will emerge, that allows these results to be connected together like a mesh network, but no new science gets there right away.

The other thing is, to be fair, social sciences have to deal with noisy data, and with ethics. There were things I could do to atoms in my experiments, such as deprive them of air and smash them to bits, that would not pass ethical review if performed on humans. ;-)

[+] pavon|4 years ago|reply
> More specifically, this paper is focused on the social sciences.

No, it isn't. It looked at a few different fields and found that the problem was actually worse for general science papers published in Nature/Science, where non-reproducible papers were cited 300 times more often than reproducible ones.

[+] lupire|4 years ago|reply
Feynman's example is of people being more critical about certain results. A better example is the case of "radiation" that could only be seen in a dark room out of the corner of your eye, which turned out to be a human visual artifact and wishful thinking.
[+] qalmakka|4 years ago|reply
I worked in the academic world for two years. What I saw was that lots of people are under a constant pressure to publish, and quantity is often put above quality.

I've seen papers without any sort of value or reason to exist being bruteforced through review just so some useless junk data wouldn't go to waste, all to add a line to someone's CV.

This is without mentioning that some unis are packed with totally incompetent people who only got to advance their careers by always finding a way to piggyback on someone else's paper.

The worst thing I've seen is that reviewing papers is also often offloaded to newly graduated fellows, who are often instructed to be lenient when reviewing papers coming from "friendly universities".

The level of most papers I have had the disgrace to read is so bad it made me want to quit that world as soon as I could.

I got to the conclusion the whole system is basically a complex game of politics and strategy, fed by a loop in which bad research gets published on mediocre outlets, which then get a financial return by publishing them. This bad published research is then used to justify further money being spent on low quality rubbish work, and the cycle continues.

Sometimes you get to review papers that are so comically bad and low effort they almost feel insulting on a personal level.

For instance, I had to reject multiple papers not only due to their complete lack of content, but also because their English was so horrendous they were basically unintelligible.

[+] ta8645|4 years ago|reply
Until this is fixed, people need to stop saying "listen to The Science", in an attempt to convince others of a given viewpoint. Skeptical people already distrust our modern scientific institutions; not completely obviously, but definitely when they're cited as a cudgel. Continued articles like this should make everyone stop and wonder just how firm the supposed facts are, behind their own favoured opinions. We need a little more humility about which scientific facts are truly beyond reproach.
[+] TimPC|4 years ago|reply
We also need to listen to the science on things that are clearly established. The replication crisis is not something that affects almost anything in public debate. Evolution is well established science. Large parts of Climate Change are well established science. Etc.
[+] throwkeep|4 years ago|reply
"Believe science" is incredibly destructive to the entire field. It is quite literally turning science into a religion. Replacing the scientific method with "just believe what we say, how dare you question the orthodoxy". We're back to church and priests in all but name.
[+] koheripbal|4 years ago|reply
For social sciences, I generally disregard those published papers - most are just confirmation bias.

.... but each field is different. For those that are more quantitative, it's harder to deviate your conclusion from the data.

Bias is not binary, so it's a sliding scale between the hard sciences and the squishy ones.

[+] sebastialonso|4 years ago|reply
No thank you. This "argument" is precisely where anti-science people come from.

You have to listen to the science, and also use the common sense that "this is as far as we know" and that knowledge today may change tomorrow.

Two comments below, you use this "argument" to ask for "evidence" for evolution and climate change. Big red flag.

[+] endisneigh|4 years ago|reply
Perhaps the government should have a team of people who randomly try to replicate science papers that are funded by the government.

The government can then reduce funding to institutions that have too high a percentage of research that failed to be replicated.

From that point the situation should resolve itself as institutions wouldn’t want to lose funding - so they’d either have an internal group replicate before publishing or coordinate with other institutions pre-publish.

Anything I’m missing?

[+] Matumio|4 years ago|reply
This sounds like doubling down on the approach that was causing the problems in the first place.

The desire to control and incentivize researchers to compete against each other in order to justify their salary is understandable, but it looks like it has been blown so out of proportion lately that it's doing active harm. Most researchers start their career pretty self-motivated to do good research.

Installing another system to double-check every contribution will just increase the pressure to game the system on top of doing research. And replicating a paper may sometimes cost as much as the original research, and it's not clear when to stop trying. How much collaboration with the original authors are you supposed to do if you fail to replicate? If you are making decisions about their career, you will need some system to ensure it's not arbitrary, etc.

[+] pomian|4 years ago|reply
This is what industry does, though, at least in the less theoretical fields. If you actually want to make something that works, then you need to base your science on provable fact: produce oil, build a cool structure, generate electricity. It rests on amazing and complex science, but it has to work. The conclusion is that the science that gets done needs to be provable, which means practical. Which is unfortunate, because what about all the science that may be, or one day may become, practical?
[+] visarga|4 years ago|reply
Research is non-linear, and criteria-based evaluation lacks perspective. You might throw the baby out with the bathwater. The advancement of science follows a deceptive path. Remember how the inventor of the mRNA method was shunned at her university just a few years ago? Because of things like that, millions might die, but we can't tell beforehand which scientist is a visionary and who's a crackpot. If you cut funding to seemingly useless research, you might cut off the next breakthrough.
[+] jacksonkmarley|4 years ago|reply
I don't really like the idea of 'replication police', I think it would increase pressure on researchers who are doing their job of pushing the boundaries of science.

However, I think there is potential in taking the 'funded by the government' idea in a different direction. Having a publication house that was considered a public service, with scientists (and others) employed by the government and working to review and publish research without commercial pressures could be a way to redirect the incentives in science.

Of course this would be expensive and probably difficult to justify politically, but a country/bloc that succeeded in such long term support for science might end up with a very healthy scientific sector.

[+] smlss_sftwr|4 years ago|reply
A few thoughts playing the devil's advocate:

- You would need some sort of barrier preventing movement of researchers between these audit teams and the institutions they are supposed to audit otherwise there would be a perverse incentive for a researcher to provide favorable treatment to certain institutions in exchange for a guaranteed position at said institutions later on. You could have an internal audit team audit the audit team, but you quickly run into an infinitely recursive structure and we'd have to question whether there would even be sufficient resources to support anything more than the initial team to begin with.

- From my admittedly limited experience as an economics research assistant in undergrad, I understood replication studies to be considered low-value projects that are barely worth listing on a CV for a tenure-track academic. That in conjunction with the aforementioned movement barrier would make such an auditing researcher position a career dead-end, which would then raise the question of which researchers would be willing to take on this role (though to be fair there would still be someone given the insane ratio of candidates in academia to available positions). The uncomfortable truth is that most researchers would likely jump at other opportunities if they are able to and this position would be a last resort for those who aren't able to land a gig elsewhere. I wouldn't doubt the ability of this pool of candidates to still perform quality work, but if some of them have an axe to grind (e.g. denied tenure, criticized in a peer review) that is another source of bias to be wary of as they are effectively being granted the leverage to cut off the lifeline for their rivals.

- You could implement a sort of academic jury duty to randomly select the members of this team to address the issues in the last point, which might be an interesting structure to consider further. I could still see conflict-of-interest issues being present especially if the panel members are actively involved in the field of research (and from what I've seen of academia, it's a bunch of high-intellect individuals playing by high school social rules lol) but it would at least address the incentive issue of self-selection. Perhaps some sort of election structure like this (https://en.wikipedia.org/wiki/Doge_of_Venice#:~:text=Thirty%....) could be used to filter out conflict of interest, but it would make selecting the panel a much more involved and time-consuming process.

[+] fighterpilot|4 years ago|reply
Depending how big the stick is and how it's implemented, this might push people away from novel exploratory research that has a lower chance of replicating despite best efforts.
[+] rossdavidh|4 years ago|reply
Huh, so untrue (or grossly exaggerated) results are more interesting, and that matters more for getting talked about than truth.

Thank goodness our newsmedia business doesn't work that way, or we would be poorly-informed in multiple ways.

[+] ramblerman|4 years ago|reply
Pulling up the actual paper, there is an added part the article doesn't mention.

> Prediction markets, in which experts in the field bet on the replication results before the replication studies, showed that experts could predict well which findings would replicate (11).

So it's even stating that this isn't completely innocent: given different incentives, most reviewers can identify a suspicious study, but under current incentives, letting it through because of its novelty somehow seems warranted.

[+] abandonliberty|4 years ago|reply
This is almost a tautology. Unlikely/unexpected findings are more noteworthy, so they're more likely to be both cited and false, perhaps based on small sample sizes or p-hacking.
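The small-sample mechanism can be seen in a toy simulation (my own sketch, not from the article): run many underpowered studies of a nonexistent effect and "publish" only the statistically significant ones. A few percent clear the bar anyway, and those that do necessarily report large, unreplicable effect sizes.

```python
import random
import statistics

random.seed(0)

def small_study(n=10, true_effect=0.0):
    """One underpowered study: n samples per group, no real effect by default."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(true_effect, 1) for _ in range(n)]
    diff = statistics.mean(b) - statistics.mean(a)
    # Crude Welch-style standard error; fine for illustration.
    se = (statistics.stdev(a) ** 2 / n + statistics.stdev(b) ** 2 / n) ** 0.5
    return diff, abs(diff / se) > 2.0  # "significant" at roughly p < 0.05

results = [small_study() for _ in range(20000)]
published = [d for d, significant in results if significant]

print(f"'significant' findings out of 20000 null studies: {len(published)}")
print(f"mean |effect| among them: {statistics.mean(abs(d) for d in published):.2f}")
```

Every "published" effect here is pure noise, yet each looks large, which is exactly what makes it citable and what makes the replication fail.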

People love this stuff. Malcolm Gladwell has made a career on it: half of the stuff he writes about is disproven before he publishes. It's very interesting that facial microexpression analysis can predict relationship outcomes with 90% certainty. Except it's just an overfit model, it can't, and he's no longer my favorite author. [0]

Similarly, Thomas Erikson's "Surrounded by Idiots" also lacks validation. [1]

Both authors have been making top 10 lists for years, and Audible's top selling list just reminded me of them.

Similarly, shocking publications in Nature or Science are to be viewed with skepticism.

I don't know what I can read anymore. It's the same with politics. The truth is morally ambiguous, time consuming, complicated, and doesn't sell. I feel powerless against market forces.

[0] https://en.wikipedia.org/wiki/John_Gottman#Critiques

[1] https://webcache.googleusercontent.com/search?q=cache:5Z7JiC...

[+] mc32|4 years ago|reply
One of my pet peeves is when the local NPR station advocates some position or policy based on a recent small study (usually by/at some state school). Sometimes they'll couch it by saying it's not peer reviewed, it's preliminary, or something, but it's too late: they've already planted the seed and had their talking point, all with a study to back up their position, and listeners just go along with it.
[+] azhenley|4 years ago|reply
So this is why my papers are cited so little!
[+] bonoboTP|4 years ago|reply
I mean, both strongly correlate with "surprising finding" so it's no surprise they correlate with each other too.
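That common-cause structure is easy to demonstrate with a toy model (my own illustration, with made-up numbers): let a single latent "surprisingness" score drive both citation count and the chance a finding is false, with no direct link between the two.

```python
import random
import statistics

random.seed(1)

# Toy model: "surprisingness" drives BOTH how much a paper is cited AND how
# likely the finding is to be false. Citations and replication never touch
# each other directly, yet they end up correlated.
papers = []
for _ in range(5000):
    surprise = random.random()                      # how novel the claim is
    citations = surprise * 50 + random.gauss(0, 5)  # surprising -> widely cited
    replicates = random.random() > surprise * 0.6   # surprising -> often false
    papers.append((citations, replicates))

mean_cites_replicable = statistics.mean(c for c, r in papers if r)
mean_cites_failed = statistics.mean(c for c, r in papers if not r)
print(f"mean citations, replicable: {mean_cites_replicable:.1f}")
print(f"mean citations, failed to replicate: {mean_cites_failed:.1f}")
```

The non-replicating papers come out more cited on average even though nothing in the model rewards being wrong: both outcomes are downstream of the same confounder.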
[+] maxnoe|4 years ago|reply
"We failed to reproduce the results published in [1]" is a citation.

"Our findings directly contradict [1]" is a citation.

Without context, number of citations doesn't tell you anything.

[+] for_i_in_range|4 years ago|reply
They state that the cause is that the "findings" in the papers "are interesting".

Is this really the case? And is this actually a "new" phenomenon?

It seems like it could be a disguised version of the Availability Cascade. [1] In other words, when we encounter a simple-to-understand explanation of something complex, the explanation ends up catching on.

Then, because the explanation is simple, its popularity snowballs. The idea cascades like a waterfall throughout the public. Soon it becomes common sense—not because of sense, but because of common.

[1]: https://en.wikipedia.org/wiki/Availability_cascade

[+] gumby|4 years ago|reply
I don’t mean to pick on one field in particular, but last year I made the throwaway comment to a FB post “arXiv is the new businesswire”.

The number of academic “big shots” (friends of the poster, not of me) who “liked” the comment was a bit alarming.

There’s too much incentive for fudging things (depending on your field, either grants or company funding).

The degree of fraud in Chinese journals is high and well discussed (as it should be). But apart from a small amount of hand-wringing over “the replication crisis” there is no similar condemnation of the work in the rest of the world.

[+] anshumankmr|4 years ago|reply
There was a comment I read somewhere (not sure where exactly) stating that the modern peer review process would never let someone like Einstein, who was a patent clerk, get the limelight.
[+] fastaguy88|4 years ago|reply
The paper implies that less reproducible papers have a greater influence on science because they are more highly cited. But an alternate explanation suggests the opposite -- less reproducible papers are more highly cited because people are publishing papers pointing out the results are false.

It is also quite telling that the biggest differences in citation counts are for papers published in Nature and Science, while in discipline-specific journals (Figs. 1B,C) the effect is very modest. Practicing scientists know that Science and Nature publish the least reproducible results, in part because they like "sexy" (surprising, less likely to be correct) science, and in part because they provide almost no detail on how experiments were performed (no Materials and Methods).

The implication of the paper is that less reproducible science has more impact than reproducible science. But we know this is wrong -- reproducible results persist, while incorrect results do not (we haven't heard much more about the organisms that use arsenic rather than phosphorus in their DNA -- https://science.sciencemag.org/content/332/6034/1163 )

[+] admissionsguy|4 years ago|reply
When I started a PhD in a biomedical field at a top institution, we were told that our results are not that important; what's important is the ability to "sell them". This focus on presentation over content permeated the whole establishment. I remember sitting with a professor trying to come up with plausible buzzwords for a multi-million grant application.

The phenomenon described in the articles sounds like a natural consequence of this attitude.

[+] cycomanic|4 years ago|reply
I'm not really surprised by the results related to Nature (and Science, to a lesser degree). I have seen multiple times that Nature editors (who are not experts in the field) have overruled reviewer recommendations to reject because the results were exciting.

The incentives for Nature are not to produce great science, but to sell journals, and that requires them to give the impression of being on the forefront of "scientific discovery". I've in fact been told by an editor, "our job is to make money, not great science."

The irony is that their incentives also make them very risk-averse, so they will not publish results that don't have a buzz around them. I know of several papers that created new "fields" which were rejected by the editors. The incentive also means highly cited authors have an easier time getting published in Nature.

I should say that this is typically much better in journals run by expert editors and published by technical societies, e.g. the IEEE.