I don't see how this could ever work, and non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper.
This of course depends a lot on the specific field, but it can easily be months of effort to replicate a paper. You save some time compared to the original as you don't have to repeat the dead ends and you might receive some samples and can skip parts of the preparation that way. But properly replicating a paper will still be a lot of effort, especially when there are any issues and it doesn't work on the first try. Then you have to troubleshoot your experiments and make sure that no mistakes were made. That can add a lot of time to the process.
This is also all work that doesn't benefit the scientists replicating the paper. It only costs them money and time.
If someone cares enough about the work to build on it, they will replicate it anyway. And in that case they have a good incentive to spend the effort. If that works this will indirectly support the original paper even if the following papers don't specifically replicate the original results. Though this part is much more problematic: if the following experiments fail, then this will likely remain entirely unpublished. But the solution here unfortunately isn't as simple as just publishing negative results; it takes far more work to create a solid negative result than just trying the experiments and abandoning them if they're not promising.
> I don't see how this could ever work, and non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper.
They also tend to over-estimate the effect of peer review (often equating peer review with validity).
> If someone cares enough about the work to build on it, they will replicate it anyway. And in that case they have a good incentive to spend the effort. If that works this will indirectly support the original paper even if the following papers don't specifically replicate the original results. Though this part is much more problematic: if the following experiments fail, then this will likely remain entirely unpublished.
It can also remain unpublished if other things did not work out, even if the results could be replicated. A half-fictional example: a team is working on a revolutionary new material to solve complicated engineering problems. They found a material that was synthesised by someone in the 1980s, published once and never reproduced, which they think could have the specific property they are after. So they synthesise it, and it turns out that the material exists, with the expected structure but not with the property they hoped. They aren’t going to write it up and publish it; they’re just going to scrap it and move on to the next candidate. Different teams might be doing the same thing at the same time, and nobody coming after them will have a clue.
>I don't see how this could ever work, and non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper.
I think it would be fine to halve the productivity of these fields, if it means that you can reasonably expect papers to be accurate.
FYI there is at least one science journal that only publishes reproduced research:
Organic Syntheses
"A unique feature of the review process is that all of the data and experiments reported in an article must be successfully repeated in the laboratory of a member of the editorial board as a check for reproducibility prior to publication"
It's simple but not easy: You create another path to tenure which is based on replication, or one valued on equal terms as part of a tenure package. (For example, x fewer papers but x number of replications, and you are expected to have x replications in your specialty.) You also create a grant funding section for replication which is then passed on to these independent systems. (You would have to have some sort of randomization handled as well.) Replication has to be considered at the same value as original research.
And maybe smaller faculties at R2s pivot to replication hubs. And maybe this is easier for some sections of biology, chemistry and psychology than it is for particle physics. We could start where cost of replication is relatively low and work out the details.
It's completely doable in some cases. (It may never be doable in some areas either.)
99% of all papers mean nothing. They add nothing to the collective knowledge of humanity. In my field of robotics there are SOOO many papers that are basically taking three or four established algorithms/machine learning models, and applying them to off-the-shelf hardware. The kind of thing any person educated in the field could almost guess the results exactly. Hundreds of such iterations for any reasonably popular problem space (prosthetics, drones for wildfires, museum guide robot) etc every month. Far more than could possibly be useful to anyone.
There should probably be some sort of separate process for things that actually claim to make important discoveries. I don't know what or how that should work. In all honesty maybe there should just be fewer papers, however that could be achieved.
> If someone cares enough about the work to build on it, they will replicate it anyway.
Well, the trouble is that hasn't been the case in practice. A lot of the replication crisis was attempting for the first time to replicate a foundational paper that dozens of other papers took as true and built on top of, and then seeing said foundational paper fail to replicate. The incentives point toward doing new research instead of replication, and that needs to change.
> If someone cares enough about the work to build on it, they will replicate it anyway.
Does it really deserve to be called work if it doesn't include a full, working set of instructions that, if followed to a T, allow it to be replicated? To me that's more like pollution, making it someone else's problem. I certainly don't see how "we did this, just trust us" can even be considered science, and that's not because I don't understand the scientific method, that's because I don't make a living with it, and have no incentive to not rock the boat.
Also, don't forget that a lot of replication would fundamentally involve going and collecting additional samples / observations / etc in the field area, which is often expensive, time consuming, and logistically difficult.
It's not just "can we replicate the analysis on sample X", but also "can we collect a sample similar to X and do we observe similar things in the vicinity" in many cases. That alone may require multiple seasons of rather expensive fieldwork.
Then you have tens to hundreds of thousands of dollars in instrument time to pay to run the various analyses which are needed in parallel with the field observations.
It's rarely the simple data analysis that's flawed; far more frequently it's subtle issues with everything else.
In most cases, rather than try to replicate, it's best to test something slightly different to build confidence in a given hypothesis about what's going on overall. That merits a separate paper and also serves a similar purpose.
E.g. don't test "can we observe the same thing at the same place?", and instead test "can we observe something similar/analogous at a different place / under different conditions?". That's the basis of a lot of replication work in geosciences. It's not considered replication, as it's a completely independent body of work, but it serves a similar purpose (and unlike replication studies, it's actually publishable).
What's the value in publishing something that is never replicated? If no one ever reproduces the experiment and gets the same results then you don't know if any interpretations based on that experiment are valid. It would also mean that whatever practical applications could have come from the experiment are never realized. It makes the entire pursuit seem completely useless.
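Some experiments that study biological development or trained animals can take a year or more of fairly intense effort to start generating data.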
>I don't see how this could ever work, and non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper.
Then perhaps those papers shouldn't be published? Or held in any higher esteem than a blog post by the same authors?
When I looked into this, more than 15 years ago, I thought the difficult portion wasn't sharing the recipe, but the ingredients, if you will - granted, I was in a molecular biology lab. Effectively, the Material Transfer Agreements between universities, all trying to protect their IP, made working with each other unbelievably inefficient.
You'd have no idea if you were going down a well trodden path which would yield no success because you have no idea it was well trod. No one publishes negative results, etc.
I think the current system is just measuring entirely the wrong thing. Yes, fewer papers would be published. But today's goal is "publish papers" not "learn and disseminate truly useful and novel things", and while this doesn't solve it entirely, it pushes incentives further away from "publish whatever pure crap you can get away with." You get what you measure -> sometimes you need to change what/how you measure.
> If someone cares enough about the work to build on it, they will replicate it anyway.
That's duplicative at the "oh maybe this will be useful to me" stage, with N different people trying to replicate. And with replication not a first-class part of the system, the effort of replication (e_R) is high. For appealing things, N is probably > 2. So the total effort is N × e_R.
If you move the burden to the "replicate to publish" stage, you can fix the number of replicas needed so N=2 (or whatever) and you incentivize the original researchers to make e_R lower (which will improve the quality of their research even before the submit-for-publication stage).
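A back-of-the-envelope sketch of that effort model (all numbers, including the cost discount, are invented for illustration):

```python
# Hypothetical effort model for the argument above; all numbers invented.
# e_r is the effort (say, person-months) of one replication attempt.

def status_quo_effort(n: int, e_r: float) -> float:
    """N groups each quietly replicate before building on the result."""
    return n * e_r

def replicate_to_publish_effort(n_required: int, e_r: float, discount: float) -> float:
    """Fixed number of mandated replicas; authors are pushed to lower e_R
    (better methods sections, shared code and samples), modeled as a discount."""
    return n_required * e_r * discount

# Say 4 groups would each spend 6 person-months replicating an appealing result...
print(status_quo_effort(4, 6.0))                  # 24.0 person-months
# ...versus 2 mandated replicas at a 30% lower per-replica cost.
print(replicate_to_publish_effort(2, 6.0, 0.7))   # ~8.4 person-months
```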
I've been in the system, I spent a year or two chasing the tail of rewrites, submissions, etc, for something that was detectable as low-effect-size in the first place but I was told would still be publishable. I found out as part of that that it would only sometimes yield a good p-value! And everything in the system incentivized me to hide that for as long as possible, instead of incentivizing me to look for something else or make it easy for others to replicate and judge for themselves.
Hell, do something like "give undergrads the opportunity to earn Master's on top of their BSes, say, by replicating (or blowing holes in) other people's submissions." I would've eaten up an opportunity like that to go really, really deep in some specialized area in exchange for a masters degree in a less-structured way than "just take a bunch more courses."
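If you build upon a result, you almost have to replicate it.
An acquaintance spent years building upon a result that turned out to be fraudulent/p-hacked.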
While it is a lot of work, I tend to think that one can then always publish preprints if they can't wait for the replication. I don't understand why a published paper should count as an achievement (against tenure or funding) at all before the work is replicated. The current model just creates perverse incentives to encourage lying, P-hacking, and cherry-picking. This would at least work for fields like machine learning.
This is, of course, a naive proposal without too much thought into it. But I was wondering what I would have missed here.
> I don't see how this could ever work, and non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper.
I don't see how the current system really works either. Fraud is rampant, and a replication crisis is the normal state of most fields.
Basically the current system is failing at finding out what is true. Which is the entire point. That's pretty damn bad.
Maybe doing an experiment twice, even at double the cost, makes more sense, so that we don't all throw away our coffee when coffee is declared bad, or throw away our gluten when gluten is declared bad, etc. (those are trivial examples). Basically, the cost to perform the science is in many cases minuscule compared to how it could affect society.
In some fields research can’t be replicated later. Much of all autism research will NEVER be replicated because the population of those considered autistic is not stable over time.
Other research proves impossible to replicate because the experiment was not described in enough detail to actually replicate it, which should be grounds to immediately dismiss the research before publishing, but which can’t truly be caught if you don’t actually try to reproduce it.
Finally, these practical concerns don’t even touch on the biggest benefit of reproduction as a standard, which is that almost nobody wants to reproduce research as they are not rewarded for doing so. This would give somebody, namely those who want to publish something, a strong impetus to get that reproduction done which wouldn’t otherwise exist.
> [...] non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper
Either "peer reviewed" articles describe progress of promising results, or they don't. If they don't the research is effectively ignored (at least until someone finds it promising). So let's consider specifically output that described promising results.
After "peer review" any apparently promising results prompt other groups to build on them by utilizing it as a step or building block.
It can take many failed attempts by independent groups before anyone dares publish the absence of the proclaimed observations, since they may try it over multiple times thinking they must have botched it somewhere.
On paper it sounds more expensive to require independent replication, but only because the costs of replication attempts are hidden until its typically rather late.
Is it really more expensive if the replication attempts are in some sense mandatory?
Or is it perhaps more expensive to pretend science has found a one-shot "peer reviewed" method, resulting in uncoordinated independent reproduction attempts that may go unannounced before, or even after failed replications?
The pseudo-final word, end of line?
What about the "in some sense mandatory" replication? Perhaps roll provable dice for each article, and in-domain sortition to randomly assign replicators. So every scientist would be spending a certain fraction of their time replicating the research of others. The types of acceptable excuses to derelict these duties should be scrutinized and controlled. But some excuses should be very valid, for example conscientious objection. If you are tasked to reproduce some of Dr. Mengele's works, you can cop out on condition that you thoroughly motivate your ethical concerns and objections. This could also bring a lot of healthy criticism to a lot of practices, which is otherwise just ignored an glossed over for fear of future career opportunities.
> I don't see how this could ever work, and non-scientists seem to often dramatically underestimate the amount of work it would be to replicate every published paper.
The alternative is a bunch of stuff being published that people believe is "science" but that doesn't hold up under scrutiny, which undermines the reliability of science itself. The current approach simply gives people reason to be skeptical.
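It would mean disruption is no longer a useful tool for human development.
http://www.orgsyn.org/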
> All procedures and characterization data in OrgSyn are peer-reviewed and checked for reproducibility in the laboratory of a member of the Board of Editors
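Never is a strong word.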
The purpose of science publications is to share new results with other scientists, so others can build on or verify the correctness of the work. There has always been an element of “receiving credit” to this, but the communication aspect is what actually matters from the perspective of maximizing scientific progress.
In the distant past, publication was an informal process that mostly involved mailing around letters, or for a major result, self-publishing a book. Eventually publishers began to devise formal journals for this purpose, and some of those journals began to receive more submissions than it was feasible to publish or verify just by reputation. Some of the more popular journals hit upon the idea of applying basic editorial standards to reject badly-written papers and obvious spam. Since the journal editors weren’t experts in all fields of science, they asked for volunteers to help with this process. That’s what peer review is.
Eventually bureaucrats (inside and largely outside of the scientific community) demanded a technique for measuring the productivity of a scientist, so they could allocate budgets or promotions. They hit on the idea of using publications in a few prestigious journals as a metric, which turned a useful process (sharing results with other scientists) into [from an outsider perspective] a process of receiving “academic points”, where the publication of a result appears to be the end-goal and not just an intermediate point in the validation of a result.
Still other outsiders, who misunderstand the entire process, are upset that intermediate results are sometimes incorrect. This confuses them, and they’re angry that the process sometimes assigns “points” to people who they perceive as undeserving. So instead of simply accepting that sharing results widely to maximize the chance of verification is the whole point of the publication process, or coming up with a better set of promotion metrics, they want to gum up the essential sharing process to make it much less efficient and reduce the fan-out degree and rate of publication. This whole mess seems like it could be handled a lot more intelligently.
Very well put. This is the clearest way of looking at it in my view.
I’ll pile on to say that you also have the variable of how the non-scientist public gleans information from the academics. Academia used to be a more insular cadre of people seeking knowledge for its own sake, so this was less relevant. What’s new here is that our society has fixated on the idea that matters of state and administration should be significantly guided by the results and opinions of academia. Our enthusiasm for science-guided policy is a triple whammy, because
1. Knowing that the results of your study have the potential to affect policy creates incentives that may change how the underlying science is performed
2. Knowing that results of academia have outside influence may change WHICH science is performed, and draw in less-than-impartial actors to perform it
3. The outsized potential impact invites the uninformed public to peer into the world of academia and draw half-baked conclusions from results that are still preliminary or unreplicated. Relatively narrow or specious studies can gain a lot of undue traction if their conclusions appear, to the untrained eye, to provide a good bat to hit your opponent with.
> Still other outsiders, who misunderstand the entire process, are upset that intermediate results are sometimes incorrect. This confuses them, and they’re angry that the process sometimes assigns “points” to people who they perceive as undeserving. So instead of simply accepting that sharing results widely to maximize the chance of verification is the whole point of the publication process, or coming up with a better set of promotion metrics, they want to gum up the essential sharing process to make it much less efficient and reduce the fan-out degree and rate of publication.
Does not represent my experience in the academy at all. There is a ton of gamesmanship in publishing. That is ultimately the yardstick academics are measured against, whether we like it or not. No one misunderstands that IMO, the issue is that it's a poor incentive. I think creating a new class of publication, one that requires replication, could be workable in some fields (e.g. optics/photonics), but probably is totally impossible in others (e.g. experimental particle physics).
For purely intellectual fields like mathematics, theoretical physics, philosophy, you probably don't need this at all. Then there are 'in the middle fields' like machine learning which in theory would be easy to replicate, but also would be prohibitively expensive for, e.g. baseline training of LLMs.
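The public perception of a publication in a prestigious journal as the established truth does not help, either.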
Your analysis seems to portray all scientists as pure hearted. May I remind you of the latest Stanford scandal where the president of Stanford was found to have manipulated data?
Today, publications do not serve the same purpose as they did before the internet. It is trivial today to write a convincing paper without research and get it published (https://www.theatlantic.com/ideas/archive/2018/10/new-sokal-hoax/572212/).
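Given that some experiments cost billions to conduct, it is impossible to implement "Peer Replication" for all papers.
What could be done is to add metadata about papers that were replicated.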
For a while Reddit had the mantra “pics or it didn’t happen”.
At least in CS/ML there needs to be a “code or it didn’t happen”. Why? Papers are ambiguous. Even if they have mathematical formulas, not all components are defined.
Peer replication in these fields is an easy low hanging fruit that could set an example for other fields of science.
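As a sketch of what "code or it didn't happen" could mean in practice, here is a hypothetical one-command entry point one might require alongside an ML paper. The file names, checksum placeholder, and hyperparameters are all invented for illustration:

```python
# repro.py -- hypothetical one-command reproduction entry point for a paper.
# Everything a reader needs is pinned: seed, data checksum, hyperparameters.
import hashlib
import random

SEED = 1234                  # the exact seed behind the reported numbers
DATA_SHA256 = "..."          # placeholder: checksum of the released dataset
CONFIG = {
    "learning_rate": 3e-4,   # the kind of value papers leave ambiguous
    "batch_size": 64,
    "grad_clip": 1.0,        # the kind of detail papers omit entirely
}

def verify_dataset(path: str) -> None:
    """Refuse to run on data that doesn't match the published checksum."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != DATA_SHA256:
        raise ValueError("dataset does not match the paper's checksum")

def main() -> None:
    random.seed(SEED)
    # train(CONFIG) ... evaluate() ... regenerate the paper's results table
    print("Reproducing results with", CONFIG)

if __name__ == "__main__":
    main()
```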
I like the idea of splitting "peer review" into two, and then having a citation threshold standard where a field agrees that a paper should be replicated after a certain number of citations. And journals should have a dedicated section for attempted replications.
1. Rebrand peer review as a "readability review" which is what reviewers tend to focus on today.
2. A "replicability statement", a separately published document where reviewers push authors to go into detail about the methodology and strategy used to perform the experiments, including specifics that someone outside of their specialty may not know. Credit NalNezumi ITT
Every experimental paper I've ever read has contained an "Experimental" section, where they provide the details on how they did it. Those sections tend to be general enough, albeit concise.
In some fields, aside from specialized knowledge, good experimental work requires what we call "hands." For instance, handling air sensitive compounds, or anything in a condensed or crystalline state. In my thesis experiment, some of the equipment was hand made, by me.
Sometimes specialized facilities are needed. My doctoral thesis project used roughly 1/2 million dollars of gear, and some of the equipment that I used was obsolete and unavailable by the time I finished.
Imo, A more realistic thing to do is "replicability review" and/or requirement to submit "methodology map" to each paper.
The former would be a back-and-forth with a reviewer who inquires and asks questions (based on the paper) with the goal of being able to reproduce the result, but without having to actually reproduce it. This is usually good for finding missing details in the paper that the writer just took for granted everyone in the field knows (I've met bio PhDs who have wasted months of their lives tracking down experimental details not mentioned in a paper).
The latter would be the result of the former. Instead of having a pages-long "appendix" section in the main paper, you produce another document with meticulous details of the experiment/methodology, with every stone turned, together with a peer reviewer. Stamp it with the peer reviewer's name so they can't get away with a hand-wavy review.
I've read too many papers where important information needed to reproduce the result is omitted (for ML/RL). If the code is included, I've countless times found implementation details that are not mentioned in the paper. As a matter of fact, there are even results suggesting that those details make or break certain algorithms. [1] I've also seen breaking details only mentioned in code comments...
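To make that concrete: [1] studies exactly these code-level details in deep RL. Below is a hypothetical illustration (not taken from any specific paper) of one such detail, running observation normalization, which often lives only in the code:

```python
import numpy as np

class RunningObsNormalizer:
    """Running mean/variance observation whitening -- a classic 'code-level
    optimization' often absent from the paper but decisive in practice."""
    def __init__(self, dim: int):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = 1e-4

    def __call__(self, obs: np.ndarray) -> np.ndarray:
        # Welford-style update of running statistics, then whiten.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)

normalize = RunningObsNormalizer(dim=4)
obs = np.array([0.1, -2.0, 30.0, 0.0])
policy_input = normalize(obs)   # the paper may only ever mention 'obs'
```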
Another atrocious thing I've witnessed is a paper claiming they evaluated their method on a benchmark, and if you check the benchmark, the task they evaluated on doesn't exist! They forked the benchmark and made their own task without being clear about it! [2]
Shit like this makes me lose faith in certain science directions. And I've seen a couple of junior researchers give it all up because they concluded it's all just a house of cards.
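[1] https://arxiv.org/abs/2005.12729
[2] https://arxiv.org/abs/2202.02465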
Edit: also if you think that's too tedious/costly, reminder that publishers rake in record profits so the resources are already there
https://youtu.be/ukAkG6c_N4M
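> What if all the experiments in the paper are too complicated to replicate? Then you can submit to [the Journal of Irreproducible Results].
Observational science is still a branch of science even if it's difficult or impossible to replicate.
Consider the first photographs of a live giant squid in its natural habitat, published in 2005 at https://royalsocietypublishing.org/doi/10.1098/rspb.2005.315... .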
Who seriously thinks this shouldn't have been published until someone else had been able to replicate the result?
Who thinks the results of a drug trial can't be published until they are replicated?
How does one replicate "A stellar occultation by (486958) 2014 MU69: results from the 2017 July 17 portable telescope campaign" at https://ui.adsabs.harvard.edu/abs/2017DPS....4950403Z/abstra... which required the precise alignment of a star, the trans-Neptunian object 486958 Arrokoth, and a region in Argentina?
Or replicate the results of the flyby of Pluto, or flying a helicopter on Mars?
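Here's a paper I learned about from "In The Pipeline"; "Insights from a laboratory fire" at https://www.nature.com/articles/s41557-023-01254-6 .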
"""Fires are relatively common yet underreported occurrences in chemical laboratories, but their consequences can be devastating. Here we describe our first-hand experience of a savage laboratory fire, highlighting the detrimental effects that it had on the research group and the lessons learned."""
With some of the things, but admittedly not most of the things you mentioned, there's a dataset (somewhere) and some code run on that dataset (somewhere) and replication would mean someone else being able to run that code on that dataset and get the same results.
Would this require labs to improve their software environments and learn some new tools? Would this require labs to give up whatever used to be secret sauce? That's. The. Point.
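A minimal sketch of what that bar could look like mechanically, assuming the authors froze their code and data (all file names and the expected digest are invented):

```python
# verify_replication.py -- hypothetical harness: re-run the authors' frozen
# analysis on the released dataset and check the output matches the paper.
import hashlib
import subprocess

EXPECTED_RESULTS_SHA256 = "..."  # placeholder: digest of the published results

def sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Re-run the released pipeline exactly as the authors did.
subprocess.run(["python", "analysis.py", "--data", "dataset.csv",
                "--out", "results.csv"], check=True)

if sha256("results.csv") == EXPECTED_RESULTS_SHA256:
    print("bit-for-bit replication of the published results")
else:
    print("outputs differ -- investigate before trusting the paper")
```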
> Who seriously thinks this shouldn't have been published until someone else had been able to replicate the result?
Nobody, obviously. You cannot reproduce a result that hasn’t been published, so no new phenomenon is replicated the moment it is first published. The problem is not the publication of new discoveries, it’s the lack of incentives to confirm them once they’ve been published.
In your example, new observations of giant squids are still massively valuable even if not that novel anymore. So new observations should be encouraged (as I am sure they are).
> Or replicate the results of the flyby of Pluto, or flying a helicopter on Mars?
Well, we should launch another probe anyway. And I am fairly confident we’ll have many instances of aircraft in Mars’ atmosphere and more data than we’ll know what to do with. We can also simulate the hell out of it. We’ll point spectrometers and a whole bunch of instruments towards Pluto. These are not really good examples of unreproducible observations.
Besides, in such cases robustness can be improved by different teams performing their own analyses separately, even if the data comes from the same experimental setup. It’s not all black or white. Observations are on a spectrum, some of them being much more reliable than others and replication is one aspect of it.
> How would peer replication be relevant?
How would you know which aspects of the observed phenomena come from particularities of this specific lab? You need more than one instance. You need some kind of statistical and factor analyses. Replication in this instance would not mean setting actual labs on fire on purpose.
It’s exactly like studying car crashes: nobody is going to kill people on purpose, but it is still important to study them so we regularly have new papers on the subject based on events that happened anyway, each one confirming or disproving previous observations.
I think in some of those cases you have conclusions drawn from raw data that could be replicated or reviewed. For example, many teams use the same raw data from large colliders, or JWST, or other large science projects to reach competing conclusions.
Yes, in a perfect world we would also replicate the data collection, but we do not live in a perfect world.
Same is true for drug trials: there is always a battle over getting the raw data from drug trials, as the companies claim that data is a trade secret, so independent verification of drug trials is very expensive. But if the FDA required not just the release of redacted conclusions and supporting redacted data but 100% of all data gathered, it would be a lot better IMO.
For example, the FDA says it will take decades to release the raw data from the COVID vaccine trials.. Why... and that is after being forced to do so via a lawsuit.
I spent a lot of my graduate years in CS implementing the details of papers only to learn that, time and time again, the paper failed to mention all the shortcomings and fail cases of the techniques. There are great exceptions to this.
Due to the pressure of "publish or die" there is very little honesty in research. Fortunately there are some who are transparent with their work. But for the most part, science is drowning in a sea of research that lacks transparency and falls short on replication.
In the PL field, conferences have started to allow authors to submit packaged artifacts (typically, source code, input data, training data, etc) that are evaluated separately, typically post-review. The artifacts are evaluated by a separate committee, usually graduate students. As usual, everything is volunteer-run. Even with explicit instructions, it is hard enough to even get the same code to run in a different environment and give the same results. Would "replication" of a software technique require another team to reimplement something from scratch? That seems unworkable.
I can't even imagine how hard it would be to write instructions for another lab to successfully replicate an experiment at the forefront of physics or chemistry, or biology. Not just the specialized equipment, but we're talking about the frontiers of Science with people doing cutting-edge research.
I get the impression that suggestions like these are written by non-scientists who do not have experience with the peer review process of any discipline. Things just don't work like that.
On replication, it is a worthwhile goal but the career incentives need to be there. I think replicating studies should be a part of the curriculum in most programs - a step toward getting a PhD in lieu of one of the papers.
The website dies if I try to figure out who the author (“sam”) is, but it sounds like they are used to some awful backwater of academia.
They have this idea that a single editor screens papers to decide if they are uninteresting or fundamentally flawed, then they want a bunch of professors to do grunt work litigating the correctness of the experiments.
In modern (post industrial revolution) branches of science, the work of determining what is worthy of publication is distributed amongst a program committee, which is comprised of reviewers. The editor / conference organizers pick the program committee. There are typically dozens of program committee members, and authors and reviewers both disclose conflicts. Also, papers are anonymized, so the people that see the author list are not involved in accept/reject decisions.
This mostly eliminates the problem where work is suppressed for political reasons, etc.
It is increasingly common for paper PDFs to be annotated with badges showing the level of reproducibility of the work, and papers can win awards for being highly reproducible. The people that check reproducibility simply execute directions from a separate reproducibility submission that is produced after the paper is accepted.
I argue the above approach is about 100 years ahead of what the blog post is suggesting.
Ideally, we would tie federal funding to double blind review and venues with program committees, and papers selected by editors would not count toward tenure at universities that receive public funding.
As much as I agree with the sentiment, we have to admit it isn't always practical. There's only one LIGO, LHC or JWST, for example. Similarly, not every lab has the resources or know-how to host multi-TB datasets for the general public to pick through, even if they wanted to. I sure didn't when I was a grad student.
That said, it infuriates me to no end when I read a Phys. Rev. paper that consists of a computational study of a particular physical system, and the only replicability information provided is the governing equation and a vague description of the numerical technique. No discretized example, no algorithm, and sure as hell no code repository. I'm sure other fields have this too. The only motivation I see for this behavior is the desire for a monopoly on the research topic on the part of authors, or embarrassment by poor code quality (real or perceived).
One thing I think people are missing, is that labs replicate other experiments all the time as part of doing their own research. It's just that the results are not always published, or not published in a like-for-like way.
But the information gets around. In my former field, everyone knew which were the dodgy papers, with results no-one could replicate.
Reproducibility would become a much higher priority if electronic versions of papers were required (by their distributors, archives, institutions, ...) to have reproduction sections, which the authors are encouraged to update over time.
Making this stuff more visible would help reproducers validate the value of reproduction to their home and funding institutions.
Having a standard section for this, with an initial state of "Not reproduced" provides more incentive for original workers to provide better reproduction info.
For algorithm and math work, the reproduction could be served best by a downloadable executable bundle.
You know what I would love to see is metadata attributes surrounding a paper such as [retracted], [reproduced], [rejected], etc. We already have the preprint thing down. Some of these would be implied by being published, ie not a preprint. Maybe even a quick symbol for what method of proof was relied upon: video evidence, randomized control trial, observational study, sample count of n>1000 (predefined inequality brackets), etc. I think having this quick digest of information would help an individual wade through a lot of studies quickly.
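A sketch of what those attributes could look like as machine-readable metadata (all field names and values invented for illustration):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PaperStatus:
    """Hypothetical quick-digest metadata to attach to a paper listing."""
    doi: str
    badges: List[str] = field(default_factory=list)  # e.g. ["reproduced"]
    evidence: str = ""     # e.g. "RCT", "observational", "video"
    sample_size: str = ""  # predefined bracket, e.g. "n>1000"

paper = PaperStatus(
    doi="10.0000/example",   # invented DOI for illustration
    badges=["preprint", "reproduced"],
    evidence="randomized control trial",
    sample_size="n>1000",
)
print(paper)
```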
If they are, in fact, implying that another lab should produce a matching data-set to try to replicate results, well, I'm sorry, but that won't work, at least in a whole lot of fields. Data collection can be very expensive, and take a lot of time. It certainly is in my field.
If, on the other hand, they just want the raw data, and let others go to town on it in their own way, that's fine, probably. Results that don't depend on very particular details of the processing pipeline are probably more robust anyway.
How do you replicate a literature review? Theoretical physics? A neuro case? Research that relies upon natural experiments? There are many types of research. Not all of them lend themselves to replication, but they can still contribute to our body of knowledge. Peer review is helpful in each of these instances.
Science is a process. Peer review isn't perfect. Replication is important. But it doesn't seem like the author understands what it would take to simply replace peer review with replication.
We can have tiers. Tier 1 peer reviewed. Tier 2 peer replicated. We can have it as a stamp on the papers.
All PhD programs have a requirement for a minimum number of novel publications. We could add to the requirements a minimum number of replications.
But truth be told, a PhD in science/engineering will probably spend their first two years trying to replicate the SOTA anyway. It’s just that today you cannot publish this effort; nobody cares, except yourself and your advisor.
[+] [-] fabian2k|2 years ago|reply
This of course depends a lot on the specific field, but it can easily be months of effort to replicate a paper. You save some time compared to the original as you don't have to repeat the dead ends and you might receive some samples and can skip parts of the preparation that way. But properly replicating a paper will still be a lot of effort, especially when there are any issues and it doesn't work on the first try. Then you have to troubleshoot your experiments and make sure that no mistakes were made. That can add a lot of time to the process.
This is also all work that doesn't benefit the scientists replicating the paper. It only costs them money and time.
If someone cares enough about the work to build on it, they will replicate it anyway. And in that case they have a good incentive to spend the effort. If that works this will indirectly support the original paper even if the following papers don't specifically replicate the original results. Though this part is much more problematic if the following experiments fail, then this will likely remain entirely unpublished. But the solution here unfortunately isn't as simple as just publishing negative results, it take far more work to create a solid negative result than just trying the experiments and abandoning them if they're not promising.
[+] [-] kergonath|2 years ago|reply
They also tend to over-estimate the effect of peer review (often equating peer review with validity).
> If someone cares enough about the work to build on it, they will replicate it anyway. And in that case they have a good incentive to spend the effort. If that works this will indirectly support the original paper even if the following papers don't specifically replicate the original results. Though this part is much more problematic if the following experiments fail, then this will likely remain entirely unpublished.
It can also remain unpublished if other things did not work out, even if the results could be replicated. A half-fictional example: a team is working on a revolutionary new material to solve complicated engineering problems. They found a material that was synthesised by someone in the 1980s, published once and never reproduced, which they think could have the specific property they are after. So they synthesise it, and it turns out that the material exists, with the expected structure but not with the property they hoped. They aren’t going to write it up and publish it; they’re just going to scrap it and move on to the next candidate. Different teams might be doing the same thing at the same time, and nobody coming after them will have a clue.
[+] [-] sebzim4500|2 years ago|reply
I think it would be fine to half the productivity of these fields, if it means that you can reasonably expect papers to be accurate.
[+] [-] sqrt_1|2 years ago|reply
Organic Syntheses "A unique feature of the review process is that all of the data and experiments reported in an article must be successfully repeated in the laboratory of a member of the editorial board as a check for reproducibility prior to publication"
https://en.wikipedia.org/wiki/Organic_Syntheses
[+] [-] ebiester|2 years ago|reply
And maybe smaller faculties at R2s pivot to replication hubs. And maybe this is easier for some sections of biology, chemistry and psychology than it is for particle physics. We could start where cost of replication is relatively low and work out the details.
It's completely doable in some cases. (It may never be doable in some areas either.)
[+] [-] RugnirViking|2 years ago|reply
99% of all papers mean nothing. They add nothing to the collective knowledge of humanity. In my field of robotics there are SOOO many papers that are basically taking three or four established algorithms/machine learning models, and applying them to off-the-shelf hardware. The kind of thing any person educated in the field could almost guess the results exactly. Hundreds of such iterations for any reasonably popular problems space (prosthetics, drones for wildfires, museum guide robot) etc every month. Far more than could possibly be useful to anyone.
There should probably be some sort of separate process for things that actually claim to make important discoveries. I don't know what or how that should work. In all honesty maybe there should just be less papers, however that could be achieved.
[+] [-] justinpombrio|2 years ago|reply
Well, the trouble is that hasn't been the case in practice. A lot of the replication crisis was attempting for the first time to replicate a foundational paper that dozens of other papers took as true and built on top of, and then seeing said foundational paper fail to replicate. The incentives point toward doing new research instead of replication, and that needs to change.
[+] [-] johnnyworker|2 years ago|reply
Does it really deserve to be called work if it doesn't include the a full, working set of instructions that if followed to a T allow it to be replicated? To me that's more like pollution, making it someone else's problem. I certainly don't see how "we did this, just trust us" can even be considered science, and that's not because I don't understand the scientific method, that's because I don't make a living with it, and have no incentive to not rock the boat.
[+] [-] jofer|2 years ago|reply
It's not just "can we replicate the analysis on sample X", but also "can we collect a sample similar to X and do we observe similar things in the vicinity" in many cases. That alone may require multiple seasons of rather expensive fieldwork.
Then you have tens to hundreds of thousands of dollars in instrument time to pay to run various analysis which are needed in parallel with the field observations.
It's rarely the simple data analysis that's flawed and far more frequently subtle issues with everything else.
In most cases, rather than try to replicate, it's best to test something slightly different to build confidence in a given hypothesis about what's going on overall. That merits a separate paper and also serves a similar purpose.
E.g. don't test "can we observe the same thing at the same place?", and instead test "can we observe something similar/analogous at a different place / under different conditions?". That's the basis of a lot of replication work in geosciences. It's not considered replication, as it's a completely independent body of work, but it serves a similar purpose (and unlike replication studies, it's actually publishable).
[+] [-] throwaway4aday|2 years ago|reply
[+] [-] mattkrause|2 years ago|reply
Some experiments that study biological development or trained animals can take a year or more of fairly intense effort to start generating data.
[+] [-] coldtea|2 years ago|reply
Then perhaps those papers shouldn't be published? Or held in any higher esteem than a blog post by the same authors?
[+] [-] kshahkshah|2 years ago|reply
You'd have no idea if you were going down a well trodden path which would yield no success because you have no idea it was well trod. No one publishes negative results, etc.
[+] [-] majormajor|2 years ago|reply
> If someone cares enough about the work to build on it, they will replicate it anyway.
That's duplicative at the "oh maybe this will be useful to me" stage, with N different people trying to replicate. And with replication not a first-class part of the system, the effort of replication (e_R) is high. For appealing things, N is probably > 2. So N X e_R total effort.
If you move the burden at the "replicate to publish" stage, you can fix the number of replicas needed so N=2 (or whatever) and you incentive the orginal researchers to make e_R lower (which will improve the quality of their research even before the submit-for-publication stage).
I've been in the system, I spent a year or two chasing the tail of rewrites, submissions, etc, for something that was detectable as low-effect-size in the first place but I was told would still be publishable. I found out as part of that that it would only sometimes yield a good p-value! And everything in the system incentivized me to hide that for as long as possible, instead of incentivizing me to look for something else or make it easy for others to replicate and judge for themselves.
Hell, do something like "give undergrads the opportunity to earn Master's on top of their BSes, say, by replicating (or blowing holes in) other people's submissions." I would've eaten up an opportunity like that to go really really deep* in some specialized area in exchange for a masters degree in a less-structured way than "just take a bunch more courses."
[+] [-] oldgradstudent|2 years ago|reply
If you build upon a result, you almost have to replicate it.
An acquaintance spent years building upon a result that turned out to be fraudulent/p-hacked.
[+] [-] dongping|2 years ago|reply
This is, of course, a naive proposal without too much thought into it. But I was wondering what I would have missed here.
[+] [-] boxed|2 years ago|reply
I don't see how the current system works really either. Fraud is rampant, and replication crisis is the most common state of most fields.
Basically the current system is failing at finding out what is true. Which is the entire point. That's pretty damn bad.
[+] [-] coding123|2 years ago|reply
[+] [-] faeriechangling|2 years ago|reply
Other research proves impossible to replicate because whatever experiment was not described in enough detail to actually replicate it, which should be grounds to immediately dismiss the research before publishing, but which can’t truly be caught if you don’t actually try to reproduce.
Finally these practical concerns don’t even touch on the biggest benefit of reproduction as standard which is that almost nobody wants to reproduce research as they are not rewarded for doing so. This would give somebody, namely those who want to publish something, a strong impetus to get that reproduction done which wouldn’t otherwise exist.
[+] [-] DoctorOetker|2 years ago|reply
Either "peer reviewed" articles describe progress of promising results, or they don't. If they don't the research is effectively ignored (at least until someone finds it promising). So let's consider specifically output that described promising results.
After "peer review" any apparently promising results prompt other groups to build on them by utilizing it as a step or building block.
It can take many failed attempts by independent groups before anyone dares publish the absence of the proclaimed observations, since they may try it over multiple times thinking they must have botched it somewhere.
On paper it sounds more expensive to require independent replication, but only because the costs of replication attempts are hidden until its typically rather late.
Is it really more expensive if the replication attempts are in some sense mandatory?
Or is it perhaps more expensive to pretend science has found a one-shot "peer reviewed" method, resulting in uncoordinated independent reproduction attempts that may go unannounced before, or even after failed replications?
The pseudo-final word, end of line?
What about the "in some sense mandatory" replication? Perhaps roll provable dice for each article, and in-domain sortition to randomly assign replicators. So every scientist would be spending a certain fraction of their time replicating the research of others. The types of acceptable excuses to derelict these duties should be scrutinized and controlled. But some excuses should be very valid, for example conscientious objection. If you are tasked to reproduce some of Dr. Mengele's works, you can cop out on condition that you thoroughly motivate your ethical concerns and objections. This could also bring a lot of healthy criticism to a lot of practices, which is otherwise just ignored an glossed over for fear of future career opportunities.
[+] [-] brightball|2 years ago|reply
The alternative is a bunch of stuff being published which people belief as "science" that doesn't hold up under scrutiny, which undermines the reliability of science itself. The current approach simply gives people reason to be skeptical.
[+] [-] backtoyoujim|2 years ago|reply
It would mean disruption is no longer a useful tool for human development.
[+] [-] throwawaymaths|2 years ago|reply
http://www.orgsyn.org/
> All procedures and characterization data in OrgSyn are peer-reviewed and checked for reproducibility in the laboratory of a member of the Board of Editors
Never is a strong word.
[+] [-] matthewdgreen|2 years ago|reply
In the distant past, publication was an informal process that mostly involved mailing around letters, or for a major result, self-publishing a book. Eventually publishers began to devise formal journals for this purpose, and some of those journals began to receive more submissions than it was feasible to publish or verify just by reputation. Some of the more popular journals hit upon the idea of applying basic editorial standards to reject badly-written papers and obvious spam. Since the journal editors weren’t experts in all fields of science, they asked for volunteers to help with this process. That’s what peer review is.
Eventually bureaucrats (inside and largely outside of the scientific community) demanded a technique for measuring the productivity of a scientist, so they could allocate budgets or promotions. They hit on the idea of using publications in a few prestigious journals as a metric, which turned a useful process (sharing results with other scientists) into [from an outsider perspective] a process of receiving “academic points”, where the publication of a result appears to be the end-goal and not just an intermediate point in the validation of a result.
Still other outsiders, who misunderstand the entire process, are upset that intermediate results are sometimes incorrect. This confuses them, and they’re angry that the process sometimes assigns “points” to people who they perceive as undeserving. So instead of simply accepting that sharing results widely to maximize the chance of verification is the whole point of the publication process, or coming up with a better set of promotion metrics, they want to gum up the essential sharing process to make it much less efficient and reduce the fan-out degree and rate of publication. This whole mess seems like it could be handled a lot more intelligently.
[+] [-] sebastos|2 years ago|reply
I’ll pile on to say that you also have the variable of how the non-scientist public gleans information from the academics. Academia used to be a more insular cadre of people seeking knowledge for its own sake, so this was less relevant. What’s new here is that our society has fixated on the idea that matters of state and administration should be significantly guided by the results and opinions of academia. Our enthusiasm for science-guided policy is a triple whammy, because 1. Knowing that the results of your study have the potential to affect policy creates incentives that may change how the underlying science is performed 2. Knowing that results of academia have outside influence may change WHICH science is performed, and draw in less-than-impartial actors to perform it 3. The outsized potential impact invites the uninformed public to peer into the world of academia and draw half-baked conclusions from results that are still preliminary or unreplicated. Relatively narrow or specious studies can gain a lot of undue traction if their conclusions appear, to the untrained eye, to provide a good bat to hit your opponent with.
[+] [-] casualscience|2 years ago|reply
> Still other outsiders, who misunderstand the entire process, are upset that intermediate results are sometimes incorrect. This confuses them, and they’re angry that the process sometimes assigns “points” to people who they perceive as undeserving. So instead of simply accepting that sharing results widely to maximize the chance of verification is the whole point of the publication process, or coming up with a better set of promotion metrics, they want to gum up the essential sharing process to make it much less efficient and reduce the fan-out degree and rate of publication.
Does not represent my experience in the academy at all. There is a ton of gamesmanship in publishing. That is ultimately the yardstick academics are measured against, whether we like it or not. No one misunderstands that IMO, the issue is that it's a poor incentive. I think creating a new class of publication, one that requires replication, could be workable in some fields (e.g. optics/photonics), but probably is totally impossible in others (e.g. experimental particle physics).
For purely intellectual fields like mathematics, theoretical physics, philosophy, you probably don't need this at all. Then there are 'in the middle fields' like machine learning which in theory would be easy to replicate, but also would be prohibitively expensive for, e.g. baseline training of LLMs.
[+] [-] nine_k|2 years ago|reply
The public perception of a publication in a prestigious journal as the established truth does not help, too.
[+] [-] dmbche|2 years ago|reply
Today, publications do not serve the same purpose as they did before the internet. It is trivial today to write a convincing paper without research and getting that published(www.theatlantic.com/ideas/archive/2018/10/new-sokal-hoax/572212/&sa=U&ved=2ahUKEwjnp5mRtsiAAxVwF1kFHesBDC8QFnoECAkQAg&usg=AOvVaw0t_Bo31BrT5D9zHBdmNAqi).
[+] [-] miga|2 years ago|reply
Given that some experiments cost billions to conduct, it is impossible to implement "Peer Replication" for all papers.
What could be done is to add metadata about papers that were replicated.
[+] [-] janalsncm|2 years ago|reply
At least in CS/ML there needs to be a “code or it didn’t happen”. Why? Papers are ambiguous. Even if they have mathematical formulas, not all components are defined.
Peer replication in these fields is an easy low hanging fruit that could set an example for other fields of science.
[+] [-] infogulch|2 years ago|reply
1. Rebrand peer review as a "readability review" which is what reviewers tend to focus on today.
2. A "replicability statement", a separately published document where reviewers push authors to go into detail about the methodology and strategy used to perform the experiments, including specifics that someone outside of their specialty may not know. Credit NalNezumi ITT
[+] [-] analog31|2 years ago|reply
In some fields, aside from specialized knowledge, good experimental work requires what we call "hands." For instance, handling air sensitive compounds, or anything in a condensed or crystalline state. In my thesis experiment, some of the equipment was hand made, by me.
Sometimes specialized facilities are needed. My doctoral thesis project used roughly 1/2 million dollars of gear, and some of the equipment that I used was obsolete and unavailable by the time I finished.
[+] [-] NalNezumi|2 years ago|reply
The former would be a back and forth between a reviewer that inquire and ask questions (based on the paper) with the goal to reproduce the result, but don't have to actually reproduce it. This is usually good to find out missing details in the paper that the writer just took for granted everyone in the field knows (I've met Bio PHD that have wasted Months of their life tracking up experimental details not mentioned in a paper)
The latter would be the result of the former. Instead of having pages long "appendix" section in the main paper, you produce another document with meticulous details of the experiment/methodology with every stone turned together with an peer reviewer. Stamp it with the peer reviewes name so they can't get away with hand wavy review.
I've read too many papers where important information to reproduce the result is omitted. (for ML/RL) If the code is included I've countless of times found implementation details that is not mentioned in the paper. In matter of fact, there's even results suggesting that those details are the make or break of certain algorithms. [1] I've also seen breaking details only mentioned in code comments...
Another atrocious thing I've witnessed is a paper claiming they evaluated their method on a benchmark and if you check the benchmark, the task they evaluated on doesn't exit! They forked the benchmark and made their own task without being clear about it! [2]
Shit like this make me lose faith in certain science directions. And I've seen a couple of junior researcher giving it all up because they concluded it's all just house of cards.
[1] https://arxiv.org/abs/2005.12729
[2] https://arxiv.org/abs/2202.02465
Edit: also if you think that's too tedious/costly, reminder that publishers rake in record profits so the resources are already there https://youtu.be/ukAkG6c_N4M
[+] [-] eesmith|2 years ago|reply
> What if all the experiments in the paper are too complicated to replicate? Then you can submit to [the Journal of Irreproducible Results].
Observational science is still a branch of science even if it's difficult or impossible to replicate.
Consider the first photographs of a live giant squid in its natural habitat, published in 2005 at https://royalsocietypublishing.org/doi/10.1098/rspb.2005.315... .
Who seriously thinks this shouldn't have been published until someone else had been able to replicate the result?
Who thinks the results of a drug trial can't be published until they are replicated?
How does one replicate "A stellar occultation by (486958) 2014 MU69: results from the 2017 July 17 portable telescope campaign" at https://ui.adsabs.harvard.edu/abs/2017DPS....4950403Z/abstra... which required the precise alignment of a star, the trans-Neptunian object 486958 Arrokoth, and a region in Argentina?
Or replicate the results of the flyby of Pluto, or flying a helicopter on Mars?
Here's a paper I learned about from "In The Pipeline"; "Insights from a laboratory fire" at https://www.nature.com/articles/s41557-023-01254-6 .
"""Fires are relatively common yet underreported occurrences in chemical laboratories, but their consequences can be devastating. Here we describe our first-hand experience of a savage laboratory fire, highlighting the detrimental effects that it had on the research group and the lessons learned."""
How would peer replication be relevant?
[+] [-] msla|2 years ago|reply
Would this require labs to improve their software environments and learn some new tools? Would this require labs to give up whatever used to be secret sauce? That's. The. Point.
[+] [-] kergonath|2 years ago|reply
Nobody, obviously. You cannot reproduce a result that hasn’t been published, so no new phenomenon is replicated the moment it is first published. The problem is not the publication of new discoveries, it’s the lack of incentives to confirm them once they’ve been published.
In your example, new observations of giant squids are still massively valuable even if not that novel anymore. So new observations should be encouraged (as I am sure they are).
> Or replicate the results of the flyby of Pluto, or flying a helicopter on Mars?
Well, we should launch another probe anyway. And I am fairly confident we'll have many instances of aircraft in Mars' atmosphere and more data than we'll know what to do with. We can also simulate the hell out of it. We'll point spectrometers and a whole bunch of instruments towards Pluto. These are not really good examples of unreproducible observations.
Besides, in such cases robustness can be improved by different teams performing their own analyses separately, even if the data comes from the same experimental setup. It’s not all black or white. Observations are on a spectrum, some of them being much more reliable than others and replication is one aspect of it.
> How would peer replication be relevant?
How would you know which aspects of the observed phenomena come from particularities of this specific lab? You need more than one instance. You need some kind of statistical and factor analyses. Replication in this instance would not mean setting actual labs on fire on purpose.
It’s exactly like studying car crashes: nobody is going to kill people on purpose, but it is still important to study them so we regularly have new papers on the subject based on events that happened anyway, each one confirming or disproving previous observations.
[+] [-] phpisthebest|2 years ago|reply
Yes, in a perfect world we would also replicate the data collection, but we do not live in a perfect world.
The same is true for drug trials: there is always a battle over getting the raw data from drug trials, as the companies claim the data is a trade secret, so independent verification of drug trials is very expensive. But if the FDA required not just the release of redacted conclusions and supporting redacted data but 100% of all data gathered, it would be a lot better IMO.
For example, the FDA says it will take decades to release the raw data from the COVID vaccine trials. Why? And that is after being forced to do so via a lawsuit.
[+] [-] waynecochran|2 years ago|reply
Due to the pressure of "publish or die" there is very little honesty in research. Fortunately there are some who are transparent with their work. But for the most part, science is drowning in a sea of research that lacks transparency and falls short on replication.
[+] [-] titzer|2 years ago|reply
I can't even imagine how hard it would be to write instructions for another lab to successfully replicate an experiment at the forefront of physics, chemistry, or biology. It's not just the specialized equipment; we're talking about the frontiers of science, with people doing cutting-edge research.
I get the impression that suggestions like these are written by non-scientists who do not have experience with the peer review process of any discipline. Things just don't work like that.
[+] [-] leedrake5|2 years ago|reply
Replication is a worthwhile goal, but the career incentives need to be there. I think replicating studies should be part of the curriculum in most programs - a step toward getting a PhD in lieu of one of the papers.
[+] [-] hedora|2 years ago|reply
They have this idea that a single editor screens papers to decide if they are uninteresting or fundamentally flawed, and then they want a bunch of professors to do the grunt work of litigating the correctness of the experiments.
In modern (post industrial revolution) branches of science, the work of determining what is worthy of publication is distributed amongst a program committee, which is comprised of reviewers. The editor / conference organizers pick the program committee. There are typically dozens of program committee members, and authors and reviewers both disclose conflicts. Also, papers are anonymized, so the people that see the author list are not involved in accept/reject decisions.
This mostly eliminates the problem where work is suppressed for political reasons, etc.
It is increasingly common for paper PDFs to be annotated with badges showing the level of reproducibility of the work, and papers can win awards for being highly reproducible. The people that check reproducibility simply execute directions from a separate reproducibility submission that is produced after the paper is accepted.
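As a concrete (and entirely hypothetical) illustration, such a reproducibility submission often boils down to a single entry point that re-runs the experiments and checks the output against the published numbers within a tolerance; the file names and values below are made up.

    # repro/check.py -- hypothetical artifact-evaluation entry point.
    import json
    import subprocess

    # Published values (say, Table 2 of the paper) with tolerances.
    EXPECTED = {"accuracy": (0.913, 0.005), "runtime_s": (42.0, 5.0)}

    def main():
        # Re-run everything; assumes run_all.py writes results.json.
        subprocess.run(["python", "run_all.py"], check=True)
        results = json.load(open("results.json"))
        for key, (target, tol) in EXPECTED.items():
            ok = abs(results[key] - target) <= tol
            print(f"{key}: got {results[key]}, expected "
                  f"{target} +/- {tol}: {'OK' if ok else 'MISMATCH'}")

    if __name__ == "__main__":
        main()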
I argue the above approach is about 100 years ahead of what the blog post is suggesting.
Ideally, we would tie federal funding to double blind review and venues with program committees, and papers selected by editors would not count toward tenure at universities that receive public funding.
[+] [-] fastneutron|2 years ago|reply
That said, it infuriates me no end when I read a Phys. Rev. paper that consists of a computational study of a particular physical system, and the only replicability information provided is the governing equation and a vague description of the numerical technique. No discretized example, no algorithm, and sure as hell no code repository. I'm sure other fields have this too. The only motivations I can see for this behavior are a desire for a monopoly on the research topic on the part of the authors, or embarrassment over poor code quality (real or perceived).
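For what it's worth, the missing artifact is often tiny. Here is a hedged sketch of the kind of "discretized example" that would help, using a generic 1D diffusion equation as a stand-in for whatever the governing equation actually is:

    import numpy as np

    # Explicit FTCS discretization of u_t = D * u_xx on [0, 1] with
    # fixed boundaries. Stating even this much -- grid, scheme,
    # stability condition -- is more than many such papers provide.
    D, nx, nt = 1.0, 101, 5000
    dx = 1.0 / (nx - 1)
    dt = 0.4 * dx**2 / D  # respects the stability limit dt <= dx^2 / (2*D)
    u = np.exp(-100 * (np.linspace(0, 1, nx) - 0.5)**2)  # initial condition
    for _ in range(nt):
        u[1:-1] += D * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])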
[+] [-] JR1427|2 years ago|reply
But the information gets around. In my former field, everyone knew which were the dodgy papers, with results no one could replicate.
[+] [-] Nevermark|2 years ago|reply
UPDATABLE COVER PAGE:
Title, Authors
Abstract
State of reproduction: UPDATABLE REPRODUCTION SECTION ATTACHED AT END
Reproduction resources:
Reproduction challenges:
Making this stuff more visible would help reproducers validate the value of reproduction to their home and funding institutions. Having a standard section for this, with an initial state of "Not reproduced", provides more incentive for the original workers to provide better reproduction info.
For algorithm and math work, reproduction might be served best by a downloadable executable bundle.
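A hedged sketch of what such an updatable cover-page status could look like as machine-readable metadata (all field names here are invented for illustration):

    # reproduction_status.py -- hypothetical schema for an updatable
    # "state of reproduction" section attached to a paper.
    reproduction_status = {
        "state": "Not reproduced",  # the initial default for every paper
        "attempts": [
            # appended over time, e.g.:
            # {"group": "Lab X", "date": "2024-03-01",
            #  "outcome": "partial", "notes_doi": "10.xxxx/..."},
        ],
        "resources": {"code": None, "data": None, "executable_bundle": None},
        "challenges": ["specialized equipment", "proprietary data"],
    }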
[+] [-] jonnycomputer|2 years ago|reply
If, on the other hand, they just want the raw data and to let others go to town on it in their own way, that's fine, probably. Results that don't depend on very particular details of the processing pipeline are probably more robust anyway.
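A rough sketch of what pipeline robustness could mean in practice (all of this is illustrative): re-run the same analysis under a few defensible preprocessing choices and see whether the headline effect survives.

    import numpy as np

    rng = np.random.default_rng(0)
    raw = rng.normal(loc=0.3, scale=1.0, size=500)  # stand-in raw data

    # Two defensible preprocessing choices a reanalysis might make.
    variants = {
        "drop_outliers": raw[np.abs(raw) < 3.0],
        "winsorize": np.clip(raw, -3.0, 3.0),
    }
    for name, x in variants.items():
        t_stat = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
        print(f"{name}: t ~ {t_stat:.2f}")
    # If the conclusion flips between variants, it was pipeline-fragile.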
[+] [-] jimmar|2 years ago|reply
Science is a process. Peer review isn't perfect. Replication is important. But it doesn't seem like the author understands what it would take to simply replace peer review with replication.
[+] [-] whatever1|2 years ago|reply
All PhD programs have a requirement for a minimum number of novel publications. We could add to the requirements a minimum number of replications.
But truth be told, a PhD student in science/engineering will probably spend their first two years trying to replicate the SOTA anyway. It's just that today you cannot publish this effort; nobody cares except yourself and your advisor.
[+] [-] hgsgm|2 years ago|reply
Publication is a starting point, not a conclusion.
Publication is submitting your code. It still needs to be tested, rolled out, evaluated, and time-tested.