eeeeeeehio|1 year ago

Peer review is not designed for science. Many papers are not rejected because of an issue with the science -- in fact, reviewers seldom have the time to actually check the science! As a CS-centric example: you'll almost never find a reviewer who reads a single line of code (if code is submitted with the paper at all). There is artifact review, but this is never tied to the acceptance of the paper. Reviewers focus on ideas, presentation, and the presented results. (And the current system is a good filter for this! Most accepted papers are well-written and the results always look good on paper.) However, reviewers never take the time to actually verify that the experiment code matches the ideas described in the paper, and that the results reproduce. Ask any CS/engineering PhD student how many papers (in top venues) they've seen with a critical implementation flaw that invalidates the results -- and you might begin to understand the problem.

At least in CS, the system can be fixed, but those in power are unable and unwilling to fix it. Authors don't want to be held accountable ("if we submit the code with the paper -- someone might find a critical bug and reject the paper!"), and reviewers are both unqualified (i.e. haven't written a line of code in 25 years) and unwilling to take on more responsibility ("I don't have the time to make sure their experiment code is fair!"). So we are left with an obviously broken system where junior PhD students review artifacts for "reproducibility" and this evaluation has no bearing whatsoever on whether a paper gets accepted. It's too easy to cook up positive results in almost any field (intentionally, or unintentionally), and we have a system with little accountability.

It's not "the best we have", it's "the best those in power will allow". Those in power do not want consequences for publishing bad research, and also don't want the reviewing load required to keep bad research out.

Ar-Curunir|1 year ago

This is much too negative. Peer review indeed misses issues with papers, but by and large catches the most glaring faults.

I don’t believe for one moment that the vast majority of papers in reputable conferences are wrong, if only for the simple reason that putting out incorrect research gives an easy layup for competing groups to write a follow-up paper that exposes the flaw.

It’s also a fallacy to state that papers aren’t reproducible without code. Yes, code is important, but in most cases the core contribution of a research paper is not the code but a set of ideas that together describe a novel way to approach the problem at hand.

izacus|1 year ago

I spent a chunk of my career productionizing code from ML/AI papers, and a huge share of them are outright not reproducible.

Mostly they lack critical information: chosen constants are missing from equations, input preparation goes undescribed, and chunks of "common knowledge" algorithms are omitted outright. The papers that don't lack information report measurements that simply didn't match the reimplemented algorithms, or that only reached the claimed quality on the author's handpicked, massaged dataset.

It's all worse than you can imagine.
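
To make that concrete, here's a hypothetical sketch (numpy, toy values) of the kind of gap I mean: the paper says only that "inputs are normalized", and the reported results hinge on constants it never states.

    import numpy as np

    # Hypothetical reimplementation gap: the paper says only
    # "inputs are normalized before training" and nothing more.
    def normalize(x, mean=None, std=None):
        # Which statistics? Global? Per-feature? The paper doesn't say,
        # so the reimplementer guesses -- and the results hinge on the choice.
        mean = x.mean() if mean is None else mean
        std = x.std() if std is None else std
        return (x - mean) / (std + 1e-8)  # the epsilon is unstated too; 1e-8 is a guess

    x = np.random.rand(100, 32)
    print(normalize(x).std())  # ~1.0 under our guessed scheme -- but is it theirs?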

withinboredom|1 year ago

I spent 3 months implementing a paper once. Finally, I got to the point where I probably understood the paper better than the author did. It was an extremely complicated paper (homomorphic encryption). At that point, I realized that it doesn't work. There was nothing about it that would ever work, and it wasn't for lack of understanding. I emailed the author asking them to clarify some specific things in the paper; they never responded.

In theory the scheme could work, but it would be incredibly weak (the key turned out to be either 1 or 0 -- a single bit).
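
To see why that's fatal: with a one-bit key, an attacker needs at most two guesses. A toy Python sketch of the key-space problem (purely illustrative, not the paper's actual scheme):

    # Toy cipher keyed on a single bit; any such scheme falls to two guesses.
    def toy_decrypt(ciphertext, key):
        # Stand-in for whatever the cipher does -- only the key space matters.
        mask = 0xFF if key else 0x00
        return bytes(b ^ mask for b in ciphertext)

    ciphertext = toy_decrypt(b"attack at dawn", key=1)  # XOR is its own inverse
    for guess in (0, 1):
        print(guess, toy_decrypt(ciphertext, guess))  # one of these is the plaintext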

jeltz|1 year ago

Anecdotally, it is not. Most CS papers I have read have been bad and impossible to reproduce. Maybe I have been unlucky, but sadly that has been my experience.

eeeeeeehio|1 year ago

> by and large catches the most glaring faults.

I did not dispute that peer review acts as a filter. But reviewers are not reviewing the science, they are reviewing the paper. Authors are taking advantage of this distinction.

> if only for the simple reason that putting out incorrect research gives an easy layup for competing groups to write a follow-up paper that exposes the flaw.

You can’t make a career out of exposing flaws in existing research. Finding a flaw and showing that a paper from last year had cooked results gets you nowhere. There’s nowhere to publish “but actually, this technique doesn’t seem to work” research. There’s no way for me to prove that the ideas will NEVER work -- only that the implementation doesn’t work as well as the authors claimed. Authors who claim that the value is in the ideas should stick to Twitter, where they can freely dump all of their ideas without any regard for whether they work or not.

And if you come up with another way of solving the problem that actually works, it’s much harder to convince reviewers that the problem is interesting (because the broken paper already “solved” it!)

> in most cases the core contribution of a research paper is not the code but a set of ideas that together describe a novel way to approach the problem at hand

And this novel approach is really only useful if it outperforms existing techniques. “We won’t share the code, but our technique works really well, we promise” is obviously not science. There is a flood of papers with plausible techniques that look reasonable on paper and have good results, but those results do not reproduce. It’s not really possible to prove the technique “wrong”, but the burden should be on the authors to prove that their technique works and on reviewers to verify it.

It’s absurd to me that mathematical proofs are usually checked during peer review, but in other fields we just take everyone at their word.

kortilla|1 year ago

They aren’t necessarily wrong, but most are nearly useless due to some heavily downplayed or completely omitted flaw that surfaces when you try to implement the idea in actual systems.

There is technically academic novelty, so it’s not “wrong”. It’s just not valuable for the field or for science in general.

franga2000|1 year ago

I don't think anyone is saying it's not reproducible without code; it's just much more difficult for absolutely no reason. If I can run the code of an ML paper, I can quickly check whether the examples were cherry-picked, swap in my own test or training set... The new technique or idea is still the main contribution, but I can test it immediately, apply it to new problems, optimise the performance to enable new use cases...
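
A minimal sketch of what that looks like, with scikit-learn standing in for a paper's released code (the dataset, model, and splits are purely illustrative):

    # If the authors ship runnable code, auditing the headline number is trivial.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    print("paper's split:", model.score(X_test, y_test))

    # One line to swap in different data (here just another random split;
    # in practice, your own test set) -- impossible with only the PDF.
    _, X_mine, _, y_mine = train_test_split(X, y, random_state=42)
    print("my data:", model.score(X_mine, y_mine))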

It's like a chemistry paper for a new material (think the recent superconductor thing) not including the amounts used or the way the glassware was set up. You can probably get it to work in a few attempts, but then the result doesn't have the same properties as described, so now you're not sure whether your process was wrong or their results were.

DiogenesKynikos|1 year ago

> It's not "the best we have", it's "the best those in power will allow". Those in power do not want consequences for publishing bad research, and also don't want the reviewing load required to keep bad research out.

This is a very conspiratorial view of things. The simple and true answer is your last suggestion: doing a more thorough review takes more time than anyone has available.

Reviewers work for free. Applying the level of scrutiny you're requesting would require far more work than reviewers currently do, and maybe even something approaching the amount of work required to write the paper in the first place. The more work it takes to review an article, the less willing reviewers are to volunteer their time, and the harder it is for editors to find reviewers. The current level of scrutiny that papers get at the peer-review stage is a result of how much time reviewers can realistically volunteer.

Peer review is a very low standard. It's only an initial filter to remove the garbage and to bring papers up to some basic quality standard. The real test of a paper is whether it is cited and built upon by other scientists after publication. Many papers are published and then forgotten, or found to be flawed and not used any more.

ksenzee|1 year ago

> Reviewers work for free.

If journals were operating on a shoestring budget, I might be able to understand why academics are expected to do peer review for free. As it is, it makes no sense whatsoever. Elsevier pulls down huge amounts of money and still manages to command free labor.

eeeeeeehio|1 year ago

> The real test of a paper is whether it is cited and built upon by other scientists after publication. Many papers are published and then forgotten, or found to be flawed and not used any more.

This does seem true, but it ignores the downstream effects of publishing flawed papers.

Future research in the area is stymied by reviewers who insist that the flawed research already solved the problem, or that it undermines the novelty of somewhat similar solutions that actually work.

Reviewers will reject your work and insist that you include the flawed research in your own evaluations, even if you’ve already pointed out the flaws. Then, when you show that the flawed paper underperforms every other system, reviewers will reject your results and ask you why they differ from the flawed paper (no matter how clearly you explain the flaws) :/

Published papers are viewed as canon by reviewers, even if they don’t work at all. It’s very difficult to change this perception.