xondono | 7 months ago

That’s because we’ve basically reinterpreted what “peer review” is.

Peer review used to mean “some peers have reviewed it”, mainly the editors, who pushed for correctness and novelty. There was a clear difference between publishing and making a paper public. It never meant “it’s right”, but it meant “it has passed basic quality control and it’s worth your time to read it”.

Modern-day academia pushes people to fragment into ever smaller niches, meaning most editors are nowadays completely out of their depth when evaluating papers. Yet we keep referring to editor approval as “peer review”, even as the practice diminishes the public perception that comes with the term.

tensor | 7 months ago

This is not true. In most of the top journals you need at least three other practitioners in your field to read it and sign off on it. The editor finds the appropriate reviewers, manages the process, does some basic format and other types of vetting, and also will accept or reject it based on the reviews from the reviewers.

The reviewers here are the "peers", and generally are expected to be qualified experts in the area that the paper deals with.

godelski | 7 months ago

  > This is not true. In most of the top journals you need at least three other practitioners in your field to read it and sign off on it.
You're misreading xondono as well as me. I think your idea of what peer review is (in practice) is too idealized.

The problem is the word "expert". We're using it to mean different things, and the difference is important. Despite appearing that way, "expert" is not a binary condition; it is a spectrum, and where the threshold falls along that spectrum depends on context. Ours (xondono, correct me if I misinterpreted) is higher than the one you're using.

Finding appropriate reviewers is a non-trivial task, which is kinda the entire problem. You can have a PhD in machine learning, and that does not mean you're qualified to review any given machine learning paper. I know, because I've told ACs (area chairs) that I'm not qualified to review certain works!

The problem is that what is being published is new knowledge. I'll refer to the (very, very short) "Illustrated Guide to a Ph.D."[0]. How many people are qualified to determine if that knowledge is new? It's probably a lot fewer than you think. Let's go back to ML. Say your PhD and all your work are in Vision Transformers. Does that mean you're qualified to evaluate a paper on diffusion models? Truth is, probably not. Hell, there have been papers I've reviewed where I'm literally 1 of 2 people in the world who are the appropriate reviewers (the other being the main author of the paper we wrote that's being extended).

Hell, most people working on diffusion aren't even qualified to properly evaluate every diffusion paper! Here's a great example: the work is on the mathy side of diffusion models, and you can look at the reviews[1]. The scores are 6 (Weak Accept), 9 (Very Strong Accept), 8 (Strong Accept), 8, and 6. Reviewer confidence was even low: 2, 4, 3, 3, 4, respectively (out of 5), and confidence is usually overstated.

Mind you, this is the #1 ML conference and these reviews are post-rebuttal. There were over 13,000 people reviewing that year[2] and they still couldn't get reviewers with 5/5 confidence. And this is for a paper written by 2 top researchers at a top institution...

  > The reviewers here are the "peers", and generally are expected to be qualified experts in the area that the paper deals with.
So no. They are "experts" compared to the general public, but not necessarily "experts" in the context of the paper being reviewed.

I hope the concrete evidence is enough to convince you, because honestly this is quite common and there's a selection bias: most of the time we don't have this data for works that were rejected. But there are plenty of accepted works where you can see this. Not to mention (as stated in my original comment), multiple extremely influential works (worthy of a Nobel Prize) have been rejected. Here's a pretty famous example, which was rejected both for being "too trivial" (twice) and for being "obviously incorrect."[3] Yet it resulted in a Nobel and is one of the most cited works in the field. It doesn't sound like those reviews helped the paper become better; it sounds more like they just wasted everyone's time.

[0] https://matt.might.net/articles/phd-school-in-pictures/

[1] https://openreview.net/forum?id=NnMEadcdyD

[2] https://media.neurips.cc/Conferences/NeurIPS2024/NeurIPS2024...

[3] https://en.wikipedia.org/wiki/The_Market_for_Lemons#Critical...

olddustytrail | 7 months ago

> That’s because we’ve basically reinterpreted what “peer review” is.

Who is "we" in this scenario? Because that's certainly not how I've seen peer review work.

The editor would ask a small group of people in the field to act as reviewers and send them the paper. They review it and send it back with any requests for changes prior to publication.

So they're the peers that are reviewing, not the editor.

godelski | 7 months ago

Look at the history of peer review: what you see post-1950 is pretty different from what came before. I think this quote is the best one-liner, though everyone should dig much deeper into the question:

  > in the early 20th century, "the burden of proof was generally on the opponents rather than the proponents of new ideas."
That is, the reviewers had a higher burden than the authors; the bias was towards acceptance rather than rejection. In a perfect world we would accept every good paper and reject every bad one, but we don't live in that world. So the question is: when we fail, which way do we want to fail? Obviously, I'm on the side of Blackstone here.

https://en.wikipedia.org/wiki/Scholarly_peer_review#History

pessimizer | 7 months ago

I think they meant "reinterpreted" over the last century, not over the span of your personal experience and career.

godelski | 7 months ago

I pretty much agree with you, but I want to nitpick this part:

  > mainly the editors, who pushed for correctness and novelty.
I don't want to use the word "correctness" here[0], because no one checks if the work is correct. Rather, I'd say the goal is to check for wrongness. A peer reviewer cannot determine whether a work is correct simply by reading it; the only way to do that is replication or extension (which is the case for the work here: the physical verification was an extension of the earlier work). It's important to make this distinction because, as you say, passing review doesn't mean the work is right. It doesn't even mean the reviewers think it is right.

In the past, many journals published works as long as they did not appear to contain serious errors and were not plagiarized. Editing is a different job entirely, where the goal is to make sure works are communicated clearly.

But I purposefully didn't say "novelty".

It is a trash word that means nothing. The original intent was to ensure work wasn't being redone: you can't go in and take credit for discovering something someone else already discovered, which we'd call plagiarism. You can change all the words and still plagiarize.

It is VERY easy to find problems/limitations with works. All works have limitations. All works are incomplete. But are these reasons to reject? Often, no... You see the same thing on HN, and it's a classic bias of STEM people: hyperfixate on the issues. We're trained to, because that's the first step to solving problems! But that's not what matters in publishing, because we're not trying to solve all problems at once; we do it iteratively! It also runs counter to publishing quickly ("publish or perish"): what, you want to wait to publish until we've got a grand theory of everything? And don't get me started on how bad we are at predicting the impact of works, and how impact often runs counter to the status quo (you can't paradigm-shift by maintaining the paradigm). So we don't explore...

AND very frequently, we DO NOT WANT novelty in science. Sounds strange, but it is *critical* to science existing.

- Our goal is to figure out how things work. The causal structure of things. So this means works need to be reproducible. We *want* reproductions, but we also don't want them ad infinitum.

- We *also* want to find other ways to derive the same result. Some reviewers will consider this novel while others won't, typically depending on their expertise in the field (more expert = more likely to consider it novel; less expert means you can't see the nuanced differences, which are important).

This greatly stifles innovation and reduces how well papers communicate their ideas.

The problem here is that as we advance, nuances matter more and more. Think of it like approximations: calculating the first-order term is usually computationally easy, with the computation increasing rapidly as the order of accuracy increases. The nuances start to dominate. But by focusing on "novelty" (rather than on plagiarism) we face the exact problem you mention:

  > most editors are nowadays completely out of their depth when evaluating papers,
So authors end up making their works look more convoluted, both to look more impressive and to look less like the work they are building on top of. Niche experts can see right through this; as grad students, people usually groan at it, but then they become accustomed to the shit and start doing the same thing, because non-niche experts cannot differentiate the work being built upon from the new work.

It is a self-inflicted problem. As editors/reviewers we think we're doing the right thing, but we're too dumb to see the minute (but important) differences. As authors we're just trying to get published and keep our jobs, and it's not exactly like the reviewers are "wrong". But it often just becomes a chase that does nothing to actually make the papers better. This gets even worse with targeted acceptance rates, which incentivize reviewers to reject and be less nuanced. They're already incentivized to do that because there's so much stress and time crunch in the job anyway (including needing to rewrite papers because others did exactly this).

The targeted acceptance rates are just silly, and we see the absurdity in domains like machine learning[1]. We have an exponentially increasing number of papers to review each year, not just because there are new works, but because rejected works are being resubmitted. Most of these conferences have ~30% acceptance rates, but the fraction of "wrong" papers is not that low. We also know the accept/reject decision is very noisy for the majority of papers, where a different set of reviewers would produce a different outcome (see the multiple "NeurIPS experiment"s). A simple model shows why this is bad: resubmissions mean more papers, and if the number of reviewers stays the same, that's more reviews per reviewer, which just exacerbates the problem. If a fixed 1000 new papers are submitted each year and even a low percentage of rejected works resubmit the next year, say 10%, you actually have to review ~1075 papers at steady state. With a more realistic ~50% of rejected works getting recycled, you need to review ~1500 per year. Most serious authors will try a few times, and it is common to say "just keep trying".
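
If you want to check those numbers, here's a minimal sketch of that back-of-the-envelope model in Python. The 1000 new papers, 30% acceptance rate, and 10%/50% resubmission fractions are the figures from above; the steady-state formula (solving R = new + resubmit * (1 - accept) * R) is my own framing of the same arithmetic, not anything official:

  # Steady-state review load when a fraction of rejected papers
  # comes back the following year. Solves:
  #   R = new_papers + resubmit_rate * (1 - accept_rate) * R
  def steady_state_reviews(new_papers, accept_rate, resubmit_rate):
      return new_papers / (1 - resubmit_rate * (1 - accept_rate))

  for resubmit in (0.10, 0.50):
      r = steady_state_reviews(1000, 0.30, resubmit)
      print(f"resubmit rate {resubmit:.0%}: ~{r:.0f} reviews/year")
  # resubmit rate 10%: ~1075 reviews/year
  # resubmit rate 50%: ~1538 reviews/year

The load blows up as the resubmission fraction grows, because each rejection feeds back into next year's queue like a geometric series.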

We don't have to do this to ourselves... It helps no one, and actually harms everyone. So... why? What are we gaining?

It's just so fucking stupid

/rant (have we even started?)

[0] I'm pretty sure we're going to agree, but we're talking in public and I want to make sure we communicate clearly to the public. Tbh, even many scientists think "correctness" is the same as "is correct".

[1] It is extra bad because the primary publishing venue is conferences. So you submit, get reviews (usually 3), get to write a rebuttal (often 1 page max), and then the final decision is made. There is no real discussion, so you have no real chance to explain things to near-niche experts. It's worse with acceptance deadlines and overlapping deadlines between conferences. It is better in other domains, since journals allow a back-and-forth conversation, but some of these problems still exist.