top | item 38687722

fanzhang | 2 years ago

Agree that this is how papers are often judged, but strongly disagree that this is how papers should be judged. This is exactly the problem of reviewers looking for the keys under the lamp post (does the paper check these boxes) versus where they lost the keys (should this paper get more exposure because it advances the field).

The fact that the first doesn't reliably lead to the second is a failure of the system.

This is the same sort of value system that leads to accepting job candidates who have neat haircuts and say the right shibboleths, versus the ones who make the right bottom-line impact.

Basically, are "good" papers that are very rigorous but lead to nothing actually "good"? If your model of progress in science is that rigorous papers are a higher probability roll of the dice, and nonrigorous papers are low probability rolls of the dice, then we should just look for rigorous papers. And that a low-rigor paper word2vec actually make progress was "getting really lucky" and we should have not rated the paper well.

But I contend that word2vec was also very innovative, and that should be a positive factor for reviewers. In fact, I bet that innovative papers have a hard time being super rigorous, because the definition of rigor in that field has yet to be settled. I'm basically contending that on the extreme margins, rigor is negatively correlated with innovation.


jll29|2 years ago

You are right. I often got told "You don't compare with anything" when proposing something very new. That's true, because if you are literally the first one attempting a task, there isn't any benchmark. The trick then is to make up at least a straw man alternative to your method and to compare with that.

Since then, I have evolved my thinking, and I now use something that isn't just a straw man: before I even conceive my own method or model or algorithm, I ask myself "What is the simplest non-trivial way to do this?". For example, when tasked with developing a transformer-based financial summarization system, we pretrained a BERT model from scratch (several months' worth of work), but I also implemented a 2-line grep-based mini summarizer as a shell script, which defied the complexity of the BERT transformer yet proved to be a competitor tough to beat: https://www.springerprofessional.de/extractive-summarization...
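For a sense of what such a baseline looks like, here is a minimal sketch of a grep-based extractive summarizer in the spirit of the one described; the keyword list and structure are illustrative guesses, not the actual published script:

```shell
#!/bin/sh
# Hypothetical grep-based extractive baseline (illustrative keywords,
# not the paper's actual script): keep input lines that mention common
# financial terms, then emit the first three matches as the "summary".
grep -iE 'revenue|profit|loss|earnings' "$1" | head -n 3
```

The point of such a baseline is exactly the "simplest non-trivial way" test: if a learned model can't beat keyword matching plus truncation, the extra complexity isn't earning its keep.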

I'm inclined to organize a workshop on small models with few parameters, with a shared task as part of it where no model can be larger than 65 kB, a sort of "small is beautiful" workshop in dedication to Occam.

nybsjytm|2 years ago

I don't consider clearly stating your model and meaningfully comparing it to prior work and other models (seemingly the main issues here) to be analogous to a proper haircut or a shibboleth. Actually I think it's a strange comparison to make.

hospadar|2 years ago

Papers are absolutely judged on impact - it's not as though any paper submitted to Nature gets published as long as it gets through peer review. Most journals (especially high-impact for-profit journals) have editors who select interesting and important papers. I think it's probably a good idea to separate those two jobs: "is this work rigorous and clearly documented?" vs. "should this be included in the fall 2023 issue?".

That's (probably) good for getting the most important papers to the top, but it also strongly disincentivizes whole categories of often very important papers. Two obvious categories are replication studies and negative results. "I tried it too and it worked for me," "I tried it too and it didn't work," or "I tried this cool thing and it had absolutely no effect on how lasers work" could be the result of tons of very hard work and could have really important implications, but you're not likely to make a big splash in high-impact journals with work like that. A well-written negative result can prevent lots of other folks from wasting their own time (and you already spent your time on it, so you might as well write it up).

The pressure for impactful work also probably contributes to folks juicing the stats or faking results to make their work more exciting (other things certainly contribute to this too, like funding and tenure structures). I don't think "don't care about impact" is a solution to the problem, because obviously we want the papers that make cool new stuff.

godelski|2 years ago

> Papers are absolutely judged on impact

Impact is post hoc thinking; it's impossible to judge a priori. You're also discounting the bias of top venues: being published in such a venue is itself a causal variable for higher impact, if you measure by citation counts.

I'd also mention that ML does not typically use a journal system but rather conferences. A major difference is that conferences are not rolling submissions, and there is only one rebuttal available to authors. Usually this is limited to a single page that includes citations. You can probably imagine that it's difficult to write an adequate rebuttal to 3-4 reviewers under the best of circumstances. It's like trying to hold a debate where the defending side must respond to any question from the opposition, with clear citations, in a short time frame, and there is no limit to how abstract the opposing side's questions need be. Nor is there any guarantee of congruence within the opposition. It's not a very good framework for making "arguments" more clear or convincing, especially when you consider that the game is zero-sum.

I definitely agree with your comments about how other types of useful communication (like null results) are highly discouraged. But I wanted to note that the framework is poor even for "standard" works.

aaronkaplan|2 years ago

Your argument is that if a paper makes a valuable contribution then it should be accepted even if it's not well written. But the definition of "well written" is that it makes it easy for the reader to understand its value. If a paper is not well written, then reviewers won't understand its value and will reject it.

seanmcdirmid|2 years ago

Being well written and being rigorous aren't highly correlated. You can have poorly written papers that are very rigorous, and vice versa. Rigor is often another checkbox (does the paper have some quantitative comparisons?), especially if the proper rigor is hard to define for the writer or the reader.

My advice to PhD students is to always just focus on subjects where the rigor is straightforward, since that makes writing papers that get in easier. But of course, that is a selfish personal optimization that isn’t really what’s good for society.