top | item 45712641

(no title)

> Not that I understand much of what they say, but it appears there are a lot of correctness bugs in pytorch that are flying under the radar, probably having a measurable impact on the results of model quality.

Do you have any links to public thoughts about this? As if it was true, could mean a lot of research could be invalidated, so obviously would make huge news.

Also feels like something that would be relatively easy to make reproducible test cases from, so easy to prove if that's true or not.

And finally if something is easy to validate, and would make huge news, I feel like someone would already have attempted to prove this, and if it was true, would have published something a long time ago.

discuss

bobbylarrybobby|4 months ago

Could this really invalidate research? Managing to produce a model that works (assuming you check all of the myriad modeling correctness checkboxes) is sufficient on its own. The fact that the modeling process itself was broken in some way — but not the assumptions made of the model inputs, or data leakage assumptions, or anything that fundamentally undermines any model produced — has no bearing on the outcome, which is the fact that you got a model that evidently did make accurate predictions.

Majromax|4 months ago

> Could this really invalidate research? Managing to produce a model that works (assuming you check all of the myriad modeling correctness checkboxes) is sufficient on its own.

In the academic sense, a model that happens to work isn't research; the product of research should be a technique or insight that generalizes.

"Standard technique X doesn't work in domain Y, so we developed modified technique X' that does better" is the fundamental storyline of many machine learning papers, and that could be 'invalidated' if the poor performance of X was caused by a hidden correctness bug avoided by X'.

p1esk|4 months ago

a lot of research could be invalidated, so obviously would make huge news.

A lot of research is unreproducible crap. That’s not news to anyone. Plus, bugs usually make results worse, not better.

Calavar|4 months ago

There are many more ways to degrade model performance than to enhance it, so I would expect the vast majority of bugs to lead to artificially reduced accuracy, not artificially increased accuracy.

So if PyTorch is full of numerical flaws, that would likely mean many models with mediocre/borderline performance were discarded (never published) because they just failed to meet the threshold where the authors felt it was worth their time to package it up for a mid-tier conference. A finding that many would-be mediocre papers are actually slightly less mediocre than believed would be an utterly unremarkable conclusion and I believe that's why we haven't seen a bombshell analysis of PyTorch flaws and reproducibility at NeurIPS.

A software error in, say, a stats routine or a data preprocessing routine would be a different story because the degrees of freedom are fewer, leaving a greater probability of an error hitting a path that pushes a result to look artificially better as opposed to artificially worse

dangoodmanUT|4 months ago

Check their Twitter, I saw something either yesterday or earlier today iirc