

twak | 3 years ago

Academics are judged by their publications, not their implementations, so the system favours over-sold manuscripts and it-ran-once implementations. Until funding is conditional on (and provided for) robust, well-maintained code, it will remain challenging to get reproducibility.

Frequently the PIs (bosses) will not even glance at the repositories written by junior members, probably can't read code anyway, and certainly won't allocate time for their maintenance. Even worse, most academics who do publish code have never been exposed to real-world software engineers, their techniques, or their tools.

InefficientRed|3 years ago

The basic issue is the labor. We don't pay for good science.

Suppose I told you to develop good software that's novel enough to publish about, but only gave you enough budget to pay your SWEs a maximum of $30K/yr. That's one zero, for those reading quickly. Additional non-benefits:

1. Unlike literally every other job in the country, you don't have budget to pay FICA taxes for your employees, and the tax code allows this. This means your employees don't even have the USA's paltry social safety net to fall back on if they are hit by a bus or graduate into a massive recession, and their years working for you do not count toward Social Security or Medicare retirement benefits.

2. Obviously, there is no budget for 401(k) retirement benefits.

3. No cost-of-living (CoL) raises.

4. Healthcare benefits will be paltry.

5. Your SWEs need to serve as a teaching assistant every once in a while. This likely means grading homework and a few late evenings of grading exams. No overtime for those late nights, obviously.

6. All travel, which is mandatory and often international, must be paid by the employee up front and reimbursement can take 1-3 months. We don't trust $30K/yr drones with corporate cards. Good luck making rent after a conference :)

Just to reiterate: You need to hire SWEs. You pay $30K/yr (less than some Amazon warehouses!), benefits package is literally worse than a part-time gig at a supermarket or fast food joint, and your employee is expected to give you $2K-$4K loans a few times a year while living paycheck to paycheck.

I just roll my eyes hard when I see complaints about garbage research code. Almost everyone in my PhD cohort had FAANG or finance offers; we were all taking 5x-10x paycuts to work on interesting problems and do science. If you want productizable research prototypes, hire PhDs to do science for you.

(And I say this, for the record, as a rare PhD who, during their PhD, wrote code that is well-documented, well-maintained, and still used by dozens of companies for business-critical processes many years later.)

valarauko|3 years ago

I agree. Plus, most of the code is written by a single person, and while most first authors are relatively responsive on GitHub, they soon get overwhelmed with other projects, manuscript responsibilities, and job hunts. Coding and its maintenance are only a small part of an overworked and underpaid academic's responsibilities, so frankly it's understandable. There's also a good chance that they are no longer employed at the same place a few years out.

shantnutiwari|3 years ago

We the taxpayers do pay; the money just doesn't reach the researchers.

tpoacher|3 years ago

I get where you're coming from, but this is unnecessarily reductive.

By this logic all companies maliciously sell broken software in order to charge for updates.

But obviously not all companies do that, those that do get called out for it, those that produce good products get a good reputation for it, etc. Similar things apply to academia.

"It ran once" papers run the risk of not getting cited as much compared to good papers with robust implementations, so the maligned incentive you describe isn't as clear cut, even in the corner case where the novelty is considered to be in the algorithm rather than the particular implementation. Worse, if the algorithm fails to reproduce, a researcher runs the risk of being retracted or shamed in subsequent publications when their work fails to reproduce. And reproducibility is a key aspect of journal publications in reputable journals, meaning less reproducible work will end up in lower quality outlets which often hurt one's career more than they help.

time_to_smile|3 years ago

> By this logic all companies maliciously sell broken software in order to charge for updates.

Your analogy here is not great since the parent's claim is that academics have no incentive to produce good, reproducible research, not that they are maliciously creating bad research.

A more apt analogy would be:

"By this logic all companies would be driven by quarterly metrics and rush out broken software and then charge for updates/support"

... which is pretty much the exact state of the industry right now.

DrScientist|3 years ago

Totally - I remember hearing from Sean Eddy how hard it was to get continued funding to work on hmmer - making it faster, more sensitive, more general, more robust etc.

Well-crafted academic software is a rarity - the stuff that does exist tends to come out of institutes where the software is necessary to their wider mission - like the Broad or Sanger Institutes.

drnonsense42|3 years ago

Regarding replicability, I disagree this is a problem at all. Writing shit code is not going to prevent someone highly capable from replicating your results. If anything, I empathize with researchers writing sloppy code. It’s a creative field and they already have to do enough editing and documentation. Omitting code or fabricating/manipulating evaluation results is what prevents replication.

Frankly, unlike the author, I think there are too many people in the field. They produce a handful of papers worth reading every year along with thousands upon thousands of models that may or may not slightly improve performance on a specific task and then have no general value beyond that. And I don’t believe this will change much: ML is likely the most monetizable PhD path by a safe margin, so there is too much profit incentive to churn out crap at any cost.