asoplata | 4 years ago

Absolutely, yes. The other comments here have some fantastic reasons for doing this, and several do a good job of weighing the pros vs cons.

The paper alone is almost never enough to fully reproduce the result. I've been bitten by this almost every time I've tried to implement someone else's computational model. Relying only on your paper to explain your code leaves a LOT of room for errors. I've run into all of the following when trying to implement someone else's computational work without their code being published:

    1. Despite your best efforts, you include fundamental, result-breaking typos in the equations you write up to explain the math of what you're doing. This WILL happen to you at some point in your career, and in my experience, it's a problem in >>50% of computational modeling papers.
    2. There are assumptions in the logic of the code that you don't include in the writeup, since they're obvious to you, but you don't realize that someone else trying to understand your paper won't necessarily be starting with those same assumptions. This happens frequently with neural models that use complicated synapse-computation schemes.
    3. Your codebase may be big enough that, from memory, you think part X of the code works a certain way, but you forget that you changed the logic late in the project to work differently.
    4. Publishing your code at the time of publication prevents "Which version did I use?" problems. It's very common for people to keep working on their science code for new work without bothering to save/tag the SPECIFIC version of the code that was used for the actual paper. As a result, even the author doesn't know what exact values were used for the results in the paper!
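
For point 4, one cheap habit is to write the exact commit hash and parameter values next to every result you save. Here's a minimal sketch of that idea, assuming the code lives in a git repository; the function names, the JSON layout, and the example parameters are all hypothetical, not from any particular project:

```python
import json
import subprocess
from datetime import datetime, timezone


def _git(args, repo_dir="."):
    """Run a git command; return its stdout, or None if git is missing or fails."""
    try:
        done = subprocess.run(
            ["git", *args], cwd=repo_dir,
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return done.stdout.strip()


def provenance_record(params, repo_dir="."):
    """Collect the code version and parameter values behind one run."""
    commit = _git(["rev-parse", "HEAD"], repo_dir)
    status = _git(["status", "--porcelain"], repo_dir)
    return {
        "commit": commit,  # None when not inside a git repository
        # True if the working tree differs from the commit (hypothetical field name)
        "uncommitted_changes": None if status is None else bool(status),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "params": params,
    }


def save_provenance(path, params, repo_dir="."):
    """Write the record as JSON next to the results it describes."""
    record = provenance_record(params, repo_dir)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return record
```

Calling something like `save_provenance("fig3_provenance.json", {"g_syn": 0.1, "seed": 42})` right before saving a figure's data would be enough (file name and parameters made up for illustration). It doesn't replace tagging the paper version in git, but it lets you map any saved result back to the code that produced it.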

Any "competitive advantage" has to be weighed against "positive exposure". If your code is the primary research object (as opposed to the data), then it's technically possible that someone may grab your code, extend it to do the next interesting thing, and scoop you before you can do it yourself. However, even if this happens (which it probably won't), consider the following:

    1. You can't build a successful career out of just small extensions to the same piece of code, so the main kernel of your career won't be that codebase, but rather your understanding of it.
    2. For every 1 person that tries to use that to scoop you, IMHO there's going to be at least 10 other people who see your code and reach out to you for help with it, or just to ask a question about it, or reach out for potential collaboration! In other words, depending on the field, if you publish the code, I think you're likely to gain new/future collaborators at a MUCH faster rate than people who compete against you. You'll be surprised at how many researchers on the other side of the planet are interested in your software!
    3. Even if someone scoops you with your own code, if they give any indication it came from you, you still get to count that as a publication that built off of your software work when you're applying to jobs :)
    4. At least with US federal government funding, it's gradually becoming required to do this anyway, and I believe/hope that it's going to become the standard very soon.

Finally, don't fret about polishing/cleaning/organizing the code, especially style. For others trying to reproduce your results or just investigating how you did things, the main thing that matters is that your code runs "correctly", i.e. the way you ran it to get the results that you did. One idea is to publish it "as is" for the CORRECTNESS of the paper, put a git tag on it indicating "original version", and THEN clean it up on GitHub/wherever. That keeps any new "organizing" of the code from breaking the published version, which would be counterproductive. This way, when people go to your code page, the first thing they see is a nicely-organized version, and you get time to test that it works the same as the original. Honestly, if you care about this at all, then your code is probably significantly more organized than 95% of research code out there; the standards of code quality in science are VERY low, which is completely different from private sector software engineering.
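
To "test that it works the same" after a cleanup, the simplest thing I can think of is to run the as-published code and the cleaned-up code on the same inputs and compare the numbers within a tolerance. A rough sketch, where `original_model` and `cleaned_model` are toy stand-ins I made up for the two versions of your code, and the tolerance is an assumption you'd tune for your own problem:

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-9, abs_tol=0.0):
    """True if two flat sequences of numbers agree element-wise within tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


def original_model(xs):
    """Hypothetical stand-in for the as-published ("original version") code."""
    out = []
    for x in xs:
        out.append(x * x + 1.0)
    return out


def cleaned_model(xs):
    """Hypothetical stand-in for the reorganized code; should give the same numbers."""
    return [x ** 2 + 1.0 for x in xs]


if __name__ == "__main__":
    inputs = [0.0, 0.5, 2.0]
    if outputs_match(original_model(inputs), cleaned_model(inputs)):
        print("cleanup preserved the results")
    else:
        print("cleanup CHANGED the results")
```

In practice the reference numbers would come from a saved output of the tagged original version rather than from a function sitting in the same file.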

* edits are for markup
