etal's comments

etal | 13 years ago | on: Can we make accountable research software?
1. A program that implements a new technique which forms an important part of a research project. Maybe a program that is the research project, which will be described in a paper.
No doubt this code should be included with the publication, no matter how "ugly" it is. Some journals, e.g. Bioinformatics, already require that an article about software must include the software itself. This is the stuff the Bioinformatics Testing Consortium would run a smoke test on, because, amazingly, a lot of programs that have been written up as journal articles just don't compile or work at all on somebody else's machine; many articles don't include the source code, and some don't even say how to get a redistributable binary. That's wrong, and we can fix it.
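A smoke test in this spirit doesn't need to be elaborate: just check that the published tool runs at all and answers a trivial invocation on a fresh machine. A minimal sketch in Python (the `smoke_test` helper and the example command are illustrative, not the Consortium's actual harness):

```python
import subprocess
import sys

def smoke_test(cmd, timeout=60):
    """Return True if `cmd` launches and exits cleanly within `timeout` seconds."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or hung: all smoke-test failures.
        return False
    return result.returncode == 0

# Example: check that the interpreter itself responds to --version.
# A real test would substitute the published tool and a sample input.
ok = smoke_test([sys.executable, "--version"])
```

Even this much would catch the "doesn't compile or run on anyone else's machine" class of failures before reviewers do.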
2. The mountain of single-use scripts and shell commands used in a research project that isn't really about software at all; only a small fraction of them produce output that the scientist follows up on.
Key points: (1) this code is very unlikely to work on anyone else's machine as-is; (2) crucial parts of these pipelines are lost in the Bash history, or were executed on a 3rd-party web server, or depend on a data set on loan from a collaborator who is not ready to release the data yet; (3) almost all of the code is dead; (4) whatever comments or notes exist are usually misleading or completely wrong.
As an example of what can go wrong when this code is released as-is, remember when the East Anglia Climate Research Unit "hide the decline" stuff hit the fan? It wasn't clear which code was dead, the comments made no sense, and people freaked because they couldn't be sure how the published results came out of that godawful mess. The eventual solution, way too late, was to make a proper open-source, openly developed software project out of the important bits. That, in a nutshell, is why scientists won't release ALL the code -- even the hard drive itself is not the whole story; the scientist still needs to be available to explain it and steer readers past the red herrings. And getting code into a state where it's self-explanatory takes time.
etal | 14 years ago | on: The Dangerous "Research Works Act"
Relevant example: You published these two posts in TechCrunch to get a wide audience. (And I'm glad you did!) I read them partly because they appeared in TechCrunch.
etal | 14 years ago | on: The Dangerous "Research Works Act"
http://www.hhmi.org/news/elife20111107.html
It would probably be considered dubious/anti-competitive if NIH and NSF launched their own journals, but because of the Open Access Initiative (which RWA attempts to reverse), NIH is able to host articles that have already been released to the public via PubMed Central.
etal | 14 years ago | on: The Dangerous "Research Works Act"
http://blogs.nature.com/news/2010/08/nature_and_california_m...
Unclear on the details, but presumably UC got a somewhat better price. (Note that UC was getting a better price than most libraries to begin with.)
etal | 15 years ago | on: Your Commute Is Killing You
etal | 15 years ago | on: PyPy 1.5 Released: Catching Up
etal | 15 years ago | on: The No. 1 Habit of Highly Creative People
etal | 15 years ago | on: Unity on Wayland
More about the architecture:
etal | 15 years ago | on: Thank you, Ubuntu
etal | 15 years ago | on: Hunter S. Thompson's brutally honest Canadian job request
If there's a fundamental difference between the weaselly narratives constructed by Fox News and the psychedelic screeds Thompson put out, it's that most reporters aren't making it explicit that their stories are fully personal, opinionated interpretations of true events -- they record some isolated facts, sample a few quotes and make vague references to public sentiment to back up any narrative they need. But they present all of this as objective information. This was happening well before H.S.T. (see "yellow journalism") and happens outside the U.S. too (see Daily Mail).
Thompson's approach was (1) a veil of entertaining literary showmanship over (2) complete, self-accountable interpretations of the events being covered. He was clear that his stories were subjective, and that freed him to explain exactly why he felt the way he did about Nixon, drug laws, Southern culture, etc.
etal | 15 years ago | on: What Is It Like To Be A Baby?
Still an interesting question, though. I've seen cats focus pretty hard on certain things.
etal | 15 years ago | on: Why Companies Should Insist that Employees Take Naps
As a side effect, dinner shifts later in the day. This is noticeable in places where siestas are common, like Spain -- early breakfast, nothing happening in the mid-afternoon, and then everyone emerges for dinner and night life around 8-10 p.m.
etal | 15 years ago | on: Yegge Strikes Back from the Grave
etal | 15 years ago | on: Why Companies Should Insist that Employees Take Naps
2. Over the course of a long flight, measure pilots' reaction times at the start and then at regular intervals until the end. Result: without a nap, final reaction times are 34% slower than initial reaction times.
Yes, it's a little bit of a factoid soup, and citations would have been nice.
etal | 15 years ago | on: Ubuntu Server tech lead: The real problem with Java in Linux distros
This depends on these external packages: ...
Java programs usually come with a bunch of .jar files which were once independent packages but have been dropped into the release itself. No dependency problems! Then, if someone wants to package a Java application for Debian, the process is:
1. Look through the collection of .jar files in the release
2. Do you recognize one of these as already being packaged for Debian?
3. Work with upstream to delete that .jar from Debian's copy and depend on the system's version instead
4. Repeat for every other .jar in the release, until you hit a wall
5. Upload the package to Debian with an acceptably small number of bundled .jars
6. Time permitting, get someone to package the other .jars that aren't available in Debian yet
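Steps 1-2 above can be sketched mechanically: walk the release tree, strip the version suffixes off the bundled jars, and use the resulting names as search keys against the distro archive (e.g. with apt-cache search). A minimal sketch, assuming the common name-version.jar naming convention (the demo jar names are illustrative):

```python
import re
import tempfile
from pathlib import Path

def bundled_jars(release_dir):
    """Collect bundled .jar names with version suffixes stripped;
    each name is a candidate to look up in the distro archive."""
    names = set()
    for jar in Path(release_dir).rglob("*.jar"):
        # "commons-io-2.4.jar" -> "commons-io"
        names.add(re.sub(r"(-[0-9][\w.]*)?\.jar$", "", jar.name))
    return sorted(names)

# Demo against a throwaway release tree with typically named jars.
demo = tempfile.mkdtemp()
for fname in ("commons-io-2.4.jar", "guava-14.0.1.jar", "local-glue.jar"):
    Path(demo, fname).touch()
print(bundled_jars(demo))  # -> ['commons-io', 'guava', 'local-glue']
```

The automatable part ends there; steps 3-6 are the slow, human negotiation with upstream.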
That said, you can cause a similar amount of trouble in other languages; it's just not the convention there (thanks to the success of gems, CPAN and PyPI). For example, Ubuntu appears to have deleted the sagemath package because upstream keeps its own patched copies of dozens of the libraries it depends on:
etal | 15 years ago | on: Ubuntu Server tech lead: The real problem with Java in Linux distros
etal | 15 years ago | on: Advice to Aimless, Excited Programmers
(Incidentally, the only new language I've been able to stick with after learning Python is R, because it solves completely different problems. Which means R is now my Blub for stuff that I'd like to, given some free time, try with Clojure/Incanter...)
etal | 15 years ago | on: Ubuntu Server tech lead: The real problem with Java in Linux distros
In Java, developers will bundle dozens of random .jar files with their application. Other versions of these libraries may exist in a Linux distribution already, but specific Java apps aren't linking to those and carefully documenting how to install all the dependencies (or which ones are optional). Instead, from the distro's point of view, these Java apps are being released with a bunch of binary blobs that may or may not contain bugs that need patching later. Which partly defeats the purpose of package management.
Python programmers don't do this. Distros love Python, which is why so many of their system administration/automation scripts depend on it. This creates some tension between packagers, who use it as a bash/perl replacement, and developers like Zed, who want to treat it as an up-to-date library -- but who also realize that the Java solution of just bundling everything with the main app is ugly. That tension exists because Python developers haven't run amok the way Java developers have.
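The difference in convention shows up right in the metadata. A Python release typically declares its dependencies by name, so a distro can map each entry onto its own package; the Java habit described above would instead ship patched copies of those libraries inside the application tree. A sketch of the declared style (the package names here are purely illustrative):

```python
# What a typical Python release declares, e.g. in setup.py's
# install_requires list -- names chosen only as examples:
install_requires = [
    "requests>=2.0",
    "lxml",
    "numpy",
]

# A packager maps each declared name onto an existing distro package
# (python3-requests, python3-lxml, ...) instead of patching vendored
# copies hidden inside the application's own tree.
debian_names = ["python3-" + req.split(">=")[0] for req in install_requires]
print(debian_names)  # -> ['python3-requests', 'python3-lxml', 'python3-numpy']
```

That one level of indirection is exactly what lets a distro patch a library bug once instead of once per bundling application.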
To be clear, I'm all for open science and even open notebooks where it's a good fit for the project. I just don't think a pile of single-use scripts is a sufficient replacement for a clear English description of the analysis workflow and the reasons for each step. If I can't understand how an analysis was done from the article itself and the documentation for any associated software, I would not trust the article. Including more code, particularly the code further down the Pareto curve of relevance to the final article, does not make the article more correct -- most journal articles are wrong or flawed in some way, even if the code works as advertised.