etal's comments

etal | 13 years ago | on: Can we make accountable research software?

Right, which is why the novel parts should get more attention and undergo code review, which is the goal of the Bioinformatics Testing Consortium.

To be clear, I'm all for open science and even open notebooks where it's a good fit for the project. I just don't think a pile of single-use scripts is a sufficient replacement for a clear English description of the analysis workflow and the reasons for each step. If I can't understand how an analysis was done from the article itself and the documentation for any associated software, I would not trust the article. Including more code, particularly the code further down the Pareto curve of relevance to the final article, does not make the article more correct -- most journal articles are wrong or flawed in some way, even if the code works as advertised.

etal | 13 years ago | on: Can we make accountable research software?

We should be clear which of two kinds of scientific code we're talking about:

1. A program that implements a new technique which forms an important part of a research project. Maybe a program that is the research project, which will be described in a paper.

No doubt this code should be included with the publication, no matter how "ugly" it is. Some journals, e.g. Bioinformatics, already require that an article about software must include the software itself. This is the stuff the Bioinformatics Testing Consortium would run a smoke test on, because amazingly, a lot of programs that have been written up as journal articles just don't compile or work at all on somebody else's machine; many articles don't include the source code, and some don't even say how to get a redistributable binary. That's wrong, and we can fix it.

2. The mountain of single-use scripts and shell commands that are used in a research project that's not really about software at all, only a small fraction of which produce some output that the scientist follows up on.

Key points: (1) this code is very unlikely to work on anyone else's machine as-is; (2) crucial parts of these pipelines are lost in the Bash history, or were executed on a 3rd-party web server, or depend on a data set on loan from a collaborator who is not ready to release the data yet; (3) almost all of the code is dead; (4) whatever comments or notes exist are usually misleading or completely wrong.

As an example of what can go wrong when this code is released as-is, remember when the East Anglia Climate Research Unit "hide the decline" stuff hit the fan? It wasn't clear which code was dead, the comments made no sense, and people freaked because they couldn't be sure how the published results came out of that godawful mess. The eventual solution, way too late, was to make a proper open-source, openly developed software project out of the important bits. That, in a nutshell, is why scientists won't release ALL the code -- even the hard drive itself is not the whole story; the scientist still needs to be available to explain it and navigate over the red herrings. And getting code into a state where it's self-explanatory takes time.

etal | 14 years ago | on: The Dangerous "Research Works Act"

If pre-publication peer review goes out of fashion, then another possibility is that the brand of major journals becomes more important. We still need a quick gauge of the quality of an article, other than its Google rank or number of page views. Nature can retract popular articles that are later proven flawed; I don't think Google would attempt to wield that kind of authority.

Relevant example: You published these two posts in TechCrunch to get a wide audience. (And I'm glad you did!) I read them partly because they appeared in TechCrunch.

etal | 14 years ago | on: The Dangerous "Research Works Act"

The major non-governmental funding agencies recently banded together to solve the problem roughly the way you suggest, by creating their own open-access journal which they will encourage their grantees to submit their work to. It will be called eLife:

http://www.hhmi.org/news/elife20111107.html

It would probably be considered dubious/anti-competitive if NIH and NSF launched their own journals, but because of the Open Access Initiative (which RWA attempts to reverse), NIH is able to host articles that have already been released to the public via PubMed Central.

etal | 15 years ago | on: Your Commute Is Killing You

It's probably being poor, which also correlates with worse working environments. Remember, "drive until you qualify."

etal | 15 years ago | on: PyPy 1.5 Released: Catching Up

Unladen Swallow isn't dead, actually -- it just merged into the main CPython code base. The first few quarters of optimizations are in Python 2.7 (it's noticeably faster than Python 2.6) and the more adventurous bits are on separate branches in SVN.

etal | 15 years ago | on: The No. 1 Habit of Highly Creative People

This is also called "flow" or being "in the zone" -- focusing on one thing, intensely, without interruptions. It's one more reason to lump programming in with the other creative arts.

etal | 15 years ago | on: Thank you, Ubuntu

LTS versions have stable patch-level releases every 6 months while they're supported (e.g. 10.04.1 was released in July, and there will probably be a 10.04.2 in January), so they're able to add drivers to the installer when they become available.

etal | 15 years ago | on: Hunter S. Thompson's brutally honest Canadian job request

The word you probably want is "truthiness".

If there's a fundamental difference between the weaselly narratives constructed by Fox News and the psychedelic screeds Thompson put out, it's that most reporters aren't making it explicit that their stories are fully personal, opinionated interpretations of true events -- they record some isolated facts, sample a few quotes and make vague references to public sentiment to back up any narrative they need. But they present all of this as objective information. This was happening well before H.S.T. (see "yellow journalism") and happens outside the U.S. too (see Daily Mail).

Thompson's approach was (1) a veil of entertaining literary showmanship over (2) complete, self-accountable interpretations of the events being covered. He was clear that his stories were subjective, and that freed him to explain exactly why he felt the way he did about Nixon, drug laws, Southern culture, etc.

etal | 15 years ago | on: What Is It Like To Be A Baby?

It would if it were true, but it's not.

Still an interesting question, though. I've seen cats focus pretty hard on certain things.

etal | 15 years ago | on: Why Companies Should Insist that Employees Take Naps

It's easier for students. When I'm on that schedule, I wake up around 7 a.m. (naturally), nap around 4:30 or 5 p.m., and go to bed around 1 a.m. It's maintainable when I don't have regular late-afternoon meetings, and very helpful during "death march"-type projects because it feels like having two days in one.

As a side effect, dinner shifts later in the day. This is noticeable in places where siestas are common, like Spain -- early breakfast, nothing happening in the mid-afternoon, and then everyone emerges for dinner and night life around 8-10 p.m.

etal | 15 years ago | on: Yegge Strikes Back from the Grave

I'm deeply curious where Clojure would fall in this comparison. Is Compojure as suave as Hunchentoot? Is string formatting just as flexible? In practice, is it a pain to have to fail over to Java library documentation when Clojure doesn't suffice, versus having plentiful but scattered docs for a single language?

etal | 15 years ago | on: Why Companies Should Insist that Employees Take Naps

1. In the middle of a long flight, compare pilots' reaction times just before the nap to just after the nap. Result: After the nap, reaction times are 16% shorter.

2. Over the entirety of a long flight, test pilot reaction times initially, then at regular intervals until the end of the flight. Result: Without a nap, final reaction times are 34% slower than initial reaction times.

Yes, it's a little bit of a factoid soup, and citations would have been nice.

etal | 15 years ago | on: Ubuntu Server tech lead: The real problem with Java in Linux distros

Ruby, Perl and Python packages usually come with a README that says:

  This depends on these external packages: ...

Java programs usually come with a bunch of .jar files which were once independent packages, but have been dropped into the release itself. No dependency problems!

Then, if someone wants to package a Java application for Debian, the process is:

1. Look through the collection of .jar files in the release

2. Do you recognize one of these as already being packaged for Debian?

3. Work with upstream to delete that .jar from Debian's copy and depend on the system's version instead

4. Repeat for every other .jar in the release, until you hit a wall

5. Upload the package to Debian with an acceptably small number of bundled .jars

6. Time permitting, get someone to package the other .jars that aren't available in Debian yet
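
Steps 1 and 2 above are basically a lookup, so here's a rough sketch of what that triage might look like in Python. Everything in it is made up for illustration: the `PACKAGED` set, the jar names, and the version-stripping rule are assumptions, not how Debian's tooling actually works.

```python
import re

# Hypothetical set of library names the distro already packages.
PACKAGED = {"commons-logging", "log4j", "junit"}


def library_name(jar_filename):
    """Drop the '.jar' suffix and a trailing version such as '-1.2.17'."""
    stem = jar_filename
    if stem.endswith(".jar"):
        stem = stem[: -len(".jar")]
    # Assumed naming convention: the version starts at a hyphen followed by a digit.
    return re.sub(r"-\d[\w.]*$", "", stem)


def triage_jars(jar_filenames, packaged):
    """Split bundled jars into ones the system could provide and the rest."""
    replaceable, still_bundled = [], []
    for name in sorted(jar_filenames):
        if library_name(name) in packaged:
            replaceable.append(name)
        else:
            still_bundled.append(name)
    return replaceable, still_bundled


# Hypothetical contents of an upstream release.
jars = ["log4j-1.2.17.jar", "acme-utils.jar", "commons-logging-1.1.jar"]
replaceable, still_bundled = triage_jars(jars, PACKAGED)
print("depend on system package:", replaceable)
print("still bundled:", still_bundled)
```

The lookup is the easy part. Steps 3 onward (getting upstream to agree, and checking that the system's version is actually compatible with the bundled one) are where the time goes, and a name match like this says nothing about that.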

That said, you can cause a similar amount of trouble in other languages; it's just not the convention (thanks to the success of gems, CPAN, PyPI). For example, Ubuntu appears to have deleted the sagemath package because upstream keeps their own patched copies of dozens of libraries they depend on:

http://packages.ubuntu.com/search?keywords=sagemath&sear...

etal | 15 years ago | on: Advice to Aimless, Excited Programmers

Another way is to pick a top-notch library or framework in your target language, and learn to use that. For Clojure, I'd try making some clever charts with Incanter:

http://incanter.org/

(Incidentally, the only new language I've been able to stick with after learning Python is R, because it solves completely different problems. Which means R is now my Blub for stuff that I'd like to, given some free time, try with Clojure/Incanter...)

etal | 15 years ago | on: Ubuntu Server tech lead: The real problem with Java in Linux distros

This problem is different from the Python issue Zed Shaw wrote about.

In Java, developers will bundle dozens of random .jar files with their application. Other versions of these libraries may exist in a Linux distribution already, but specific Java apps aren't linking to those and carefully documenting how to install all the dependencies (or which ones are optional). Instead, from the distro's point of view, these Java apps are being released with a bunch of binary blobs that may or may not contain bugs that need patching later. Which partly defeats the purpose of package management.

Python programmers don't do this. Distros love Python, which is why so many of their system administration/automation scripts depend on it. This puts some tension between packagers, who use it as a bash/perl replacement, and developers like Zed, who want to treat it like an up-to-date library -- but realize that the Java solution of just bundling the whole thing with the main app is ugly. That tension exists because Python developers haven't run amok the way Java developers have.
