item 8771852

Problems of Reproducibility in Computer Science

24 points | mkpankov | 11 years ago | blog.michaelpankov.com

9 comments


nkurz|11 years ago

I've been particularly bothered by the way that almost all "real numbers" in CS papers attempt to explain the advantages of one algorithm over another based on small performance differences between two partially optimized implementations, compiled with a single compiler with arbitrary options, and then tested on a single older machine. The few that go further merely repeat the exercise with an even older compiler and an even older machine with a forgotten architecture. While it's a step up from simply counting pseudo-instructions and memory accesses, and while there is room for theoretical analysis of asymptotic behaviour, it's painful how much is read into the particular tea leaves of one or two data points.

malux85|11 years ago

Recently I was asked to help judge the 2014 iGEM competition, which includes software entries in the field of biology. I made a special point to the other judges that we should give special consideration to the teams that included instructions or configurations for setting up their environments: too much of the software entered wouldn't compile (or run) when checked out of source control!

michaelochurch|11 years ago

One of the trade-offs we've continually made in computing is between general performance and legibility of performance. Caches (i.e. the memory hierarchy) are one example of that, but we also have pipelining and ILP. Moving up into software, we see this with garbage collection and (more dangerously) with databases. PostgreSQL is great at taking complex queries (many, many joins) and turning them into plans that can often be executed very quickly. Of course, there's no silver bullet. If your join is a full Cartesian product over two or more large tables, you're screwed. If you assumed that the table was set up with certain "obvious" indexes and it wasn't, you can get burned.
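The "assumed index" failure mode above can be made concrete with a small sketch. This uses SQLite rather than PostgreSQL for portability, and the table and column names (`orders`, `customer_id`, `idx_customer`) are made up for illustration; the point is that the query text looks identical either way, and only the plan reveals whether you're scanning or searching.

```python
import sqlite3

# Build a toy table with no indexes beyond the defaults.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i % 100) for i in range(1000)])

def plan(query):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the human-readable plan step is the last column.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + query)]

q = "SELECT * FROM orders WHERE customer_id = 42"
print(plan(q))  # typically a full table scan (exact wording varies by SQLite version)

con.execute("CREATE INDEX idx_customer ON orders (customer_id)")
print(plan(q))  # now an index search; same query text, very different cost
```

The same opacity that makes the planner powerful is what makes its performance hard to reason about from the query alone.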

We've built systems that are very fast, but almost no one knows why they are fast, and when those speed advantages go against us (which is quite rare) it's often unclear why. For the most part, all of this has been a huge win. It might make reasoning about performance from first principles ugly, if not impossible, but the economics favor it. That said, we're at the point where the only way to reason about performance in many real-world systems is empirically, because there's just so much going on in terms of memory hierarchy, network traffic, etc. that is hard to predict in advance. Combine this with the myriad variables, some in the physical world, that can influence performance, and we end up with results that seem meaningful (30 percent faster!) but might be artifacts of differences in configuration.
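A minimal sketch of why a single measured "win" can be noise rather than signal: time the *same* workload repeatedly and the run-to-run spread on a busy machine can approach the effect sizes papers report. The workload here is an arbitrary placeholder.

```python
import timeit
import statistics

def work():
    # Placeholder workload; any deterministic computation serves.
    return sum(i * i for i in range(10_000))

# Twenty independent timing samples of the identical code.
samples = [timeit.timeit(work, number=50) for _ in range(20)]
spread = (max(samples) - min(samples)) / min(samples)
print(f"median {statistics.median(samples):.4f}s, "
      f"run-to-run spread {spread:.0%}")
```

If the spread of identical runs overlaps the claimed improvement, the comparison needs distributions (and matched configurations), not two point estimates.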

I don't think the term computer science was coined with performance measurement in mind, but the field is not really a science. It's a mix of mathematics, microeconomics, design, and engineering, with a lot of science involved, but most often not scientific under the original, strict definition of the word. Science is about investigating the natural world and finding the right answers; CS is about making answers. Both are extremely important, but they're fundamentally different.

Also, just to underscore how difficult CS is, I tend to think of it as a branch of mathematics that lives on an unfair playing field. Reasoning about arbitrary code, in even the most trivial ways, is known to be impossible (Turing, Church, Rice) and much of CS is contending with the obscenity of arbitrary programs (i.e. often having to quantify over the space of programs) and, typically, that means limiting what can be written while still retaining as much power as is needed.

Umn55|11 years ago

>We've built systems that are very fast, but almost no one knows why they are fast

It's pretty simple: all code can be modelled as a fluid flowing through a landscape. Imagine a river (which represents code); it courses over the land, and there are "cliffs", rocks, and hills (representing bottlenecks/loads).

The reality is that as buses got wider and transistors could be packed closer together, more water could flow faster. There's no big conspiracy at all, because software ultimately can be expressed as a circuit.

BruceIV|11 years ago

Even if there's no publicly released code, any credible CS researcher will give you access to their code and tests for research purposes if you ask (excepting the rare case where they include someone else's IP), for exactly this reason of reproducibility.

Second, testing in a number of different environments is part of the nature of scientific reproducibility: you check whether the claimed result is robust or simply an artifact of the experimental setup. So releasing more detailed specs of the hardware used is not so essential.

nkurz|11 years ago

Given that reproducibility is the standard by which scientific publications are usually judged, what would be the reason for only making the code available upon special request rather than simply making it available as part of the publication? This feels like publishing a paper about an experiment and saying "methodology available by special request".