jd | 13 years ago

Most importantly -- when researchers know their data and methodology will be out in the open, they'll have a big incentive to make it look clean and presentable. They know they risk getting called out for excluding certain segments of the data set, so they'll have to at the very least add a small remark in the spreadsheet justifying their decision. It also makes it really easy for other researchers to pick other start and end years to test whether the result only holds for the original data. That, again, encourages researchers to explain and justify the input data they've chosen.

In addition, researchers are likely to discover mistakes while cleaning up the Excel sheet, data sources, and code before publishing them. Just like we find mistakes in our work while refactoring and cleaning up code before we push it to GitHub.

So even when nobody ever looks at the data and code we can expect the quality of the research to improve significantly: just because the code is there to be looked at.

I think the case in favor of making data & code public for academic research is pretty overwhelming.

lmg643|13 years ago

I think having data open and available is good for everyone.

The more I learn about this particular incident, the more I feel the scandal is not (a) that famous Harvard professors made a mistake, or (b) that Microsoft Excel is more error-prone than alternatives (which seems like nonsense to me, because the more complex the replacement, the more error-prone it will be)...

To me the scandal is that you can be a tenured professor in economics and produce work that amounts to simple averages of widely available data and call it a research paper, and that people take you seriously, and presidential candidates use you as a reference, and your department doesn't bat an eyelash.

The fact that they screwed up seems incidental - mistakes happen.

It seems like the kind of back-of-the-envelope work that any old blogger would be capable of doing. We just don't have a way to take good ideas seriously, no matter where they come from. No matter how much we profess to try, at heart the consensus is still status-driven, and pedigrees matter.

vacri|13 years ago

> I think having data open and available is good for everyone.

Not necessarily everyone. Collecting good data is hard, long, and tedious, and the 'glory' part is the analysis. People get accolades for making analyses, not for good groundwork.