top | item 29938872

psmirnov | 4 years ago

I agree with almost all of this; however, I believe that publishing random seeds is dangerous in its own way.

Ideally, if your code has a random component (MCMC, bootstrapping, etc.), your results should hold up across many random seeds and runs. I don't care about reproducing the exact same figure you had; I want to reproduce your conclusions.

In a sense, when a laboratory experiment gets reproduced, you start off with a different "random state": equipment, environment, and experimenter all introduce random variance. We still expect the conclusions to reproduce. We should expect the same from "computational studies".
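One way to sketch this idea: re-run the stochastic part of an analysis under many seeds and check that the conclusion, not the exact numbers, survives. A minimal illustration with a percentile bootstrap (the data, threshold, and function name here are hypothetical, just to make the point concrete):

```python
import random
import statistics

def bootstrap_mean_ci(data, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean, driven by an explicit seed."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot)]
    return lo, hi

data = [2.1, 2.4, 1.9, 2.6, 2.2, 2.8, 2.0, 2.5, 2.3, 2.7]

# The conclusion ("the mean is clearly above 1.5") should survive
# any choice of seed, even though each CI differs slightly.
for seed in range(10):
    lo, hi = bootstrap_mean_ci(data, seed=seed)
    assert lo > 1.5, f"conclusion failed under seed {seed}"
```

If a claim flips when the seed changes, the problem is the claim, not the seed.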


Fomite | 4 years ago

The thing is, if you want to ignore someone's random seed, you can if it's provided. If it's not provided and you need it to chase down why something isn't working, you're SOL.

It's zero cost to include it.
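In the spirit of "zero cost": recording the seed can be a couple of lines. A sketch using NumPy's seeding machinery (the print format is arbitrary; any log line alongside the results would do):

```python
import numpy as np

# Draw a fresh, high-entropy seed and keep it: anyone who wants to
# ignore it can reseed; anyone debugging can replay the exact run.
seed = np.random.SeedSequence().entropy
rng = np.random.default_rng(seed)

print(f"seed = {seed}")  # record this next to the results

sample = rng.normal(size=100)  # stand-in for the real stochastic analysis
```

Re-running with `np.random.default_rng(seed)` and the recorded value reproduces the run bit-for-bit.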

dllthomas | 4 years ago

I think being able to re-run the code accompanying a paper is great, but we should be careful to distinguish it from scientific replication.

When replicating physics or chemistry, you build the relevant apparatus from scratch, demonstrating that the paper has sufficiently communicated the ideas and that the result is robust not just to the "random state" you discuss, but also to the variations introduced by a trip through human communication.

I acknowledge that this is substantially an aside, but it's something I like to surface from time to time and this seemed a reasonable opportunity.