tkiley | 5 years ago
Here's one params file that specifies some of the inputs to a run of the model:
https://github.com/mrc-ide/covid-sim/blob/master/data/param_...
Here's another one:
https://github.com/mrc-ide/covid-sim/blob/master/data/admin_...
There are hundreds of constants in there. A lot of them appear to be wild-ass guesses. Presumably, all of them affect the output of the model in some way.
When a model has enough parameters for which you can make unsubstantiated guesses, you have a ton of wiggle room to generate whatever particular output you want. I'd like to see policy and public discussion focus more on the key parameters (R-naught, hospitalization rate, fatality rate) and less on overly sophisticated models.
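A toy illustration of that wiggle room (synthetic data, nothing to do with the covid-sim code itself): once a model has as many free parameters as observations, it can reproduce any target output exactly, so a good fit proves nothing.

```python
# Toy demo: with as many free parameters as data points, a fit can
# hit any arbitrary "target" output exactly -- maximal wiggle room.
import numpy as np

days = np.arange(5)
target = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # any output we happen to want

# A degree-4 polynomial has 5 coefficients, one per observation.
coeffs = np.polyfit(days, target, deg=4)
fitted = np.polyval(coeffs, days)

print(np.max(np.abs(fitted - target)))  # essentially zero: the fit is exact
```

Real epidemic models don't literally interpolate like this, but the same logic applies in softer form: every loosely constrained constant is a degree of freedom for steering the output.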
noelsusman | 5 years ago
All I can say is welcome to epidemiology. The spread of a disease is highly dependent on a host of factors that we have very little insight into. Even simple things like hospitalization rate or fatality rate can be difficult, if not impossible, to estimate accurately. Epidemiologists are open about this, but few people ever want to listen. Humans just aren't good at truly conceptualizing uncertainty.
The theory behind disease spread models is relatively sound, but they're highly dependent on accurate estimates of input parameters, and governments have not prioritized devoting resources toward improving those estimates. I sat in on discussions between epidemiologists and government officials about COVID models. The response to nearly every question was "we don't know, but here's our best guess". I listened to them beg officials for random testing of the population to improve their parameter estimates. That testing never happened.
thu2111 | 5 years ago
I'll take that money off you then.
The code has various memory-safety bugs in it and originally had a typo in a random number generator constant, amongst other problems.
There's really no reason to believe it produces correct outputs; in fact, we know it didn't, and given how it was written it probably still doesn't.
wbhart | 5 years ago
Unless all important factors are accounted for, these models are going to produce incorrect information for someone. Public policy will then be based on incorrect predictions. People will grow tired of the predictions being wrong, and they'll give up on data science entirely.
It's already quite bad that people think they can choose their reality by finding numbers that agree with them and ignoring the ones that don't.
I do understand the point you are making, which is like the epicycles argument. But in global warming and epidemics alike, more parameters are actually needed to model reality.
I do agree, though, that those parameters should be based on actual data, not guesses. But what value of R would you pick? Is that value actually well-constrained?
datastoat | 5 years ago
The way to test predictive models is always to look at their predictive accuracy on holdout data. Machine learning has this ingrained. Classical statistics does this too -- AIC is used to compare models, and it's (asymptotically) equivalent to leave-one-out cross-validation [1].
There's nothing intrinsically wrong with models that have millions of parameters; they might overfit, in which case they will have poor predictive accuracy on holdout data, or they might predict well.
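A minimal sketch of that holdout criterion, on synthetic epidemic-like data (neither "model" here is the covid-sim model; both are illustrative stand-ins):

```python
# Sketch of holdout evaluation: fit on early data, score on held-out later data.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(20)
cases = 10 * 1.3 ** days * rng.lognormal(0.0, 0.1, size=days.size)  # noisy growth

# Hold out the last six days; fit only on the first fourteen.
t_train, t_hold = days[:14], days[14:]
y_train, y_hold = cases[:14], cases[14:]

# Model A: two-parameter exponential growth, fit in log space.
slope, icept = np.polyfit(t_train, np.log(y_train), deg=1)
pred_a = np.exp(icept + slope * t_hold)

# Model B: ten-parameter polynomial -- tracks the training data closely,
# but has no reason to extrapolate sensibly beyond it.
pred_b = np.polyval(np.polyfit(t_train, y_train, deg=9), t_hold)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

print(rmse(pred_a, y_hold), rmse(pred_b, y_hold))
```

The point is that both models face the same verdict: parameter count by itself settles nothing, holdout error does.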
I agree with the original article that software-engineering scrutiny isn't appropriate for this sort of code -- but I would argue instead that it needs a general-purpose statistician or data scientist or ML expert to evaluate its predictive accuracy. You can't possibly figure this out from a simulator codebase.
At the time the model was published, and acted on by the UK government, there was very little data on which to test predictive accuracy. That's fine -- all it means is that the predictions should have been presented with gigantic confidence intervals.
[1] http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf
CharlesW | 5 years ago
That's the nature of all models, "sophisticated" or not. Relatively simple models may or may not be useful for a particular case, just as relatively complex models may be.
tkiley | 5 years ago
I don't know -- and until we can agree on the answer to your simple question with a high degree of confidence, I think complex models based on specific assumed values of R obscure more than they reveal.
A little bit of modeling is useful because humans are intuitively bad at exponential math and we need scary graphs to jolt us awake sometimes. But when we don't even know the basic parameters (transmission/hospitalization/fatality) with a high degree of precision, complex models with myriad parameters create a false sense of confidence.
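A back-of-envelope calculation (a toy branching process, not any particular model) shows both why the scary graphs work and why imprecision in the basic parameters compounds:

```python
# New infections seeded by one case after n generations scale as R**n,
# so a modest uncertainty in R compounds into a huge difference in output.
for R in (2.0, 2.2, 2.5):
    print(f"R={R}: ~{round(R ** 10)} new infections in generation 10")
# A 25% change in R (2.0 -> 2.5) shifts the generation-10 count ~9x.
```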
tripletao | 5 years ago
So code quality mattered less than usual. If there's a significant bug, the operator will probably notice, and if there's an insignificant bug, no one cares. The large number of input parameters also doesn't matter: the operators are fully aware that they could artificially manipulate the output to wherever they wanted, but doing so would only be cheating themselves.
It feels to me like Ferguson's model was built with similar intent, and it probably served that purpose well. The problem came only when the media portrayed the model as a source of authority apart from the people operating it, perhaps to create a feeling of objectivity behind the decisions it drove. That created an expectation of rigor that either didn't exist (in the software engineering) or fundamentally can't exist given our current knowledge of the science (in the input assumptions).