top | item 23223127

tkiley | 5 years ago

I poked at the github repo for a bit. The ugliness of the code doesn't bother me, but the quantity of parameters does.

Here's one params file that specifies some of the inputs to a run of the model:

https://github.com/mrc-ide/covid-sim/blob/master/data/param_...

Here's another one:

https://github.com/mrc-ide/covid-sim/blob/master/data/admin_...

There are hundreds of constants in there. A lot of them appear to be wild-ass guesses. Presumably, all of them affect the output of the model in some way.

When a model has enough parameters for which you can make unsubstantiated guesses, you have a ton of wiggle room to generate whatever particular output you want. I'd like to see policy and public discussion focus more on the key parameters (R-naught, hospitalization rate, fatality rate) and less on overly-sophisticated models.
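The sensitivity to those key parameters is easy to demonstrate. Here's a minimal, illustrative SIR-style sketch (not the covid-sim model, and the recovery rate `gamma` is an assumed placeholder) showing how much the projected epidemic peak swings with R-naught alone:

```python
# Minimal discrete-time SIR sketch (illustrative only, not covid-sim).
# gamma (recovery rate per day) is an assumed placeholder value.

def sir_peak_infected(r0, days=365, gamma=0.1, n=1_000_000):
    """Return the peak number of simultaneously infected people."""
    beta = r0 * gamma          # transmission rate implied by R0
    s, i, r = n - 1, 1, 0      # susceptible, infected, recovered
    peak = i
    for _ in range(days):
        new_inf = beta * s * i / n
        new_rec = gamma * i
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        peak = max(peak, i)
    return peak

# Small shifts in R0 alone move the peak substantially:
for r0 in (2.0, 2.5, 3.0):
    print(f"R0={r0}: peak infected ~{sir_peak_infected(r0):,.0f}")
```

Even before touching the hundreds of other constants, the headline number moves a lot, which is the point: the key parameters dominate.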

noelsusman | 5 years ago

You're correct to focus on the effect of parameter choices over code quality. It's been a little funny to watch a bunch of software engineers freak out about unit tests while ignoring everything else that has a much larger impact on the output of the model. I would bet large sums of money that this code is producing the correct output according to the model/parameter specifications.

All I can say is welcome to epidemiology. The spread of a disease is highly dependent on a host of factors that we have very little insight into. Even simple things like hospitalization rate or fatality rate can be difficult if not impossible to estimate accurately. Epidemiologists are open about this, but few people ever want to listen. Humans just aren't good at truly conceptualizing uncertainty.

The theory behind disease spread models is relatively sound, but they're highly dependent on accurate estimates of input parameters, and governments have not prioritized devoting resources toward improving those estimates. I sat in on discussions between epidemiologists and government officials about COVID models. The response to nearly every question was "we don't know, but here's our best guess". I listened to them beg officials for random testing of the population to improve their parameter estimates. That testing never happened.
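To make the "we don't know, but here's our best guess" problem concrete: push even modest uncertainty in just two inputs through a trivial projection and the output range is enormous. All numbers below are made up for illustration:

```python
# Toy Monte Carlo sketch of parameter uncertainty propagation.
# The rate ranges and the projection formula are illustrative, not any
# real epidemiological model.
import random

random.seed(0)

def projected_deaths(infections, hosp_rate, fatality_rate):
    # Deliberately trivial projection: infections -> hospitalized -> deaths.
    return infections * hosp_rate * fatality_rate

results = []
for _ in range(10_000):
    hosp = random.uniform(0.05, 0.20)  # hypothetical hospitalization rate range
    cfr = random.uniform(0.10, 0.30)   # hypothetical in-hospital fatality range
    results.append(projected_deaths(1_000_000, hosp, cfr))

results.sort()
lo, hi = results[len(results) // 20], results[-len(results) // 20]
print(f"90% of runs fall between {lo:,.0f} and {hi:,.0f} deaths")
```

With only two uncertain inputs the plausible outputs already span several-fold; random population testing is exactly the kind of data that would have narrowed those ranges.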

thu2111 | 5 years ago

> I would bet large sums of money that this code is producing the correct output according to the model/parameter specifications.

I'll take that money off you then.

The code has various memory safety bugs in it and originally had a typo in a random number generator constant. Amongst other problems.

There's really no reason to believe it produces correct outputs, in fact, we know it didn't and probably still doesn't given how it was written.
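One of the cheapest checks critics asked for is a determinism test: run the simulator twice with the same seed and diff the outputs. A sketch of the idea (`run_model` is a stand-in name, not the covid-sim API):

```python
# Sketch of a fixed-seed reproducibility check. run_model is a
# hypothetical stand-in for any stochastic simulator entry point.
import random

def run_model(seed, steps=1000):
    """Stand-in simulator: a stream of draws from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]

# Same seed must give bit-identical output; different seeds should not.
assert run_model(42) == run_model(42), "non-deterministic for a fixed seed"
assert run_model(1) != run_model(2)
print("reproducibility check passed")
```

A model that fails this check can't even be regression-tested against itself, which is why the early non-determinism reports were damning.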

wbhart | 5 years ago

The problem is, unsophisticated models do not predict anything. You apply them in one country and they do ok, and apply them in another and they get it totally and completely wrong.

Unless all important factors are accounted for, they are going to result in incorrect information for someone. Public policy will then be based on incorrect predictions. People will grow tired of the predictions being wrong and they'll give up on data science entirely.

It's already quite bad that people think they can choose their reality by finding numbers that agree with them and ignoring the ones that don't.

I do understand the point you are making, which is like the epicycles argument. But in global warming and epidemics alike, more parameters are actually needed to model reality.

I do agree, though, that those parameters should be based on actual data, not guesses. But what value of R would you pick? Is it actually well-constrained?

datastoat | 5 years ago

I would pick a value of R that shows itself to have good predictive accuracy.

The way to test predictive models is always to look for their predictive accuracy on holdout data. Machine learning has this ingrained. Classic statistics does this too -- AIC is used to compare models, and it's (asymptotically) leave-one-out cross validation [1].

There's nothing intrinsically wrong with models that have millions of parameters: they might overfit, in which case they will have poor predictive accuracy on holdout data, or they might predict well.
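The overfitting point is easy to show with a caricature. Below, a two-parameter line fit competes against a model that simply memorizes its training data (a stand-in for "millions of parameters"); the data is synthetic and the comparison is purely illustrative:

```python
# Holdout-evaluation sketch: simple model vs. a memorizing model.
# All data is synthetic (y = 2x + noise); nothing here is from the paper.
import random

random.seed(1)
data = [(x, 2.0 * x + random.gauss(0, 1.0)) for x in [i / 10 for i in range(100)]]
train, hold = data[::2], data[1::2]  # alternate points: train vs. holdout

# Model A: closed-form least-squares line on the training data.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def line(x):
    return slope * x + intercept

# Model B: memorize the training set; predict the y of the nearest seen x.
def memorize(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, pts):
    return sum((model(x) - y) ** 2 for x, y in pts) / len(pts)

print("in-sample MSE:     line =", mse(line, train), " memorizer =", mse(memorize, train))
print("out-of-sample MSE: line =", mse(line, hold), " memorizer =", mse(memorize, hold))
```

The memorizer is perfect in-sample and worse out-of-sample; only the holdout comparison reveals which model actually predicts.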

I agree with the original article that software engineer scrutiny isn't appropriate for this sort of code -- but I would argue instead that it needs a general-purpose statistician or data scientist or ML expert to evaluate its predictive accuracy. You can't possibly figure this out from a simulator codebase.

At the time the model was published, and acted on by the UK government, there was very little data on which to test predictive accuracy. That's fine -- all it means is that the predictions should have been presented with gigantic confidence intervals.

[1] http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf

CharlesW | 5 years ago

> The problem is, unsophisticated models do not predict anything. You apply them in one country and they do ok, and apply them in another and they get it totally and completely wrong.

That's the nature of all models, "sophisticated" or not. Relatively simple models may or may not be useful for a particular case, just as relatively complex models may be.

tkiley | 5 years ago

"But what value of R would you pick?"

I don't know -- and until we can agree on the answer to your simple question with a high degree of confidence, I think complex models based on specific assumed values of R obscure more than they reveal.

A little bit of modeling is useful because humans are intuitively bad at exponential math and we need scary graphs to jolt us awake sometimes. But when we don't even know the basic parameters (transmission/hospitalization/fatality) with a high degree of precision, complex models with myriad parameters create a false sense of confidence.
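The "bad at exponential math" point takes only a few lines to make. With a 3-day doubling time (a made-up but period-plausible figure), 100 cases today is over 100,000 cases in a month:

```python
# Exponential growth sketch. The 3-day doubling time is an assumed
# illustrative figure, not an estimate for any real outbreak.
cases = 100
doubling_days = 3
for day in range(0, 31, doubling_days):
    print(f"day {day:2d}: {cases:,} cases")
    cases *= 2
```

That back-of-the-envelope arithmetic needs no model at all, which is the argument: the jolt comes from the exponent, not from hundreds of tuned constants.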

jandrewrogers | 5 years ago

I was asked to look at the spatiotemporal parameters and modeling, separate from any code issues. That part of the model is astonishingly naive, apparently oblivious to existing research and science on the matter that strongly recommends a different and much more nuanced approach. Industry has invested inordinate amounts of money in understanding how to build effective real-world predictive models of this type and none of that knowledge is reflected here. That seems like a rather glaring oversight and alone voids any utility as a predictive model.

tripletao | 5 years ago

I partially agree with the comment above, but I also think it misunderstands how numerical models are often used. At least where I've built them (not epidemiology), the goal wasn't necessarily to gather the most accurate set of inputs and produce the most accurate prediction of the output. The goal was often to help a highly skilled operator explore the parameter space and guide their intuition on the problem, to help that person and simulation together reach some decision.

So code quality mattered less than usual. If there's a significant bug, then the operator will probably notice, and if there's an insignificant bug then no one cares. The large number of input parameters also doesn't matter. The operators are fully aware that they could artificially manipulate the output to wherever they wanted, but to do so would be cheating only themselves.

It feels to me like Ferguson's model was built with similar intent, and probably served that purpose well. The problem came only when the media portrayed the model as a source of authority apart from the people operating it, perhaps to create a feeling of objectivity behind the decisions driven from that. That created an expectation of rigor that either didn't exist (in the software engineering), or fundamentally can't exist given our current knowledge of the science (in the input assumptions).

tentboy | 5 years ago

This reminds me of the Drake equation: a sound formula for the probability of extraterrestrial life, but half the parameters are wild guesses that can differ by orders of magnitude.
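The analogy is easy to quantify. Multiply pessimistic versus optimistic guesses for each Drake factor (the ranges below are illustrative, not canonical estimates) and the answers span many orders of magnitude:

```python
# Drake-equation sketch: the spread between guess ranges dominates.
# Factor ranges are illustrative placeholders, not published estimates.
low  = dict(R=1.0, fp=0.2, ne=0.1, fl=0.001, fi=0.001, fc=0.01, L=100)
high = dict(R=3.0, fp=1.0, ne=2.0, fl=1.0,   fi=1.0,   fc=0.2,  L=1e8)

def drake(params):
    """N = product of all factors (star formation rate, fractions, lifetime)."""
    n = 1.0
    for v in params.values():
        n *= v
    return n

print(f"pessimistic: {drake(low):.2e} civilizations")
print(f"optimistic:  {drake(high):.2e} civilizations")
```

Same formula, same structure; the guesses alone swing the answer from "effectively zero" to "millions", which is exactly the wiggle-room worry about parameter-heavy epidemic models.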

asddsfgdfsh | 5 years ago

The flip side to having lots of parameters is that you have lots of knobs to tune beyond a basic lockdown.