It has been obvious from the coverage of the coronavirus pandemic that most people simply do not understand the nature of models.
This is not surprising. Models are messy things, and few people have spent much time thinking about how they work.
There are two parts to a model. First, there is the structure of the model, which is a bunch of mathematical equations that show how one thing is connected to another. The number of equations and their mathematical complexity can both be quite high, but fundamentally the structure of a model is just like those two-equation, two-unknown problems you learned to solve in your algebra class. Second, there are the data you feed into the model. If you have bad data, then the model will not be useful, no matter how carefully it is constructed. Again, your algebra class gives an example: if you know that Y is twice as large as X, but you don't know the size of X, then the model is not terribly useful in telling you the size of Y.
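To make the two ingredients concrete, here is a minimal sketch in Python. It is purely illustrative, not anything from an actual epidemiological model; the function and variable names are invented:

```python
# Part 1, the structure: an asserted relationship between quantities.
def predict_y(x):
    """Model structure: Y is twice as large as X."""
    return 2 * x

# Part 2, the data: a measured value for X.
x_observed = None  # here, we never measured X

if x_observed is None:
    # Structure without data: the equations alone cannot produce a prediction.
    print("The structure alone cannot tell us Y; we have no data on X.")
else:
    print(f"Y = {predict_y(x_observed)}")
```

The point of the sketch is that the two parts fail independently: a correct structure with missing or bad data predicts nothing useful, and good data run through a wrong structure does no better.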
The epidemiological models that are garnering so much attention in the coronavirus pandemic are not fundamentally different from economic models or weather forecasting models. They have a mathematical structure, and they use existing data to make predictions. Different epidemiological models give different results because they are built differently. Not surprisingly, modelers generally think their particular model is the best. Unfortunately, as of this date, the details of many of the models that are informing policy decisions have not been released to the public, so there is no way for others to evaluate the structures of the models themselves.
Perhaps even more importantly, the data that are being used are woefully incomplete. Consider the fatality rate numbers. To know the fatality rate, you have to know both the number of deaths and the number of infections. We have decent, but not perfect, data on the number of deaths. We do not have anywhere near enough data to know the number of infections. Without widespread, random samples of the population, there is no way to know that number. You can get the same number of deaths with a high number of infections and a low fatality rate, or with a low number of infections and a high fatality rate. Which of those scenarios is accurate? The answer has enormous implications for how easily the disease spreads in daily interaction.
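The arithmetic behind that ambiguity is simple enough to show directly. In this sketch the numbers are invented for illustration, not real case data:

```python
# Fatality rate = deaths / infections.
# The same death count is consistent with wildly different fatality rates,
# depending on the unknown number of infections.
deaths = 1_000  # observed deaths (reasonably well measured)

scenarios = {
    "many infections, low fatality rate": 1_000_000,
    "few infections, high fatality rate": 10_000,
}

for label, infections in scenarios.items():
    fatality_rate = deaths / infections
    print(f"{label}: {infections:,} infections -> "
          f"fatality rate = {fatality_rate:.2%}")
```

Both scenarios reproduce the observed 1,000 deaths exactly, yet one implies a fatality rate of 0.10% and the other 10.00%. Only data on infections, not deaths, can tell them apart.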
A model that uses messy data will not give a precise answer. Instead, it will give a range of potential outcomes, and that range can be quite large. There is an instinctive reaction in the press to latch onto the most alarmist numbers. We saw this with the Imperial College model, which has been the most influential model in this crisis. Headlines blared that the model predicted that, if nothing were done, there would be half a million deaths in the UK and over two million deaths in the US. A short time later, there was a follow-up announcement that the same model was predicting the number of deaths in the UK would be around 20,000. Much hand-wringing ensued. The real problem here is not that a model gave a different answer as we acquired better data and people's behavior changed. The real problem is the media's sensationalism about the upper end of predictions from a model built over thirteen years ago to predict flu pandemics.
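Why do messy inputs produce a range rather than a number? A toy simulation makes the mechanism visible. Everything here is invented for illustration; it is not the Imperial College model or any real forecast:

```python
# Propagate uncertain inputs through a trivial "model" (deaths = infections
# multiplied by fatality rate) and look at the spread of the outputs.
import random

random.seed(0)
simulated_deaths = []
for _ in range(10_000):
    infections = random.uniform(10_000, 1_000_000)   # wide uncertainty in infections
    fatality_rate = random.uniform(0.001, 0.02)      # wide uncertainty in fatality rate
    simulated_deaths.append(infections * fatality_rate)

simulated_deaths.sort()
low = simulated_deaths[500]     # roughly the 5th percentile
high = simulated_deaths[9500]   # roughly the 95th percentile
print(f"90% of simulated outcomes fall between {low:,.0f} and {high:,.0f} deaths")
```

Tighten either input range and the output range narrows accordingly. That is exactly what better data and changed behavior did to the Imperial College projections: the headline-grabbing upper bound was one end of a wide range, not the model's single answer.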