Estimating a More Realistic Fatality Rate of the Coronavirus, from Publicly Available Data
Since the outbreak of the Novel Coronavirus (2019-nCoV),
numerous statistics concerning the nature and effects of the
virus have been widely disseminated by governments and published in the press.
One of the key numbers, of obvious interest to everybody, is the fatality rate. More on that below, after some initial remarks about the progress of the disease.
Here are a couple of graphs showing the progress of the
disease, in numbers of cases and fatalities.
The first shows the numbers with conventional axes, cases on the right
axis and fatalities on the left axis. As
you can see, the lines follow the same general functional shape, to a
considerable level of agreement.
The functions also appear to be increasing at more than a linear
rate, likely some form of an exponential function, as might be expected during
the early phase of an epidemic. Of
course no function in the real world can remain exponential for too long: in the
case of a virus it will eventually run out of hosts as it grows, with the hosts
developing immunity or succumbing to the disease. Eventually, the function turns downward,
looking more like a quadratic, but we seem to be far from that at the present
moment.
The second shows the same data, but with a logarithmically
scaled axis. Best fit lines are also
shown, as well as the accompanying function form and R-square. The latter is a measure of how well the
equation fits the data, with a value of 1 indicating a perfect fit. The fatality function appears to be a better
fit to an exponential than the total cases function, both visually (on the
logarithmic plot, it approximates a straight line) and in terms of its
R-square, which is high at 0.96. The
total cases curve is visually a less impressive fit, and its R-square is lower
at 0.90.
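For readers who want to try this kind of fit themselves, here is a minimal Python sketch. It assumes NumPy is available and uses the cumulative totals from the table further down in this post, which may not cover exactly the same periods as the graph, so the coefficients and R-square values it prints need not match the ones quoted above. The function name `fit_exponential` is just an illustrative choice. The fit is done by ordinary least squares on the logarithm of the counts, which is what a straight line on a logarithmic plot amounts to.

```python
import numpy as np

# Cumulative totals for 23-Jan-20 through 8-Feb-20, copied from the table in this post.
deaths = np.array([25, 41, 56, 80, 106, 132, 170, 213, 259, 304,
                   362, 426, 492, 565, 638, 724, 813], dtype=float)
cases = np.array([265, 733, 1436, 2222, 4000, 5482, 7237, 9242, 11369, 13972,
                  16808, 20047, 23974, 27697, 30860, 34297, 36973], dtype=float)


def fit_exponential(y):
    """Fit y = a * exp(b * t) by least squares on log(y).

    Returns (a, b, r_squared), where r_squared is measured on the log scale.
    """
    t = np.arange(len(y), dtype=float)       # period index 0, 1, 2, ...
    log_y = np.log(y)
    b, log_a = np.polyfit(t, log_y, 1)       # slope, intercept of the straight line
    predicted = log_a + b * t
    ss_res = np.sum((log_y - predicted) ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return float(np.exp(log_a)), float(b), float(1.0 - ss_res / ss_tot)


for name, series in (("deaths", deaths), ("cases", cases)):
    a, b, r2 = fit_exponential(series)
    print(f"{name}: y = {a:.1f} * exp({b:.3f} * t), R-square = {r2:.3f}")
```

Dropping the first few periods before fitting (for example `fit_exponential(deaths[4:])`) is one way to see how much the noisy early counts affect the fit.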
Note that the first few periods depart from the exponential
form, which is likely due to measurement uncertainties in the early days of
reporting. With relatively low counts,
these uncertainties create a lot of noise in the data, though that tends to
settle down as the Ns go up. Though I
didn’t put in piecewise smooth lines, you can see by eye that the functions “straighten
out” to a considerable degree after periods 4 to 6.
The fact that the total cases line has a lower R-square
than the total deaths line seems to be explainable by the different degrees of measurement
error possible between total cases and total deaths. The former has more room for error – for
example, some cases may be asymptomatic or very mild, and therefore might never
be reported and thus not included in statistical totals. Deaths, on the other hand, are hard to miss,
and though the cause of death may still sometimes be misreported, deaths still
demand an explanation, so are more likely to be accurately identified and
reported.
The same is true of the tendency of authorities to play down
numbers to avoid a panic – it is easier to hide mild cases than deaths, so the
number of cases is less likely to be accurately captured than the number of
deaths (not that I am accusing anybody of doing that intentionally).
That brings us to the matter of how the fatality rate is
calculated and reported. It has
generally been stated to be around 2 to 3 percent, with the value settling into
a recent trend of about 2.1 per cent, in the statistics that I have read.
The fatality rate seems to be simply calculated as:
(deaths up to time T)/(cases up to time T)
The actual numbers are given in the table below and in the
graph. As you can see the fatality rate
is much higher at the start of the time series, then settles down to a fairly
steady 2.0 to 2.5 percent rate.
Date | Deaths | Cases | Fatality Rate
23-Jan-20 | 25 | 265 | 9.4%
24-Jan-20 | 41 | 733 | 5.6%
25-Jan-20 | 56 | 1436 | 3.9%
26-Jan-20 | 80 | 2222 | 3.6%
27-Jan-20 | 106 | 4000 | 2.7%
28-Jan-20 | 132 | 5482 | 2.4%
29-Jan-20 | 170 | 7237 | 2.3%
30-Jan-20 | 213 | 9242 | 2.3%
31-Jan-20 | 259 | 11369 | 2.3%
1-Feb-20 | 304 | 13972 | 2.2%
2-Feb-20 | 362 | 16808 | 2.2%
3-Feb-20 | 426 | 20047 | 2.1%
4-Feb-20 | 492 | 23974 | 2.1%
5-Feb-20 | 565 | 27697 | 2.0%
6-Feb-20 | 638 | 30860 | 2.1%
7-Feb-20 | 724 | 34297 | 2.1%
8-Feb-20 | 813 | 36973 | 2.2%
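The simple calculation behind the last column can be sketched in a few lines of Python. The figures are copied from the table above; the function name `naive_fatality_rate` is only an illustrative label for the "deaths up to time T over cases up to time T" ratio.

```python
# (date, cumulative deaths, cumulative cases), copied from the table above.
rows = [
    ("23-Jan-20", 25, 265), ("24-Jan-20", 41, 733), ("25-Jan-20", 56, 1436),
    ("26-Jan-20", 80, 2222), ("27-Jan-20", 106, 4000), ("28-Jan-20", 132, 5482),
    ("29-Jan-20", 170, 7237), ("30-Jan-20", 213, 9242), ("31-Jan-20", 259, 11369),
    ("1-Feb-20", 304, 13972), ("2-Feb-20", 362, 16808), ("3-Feb-20", 426, 20047),
    ("4-Feb-20", 492, 23974), ("5-Feb-20", 565, 27697), ("6-Feb-20", 638, 30860),
    ("7-Feb-20", 724, 34297), ("8-Feb-20", 813, 36973),
]


def naive_fatality_rate(deaths, cases):
    """Deaths up to time T divided by cases up to time T, as a percentage."""
    return 100.0 * deaths / cases


for date, deaths, cases in rows:
    print(f"{date:>9}  {naive_fatality_rate(deaths, cases):.1f}%")
```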
However, this manner of calculating the rate is misleading
when an epidemic is growing or
shrinking. When the epidemic is growing,
it will tend to under-report the real fatality rate, and once the corner is
turned on the epidemic, it will tend to over-report the rate.
The reason is that deaths are a lagged variable in the time
series, compared to cases. There is a
time lag from being exposed to the virus and being infected, to coming down
with the illness, to ultimately dying from it.
So, the death rate shouldn’t be calculated as current deaths divided
by current cases, but as current deaths divided by cases at some earlier time.
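A small synthetic example may help make the lag effect concrete. The Python sketch below does not use real data: the true fatality rate, the 7 day lag, the daily growth rate and the starting count are all assumed values chosen for illustration, and every death is assumed to occur exactly `LAG_DAYS` after its case is reported. Under those assumptions, dividing by same-day cases understates the true rate while cases are growing, and dividing by cases from `LAG_DAYS` earlier recovers it.

```python
import math

TRUE_RATE = 0.06   # assumed true fatality rate (illustrative only)
LAG_DAYS = 7       # assumed fixed delay from case report to death
GROWTH = 0.20      # assumed daily exponential growth rate of new cases
DAYS = 40          # length of the simulated series

# New cases grow exponentially from an arbitrary starting level.
new_cases = [100.0 * math.exp(GROWTH * t) for t in range(DAYS)]

# A fixed share of each day's cases dies exactly LAG_DAYS later.
new_deaths = [TRUE_RATE * new_cases[t - LAG_DAYS] if t >= LAG_DAYS else 0.0
              for t in range(DAYS)]


def cumulative(values):
    """Running totals of a list of daily counts."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


cum_cases = cumulative(new_cases)
cum_deaths = cumulative(new_deaths)

t = DAYS - 1
naive = cum_deaths[t] / cum_cases[t]              # same-day ratio
lagged = cum_deaths[t] / cum_cases[t - LAG_DAYS]  # ratio against earlier cases

print(f"true rate:   {100 * TRUE_RATE:.2f}%")
print(f"naive rate:  {100 * naive:.2f}%")
print(f"lagged rate: {100 * lagged:.2f}%")
```

In this idealized setting the naive ratio is smaller than the true rate by a factor of roughly exp(-GROWTH * LAG_DAYS), so the faster the growth and the longer the lag, the larger the understatement.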
How long the time lag should be for the equation will be a
function of the average time between infection and death. That can only be calculated by following a cohort
of patients in the early stages of the disease and noting the time to death. One study points to a lag of about 7 days
(strictly, its figures are for the time from first symptom to hospital admission and to ARDS, not to death),
though it would take a lot more data to be really sure of this number:
“A study on 138 hospitalized
patients published on February 7 on JAMA, found that the median time from first
symptom to dyspnea was 5.0 days, to hospital admission was 7.0 days, and to
ARDS was 8.0 days.”
Using this data gives the results of the graphs below. The first graph divides the number of new deaths
reported at time T by the number of new cases at various time lags, in days. The case for a lag of 7 days is highlighted; it
gives a fatality rate of about 6.3%.
The second graph calculates the rate as the total number of
deaths at time T divided by the total number of cases at time T-Lag, in days. It gives a slightly lower fatality rate at
the 7 day lag, of about 5.6%. I should
note that I clipped out the first few data points from this calculation, as the
early counts had low Ns for both cases and fatalities, which yield unrealistic
numbers. By waiting a bit, the numbers
settle down, as the Ns increase.
It is interesting that this method of calculating the fatality rate gives a more stable number than the first. Nonetheless, it is notable that both of these simple calculations yield
about the same fatality rate for a 7 day lag, about 6 percent.
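The two lagged calculations can be sketched in Python as follows, using the cumulative totals from the table earlier in this post. This is only a rough reconstruction: the function names are illustrative, and since the graphs clip early points and may summarize the dates differently, the per-day values printed here are not expected to reproduce the 6.3% and 5.6% figures exactly.

```python
LAG = 7  # days; the lag suggested by the study quoted above

# Cumulative totals for 23-Jan-20 through 8-Feb-20, copied from the table in this post.
deaths = [25, 41, 56, 80, 106, 132, 170, 213, 259, 304,
          362, 426, 492, 565, 638, 724, 813]
cases = [265, 733, 1436, 2222, 4000, 5482, 7237, 9242, 11369, 13972,
         16808, 20047, 23974, 27697, 30860, 34297, 36973]


def lagged_cumulative_rate(deaths, cases, lag):
    """Total deaths at day t divided by total cases at day t - lag, in percent."""
    return [100.0 * deaths[t] / cases[t - lag] for t in range(lag, len(deaths))]


def daily_new(cumulative):
    """Day-over-day increases of a cumulative series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def lagged_new_rate(deaths, cases, lag):
    """New deaths at day t divided by new cases at day t - lag, in percent."""
    new_d, new_c = daily_new(deaths), daily_new(cases)
    return [100.0 * new_d[t] / new_c[t - lag] for t in range(lag, len(new_d))]


cumulative_rates = lagged_cumulative_rate(deaths, cases, LAG)
new_rates = lagged_new_rate(deaths, cases, LAG)

print("cumulative, lag 7:", [f"{r:.1f}%" for r in cumulative_rates])
print("new counts, lag 7:", [f"{r:.1f}%" for r in new_rates])
```

The earliest values in each list divide by very small case counts, which is why the first few data points give unrealistic rates and are best left out.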
Obviously, there is still a lot of uncertainty about the
data and the epidemic is still in its early stages, at least as far as we know. But these estimates of fatality rates seem
likely to be more accurate than what we are usually seeing, as things
develop. They also align better with
some earlier similar epidemics, such as SARS, which had a fatality rate of about
9%.
There is a certain symmetry in the situation – just as the
currently used measure underestimates the danger of the virus, it will
overestimate the danger at the time when the danger is actually going down. At that time, authorities will either have to
restate the calculation in more favorable terms (losing public trust by
presenting mixed messages about how best to measure the fatality rate) or attempt
to communicate the fact that things are actually getting better when the faulty
measure says that they are getting worse (which also will create a public trust
problem).
Let’s hope that the situation
remains manageable and that people “remain calm and carry on”. Personally, I think China has reacted
quite responsibly to the outbreak. I am
not sure if western governments could have taken the similarly strong measures
that seem necessary. Various economic
and privacy rights concerns would have made that extremely difficult. However, there may be tests to come, if the
epidemic progresses unfavorably.
And, here’s a more pleasant travel story than anticipating the
worldwide journey of a virus.
On the Road with Bronco Billy
What follows is an
account of a ten day journey through western North America during a working
trip, delivering lumber from Edmonton Alberta to Dallas Texas, and returning
with oilfield equipment. The writer had the opportunity to accompany a friend
who is a professional truck driver, which he eagerly accepted. He works as a
statistician for the University of Alberta, and is therefore generally
confined to desk, chair, and computer. The chance to see the world from the cab
of a truck, and be immersed in the truck driving culture was intriguing. In
early May 1997 they hit the road.
Some time has passed
since this journal was written and many things have changed since the late
1990s. That makes the journey not just a geographical account, but also a
historical one, which I think only increases its interest.
We were fortunate to have an eventful trip - a mechanical breakdown, a near miss from a tornado, and a large-scale flood were among these events. But even without these turns of fate, the drama of the landscape, the close-up view of the trucking lifestyle, and the opportunity to observe the cultural habits of a wide swath of western North America would have been sufficient to fill up an interesting journal.
The travelogue is about 20,000 words, about 60 to 90 minutes of reading, at typical reading speeds.
Amazon U.S.: http://www.amazon.com/gp/product/B00X2IRHSK
Amazon U.K.: http://www.amazon.co.uk/gp/product/B00X2IRHSK
Amazon Germany: http://www.amazon.de/gp/product/B00X2IRHSK
Amazon France: https://www.amazon.fr/dp/B00X2IRHSK
Amazon Spain: https://www.amazon.es/dp/B00X2IRHSK
Amazon Italy: https://www.amazon.it/dp/B00X2IRHSK
Amazon Netherlands: https://www.amazon.nl/dp/B00X2IRHSK
Amazon Japan: https://www.amazon.co.jp/dp/B00X2IRHSK
Amazon Brazil: https://www.amazon.com.br/dp/B00X2IRHSK
Amazon Canada: http://www.amazon.ca/gp/product/B00X2IRHSK
Amazon Mexico: https://www.amazon.com.mx/dp/B00X2IRHSK
Amazon Australia: https://www.amazon.com.au/dp/B00X2IRHSK
Amazon India: https://www.amazon.in/dp/B00X2IRHSK