Monday, 10 February 2020

Estimating a More Realistic Fatality Rate of the Coronavirus, from Publicly Available Data



Since the outbreak of the Novel Coronavirus (2019-nCoV), numerous statistics concerning the nature and effects of the virus have been widely disseminated by governments and published in the press.  One of the key numbers, obviously of great interest to everybody, is the fatality rate - more on that below, after some initial remarks about the progress of the disease.

Here are a couple of graphs showing the progress of the disease, in numbers of cases and fatalities.  The first shows the numbers with conventional axes, cases on the right axis and fatalities on the left axis.  As you can see, the lines follow the same general functional shape, to a considerable level of agreement.

The functions also appear to be increasing at more than a linear rate, likely some form of exponential function, as might be expected during the early phase of an epidemic.  Of course, no function in the real world can remain exponential for too long – in the case of a virus, it will eventually run out of hosts as it grows, with the hosts either developing immunity or succumbing to the disease.  Eventually, the function turns downward, looking more like a quadratic, but we seem to be far from that at the present moment.



The second shows the same data, but with a logarithmically scaled axis.  Best fit lines are also shown, as well as the accompanying function form and R-square.  The latter is a measure of how well the equation fits the data, with a value of 1 indicating a perfect fit.  The fatality function appears to be a better fit to an exponential than the total cases function, both visually (on the logarithmic plot, it approximates a straight line) and in terms of its R-square, which is high at 0.96.  The total cases curve is visually a less impressive fit, and its R-square is lower at 0.90.
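For readers who want to try this kind of fit themselves, here is a minimal sketch in Python.  It assumes the usual spreadsheet approach to an exponential trendline: an ordinary least-squares line fitted to the natural log of the counts, with R-square measured in log space.  The function name `fit_exponential` is my own, and the data are the cumulative counts from the table later in this post, which starts on 23 January.  The graphs may cover a different date range, so the R-square values it prints need not match the 0.96 and 0.90 quoted above.

```python
import math

# Cumulative counts copied from the table in this post (23 Jan to 8 Feb 2020).
deaths = [25, 41, 56, 80, 106, 132, 170, 213, 259, 304,
          362, 426, 492, 565, 638, 724, 813]
cases = [265, 733, 1436, 2222, 4000, 5482, 7237, 9242, 11369, 13972,
         16808, 20047, 23974, 27697, 30860, 34297, 36973]


def fit_exponential(counts):
    """Fit count = a * exp(b * day) by least squares on ln(count).

    Returns (a, b, r_squared), where r_squared is measured in log space.
    """
    n = len(counts)
    days = range(n)  # day 0 is the first entry in the list
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    syy = sum((y - mean_y) ** 2 for y in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return math.exp(intercept), slope, r_squared


for name, series in (("deaths", deaths), ("cases", cases)):
    a, b, r2 = fit_exponential(series)
    print(f"{name}: count ~ {a:.1f} * exp({b:.3f} * day), R-square = {r2:.3f}")
```

A straight line on the logarithmic plot corresponds to a constant slope `b` here, so a departure from the line in the first few periods pulls the R-square down.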

Note that the first few periods depart from the exponential form, which is likely due to measurement uncertainties in the early days of reporting.  With relatively low counts, these uncertainties create a lot of noise in the data, though that tends to settle down as the Ns go up.  Though I didn’t put in piecewise smooth lines, you can see by eye that the functions “straighten out” to a considerable degree after periods 4 to 6.

 

The fact that the total cases line has a lower R-square than the total deaths line seems to be explained by the different degrees of measurement error possible in total cases and total deaths.  The former has more room for error – for example, some cases may be asymptomatic or very mild, and therefore might never be reported and thus not included in statistical totals.  Deaths, on the other hand, are hard to miss, and though the cause of death may still sometimes be misreported, deaths still demand an explanation, so are more likely to be accurately identified and reported.

The same is true of the tendency of authorities to play down numbers to avoid a panic – it is easier to hide mild cases than deaths, so the number of cases is less likely to be accurately captured than the number of deaths (not that I am accusing anybody of doing that intentionally).

That brings us to the matter of how the fatality rate is calculated and reported.  It has generally been stated to be around 2 to 3 percent, with the value settling into a recent trend of about 2.1 percent, in the statistics that I have read.

The fatality rate seems to be simply calculated as:

 (deaths up to time T)/(cases up to time T)

The actual numbers are given in the table below and in the graph.  As you can see the fatality rate is much higher at the start of the time series, then settles down to a fairly steady 2.0 to 2.5 percent rate.

Date         Deaths    Cases    Fatality Rate
23-Jan-20        25      265             9.4%
24-Jan-20        41      733             5.6%
25-Jan-20        56     1436             3.9%
26-Jan-20        80     2222             3.6%
27-Jan-20       106     4000             2.7%
28-Jan-20       132     5482             2.4%
29-Jan-20       170     7237             2.3%
30-Jan-20       213     9242             2.3%
31-Jan-20       259    11369             2.3%
1-Feb-20        304    13972             2.2%
2-Feb-20        362    16808             2.2%
3-Feb-20        426    20047             2.1%
4-Feb-20        492    23974             2.1%
5-Feb-20        565    27697             2.0%
6-Feb-20        638    30860             2.1%
7-Feb-20        724    34297             2.1%
8-Feb-20        813    36973             2.2%



However, this manner of calculating the rate is misleading when an epidemic is growing or shrinking.  When the epidemic is growing, it will tend to under-report the real fatality rate, and once the corner is turned on the epidemic, it will tend to over-report the rate.

The reason is that deaths are a lagged variable in the time series, compared to cases.  There is a time lag from being exposed to the virus and being infected, to coming down with the illness, to ultimately dying from it.  So, the death rate shouldn’t be calculated as current deaths divided by current cases, but as current deaths divided by cases at some earlier time.
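To see why the lag matters during the growth phase, here is a rough worked example.  The assumptions are mine, chosen to keep the algebra simple, and are not taken from the data: cumulative cases grow exactly exponentially, C(t) = C_0 e^{rt} with r > 0; every death occurs exactly τ days after the case is counted; and the true fatality rate is a constant f.

```latex
\[
D(t) = f\,C(t-\tau) = f\,C_0\,e^{r(t-\tau)}
\quad\Longrightarrow\quad
\frac{D(t)}{C(t)} = f\,e^{-r\tau} < f ,
\qquad
\frac{D(t)}{C(t-\tau)} = f .
\]
```

Under these idealized assumptions, the unlagged ratio understates the true rate by the factor e^{-rτ}, which gets worse the faster the epidemic grows and the longer the lag, while dividing by the cases from τ days earlier recovers f.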

How long the time lag should be for the equation will be a function of the average time between infection and death.  That can only be calculated by following a cohort of patients in the early stages of the disease and noting the time to death.  One study points to a lag on the order of 7 days, though it measures the time from first symptom to hospital admission rather than to death, and it would take a lot more data to be really sure of this number:

“A study on 138 hospitalized patients published on February 7 on JAMA, found that the median time from first symptom to dyspnea was 5.0 days, to hospital admission was 7.0 days, and to ARDS was 8.0 days.”


Using this data gives the results shown in the graphs below.  The first graph divides the number of new deaths reported at time T by the number of new cases at time T-Lag, for various time lags in days.  The case for a lag of 7 days is highlighted; it gives a fatality rate of about 6.3%.



The second graph calculates the rate as the total number of deaths at time T divided by the total number of cases at time T-Lag, in days.  It gives a slightly lower fatality rate at the 7 day lag, of about 5.6%.  I should note that I clipped out the first few data points from this calculation, as the early counts had low Ns for both cases and fatalities, which yield unrealistic numbers.  By waiting a bit, the numbers settle down, as the Ns increase.  It is interesting that this method of calculating the fatality rate gives a more stable number than the first.  Nonetheless, it is notable that both of these simple calculations yield about the same fatality rate for a 7 day lag, about 6 percent.
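The two lagged calculations can be sketched in a few lines of Python.  This is only an illustration: the function names are my own, the data are the cumulative counts from the table in this post, and new deaths and new cases are taken as simple day-to-day differences of those totals.  It is not meant to reproduce the graphs or the exact percentages quoted above, which depend on the dates included and on the source data used.

```python
# Cumulative counts copied from the table in this post (23 Jan to 8 Feb 2020).
deaths = [25, 41, 56, 80, 106, 132, 170, 213, 259, 304,
          362, 426, 492, 565, 638, 724, 813]
cases = [265, 733, 1436, 2222, 4000, 5482, 7237, 9242, 11369, 13972,
         16808, 20047, 23974, 27697, 30860, 34297, 36973]


def lagged_cumulative_rates(deaths, cases, lag):
    """Total deaths at day T divided by total cases at day T - lag."""
    return [deaths[t] / cases[t - lag] for t in range(lag, len(deaths))]


def lagged_daily_rates(deaths, cases, lag):
    """New deaths on day T divided by new cases on day T - lag."""
    new_deaths = [b - a for a, b in zip(deaths, deaths[1:])]
    new_cases = [b - a for a, b in zip(cases, cases[1:])]
    return [new_deaths[t] / new_cases[t - lag]
            for t in range(lag, len(new_deaths))]


# A lag of 0 gives the conventional (unlagged) fatality rate.
for lag in (0, 3, 7):
    cumulative = lagged_cumulative_rates(deaths, cases, lag)
    daily = lagged_daily_rates(deaths, cases, lag)
    print(f"lag {lag}: latest cumulative rate {cumulative[-1]:.1%}, "
          f"latest daily rate {daily[-1]:.1%}")
```

Each function returns one rate per day for which a lagged denominator exists, so the first `lag` days drop out automatically.  The daily version divides one small difference by another, which is one reason it tends to be noisier than the cumulative version.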



Obviously, there is still a lot of uncertainty about the data and the epidemic is still in its early stages, at least as far as we know.  But these estimates of fatality rates seem likely to be more accurate than what we are usually seeing, as things develop.  They also align better with some earlier similar epidemics, such as SARS, which had a fatality rate of about 9%.

There is a certain symmetry in the situation – just as the currently used measure underestimates the danger of the virus, it will overestimate the danger at the time when the danger is actually going down.  At that time, authorities will either have to restate the calculation in more favorable terms (losing public trust by presenting mixed messages about how best to measure the fatality rate) or attempt to communicate the fact that things are actually getting better when the faulty measure says that they are getting worse (which also will create a public trust problem).

Let’s hope that the situation remains manageable and that people “remain calm and carry on”.  Personally, I think China has reacted quite responsibly to the outbreak.  I am not sure if western governments could have taken the similarly strong measures that seem necessary.  Various economic and privacy rights concerns would have made that extremely difficult.  However, there may be tests to come, if the epidemic progresses unfavorably.



And, here’s a more pleasant travel story than anticipating the worldwide journey of a virus.

On the Road with Bronco Billy

What follows is an account of a ten day journey through western North America during a working trip, delivering lumber from Edmonton, Alberta to Dallas, Texas, and returning with oilfield equipment. The writer had the opportunity to accompany a friend who is a professional truck driver, which he eagerly accepted. He works as a statistician for the University of Alberta, and is therefore generally confined to desk, chair, and computer. The chance to see the world from the cab of a truck, and be immersed in the truck driving culture was intriguing. In early May 1997 they hit the road.

Some time has passed since this journal was written and many things have changed since the late 1990s. That makes the journey not just a geographical one, but also a historical account, which I think only increases its interest.

We were fortunate to have an eventful trip - a mechanical breakdown, a near miss from a tornado, and a large-scale flood were among these events. But even without these turns of fate, the drama of the landscape, the close-up view of the trucking lifestyle, and the opportunity to observe the cultural habits of a wide swath of western North America would have been sufficient to fill up an interesting journal.

The travelogue is about 20,000 words, about 60 to 90 minutes of reading, at typical reading speeds.



