Friday, 28 February 2020

(Update, Feb 28, 2020) Estimating the Corona Virus (Covid-19) Transmission Rate, from the Diamond Princess Data



Update: Feb 28, 2020

Almost all of the passengers of the Diamond Princess have now been evacuated from the ship, though there are still crew on board, waiting out a further quarantine.  However, some of the public servants who helped with the quarantine have now become sick, so there are still well-founded concerns about the efficacy of the quarantine.  Be that as it may, it seems like the number of reported cases has settled down to 705, out of the 3711 people originally on board (19% known infection rate).


Using these updated numbers, I have redone the graphs using different functional forms for the progress of the disease on the ship.  Above is the updated quadratic form (R-square=0.943 vs an earlier value of 0.964), which still has a very good fit to the data.



The power law form has an almost equal fit to the earlier estimate (R-square=0.988 vs 0.988), both being very high.  I would note, though, that the exponent here is very close to 2 (i.e., the quadratic model).



The exponential model’s fit is reduced from the previous case (R-square=0.802 vs the earlier fit of 0.856).  It is worth noting, though, that the date for the 705 count, which I have put as the 26th day of the outbreak, is not exact; that is just the first date on which I could find the updated value.  If the actual date was earlier, the exponential fit would be that much better (by visual inspection, you can see that moving the final point to the left would considerably improve the fit of the model).


Conversely, the fit of the linear model is actually improved with the new data (R-square=0.901 vs the earlier fit of 0.855).  However, it would also be quite sensitive to a more accurate date for the report of the 705 cases.


There have been some additional deaths linked to the Diamond Princess outbreak, now totaling 6.  Depending on the lag time between being diagnosed and death, that gives a fatality rate of between about 2% (9 day lag) and 5% (17 day lag).

Original Blog (Feb 22, 2020)

Since the outbreak of the Novel Coronavirus 2019-nCoV (now called Covid-19), there have been numerous questions concerning the nature and effects of the virus.  One of the key questions, that is obviously of great interest to everybody, is just how fast it can spread.

The cruise ship, Diamond Princess, presents us with a “natural experiment”, that can provide some clues.  We have good data on cases per day, which should be quite reliable, as the crew and passengers were under constant scrutiny, as to whether they were showing symptoms, and were being tested quite soon thereafter.  Given the attention of the world on the situation, it seems probable that the data would have been accurately reported.

It is worth noting that the Diamond Princess case is different from the virus in the wild, so to speak, for a number of reasons:

  • Great efforts were being made to prevent the spread of the virus within the ship, as it remained in harbour, with strict rules about access to the ship, travel within the ship and contact between people on board (passengers and crew).  That is, after all, what is meant by quarantine.  Granted, there are questions about just how effective those efforts were, but that should have reduced risk of transmission, at least in theory.
  • Conversely, though, keeping all of these people in close quarters, with a virulent virus on board, presents a greater than average risk of any particular person in this limited population coming into contact with the virus.  Even though great efforts were presumably being made to enforce the quarantine, the situation was quite favourable, from the virus’s point of view.  As has been said, it was a sort of giant petri dish.

Given those facts, it is hard to say how generalizable the data on spreading is to other conditions, such as an urban environment.  Nonetheless, it is worth examining and learning what lessons we can, keeping in mind these caveats.  

Here are some graphs showing the progress of the disease, in numbers of cases, while the ship was in quarantine.  The graphs show the numbers of cases reported, a best-fit line that gives an idea of the underlying mathematical function, the statistical properties of that best-fit function (equation and R-square) and a projection of future cases predicted by the function, had the situation remained unchanged over the next couple of weeks.  Note that this data is publicly available, including on the ship’s website.

I will present these graphs in ascending order of their R-square.  This is a statistical measure that gives an idea of how closely the data fits the best-fit function; the higher the R-square, the better the data fits the functional form.  An R-square of 1 is a perfect fit (i.e. no error term between the actual data and what would be predicted by the functional form).  Any R-square close to 1 is a good fit, though just how good is a bit of a judgement call.
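As a rough illustration of what the R-square measures, here is a minimal sketch of the usual formula (one minus the ratio of the squared errors around the fitted values to the squared deviations around the mean).  The function name and the numbers are made up for the example; they are not the ship's case counts, and the tool used to draw the graphs may compute the statistic somewhat differently for the non-linear forms.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    # Squared error between the data and the fitted values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared deviation of the data around its own mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not the Diamond Princess data).
observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]   # fitted values that match the data exactly
rough = [1.5, 1.5, 3.5, 3.5]     # fitted values that are each off by 0.5

print(r_squared(observed, perfect))  # a perfect fit gives 1.0
print(r_squared(observed, rough))    # 1 - 1.0/5.0 = 0.8
```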



Case 1 – Linear Relationship

This is the best case scenario, where the spread of the virus is slowest.  The graph shows a relatively slow but steady increase in cases.  In this scenario, the number of cases would remain below 1000 until two more weeks had passed.

Though the actual data (the blue points) lie relatively close to the line, the fit doesn’t look that great.  The R-square is fairly high at 0.855, though.

 

This relationship seems fairly unlikely on theoretical grounds.  It implies that the same number of new people become infected every day.  That could happen, but the rate of transmission from person to person would have to be quite low.  However, if the amount of time an infected person was infectious to others was rather short and contact was limited, something like this might prevail.  Basically, although the number of people who had been infected would grow, the number that were actually infectious at any point in time, and therefore could spread the disease, would remain stable.
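To make the "same number of new people every day" point concrete, here is a small sketch of an ordinary least-squares straight-line fit.  The helper name and the case counts are invented for the illustration (they are not the ship's figures); the point is only that the fitted slope is the constant number of new cases per day.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic cumulative case counts (assumption, for illustration only):
# exactly 10 new cases each day.
days = [1, 2, 3, 4, 5]
cases = [12, 22, 32, 42, 52]

slope, intercept = fit_line(days, cases)
print(slope)      # new cases per day implied by the linear model
print(intercept)  # fitted value at day 0
```

Under a linear model, the slope is the whole story: the projected count on any future day is just the intercept plus the slope times the day number.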

Case 2 – Exponential Relationship

This is probably the worst case scenario, where the spread of the virus is the most rapid.  The best fit line on the graph shows a rapid rise in cases, with the numbers exploding to infinity, as it is often said.  In fact, based on this functional relationship, everyone on board the ship would be expected to fall ill before another week was out (the ship’s complement of passengers and crew totalled 3711 people).

Again, the actual data (the blue points) lie reasonably close to the line, but the visual impression of the fit isn’t that great, especially the lack of fit in the last three points.  At 0.859, the R-square is nearly identical to that of the linear relationship.


On theoretical grounds, a pandemic can grow at an exponential rate, at least for a while.  If each infected person can infect several people, the pandemic can grow at a very rapid clip.  Of course no function in the real world can remain exponential for too long.  In the case of a virus, it will eventually run out of hosts as it grows, since the hosts will either develop immunity or succumb to the disease and die.  Eventually, the function must turn downward.
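One common way to fit an exponential curve, sketched below, is to fit a straight line to the logarithm of the case counts; the slope then gives the growth rate, and ln(2) divided by the slope gives the doubling time.  This is an assumption about method, not necessarily how the graph above was produced, and the counts are synthetic (a series that doubles every day), not the ship's data.

```python
import math


def fit_exponential(days, cases):
    """Fit cases = a * exp(b * day) by least squares on log(cases)."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx                      # growth rate per day
    a = math.exp(mean_y - b * mean_x)  # fitted count at day 0
    return a, b


# Synthetic counts that double each day (illustration only).
days = [0, 1, 2, 3, 4]
cases = [10, 20, 40, 80, 160]

a, b = fit_exponential(days, cases)
doubling_time = math.log(2) / b
print(a, b, doubling_time)
```

A short doubling time is what makes the exponential projection run away so quickly, which is also why it cannot hold for long in a closed population of 3711 people.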

Case 3 – Quadratic Relationship

The quadratic form is a second order polynomial, which can also indicate rapid growth for an underlying phenomenon, though not as rapid as exponential growth.  This growth rate is faster than the linear model, but slower than the exponential.  This function predicts that about 2000 people would be infected within two weeks, over half the people on the ship.

In this case, the data points appear to fit the function very well, with some points slightly below the line and some slightly above (technically it is not heteroscedastic, as the exponential was).  The R-square is very high at 0.964.


As stated earlier, the quadratic model indicates that there is both a linear trend, and a second order trend to the data.  The latter means that the rate of change is accelerating, so to speak, as the days go on.

Like the exponential, a second order relationship can only go on for so long in the real world.  It will eventually be bounded by real world constraints.
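The "accelerating" behaviour of a quadratic can be seen by differencing: if cumulative cases follow a*t^2 + b*t + c, the number of new cases per day rises by a fixed amount, 2a, each day.  The coefficients below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def differences(values):
    """Day-over-day changes in a sequence."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical quadratic coefficients (not fitted to the ship's data).
a, b, c = 2, 3, 5
days = list(range(6))
cumulative = [a * t * t + b * t + c for t in days]

new_per_day = differences(cumulative)    # grows linearly
acceleration = differences(new_per_day)  # constant, equal to 2 * a

print(cumulative)    # [5, 10, 19, 32, 49, 70]
print(new_per_day)   # [5, 9, 13, 17, 21]
print(acceleration)  # [4, 4, 4, 4]
```

Compare this with the linear case, where the daily increase is constant, and the exponential case, where the daily increase itself grows in proportion to the total.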


Case 4 – Power Law Relationship

A pure quadratic (one with no linear or constant term) is a special case of a power law, where the exponent is equal to the integer 2.  The exploratory power law model shown below is very close to the quadratic model.  It would also have about half of the people on the ship sick within a couple more weeks.


In this case, the data points also appear to fit the function very well, closely resembling the quadratic.  The R-square is slightly higher, at 0.988.
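A power law of the form cases = a * day^k is often fitted by taking logarithms of both the day number and the case count and fitting a straight line; the slope of that line is the exponent k.  The sketch below assumes that approach (it may not be how the graph was produced) and uses synthetic counts that follow an exact square law, so the fitted exponent comes out at 2, the quadratic case.

```python
import math


def fit_power_law(days, cases):
    """Fit cases = a * day**k by least squares on log(day) vs log(cases).

    Day numbers must be positive, since log(0) is undefined.
    """
    xs = [math.log(d) for d in days]
    ys = [math.log(c) for c in cases]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # exponent of the power law
    a = math.exp(mean_y - k * mean_x)  # scale factor
    return a, k


# Synthetic counts following 3 * day**2 (illustration only).
days = [1, 2, 3, 4, 5]
cases = [3, 12, 27, 48, 75]

a, k = fit_power_law(days, cases)
print(a, k)
```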


The Death Rate on Diamond Princess

At the time that passengers were being transferred to other locations, there had been only 2 deaths among the 634 cases.  We don’t know when those deaths occurred for sure, but will make the assumption that they were on the last day (Feb 20, 20 days after the start of the outbreak, which we will place on Feb 1).

Given those parameters, we can calculate a rough death rate, assuming different lag times for median time from diagnosis to death.  Doing that gives the graph below, indicating a death rate of somewhere between 2 and 5%, assuming that the latency period is between a week and two weeks.  These are admittedly rough figures, but they correspond fairly well with the experience in China, which gives a fatality rate of about 5 to 6 percent, using a similar lag time.
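One plausible reading of the lag calculation is sketched here: divide the deaths to date by the cumulative number of cases that had been reported a given number of days earlier, on the grounds that people who die today were diagnosed some time ago.  The function name and the cumulative counts are hypothetical, chosen only to show how a longer assumed lag raises the estimated rate.

```python
def lagged_fatality_rate(cumulative_cases, deaths_to_date, lag_days):
    """Deaths to date divided by the cumulative cases lag_days earlier.

    cumulative_cases is a list of daily cumulative counts, oldest first;
    the last entry is the most recent day.
    """
    cases_then = cumulative_cases[-1 - lag_days]
    return deaths_to_date / cases_then


# Hypothetical cumulative case counts over six days (not the ship's data).
cumulative = [10, 20, 40, 50, 80, 100]
deaths = 2

no_lag = lagged_fatality_rate(cumulative, deaths, 0)     # 2 / 100
three_day = lagged_fatality_rate(cumulative, deaths, 3)  # 2 / 40

print(no_lag)     # 0.02
print(three_day)  # 0.05
```

Because case counts were rising quickly, the denominator shrinks rapidly as the assumed lag gets longer, which is why the estimate is so sensitive to the lag.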

It is difficult to extrapolate the death rates that might occur in less selected populations than a cruise ship.  On the one hand, cruise ship populations skew older, which could lead to higher fatality rates than in a more general population.

But on the other hand, people generally don’t take cruises if they are in extremely bad health or are extremely old.  Plus, a cruise ship population will be drawn from economically well off populations, who have benefitted from good health care all their lives.  So, these factors might tend to indicate a lower death rate.

At any rate, the ship quarantine has now been called off, and people are being air-lifted back to their home countries or to mainland Japan, though they may well continue being kept in quarantine in those locations.  So, this interesting natural experiment is now over.  Here’s hoping that epidemiologists and other public health workers learn some useful lessons from it, statistical and otherwise.



--------------------------------------------------------------------------------------------- 

And, here’s a more pleasant travel story than anticipating the worldwide journey of a virus.

On the Road with Bronco Billy

What follows is an account of a ten-day journey through western North America during a working trip, delivering lumber from Edmonton, Alberta to Dallas, Texas, and returning with oilfield equipment. The writer had the opportunity to accompany a friend who is a professional truck driver, which he eagerly accepted. He works as a statistician for the University of Alberta, and is therefore generally confined to desk, chair, and computer. The chance to see the world from the cab of a truck, and be immersed in the truck driving culture, was intriguing. In early May 1997 they hit the road.

Some time has passed since this journal was written and many things have changed since the late 1990s. That renders the journey not just a geographical one, but also a historical account, which I think only increases its interest.

We were fortunate to have an eventful trip - a mechanical breakdown, a near miss from a tornado, and a large-scale flood were among these events. But even without these turns of fate, the drama of the landscape, the close-up view of the trucking lifestyle, and the opportunity to observe the cultural habits of a wide swath of western North America would have been sufficient to fill up an interesting journal.

The travelogue is about 20,000 words, about 60 to 90 minutes of reading, at typical reading speeds.

Saturday, 22 February 2020

Estimating the Corona Virus (Covid-19) Transmission Rate, from the Diamond Princess Data

