Country progress to 9 May in controlling COVID-19 epidemics

I downloaded the latest COVID-19 data for reported deaths and confirmed cases from Johns Hopkins this morning to see whether the data supports the relaxation of social isolation that is starting to happen in many countries. The USA now has 1.35 million confirmed cases (just under 1/3 of the global total) and 80,323 deaths (28% of the global total). The trends are quite different for New York and the rest of the USA, so I have plotted New York and the USA excluding New York separately below.

Daily deaths per million (dark blue line) and confirmed cases per 100,000 (red line).

The plots below show smoothed death rates (per million) and case rates (per 100,000) up to 9 May using simple moving averages. Inspired by some plots I have seen elsewhere, I’ve organized the countries into three groups: those that have controlled the epidemic, those that are partway there (deaths are coming down, but not yet approaching zero) and those that need to take further action to turn things around. I excluded countries with population less than 1 million.
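The smoothing step can be sketched roughly as follows. This is a minimal illustration, not the code used for the plots: the 7-day trailing window and the function names are assumptions, since the post only says that simple moving averages were used.

```python
def daily_rate(cumulative, population, per=1_000_000):
    """Convert a cumulative count series into daily new counts per `per` people."""
    daily = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    return [d * per / population for d in daily]


def moving_average(values, window=7):
    """Trailing simple moving average (window length is an assumed value).

    Early points, where fewer than `window` observations exist,
    are averaged over the data available so far.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For deaths, `per` stays at one million; for cases, `per=100_000` matches the scale used in the plots.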

There are many more deaths and cases than recorded in this dataset. For countries with good data systems, excess deaths in the last two months are around 50 to 60% higher than confirmed coronavirus deaths, and the under-reporting is almost certainly much higher in most developing countries. However, these data probably provide a reasonable indication of the epidemic trends, at least in countries with reasonable testing levels and good data systems.

The final graph shows the top 20 countries in terms of deaths per million population, and their epidemic status. As to whether I’ve grouped the countries appropriately, there are a number where it’s debatable which group they should be in, and a few more days of data may clarify what progress is occurring.

* Since almost all the deaths for China are in Hubei Province (Wuhan etc), I’ve used the population of Hubei rather than that of total China to calculate rates.


COVID-19 projections and reality

On 27th April, I posted some short-run projections of COVID-19 cases and deaths. The plots below show how the projected daily new cases per million population and deaths per million population compare with reality (at least the confirmed case rates and death rates up to 4th May according to Johns Hopkins CSSEGIS data).

It’s a mixed bag. The projections match reasonably well for a few countries and are very different for others. I’ve revised the smoothing algorithm I used, and that may result in improved projections. But overall, I think I’m not doing much better than IHME, and should probably leave it to those with better models that use SEIR (susceptible-exposed-infected-recovered) modelling or computer simulations of case transmission.

Vox recently published an excellent article on the problems with the IHME modelling of COVID-19. The article also gives a link to a site which has been set up so that you can look at the US predictions made by old versions of the IHME model (and another model). The IHME models are frequently fairly far off. Here is the comparison for the USA as a whole (you can also examine State specific projections).

It’s clear that the projection method takes the latest data point and plummets at essentially the same rate as the earlier rise. On second thoughts, I think my short-run projections are doing better than these for many countries.



COVID-19 short-run projections

Purely statistical predictions of future trends in COVID-19 deaths or cases, even including predictive covariates, have been unable to make sensible forecasts that are not highly sensitive to slight additions of data. The only useful models have been the more traditional SIR and individual-based models of epidemic infectious disease spread in which scenarios allow some assessment of the impact of various social distancing measures.

It struck me there may be a way to make short-run projections of daily deaths, using information on recent trends in confirmed cases. I spent a couple of hours doing this yesterday; here is the result for Switzerland. I explain the simplistic modelling assumptions below, in what is intended only to be an exploratory analysis.

Reported daily new confirmed cases and deaths (dots) along with short-run projections.

A major issue is the quality of the available data on deaths and cases. A recent analysis comparing total mortality data for the pandemic period with similar data for previous years, summarized in a recent New York Times article, suggests that total deaths due to COVID-19 are around 60% higher than the reported COVID-19 deaths, which largely include only hospital deaths in many countries. These data are for countries with good death registration systems which cover 100% of deaths, albeit with some delays in registration and coding. For countries with poor or non-existent death registration, including most of Africa, the under-reporting will almost certainly be much higher. I’ve been looking at plots of daily deaths and noticed that there are occasional large increases in some countries for around 1 day in 7, and this may relate to timing of data compilation and reporting, or to “catch-up” when batches of deaths from outside the acute hospital sector are added. Additionally, the very strong age and sex dependence of fatality rates means that models should take into account population age-sex structure and the variation of case fatality rates and other factors by age and sex.

Confirmed case time series are also affected by the scope of testing and the extent to which the testing is restricted to specific risk groups or symptomatic people, or is extensive enough to approximate a population sample. According to available data, country testing rates in developed countries, excluding those for some small populations under 1 million, range from around 10 per 1,000 in the UK up to around 35 per 1,000 in some smaller European countries. For developing countries, rates can be much lower, less than 1 per 1,000 in many African and Asian countries. Trends in confirmed cases may be strongly influenced by trends in testing rates, as well as changes in the populations targeted for testing.

I downloaded latest data up to 25 April (CSSEGIS COVID-19 data) yesterday, and initially did some curve smoothing to look at trends in daily confirmed cases and deaths, mostly to see whether the epidemic does seem to have peaked in countries now discussing easing social distancing. It struck me that it may be possible to make use of these smoothed series to do some short-run projections of deaths out for around a week, without making strong assumptions about the shape of the curve, as was done by IHME in its recent modelling. Here is a plot for Switzerland of daily new confirmed cases and deaths.

My key insight was that trends in mortality should reflect trends in confirmed cases around 14 days earlier (based on a quick literature review). If testing rates are stable, this should allow projection of daily deaths out to around 10-14 days based on the confirmed case time series. Furthermore, it should be possible to assess and project the trend in the apparent case fatality rate (deaths divided by cases 14 days earlier), which should reflect the trend in testing rates and regime. So I spent a couple of hours yesterday having a go at doing this. This is not intended to be a serious attempt at prediction of short-run trends, as I’ve made some simplifying assumptions and picked a curve smoothing technique that was to hand, but probably not ideal. But I will compare reality in a week’s time with my projections, just for the heck of it.

In order to calculate the denominator (confirmed cases) for estimating the apparent case fatality rate (acfr), I assumed that the number of days d from diagnosis to death ranged from 8 to 21, following a lognormal distribution with a mean of 14.1 days. I next assumed that, in the limit of high levels of testing, the acfr should approach that observed for confirmed cases in Wuhan (2.2%) but adjusted for the age distribution of the country. So the long-term acfr, lacfr, will be higher for developed countries with older populations, such as Italy (4.3%), and lower for developing countries such as South Africa (1.2%).
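One way to build such a lag-weighted denominator is sketched below. It is only an illustration under stated assumptions: the post gives the range (8 to 21 days) and the mean (14.1 days) but not the spread of the lognormal, so `sigma=0.3` is a made-up value, and the function names are hypothetical. Truncating the distribution to 8-21 days also shifts its mean slightly away from 14.1.

```python
import math


def delay_weights(mean_days=14.1, sigma=0.3, d_min=8, d_max=21):
    """Discretised lognormal weights for days from diagnosis to death.

    sigma (log-scale spread) is an ASSUMED value; the post does not state it.
    Weights are truncated to d_min..d_max and renormalised to sum to 1.
    """
    mu = math.log(mean_days) - sigma ** 2 / 2  # lognormal mean = exp(mu + sigma^2/2)
    density = {}
    for d in range(d_min, d_max + 1):
        z = (math.log(d) - mu) / sigma
        density[d] = math.exp(-0.5 * z * z) / (d * sigma * math.sqrt(2 * math.pi))
    total = sum(density.values())
    return {d: p / total for d, p in density.items()}


def apparent_cfr(daily_cases, daily_deaths, weights):
    """acfr[t] = deaths on day t / weighted sum of cases 8-21 days earlier.

    Days without a full 21-day case history (or a zero denominator) get None.
    """
    d_max = max(weights)
    acfr = [None] * len(daily_deaths)
    for t in range(d_max, len(daily_deaths)):
        denominator = sum(w * daily_cases[t - d] for d, w in weights.items())
        if denominator > 0:
            acfr[t] = daily_deaths[t] / denominator
    return acfr
```

In practice the inputs would be the smoothed daily series rather than the raw counts.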

I projected the recent trend in ln(acfr-lacfr) using simple regression against time with an exponential weight, giving a weight of 1 to the observation for the most recent day, and weights decreasing by a factor of 0.85 each day into the past. If the recent observed acfr was already lower than my estimated lacfr, I left it constant at its current value. For countries with recently declining acfr, the projection asymptotes at lacfr. For countries with increasing acfr, that increase is projected to continue. Only in a handful of countries does that projection result in dramatically increasing acfr. I’m not sure what that says about the data series, but there is clearly some issue with the data.
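The weighted regression described above might look something like the following sketch. The function name, the `horizon` parameter, and the choice to skip days where acfr is at or below lacfr (where the logarithm is undefined) are my assumptions for illustration, not details from the original calculation.

```python
import math


def project_acfr(acfr, lacfr, horizon=10, decay=0.85):
    """Project acfr forward by weighted linear regression of ln(acfr - lacfr) on time.

    The most recent day has weight 1; each earlier day is down-weighted by `decay`.
    If the latest acfr is already at or below lacfr, it is held constant.
    """
    latest = acfr[-1]
    if latest <= lacfr:
        return [latest] * horizon

    n = len(acfr)
    # (time, log excess, weight) for days where the log is defined (assumed handling)
    points = [(t, math.log(a - lacfr), decay ** (n - 1 - t))
              for t, a in enumerate(acfr) if a > lacfr]

    weight_sum = sum(w for _, _, w in points)
    mean_t = sum(w * t for t, _, w in points) / weight_sum
    mean_y = sum(w * y for _, y, w in points) / weight_sum
    s_tt = sum(w * (t - mean_t) ** 2 for t, _, w in points)
    s_ty = sum(w * (t - mean_t) * (y - mean_y) for t, y, w in points)
    slope = s_ty / s_tt if s_tt > 0 else 0.0
    intercept = mean_y - slope * mean_t

    # A negative slope decays towards lacfr; a positive slope keeps rising.
    return [lacfr + math.exp(intercept + slope * (n - 1 + h))
            for h in range(1, horizon + 1)]
```

A negative fitted slope gives the asymptotic approach to lacfr, while a positive slope reproduces the runaway increase seen in a handful of countries.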

The following plots show two typical examples of the projection of apparent case fatality rates in which it starts very high (when deaths have started to occur but there is still very limited testing) and declines in a reasonably regular manner. The third example, for Sweden, is a country in which there is much more variability in the apparent case fatality rate, perhaps reflecting low numbers of cases and deaths, and also likely variations in data quality or scope.

Next, I did a similar short-run projection of ln(smoothed daily confirmed cases) out 7 days into the future, and then calculated the denominators associated with smoothed daily deaths out 10 days into the future. This denominator projection actually uses only the first two days of the cases projection, for the small fraction of the denominator associated with early deaths. I mostly did the projection of cases to see what a slightly longer projection of deaths looked like, but don’t present that here.

The plots below show examples of these projections for a few selected countries. I’ve added some comments in the captions.

Australia has a low case fatality rate, and appears to have indeed done extremely well in containing the epidemic, as is being claimed.

Switzerland shows clear evidence that the epidemic peaked in late March to early April and is in decline.

The US projections are for a continuing increase in daily cases, and for a slight decline in daily deaths, probably partly reflecting increasing levels of testing. These projections are probably not that meaningful, as the US has epidemics occurring with different timing in various states, and state-level modelling would probably give more nuanced results.

It’s unclear from this graph whether daily cases have plateaued, but a projected declining case fatality rate associated with increasing levels of testing has resulted in a projected decline in daily deaths. It will be interesting to see whether reality does better than this.

Clearly a continuing decline for Italy, though projected deaths are nearly flat. Again, reality will hopefully do better.

A rising apparent case fatality rate has resulted in a projected continuing increase in deaths. However, Germany has concluded that daily deaths are declining and it is time to relax social restrictions. Unclear from this data, but the death rate is much lower than for most other European countries.

France also shows a peak followed by a decline which is projected to continue.

The Netherlands has a death rate 40% higher than Switzerland, though the confirmed case rate is only 70% that of Switzerland. But both appear to be declining.

Larger variability in daily cases and deaths makes it difficult to know whether these projected rising numbers are plausible. But there does not seem to be any good evidence the epidemic has peaked in Sweden.

These projections are simplistic, and really mainly to explore the data and the possibility of dealing with changing testing rates in doing projections. A genuinely useful projection model of this type would not only need better evidence-based inputs but also, ideally, data disaggregated by age and sex. For larger countries with epidemics in various population centres with different timing, it would need to model at regional rather than national level. The likelihood of “later” epidemics starting in care homes or other special populations and either spreading into the community or causing later epidemic waves may also need to be taken into account. To a limited extent, it may be possible to treat the data on confirmed cases and hospital deaths as “indicators” of the epidemic and its dynamics, even though large numbers of cases and deaths are not included in such statistics. But large untested populations in institutions such as care homes or prisons could make a huge difference in some countries with relatively large institutionalized populations.

Still, I conclude that there may be some value in using case data to make very short-run projections of deaths, or perhaps in testing the usefulness of such an approach using scenario results from one of the SIR or individual-based models.


Critics agree that IHME COVID-19 projection model is flawed

I downloaded the latest IHME update of its COVID-19 projections yesterday to do another evaluation. Their projections still look very problematic. Other disease modellers and epidemiologists are coming to the same conclusion. An article just published in Statnews was much less polite than I was in my previous post. This article quoted epidemiologist Ruth Etzioni as saying “That it is being used for policy decisions and its results interpreted wrongly is a travesty unfolding before our eyes.”

The Statnews article claims that the IHME model may have influenced Trump’s thinking on when to re-open the country, and that IHME’s early very high projections for the USA are likely to be used by Trump to claim the government response has prevented a great catastrophe.

An article in the Washington Post is also highly critical of IHME and reviews the various other models in use which enable the impacts of social isolation to be better taken into account along with the epidemiological characteristics of the epidemics. A critique from researchers at Imperial College and LSHTM was also published this week in Annals of Internal Medicine and stated that the IHME projections are based “on a statistical model with no epidemiologic basis.”

These articles also note the volatility of the projections to updates of a few extra days’ data and, like me, see this as evidence of a very poor predictive model. The main impact of a model like this, where the results vary wildly from one update to the next (as opposed to from one scenario to another), is very likely to be reduced public and political confidence in all modelling. And this could lead to more deaths.

Having read these various other critiques, I don’t think I will bother to do a second evaluation of IHME’s latest updated projections now.


How useful are IHME projections of the coronavirus pandemic?

The Institute for Health Metrics and Evaluation (IHME), based at the University of Washington in Seattle, caused considerable alarm on 7 April when it released projections of Covid-19 deaths which predicted total deaths for the UK would be the highest in Europe at 66,314, and higher than their projected total deaths of 60,415 for the USA. According to the results on their webpage, daily deaths for the USA would peak at 2200 in the next few days and start declining from 12 April. In contrast, the UK daily deaths were projected to continue rising almost linearly for the next 12 days from 623 per day to 2900 per day. The curve then flattens at around 3000 deaths per day for a while before declining back to zero in June, giving total deaths of 66,314.

According to the Guardian newspaper: “The 66,000 figure was disputed by scientists whose modelling of the likely shape of the UK epidemic is relied on by the government. Prof Neil Ferguson, of Imperial College London, said last week when the prediction was published that the IHME figures were twice as high as they should have been.”

Three days later, IHME revised the UK projection downwards to around 37,000 deaths by end of July. Despite this lower figure, the UK would still have the highest death toll in Europe. The IHME website says this revision is due to the inclusion of four more days of data as input to their projection model. However, the very different projections for the UK from those for the USA and other European countries did not seem plausible to me, or explicable as due to different social distancing policies (the only predictive variable included in the IHME model).

So I have tested their projections over a short time period of days against subsequent reality. On April 11 I recorded their projections from the last data point for 9 April through to April 18. And today, I downloaded cumulative deaths from the  Johns Hopkins Covid-19 site and calculated deaths per day for Italy, Switzerland, UK and USA. The graph below shows the reported deaths for these countries as solid lines, and the IHME projected deaths from 9 April as dashed lines. I have to conclude their projection model is producing seriously bizarre results.

Reported deaths to 13 April are shown as solid curves. The IHME projected deaths from 9 April to 18 April are shown as dotted curves.

Today I also took another look at the latest projections on their website and they have changed quite substantially again. Now the UK deaths are shown as peaking yesterday, 13 April, and declining from now on, leading to an eventual total of 23,791 deaths. The Swiss deaths per day, which have plateaued for about a week with some signs they may be starting to decline, are projected to start rising to more than double the current number and then start declining from May 7. This is despite Switzerland implementing social distancing rules earlier than the UK and USA.

The following plot compares the government policy responses to COVID-19 for these four countries using the OxCGRT Stringency Index. The IHME also uses an index based on four policy indicators as a predictive variable in its model, and assumes that all countries reach maximum stringency one week after the last input data point. So I can’t see how this variable would create such large differences in projections.

The OxCGRT Stringency Index combines information from nine indicators of government response (school closures, travel bans, shop closures, etc) into a single index on a scale of 0 to 100 (maximum stringency).

The IHME projection model is based on fitting a curve to the cumulative deaths time series with the form shown in the figure below, which results in a symmetrical curve for daily deaths. This means that the fitted curves will tend to have faster declines for countries with faster rising death rates. I can see no reason to think that is what happens in reality.

The IHME projection model is based on fitting a curve to the cumulative deaths time series with the form shown on the left. The daily deaths are a symmetrical curve with shape d = exp(-α*t*t), where t=0 at the peak of the curve.
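The symmetry follows directly from the curve shape quoted in the caption. As a short worked derivation (using only the form d = exp(-α t²) given above, with the peak normalised to 1), the time at which daily deaths equal one third of the peak is:

```latex
\begin{align}
d(t) &= e^{-\alpha t^{2}}, \qquad d(0) = 1 \quad \text{(peak)} \\
\frac{d(t)}{d(0)} = \frac{1}{3}
  &\iff \alpha t^{2} = \ln 3
  \iff t = \pm\sqrt{\frac{\ln 3}{\alpha}}
\end{align}
```

The two solutions are equally far from the peak, so under this curve the rise from one third of the peak takes exactly as long as the fall back to one third. A larger α (a faster rise) shortens both intervals, which is why a steeply rising epidemic is forced into an equally steep projected decline.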

I checked the projections on the website today, and indeed for three of the countries, the number of days between the peak and one third of the peak deaths is similar before and after the peak: Switzerland (26 days before, 24 days after), UK (13, 13) and Italy (13, 17). The USA is quite different, with 14 days to peak and 29 days to reach 1/3 of peak afterwards. I conclude that despite the sophisticated Bayesian curve fitting used, the model appears to be fundamentally inappropriate for Covid-19 projections.


How does population age structure affect overall case fatality ratios for COVID-19?

The first graph shows coronavirus deaths in Italy up to 26 March 2020 by sex and age. The overall case fatality rate for lab-confirmed cases is 11.1% and 70% of deaths are male (closer to 80% below age 80, dropping to 63% for 80 and above, because fewer men than women survive to their 80s). This is a much higher apparent case fatality rate than in other countries, and it is often mentioned as an explanatory factor that Italy has the oldest population in Europe. How much do differences in age structure of populations affect overall crude case fatality rates?

A paper published 2 days ago in the Lancet used data from China to estimate infection fatality ratios by age for all COVID-19 infections after adjustments for censoring (recent cases for which there has not been enough time for deaths to occur), demography, differential testing rates by age, and underascertainment. The second graph shows the resulting infection fatality ratios (as fractions, not percentages) by age group, corresponding to an overall infection fatality ratio of 0.65%. This is much lower than the crude confirmed case fatality ratio of 2.3%.

Note: These case fatality rates relate to tested cases if all age groups had the same testing rate as 50-59 year olds (the age group with highest testing rate as proportion of population). The infection fatality ratio refers to total infections including an estimate for non-tested cases that are not diagnosed.

I did a “what if” calculation to see how the overall case fatality rate would change if the population of China (where 17% of people are aged 60 or older) had the age structure of the Italian population (where 30% of people are 60+) or that of various other countries, including Nigeria (with 4.5% aged 60+, typical of African countries). The third graph shows the resulting overall case fatality rates. If all else were equal, including age-specific infection fatality ratios, having the Italian age distribution would approximately double the Chinese ratio, and having that of Nigeria would halve it.
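The “what if” calculation is a direct age standardisation: hold the age-specific ratios fixed and reweight them by another population's age shares. The sketch below shows the mechanics only. The two-band split and the ratios in `HYPOTHETICAL_RATIO_BY_AGE` are invented placeholders, not the Lancet estimates, so its output will not reproduce the doubling and halving reported above; only the 60+ shares (17%, 30%, 4.5%) come from the text.

```python
def overall_ratio(age_shares, ratio_by_age):
    """Population-weighted average of age-specific fatality ratios."""
    assert abs(sum(age_shares.values()) - 1.0) < 1e-9, "age shares must sum to 1"
    return sum(age_shares[band] * ratio_by_age[band] for band in age_shares)


def two_band_shares(share_60_plus):
    """Split a population into under-60 and 60+ bands (a deliberately coarse split)."""
    return {"under_60": 1.0 - share_60_plus, "60_plus": share_60_plus}


# PLACEHOLDER values for illustration only; NOT estimates from any study.
HYPOTHETICAL_RATIO_BY_AGE = {"under_60": 0.001, "60_plus": 0.03}

# Shares of people aged 60+ as quoted in the post.
SHARE_60_PLUS = {"China": 0.17, "Italy": 0.30, "Nigeria": 0.045}

standardised = {
    country: overall_ratio(two_band_shares(share), HYPOTHETICAL_RATIO_BY_AGE)
    for country, share in SHARE_60_PLUS.items()
}
```

With real inputs, the dictionaries would hold the full set of age groups and the published age-specific ratios in place of the placeholders.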

Note: these do not represent real infection fatality ratios in countries. They are predicted overall ratios for all infections if age-specific infection and fatality rates are the same as those of mainland China.

Across European countries, the variation of population age structure by itself would cause relatively small variations in overall case fatality rates. Presumably other factors such as smoking levels, cardiovascular disease prevalence, health system responsiveness, and intensive care respirator supply would be more important.  Apparent case fatality rates calculated from COVID-19 deaths divided by lab-confirmed cases are not comparable across countries for a number of reasons. In particular, overall testing rates may vary across countries, with varying proportions of community and hospital samples, and the testing rates may vary in different ways across age groups.


COVID-19 growth rates by country

Nice new site that plots time trends in cases and deaths, total numbers and rates per million population. The time axis is days since 100+ cases/deaths or days since 1 case/death per million population. I’ve attached screenshots of cases/million and deaths/million with Switzerland highlighted. The dotted straight line on the log scale represents a daily growth factor of 1.35 (35% more cases than the day before). That corresponds to a doubling time of 2.31 days. Fortunately, most curves are showing some flattening after the first 10 days to around 1.2 or lower. Australia has a curve that corresponds to a daily growth factor of 1.2. That difference is huge. At a daily growth factor of 1.35, the first case becomes about 3.3 million after 50 days, whereas at 1.2 it becomes 9,100. Most of the countries that are beyond 15 days from first case/million are showing flattening of growth, and in the case of China it’s almost completely flat.
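The arithmetic behind these figures is simple compound growth, and can be checked with a few lines. The function names are mine, chosen for illustration.

```python
import math


def doubling_time(daily_growth):
    """Days for cases to double when each day has `daily_growth` times the previous day's cases."""
    return math.log(2) / math.log(daily_growth)


def cases_after(days, daily_growth, initial=1):
    """Cases after `days` of constant multiplicative daily growth."""
    return initial * daily_growth ** days
```

For example, `doubling_time(1.35)` is about 2.31 days and `doubling_time(1.2)` is about 3.8 days, while `cases_after(50, 1.35)` and `cases_after(50, 1.2)` give the two 50-day totals compared above.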

The USA is on day 20 since 100 confirmed cases (or day 18 since 1 case/million) and is following the 1.35x line very closely so far. Unlike most other countries this far into the epidemic, it is not yet showing signs of slowing down. US total confirmed cases will overtake those of Italy and China by tomorrow or the day after.
