As a former WHO staff member who played a key role in the production and clearance of WHO health statistics over the last 15 years, and a long-time collaborator with the Global Burden of Disease (GBD) enterprise and with the Institute for Health Metrics and Evaluation (IHME), I was interested to see two recently published papers discussing the roles of IHME and WHO in the production of global health statistics. IHME is located within the University of Washington in Seattle, USA, and is primarily funded by the Bill and Melinda Gates Foundation.
- Manjari Mahajan. The IHME in the Shifting Landscape of Global Health Metrics. Global Policy, 28 January 2019. https://doi.org/10.1111/1758-5899.12605 Available in the Wiley Online Library at https://onlinelibrary.wiley.com/doi/10.1111/1758-5899.12605
- Marlee Tichenor and Devi Sridhar. Metric partnerships: global burden of disease estimates within the World Bank, the World Health Organisation and the Institute for Health Metrics and Evaluation. Wellcome Open Research, 18 February 2019. Available at https://wellcomeopenresearch.org/articles/4-35/v1
The first of these articles, by Mahajan, takes a reasonably sceptical view of the claims made for the power of “big data” and modelling, but mistakenly characterises WHO global statistics as “traditionally relying on national statistics”. In fact, WHO global health statistics also make use of modelling and multiple sources of data, including national statistics as well as surveys and epidemiological studies, to address issues of biased and missing data. However, the paper does give a reasonable overview of some of the issues arising from private sector production of global statistics outside the UN system, and of the limited ability of outside experts or national-level users to assess and understand the derivation of the statistics.
The second paper is more problematic, as the authors state that they relied primarily on notes taken at three IHME events. There were apparently no inputs from the WHO staff involved in the interactions with IHME and GBD. This has resulted in some inaccuracies in the paper, some of which I address in the comments below. It is disappointing that an article examining the interaction between IHME and WHO/UN did not make the effort to include inputs from the WHO and UN people who are closely involved in global estimates.
Despite what the editor of the Lancet, Richard Horton, is quoted as saying in the paper, there was no so-called “cold war” between WHO and IHME before 2012. Ties Boerma and I were members of the core scientific group for the GBD2010 study. This was the central scientific decision-making group, set up in 2007 with 15 members, of whom 9 were from outside Chris Murray’s research group. I and many other WHO staff contributed to the work of the GBD over the next five years, though Ties and I became increasingly concerned that the external core group members were being excluded from access to the data and analyses. Around 2011 to 2012, six of the external core group members withdrew from the group over this and related issues. Apart from myself and Ties Boerma from WHO, these included Bob Black and Neff Walker from Johns Hopkins University, and Ken Hill and Dean Jamison from Harvard University. From WHO’s point of view there was no cold war (1): various WHO staff continued to provide data and contribute to GBD analyses, and WHO continued to make use of analyses derived from the IHME GBD results. However, because we could not gain access to data and analyses, WHO staff were unable to agree to be authors on GBD papers, and WHO as an institution was unable to endorse the results. Perhaps more importantly, WHO was also unable to examine areas where GBD results differed from WHO and other UN statistics in order to reconcile differences and potentially improve global health statistics.
On page 4, the paper claims that the GBD 1990 data were reworked in various ways and used for the next 25 years, until IHME undertook GBD2010. This is quite incorrect. During the period from 1999 through to 2008, the majority of mortality and morbidity estimates (for almost all diseases of public health importance) were revised with new inputs. This included the development of new model life tables at WHO, substantial growth in disease-specific modelling both at WHO and by academic collaborators, and the establishment of various UN interagency groups, particularly for diseases targeted by the MDGs. I have reviewed WHO work on GBD during the period 1999-2008 and estimate that morbidity and disability estimates were revised using new data for around 90% of the disease and injury causes (including all those of public health importance), and that mortality estimates were revised for 100% of causes. Disability weights were the main area where a comprehensive update was not carried out, though quite a few were revised using a European study (2), the World Health Surveys (3) and other sources of population information on health states.
The paper is incorrect in saying that the difference in malaria mortality estimates arises because IHME uses MAP parasite prevalence. WHO uses the same parasite prevalence data as a major input to its estimates of malaria mortality (4). The big difference arises from IHME's interpretation of verbal autopsy data, which maps much more “fever of unknown cause” to malaria for adults than WHO does.
The paper notes the difference between the IHME estimated trend for maternal mortality and that estimated by WHO, although there is little difference in the latest year estimates. Both IHME and WHO methods estimate the proportion of all female deaths in the reproductive ages that are maternal deaths, and these estimates are reasonably similar. The trend difference in numbers of deaths arises because the IHME life tables have flatter adult female mortality trends than the UN life tables (5). The IHME life tables place greater credence on sibling history data for periods long before the surveys that collected them, giving flatter adult mortality trends in parts of Africa, and hence flatter maternal mortality trends.
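To make the arithmetic concrete, here is a minimal sketch with purely hypothetical numbers (not actual UN or IHME figures): two agencies agreeing on the proportion of female deaths that are maternal, but working from female death envelopes with different trends, will report different maternal mortality trends even when their latest-year estimates coincide.

```python
# Hypothetical illustration: both agencies agree on the proportion of
# reproductive-age female deaths that are maternal (PM) in each year,
# but their life tables imply different female death envelopes.
pm = {1990: 0.12, 2015: 0.06}  # agreed proportion maternal (illustrative)

# Female deaths aged 15-49 (illustrative): one envelope declines steeply,
# the other is flatter, as with sibling-history-based life tables.
envelopes = {
    "Agency A": {1990: 2_000_000, 2015: 1_400_000},  # steeper decline
    "Agency B": {1990: 1_700_000, 2015: 1_400_000},  # flatter trend
}

for agency, deaths in envelopes.items():
    m1990 = pm[1990] * deaths[1990]
    m2015 = pm[2015] * deaths[2015]
    decline = 100 * (1 - m2015 / m1990)
    print(f"{agency}: maternal deaths {m1990:,.0f} -> {m2015:,.0f} "
          f"({decline:.0f}% decline)")
```

Both agencies end at the same 84,000 maternal deaths in 2015, but the flatter envelope yields a noticeably smaller estimated decline (about 59% versus 65% in this toy example).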
In the discussion, the authors question the value of competition in achieving global health goals, and link this to the emphasis in the GBD, and indeed in all UN global health statistics, on the comparability of statistics across locations and times. While it is arguable whether the global target-setting process spurs healthy competition between countries, the concern for comparability in statistics is essentially a concern to have meaningful statistics. Any statistic is only meaningful and interpretable through comparison. For example, an average death rate of 8,945 per 100,000 population is uninterpretable to almost everyone unless put in a comparative context.
Measurement only has meaning if a standard scale is used (or at least fixed scales that can be translated into each other). Since bias varies over time as well as over space, a lack of concern for comparability would be like tracking your weight with a scale whose zero varies in an unknown way over time.
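A minimal sketch of this point, using direct age-standardisation with made-up numbers: two populations with identical age-specific death rates can have very different crude death rates simply because of their age structures, and only rates computed against a fixed reference "scale" are comparable.

```python
# Hypothetical age-specific death rates per 100,000 (identical in both places)
rates = {"0-39": 80, "40-69": 600, "70+": 6000}

# But very different age structures (population shares)
pop_young = {"0-39": 0.70, "40-69": 0.25, "70+": 0.05}
pop_old = {"0-39": 0.40, "40-69": 0.40, "70+": 0.20}

# Fixed reference population: the "standard scale"
std_pop = {"0-39": 0.55, "40-69": 0.35, "70+": 0.10}

def crude_rate(rates, pop):
    """Crude death rate: rates weighted by the population's own age structure."""
    return sum(rates[a] * pop[a] for a in rates)

def standardised_rate(rates, std=std_pop):
    """Directly age-standardised rate: rates weighted by a fixed reference structure."""
    return sum(rates[a] * std[a] for a in rates)

print(crude_rate(rates, pop_young))   # ~506 per 100,000
print(crude_rate(rates, pop_old))     # ~1,472 per 100,000
print(standardised_rate(rates))       # ~854 per 100,000 in both places
```

Despite identical mortality risks, the crude rates differ almost threefold; the standardised rates, computed on one fixed scale, agree exactly.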
The authors do raise relevant and important issues around the potential creation of a global health data monopoly, the concentration of analytic skills in a first-world institution, and the broader governance structures and accountability for statistics. Many developing countries have little interest in the outputs of a US academic group, but are very concerned about WHO and UN statistics. UN agencies have a mandate to produce statistics and some responsibility to consult with countries. IHME has tried to spin this as “political interference”, which has largely not been the case, at least in my experience carrying out a central statistical clearance role in WHO and in working with the various UN interagency groups. The downside of IHME “independence” is that there have been quite drastic changes in methods and estimates from revision to revision for some causes and topics, with little responsiveness in some cases to those who pointed out problems before publication. A recent example is drug overdose deaths in the USA: GBD2016 excluded prescription opioid deaths for unknown reasons (without documenting this), while GBD2017 included them, resulting in a more than doubling of drug overdose deaths. The sudden introduction of very different birth denominators in GBD2016 similarly knocked around half a million child deaths off the global total compared with the UN total (which was previously almost identical).
IHME is now estimating its own population and birth numbers, so its mortality and other outputs inhabit a parallel demographic universe to those of the UN agencies. This makes understanding the differences even more complex and opaque, and I suspect it will unfortunately limit the ability of UN agencies to make direct use of IHME results.
This is a great pity, as there is a lot to be gained by collaborating more closely and working together to improve the primary data, the analyses and the statistical assessments that are increasingly important for guiding and tracking global health progress.
1. Boerma T, Mathers C. The World Health Organization and global health estimates: improving collaboration and capacity. BMC Medicine 2015, 13:50. doi: 10.1186/s12916-015-0286-7. Available online at http://www.biomedcentral.com/1741-7015/13/50
2. Stouthard M, Essink-Bot M, Bonsel G, Barendregt J, Kramers P. Disability weights for diseases in the Netherlands. Rotterdam, Department of Public Health, Erasmus University, 1997.
3. Ustun TB, Chatterji S, Mechbal A, Murray CJL, WHS Collaborating Groups. The world health surveys. In: Murray CJL, Evans D, eds. Health systems performance assessment: debates, methods and empiricism. Geneva, World Health Organization, 2003.
4. World Health Organization. World Malaria Report 2018. Geneva, World Health Organization, 2018.
5. Gerland P, Masquelier B, Helleringer S, Hogan D, Mathers CD. Maternal mortality estimates.
Colin, I very much enjoyed reading this interesting and informative post and wanted to echo some of your sentiments with thoughts from my own experiences. I am a proponent of modeled estimates, but feel it is key that they are transparent and come with accurate measures of uncertainty. Without these factors, modelers give those who push back against their use a valid platform.
With regard to GBD, the difficulty of explaining the pipeline for producing the estimates is a huge hurdle. In the GBD, most of the details of the analyses are (understandably) pushed to the supplemental materials, which can be over 1,000 pages long (and are not peer reviewed). As someone who has combed through the supplemental materials, I can say that they are commonly difficult (or impossible) to fully understand. As you state, “there is a lot to be gained by collaborating more closely and working together”, but to do this the methods cannot be a black box. There is a need for clear documentation and transparency of methods used. This allows others to understand (and critique) the assumptions and spurs collaboration. If modelers continue to provide poor documentation and few opportunities for education about the process (via webinars or tutorials), the divide between modelers and non-modelers will continue or worsen.
Regarding the comparability of the estimates, another key issue is having appropriate measures of uncertainty. As you point out, an average death rate of 8,945 per 100,000 population is only interpretable in a comparative context. However, it's also important to know whether that estimate has an uncertainty interval of 8,900 to 9,000 or of 2,500 to 20,000. In the former case we have a good understanding of the burden of the condition, while in the latter it's clear that more data are needed. Just as it's important to validate the predictions from modeled estimates, it's important to validate the properties of uncertainty intervals through cross-validation or simulation. I've found that researchers with limited analytic backgrounds can understand and appreciate the importance of such exercises. This is an area that the current GBD has not addressed.
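As a toy illustration of the kind of simulation check meant here (my own sketch, not a GBD method): generate data from a known death rate, build a standard 95% interval around each estimate, and check how often the interval actually covers the truth. Well-calibrated intervals should cover close to 95% of the time; overconfident (too-narrow) intervals will cover far less often.

```python
import random

def interval_coverage(true_rate=0.089, n_obs=500, n_sims=2000, z=1.96):
    """Estimate the empirical coverage of a normal-approximation 95%
    interval for a death rate, by simulating data from a known truth."""
    covered = 0
    for _ in range(n_sims):
        # Simulate deaths among n_obs people at the true underlying rate
        deaths = sum(random.random() < true_rate for _ in range(n_obs))
        p = deaths / n_obs
        se = (p * (1 - p) / n_obs) ** 0.5
        # Count the simulations where the 95% interval covers the truth
        if p - z * se <= true_rate <= p + z * se:
            covered += 1
    return covered / n_sims

random.seed(1)
# With ample data, empirical coverage should sit near the nominal 95%;
# systematically lower coverage would signal overconfident intervals.
print(interval_coverage())
```

The same idea applies out of sample: hold out data points and check whether the model's 95% intervals cover roughly 95% of them.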
Some collaborators and I discussed some of these issues in more detail in a recent paper:
McLain, A.C., E.A. Frongillo, S.Y. Hess, E. Piwoz (2019). Comparison of methods used to estimate the global burden of disease related to undernutrition and suboptimal breastfeeding. Advances in Nutrition, 10(3): 380–390.
Alexander, thanks for a very thoughtful comment. I'll certainly take a look at your recent paper. As well as the black box issue, there is also the issue of the computing power needed to run the very CPU-intensive ensemble modelling, which is beyond the technical and financial capacity of most academic or national government groups to replicate. In the same way that health economists look at cost-benefit, perhaps we should be examining the statistical gain from the added complexity of models. I suspect it may be quite marginal, particularly if the “understandability” and “replicability” of simpler models are included in the assessment of benefit.