Estimates of worldwide mortality from the influenza pandemic of 1918–1919 vary widely, from 15 million to 100 million. In terms of loss of life, India was the focal point of this profound demographic event. In this article, we calculate mortality from the influenza pandemic in India using panel data models and data from the Census of India. The new estimates suggest that for the districts included in the sample, mortality was at most 13.88 million, compared with 17.21 million when calculated using the assumptions of Davis (1951). We conclude that Davis’ influential estimate of mortality from influenza in British India is overstated by at least 24%. Future analyses of the effects of the pandemic on demographic change in India and worldwide will need to account for this significant downward revision.
The influenza pandemic of 1918–1919 has been called the “greatest medical holocaust in history” (Waring 1971:33) and the “mother of all pandemics” (Taubenberger and Morens 2006). Early research indicates that global mortality during the pandemic exceeded 21.5 million (Johnson and Mueller 2002; Jordan 1927). More recent estimates have varied widely, suggesting that between 15 and 100 million people died in the short span of about a year (Johnson and Mueller 2002; Patterson and Pyle 1991). The country that faced the greatest devastation in terms of human mortality from influenza is India (Johnson and Mueller 2002:112, Table 3). Mortality estimates for India are also wide-ranging, from 12 to 20 million (Davis 1951; Marten 1923; Mills 1986; Patterson and Pyle 1991). According to a recent estimate, mortality in India accounted for 38% of global mortality (Johnson and Mueller 2002). These figures are highly problematic because “[d]eath totals for British India . . . . provide the largest single source of uncertainty for Asian and world mortality totals” (Patterson and Pyle 1991:18). Sound estimates of the impact of the pandemic are of critical importance to understanding demographic change in India and the world in the twentieth century. The enormous numbers of lives lost and people sickened by the virus had consequences that lingered through successive generations long past the middle of the twentieth century (Almond 2006; Johnson and Mueller 2002; Mazumder et al. 2010; Mills 1986; Morens et al. 2009).
Because vital registration records of the time were unreliable (Government of India 1920, various years; Klein 1973:642), the most influential study on this subject, that by Davis (1951), used data from the Indian census, collected in times of relative stability, as checks on preliminary mortality estimates based on death registration records. Using an average decadal population growth rate of 8% for the decades 1901–1911 and 1921–1931, he computed a shortfall of 18.5 million (Davis 1951:237, Appendix B), which he attributed to the influenza pandemic. Mills (1986) challenged the assumption of 8% decadal growth before the pandemic and instead used a population growth rate of 6.8% for the period up to 1917 (p. 10). In this article, we demonstrate that both sets of assumptions about growth rates before and after 1918–1919 are inaccurate. We estimate the population growth rate leading up to 1918 and find it to be lower than the 6.8% or 8% posited in these two studies; we also estimate the decadal population growth rate after the pandemic and find it to be well in excess of the 8% assumed earlier. This is in line with Klein’s (1990) characterization of the interwar period as “the beginning of a demographic revolution” (p. 33).
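The shortfall logic behind the earlier estimates can be illustrated with a minimal Python sketch. It projects a base population forward by an assumed decadal growth rate and subtracts the enumerated population. The function name and the two population figures are hypothetical placeholders chosen for round arithmetic, not census values, and applying the 6.8% rate to a full decade is a simplification of Mills' (1986) procedure, which used that rate only for the period up to 1917.

```python
def projected_shortfall(base_population, enumerated_population, decadal_growth):
    """Population expected after one decade of assumed growth, minus the enumerated count."""
    expected = base_population * (1.0 + decadal_growth)
    return expected - enumerated_population


# Placeholder figures in millions (hypothetical, NOT taken from the Census of India).
BASE_1911 = 100.0
ENUMERATED_1921 = 101.0

# Growth-rate assumptions quoted in the text: 8% (Davis 1951) and 6.8% (Mills 1986).
shortfall_at_8_percent = projected_shortfall(BASE_1911, ENUMERATED_1921, 0.08)
shortfall_at_6_8_percent = projected_shortfall(BASE_1911, ENUMERATED_1921, 0.068)

print(f"Shortfall at 8.0% decadal growth: {shortfall_at_8_percent:.1f} million")
print(f"Shortfall at 6.8% decadal growth: {shortfall_at_6_8_percent:.1f} million")
```

With the same two enumerated populations, a lower assumed growth rate produces a smaller shortfall, which is why the choice of growth rate matters so much for the mortality estimate.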
The aim of this article, therefore, is to use analytic methods that were not available to earlier demographers to estimate the loss of population from the influenza pandemic of 1918–1919 in British India using census data. In doing so, we generate a revised figure that is significantly lower than that calculated in the aforementioned studies.
The data used in this article come from the six decennial censuses of British India conducted from 1891 to 1941 (Census of India various years). The 1891 Census of India is the first one for which undercounting amounts to approximately 1% or less of the total population (Davis 1951:27, Table 7, and 235, Appendix A; Klein 1974:200), and for which data collection methods are consistent with those of later censuses.
For the analysis, we focus on those districts of British India that were under direct rule. This ensures that populations for the districts included in the study were enumerated using a similar set of administrative procedures and comparable institutional structures. The districts belonging to the so-called princely states, which had their own civil service systems, were often not as well equipped to carry out the census as the directly ruled districts. The data we use, therefore, form a panel of 199 districts (see Appendix 1 for a list) for six censuses, totaling 1,194 observations. The population statistics for each census were adjusted to conform to the district boundaries as of the 1941 census. This adjustment is discussed in the 1941 census (Yeatts 1943).
A key contribution of this article is to bring recently developed methods in statistical demography to bear on the census data of British India. These methods improve our understanding of the influenza pandemic by simultaneously estimating population growth rates before and after the pandemic and the drop in population without any prior restrictions on what these should be. By analyzing a panel of 199 districts, we exploit the full power of the census data in estimating the impact of the pandemic.
All models were implemented and evaluated using the PROC MIXED and PROC PANEL procedures in the SAS software (SAS 2011a, b). We conducted two tests to evaluate the different model specifications. First, the Hausman test assessed the null hypothesis that the random coefficients are uncorrelated with other regressors, a key assumption of random-coefficients models. We failed to reject the null hypothesis (m statistic = 0.00, p = 1.00). Second, the Breusch-Pagan test assessed the null hypothesis that there is no random variation across districts. We rejected this null hypothesis (test statistic = 2,182.02, p < .0001), indicating significant variation across districts. The two tests in combination favor the choice of the random-coefficients model over the fixed-coefficients specification. The results presented in Table 1, therefore, focus on the random-coefficients models.
In the first model, labeled the “unrestricted model” (Table 1, column 1), we allowed the growth rate before the influenza pandemic to differ from that after the pandemic (i.e., π3i was included in the model). To compare this unrestricted model with the equivalent one using Davis’ (1951) assumption of equal population growth rates before and after the decade of 1911–1921, we also analyzed a model that does not include a change in slope after the epidemic outbreak (i.e., π3i was excluded). We label this model the “restricted (Davis) model” (Table 1, column 2). Table 1 also contains corresponding estimates of the loss of population that occurred between 1918 and 1919 as a result of the epidemic, and estimates of annual rates of population growth before and after 1918–1919.
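The contrast between the two specifications can be illustrated with a small Python sketch. It is not the random-coefficients estimator fitted in SAS: it applies ordinary least squares to a single synthetic series, assumes a log-population scale and a break at 1918, and uses made-up growth rates (0.5% per year before, 1.2% per year after) and a made-up 5% drop. All function and variable names are hypothetical. The unrestricted fit lets the slope change after the break (the role played by π3i), whereas the restricted fit forces one common slope.

```python
import math

CENSUS_YEARS = [1891, 1901, 1911, 1921, 1931, 1941]
PANDEMIC_YEAR = 1918  # break point assumed for this illustration


def synthetic_log_population(year, pre_rate=0.005, post_rate=0.012, drop=0.05):
    """Log population of one made-up district (1 million people in 1891)."""
    base = math.log(1_000_000)
    if year <= PANDEMIC_YEAR:
        return base + pre_rate * (year - 1891)
    at_break = base + pre_rate * (PANDEMIC_YEAR - 1891) + math.log(1 - drop)
    return at_break + post_rate * (year - PANDEMIC_YEAR)


def mean(values):
    return sum(values) / len(values)


def sums_of_squares(xs, ys):
    """Return (Sxx, Sxy) computed about the group means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy


def estimated_drop(xs_pre, ys_pre, xs_post, ys_post, common_slope):
    """Log-scale gap between the fitted pre and post lines at the break year."""
    sxx_pre, sxy_pre = sums_of_squares(xs_pre, ys_pre)
    sxx_post, sxy_post = sums_of_squares(xs_post, ys_post)
    if common_slope:
        # Restricted: one slope pooled over both periods, separate intercepts.
        slope_pre = slope_post = (sxy_pre + sxy_post) / (sxx_pre + sxx_post)
    else:
        # Unrestricted: each period keeps its own slope.
        slope_pre, slope_post = sxy_pre / sxx_pre, sxy_post / sxx_post
    pre_at_break = mean(ys_pre) + slope_pre * (PANDEMIC_YEAR - mean(xs_pre))
    post_at_break = mean(ys_post) + slope_post * (PANDEMIC_YEAR - mean(xs_post))
    return pre_at_break - post_at_break


xs_pre = [y for y in CENSUS_YEARS if y < PANDEMIC_YEAR]
xs_post = [y for y in CENSUS_YEARS if y > PANDEMIC_YEAR]
ys_pre = [synthetic_log_population(y) for y in xs_pre]
ys_post = [synthetic_log_population(y) for y in xs_post]

unrestricted_drop = estimated_drop(xs_pre, ys_pre, xs_post, ys_post, common_slope=False)
restricted_drop = estimated_drop(xs_pre, ys_pre, xs_post, ys_post, common_slope=True)

print(f"Unrestricted drop (log points): {unrestricted_drop:.4f}")
print(f"Restricted drop (log points):   {restricted_drop:.4f}")
```

In this synthetic setting the unrestricted fit recovers the built-in drop of roughly 0.051 log points, while the common-slope fit reports roughly 0.065. Forcing equal growth rates when post-break growth is in fact faster pushes part of the slope difference into the estimated drop.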
A key difference between the unrestricted and restricted (Davis) models lies in the rates of population growth before and after 1918. The restricted (Davis) model shows a decadal rate of slightly above 8%, which is consistent with Davis (1951). However, as the coefficient estimates for the unrestricted model show, the assumption of equal growth rates before and after the pandemic is statistically untenable. The null hypothesis π3i = 0, for all i, is rejected at the 1% level of significance, indicating that the population grew at different rates before and after the pandemic. This finding is also consistent with Klein (1990).
To estimate the population loss from 1918 to 1919, we computed district growth trajectories to produce model-based estimates of the population change for each district, and we summed these changes across districts to create an estimate of the total population change. The population loss was estimated to be 13.88 million people for the unrestricted model and 17.21 million people for the restricted (Davis) model, showing that Davis’ assumption overestimates the number of deaths in 1918–1919 by at least 3.3 million, or 24%. Figure 1 shows graphically how relaxing the assumption of equal growth rates before and after the pandemic affects the population estimates.
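The size of the overstatement follows directly from the two totals. The short Python check below reproduces it; the function name is hypothetical, and taking the unrestricted estimate as the base of the percentage is an assumption inferred from the reported 24%.

```python
UNRESTRICTED_LOSS = 13.88  # millions, unrestricted model
DAVIS_LOSS = 17.21         # millions, restricted (Davis) model


def overstatement(restricted, unrestricted):
    """Return (absolute gap, gap as a percentage of the unrestricted estimate)."""
    gap = restricted - unrestricted
    return gap, 100.0 * gap / unrestricted


gap_millions, gap_percent = overstatement(DAVIS_LOSS, UNRESTRICTED_LOSS)
print(f"Gap: {gap_millions:.2f} million ({gap_percent:.1f}% of the unrestricted estimate)")
```

The gap of 3.33 million is about 24.0% of the unrestricted estimate of 13.88 million.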
As a robustness check, we estimated the same model excluding data from the 1891 census to eliminate the potential effects of the unusually devastating famines of the late 1890s on the models (Davis 1951:28, 39). Excluding these data widens the difference between the estimates of population loss for the unrestricted and restricted (Davis) models (15.51 million versus 25.56 million, respectively; columns 3 and 4 of Table 1 and Fig. 2). Finally, Table 1 also contains estimates from the fixed-coefficients models that correspond to the random-coefficients models emphasized in this article. These estimates demonstrate the robustness of the results across model specifications.
Discussion and Conclusion
The estimates from our models suggest that influenza-related mortality in the districts of British India was lower than 14 million. This number is significantly lower than the estimate of over 17 million based on the Davis model for the same set of districts. The reason for this difference lies in the assumption in Davis (1951) that the rate of population growth before and after the influenza pandemic was uniformly 8% per decade, an assumption that is statistically indefensible. Figure 1 compares the random-coefficients models with and without Davis’ restriction on population growth rates, showing the impact of relaxing the restriction on the estimate of population loss attributable to the pandemic. The unrestricted model shows a drop in population that is visibly smaller than that in the restricted (Davis) model.
The findings of this article raise a number of interesting questions, key among them being that of a possible connection between the influenza pandemic and the subsequent population growth spurt. According to Klein (1988, 1990), this spurt was the consequence of the gradual development of immunity to diseases such as malaria and plague. However, it is entirely plausible that, as in the case of Iran, the influenza pandemic also played a role by causing a one-time spike in mortality among those who suffered from chronic malaria and other diseases (Afkhami 2003), leaving a healthier but diminished population in its wake. A second interesting and related question concerns the disproportionate mortality that influenza caused among people of childbearing age. This would have had a depressing effect on post-pandemic population growth through reduced fertility; hence, the estimates we present are more accurately interpreted as “lost population” or an upper bound on mortality. Taken together, the two aforementioned phenomena suggest that any future study of the impact of the influenza pandemic on India’s population will need to disentangle multiple, possibly countervailing effects.
In sum, using methods of statistical demographic analysis that were not available to earlier demographers who studied mortality from the 1918 influenza pandemic in India, we find evidence that the estimate that has stood for the past 60 years is overstated by at least 24%. This finding is significant because it necessitates a reappraisal of the impact of one of the most devastating epidemiologic events the world has seen, in the country that was the focal point of that event.
This research was made possible by Grant No. 1R21DA025917-01A1 from the National Institute on Drug Abuse (NIDA) of the National Institutes of Health. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of NIDA. The authors would also like to thank participants of the XXXIII Annual Conference of the Indian Association for the Study of Population (IASP) held in Lucknow, India, in 2011, for their input.
Appendix 1: List of Districts Used in the Analysis (colonial spellings)
Appendix 2: Details of Random-Coefficients Models
The coefficients are modeled as varying randomly across districts, and the estimates reported in Table 1 are the mean coefficients across all districts. Details of these models are provided in SAS (2011b).