Abstract
Context: This study examines whether autocratic governments are more likely than democratic governments to manipulate health data. The COVID-19 pandemic presents a unique opportunity for examining this question because of its global impact.
Methods: Three distinct indicators of COVID-19 data manipulation were constructed for nearly all sovereign states. Each indicator was then regressed on democracy and controls for unintended misreporting. A machine learning approach was then used to determine whether any of the specific features of democracy are more predictive of manipulation.
Findings: Democracy was found to be negatively associated with all three measures of manipulation, even after running a battery of robustness checks. Absence of opposition party autonomy and free and fair elections were found to be the most important predictors of deliberate undercounting.
Conclusions: The manipulation of data in autocracies denies citizens the opportunity to protect themselves against health risks, hinders the ability of international organizations and donors to identify effective policies, and makes it difficult for scholars to assess the impact of political institutions on population health. These findings suggest that health advocates and scholars should use alternative methods to estimate health outcomes in countries where opposition parties lack autonomy or must participate in uncompetitive elections.
Are autocratic governments more likely than democratic governments to manipulate policy-relevant data? According to one view, autocratic and democratic leaders both have an incentive to misreport data when the truth may reveal incompetence, leading to protest or electoral losses. However, data manipulation is harder to achieve in a democracy because the government is subject to greater scrutiny from opposition parties, independent media, and civil society organizations (Carlitz and McLellan 2021; Hollyer, Rosendorff, and Vreeland 2011). According to another view, autocratic governments are primarily concerned about the interpretation of information, rather than access to information (Rozenas and Stukal 2019). Autocratic leaders may calculate that bad news will not lead to collective action if they can persuade citizens that the government was not responsible. Autocrats can, for example, use their control over media and the internet to convince citizens that a bad outcome is the result of external forces (e.g., global macroeconomic factors) or natural phenomena (e.g., a pandemic) that are beyond the government’s control.
It is not immediately obvious, therefore, that autocratic leaders are more likely to provide inaccurate data than their democratic counterparts. In this article I argue that autocrats retain an incentive to manipulate data even as they seek to shift the blame to external forces or natural phenomena. This is because they cannot be sure whether their attempts to persuade citizens that they are not responsible will succeed. They have a greater capacity to shape perceptions than democratic leaders do, but there is a risk that a critical number of citizens will remain unconvinced, leading to criticism and protest. Democratic leaders, by contrast, are less able to successfully deploy either strategy—that is, to manipulate data or to shift the blame—because they lack control over traditional and online media, opposing political parties, and civil society groups.
The devastating pandemic produced by the spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) provides a unique opportunity to examine whether autocratic governments are more susceptible to data manipulation than their democratic counterparts. This is because the sheer scale of the shock, and therefore the potential threat posed to each government’s legitimacy in the eyes of their citizens, magnifies the incentive for political leaders to hide the truth. Furthermore, the pandemic affected nearly every territory in the world, so we can examine the phenomenon of data manipulation for a large sample of countries. The empirical analysis in this study encompasses 172 sovereign states that represent more than 99% of the world’s population.
Previous studies on the impact of regime type on data manipulation have mostly focused on the reporting of data in autocratic contexts (e.g., Carlitz and McLellan 2021; Chen et al. 2019; Kofanov et al. 2023; Lamberova and Sonin 2023; Wallace 2022, chap. 6). They do not, therefore, provide a systematic comparison between data manipulation in autocratic and democratic countries. A study by Hollyer and colleagues (2011) represents a partial exception, but their focus is on the withholding of policy-relevant data, rather than the misreporting of such data. As Carlitz and McLellan (2021) note, many autocracies are now more willing to provide development data because of the expectations of foreign aid donors as well as the targets specified in the UN’s Millennium Development Goals and Sustainable Development Goals. The question is whether they accurately report such data.
Magee and Doces (2015) and Martínez (2022) investigate the impact of regime type on the manipulation of economic data for a large number of democratic and autocratic countries. Using nighttime lights as a proxy for gross domestic product (GDP), both studies find that autocratic regimes tend to overstate yearly GDP growth. In the current study I shift the focus to the reporting of population health statistics. Five recent studies examine the relationship between regime type and the manipulation of COVID-19 data using either tests for statistical irregularities (Adiguzel, Cansunar, and Corekcioglu 2020; Kapoor et al. 2020; Kilani 2021) or the discrepancy between reported and excess mortality (Knutsen and Kolvani 2022; Neumayer and Plümper 2022), for a large sample of countries.1 All found evidence that autocracies doctored COVID-19 data more than democracies did.
In this study, I build on those earlier results by using three distinct measures of manipulation: ratio of excess deaths to reported deaths, lack of compliance with the expected distribution of digits in reported numbers, and lack of randomness in the sequence of daily counts across time. Moreover, for the sake of further robustness, I use four different estimates of excess mortality and five different measures of democracy. This multimeasurement approach helps reduce the likelihood that the overall conclusions of this study are an artifact of a particular estimation model. Finally, I move beyond existing studies by examining whether any of the components of democratic rule—free and fair elections, freedom of civil and political association, freedom of expression, suffrage, and elected officials—have an outsized influence on the level of data manipulation. This is important because it promises to provide a more precise indication of the regime characteristics that lead to the deliberate misreporting of official statistics. This will, in turn, help researchers and international donors pinpoint countries that are prone to fabricating policy-relevant data.
In what follows, I first elaborate on the theory behind the claim that autocratic governments manipulate policy-relevant statistics more than democratic governments do. I then present preliminary evidence supporting the claim that autocracies manipulate data more than democracies. Then I describe the regression estimation models, results, and 12 robustness checks. In the subsequent section I discuss how the results align with the existing literature on the link between regime survival and information control. In the penultimate section I present the limitations of this study along with avenues for future research. I conclude by highlighting the challenges that data manipulation creates for citizens, researchers, and international organizations.
Theory
Autocratic leaders have an incentive not to share accurate information if it might lead to their removal via mass mobilization. That is, full and accurate disclosure may enable citizens first to become aware of any failure in government policy, and second to realize that their fellow citizens are also aware of that failure. As a result, each citizen has enough information to judge whether protests will generate sufficient participation to remove the government (Hollyer, Rosendorff, and Vreeland 2015). Incumbent democratic leaders also have an incentive to tamper with published data when the actual data might threaten their performance at the ballot box. However, it is harder for them to hide poor performance in this way because they are subject to greater scrutiny from opposition parties, civil society, and media. There is a greater likelihood that manipulation by a democratic government will be detected, simultaneously drawing additional attention to the negative news it was attempting to hide and tainting its credibility in the eyes of voters.
Autocratic leaders may also attempt to use their control over traditional and online media to persuade citizens that they are not responsible for bad news (Rozenas and Stukal 2019). They may, for example, promote the view that the COVID-19 pandemic is an unstoppable natural phenomenon, or that its spread is due to the containment failures of other countries. Indeed, this may be the preferred approach in those cases where the disclosure of full and accurate data helps to combat the problem. Hiding the severity of an epidemic, for example, may lure citizens into a false sense of security, frustrating attempts to encourage life-saving changes in behavior (e.g., physical distancing, wearing masks, and vaccination).
However, autocratic leaders cannot solely rely on that strategy because of the risk that it will fail. They have a greater capacity to shape the perceptions of citizens than democratic leaders do, but they cannot rule out the possibility that a critical number of citizens will remain unpersuaded, thereby exposing them to criticism and protest. Ironically, the lack of communication openness that autocrats need to prevent collective action also makes it harder for them to gauge the proportion of citizens who do not believe their spin (Schedler 2013: 37–39). Fear of reprisal means the outward behavior or expressed opinion of citizens in an autocratic context may not track whether they actually believe the government’s narrative (Jiang and Yang 2016; Wedeen 2015). Under these conditions of uncertainty, autocrats are more likely to conclude that a combination of data manipulation and shifting the blame (i.e., hiding the true extent of the bad news and spinning that news) represents the optimal way to prevent collective action.
Generally, the exact balance between data manipulation and spin will depend on context. Autocrats may reduce the level of manipulation when citizens can reliably infer the real data values (e.g., by comparing official inflation statistics with supermarket prices) or when data manipulation starts to threaten the implementation of policy.2 Excessive underreporting of COVID-19 infections and deaths, for example, may frustrate attempts to encourage physical distancing, wearing masks, and vaccination. In such cases autocrats are more reliant on persuading citizens that they are not responsible for any misfortunes befalling society. By the same token, autocrats may increase the level of manipulation when citizens find it harder to verify data (e.g., deaths from a pandemic, a conflict, or a famine) or it is less likely to hinder the implementation of policy (e.g., famine response). In such cases they are less reliant on persuading citizens that they are not responsible.
Another strategy available to political leaders is to simply not publish any data that may reveal incompetence (Merridale 1996). However, citizens can detect the complete withholding of data about a politically sensitive topic, whereas it is harder for them to detect the misreporting of data. In this sense the withholding of data is akin to the overt censorship of content in that both are observable. As with overt censorship, the absence of data on a salient topic may rouse citizens’ suspicions, thereby encouraging them to expend more effort to learn about the topic (e.g., circumventing the government’s control over the internet by using virtual private networks to access blocked content) (Roberts 2020, chap. 4). Moreover, those efforts may lead them to uncover information about other politically sensitive topics (Hobbs and Roberts 2018). Because the withholding of data may backfire, the fabrication of published data typically represents a less risky way for autocratic leaders to forestall collective action.
The general implication of the above discussion is that governments will manipulate official statistics to conceal policy failures or embellish their achievements, but only if their attempt to deceive will go undetected by a critical number of citizens (and the deception does not threaten the implementation of policy). The risk of exposure is greater in democracies because opposition parties, media, and civil society groups can collect and publish their own data and publicly question the veracity of the government’s data (World Bank 2021a: 32–33, 69–70). Thus, we would expect data manipulation to be negatively associated with those features of democratic rule that protect the independence of opposition parties and their ability to win elections (e.g., absence of electoral fraud), media freedom (e.g., absence of censorship), and freedom of civil association (e.g., absence of threats to members of nongovernmental organizations such as medical associations and aid groups). In keeping with this expectation, I present evidence below of a negative association between data manipulation and the overall level of democracy as well as each of these key attributes of democratic rule.
Preliminary Evidence
During the COVID-19 pandemic, governments seeking to hide the true death toll had two options: either exclude some deaths from the publicly disclosed data, or attribute those fatalities to alternative causes. Opting for the former poses a risk for these governments, because local health officials may notice the mismatch between the total deaths they have recorded and the figures published by the government. In other words, concealing the total death count would require the censoring or co-optation of numerous local health workers, along with other officials engaged in formal postdeath procedures, such as burial or cremation, inheritance, taxation, and social security (Lamberova and Sonin 2023: 2–3, 16; Kofanov et al. 2023: 843). In the case of Iran, for example, the government’s attempt to conceal the pandemic’s true death toll was exposed via a data leak initiated by whistleblowers within the health care system (BBC 2020). Similarly, the release of internal health ministry data in El Salvador revealed that the government failed to publicly disclose two thirds of the country’s pandemic-related deaths (Taylor 2023).
The second option is less likely to be detected by citizens and public sector workers, or at least it will be harder for them to prove. This suggests that it will typically be the preferred method of deception. Indeed, there is strong evidence that the governments of Belarus, China, Kazakhstan, Nicaragua, Russia, and Tajikistan encouraged officials to assign alternative causes of death to those who died with COVID-19 symptoms. This involved attributing deaths to causes with similar symptoms (i.e., other respiratory illnesses such as regular influenza or pneumonia) or to other causes present at the time of death (i.e., comorbidities such as cancer, cardiovascular disease, etc.) (HRW n.d.; Ibbotson 2020: 529–31; Ivanova and Tsvetkova 2020; Kobak 2021: 16; Lamberova and Sonin 2023: 9; McMorrow and Liu 2023; SATIO 2020: 4).3 For example, following the easing of strict quarantine and testing measures in China in late 2022, the government refined the criteria for classifying COVID-19 deaths. Specifically, the definition was narrowed down to encompass only individuals who succumbed to respiratory failure or pneumonia after testing positive for the virus. As a result, deaths of individuals who were untested or that were attributed to nonrespiratory illnesses were not recorded as COVID-19 deaths (McMorrow, Liu, and Yu 2023). Subsequent cremation data inadvertently leaked by a regional government in early 2023 indicated a COVID-19 death toll far higher than that officially reported by the central government (Dyer 2023).
It is crucial to underscore that identifying case-study evidence of data manipulation in autocratic countries is inherently difficult because of restrictions on communication openness. Consequently, the eight countries mentioned above likely represent only a subset of the autocracies engaged in deliberate underreporting during the pandemic. Conversely, identifying instances of manipulation in democratic countries should be more straightforward, as the media, opposition, and civil society groups can challenge official statistics without fear of reprisal. Despite this fact, what we find is a dearth of clear-cut examples in democratic states. Nonetheless, it is important to recognize that such observations offer, at best, anecdotal support for the assertion that autocracies are more prone to manipulating data than democracies.
Figure 1 shows the predicted weekly undercount ratio for the first two years of the pandemic. As we can see, the ratio reached a higher level for the two autocratic regime types. In July 2021, for example, electoral autocracies typically reported 14 deaths for every 100 actual deaths, while electoral democracies typically reported 33 deaths. This supports the anecdotal evidence that autocratic regimes deliberately underreported COVID-19 mortality data more than democratic regimes did.
However, this variation may be explained by other factors that are correlated with regime type (e.g., data-collecting capacity). In other words, the misreporting of data may occur even when a government has no intention to mislead. In the case of Venezuela, for example, it is difficult to determine how much the high rate of underreporting during the pandemic was the result of deliberate attempts by the Maduro government to hide the true extent of the death toll and how much was due to the parlous state of the health care system (Taylor 2021). This implies that any analysis of data manipulation must take into account those factors independent of the incentives of political leaders that may produce mismeasurement. Thus, in the following empirical analysis I control for the overall capacity of the health care system, the capacity of the government to gather data, and the vulnerability of the population to pandemic deaths that result from epidemiological and geographic features.
Methods
The baseline model regresses each indicator of manipulation for country i on democracy, health system capacity, information-gathering capacity, pandemic vulnerability, and a set of additional controls Z. Here democracy is the main independent variable of interest, health system capacity is a set of variables capturing the ability of each country to respond to the pandemic, information-gathering capacity is a variable capturing the government’s ability to collect and process policy-relevant data, pandemic vulnerability is a set of variables capturing the extent to which each population is susceptible to COVID-19 mortality, Z is the set of additional control variables, and i indexes countries.
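Read against these term definitions, the baseline specification can be written roughly as follows. This is a reconstruction from the variable list rather than a quotation: the coefficient symbols, the intercept, and the error term are assumed notation, and bold coefficients mark terms that the text describes as sets of variables.

```latex
\begin{equation*}
\text{manipulation}_i = \beta_0 + \beta_1\,\text{democracy}_i
  + \boldsymbol{\beta}_2^{\top}\,\text{health system capacity}_i
  + \beta_3\,\text{information-gathering capacity}_i
  + \boldsymbol{\beta}_4^{\top}\,\text{pandemic vulnerability}_i
  + \boldsymbol{\gamma}^{\top} Z_i + \varepsilon_i
\end{equation*}
```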
I use three different indicators to estimate manipulation in each country: the undercount ratio and two indicators that gauge the extent to which daily reported cases and deaths depart from expected statistical patterns (Benford noncompliance and underdispersion). The undercount ratio is cumulative excess deaths divided by cumulative reported deaths (logged) as of December 31, 2021. The data source for reported deaths is the Johns Hopkins Coronavirus Resource Center (Dong, Du, and Gardner 2020). The data source for excess deaths is the COVID-19 projections produced by the Institute for Health Metrics and Evaluation (IHME) (Wang et al. 2022). IHME used a three-step method to estimate excess deaths. First, in those locations with sufficient total mortality data, excess deaths were estimated based on a weekly or monthly comparison between mortality due to all causes and what would have been expected based on past trends and seasonality. Second, a statistical model was constructed using covariates to predict the excess deaths in those locations. Third, the predictive model was used to estimate excess deaths in those locations without sufficient mortality data. The procedure produced excess mortality estimates for 185 countries, encompassing virtually all of the world’s population. In the robustness section I also test whether the baseline results hold when I use three alternative estimates of excess mortality produced by the World Health Organization (WHO 2022), The Economist (2022), and Karlinsky and Kobak (2021).
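The undercount ratio defined above can be illustrated with a minimal Python sketch. The function name and the input figures are hypothetical, and returning NaN for nonpositive ratios is an assumption of the sketch, since the text does not say how such cases are treated before logging.

```python
import math

def log_undercount_ratio(excess_deaths: float, reported_deaths: float) -> float:
    """Log of cumulative excess deaths over cumulative reported deaths.

    Returns NaN when the ratio is not positive (negative or zero excess
    deaths, or zero reported deaths), because the logarithm is undefined
    there. That handling is an assumption of this sketch.
    """
    if reported_deaths <= 0 or excess_deaths <= 0:
        return math.nan
    return math.log(excess_deaths / reported_deaths)

# Hypothetical country: 50,000 excess deaths against 10,000 reported deaths.
value = log_undercount_ratio(50_000, 10_000)
ratio = math.exp(value)  # back-transform the logged value to the raw ratio
```

A country with negative cumulative excess deaths has no defined logged ratio, which is one reason such cases need separate treatment in any analysis of this kind.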
Benford noncompliance is measured in terms of the Kolmogorov-Smirnov test statistic (logged), which allows us to estimate the extent to which daily reported cases and deaths (for the period January 22, 2020, to December 31, 2021) deviate from Benford’s Law. According to that law, first digits in nonmanipulated data should accord with a distribution where the number 1 is the most likely to occur and the remaining digits are increasingly less likely to occur (Tam Cho and Gaines 2007). I provide a complete description of the steps used to measure Benford noncompliance in the online appendix. I was able to construct this measure of manipulation for 183 countries and territories.
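To make the idea concrete, the following Python sketch computes a Kolmogorov-Smirnov-type distance between the first-digit distribution of a series and the Benford distribution. It is an illustration only: the exact steps used in this study are described in the online appendix, and the function names, the handling of zero counts, and the example series here are all assumptions.

```python
import math
from typing import Sequence

# Benford's Law: P(first digit = d) = log10(1 + 1/d) for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def benford_ks(counts: Sequence[int]) -> float:
    """Largest gap between the empirical and Benford cumulative
    distributions of first digits. Zero counts are dropped because
    they have no leading digit (an assumption of this sketch)."""
    digits = [first_digit(c) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    n = len(digits)
    emp_cum = ben_cum = gap = 0.0
    for d in range(1, 10):
        emp_cum += digits.count(d) / n
        ben_cum += BENFORD[d - 1]
        gap = max(gap, abs(emp_cum - ben_cum))
    return gap

# Hypothetical series: exponential growth tracks Benford closely, whereas
# counts confined to 500-599 all share the first digit 5.
growing = [2 ** k for k in range(1, 200)]
flat = list(range(500, 600))
stat_growing = benford_ks(growing)
stat_flat = benford_ks(flat)
```

The statistic is near zero for the growing series and large for the flat one, which is the sense in which a larger value signals greater departure from Benford's Law.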
Underdispersion is measured using the index constructed by Dmitry Kobak (2022) (logged). That index gauges the extent to which reported cases and deaths (for the period March 3, 2020, to January 30, 2022) deviate from the expected variation in reported cases and deaths across time. Reported COVID-19 data should fluctuate randomly across days of the week because of the nature of the data-generating process. Thus, daily counts that vary smoothly across time suggest the presence of tampering. This measure of manipulation is available for 210 countries and territories.
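The intuition can be shown with a generic variance-to-mean check in Python. This is not Kobak's index, whose construction is not reproduced here. It is a simplified stand-in that assumes counts behave roughly like Poisson draws, for which the variance is expected to be close to the mean. The function name and both series are hypothetical.

```python
import statistics

def dispersion_ratio(daily_counts):
    """Sample variance divided by the mean. Under a Poisson model this is
    expected to be about 1; values far below 1 mean the series is smoother
    than chance variation alone would produce."""
    mean = statistics.mean(daily_counts)
    if mean <= 0:
        raise ValueError("mean count must be positive")
    return statistics.variance(daily_counts) / mean

smooth = [100, 101, 99, 100, 102, 98, 100]  # hypothetical, suspiciously flat
noisy = [100, 118, 87, 95, 109, 92, 99]     # hypothetical, Poisson-like spread

smooth_ratio = dispersion_ratio(smooth)
noisy_ratio = dispersion_ratio(noisy)
```

Both series have the same mean, but the first varies far less than a count process of that size would normally allow, so its ratio falls well below 1.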
The indicator for democracy is the Electoral Democracy Index from the Varieties of Democracy (V-Dem) project (Coppedge et al. 2023). That index is scaled to a continuous interval ranging from 0 (lowest level of democracy) to 1 (highest level of democracy). It combines five components of democratic rule that are intended to ensure that political leaders are sufficiently responsive to citizens: suffrage, elected officials, free and fair elections, freedom of civil and political association, and freedom of expression and access to alternative information. Those components also implicitly capture the extent to which the government’s data reporting is subject to scrutiny by media, opposition political parties, and civil society groups. In a subsequent section, I examine the ingredients of democracy at a more granular level to determine the relative importance of those three institutions for data manipulation. The V-Dem indicators are available for 172 sovereign states. Arguably, V-Dem’s indicators are methodologically superior to the other democracy indices that are currently available (Boese 2019; Coppedge et al. 2017). Nevertheless, in the robustness section I test whether the baseline results hold when four alternative indicators of democracy are used.
Health system capacity, information-gathering capacity, and pandemic vulnerability are intended to control for underreporting that is not deliberate. I use two variables to capture health system capacity: GDP per capita (logged) (base 2010 international dollars) and health service capacity and access (GBDCN 2020a; WHO n.d.). The latter index measures the density of hospital beds and health care professionals as well as the level of preparedness for public health events of international concern (WHO 2019). I constructed an indicator of information-gathering capacity based on the latent factor analysis of three input variables: Hanson and Sigman’s (2021) measure of census frequency, the World Bank’s (2021b) Statistical Capacity indicator, and Brambor and colleagues’ (2020) information capacity index. I describe those input variables and the method for identifying the underlying factor in more detail in the online appendix. To capture greater pandemic vulnerability due to factors such as seasonality, population density, and the presence of comorbidities, I use three variables: prevalence of lower respiratory diseases, prevalence of noncommunicable diseases, and median age (GBDCN 2020b; UNDESA 2022).
To address the possibility of region-specific factors, I also include World Bank regions as additional control variables in all the models. All the continuous independent variables are for the year 2019 (except for median age and information-gathering capacity, which are for 2015), given the likelihood that the pandemic affected political institutions, economic growth, the delivery of routine health care, and the spread of other respiratory pathogens. Variable descriptions, summary statistics, and correlation matrices are reported in the online appendix.
Estimation Results
Figure 2 presents the results of the regression models that take into account all the control variables. As we can see, the democracy indicator is negatively associated with all three indicators of manipulation: the ratio of excess to reported deaths (fig. 2a), the extent to which daily reported cases and deaths fail to comply with Benford’s Law (fig. 2b), and the extent to which the sequence of daily reported cases and deaths is unexpectedly smooth (fig. 2c).
In terms of undercounting, a 10% increase in the level of democracy (i.e., a 0.1 increase in the Electoral Democracy Index’s 0–1 scale, roughly the difference between Belarus and Serbia in 2019) is associated with a reduction in the ratio of 9.45% (95% confidence interval [CI] = 3.99, 14.61). Put differently, excess mortality is predicted to be nearly 10 times greater than reported mortality (9.83; 95% CI = 7.35, 12.31) in countries with a democracy score of 0.25 in 2019 (e.g., Kazakhstan). By contrast, excess mortality is predicted to be nearly six times greater than reported mortality (5.98; 95% CI = 5.03, 6.93) in countries with a democracy score of 0.75 in that year (e.g., Romania) (fig. 2a). The average predicted ratio for all countries was 7.46 (95% CI = 6.33, 8.61).
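Because the undercount ratio enters the model in logs, the percentage effect and the two predicted ratios reported here are linked by simple arithmetic. The Python sketch below works through that link. It assumes a plain log-linear relationship with everything else held fixed, so it is a consistency check on the reported figures rather than a reproduction of the fitted model, and the function names are hypothetical.

```python
import math

def pct_change_per_step(beta: float, step: float = 0.1) -> float:
    """Percent change in an outcome whose log is regressed on x,
    for an increase of `step` in x."""
    return 100.0 * (math.exp(beta * step) - 1.0)

def implied_beta(pct_change: float, step: float = 0.1) -> float:
    """Invert the formula above to recover the slope on the log scale."""
    return math.log(1.0 + pct_change / 100.0) / step

# Reported in the text: a 0.1 rise in democracy goes with a 9.45% lower ratio.
beta = implied_beta(-9.45)

# Moving from a score of 0.25 to 0.75 is five such steps, so the predicted
# ratio at 0.25 (9.83) shrinks by a factor of exp(beta * 0.5).
predicted_at_075 = 9.83 * math.exp(beta * 0.5)
```

Under these assumptions the implied ratio at a democracy score of 0.75 comes out at roughly 5.98, in line with the predicted value reported for that score.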
In terms of the index for Benford noncompliance, a 10% increase in the level of democracy is associated with a 6.46% (95% CI = 1.91, 10.81) reduction in the index. Put differently, the predicted statistics for countries with democracy scores of 0.25 and 0.75 in 2019 are 0.081 (95% CI = 0.07, 0.093) and 0.058 (95% CI = 0.051, 0.066), respectively (fig. 2b). The average predicted noncompliance for all countries was 0.068 (95% CI = 0.063, 0.073).
In terms of the index for underdispersion, a 10% increase in the level of democracy is associated with an 11.02% (95% CI = 5.15, 16.53) reduction in the index. Put differently, the predicted indexes for countries with democracy scores of 0.25 and 0.75 in 2019 are 1.44 (95% CI = 1.09, 1.78) and 0.84 (95% CI = 0.74, 0.94), respectively (fig. 2c). The average predicted underdispersion for all countries was 1.07 (95% CI = 0.95, 1.17).
Regime Characteristics and Data Manipulation
I now turn my attention from the general level of democracy to the specific features of democratic rule. Figure 3 presents the relative importance of the components used to construct V-Dem’s Electoral Democracy Index. A random forest regression was used to determine the relative predictive strength of each component for the undercount ratio. Random forest analysis is an ensemble learning approach that aggregates the predictions of many regression trees, each fit to a bootstrapped sample of the data. It often predicts more accurately than a standard regression, especially when there is a large number of independent variables. Looking at the results in terms of the 24 subcomponents of democracy (y-axis variables), we can see that independence of opposition parties from the ruling regime is the most important predictor. Looking at the results in terms of the main components of democracy (see legend, fig. 3), we can see that free and fair elections is the most important predictor.
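The logic behind a variable-importance ranking of this kind can be sketched in plain Python. The code below is a deliberately stripped-down stand-in, not the study's implementation: it grows one-split trees (stumps) on bootstrap resamples with a random subset of predictors each, and credits each predictor with the reduction in squared error it achieves. A full random forest grows deeper trees and is normally fit with a dedicated library. All names and the synthetic data are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_stump(rows, targets, features):
    """Best single split among `features`: returns (feature, sse_reduction)."""
    base = sse(targets)
    best = (None, 0.0)
    for f in features:
        # Skip the largest value so the right-hand side is never empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            gain = base - sse(left) - sse(right)
            if gain > best[1]:
                best = (f, gain)
    return best

def stump_forest_importance(rows, targets, n_trees=100, mtry=2, seed=0):
    """Share of total squared-error reduction credited to each predictor."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    totals = [0.0] * p
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        feats = rng.sample(range(p), mtry)          # random predictor subset
        f, gain = best_stump([rows[i] for i in idx],
                             [targets[i] for i in idx], feats)
        if f is not None:
            totals[f] += gain
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals

# Synthetic data: the outcome depends strongly on predictor 0, weakly on
# predictor 1, and not at all on predictor 2.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(40)]
targets = [3.0 * r[0] + 0.3 * r[1] + data_rng.gauss(0, 0.1) for r in rows]
importance = stump_forest_importance(rows, targets)
```

In this toy setting the first predictor receives the largest importance share, which mirrors how the ranking in a figure of this kind is read: the predictors at the top are those whose splits most reduce prediction error across the ensemble.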
Overall this suggests that data manipulation during the pandemic was less prominent in those countries where the opposition were in a position to challenge the data presented by the government. Although this finding is inductively derived, it accords with this study’s basic theoretical claim. That is, political leaders will manipulate data to downplay bad news, to the extent that their subterfuge remains undetected by a critical number of citizens (and the deception does not fundamentally threaten the government’s ability to implement policy). Genuinely independent and competitive opposition parties threaten the government’s ability to manipulate data undetected because they can expose the incumbent’s fraud while in opposition or after being elected into office. The same expectation applies to independent media and civil society organizations, but the random forest results suggest that the risk of exposure by the opposition represents the main constraint on data manipulation. Nevertheless, it is worth noting that government efforts to censor media constitute a comparatively important subcomponent. The freedom of civil association subcomponents are comparatively unimportant (with the partial exception of government attempts to repress civil society organizations).
This finding is broadly consistent with the results of a recent study that examined manipulation of COVID-19 data at the regional level in Russia (Lamberova and Sonin 2023). The authors found that the level of manipulation (measured in terms of the discrepancy between reported and excess mortality) was higher when the share of seats in the regional parliament held by members of Putin’s United Russia was higher. They also found that manipulation was negatively associated with an index that captures the competitiveness and autonomy of opposition parties at the regional level. On the other hand, they found no evidence of an association between manipulation and the share of the regional population employed in nongovernmental organizations. This also accords with the random forest regression results.
Robustness Checks
I report five types of robustness check in the online appendix (see section 8). First, I examine whether the results are affected when a dummy variable for island states is added to the set of covariates. It may be argued that island democracies such as Taiwan, Iceland, and New Zealand were blessed with a natural advantage when it came to slowing the spread of the virus. As a result, perhaps they were better positioned to keep an accurate count of the number of COVID-19 deaths (column 2). Moreover, five of those island states (Australia, Iceland, New Zealand, Singapore, and Taiwan) registered negative excess deaths during the first two years of the pandemic, in part because their public health response reduced the spread of other respiratory pathogens. It is not clear whether those five countries are biasing the results in some way. Nevertheless, the addition of the dummy variable for island states provides one way to control for that possibility. I also test whether the baseline result holds when those five countries are dropped from the sample (column 3).
Second, I include dummy variables for contemporaneous disasters and conflicts (column 4). Other shocks that occurred during the pandemic may have increased excess deaths even though those deaths were not due to COVID-19 and therefore do not indicate undercounting. To identify disasters that took place during 2020 and 2021, I used the Emergency Events Database compiled by the Center for Research on the Epidemiology of Disasters (Guha-Sapir n.d.). To identify armed conflicts that took place during those two years I used the Battle-Related Deaths Dataset (version 22.1) constructed by the Uppsala Conflict Data Program (Pettersson et al. 2021).
Third, I examine whether the results hold when three indicators for administrative capacity are included among the covariates (column 5): tax revenue as a percentage of GDP (Heritage Foundation n.d.) (logged), rigorous and impartial public administration (Coppedge et al. 2023), and mean war mortality during the 10 years before the pandemic (GBDCN 2020a) (logged). It may be argued that a negative association between democracy and underreporting is observed because democracies are typically characterized by a greater capacity to implement policy. That is, administrative capacity, rather than democracy itself, may explain the association with underreporting (Stasavage 2020). Having said that, some scholars contend that the capacity to gather and process information represents a suitable proxy for administrative capacity because it is a precondition for the collection of taxes as well as the successful implementation of law and policy (Brambor et al. 2020; D’Arcy and Nistotskaya 2017; Lee and Zhang 2017). If that is correct, then the baseline model already controls for administrative capacity. Nevertheless, I include these three additional controls to address the possibility that information-gathering capacity by itself does not fully capture overall administrative capacity.
Fourth, I examine whether the results hold when four alternative indicators of democracy are used (columns 6–9): the dichotomous Democracy-Dictatorship index (Bjørnskov and Rode 2020), the Lexical Index of Electoral Democracy (Skaaning, Gerring, and Bartusevičius 2015), the polychotomous Polity2 index (Marshall and Gurr 2020), and the continuous Machine Learning Democracy Index (Gründler and Krieger 2021). These indicators, along with V-Dem’s Electoral Democracy Index, represent distinct ways to conceptualize and measure the level of democracy in each country. Thus, it is important to check whether the baseline results are sensitive to the selection of democracy indicator.
Fifth, I examine whether the results hold when undercounting is calculated based on the estimates of excess deaths produced by the World Health Organization (WHO 2022), The Economist (2022), and Karlinsky and Kobak (2021) (columns 10–12). It is important to check that the baseline results are not merely an artifact of the particular estimation method used by IHME. As before, I use cumulative excess deaths up to the end of December 2021. Like IHME, WHO and The Economist used covariate prediction models based on countries with sufficient all-cause mortality data to generate estimates for locations without sufficient data. However, it may be argued that the prediction models used by those three organizations are built based on countries whose characteristics do not adequately represent the countries for which they aim to provide estimates (Adam 2022). One advantage of Karlinsky and Kobak’s estimates is that they are restricted to the 101 countries with sufficiently complete vital statistics, so they are not model dependent. This comes at a cost, however, because the excluded countries may be self-selecting based on regime type.
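The comparison across sources can be illustrated with a short sketch. The ratio of excess deaths to reported deaths used below is one possible way to express undercounting and is an assumption, not necessarily the exact measure constructed in this study. The figures are placeholders for a single hypothetical country, and the `undercount_ratio` helper is hypothetical as well.

```python
def undercount_ratio(excess_deaths, reported_deaths):
    """Excess deaths divided by reported COVID-19 deaths.

    Returns None when the ratio is undefined (no reported deaths).
    A value above 1 means excess deaths exceed the reported count;
    a negative value arises when estimated excess deaths are negative.
    """
    if reported_deaths <= 0:
        return None
    return excess_deaths / reported_deaths


# Placeholder figures for one hypothetical country; NOT real estimates.
reported = 1_000
excess_by_source = {
    "IHME": 4_000,
    "WHO": 3_500,
    "The Economist": 3_800,
    # None marks a country outside the sample covered by this source.
    "Karlinsky and Kobak": None,
}

ratios = {
    source: undercount_ratio(excess, reported) if excess is not None else None
    for source, excess in excess_by_source.items()
}

for source, ratio in ratios.items():
    print(source, "not covered" if ratio is None else f"{ratio:.2f}")
```

Keeping missing coverage explicit (as `None`) rather than dropping the country makes it easier to see how the sample changes when a source restricted to countries with complete vital statistics is used.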
Reassuringly, each of the robustness checks is consistent with the baseline finding. However, even with the inclusion of a number of control variables and a range of robustness checks, it remains possible that omitted factors are driving the results. Thus, these results should be seen as providing suggestive, rather than conclusive, evidence.
Discussion
Overall, these results indicate that autocratic political leaders are more likely to deliberately underreport COVID-19 data than their democratic counterparts are. This finding is consistent with existing research on the way in which the political survival of autocrats is dependent on their ability to manage the information available to citizens (Carlitz and McLellan 2021; Hollyer, Rosendorff, and Vreeland 2015; King, Pan, and Roberts 2013; Stockmann and Gallagher 2011). Indeed, there is growing evidence that the balance between information control and repression in autocracies has shifted over the last two decades. The new breed of autocrat places more emphasis on the manipulation of information than the inculcation of fear in order to prolong their tenure in power (Guriev and Treisman 2019). According to that approach to regime survival, the key is to prevent disgruntled citizens from becoming aware that there are a sufficient number of them to overthrow the government. The threat of repression remains as a deterrent, but that may not suffice if a critical number of citizens manage to overcome the collective action problem. Moreover, traditional repression is more likely to attract the attention of the international community, raising the prospect of sanctions and the withdrawal of financial aid. I have argued that the misreporting of policy-relevant statistics remains an important means for the autocrat to block access to the information necessary for collective action, even as they endeavor to persuade citizens that they are not responsible for bad outcomes. The autocrat’s own lack of information about the degree to which citizens believe their spin on bad news means they also prefer to hide the amount of bad news.
During the pandemic, at least two authoritarian states adopted a different approach to information control. Rather than undercounting the number of COVID-19 deaths, the governments of Tanzania and Turkmenistan simply denied the presence of the virus in their countries (Carlitz, Yamanis, and Mollel 2021; Ibbotson 2020). Denial precluded the very need to report data or to spin bad news. However, even though it would have been difficult for citizens to ascertain the exact death toll, it would have been increasingly obvious that a deadly contagion was spreading through their communities. Persuading citizens that the death count was lower than elsewhere would have become an easier proposition than persuading them that there were no deaths to count in the first place. Moreover, denial would have made it very difficult to implement public health policies designed to limit the number of cases and deaths. Unsurprisingly, therefore, nearly all other governments chose to regularly release mortality data during the pandemic. The Tanzanian government did eventually report some mortality data, but data releases were few and far between, and they likely severely understated the true death toll (by the end of 2021, for example, reported deaths in that country were 180 times lower than the excess mortality estimated by IHME). Generally speaking, the withholding of economic and development data is now less common than it used to be because of the expectations of international organizations, aid donors, and investors (Carlitz and McLellan 2021). Moreover, because the complete withholding of information is observable, it may raise suspicions among citizens, thereby encouraging them to seek out more information about the topic the government wishes to hide (Roberts 2020, chap. 4). However, while governments have an incentive to release policy-relevant statistics, the incentive to doctor them remains. Paying lip service to transparency while at the same time publishing falsified information represents a more nuanced and potentially more successful way for autocrats to prevent collective action.
Limitations and Future Research
One area that the current study does not explore is the extent to which deliberate underreporting is attributable to the revision of data received by the national government, or to the government’s conscious attempts to prevent the collection of accurate data in the first place. In the latter case the national government may, for example, deliberately curtail COVID-19 testing, such that it is harder for health care workers to assign the correct cause of death in each case. Alternatively, the government may implicitly or explicitly encourage local officials to generate underestimates (Kofanov et al. 2023). This comes at a cost, however, because the government itself may then lack the information necessary to develop an adequate policy response and thereby to forestall criticism and protest. This suggests that autocratic leaders will prefer the first approach: the collection of unmanipulated data, followed by public dissemination of manipulated data. Nevertheless, there remains the possibility that the intended target of the deception is the national government itself. Local officials beholden to the central government may submit fabricated estimates to exaggerate their achievements or hide policy failures (Wallace 2016). Regardless, I contend that manipulation by local officials is less likely to occur in a democratic context, because democratic governments are exposed to scrutiny from the opposition, media, and civil society.
It might be argued that citizens care more about economic outcomes than about population health outcomes, or that they are more likely to hold governments responsible for bad economic outcomes than for bad health outcomes. There is some evidence that citizens in democracies are already predisposed to treat pandemics as natural phenomena that are beyond the control of policy makers (Acharya, Gerring, and Reeves 2020; Achen and Bartels 2017: 140–42). Similarly, they may find it comparatively difficult to determine the extent to which bad health outcomes are attributable to government incompetence, as opposed to the behavioral decisions of their fellow citizens (Mani and Mukand 2007: 507). If that is correct, then there is less incentive for political leaders to manipulate population health data. The results of this study suggest that, even if that is the case, there remains an incentive for autocratic leaders to misinform citizens about health risks. Nevertheless, further research is needed to establish whether autocrats manipulate economic data more than health data.
Finally, it should be noted that the results presented here relate to data manipulation after an epidemiological shock. These findings likely apply to the propensity for manipulation following other kinds of shock, such as war, famine, natural disaster, or severe economic recession. Arguably, the threat to the government’s survival is less pronounced in the absence of severe shocks (Mani and Mukand 2007: 507–8), so the incentive to falsify data is reduced in such cases. Still, existing studies indicate that autocrats fabricate economic data even in the absence of a recession (Carlitz and McLellan 2021; Magee and Doces 2015; Martínez 2022). Nevertheless, further research is needed to determine whether autocrats are more likely to manipulate data relating to adverse health outcomes that are not due to a shock (e.g., death and disability due to noncommunicable diseases).
Conclusion
The primary theoretical claim of this study is that political leaders will manipulate reported data to downplay bad news, if there is a sufficiently low probability that their deception will be detected by a critical mass of citizens (and to the extent that the deception does not fundamentally threaten the government’s ability to implement policy). This implies that democratic governments are less likely to manipulate published data because the deception is more likely to be exposed by independent media, political parties, and civil society organizations. The COVID-19 pandemic presents a unique opportunity to examine this claim because of its widespread impact and the fact that it severely tested the competence of each government. Using estimates of excess mortality for a large number of countries, I find evidence that autocratic governments are more likely to deliberately undercount deaths attributable to the pathogen than their democratic counterparts are. Similarly, I find that the case and death counts reported by autocratic regimes are more likely to feature statistical anomalies. These results hold when controls are included for unintentional mismeasurement and after running a battery of robustness checks. In terms of the specific features of democratic rule, I find that lack of opposition independence from the ruling regime and the absence of free and fair elections were the most important predictors of deliberate undercounting. On the other hand, media freedom and especially freedom of civil association were found to be comparatively less important.
Overall, these results suggest that autocratic leaders opted to manipulate COVID-19 data, even though they could use their control over traditional and online media to disown responsibility for bad news. This conclusion is consistent with two previous cross-national studies on the association between regime type and the manipulation of national income statistics (Magee and Doces 2015; Martínez 2022). Taken together, these studies imply that politically sensitive data are systematically biased in favor of autocratic regimes.
This presents a significant problem for citizens, researchers, and international organizations. First, the absence of accurate information may prevent citizens from being able to make the decisions necessary to protect their own well-being; governments that deliberately understate the threat posed by a disease limit people’s ability to take steps to avoid preventable morbidity and premature mortality.4 Second, aggregate indices of economic and human development may overstate the performance of autocratic regimes if the input data are directly sourced from each government. Third, cross-national studies that examine the association between regime type and policy outcomes such as economic growth, educational attainment, and infant mortality may be biased in favor of autocratic regimes. Fourth, the manipulation of data makes it harder for international organizations and aid donors to determine whether recommended targets have been achieved and whether these actors are supporting the right policies. This suggests that health advocates and scholars should use alternative methods to estimate policy-relevant outcomes for autocratic states, particularly in cases where opposition political parties lack independence from the ruling regime or must compete in elections that are designed to prevent them from winning.
Notes
1. Two additional studies infer variation in the manipulation of data according to regime type by excluding alternative explanations for the cross-national variation in reported COVID-19 deaths (Annaka 2021; Cassan and Van Steenvoort 2021).
2. Manipulation that is clearly inconsistent with citizens’ experienced reality is both observable and ineffective (Cavallo, Cruces, and Perez-Truglia 2016). In addition, such manipulation may reduce citizens’ trust in official statistics in general, including those that have not been manipulated (e.g., citizens may interpret accurately reported epidemic deaths as underestimates if the government has a track record of understating inflation statistics). Nevertheless, autocrats may still engage in observable data manipulation because it enables them to signal strength and thereby deter dissent (Huang 2015).
3. WHO guidelines stipulate that COVID-19 should be listed as the cause of death if a person dies with a probable or confirmed case of COVID-19, unless there is “a clear alternative cause of death that cannot be related to COVID disease (e.g. trauma)” (WHO 2020).
4. There is already some evidence that manipulation reduced citizens’ trust in reported COVID-19 statistics and their compliance with antipandemic measures (Kofanov et al. 2023; Lamberova and Sonin 2023).