Abstract
One of the most common methods for estimating the U.S. unauthorized foreign-born population is the residual method. Over the last decade, residual estimates have typically fallen within a narrow range of 10.5 to 12 million. Yet it remains unclear how sensitive residual estimates are to their underlying assumptions. We examine the extent to which estimates may plausibly vary owing to uncertainties in their underlying assumptions about coverage error, emigration, and mortality. Findings show that most of the range in residual estimates derives from uncertainty about emigration rates among legal permanent residents, naturalized citizens, and humanitarian entrants (LNH); estimates are less sensitive to assumptions about mortality among the LNH foreign-born and coverage error for the unauthorized and LNH populations in U.S. Census Bureau surveys. Nevertheless, uncertainty in all three assumptions contributes to a range of estimates, whereby there is a 50% chance that the unauthorized foreign-born population falls between 9.1 and 12.2 million and a 95% chance that it falls between 7.0 and 15.7 million.
Introduction
Estimates of the size, growth, and composition of the unauthorized foreign-born population are important for understanding population distributions and trends in the United States. They also shape public debates about immigration and are important for the evaluation and administration of U.S. policies. For example, accurate estimates of the unauthorized foreign-born population can shed light on the scope and cost of proposed legislation to grant legal status to certain groups of unauthorized immigrants and help evaluate immigration enforcement efforts (Meissner and Mittelstadt 2020).
One of the most common methods for estimating the unauthorized foreign-born population is the residual method (e.g., Baker 2021; Bean et al. 2001; Warren and Passel 1987). In its most basic form, this method subtracts an estimate of the legally resident foreign-born population—composed of legal permanent residents; naturalized citizens; and refugees, asylees, and other humanitarian entrants (a group hereafter referred to as the “LNH” foreign-born) observed in administrative data—from the total foreign-born population recorded in the American Community Survey (ACS) or another major national survey. After accounting for the degree to which foreign-born individuals are underrepresented in the ACS and making other adjustments, the difference yields an estimate of the unauthorized foreign-born.
Residual estimates generated for the most recent ACS data years have typically fallen within a million of one another no matter which research group or organization produced the estimate. For example, the U.S. Department of Homeland Security (DHS) (Baker 2021) estimated there were 11.4 million unauthorized immigrants as of January 2018, and the Pew Research Center (Pew) (Passel and Cohn 2019) estimated a population of 10.5 million as of mid-2017. The consistency of these estimates has contributed to media and public confidence and driven consensus about the changing size and composition of the unauthorized immigrant population. However, the similarity of the estimates may convey a false degree of certainty. Residual estimates rely on assumptions about emigration, mortality, and coverage error among the foreign-born population, and the precise levels of these inputs are not known with complete certainty. Despite this uncertainty, none of the research organizations that produce residual estimates have provided plausibility ranges, yet doing so would help the demographic community evaluate whether there are meaningful differences among the various estimates. It would also be important to know if the plausible range around residual estimates is so wide as to render these estimates useless for public policy debates; a high level of uncertainty would motivate future research to narrow the range.
Here, we develop an estimate of the plausible range of residual estimates of the unauthorized foreign-born population. Our overarching strategy is to examine how uncertainty in key inputs translates into uncertainty in residual estimates. In what follows, we first review our approach to calculating residual estimates. We then use a simple simulation to estimate the sensitivity of residual estimates to changes in the method's three key assumptions: (1) the coverage error in the ACS and other nationwide surveys of the unauthorized and LNH foreign-born populations, (2) emigration rates of the LNH foreign-born, and (3) death rates of the LNH foreign-born.
Finally, we produce our own residual estimates using coverage error, emigration, and death rates that reflect the best available evidence, and assess their sensitivity to a plausible range of assumptions. We find that most of the uncertainty in residual estimates derives from uncertainty in emigration rates; residual estimates are less sensitive to assumptions about coverage error and even less sensitive to mortality assumptions. After accounting for uncertainty in all three assumptions, we estimate that there is a 50% chance that the unauthorized foreign-born population falls between 9.1 and 12.2 million, and a 95% chance that it falls between 7.0 and 15.7 million.
The Residual Method
As part of our effort to assess the uncertainty in residual estimates, we developed our own residual estimates of the unauthorized foreign-born population by age, sex, year of arrival, and country or region of birth using the best data and methods available to us. We followed the same approach and obtained similar results as other researchers (i.e., Pew and DHS). Nevertheless, different researchers tend to rely on slightly different data sources and assumptions. We provide an overview of our method along with a full list of data sources in Appendix A and comparisons with the different assumptions used by DHS and Pew in Appendix B (see online appendix).
For the purposes of developing a residual estimate, we distinguish among three foreign-born groups, as shown in Box 1. The first, the unauthorized foreign-born population (U), includes individuals who entered the country without inspection and those who arrived legally with temporary visas (e.g., student, tourist, temporary worker) but overstayed or otherwise violated the terms of their visas. We also include foreign-born individuals who have received an official, temporary reprieve from deportation but otherwise resemble unauthorized immigrants demographically, such as Temporary Protected Status (TPS) recipients, Deferred Action for Childhood Arrivals (DACA) recipients, and asylum applicants with work authorization. For estimation purposes, we limit this group to those who arrived in the country in 1982 or later, with the rationale that most immigrants who arrived before 1982 would have been legalized because they were eligible for amnesty under the Immigration Reform and Control Act of 1986 (IRCA). This group of post-1982 entrants who are unauthorized cannot be identified directly in administrative records or census data.
The second group, the legal permanent resident/naturalized/humanitarian population, includes all naturalized citizens; legal permanent residents (LPRs, or “green card” holders); and immigrants with humanitarian statuses, such as refugee or asylee, who have yet to adjust to LPR status. As with the unauthorized foreign-born, we limit this group to those who arrived in the country in 1982 or later. This group can be estimated using administrative data.
Finally, the third group, all other foreign-born, includes nonimmigrants admitted lawfully with temporary visas (such as international students, H-1B high-skilled workers, and H-2A agricultural workers) and all foreign-born persons who arrived in the country before 1982. Pre-1982 arrivals can be identified directly in the ACS, while nonimmigrants can be identified indirectly from their characteristics in the ACS.
As just noted, the unauthorized foreign-born population cannot be estimated directly. However, the combined unauthorized and LNH populations (C = U + LNH) can be estimated using the ACS by excluding the “other foreign-born” (nonimmigrants and pre-1982 arrivals) from tabulations of the total foreign-born population. Additionally, the number of LNH foreign-born can be estimated using administrative data. Therefore, after certain adjustments are made, the unauthorized foreign-born population can be estimated by subtraction (U = C – LNH).
Estimation occurs in four steps. First, we use the ACS (Ruggles et al. 2020) to estimate the combined unauthorized and LNH populations (C), disaggregated by sex (s), region or country of birth (r), birth cohort (c), year of entry (y), and year (t). To obtain these estimates, we tabulate the foreign-born population for each demographic subgroup using the ACS. Our sample includes all persons of foreign parentage born outside of the United States or outlying areas except those in the third “other foreign-born” group in Box 1. We drop this group (i.e., nonimmigrants and those who arrived in the United States before 1982) from the sample. Nonimmigrants can be identified in the ACS and other survey data with some precision. They include noncitizens whose occupations, immigration histories, and family/household characteristics are congruent with the eligibility criteria for specific nonimmigrant visa categories. For example, international students (F-1 visa holders) can be identified from age at arrival in the United States, full-time school enrollment, and lack of full-time employment. H-2A workers can be identified on the basis of years in the country, country of birth (nearly all are Mexican-born), and agricultural employment, while H-1B high-skilled workers can be identified from their educational attainment, years in the country, and employment in certain occupations, such as information technology, engineering, research, and medicine (physicians and surgeons). Totals of nonimmigrants identified in the ACS are comparable with administrative data from DHS.
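The following is a minimal sketch of this exclusion-and-tabulation step for one ACS year of person records. The field names, the two simplified flagging rules, and their thresholds are hypothetical stand-ins for the fuller eligibility criteria described above. The H-1B rule is omitted for brevity. Summing person weights to obtain counts is our assumption about how the tabulation is done.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Person:
    """One foreign-born ACS person record (hypothetical, simplified fields)."""
    weight: float              # ACS person weight
    sex: str
    birth_region: str
    birth_cohort: int
    entry_year: int
    citizen: bool
    arrival_age: int
    school_full_time: bool
    work_full_time: bool
    farm_job: bool
    years_in_us: int


def likely_nonimmigrant(p: Person) -> bool:
    """Hypothetical, simplified rules for flagging likely temporary-visa holders."""
    if p.citizen:
        return False
    # F-1 proxy: arrived as a young adult, enrolled full time, not working full time.
    if p.arrival_age >= 16 and p.school_full_time and not p.work_full_time:
        return True
    # H-2A proxy: Mexican-born agricultural worker with at most one year in the U.S.
    if p.birth_region == "Mexico" and p.farm_job and p.years_in_us <= 1:
        return True
    return False


def tabulate_combined(records, cutoff_year=1982):
    """Weighted counts of C (unauthorized + LNH) by sex, region, cohort, entry year."""
    cells = defaultdict(float)
    for p in records:
        # Drop the "other foreign-born": pre-1982 arrivals and likely nonimmigrants.
        if p.entry_year < cutoff_year or likely_nonimmigrant(p):
            continue
        cells[(p.sex, p.birth_region, p.birth_cohort, p.entry_year)] += p.weight
    return dict(cells)
```

Repeating the tabulation for each survey year adds the year dimension (t).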
Second, we estimate the LNH population from administrative data (Eq. (1)), where A is the number of LNH admissions or entrants in year a, D is the annual number of deaths, and E is the annual number of emigrants. D and E are derived from a set of assumed mortality (m) and emigration (g) rates among the LNH foreign-born multiplied by the size in year i of the cohort that was admitted in year a.¹ Even though the LNH foreign-born become eligible to naturalize after five years in LPR status (three years if married to a U.S. citizen), we do not construct separate estimates of those who naturalized and those who remained noncitizens; doing so would unnecessarily introduce error into the estimates owing to known biases in self-reports of citizenship status (Brown et al. 2019; Van Hook and Bachmeier 2013).
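The cohort accounting described above can be written as follows. This is a sketch in our own notation, not a reproduction of Eq. (1). The subscripts for sex, region of birth, birth cohort, and year of entry are suppressed. We also assume that deaths and emigrants in year i are computed from the cohort's size at the start of that year.

```latex
\begin{aligned}
\mathrm{LNH}_{a,a} &= A_a, \\
\mathrm{LNH}_{a,i+1} &= \mathrm{LNH}_{a,i} - D_{a,i} - E_{a,i}, \qquad
  D_{a,i} = m_{a,i}\,\mathrm{LNH}_{a,i}, \qquad E_{a,i} = g_{a,i}\,\mathrm{LNH}_{a,i}, \\
\mathrm{LNH}_t &= \sum_{a=1982}^{t} \Bigl( A_a - \sum_{i=a}^{t-1} \bigl( D_{a,i} + E_{a,i} \bigr) \Bigr).
\end{aligned}
```

Unrolling the recursion gives the last line. Each cohort's losses are summed once per year since admission, so a systematic error in m or g enters the inner sum every year.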
Equation (1) demonstrates that the estimate of the LNH population is subject to uncertainty in assumptions about the mortality and emigration rates, and that systematic errors in assumptions about mortality and emigration rates (which are contained within the summation sign) accumulate over time. If the annual emigration rate were too large, for example, this would contribute to overestimates of the annual number of LNH foreign-born leaving the country, and the error in the cumulative number of emigrants would grow as time elapses since admission.
Third, we estimate the unauthorized population by subtraction (Eq. (2)). The LNH population is derived from administrative records and therefore is unaffected by coverage error. However, the combined unauthorized and LNH population (C) is derived from ACS data and could be too low because of coverage error. Before subtracting the LNH population from the combined population, we adjust the LNH population downward (by multiplying it by one minus the LNH coverage-error rate) to reflect the number represented in the ACS. As a result, the numerator of Eq. (2) is the unauthorized population represented in the ACS. We then adjust it upward, dividing by one minus the coverage-error rate of the unauthorized population, to yield the total unauthorized population. Note that a higher coverage error of either the unauthorized or the LNH population would raise the estimate of the unauthorized population.
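The coverage adjustment and subtraction can be expressed as a small function. This is a sketch of the calculation described in the text. The argument names are our own, and coverage-error rates are given as proportions (0.10 for 10%).

```python
def residual_unauthorized(combined_acs: float, lnh_admin: float,
                          cov_lnh: float, cov_unauth: float) -> float:
    """Coverage-adjusted residual estimate of the unauthorized population.

    combined_acs: unauthorized + LNH population tabulated from the ACS (C)
    lnh_admin:    LNH population estimated from administrative data
    cov_lnh:      coverage-error rate of the LNH population (proportion)
    cov_unauth:   coverage-error rate of the unauthorized population (proportion)
    """
    if not (0 <= cov_lnh < 1 and 0 <= cov_unauth < 1):
        raise ValueError("coverage-error rates must be in [0, 1)")
    # LNH persons the ACS would actually capture, given their coverage error.
    lnh_in_acs = lnh_admin * (1 - cov_lnh)
    # Unauthorized population represented in the ACS.
    unauth_in_acs = combined_acs - lnh_in_acs
    # Inflate to the full unauthorized population.
    return unauth_in_acs / (1 - cov_unauth)
```

Raising either coverage-error argument raises the result, which mirrors the point made in the text.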
The fourth and final step involves smoothing the unauthorized population estimates for three purposes: to correct for heaping in reported year of entry at decade years (i.e., around 1990, 2000, and 2010), to ensure continuity in entry and birth cohorts over time, and to reduce the incidence of negative population estimates.
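The smoothing procedure is not spelled out here. As a generic illustration only, and not necessarily the authors' method, a centered moving average over year of entry spreads the excess reported at decade years into neighboring years. The window width and the rescaling that keeps each cell's total unchanged are assumptions.

```python
def smooth_entry_heaping(counts: dict[int, float], half_window: int = 2) -> dict[int, float]:
    """Centered moving average over year of entry, rescaled to keep the total.

    counts maps year of entry -> estimated unauthorized population for one cell
    (e.g., one sex/region/birth cohort in one survey year).
    """
    years = sorted(counts)
    smoothed = {}
    for y in years:
        # Average over the years within +/- half_window that exist in the data.
        window = [counts[z] for z in years if abs(z - y) <= half_window]
        smoothed[y] = sum(window) / len(window)
    # Rescale so smoothing redistributes the population rather than changing it.
    total, new_total = sum(counts.values()), sum(smoothed.values())
    if new_total != 0:
        factor = total / new_total
        smoothed = {y: v * factor for y, v in smoothed.items()}
    return smoothed
```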
A Simple Simulation
Although many observers see coverage errors in the ACS and other surveys as the major challenge for residual estimates, the residual method also relies heavily on assumptions about the emigration and mortality of the LNH foreign-born population. Indeed, emigration and mortality are key factors in determining the size of the LNH population, which is subtracted from the combined unauthorized and LNH foreign-born populations to derive the estimate of unauthorized immigrants. Higher estimates of coverage error for unauthorized immigrants and higher estimates of coverage error, emigration, and mortality among the LNH foreign-born all result in higher residual estimates. But which factors matter the most?
The effect of these assumptions can be assessed using a simple simulation. Imagine that 1,000 LNH foreign-born individuals are admitted each year. Similar to actual rates in the United States (Baker 2021), 1% of them emigrate each year, 0.1% die each year, and coverage error is 1% among the LNH foreign-born and 10% among unauthorized immigrants. After 35 years, 35,000 foreign-born individuals are enumerated in a census survey, and this number increases by 1.5% annually thereafter. Under these assumptions, we would estimate 6,934 unauthorized immigrants. This is referred to as the “original estimate” in Table 1 (panel A, row 1).
Now imagine that each of these assumptions, one at a time, were 50% higher. The estimate of the unauthorized population would increase by 36% if the emigration rate for the LNH population were 50% higher (row 4), but only by 6%, 2%, and 4%, respectively, if each of the other assumptions were 50% higher (rows 2, 3, and 5). This simulation shows that the residual estimate is especially sensitive to changes in emigration rates.
Emigration rates have the largest impact because they are applied to the LNH population each year over the 35-year projection period, so their impact accumulates over time. While mortality rates are also applied over 35 years, they have less impact on the result because they are much lower (0.1% vs. 1% annually). This point is illustrated in panel B of Table 1, which projects the scenarios in panel A forward in time—40, 45, and 50 years after the initial starting point of the simulation. When the assumed emigration rate increases by 50%, the percentage difference from the original estimate grows over time: 49% after 40 years, 61% after 45 years, and 70% after 50 years. In contrast, when coverage error or mortality rates increase by 50%, the percentage difference from the original estimate remains low and nearly constant over time.
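The simple simulation can be approximated in a few lines. This is our reconstruction under stated assumptions, not the authors' code. We treat emigration and death as a combined annual decrement of g + m applied to each admission cohort, with the newest cohort not yet exposed. The enumerated total is held at 35,000 at year 35 and grows 1.5% per year afterward. Because the exact timing conventions behind Table 1 are not given, this version yields a baseline of about 6,790 rather than 6,934. It does, however, give relative changes close to those reported: roughly 6%, 2%, 36%, and 4% at 35 years, with the emigration effect rising to about 72% at 50 years.

```python
def lnh_stock(years: int, admits: float, emig: float, mort: float) -> float:
    """LNH stock after `years` annual admission cohorts.

    Assumption: each cohort loses a share (emig + mort) per year of exposure,
    and the most recent cohort has not yet been exposed.
    """
    survive = 1.0 - emig - mort
    return sum(admits * survive ** k for k in range(years))


def simple_residual(years=35, admits=1_000, emig=0.01, mort=0.001,
                    cov_lnh=0.01, cov_unauth=0.10,
                    enumerated_at_35=35_000, growth=0.015) -> float:
    """Residual estimate in the stylized simulation."""
    enumerated = enumerated_at_35 * (1 + growth) ** max(0, years - 35)
    lnh_in_survey = lnh_stock(years, admits, emig, mort) * (1 - cov_lnh)
    return (enumerated - lnh_in_survey) / (1 - cov_unauth)


def pct_change(years: int, **bumped) -> float:
    """Percentage change from the original estimate when inputs in `bumped` change."""
    base = simple_residual(years=years)
    return 100 * (simple_residual(years=years, **bumped) / base - 1)


if __name__ == "__main__":
    scenarios = {
        "coverage error, unauthorized": {"cov_unauth": 0.15},
        "coverage error, LNH": {"cov_lnh": 0.015},
        "emigration, LNH": {"emig": 0.015},
        "mortality, LNH": {"mort": 0.0015},
    }
    for label, bump in scenarios.items():
        changes = [round(pct_change(t, **bump)) for t in (35, 40, 45, 50)]
        print(f"{label:30s} {changes}")
```

In this version the coverage-error effect for unauthorized immigrants is identical at every horizon, because it enters only as a single divisor. The emigration effect keeps growing with time since the first admissions.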
This simple illustration reveals an important point: residual estimates are particularly sensitive to small changes in the emigration rate of the LNH population. Mortality assumptions could also become influential as durations of U.S. residence lengthen and mortality rates rise, particularly for older immigrant cohorts.² Coverage-error assumptions do not influence the estimates as much because they are applied only once in the model; they do not accumulate over time. Of course, this simple illustration may not hold under more realistic conditions. In particular, it does not account for the fact that emigration rates tend to decline with duration of residence, which may offset the tendency for errors in emigration rates to accumulate over time since admission. Additionally, the importance of coverage error may decline over time as the unauthorized population grows older, accrues more years of U.S. residence, and becomes more likely to be represented in household surveys.
A Plausible Range of Residual Estimates Under Realistic Conditions
We next approximate the plausible range of residual estimates for the unauthorized foreign-born population under more realistic conditions. To do this, we first review, and update as necessary, prior research on coverage error, emigration, and mortality. We pay attention not just to the levels of these assumptions, but also to the degree of variation among plausible estimates, which we interpret as an indication of uncertainty. We specifically use the standard deviation across plausible values of each assumption in prior research to produce probability distributions for each assumption. We draw random values from these distributions to use as inputs for residual estimates.
Coverage Error
The residual method relies on coverage-error estimates for both the unauthorized and the LNH foreign-born populations, in that higher levels of coverage error for either population would lead to a higher estimate of the unauthorized population. In the ACS and similar nationwide surveys, coverage error occurs when people are missed because they fail to respond to survey takers; they respond but provide insufficient or inaccurate information about their demographic characteristics (in this case, their place of birth and citizenship); or they live in nonresidential or unconventional locations. Coverage error could be particularly high among unauthorized immigrants because they may be more difficult to locate (e.g., they live in agricultural worker barracks or crowded multifamily housing units), or they may attempt to avoid detection owing to fear of government authorities.
Most prior research on the coverage error for unauthorized immigrants has focused on Mexicans, the largest national-origin group among them. Therefore, we first review the evidence about Mexicans before explaining how we extrapolate these results to other groups. In general, this research compares the population counted in the U.S. Census or ACS with an independent estimate of the same population derived or inferred from noncensus data sources, such as birth or death registrations, independent surveys, ethnographic studies of neighborhoods with large shares of unauthorized immigrants, and estimates of Mexicans living in the United States as derived from Mexican census data. The idea is that the unauthorized foreign-born population leaves “footprints” in statistical and administrative record systems even if they do not willingly participate in official U.S. Census and survey collection efforts (Gelatt et al. 2018).
In 1990, evaluations of the rate of coverage error for the Mexican unauthorized immigrant population fell in the range of 15% to 35% (Corona Vasquez 1991; de la Puente 1992; U.S. General Accounting Office 1993; Van Hook and Bean 1998) and remained in this range until the middle of the 2000–2009 decade (Genoni et al. 2012; Hill and Wong 2005). Warren's recent analysis (2020) supports these findings. He examined the decline in cohort sizes (after accounting for mortality) between the 1990 and 2000 Mexican censuses and found that about 5.5 million people left Mexico during the 1990s. The 2000 U.S. Census counted 4.5 million such individuals, implying a coverage-error rate of 18% for 2000.
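In these comparisons, the coverage-error rate is one minus the ratio of the census count to the independent estimate. With Warren's figures for 2000, rounded as in the text:

```latex
\text{coverage error} = 1 - \frac{\text{census count}}{\text{independent estimate}}
  = 1 - \frac{4.5\ \text{million}}{5.5\ \text{million}} \approx 0.18
```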
However, on the basis of their analyses of U.S. death records and Mexican census data, Van Hook and colleagues (2014) found evidence that coverage error declined substantially during the latter half of the 2000–2009 decade. Declining coverage error was apparently associated with substantial reductions in the number of shorter-term unauthorized immigrant laborers during the Great Recession (particularly in the hard-hit construction and service sectors). This group is likely to be harder to count than longer-term, more settled unauthorized immigrants. By 2010, coverage-error rates for the unauthorized Mexican-born population were estimated to be below 8%. These estimates are somewhat lower than the coverage-error assumptions made by DHS (10%) and Pew (13%)³ in the past, although Pew now assumes similarly low levels of coverage error.
To conduct the work presented here, we update Van Hook and colleagues’ (2014) estimates of coverage error with the latest available Mexican census data and U.S. death records. We find evidence of further declines in coverage error among women, but small increases among men, between 2010 and 2017. We produce these estimates by analyzing two different data sources: (1) death registrations of Mexican-born individuals in the United States and (2) net migration from Mexico based on Mexican census data. Table 2 presents ranges for these estimates to reflect uncertainty in the mortality rate of Mexican immigrants and coverage error in the Mexican census. The methodology underlying these estimates is described in Appendix C of the online appendix.
We next extrapolate the estimates for Mexicans to non-Mexicans and adjust levels of coverage error to account for likely variation by year and duration of residence in three ways. First, we linearly interpolate values for the years not shown in Table 2 (see Table C2 in the online appendix). Second, we assume that coverage error for Latinos was the same as for Mexicans, but that coverage error for non-Latinos (chiefly those from Africa, Europe, and Asia) was 25% lower than the values shown in Table 2. We make this assumption because almost all non–Latin American unauthorized immigrants overstay their visas rather than enter the country illegally and tend to be more highly educated and therefore live in better housing. Both of these factors make them more likely to be represented in ACS and census data. This is also consistent with estimates produced by the U.S. Census Bureau for 2010 (Jensen et al. 2015), showing that coverage error for the Hispanic foreign-born population is much higher than for the non-Hispanic foreign-born population. Third, we assume that recent arrivals (those with fewer than five years of U.S. residence) have coverage-error rates that are three times as high as rates for longer term residents (10 or more years of residence), consistent with evidence of high coverage error among recent arrivals (Van Hook et al. 2014).
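The three adjustments can be combined in a single function. The benchmark values used with it are placeholders, not the Table 2 estimates. Two choices are our assumptions: that the interpolated Mexican rate refers to longer-term residents (10 or more years), and that rates are held flat outside the benchmark years. The text does not specify the 5–9-year group, so the intermediate multiplier of 2 for that group is a hypothetical parameter.

```python
def interpolate(benchmarks: dict[int, float], year: int) -> float:
    """Linear interpolation between benchmark years (held flat beyond the ends)."""
    years = sorted(benchmarks)
    if year <= years[0]:
        return benchmarks[years[0]]
    if year >= years[-1]:
        return benchmarks[years[-1]]
    for lo, hi in zip(years, years[1:]):
        if lo <= year <= hi:
            w = (year - lo) / (hi - lo)
            return (1 - w) * benchmarks[lo] + w * benchmarks[hi]
    raise AssertionError("unreachable")


def coverage_error(benchmarks: dict[int, float], year: int, latino: bool,
                   years_in_us: int, mid_duration_multiplier: float = 2.0) -> float:
    """Coverage-error rate for one group, extrapolated from Mexican benchmarks."""
    rate = interpolate(benchmarks, year)   # assumed to refer to 10+ years of residence
    if not latino:
        rate *= 0.75                       # non-Latinos: 25% lower
    if years_in_us < 5:
        rate *= 3.0                        # recent arrivals: three times as high
    elif years_in_us < 10:
        rate *= mid_duration_multiplier    # hypothetical: not specified in the text
    return rate
```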
To estimate uncertainty in coverage error among unauthorized immigrants, we use the standard deviations in Table 2 to produce probability distributions of coverage error from 2000 to 2018. We use gamma distributions to constrain draws to positive values. Figure 1 provides an example of the probability distribution of coverage error for Mexican men in 2018 (average = 10%, SD = 3%). When averaged across all demographic groups, the mean coverage error of unauthorized immigrants is 18.9% (SD = 10.0%) in 2005; 5.8% (SD = 3.8%) in 2010; and 5.1% (SD = 3.2%) in 2018.
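Drawing coverage-error rates from a gamma distribution with a given mean and standard deviation can be sketched with Python's standard library. The method of moments converts the mean and SD into a shape and a scale. The seed and number of draws are arbitrary. The 10% mean and 3% SD are the Mexican-men example from the text.

```python
import random
import statistics


def gamma_from_mean_sd(mean: float, sd: float, rng: random.Random) -> float:
    """One draw from a gamma distribution with the given mean and SD.

    Method of moments: shape = (mean / sd)**2 and scale = sd**2 / mean,
    so mean = shape * scale and variance = shape * scale**2. Draws are positive.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return rng.gammavariate(shape, scale)  # gammavariate(alpha=shape, beta=scale)


if __name__ == "__main__":
    rng = random.Random(2018)
    draws = [gamma_from_mean_sd(0.10, 0.03, rng) for _ in range(20_000)]
    print(round(statistics.mean(draws), 3), round(statistics.stdev(draws), 3))
```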
Finally, although we know less about coverage error among the LNH population, we assume that it is low given that net coverage error was virtually zero for the entire U.S. population and only 1.54% for all Hispanics in 2010 (Mule 2012). DHS and Pew both estimate coverage error among the LNH population to be 1.5%, so we assume the same but with a standard deviation of 0.5% to account for uncertainty.
Emigration
Besides coverage error, the residual method relies on estimates of emigration among the LNH population; emigration rates are needed to estimate how many in this population left the country following their admission. Higher levels of emigration lead to lower estimates of the LNH population and correspondingly higher estimates of the unauthorized population. Unfortunately, official government statistics on emigration from the United States have not been published since 1956, mainly owing to concerns about the incompleteness and poor quality of emigration administrative records (Kraly 1998). Therefore, out of necessity, foreign-born emigration has been estimated with a variety of indirect demographic methods.
The U.S. Census Bureau estimates net emigration using a residual method (not to be confused with the residual method for estimating the unauthorized foreign-born). This method compares the size of foreign-born cohorts between two decennial censuses or surveys after adjusting for mortality, yielding estimates of emigration among the entire foreign-born population. Residual-based estimates of the annual foreign-born emigration rate tend to fall between 0.9% and 1.2% (Warren and Peck 1980: 1.2%; Ahmed and Robinson 1994: 1.2%; and Mulder 2003: 0.9%). A limitation of this method is its inability to estimate emigration for recent entrants (i.e., those arriving during the period between the two decennial censuses). Borjas and Bratsberg (1996) overcame this problem by using immigrant-admission records collected over multiple years in place of the first census. Their estimates imply annual emigration rates of 3.8% in the first five years and 0.8% in the second five years of U.S. residence. Leach and Jensen (2013) also overcame this problem by tracking the size of immigrant entry cohorts across adjacent years of the ACS. They too found higher annual rates of emigration for recently arrived immigrants: 0.6% for all immigrants and 1.3% among those in the country less than 10 years, which implies an annual rate of about 0.4% for longer-term residents. Leach (2017) later revised these estimates upward, implying rates of 0.8%, 1.8%, and 0.5%, respectively.
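The implied rate for longer-term residents follows from treating the overall rate as a weighted average over the two duration groups. The weight w is the share of immigrants with fewer than 10 years of residence. It is not reported here, so the values below are only the shares implied by the quoted rates, shown as a consistency check:

```latex
\begin{aligned}
r_{\text{all}} &= w\, r_{<10} + (1 - w)\, r_{\ge 10}
  \quad\Longrightarrow\quad
  r_{\ge 10} = \frac{r_{\text{all}} - w\, r_{<10}}{1 - w}, \\
\text{Leach and Jensen (2013):}\quad 0.6 &= 1.3\,w + 0.4\,(1 - w) \;\Rightarrow\; w \approx 0.22, \\
\text{Leach (2017):}\quad 0.8 &= 1.8\,w + 0.5\,(1 - w) \;\Rightarrow\; w \approx 0.23.
\end{aligned}
```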
Other researchers have used linked administrative records to estimate emigration levels and rates among LPRs and naturalized citizens. Jasso and Rosenzweig (1982) linked immigrant admissions data from 1971 (which contain a record for all immigrants who were granted LPR status in that year) to data from the now-defunct Alien Address Report Program, finding an annual emigration rate of 2.1%. Duleep (1994) used Social Security Administration (SSA) records matched across years to estimate the emigration of all immigrants with work authorization. In her approach, a discontinuation in earnings across multiple years (without retirement) was interpreted as emigration. She found that about 30% of the immigrants in the SSA earnings file eventually emigrated, implying an annual emigration rate of 2.8% in the first decade of U.S. residence but less than 1% in subsequent decades. More recently, Schwabish (2009) used a similar approach to estimate emigration among immigrants in the SSA earnings file, finding somewhat lower levels of emigration: 1.3% overall and 2.3% in the first decade of U.S. residence.⁴
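Annual rates and cumulative shares, such as Duleep's, are linked by compounding. The share of a cohort that has emigrated after n years is one minus the product of the annual probabilities of staying. The following is a minimal sketch that ignores mortality for simplicity (an assumption).

```python
from math import prod


def cumulative_emigration(annual_rates: list[float]) -> float:
    """Share of an arrival cohort that has emigrated after len(annual_rates) years.

    Each element is the annual probability of emigrating among those still present.
    Mortality is ignored (a simplifying assumption).
    """
    return 1.0 - prod(1.0 - r for r in annual_rates)


if __name__ == "__main__":
    # 2.8% a year throughout the first decade of U.S. residence (Duleep 1994).
    print(round(cumulative_emigration([0.028] * 10), 3))  # prints 0.247
```

At that rate, roughly a quarter of a cohort would leave within its first decade. This helps show how a lifetime share near 30% can coexist with annual rates below 1% in later decades.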
Our residual estimates require estimates of emigration for the LNH foreign-born population. No published emigration rates align perfectly with this specific population, so we selected the emigration rates pertaining to immigrants in the SSA earnings file. This file includes some unauthorized immigrants who have fraudulent Social Security numbers and some classes of nonimmigrants who do not eventually adjust to LPR status. Even so, it excludes a greater share of both of these types of immigrants than does the ACS, which is the basis for other estimates of emigration such as those produced by the Census Bureau.⁵ This suggests that the SSA earnings file may be a more accurate source of information about emigration of the LNH foreign-born population.
Among the SSA-based emigration rates, we choose those by Schwabish for two reasons: they are the most recent, and he provided us with a prediction model of the annual probability of emigration. We use this model to produce annual emigration rates broken down by age, sex, duration of residence, and country or region of origin.⁶ We adjust Schwabish's estimates to account for annual trends in emigration. We specifically use the ACS to produce annual residual estimates by country or region of birth from 2005 to 2018 following Leach's (2017) methodology. Emigration among the foreign-born tended to be low in the years before the Great Recession but increased between 2007 and 2009, fell between 2010 and 2014, and then increased again after 2015. We adjust the Schwabish estimates to account for annual fluctuations while maintaining the average probability of emigration by age, sex, and duration of residence as designated by Schwabish's prediction model (estimates are shown by region of birth, year, and duration of residence in Table 3).
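The trend adjustment can be illustrated with a hypothetical annual index. The sketch assumes a multiplicative adjustment in which each year's model probability is scaled by that year's relative emigration level, for example from ACS-based residual estimates. The index is normalized to average 1, so the mean probability over the period matches the model. The index values are invented, and this normalization is our choice, not necessarily the authors'.

```python
def trend_adjust(base_prob: float, annual_level: dict[int, float]) -> dict[int, float]:
    """Scale a model-based annual emigration probability by a normalized trend.

    base_prob:    model probability for one age/sex/duration/origin cell
    annual_level: relative emigration level by year (any positive scale)
    The index is divided by its mean so the average probability is unchanged.
    """
    mean_level = sum(annual_level.values()) / len(annual_level)
    return {year: base_prob * level / mean_level
            for year, level in annual_level.items()}


if __name__ == "__main__":
    # Hypothetical index: low before 2007, higher around the recession, high after 2015.
    print(trend_adjust(0.012, {2005: 0.8, 2008: 1.3, 2012: 0.9, 2017: 1.4}))
```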
To estimate the level of uncertainty in emigration among the LNH foreign-born, we examine the variation in estimates in prior literature. If we consider all of the studies cited above, the standard deviation of the estimates is 0.75%. However, if we confine ourselves to studies of immigrants who attained LPR status or are present in the SSA earnings file (the group of greatest relevance), the standard deviation is 0.42%; if only the census studies are considered, the standard deviation drops further to 0.26%. Because of our focus on emigration among the LNH population, we select a moderate level of uncertainty. We center the probability distribution on the Schwabish trend-adjusted emigration rate and set its standard deviation at half the emigration rate. As with coverage error, we use a gamma distribution to constrain the distribution to positive values. When averaged across all demographic groups, the mean emigration rate of the LNH population was 1.1% (SD = 0.53%) in 2005; 1.1% (SD = 0.56%) in 2010; and 1.8% (SD = 0.9%) in 2018.
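Setting the standard deviation to half the mean fixes the shape of each gamma distribution, whatever the group's average rate. Writing k for the shape and θ for the scale:

```latex
\begin{aligned}
\mu &= k\theta, \qquad \sigma^2 = k\theta^2, \qquad \sigma = \tfrac{1}{2}\mu \\
\Longrightarrow\quad k &= \left(\frac{\mu}{\sigma}\right)^2 = 4, \qquad
  \theta = \frac{\sigma^2}{\mu} = \frac{\mu}{4}.
\end{aligned}
```

For the 2018 average of μ = 1.8%, this gives θ = 0.45% and σ = 0.9%, matching the figures above.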
Mortality
Finally, the residual method relies on estimates of mortality among the LNH population. Higher mortality rates lead to lower stock estimates of the LNH population and a higher estimate of the unauthorized population.
Most researchers who produce residual estimates assume that the LNH population has the same age- and sex-specific mortality rates as the U.S. population. But given the well-documented mortality advantage of immigrants (Hummer et al. 2000; Riosmena et al. 2017), we adjust the U.S. mortality rates downward on the basis of our analysis of the 1997–2009 National Health Interview Survey (Blewett et al. 2019). We first estimate Cox proportional hazards models, separately by sex, predicting the hazard of dying as a function of region of birth (Latino, Asian, and other foreign-born vs. U.S.-born). We then use the estimated hazard ratios (see Table 4) to adjust the mortality rates for the United States (Human Mortality Database n.d.), thus obtaining sex-, age-, and year-specific rates for Latino, Asian, and other immigrants. Uncertainty in these estimates derives primarily from sampling error. We therefore use the standard errors of the coefficients to set the spread of a normal probability distribution of coefficients.
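The mortality adjustment and its uncertainty can be sketched as follows. The coefficient and standard error in the example are hypothetical; the actual estimates are in Table 4. We assume the adjusted death rate is the U.S. rate multiplied by the hazard ratio exp(β), with β drawn from a normal distribution centered on the estimate.

```python
import math
import random


def draw_immigrant_rates(us_rates: dict[int, float], beta_hat: float, se: float,
                         rng: random.Random) -> dict[int, float]:
    """One simulated set of age-specific death rates for a foreign-born group.

    us_rates:     U.S. death rates by age (one sex, one calendar year)
    beta_hat, se: Cox coefficient for the group vs. the U.S.-born, and its SE
    """
    beta = rng.gauss(beta_hat, se)     # sampling uncertainty in the coefficient
    hazard_ratio = math.exp(beta)      # proportional hazards: same ratio at every age
    # Scale each U.S. rate by the hazard ratio, capping at 1 as a safeguard.
    return {age: min(1.0, rate * hazard_ratio) for age, rate in us_rates.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical inputs: beta_hat = log(0.8), i.e., a 20% lower hazard; SE = 0.05.
    print(draw_immigrant_rates({40: 0.002, 70: 0.02}, math.log(0.8), 0.05, rng))
```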
Baseline Residual Estimates
Our assumptions lead to estimates that are similar to those produced by others, for both the total unauthorized foreign-born population (Figure 2) and the unauthorized Mexican-born population (Figure 3). On closer inspection, however, our estimates of the total tend to be higher than DHS and Pew estimates in 2005 and 2006 and lower than their estimates between 2010 and 2015. Our estimates of the unauthorized Mexican-born population follow a similar pattern, except that they closely conform to Pew estimates between 2010 and 2015. Over 2005–2018, our estimates differ from the others by an average of about 756,000 (6.8% of the average) for the total unauthorized foreign-born population and by about 424,000 (6.7% of the average) for the unauthorized Mexican-born population. Estimates by country or region of birth also differ somewhat. For example, our method estimates more Mexicans and Europeans/Canadians than the Pew method (Figure 4). We could not compare our estimates with DHS estimates because of inconsistencies in country/region categories.
Are these differences meaningful, or do they fall within a range of equally plausible estimates? We turn to this question next.
Plausible Range of Residual Estimates
To ascertain the uncertainty of residual estimates, we draw random values from the distributions of assumptions and use them to calculate residual estimates. We repeat the process 1,000 times to obtain a distribution of residual estimates associated with uncertainty in underlying assumptions. To isolate the effects of each assumption, we conduct three different simulations, whereby we allow each assumption—coverage, emigration, and mortality—to vary while holding values of the remaining assumptions fixed at their average levels. Finally, to gauge the combined effects of uncertainty, we conduct a fourth simulation in which we allow all assumptions to vary simultaneously.
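The simulation design can be sketched with the stylized model from the simple simulation standing in for the full residual calculation. The design varies one assumption at a time with the others fixed at their means, then varies all of them jointly. The means and standard deviations here are illustrative only. For simplicity, every input, including mortality, is drawn from a gamma distribution, whereas the paper uses a normal distribution for the mortality coefficients. The mortality SD is hypothetical. None of this is the authors' code.

```python
import random
import statistics

# Illustrative means and SDs (proportions); not the paper's full set of inputs.
MEANS = {"cov_unauth": 0.10, "cov_lnh": 0.015, "emig": 0.01, "mort": 0.001}
SDS = {"cov_unauth": 0.03, "cov_lnh": 0.005, "emig": 0.005, "mort": 0.0001}
GROUPS = {"coverage": ("cov_unauth", "cov_lnh"), "emigration": ("emig",),
          "mortality": ("mort",)}


def stylized_residual(cov_unauth, cov_lnh, emig, mort, years=35,
                      admits=1_000, enumerated=35_000):
    """Stand-in for the full residual calculation (the simple simulation)."""
    lnh = sum(admits * (1 - emig - mort) ** k for k in range(years))
    return (enumerated - lnh * (1 - cov_lnh)) / (1 - cov_unauth)


def draw(name, rng):
    """Gamma draw with the given mean and SD (keeps rates positive)."""
    m, s = MEANS[name], SDS[name]
    return rng.gammavariate((m / s) ** 2, s * s / m)


def simulate(varying, n=1_000, seed=0):
    """Residual estimates with `varying` inputs drawn and the rest at their means."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        inputs = {k: (draw(k, rng) if k in varying else v) for k, v in MEANS.items()}
        out.append(stylized_residual(**inputs))
    return out


if __name__ == "__main__":
    for label, names in GROUPS.items():
        print(label, round(statistics.stdev(simulate(names))))
    print("all", round(statistics.stdev(simulate(tuple(MEANS)))))
```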
The resulting distributions of residual estimates are summarized in Table 5 and Figure 5. The first column of Table 5 displays the average residual estimate by year, and the remaining columns display the standard deviations (in thousands) of the distributions from the simulations that vary coverage error, emigration, mortality, and all factors simultaneously. The standard deviations measure the spread of each distribution and hence the degree of uncertainty in the estimates. To illustrate the combined uncertainty from all three assumptions, Figure 5 depicts the probability distribution of residual estimates for each year from 2005 to 2018.
The results in Table 5 show that the residual estimates are most sensitive to uncertainty in emigration rates, particularly during the 2010–2018 period, and least sensitive to uncertainty in mortality rates. As discussed earlier, the uncertainty about emigration in prior research led us to postulate a probability distribution with a standard deviation equal to half the emigration rate; averaged across demographic groups, this standard deviation was about 0.65 percentage points. In 2018, this amount of uncertainty about emigration was associated with a standard deviation of about 2.3 million in the estimated unauthorized population.
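A minimal sketch of drawing emigration rates whose standard deviation equals half the point estimate. The normal shape, the single shock shared by all demographic groups in each draw, the truncation at zero, and the example rates are our assumptions for illustration; the paper does not specify these details here.

```python
import numpy as np


def draw_emigration_rates(point_rates, rng, n_draws=1000):
    """Draw plausible annual emigration rates around point estimates.

    Assumptions (ours, for illustration): each rate is normal with mean equal
    to its point estimate and SD equal to half of it; one standard-normal
    shock per draw is shared by all groups, so their rates rise or fall
    together; negative draws are truncated at zero.
    """
    rates = np.asarray(point_rates, dtype=float)
    z = rng.standard_normal((n_draws, 1))              # one shock per simulation
    return np.clip(rates * (1 + 0.5 * z), 0.0, None)   # shape (n_draws, n_groups)


# Hypothetical point estimates for three demographic groups.
rng = np.random.default_rng(1)
draws = draw_emigration_rates([0.005, 0.013, 0.020], rng)
```

Sharing one shock across groups treats emigration error as systematic (for example, a methodological choice that shifts all rates together) rather than as independent noise by group, which would partly cancel out in the aggregate.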
In contrast, prior research on coverage error led us to postulate a probability distribution for coverage error with an average standard deviation of about 3.2 percentage points as of 2018. But because the coverage-error rate is applied only once, rather than annually over 36 years as emigration rates are, it is associated with a standard deviation of only 507,000 in the 2018 estimate, less than one quarter of the uncertainty associated with emigration.
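The contrast between a one-time adjustment and an annually compounding one can be illustrated with simple arithmetic. The 10% baseline undercount is a hypothetical value; the 3.2- and 0.65-point shifts echo the standard deviations cited above. Because the two factors apply to different bases (the unauthorized count versus the LNH cohorts), the comparison illustrates only how each error grows, not the magnitudes in Table 5.

```python
import math


def coverage_multiplier(undercount):
    """Factor applied once to a survey count with the given undercount rate."""
    return 1 / (1 - undercount)


def emigration_error_factor(rate_error, years):
    """Factor by which a cohort's surviving size is overstated when its annual
    emigration rate is understated by `rate_error` for `years` years
    (continuous-time approximation)."""
    return math.exp(rate_error * years)


# Hypothetical comparison: a 3.2-point shift in undercount (10% -> 13.2%)
# versus a 0.65-point error in annual emigration sustained for 36 years.
coverage_shift = coverage_multiplier(0.132) / coverage_multiplier(0.10)
emigration_shift = emigration_error_factor(0.0065, 36)
print(f"coverage: x{coverage_shift:.3f}, emigration after 36 years: x{emigration_shift:.3f}")
```

After a single year the emigration error is smaller than the coverage shift, but sustained over decades it becomes much larger, which is the pattern the text describes.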
Finally, assumptions about mortality have far less impact on the estimates than those about emigration and coverage error. Mortality rates among the foreign-born are fairly well documented, though still subject to sampling error, so we postulate a narrow probability distribution. Moreover, mortality has little effect overall given the youthful age structure of the immigrants in our analysis. Accordingly, we find that a one-standard-deviation increase in the assumed mortality rate is associated with only 25,000 additional unauthorized immigrants in 2018.
Emigration has not always been the most important factor, however. In 2005, the uncertainty in residual estimates associated with emigration (SD = 1,342) was less than the uncertainty associated with coverage error (SD = 1,778). Uncertainty associated with coverage error declined over time as the unauthorized population became more settled (Van Hook et al. 2014). Meanwhile, uncertainty associated with emigration and mortality increased over time because errors in these rates compound as they are applied to each LPR admission cohort in every year since admission, as illustrated in our simple simulation in Table 1.
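Why the error grows over calendar time can be seen by summing over admission cohorts. In this hypothetical sketch (constant admissions of 1 million per year and placeholder annual loss rates, not the paper's inputs), each cohort loses a fixed share of its members every year, and the gap between the base and higher-emigration LNH stocks widens each year; all else equal, that gap carries over one-for-one into a larger residual estimate.

```python
import numpy as np


def lnh_stock(admissions, annual_loss, year):
    """Surviving LNH stock in `year` (0-based) from cohorts admitted in years
    0..year, each losing `annual_loss` of its members per year since admission."""
    t = year - np.arange(year + 1)          # years since admission, by cohort
    return float(np.sum(admissions[: year + 1] * (1 - annual_loss) ** t))


admissions = np.full(36, 1.0)                  # hypothetical: 1 million admissions per year
base_loss, high_loss = 0.017, 0.017 + 0.0065   # hypothetical annual loss rates
for year in (5, 15, 25, 35):
    gap = lnh_stock(admissions, base_loss, year) - lnh_stock(admissions, high_loss, year)
    print(f"year {year}: LNH stock lower by {gap:.2f} million")
```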
When uncertainty in all assumptions was considered simultaneously, the variation across estimates tracked whichever assumption was most uncertain: coverage error in the earlier years and emigration in the later years (last column of Table 5). Uncertainty peaked initially in 2007 (SD = 2,380), declined to SD = 1,472 in 2010, and then increased again to SD = 2,232 in 2018. As of 2018, the 95% interval of plausible residual estimates ranged from 7.0 to 15.7 million, meaning that, under our assumed distributions, there is a 95% probability that the true value lies within this range (see Figure 5). The interquartile range, which contains the middle half of the plausible estimates, is narrower: 9.1 to 12.2 million.
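Both ranges reported above are percentiles of the simulated estimates. A minimal sketch, using NumPy's `percentile` with its default linear interpolation; the normally distributed draws are hypothetical stand-ins for the 1,000 simulated 2018 estimates, which are not reproduced here (the actual distribution need not be symmetric).

```python
import numpy as np


def plausible_ranges(estimates):
    """Central 95% interval and interquartile range of simulated estimates."""
    lo95, q1, q3, hi95 = np.percentile(np.asarray(estimates, dtype=float),
                                       [2.5, 25, 75, 97.5])
    return (lo95, hi95), (q1, q3)


# Hypothetical draws standing in for 1,000 simulated 2018 estimates (millions).
sims = np.random.default_rng(7).normal(11.0, 2.2, size=1000)
(lo95, hi95), (q1, q3) = plausible_ranges(sims)
print(f"95% interval: {lo95:.1f}-{hi95:.1f}M; IQR: {q1:.1f}-{q3:.1f}M")
```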
Conclusions
The residual method is one of the most common ways of estimating the size of the unauthorized foreign-born population, but it remains unclear how sensitive residual estimates are to uncertainty in their underlying assumptions. This makes it difficult to assess the plausible range of estimates of the unauthorized foreign-born population and whether differences between estimates are meaningful. In this article, we produced a new series of residual estimates using the highest-quality data we could identify, and we updated and improved assumptions about coverage error, emigration, and mortality. Beyond this, we examined the extent to which residual estimates may plausibly vary because of uncertainty in these three assumptions.
The results of our simulations suggest that the estimates produced by Pew and DHS, which range from 10.5 to 12 million, may not be meaningfully different from one another. These research groups may use slightly different assumptions, but their estimates fall within a narrow plausible range of 9.1 million to 12.2 million, the interquartile range in our simulations. It would be difficult to conclude that one estimate is superior to another.
Our results also suggest that it is very unlikely that the unauthorized foreign-born population is larger than about 15.7 million. This is important in light of a recently published study (Fazel-Zarandi et al. 2018) in which the authors expressed skepticism that a significant portion of unauthorized immigrants are counted in census data. On the basis of an inflow–outflow estimation method, they claimed that the number of unauthorized immigrants living in the country in 2016 was much higher than residual estimates, ranging from 16.7 to almost 30 million, with a midpoint of 22.1 million (Fazel-Zarandi et al. 2018). The lower bound of their estimate (16.7 million) lies above the upper bound of the 95% interval of residual estimates produced in this paper (7.0 to 15.7 million). Across the 1,000 simulations varying emigration, mortality, and coverage-error rates conducted for our analysis, only 2% yielded estimates of 16 million or higher, and none was as high as 22.1 million. Several commentators have already published critical evaluations of the Fazel-Zarandi et al. study, showing that its estimates are too high because it fails to account for the circular migration patterns of unauthorized immigrants during the 1990s (Capps et al. 2018; Gelatt et al. 2018; Warren 2018). Our evaluation of the plausible range of residual estimates further supports these critiques.
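The tail comparison above amounts to two tallies over the simulated estimates. A minimal sketch; the function name and the normally distributed draws are hypothetical, and the paper's actual simulation output is not reproduced here.

```python
import numpy as np


def tail_summary(estimates, threshold, external_estimate):
    """Share of simulated estimates at or above `threshold`, and whether any
    simulated estimate reaches `external_estimate`."""
    est = np.asarray(estimates, dtype=float)
    return float(np.mean(est >= threshold)), bool((est >= external_estimate).any())


# Hypothetical simulated estimates (millions), standing in for the paper's draws.
sims = np.random.default_rng(3).normal(11.0, 2.2, size=1000)
share_16, reaches_22 = tail_summary(sims, 16.0, 22.1)
```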
Finally, our results demonstrate that most of the uncertainty in residual estimates derives from uncertainty in emigration rates among the LNH population. Coverage-error assumptions matter much less, and mortality assumptions scarcely matter at all. The sensitivity of residual estimates to assumptions about emigration stems from a feature of the residual method whereby errors in emigration (and mortality) rates accumulate over time. Emigration rates (and, to a much lesser degree, mortality rates) determine the size of the surviving LNH foreign-born cohorts living in the United States, so when emigration is overestimated, the LNH population is underestimated and the unauthorized population is overestimated. In our simulations, a one-standard-deviation increase in the assumed emigration rate (about half a percentage point) was associated with nearly 2.3 million more unauthorized immigrants in 2018. Because error in emigration rates accumulates from the time of admission to the present, this type of error will continue to grow in the future. Similarly, errors in the emigration of unauthorized immigrants compound over time in the inflow–outflow model employed by Fazel-Zarandi and colleagues (2018), greatly affecting their estimates.
Unfortunately, the United States does not collect high-quality data on emigration. Researchers have had to rely on indirect methods, which tend to produce inconsistent and imprecise estimates. Emigration estimates could easily differ by half a percentage point or more because of any number of seemingly arbitrary methodological decisions. For example, when Leach (2017) updated his earlier work (Leach and Jensen 2013), his estimate of the emigration rate among new arrivals (fewer than 10 years in the United States) increased from 1.3% to 1.8%. Moreover, it is possible, even likely, that emigration rates vary over time and across demographic groups.
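To gauge what a 0.5-point revision like Leach's implies, the following sketch applies each annual rate to a single arrival cohort for 10 years. Holding the rate constant and ignoring mortality are simplifying assumptions made only for this illustration.

```python
def retained_share(annual_rate, years):
    """Share of an arrival cohort still in the country after `years` years
    of constant annual emigration (mortality ignored)."""
    return (1 - annual_rate) ** years


for rate in (0.013, 0.018):
    print(f"annual rate {rate:.1%}: {retained_share(rate, 10):.3f} retained after 10 years")
```

Under these assumptions, about 87.7% of the cohort remains after 10 years at 1.3% versus about 83.4% at 1.8%, a gap of roughly 4 percent of the original cohort that a residual estimate would attribute to the unauthorized population.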
We attempted to account for this variation by using a prediction model to estimate emigration rates by age, sex, country of birth, and duration of residence, and by further adjusting the emigration rates to account for annual trends in emigration, yet very little of this potential variation in emigration has been formally studied.
In conclusion, we still view the residual method as more robust than other available methods, and we believe the strength of existing evidence supports the assumptions that have been used in generating these estimates. Even if these assumptions are slightly wrong, it is unlikely that the unauthorized immigrant population is far outside the range of current, widely used residual estimates. However, to move the field forward, it will be important to continuously develop new and better methods and data sources with which to estimate the number of unauthorized immigrants. This will become especially important as time passes and error associated with uncertainty in emigration rates continues to accumulate. Government agencies with the ability to contact and track immigrants after their admission may offer new avenues for research and development in this area. For example, DHS may have the capacity to produce precise and detailed estimates of emigration rates by using its own administrative data, but it has not produced such estimates for researchers' use.
Acknowledgments
We acknowledge assistance provided by the Population Research Institute at Pennsylvania State University, which is supported by an infrastructure grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P2CHD041025).
Notes
This point can also be shown mathematically. If $t$ is the number of years following admission and the mortality and emigration rates are constants, the LNH population at time $t$ can be expressed as $P_t = P_0 e^{-gt}$, where $g$ is the combined annual rate of mortality and emigration. If $g$ is too low by .01, then $P_t$ is overestimated by a factor of $P_0 e^{-(g-0.01)t} / (P_0 e^{-gt}) = e^{0.01t}$. Since the error is multiplied by $t$ in the exponent, errors in mortality and emigration rates are compounded over time.
DHS rested its assumption about coverage error on a survey conducted in Los Angeles that was then compared to 2000 census counts (Marcelli and Ong 2002). Pew also based its assumption on the 2000 census, with coverage error calculated by incorporating data from the Census Bureau’s Accuracy and Coverage Evaluation (ACE) post-enumeration survey (U.S. Census Bureau 2003). Like previous such surveys, the 2000 ACE re-interviewed a stratified sample of households shortly following the decennial census. Respondents in the post-enumeration survey were matched to census respondents in order to assess rates of omission, duplication, and net coverage error. Although the ACE did not produce separate estimates for the foreign-born, the Pew Hispanic Center used the ACE to arrive at a 13% figure by assuming the coverage error for unauthorized immigrants was two to three times as high as that for others within the same race/Hispanic origin, age, and sex grouping.
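The undercount assumption described in this note can be sketched as two steps: scale a group's overall undercount rate by a multiplier between 2 and 3, then inflate the survey count by dividing by one minus that rate. The division convention, function names, and the 5% example rate are hypothetical simplifications, not Pew's actual procedure.

```python
def unauthorized_undercount(group_undercount, multiplier):
    """Undercount rate for unauthorized immigrants, taken as `multiplier`
    (2 to 3, per the assumption described above) times the rate for others
    in the same race/Hispanic origin, age, and sex group."""
    if not 2 <= multiplier <= 3:
        raise ValueError("multiplier should be between 2 and 3")
    return multiplier * group_undercount


def coverage_adjusted(count, undercount):
    """Inflate a survey count for a given undercount rate."""
    return count / (1 - undercount)


# Hypothetical example: a 5% group undercount tripled to 15%.
rate = unauthorized_undercount(0.05, 3)
adjusted = coverage_adjusted(10.0, rate)   # 10 million counted -> about 11.76 million
```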
A third approach for estimating foreign-born emigration is to analyze longitudinal surveys; examination of longitudinal data allows one to infer emigration by assessing attrition from the survey (Borjas 1989; Reagan and Olsen 2000; Van Hook et al. 2006). However, it is difficult to separate emigration from other reasons for attrition, such as failure to recontact participants and participant nonresponse, leading to some of the highest estimated rates of emigration in the literature. For example, Van Hook and colleagues (2006) found an annual emigration rate of 2.9% overall in an analysis of the rotating panels of the Current Population Survey.
Nonimmigrants who have Social Security numbers include a mix of visitors who stay for short periods and those who stay longer and may eventually adjust to LPR status. H-2B nonagricultural workers are admitted seasonally and therefore generally stay in the United States for less than a year. H-1B high-skilled workers, by contrast, are admitted for three-year periods and may renew once (for a total of six years), unless they apply to adjust to LPR status, in which case they can renew indefinitely. (Data on the proportion who stay longer and apply to become LPRs are not available.) International students who stay past their period of study to work under the Optional Practical Training program may later adjust their status to H-1B or another high-skilled nonimmigrant visa and eventually to LPR status. Despite these potential differences in length of stay among nonimmigrants, their small total number means they have relatively little influence on emigration rates, particularly for the lawfully present population with more than five years of U.S. residence.
We gratefully acknowledge the assistance of Jonathan Schwabish for providing his discrete-time event-history model (logistic regression) predicting the odds of emigrating in a given year. The model was estimated on a person-year file that contains a record for every foreign-born Social Security recipient from the time of entry into the Social Security system until emigration or censoring. We use the coefficients to calculate the log-odds of annual emigration for each demographic group, which we then convert to predicted probabilities (i.e., annual emigration rates). The prediction equation is: log-odds(emigration) = −7.59 + male(.05449) + age(.19721) + age-squared(−.002) + Central American(−.440) + Caribbean(.017) + S. American(−.130) + European/Canadian/Aust(.526) + Asian(−.025) + Other(.133) + 5–9 years US res(−.830) + 10–15 years US res(−1.273) + 16–20 years US res(−1.650) + 21+ years US res(−2.900).
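Converting the prediction equation into annual emigration rates is an inverse-logit transformation. The sketch below transcribes the coefficients above; the dictionary keys and function name are ours, and the omitted reference categories (assumed here to be Mexico and fewer than 5 years of U.S. residence) are inferred from the categories listed rather than stated in the equation.

```python
import math

# Coefficients transcribed from the prediction equation above.
INTERCEPT = -7.59
MALE = 0.05449
AGE = 0.19721
AGE_SQ = -0.002
REGION = {"Central American": -0.440, "Caribbean": 0.017, "S. American": -0.130,
          "European/Canadian/Aust": 0.526, "Asian": -0.025, "Other": 0.133}
DURATION = {"5-9": -0.830, "10-15": -1.273, "16-20": -1.650, "21+": -2.900}


def annual_emigration_rate(male, age, region=None, duration=None):
    """Predicted annual emigration probability for one demographic group.

    male is 1 or 0. region=None and duration=None stand for the omitted
    reference categories, assumed here to be Mexico and fewer than 5 years
    of U.S. residence.
    """
    log_odds = INTERCEPT + MALE * male + AGE * age + AGE_SQ * age ** 2
    if region is not None:
        log_odds += REGION[region]
    if duration is not None:
        log_odds += DURATION[duration]
    return 1 / (1 + math.exp(-log_odds))   # inverse logit


rate_new_arrival = annual_emigration_rate(1, 30)
rate_long_term = annual_emigration_rate(1, 30, duration="21+")
```

For a 30-year-old man in the reference categories the predicted rate is roughly 3% per year, and it falls steadily with each longer duration category.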