Abstract
We examine inferences about old-age mortality that arise when researchers use survey data matched to death records. We show that even small rates of failure to match respondents can lead to substantial bias in the measurement of mortality rates at older ages. This type of measurement error is consequential for three strands in the demographic literature: (1) the deceleration in mortality rates at old ages; (2) the black-white mortality crossover; and (3) the relatively low rate of old-age mortality among Hispanics, often called the “Hispanic paradox.” Using the National Longitudinal Survey of Older Men matched to death records in both the U.S. Vital Statistics system and the Social Security Death Index, we demonstrate that even small rates of failure to match deaths plausibly lead to the appearance of mortality deceleration when none exists and can generate a spurious black-white mortality crossover. We confirm these findings using data from the National Health Interview Survey matched to the U.S. Vital Statistics system, a data set known as the “gold standard” (Cowper et al. 2002) for estimating age-specific mortality. Moreover, with these data, we show that the Hispanic paradox is also plausibly explained by a similar undercount.
“By my troth, I care not. A man can die but once. We owe God a death … and let it go which way it will, he that dies this year is quit for the next.”
William Shakespeare, Henry IV Part 2, Act 3 Scene 2
Introduction
Both the scientific community and the general public have a deep interest in the processes that shape old-age mortality. Many facets of old-age mortality have drawn intense scrutiny from scholars, including (1) the plasticity of longevity, (2) the black-white mortality crossover, and (3) the so-called Hispanic paradox.
First, the plasticity of longevity is perhaps the most compelling topic in the study of mortality. Record (best-practice) life expectancy has increased in a remarkably linear manner since the 1840s (Oeppen and Vaupel 2002), naturally giving rise to a debate about whether there is some built-in age limit to human life. Relevant to this debate are studies that have indicated a late-life mortality deceleration law: the empirical generalization that death rates level off at advanced ages, forming a late-life mortality plateau.1
A second widely studied phenomenon is the black-white mortality crossover. At younger ages, blacks have higher mortality rates than corresponding whites—a result that is unsurprising given racial differences in socioeconomic factors. However, at older ages, mortality rates become lower for blacks than for whites (Berkman et al. 1989; Dupre et al. 2006; Kestenbaum 1992; Lynch et al. 2003; Manton et al. 1979; Masters 2012).
Third, there is substantial interest in the empirical generalization that Hispanic and Latino Americans have mortality rates that are similar to, or lower than, their non-Hispanic white counterparts. This finding has been dubbed the Hispanic paradox (Markides and Coreil 1986), given the socioeconomic disadvantages experienced by Hispanics relative to whites.
Findings about racial and ethnic differences in mortality are so anomalous that some scholars have explored the possibility that the results are due primarily to “bad data” (e.g., Coale and Kisker 1986; Lynch et al. 2003; Preston and Elo 2006; Preston et al. 1996, 1998; Rosenberg et al. 1999; Swallen and Guend 2003). Despite decades of work to identify and correct data issues, demographers have largely concluded that careful data correction mitigates, but does not eliminate, the black-white crossover and Hispanic paradox. These findings have motivated a host of theories on why Hispanic Americans generally—and older blacks specifically—appear to die at lower rates than corresponding non-Hispanic whites (e.g., Abraido-Lanza et al. 1999; Manton and Stallard 1984; Manton et al. 1979; Shai and Rosenwaike 1987; Vaupel et al. 1979; Wrigley-Field 2014).
Our contribution to the study of old-age mortality falls squarely in the “bad data” domain. We show that even a small amount of a particular data problem—which we denote the “Methuselah effect”—biases inferences about old-age mortality in predictable ways. Our theoretical analysis shows that the Methuselah effect can create the appearance of the three aforementioned phenomena—namely, mortality deceleration associated with “plasticity of longevity,” the black-white crossover, and the Hispanic paradox—even when none exist. Empirical evaluation, using data from the National Longitudinal Survey of Older Men (NLS-OM) and National Health Interview Survey (NHIS), demonstrates that the Methuselah effect indeed operates as predicted by theory.
To see the issue at hand, consider a research design that relies on a data structure such as the “gold standard” (Cowper et al. 2002) of NHIS data matched to the National Death Index (NDI). It is easy to see why these data earned the gold standard imprimatur. After all, the NHIS data set is large and population-representative; it includes multiple birth cohorts; and its death information is drawn from administrative records, which are generally viewed as accurate. Even so, it is inevitable that for at least a few individuals who appear in the base sample, the death will fail to be matched to administrative death records. This form of measurement error creates the Methuselah effect, so named because it produces a set of respondents who appear to live forever. The Methuselah effect will typically create small biases in the measurement of mortality at relatively young ages, but the bias grows precipitously as the population ages.
The intuition of the Methuselah effect is straightforward. Suppose that a respondent dies at age a, but the death record is not matched to that respondent in the base sample. This causes us to underestimate the numerator in our age a death rate, an inconsequential problem if such an occurrence is relatively rare. But we will then overestimate the denominator of the age-specific mortality rate for all ages greater than a. As a cohort ages, the number of unmatched deaths rises relative to the number of individuals who actually remain alive, and the Methuselah respondents inexorably come to dominate.2 Mortality estimates thus become progressively more downward-biased, leading mechanically to an inference of mortality rate deceleration. Moreover, if the failure-to-match rate varies by some group characteristic, this form of measurement error will affect group comparisons of mortality. Methuselah error is thus potentially salient for studies of the black-white mortality crossover and the Hispanic paradox because of racial and ethnic differences in failure-to-match rates (Hsu 2012; Lariscy 2011).
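The mechanism can be made concrete with a small deterministic sketch. It assumes a single cohort whose true one-year death probabilities follow a Gompertz hazard with hypothetical parameters (0.005 at age 50, log-slope 0.09; illustrative values, not estimates from our data) and a constant probability alpha that a death is never matched. Unmatched decedents are kept in the apparent risk set forever, which is exactly the Methuselah problem described above.

```python
import math

def gompertz_q(age, a=0.005, b=0.09):
    """One-year death probability under a Gompertz hazard a*exp(b*(age - 50)).

    The parameter values are illustrative assumptions, not estimates."""
    return 1.0 - math.exp(-a * math.exp(b * (age - 50)))

def methuselah_profile(ages, alpha, n0=100_000.0):
    """Expected (age, true q, observed q, Methuselah share of the apparent risk set)
    when each death is left unmatched with probability alpha."""
    alive = n0     # people truly alive at the start of each age
    ghosts = 0.0   # unmatched decedents who still look alive (Methuselah cases)
    rows = []
    for age in ages:
        q = gompertz_q(age)
        deaths = alive * q
        apparent_at_risk = alive + ghosts
        observed = (1.0 - alpha) * deaths / apparent_at_risk
        rows.append((age, q, observed, ghosts / apparent_at_risk))
        ghosts += alpha * deaths
        alive -= deaths
    return rows

for age, true_q, obs_q, share in methuselah_profile(range(60, 110, 5), alpha=0.02):
    print(f"age {age}: true q {true_q:.3f}, observed q {obs_q:.3f}, Methuselah share {share:.2f}")
```

Even with only 2 % of deaths unmatched in this toy cohort, the Methuselah share of the apparent risk set approaches one at the oldest ages, and the observed rate eventually turns down while the true rate keeps rising.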
In this article, we begin by noting several theories of aging and race-based differences in mortality selection (e.g., physiological degeneration, vitality loss, and frailty). We also provide a brief overview of previous work on measurement error as it pertains to the study of old-age mortality. We then provide a theoretical investigation of the biases introduced by the Methuselah effect, demonstrating how the Methuselah effect generates spurious mortality rate deceleration, and showing that even when mortality rates are higher for one subpopulation (e.g., black men) than a second subpopulation (e.g., white men) at all ages, a mortality crossover can spuriously appear if the failure-to-match rate is higher for the first group than for the second. Finally, we explore the empirical relevance of our argument using two data sources. The first is the NLS-OM, a nationally representative survey of men born in the United States, 1906–1921, which has recently been matched to death records in the U.S. Vital Statistics system and the Social Security Death Index (SSDI). This data set is remarkable in that the match gives us three potentially useful death reports: two administrative reports of the date of death and the date of death reported in the survey (for the more than half of respondents who died during the survey years). The survey began long enough ago that the sample can now be considered nearly an extinct generation. Our second data source is the NHIS, matched to administrative death records, primarily to the NDI but also to Social Security (SS) and Medicaid and Medicare records.
Our empirical work shows that the Methuselah effect falsely creates the inference of mortality rate deceleration. Also, in both data sets, we find that deaths are matched at higher rates for non-Hispanic whites than for blacks or nonblack Hispanics. Failure to account for the resulting measurement error leads us to estimate a black-white mortality crossover at approximately age 85; after we account for measurement error, we find little evidence of a black-white crossover. Similarly, any evidence of a Hispanic mortality advantage disappears after we account for the Methuselah effect.
The Current Literature on Old-Age Mortality
Mortality Deceleration
Late-life mortality deceleration—a phenomenon in which mortality hazard rates tend to stabilize at advanced ages—is an empirical prospect that stands in contrast to a hazard that increases exponentially, as characterized by the Gompertz law. Mortality deceleration has been observed in insects, but it is controversial in mammals (Gavrilov and Gavrilova 2011). Gompertz (1825) first proposed late-life mortality deceleration in human aging, and Greenwood and Irwin (1939) observed it in humans. Others disagree. For instance, Gavrilov and Gavrilova (2011) concluded that mortality deceleration is negligible up to the age of 106 and suggested that the Gompertz law provides a good fit.
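For reference, the contrast can be stated in standard notation, with $\mu(x)$ the force of mortality at age $x$ and $a, b > 0$. Under the Gompertz law the log hazard rises linearly with age; deceleration means that its slope falls with age. The logistic form in the second line (sometimes called the Kannisto model) is one common way to represent a plateau and is included only as an illustration.

```latex
\mu_{\text{Gompertz}}(x) = a\,e^{bx},
\qquad \frac{d \ln \mu(x)}{dx} = b
\\[6pt]
\mu_{\text{logistic}}(x) = \frac{a\,e^{bx}}{1 + a\,e^{bx}},
\qquad \frac{d \ln \mu(x)}{dx} = \frac{b}{1 + a\,e^{bx}} \;\xrightarrow[x \to \infty]{}\; 0
```

In the logistic case the hazard levels off at 1, which is the kind of late-life plateau the deceleration literature debates.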
Gavrilov and Gavrilova (2011) suggested that several factors might contribute to a spurious finding of mortality deceleration; when they adjusted for each factor, the fit with the Gompertz law improved. First, old individuals may exaggerate their ages, which reduces apparent mortality. Second, the use of discrete-time hazard estimation methods, rather than instantaneous hazard rate analysis, can be a problem. Third, issues arise when multiple birth cohorts are combined if those cohorts have differing age-specific mortality rates. Finally, the use of cross-sectional data, rather than cohort data, can lead to estimated deceleration. Gavrilov and Gavrilova (2011) showed that there is no evidence of mortality deceleration in analyses that follow individual birth-year cohorts through extinction.
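The discrete-time point can be checked numerically. If the instantaneous hazard follows a Gompertz curve exactly, the one-year death probability $q = 1 - e^{-\mu}$ still bends away from exponential growth as $\mu$ becomes large, so discrete-time rates can look like deceleration. The parameters below are hypothetical, and $q$ assumes the hazard is constant within each year of age.

```python
import math

A, B = 0.005, 0.09  # hypothetical Gompertz parameters: mu(50) = A, log-slope B

def mu(age):
    """Instantaneous Gompertz hazard (illustrative parameters)."""
    return A * math.exp(B * (age - 50))

def q(age):
    """One-year death probability if the hazard stays at mu(age) for the whole year."""
    return 1.0 - math.exp(-mu(age))

ages = list(range(60, 111, 10))
mu_slopes = [math.log(mu(x + 1)) - math.log(mu(x)) for x in ages]
q_slopes = [math.log(q(x + 1)) - math.log(q(x)) for x in ages]

for x, s_mu, s_q in zip(ages, mu_slopes, q_slopes):
    print(f"age {x}: log-slope of mu {s_mu:.3f}, log-slope of q {s_q:.3f}")
```

The log-slope of $\mu$ stays at B at every age, while the log-slope of $q$ shrinks steadily, even though no deceleration is built into the underlying hazard.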
Race-Based Mortality Crossovers
In the United States, a black-white mortality crossover has been found for both men and women in a number of studies (Arias 2006; Johnson 2000; Kestenbaum 1992; Lynch et al. 2003; Parnell and Owens 1999). In this crossover literature, mortality rates are found to be higher for blacks than for whites at younger ages, but they are lower for blacks than whites at older ages. The crossover appeared, for instance, in the 1910 U.S. life tables, with the crossover occurring at age 79 for men and age 78 for women (Department of Commerce 1921). The age of the mortality crossover now appears to be closer to age 85, and evidence suggests that the age of the crossover has been increasing across birth cohorts (Lynch et al. 2003; Masters 2012). As we noted earlier, demographers disagree on whether the crossover is the result of selection or bad data.
Beginning with the pioneering work of Vaupel et al. (1979), there has been widespread appreciation that population heterogeneity in the susceptibility to mortality (i.e., “frailty”) can lead to a surviving population being positively selected for survival (Lynch et al. 2003; Manton and Stallard 1981; Manton et al. 1984; Nam 1995; Vaupel and Yashin 1985). Under this theory, individuals who survive to older ages are especially robust, given that higher mortality across the life course has culled those who are relatively frail. In the study of black-white differences, this theory postulates that because blacks have higher mortality rates than whites earlier in life, older blacks are more robust, on average, potentially explaining why blacks have lower mortality rates than whites at old ages.
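A minimal sketch of this selection argument, assuming two hypothetical subgroups that follow the same Gompertz schedule up to a fixed frailty multiplier (1 for the robust group, 3 for the frail group; values chosen only for illustration). The frail group is culled faster, so the robust share of survivors rises with age and the aggregate death rate drifts toward the robust group's rate.

```python
import math

def gompertz_q(age, frailty, a=0.005, b=0.09):
    """One-year death probability for a subgroup with a proportional frailty multiplier."""
    return 1.0 - math.exp(-frailty * a * math.exp(b * (age - 50)))

def frailty_selection(ages, frailties=(1.0, 3.0), shares=(0.5, 0.5)):
    """Return (age, robust share of survivors, aggregate q, robust q, frail q) by age."""
    alive = list(shares)
    rows = []
    for age in ages:
        qs = [gompertz_q(age, z) for z in frailties]
        total = sum(alive)
        aggregate = sum(n * qi for n, qi in zip(alive, qs)) / total
        rows.append((age, alive[0] / total, aggregate, qs[0], qs[1]))
        # Each subgroup loses its own deaths; the frail group shrinks faster.
        alive = [n * (1.0 - qi) for n, qi in zip(alive, qs)]
    return rows

for age, robust_share, agg, q_robust, q_frail in frailty_selection(range(60, 101, 10)):
    print(f"age {age}: robust share {robust_share:.2f}, aggregate q {agg:.3f} "
          f"(robust {q_robust:.3f}, frail {q_frail:.3f})")
```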
An alternative explanation emphasizes how poor data quality biases estimates of older-age mortality rates (Coale and Kisker 1986; Preston and Elo 2006; Preston et al. 1996, 1998). To date, the principal data quality issue investigated has been the tendency for individuals to overstate their age, especially at older ages. This phenomenon appears especially pervasive among black Americans (Preston and Elo 2006). For example, Preston et al. (2003) matched 2,990 death certificates for blacks aged 60 or older in 1980–1985 to corresponding census records from childhood, finding that only 45 % of women and 51 % of men had an age at death reported on the death certificate consistent with the recorded census age. An incomplete Vital Statistics system, especially in the South in the early twentieth century, left many older black Americans without a birth certificate and hence a known age, likely contributing to the high levels of age misreporting among older black Americans.
Proponents of the frailty view point to several findings. First, some studies have found that correcting for age misreporting changes the age of the mortality crossover but does not eliminate it (Lynch et al. 2003; Preston and Elo 2006). Second, scholars using survey data, such as the NHIS matched to the NDI, have argued that this design mitigates the issue of age misreporting at old ages because age is taken from a survey administered years before death (Eberstein et al. 2008; Lynch et al. 2003). Third, evidence suggests that the mortality crossover occurs for some causes of death but not others. For example, Eberstein et al. (2008) reported a notable crossover for deaths by heart disease but no evidence of a crossover for deaths by malignant neoplasms or by diabetes. This led these authors to conclude, “Although age misreporting probably contributes to some of the discrepancy in age trends of mortality rates for Whites and Blacks (less so based on the NHIS-NDI dataset), it seems unlikely that it can be a major contributor to mortality crossovers. That would necessitate an age-misreporting pattern that varied by cause of death in a peculiar manner” (p. 226). Finally, Masters (2012) suggested that adjusting for cohort of birth can eliminate the mortality crossover. This is consistent with the frailty point of view if differential mortality selection for blacks versus whites has been declining over the course of the twentieth century, perhaps because of declining racial economic inequality.
General Empirical Approaches to Estimating Mortality
In the vast literature on the estimation of mortality, the classic approach to the estimation of period age-specific mortality rates (ASMRs) entails estimation of two objects—target population cell estimates (for the denominator) and corresponding estimates of deaths (for the numerator)—from two data sources. This procedure is used, for example, in the construction of the U.S. life tables (e.g., Arias 2006). At most ages, population estimates are formed using census data, and mortality is estimated using data from the National Vital Statistics System, although at older ages (ages 85 to 99), estimation also relies on data from the insured Medicare population.
There are well-known concerns about the “two-sample issue,” some of which are relevant to the black-white crossover and the Hispanic paradox. For example, in the study of mortality in the Hispanic population, a central concern is that individuals in household surveys report their own ethnicity, whereas in vital statistics data a proxy reports ethnicity (Swallen and Guend 2003). If Hispanics are occasionally misreported as non-Hispanic white in vital statistics data, this would elevate estimated ASMRs for non-Hispanic whites while depressing them for Hispanics. As for the black-white crossover, the reporting of age at death by a proxy, combined with mortality rates that rise with age, can lead to an overstatement of age at death in vital statistics records (Coale and Kisker 1986; Myers 1978). For example, Preston et al. (1998) showed that even if age misreporting is symmetric, because mortality rises with age, a relatively larger fraction of individuals recorded in any five-year age interval in the vital statistics will generally be younger than they appear.3
A potential solution to the “two-sample issue” is to estimate mortality rates from a single prospective study in which age is recorded when respondents are young and respondents are followed until death. If race and ethnicity are reported in the prospective study, no reassessment is needed at death. A further potential advantage of prospective studies is that the documentation of date of death can often be directly established from respondents’ contacts—typically family members who help data collection agencies track the respondents’ whereabouts.4
More generally, deaths of respondents in the base study are often recorded by linking prospective survey data to subsequent mortality records: for example, records in the NDI, which are thought to be highly accurate. The NHIS-NDI data are an example. As we noted earlier, researchers such as Cowper et al. (2002) have highlighted the high quality of these data. Even so, concerns have been raised about the extent to which mortality estimates might be affected by the match quality between the data sources (Patel et al. 2004). Matching is done on the basis of multiple identifiers, including SS number, first and last name, middle initial, and date of birth. The best match is determined by a probabilistic match score, which sums a set of weights assigned to each of the items on which records are matched; however, there is no guarantee of complete match accuracy.
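A minimal sketch of how an additive match score of this kind can work, in the spirit of Fellegi–Sunter record linkage. The field list, the agreement probabilities m (true matches) and u (non-matches), and the cutoffs are all hypothetical; they are not the NDI's actual weights or NCHS's recommended cut point.

```python
import math

# Hypothetical per-field probabilities (m, u):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PROBS = {
    "ssn": (0.95, 0.0001),
    "last_name": (0.90, 0.01),
    "first_name": (0.90, 0.02),
    "middle_initial": (0.80, 0.08),
    "birth_date": (0.95, 0.003),
}

def match_score(agrees):
    """Sum log2 likelihood-ratio weights over fields.

    `agrees` maps a field to True (agree), False (disagree), or None (missing);
    missing or absent fields contribute 0."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        value = agrees.get(field)
        if value is True:
            score += math.log2(m / u)
        elif value is False:
            score += math.log2((1.0 - m) / (1.0 - u))
    return score

def accept(agrees, cutoff):
    """Declare a match when the score reaches the cutoff."""
    return match_score(agrees) >= cutoff

full = dict.fromkeys(FIELD_PROBS, True)
no_ssn = {**full, "ssn": None}
print(round(match_score(full), 1), round(match_score(no_ssn), 1))
```

With these made-up weights, a true decedent whose SSN field is missing scores about 23.6: accepted at a cutoff of 20 but rejected at 30, where that respondent would become a potential Methuselah case. This is the kind of cutoff sensitivity Lariscy (2011) examined.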
Lariscy (2011) investigated how match quality between the NDI and the 1989–2006 NHIS affected inference about mortality differences between Hispanics and non-Hispanic whites. He had two central findings: (1) the quality of matches is poorer for Hispanic Americans than for non-Hispanic white Americans; and (2) inferences about the Hispanic paradox depended on the cutoff criteria used in the match score when determining the acceptability of a data match. For example, for foreign-born Hispanics, relaxing and tightening the NCHS-recommended cut point for match acceptance resulted in substantial changes to mortality risk relative to U.S.-born non-Hispanic whites. Lariscy (2011) also found that among the oldest individuals, foreign-born Hispanics are less likely to die during follow-up than U.S.-born non-Hispanic whites, regardless of the matching standard he applied.
Our work is very much in the spirit of Lariscy (2011). We proceed first with a theoretical investigation of measurement error induced by the failure to match death records. We then turn to an empirical evaluation using NLS-OM data and NHIS data, in both cases matched to administrative death records.
Estimation of Mortality in Longitudinal Studies: A Theoretical Analysis
A Single Population
The realized proportion of unobserved accumulated deaths is , again with .
The first term on the far right of Eq. (7) is smaller than $h_t$ if any deaths before time $t$ have gone unmatched. The second term further reduces the observed hazard rate if at least one death in the current period is unmatched. Hence, as Hsu (2012) argued, this type of measurement error leads to underestimation of the hazard rate, and the bias grows over time. In particular, as a cohort ages, the stock of accumulated deaths, $D_{t-1}$, grows, so the denominator in Eq. (7) increasingly diverges from the true number of survivors, $N_{t-1}$. As a mechanical matter, then, the observed hazard shows spurious deceleration.
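One way to see both effects, in notation introduced only for this sketch (it need not match Eq. (7) term for term): let $d_t$ be the true deaths in period $t$, $N_{t-1}$ the true survivors entering $t$, $D_{t-1}$ the accumulated deaths before $t$, $h_t = d_t / N_{t-1}$, and $\tilde{\alpha}_{t-1}$ and $\tilde{\alpha}_t$ the realized shares of unmatched deaths before and during $t$.

```latex
\hat{h}_t
  = \frac{(1-\tilde{\alpha}_t)\, d_t}{N_{t-1} + \tilde{\alpha}_{t-1} D_{t-1}}
  = \underbrace{\frac{N_{t-1}}{N_{t-1} + \tilde{\alpha}_{t-1} D_{t-1}}\, h_t}_{\le\, h_t}
  \;-\; \frac{\tilde{\alpha}_t\, d_t}{N_{t-1} + \tilde{\alpha}_{t-1} D_{t-1}}
```

The first term equals $h_t$ only when $\tilde{\alpha}_{t-1} D_{t-1} = 0$, and the second term vanishes only when every death in period $t$ is matched.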
This standard error is increasing in the hazard rate up to $h_t = 0.5$ and decreasing in $N_{t-1}$. Thus, as long as the hazard rate is below 0.5, the measurement error will incorrectly reduce our estimate of the standard error of the hazard rate, because the observed hazard rate is smaller than the true rate and the observed risk set is larger than the true number of survivors. This measurement error, therefore, typically leads to an overstatement of the precision of our hazard rate estimates.
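A numeric check of this point, assuming the usual binomial standard error $\sqrt{h(1-h)/n}$ and hypothetical counts: 1,000 true survivors, 9,000 earlier deaths of which 2 % were never matched, and 200 deaths in the current period, also matched at 98 %.

```python
import math

def binomial_se(h, n):
    """Binomial standard error of a hazard estimated from n people at risk."""
    return math.sqrt(h * (1.0 - h) / n)

# Hypothetical counts, for illustration only.
survivors, prior_deaths, deaths_now, alpha = 1_000, 9_000, 200, 0.02

true_h = deaths_now / survivors
at_risk_observed = survivors + alpha * prior_deaths   # Methuselah cases inflate the risk set
observed_h = (1 - alpha) * deaths_now / at_risk_observed

true_se = binomial_se(true_h, survivors)
observed_se = binomial_se(observed_h, at_risk_observed)
print(f"true h = {true_h:.3f} (SE {true_se:.4f}); "
      f"observed h = {observed_h:.3f} (SE {observed_se:.4f})")
```

Both the estimated hazard and its estimated standard error come out too small, so the biased estimate also looks more precise than the true one.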
Comparing Mortality Across Populations
The observed difference understates the true difference when and , which is likely, given that αb > αw. Also, using logic from Eq. (8), the estimated standard error of the differences will typically be understated.
Figure 1 illustrates this via simulation. Panel a shows two populations, one with a 40 % higher mortality rate at every age. We introduce Methuselah error to the high-mortality group at rates of 1 %, 5 %, 10 %, and 15 %, in each case generating a spurious crossover. The age of the crossover decreases as the error rate rises. In panel b, we introduce a 4.2 % match error rate in the higher-mortality group and a 1.8 % error rate in the lower-mortality group, choosing these rates because they correspond to the lowest error rates we find for blacks and whites, respectively, in the NLS-OM data. Again, we observe a spurious crossover.
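The sketch below mirrors the design of this simulation under assumed Gompertz parameters (not the ones used to draw Fig. 1). The high-mortality group's hazard is 40 % higher at every age, a share alpha of its deaths goes unmatched, and the function reports the first age at which its observed death rate falls below the other group's.

```python
import math

AGES = range(50, 110)

def gompertz_q(age, multiplier=1.0, a=0.005, b=0.09):
    """One-year death probability; `multiplier` scales the hazard (1.4 = 40 % higher)."""
    return 1.0 - math.exp(-multiplier * a * math.exp(b * (age - 50)))

def observed_hazards(ages, multiplier, alpha, n0=100_000.0):
    """Expected observed death rates when a share alpha of deaths goes unmatched."""
    alive, ghosts, out = n0, 0.0, []
    for age in ages:
        deaths = alive * gompertz_q(age, multiplier)
        out.append((1.0 - alpha) * deaths / (alive + ghosts))
        ghosts += alpha * deaths   # unmatched decedents remain in the apparent risk set
        alive -= deaths
    return out

def crossover_age(ages, alpha_high, alpha_low=0.0, multiplier=1.4):
    """First age at which the high-mortality group's observed rate drops below the low group's."""
    high = observed_hazards(ages, multiplier, alpha_high)
    low = observed_hazards(ages, 1.0, alpha_low)
    for age, h_high, h_low in zip(ages, high, low):
        if h_high < h_low:
            return age
    return None   # no crossover within the age range

for alpha in (0.0, 0.01, 0.05, 0.10, 0.15):
    print(f"error rate {alpha:.2f} in high-mortality group: crossover at {crossover_age(AGES, alpha)}")
print("panel b rates (4.2 % vs 1.8 %):", crossover_age(AGES, 0.042, 0.018))
```

Without matching error there is no crossover at all. Because each observed rate falls pointwise as alpha rises, the first crossing age can only move earlier as the error rate grows.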
Our results are potentially germane to both the black-white mortality crossover and to the Hispanic paradox, given that death matching accuracy rates are higher for non-Hispanic whites than Hispanics and blacks, as we show later herein.
Measuring Death in Two Data Sets
Missing Deaths in the National Longitudinal Survey: Older Men Cohort
Our empirical analysis proceeds with the NLS-OM data. Initiated in 1966, this is a nationally representative survey of men, largely those in the 1906 to 1921 birth cohorts. The Census Bureau, which conducted the interviews for the Older Men Cohort, reported that as of 1990, 2,660 of the 5,020 men were deceased; we refer to this as the “survey” report. Given concerns that some deaths may have been missed in the data collection process, the Census Bureau attempted in 1990 to match respondents to their vital statistics death certificate records. The match was possible because the Census Bureau had respondents’ names, dates of birth, and SS numbers. This was a painstaking undertaking: given the resources available at the time, Census Bureau employees matched respondents to their death records by hand. This gives researchers two possible reports of a death by 1990. As we discuss shortly, the vital statistics 1990 match (the “VS90 match”) matched only 2,083 of the 2,660 deaths.
Our research team was interested in updating the mortality records in this cohort, and so we contracted with the Census Bureau to match the respondents with their death certificates through 2008, using vital statistics records, giving us deaths through the age of 102 years for the 1906 birth cohort (the oldest cohort) and through the age of 87 for the 1921 birth cohort (the youngest cohort). This new match was done electronically, and we expect computers to be better at matching than humans. Our match, the “VS08 match,” yielded 2,749 deaths as of 1990—666 more than the VS90 match and 89 more than the survey report.
In addition, the Census Bureau matched the Older Men cohort to the SSDI file, which recorded deaths through the end of 2012.5 Through 1990, the Social Security Administration (SSA) report (the “SSA12 match”) produced 2,407 deaths. The SSA12 match has 324 more deaths than the VS90 match, but it is 253 short of the survey report and 342 short of the VS08 match. These reports are broken out by the race and ethnicity of the respondents in Table 1. We believe that this table represents compelling evidence that the VS08 match and the survey report are the highest quality. The VS90 match clearly failed to identify numerous deaths that appear in the other three sources, which is hardly surprising given it was a hand match. The SSA12 match also caught fewer deaths than either the survey report or VS08 match. Given that we wish to measure deaths up to 90 years of age, we focus on the VS08 match.
Because survey houses are reluctant to accept a report of a death without documentation,6 it is reasonable to believe that there is a very low rate of false positives. This suggests that we may evaluate the quality of the administrative data matches by comparing rates of missing matches when there are death reports in the survey. We pursue this strategy in Table 2, which reveals high rates of likely false negatives—a failure to match when in all likelihood a death has occurred—for both the VS90 and SSA12 matches. The error rate for the VS08 match is smaller in magnitude but is still nonnegligible, in the 2 % to 4 % range. Moreover, there are racial/ethnic differences in errors from all matches, with black respondents appearing to be much harder to match to death records than are nonblack respondents in the NLS-OM.
Finally, we examine who fails to be matched in each of the respective reports, using the age at death recorded in the survey. Respondents whose deaths were not recorded in the VS08 match were, on average, 59.3 years of age at death; those whose deaths were recorded in the VS08 match were, on average, 67.3 years of age (t test p value < .01). We find similar results for the SSA12 match: those who were not matched were, on average, 63.9 years old at death, whereas those who were matched averaged 67.7 years old at death (t test p value < .01). These results suggest that failures to match occur disproportionately at earlier ages. Such failures make particular sense for the SSA12 match because most individuals did not start to receive SS benefits until age 65 (unless they took early retirement at 62).
Black-White Mortality Crossovers in the NLS-OM
In the top panel of Fig. 2, we plot the hazard curves using the SSA12 match. We provide nonparametric estimates of the black-white gap in hazard rates in panel A of Table 3. Consistent with numerous previous studies, we find a black-white mortality crossover; it occurs at age 76 and becomes statistically significant at age 80. Of course, from our earlier analysis, we know that the SSA12 match misses a substantial fraction of death reports and we know that the fraction missed is relatively higher for black respondents. Given our theoretical arguments, we have concerns that the observed crossover could be a consequence of measurement error.
In the bottom panel of Fig. 2, we instead use the VS08 death reports to estimate the hazard curves, and we again show the corresponding nonparametric estimates in Table 3. With these data, which have fewer false negatives, the estimated black-white crossover occurs at age 83, and this gap is not statistically significant at any age above 83. The improvement in the data quality has a significant effect not only on the absolute levels of estimated mortality but also on inferences about the black-white gap. As shown in Fig. 1, though, even the low error rates of the VS08 matching (which are correlated with race/ethnicity) can generate a spurious mortality crossover. We return to this issue in our upcoming model-based analysis.
Missing Deaths in the National Health Interview Survey
We next study mortality using the NHIS, accessed through the Integrated Health Interview Series at the University of Minnesota (Minnesota Population Center 2016). As noted earlier, these data have been described as the gold standard for the study of mortality rates (Cowper et al. 2002). We select respondents from the 1986 to 1989 surveys who were at least 85 years old at the time of the interview. As a part of the NHIS, the National Center for Health Statistics (NCHS) matched participants who were at least 18 years old at the time of the interview to death records, primarily to the NDI but also to the SS and Medicaid and Medicare records (for a description, see NCHS 2013). The NCHS matched respondents to their death records through 2011.
Our sample comprises 3,736 individuals. As expected from this sample of older individuals, it is mostly female, at nearly 68 %. We begin our analysis by estimating the nonparametric probability of death by age group for non-Hispanic whites and blacks. Because there are only 80 Hispanics in the sample, we do not attempt to estimate the probability of death for this group. In the 1980s, the public use files of the NHIS top-code age at 99, so we drop the 44 cases with a reported age of 99 in 1986–1989. This results in an analysis sample of 409 blacks and 3,203 non-Hispanic whites.
In Fig. 3, we report surprising results. The mortality hazard rates of both blacks and non-Hispanic whites increase until age 100 but then decline from ages 101 to 105 (the oldest age that we analyze). By way of comparison, we plot the probability of death by age from the 2013 SSA life tables for females. Because 30 % of our sample is male and our sample comes from older cohorts, we expect the SSA estimates to be a lower bound for our estimates, yet our estimates fall below the SSA estimates after approximately age 94. Moreover, our estimated age profiles, which feature clear mortality deceleration, look suspiciously like our simulations of the Methuselah effect (Fig. 1).
To ascertain whether the type of measurement error we document in the NLS-OM cohort also afflicts the NHIS study, we pursue a new strategy based on the “extinct generation” argument (Kannisto 1994), using the full sample of 3,736 individuals. Because our sample has a minimum age of 85 in 1989, the minimum-aged individual in our sample by 2011 is 106. Almost everyone in these cohorts should be deceased. To estimate the number of people who should likely have survived through year 2011, we calculate the expected number of survivors from the 2013 Social Security life tables for females. We expect 5.3 non-Hispanic whites to survive, 0.69 blacks to survive, and 0.13 nonblack Hispanics to survive (see Table 4). These estimates are optimistic for two reasons. First, 30 % of our sample is male, and males have higher mortality. Second, the 2013 life tables use data from younger cohorts, and there has been a secular decline in mortality.
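The extinct-generation check amounts to summing, over respondents, the life-table probability of surviving from the interview age to the end of follow-up. The sketch below assumes a Gompertz survival curve with hypothetical parameters as a stand-in for the 2013 SSA female life table, which is not reproduced here; respondents are (age at interview, interview year) pairs, and ages are treated as exact.

```python
import math

def gompertz_survival(x0, x1, a=0.005, b=0.09):
    """P(alive at exact age x1 | alive at exact age x0) under mu(x) = a*exp(b*(x - 50)).

    Hypothetical stand-in for a published life table."""
    cumulative_hazard = (a / b) * (math.exp(b * (x1 - 50)) - math.exp(b * (x0 - 50)))
    return math.exp(-cumulative_hazard)

def expected_survivors(respondents, end_year=2011, survival=gompertz_survival):
    """Expected number still alive at end_year among (age_at_interview, interview_year) pairs."""
    return sum(survival(age, age + (end_year - year)) for age, year in respondents)

# Hypothetical example: 1,000 respondents aged 85 interviewed in 1989.
sample = [(85, 1989)] * 1_000
print(f"expected survivors in 2011: {expected_survivors(sample):.2f}")
```

Apparent survivors far beyond this kind of expectation, such as the 15 nonblack Hispanics against an expected 0.13 in Table 4, are most plausibly deaths that were never matched.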
We present results in Table 4. Rather than the 5.3 whites that we expect from the life tables, we have 216 non-Hispanic whites. Similarly, for blacks, we have 45 respondents rather than the estimated 0.69. Finally, for nonblack Hispanics, we have 15 “survivors” rather than the expected 0.13. Clearly, many individuals died but had unreported deaths. Importantly, the rates of this Methuselah effect vary by race/ethnicity. Among non-Hispanic whites, the apparent “survivor rate” to age 106 is 6.7 %; among blacks, it is 10.7 %; and among nonblack Hispanics, 18.8 % survive.
Estimating Parametric Models of Mortality in the Presence of Matching Error
An Econometric Strategy
As our theory and simulations show, accounting for matching error is essential to the accurate estimation of the hazard rate of mortality at older ages. We have also seen that in the NLS-OM and NHIS-NDI data matches, these errors are consequential. We proceed with a simple empirical model designed to allow for the type of matching error we have described. Here we do not strive to find the best model of matching error; rather, our intent is to demonstrate that even simple models of matching error can profoundly affect estimated old-age mortality profiles. Our model provides insight and demonstrates a potential empirical way forward.
Our basic approach, which estimates the age-specific hazard rate using discrete-time methods, has a long track record in demography (Allison 1982). Allison suggested reformatting the data so that individuals have an observation for each period in which they are at risk, and then parameterizing the hazard as a logistic model. As Allison discussed and demonstrated empirically, this model is not the exact equivalent of a proportional hazard model. In practice, however, the difference between the two models is likely to be trivial, and the logistic model converges to the proportional hazard model as the time interval becomes small.
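A minimal sketch of Allison's person-period layout and a logistic discrete-time hazard. Each record is assumed to be (entry age, last observed age, died), where died = True means a death recorded at the last observed age; the coefficients in the usage line are placeholders, not estimates.

```python
import math

def person_period(records):
    """Expand (entry_age, last_age, died) records into (person_id, age, event) rows.

    event is 1 only in the row for the age at which a recorded death occurs."""
    rows = []
    for pid, (entry_age, last_age, died) in enumerate(records):
        for age in range(entry_age, last_age + 1):
            rows.append((pid, age, int(died and age == last_age)))
    return rows

def logistic_hazard(age, beta0, beta_age):
    """Discrete-time hazard h(age) = 1 / (1 + exp(-(beta0 + beta_age * age)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta_age * age)))

def bernoulli_loglik(rows, beta0, beta_age):
    """Log-likelihood of person-period data: sum of y*log(h) + (1 - y)*log(1 - h)."""
    total = 0.0
    for _, age, event in rows:
        h = logistic_hazard(age, beta0, beta_age)
        total += math.log(h) if event else math.log(1.0 - h)
    return total

rows = person_period([(80, 82, True), (80, 84, False)])
print(rows)
print(bernoulli_loglik(rows, beta0=-10.0, beta_age=0.1))
```

Once the data are in this form, any logistic regression routine fitted to the event column yields the standard discrete-time hazard model.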
Our innovation is to modify the likelihood function to account for missing death record matches, that is, Methuselah cases. We work with the case in which the probability of missing a death is constant across ages, although generalizations to more complex cases are possible. Suppose the probability of missing a death is α. In this case, “observed survivors” are a mixture of “true survivors” and Methuselah cases. Similarly, “observed deaths” account for only a fraction, 1 − α, of “true deaths.”
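One way to write such a mixture likelihood for the constant-α case is given below, in our own notation: the symbols d_i, a_{0i}, a_i, and S_i are introduced here, and the exact form may differ from the authors’ Eq. (16). Person i enters observation at age a_{0i}; d_i = 1 if a death is recorded at age a_i, and d_i = 0 if the person is last seen alive at age a_i. A recorded death contributes (1 − α) times the usual probability of dying at a_i, whereas an apparent survivor is either a true survivor or an unrecorded death:

```latex
\ln L(\beta, \alpha) = \sum_{i} \Big\{ d_i \Big[ \ln(1 - \alpha) + \ln h_{a_i} + \sum_{s = a_{0i}}^{a_i - 1} \ln(1 - h_s) \Big]
  + (1 - d_i) \ln\Big[ S_i + \alpha \, (1 - S_i) \Big] \Big\},
\qquad S_i = \prod_{s = a_{0i}}^{a_i} (1 - h_s),
```

where h_a is the discrete-time hazard at age a, a function of the parameters β.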
When α = 0, this expression, of course, simplifies to the standard likelihood function Eq. (11).
To implement our estimator, we use Allison’s (1982) familiar logistic form, setting hₐ, the mortality hazard at age a, to be a function of such covariates as age, race, and (where applicable) Hispanic ethnicity.
There are two ways to estimate our model. First, we can choose parameters to maximize the likelihood function Eq. (16) (equivalently, to minimize the negative of its logarithm). Second, a simpler alternative is available if every observation has reached an age by which essentially no one survives. In that case, every apparent survivor is a Methuselah case. Thus, we can simply remove these records and set α = 0 for the remaining records, that is, estimate a logistic regression using only records with a recorded death. This is a variant of the extinct generation estimator, which, according to Preston et al. (2001), was first used by Vincent (1951). The extinct generation approach appears in a number of subsequent studies (e.g., Gavrilov and Gavrilova 2011).
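The following Python sketch illustrates both estimators on simulated data, under stated assumptions: a constant matching error rate α, a logistic hazard linear in age over ages 65 to 110, α parameterized as exp(γ) / (1 + exp(γ)) so that it stays in (0, 1), and hypothetical parameter values (a true α of 0.10). The helpers `simulate_cohort`, `expand`, `neg_loglik`, and `logit_nll` are our own illustrative names, not the authors’ programs. The ME fit maximizes the mixture likelihood by minimizing its negative average; the extinct generation fit drops every apparent survivor and runs an ordinary logit.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def simulate_cohort(n, beta, alpha, rng, a_start=65, a_end=110):
    """Simulate deaths from a logit hazard, then miss each death with prob. alpha."""
    death_age = np.full(n, -1)
    alive = np.ones(n, dtype=bool)
    for a in range(a_start, a_end + 1):
        h = expit(beta[0] + beta[1] * (a - 70) / 10)
        dies = alive & (rng.random(n) < h)
        death_age[dies] = a
        alive &= ~dies
    recorded = (death_age >= 0) & (rng.random(n) >= alpha)
    # Unrecorded decedents look like survivors through the end of follow-up.
    exit_age = np.where(recorded, death_age, a_end)
    return exit_age, recorded


def expand(exit_age, recorded, a_start=65):
    """Person-period rows; y = 1 only on the row of a recorded death."""
    pid = np.repeat(np.arange(len(exit_age)), exit_age - a_start + 1)
    age = np.concatenate([np.arange(a_start, a + 1) for a in exit_age])
    y = (age == exit_age[pid]) & recorded[pid]
    X = np.column_stack([np.ones(len(age)), (age - 70) / 10])
    return X, y, pid


def neg_loglik(params, X, y, pid, recorded):
    """Average negative log-likelihood of the ME model, alpha = expit(gamma)."""
    beta, alpha = params[:-1], expit(params[-1])
    eta = X @ beta
    log_h, log_1mh = -np.logaddexp(0, -eta), -np.logaddexp(0, eta)
    n = len(recorded)
    log_surv = np.bincount(pid, weights=np.where(y, 0.0, log_1mh), minlength=n)
    log_death = np.bincount(pid, weights=np.where(y, log_h, 0.0), minlength=n)
    ll_dead = np.log1p(-alpha) + log_death + log_surv   # (1 - alpha) * h * S
    S = np.exp(log_surv)
    ll_alive = np.log(S + alpha * (1 - S))              # true or Methuselah survivor
    return -np.mean(np.where(recorded, ll_dead, ll_alive))


def logit_nll(beta, X, y):
    """Average negative log-likelihood of the ordinary discrete-time logit."""
    eta = X @ beta
    return np.mean(np.logaddexp(0, eta) - y * eta)


rng = np.random.default_rng(1)
true_beta, true_alpha = np.array([-3.5, 0.9]), 0.10   # hypothetical values
exit_age, recorded = simulate_cohort(5000, true_beta, true_alpha, rng)
X, y, pid = expand(exit_age, recorded)

naive = minimize(logit_nll, np.zeros(2), args=(X, y), method="BFGS").x
me = minimize(neg_loglik, np.array([-3.0, 0.5, -2.0]),
              args=(X, y, pid, recorded), method="BFGS").x
alpha_hat = expit(me[-1])
keep = recorded[pid]          # extinct generation: drop every apparent survivor
eg = minimize(logit_nll, np.zeros(2), args=(X[keep], y[keep]), method="BFGS").x

print(f"naive slope {naive[1]:.3f}")
print(f"ME slope {me[1]:.3f}, alpha {alpha_hat:.3f}")
print(f"extinct-generation slope {eg[1]:.3f}")
```

On data like these, the naïve logit should yield a much flatter age slope than the ME and extinct generation fits, which should both land near the simulated slope of 0.9; the exact figures depend on the random seed.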
Results
We first implement our econometric strategy with men in the NLS-OM cohort (who were aged 52 to 59 in 1966). We estimate the model for non-Hispanic white men because this is the largest of our demographic groups. We use the SSA12 match because that version of the data has the most acute problem with Methuselah error and thus presents us with the most difficult inference problem.⁷ As reported in Table 2, Methuselah error is on the order of 11 % for the non-Hispanic white men in the sample. We estimate Eq. (16), using a baseline hazard that is quadratic in age. As a technical matter, when setting up the maximization routine, we implement a standard strategy that ensures that the matching error rate is between 0 and 1.⁸
In the public release of the data, all observed deaths are recorded, but the age of death is top-coded at 90 (i.e., the age of death is provided only for those dying at age 89 or younger). By 2012, individuals in the cohort are aged 98 to 105, ages by which nearly all men have likely died; therefore, it is reasonable to also implement our extinct generation estimation procedure.
Using the measurement error (ME) model (Eq. (16)), we estimate α to be 0.105. This is very close to the error rate that we calculated directly, as reported in Table 2. The hazard estimates are most clearly illustrated graphically. Figure 4 plots hazard curves for the ME model and the extinct generation model, as well as for the naïve logit model, which makes no correction for Methuselah error (see the online version of the article to view the figure in color). As expected, the ME and extinct generation models give very similar inferences, and estimated mortality is substantially higher after we correct for Methuselah error. Notably, the extinct generation approach yields more precise estimates than the ME model; this is expected because the extinct generation model imposes more restrictive assumptions.⁹
We next apply our empirical methods to the NHIS-NDI data. These data have a number of advantages relative to the NLS-OM data. First, the data set is much larger than the NLS-OM data set, so we can estimate mortality separately for non-Hispanic white, black, and Hispanic samples.¹⁰ More importantly, we can evaluate mortality beyond age 90 because the data are not top-coded. Finally, this exercise is important because these data are often used to estimate mortality models.
To maintain comparability with the NLS-OM, we focus on men in birth cohorts 1907–1914 who appear in the 1986–1999 NHIS. These men were aged 72 to 92 in these years of the NHIS. Our hazard estimation now begins at age 72, the youngest age at which we observe the 1907–1914 cohorts in the NHIS, and continues through age 105, the oldest potential age in 2012, which is the last year available to us in the NHIS data. We estimate a logit model with a baseline hazard that is quadratic in age and fully interacted with race/ethnicity indicator variables.
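A minimal Python sketch of the “fully interacted” specification: each group gets its own intercept, linear, and quadratic age terms in the logit of the hazard. The group labels, the helper `design_matrix`, and centering age at 72 (the youngest age observed) are illustrative assumptions, not details taken from the authors’ code.

```python
import numpy as np

# Hypothetical labels for the three analysis groups.
GROUPS = ("nh_white", "black", "hispanic")


def design_matrix(age, group, center=72.0):
    """Logit design matrix: quadratic in age, fully interacted with group.

    For a person in group g at age a, the linear predictor is
    b0_g + b1_g * x + b2_g * x**2 with x = a - center; all other groups'
    columns are zero for that row.
    """
    age = np.asarray(age, dtype=float)
    group = np.asarray(group)
    x = age - center
    cols, names = [], []
    for g in GROUPS:
        d = (group == g).astype(float)          # group indicator
        for power, label in ((0, "const"), (1, "age"), (2, "age2")):
            cols.append(d * x**power)
            names.append(f"{g}:{label}")
    return np.column_stack(cols), names


X, names = design_matrix([72, 80, 90], ["nh_white", "black", "hispanic"])
print(names)
print(X)
```

This is a reparameterization of a common quadratic baseline plus group-by-age interactions; the group-specific `age2` coefficients are the ones whose sign and size bear on mortality deceleration.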
We report results in Table 5. We estimate nonnegligible Methuselah error rates for all groups. The error rate is lowest for non-Hispanic whites, higher for blacks, and higher yet for Hispanics. Using the estimated Methuselah error rates, along with the fraction of the sample for which no death was observed in the NHIS data match, we can calculate the share of apparent survivors who are in fact Methuselah cases. For whites, the fraction of observations for which a death was not recorded in the 2012 match is 0.0488; because our estimated Methuselah error rate is 0.037, this suggests that 76 % (0.037 / 0.0488) of observed surviving whites are Methuselah cases. For blacks, the fraction of individuals for whom a death was not recorded is 0.0792, suggesting that 90 % of observed surviving blacks are Methuselah cases. Finally, among Hispanics, the fraction of individuals for whom a death was not recorded is 0.1192, suggesting that 92 % of observed Hispanic survivors are Methuselah cases.
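The arithmetic in this paragraph can be reproduced in a few lines of Python. `methuselah_share` follows the ratio used in the text (α divided by the share with no recorded death), which treats α as a share of the whole sample; this is a close approximation here because almost everyone in these cohorts has died. `implied_alpha` is a hypothetical back-of-envelope helper that inverts the rounded percentages quoted for blacks and Hispanics; Table 5 reports the actual estimates.

```python
def methuselah_share(alpha: float, share_no_death: float) -> float:
    """Approximate share of apparent survivors who are Methuselah cases."""
    return alpha / share_no_death


def implied_alpha(share_methuselah: float, share_no_death: float) -> float:
    """Error rate implied by a (rounded) Methuselah share; rough inverse of the above."""
    return share_methuselah * share_no_death


white = methuselah_share(0.037, 0.0488)
print(f"non-Hispanic whites: {white:.0%} of apparent survivors are Methuselah cases")

# Back-of-envelope only: inputs are the rounded shares quoted in the text.
alpha_black = implied_alpha(0.90, 0.0792)
alpha_hispanic = implied_alpha(0.92, 0.1192)
print(f"implied alpha: blacks ~ {alpha_black:.3f}, Hispanics ~ {alpha_hispanic:.3f}")
```

The implied error rates rise from whites to blacks to Hispanics, consistent with the ordering reported in Table 5.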
For each racial/ethnic group, the estimated coefficient on age squared is close to 0, so the log-odds of death are approximately linear in age, with no bending at the oldest ages. There is no evidence of mortality deceleration.
In Figs. 5 and 6, we plot our estimated hazard curves to explore how our inferences differ from those of the naïve approach. Figure 5 displays the hazard rates for blacks versus non-Hispanic whites, first with the naïve logit model and then with our ME model. In the NHIS cohorts, a naïve model (which does not adjust for matching error) shows a black-white mortality crossover at approximately age 85. A very different pattern appears when we correct for matching error: at no age is the estimated mortality hazard lower for blacks than for whites. In Fig. 6, the naïve analysis shows Hispanic men having a mortality advantage, especially at older ages. However, after we correct for differential measurement error, we find no statistically significant differences between Hispanic and non-Hispanic white mortality rates. In these data, both the black-white crossover and the Hispanic paradox appear to be due entirely to the Methuselah effect.
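A deterministic Python illustration of the mechanism, using purely hypothetical inputs: a Gompertz-style true hazard, a black hazard 20 % higher than the white hazard at every age, and Methuselah error rates of 0.02 and 0.08. `naive_hazard` (our name) computes the hazard a naïve analysis would recover when unrecorded decedents remain in the apparent risk set and only a fraction 1 − α of deaths is observed.

```python
import numpy as np


def naive_hazard(h, alpha):
    """Hazard recovered by a naive analysis when a fraction alpha of deaths
    goes unrecorded (constant across ages).

    h[k] is the true one-year probability of death at the k-th age of follow-up.
    S[k], the true share alive at the start of that age, enters the apparent
    risk set together with alpha * (1 - S[k]) unrecorded decedents, while only
    (1 - alpha) of the deaths at that age are observed.
    """
    h = np.asarray(h, dtype=float)
    S = np.concatenate(([1.0], np.cumprod(1.0 - h)[:-1]))
    observed_deaths = (1.0 - alpha) * S * h
    apparent_at_risk = S + alpha * (1.0 - S)
    return observed_deaths / apparent_at_risk


ages = np.arange(65, 106)
h_white = 0.02 * np.exp(0.09 * (ages - 65))  # hypothetical true hazard (< 1 here)
h_black = 1.2 * h_white                      # hypothetical: higher at every age

naive_w = naive_hazard(h_white, alpha=0.02)  # hypothetical error rates
naive_b = naive_hazard(h_black, alpha=0.08)

crossed = naive_b < naive_w
print("first age of naive crossover:", ages[np.argmax(crossed)] if crossed.any() else None)
print("age at which naive white hazard peaks:", ages[np.argmax(naive_w)])
```

With these inputs, the naïve black hazard starts above the white hazard and falls below it at advanced ages even though the true black hazard is higher everywhere, and the naïve white hazard itself peaks and then declines, mimicking deceleration.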
Concluding Remarks
The study of old-age mortality holds an important place in demography. Observed statistical patterns have motivated a large literature that explores plausible biological mechanisms that underlie human aging. Among the key issues that have occupied scholarly attention are whether (1) the mortality rate decelerates in old age; (2) there is a black-white mortality crossover, with blacks having lower mortality at advanced ages; and (3) there is a paradoxical Hispanic advantage in mortality.
We explore these issues using data from two studies, the NLS-OM and NHIS, both of which have been matched to administrative death records. Our key contribution is a careful analysis of a pernicious form of measurement error: the failure to record some deaths. We show that both data sources contain a nonnegligible amount of such measurement error and that error rates differ by race and ethnicity. Failure to recognize the error and correct for it could lead us to believe in mortality rate deceleration, the black-white mortality crossover, and the Hispanic paradox. Correction for the error reverses all these inferences.
We recognize that other studies, especially those on animal populations, do not rely on data matching, and there is evidence of mortality deceleration in nonhuman populations. Having said that, our analysis leads us to be skeptical of evidence about deceleration in the mortality of humans.¹¹ Similarly, black-white differences in match rates seem to be a natural feature of data that demographers often use, and we have seen that a logical consequence of this error structure is to generate a spurious black-white mortality crossover. Similar logic pertains to the Hispanic paradox.
Although we have made some progress in addressing the Methuselah effect, substantial issues remain. Our model is very simple; we assume that the only systematic factor affecting error is racial/ethnic group. Improvements in the Vital Statistics system likely mean that matching error also varies by birth cohort. It is also plausible that matching error is a function of age at death, because deaths at unusual ages may be easier to match than deaths at common ages. Our analytical structure could be adapted to account for these features.
In addition, we have not addressed an alternative source of relevant measurement error: the misreporting of age. In the NLS-OM and the NHIS-NDI, age misreporting is probably less common than in other data because age was collected from individuals themselves, typically many years before death (Eberstein et al. 2008; Lynch et al. 2003). Nonetheless, some failure to match may be due to age misreporting in the survey or death records. If such errors depend only on race/ethnicity, the extinct generation variant of our model properly addresses the problem because the approach simply drops these cases, but more complex cases are possible. If the age-misreporting process can be systematically modeled by demographers, perhaps the methods we develop here can be adapted to reduce biases in a more general setting.
A great deal more work lies ahead for demographers interested in the accurate assessment of mortality at older ages.
Acknowledgments
We gratefully acknowledge support from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (R01 HD062747). We thank the editors, three anonymous reviewers, David Card, Amitabh Chandra, James Saxon, Douglas Staiger, Lydia Veliko, and seminar participants at UC Berkeley, Indiana University/Purdue University of Indianapolis, New York University, and Washington University for helpful comments. All errors are our own.
Notes
1. This law was first quantified in 1939, when Greenwood and Irwin (1939) found that the one-year probability of death at advanced age asymptotically approaches a limit of 0.44 for women and 0.54 for men. Such a law suggests that there may be no fixed upper limit to human longevity and thus no fixed maximal human lifespan (Gavrilov and Gavrilova 2006; Thatcher et al. 1998).
2. One could imagine an error process operating predominantly in the other direction, if deaths are recorded that do not occur. This could be called the “Twain effect” (per Mark Twain’s suggestion that “the report of my death was an exaggeration”). The Twain effect would cause underestimation of the denominator in subsequent years. Here again, consequent error would progressively worsen as the cohort ages. Later, we argue that this effect is not important in our application.
3. Preston et al. (1998) showed that recording age in single years earlier in life (instead of in five-year intervals) virtually eliminates biases in age-specific mortality rates (ASMRs) from age misreporting.
4. Yet another advantage of this approach is that longitudinal surveys often collect data on family and other contextual variables, which then allows researchers to assess relationships between early-life factors and subsequent mortality.
5. Readers may wonder why the vital statistics match used data through 2008 whereas the SSA Death Index is through 2012. Although we informally refer to the Census Bureau as “matching on the respondent’s Social Security number (SSN),” this is true only in an indirect sense. The Census Bureau matches on a personal identification key (PIK), which is a number that has a one-to-one match with the respondent’s SSN, to protect the confidentiality of respondents. All files used for matching must, therefore, contain this unique PIK for matching, a process known as “being PIKed.” At the time of our match, the most recent PIKed file from vital statistics was through 2008, whereas the files from SSA were through 2012.
6. For instance, in collecting data for the current NLS cohorts (the NLSY 1979 and NLSY 1997 cohorts), NORC requires interviewers’ proxies to provide either an obituary or a death certificate before recording a death.
7. We also estimated the measurement model with the VS08 match, and it performed equally well with those data.
8. Programs are available from the authors. Our procedure for constraining the measurement error rate to lie between 0 and 1 uses a logistic error model, specifying α = exp(γ) / (1 + exp(γ)) and allowing γ to differ for whites and blacks.
9. Specifically, the extinct generation approach stipulates that all missing deaths are Methuselah error, while the ME model estimates the rate of Methuselah error.
10. The number of person-years in our NHIS-NDI analysis is 119,007.
11. Biological models that allow for mortality deceleration in Drosophila and other nonprimates may be quite different from the human biology of aging. Older theories, such as those that posit limits to cell replication, should perhaps be reconsidered.