Does education change people’s lives in a way that delays mortality? Or is education primarily a proxy for unobserved endowments that promote longevity? Most scholars conclude that the former is true, but recent evidence based on Danish twin data calls this conclusion into question. Unfortunately, these potentially field-changing findings—that obtaining additional schooling has no independent effect on survival net of other hard-to-observe characteristics—have not yet been subject to replication outside Scandinavia. In this article, we produce the first U.S.-based estimates of the effects of education on mortality using a representative panel of male twin pairs drawn from linked complete-count census and death records. For comparison purposes, and to shed additional light on the roles that neighborhood, family, and genetic factors play in confounding associations between education and mortality, we also produce parallel estimates of the education-mortality relationship using data on (1) unrelated males who lived in different neighborhoods during childhood, (2) unrelated males who shared the same neighborhood growing up, and (3) non-twin siblings who shared the same family environment but whose genetic endowments vary to a greater degree. We find robust associations between education and mortality across all four samples, although estimates are modestly attenuated among twins and non-twin siblings. These findings—coupled with several robustness checks and sensitivity analyses—support a causal interpretation of the association between education and mortality for cohorts of boys born in the United States in the first part of the twentieth century.
The association between educational attainment and adult mortality in modern developed societies is well known and virtually universally observed (Elo and Preston 1996; Hummer and Hernandez 2013; Hummer and Lariscy 2011; Kitagawa and Hauser 1968, 1973; Lleras-Muney 2005; Phelan et al. 2004; Preston and Taubman 1994). Sizable educational gradients in longevity and cause of death have been detected across birth cohorts, in different population groups, and in many social and institutional contexts (Hayward et al. 2015). What is less clear is why these educational gradients exist. Are educational attainment and human survival etiologically linked, such that obtaining more schooling causes people to enjoy lower levels of mortality and longer lives? Or are the two variables related to each other because common endowments influence both, generating a spurious (or partially spurious) association?
Answering these questions is of profound scientific and policy significance. The magnitude of the association between education and mortality is large (Hummer and Lariscy 2011). If education causally affects mortality, investments in schooling could be an efficient and cost-effective means of reducing the longevity penalty that some groups face. On the other hand, if obtaining more schooling on its own does not cause people to live longer, then efforts to reduce mortality differences by increasing education would be of little value (Hummer and Hernandez 2013). In the first scenario, education is a causal variable and could be the target of longevity-enhancing interventions. In the second, it is at least a partial proxy for the actual drivers of human survival.
In this article, we link records across different administrative data sources to create a set of nationally representative longitudinal samples of male-male twins, non-twin siblings, unrelated neighbors, and unrelated non-neighbors. We then use these samples to derive causal estimates of the relationship between education and longevity for the most recent cohort of American men to complete their lifespan. The unique data at our disposal allow us to build up from a conventional covariate adjustment design to a more strenuous test of the causal relationship between education and mortality that accounts for hard-to-observe confounders at several theoretically relevant levels of analysis. By comparing estimates obtained using different estimation strategies and across different strategically selected subsamples of the adult population, we (1) assess the degree to which specific background and contextual characteristics confound the association between education and mortality and (2) evaluate variation in education effects across subgroups defined by their socioeconomic characteristics. We know of no prior work in the United States (or elsewhere) that has carried out such an exercise.
Understanding the origins of educational gradients in health and mortality has long been a priority of America’s research and public health agenda. Causal accounts have traditionally focused on the importance of resources, skills, and knowledge (Baker et al. 2011; Link and Phelan 1995; Mirowsky and Ross 2003; Rogers et al. 2013; Ross and Wu 1996), acquired through education, and then translated through multiple mechanisms into better health behaviors and outcomes (Denney et al. 2010; Hayward et al. 2015; Hummer and Hernandez 2013; Mirowsky and Ross 1998; Phelan et al. 2010). The basic conceptual model that underlies this account—which flows directly from Link and Phelan’s (1995) work on fundamental cause theory and which complements other foundational work in medical sociology (see, e.g., Cockerham 2005)—is summarized graphically in Fig. 1. In the figure, causal arrows connect education, E, to a series of mediating variables: economic and social resources (R), cognitive skills (S), knowledge and other flexible resources (K), and health behaviors (V). These mediating variables in turn influence health (through a variety of more proximate channels not shown) and mortality, M.
Associations between education and mortality could also arise if the two variables share common causes, inducing a spurious (or partially spurious) relationship between E and M (Behrman et al. 2011). Potential confounds include people’s social or economic background, their intelligence, their early-life health, and any other hard-to-observe endowments or contextual exposures that jointly predict educational attainment and survival (Hayward et al. 2015). Adding these variables—labeled B, I, H, and Z, respectively—to the causal diagram specified earlier, as in Fig. 2, opens a series of backdoor (i.e., noncausal) paths that connect E to M (E←B→M, E←I→M, E←H→M, and E←Z→M), raising doubts about the causal nature of the association between the two variables. This concern has led some researchers to question whether education causally affects mortality (as medical sociological theory suggests), or whether the observed association is merely the end state of a more complicated sequence of selection processes (Behrman et al. 2011; Gottfredson and Deary 2004).
Empirically adjudicating between these perspectives is methodologically challenging. Most work has relied on covariate adjustments to rule out possible confounders and isolate (presumably causal) effects (Kitagawa and Hauser 1973). Findings from these analyses have shown that the association between education and mortality is robust to the inclusion of several covariates, including measures of intelligence (Link et al. 2008), race (Montez et al. 2011), childhood socioeconomic status (Montez et al. 2011), and early-life health endowments (Montez and Hayward 2011). If these statistical controls are enough to eliminate the threat posed by omitted variable bias (i.e., all the backdoor paths running from E to M can be closed by conditioning on observed confounds), then the parameter of interest (the effect of education) is identified, and the conditional association between education and mortality can be said to be causal.
Efforts to validate this assumption have taken several forms. One approach is to instrument education using historical information about compulsory school attendance and/or child labor laws, mimicking an experimental setup where exogenous factors sort individuals into different levels of the treatment (educational attainment). Although early proponents of this strategy observed significant (and substantively large) effects associated with education (Lleras-Muney 2005), replication attempts have frequently failed (see, e.g., Black et al. 2015; Mazumder 2008). One reason could be the strength of the instrument. Compulsory schooling laws are, in many cases, only weakly related to variation in educational attainment (and are relevant for only the subsample of students who were induced to obtain additional schooling), making it difficult to estimate effects with sufficient precision (or to make claims about effects for students not on the margins) (Fletcher 2015). Studies that used birth registry data from outside the United States have sought to circumvent this problem by analyzing larger samples (Lager and Torssander 2012; Meghir and Palme 2005), but findings there have been mixed as well. In some cases, researchers have observed sizable declines in mortality for birth cohorts (or jurisdictions) that were compelled to attend school for an additional year (Fischer et al. 2013; van Kippersluis et al. 2011); in other cases, they have not (Braakmann 2011; Clark and Royer 2013; Lager and Torssander 2012; Meghir et al. 2018). Scholars have speculated that these discrepancies could be due in part to heterogeneity in the effects of education across birth cohorts and/or social and political contexts (Hayward et al. 2015), but even this hypothesis has been difficult to confirm.
As an alternative to this approach, a small but growing number of studies have turned to within-twin pair comparisons as a way to difference-out observed and unobserved factors (e.g., B, I, and Z) that could confound the association between education and mortality (see, e.g., Behrman et al. 2011; Lundborg et al. 2016; Madsen et al. 2010). Twins—even those who end up with different levels of schooling—experience very similar social, economic, family, school, neighborhood, and other environmental exposures, and have identical (in the case of monozygotic (MZ) twins) or similar (in the case of dizygotic (DZ) twins) genes. If educational attainment is associated with mortality among pairs of twins that are concordant (or mostly concordant) with respect to these endowments but discordant with respect to their educational attainment, the association is (conditional on several identifying assumptions) less likely to be spurious. All shared genetic and environmental exposures fall out of the model, providing arguably cleaner estimates of the effect of obtaining higher levels of education.
Applications of this strategy have produced intriguing, and sometimes surprising, results. Using a large population-based data set from Denmark that included just over 2,500 identical (MZ) twin pairs born between 1921 and 1950, Behrman et al. (2011) showed that the estimated causal effect of education on mortality in Denmark is reduced to zero when comparing the mortality outcomes of MZ twins who are discordant on education; broadly similar patterns were observed for DZ twins. That the same was not true for pairs of unrelated individuals suggests that shared early-life endowments and exposures—common within pairs of identical twins but not within pairs of unrelated adults—may explain, or partially explain, the existence of educational gradients in mortality. This inference is generally consistent with a noncausal interpretation of the diagram presented in Fig. 2. Other researchers who have used the same Danish data have reached similar conclusions, observing null or attenuated effects when modeling within-twin pair differences in mortality or related health outcomes (Madsen et al. 2010; Osler et al. 2007).
We believe that these findings are important and provocative, but we also see reasons for skepticism. First, it is unclear whether similar findings would hold if the same twin-differencing models were fit using data from outside the Danish context (Hayward et al. 2015; Lundborg et al. 2016). It could be the case that Denmark’s set of social and educational policies renders the education-mortality relationship less important than in the United States or other Western countries (Lundborg et al. 2016), where the social safety net is less comprehensive. In fact, the role of the welfare state, levels of inequality, demographic differences, and differences in life expectancy could all contribute to cross-national differences in the education-mortality relationship, as well as variation at the subnational level (see, e.g., Montez et al. 2019). Until now, it has been impossible to consider this possibility (at least as it pertains to the United States) because there have been no large, nationally representative samples of U.S. twins with requisite information about education and mortality. Current U.S.-based twin data repositories (e.g., the NAS-NRC Twin Registry of WWII Military Veterans and the Minnesota Twin Registry) have been used for a variety of research on health gradients (Amin et al. 2015a), but these studies pertain to particular subpopulations (e.g., WWII veterans) or to people in particular geographic regions (e.g., Minnesota or southern California), and only the NAS-NRC data include mortality information for most respondents.
Second, it is not clear that effects estimated using twin-differencing models pertain equally within all population subgroups. Prior twin-based estimates of the effects of education on health and mortality can be thought of as estimates of the average treatment effect (ATE). A recent study by Hayward et al. (2015), however, suggested that the association between education and mortality may be stronger for certain segments of the population—in particular, for men, Whites, and younger adults. These findings are consistent with a growing number of other studies showing that the association between education and health can vary significantly by social background characteristics (Andersson 2016; Bauldry 2014; Conti and Heckman 2010; Ross and Mirowsky 2006, 2011; Schafer et al. 2013).
Variation by social background characteristics could be a byproduct of resource substitution or cumulative advantage processes (Ross and Mirowsky 2011). If the education-mortality relationship is characterized by resource substitution, we would expect to see larger effects among individuals from less advantaged backgrounds because those individuals have fewer alternative resources to fall back on, making education more decisive. The cumulative advantage perspective, on the other hand, predicts the opposite: the effects of education should be greater for the most advantaged individuals, not the least, because the former are in a better position to leverage the health-enhancing potential of education. Although these perspectives lead to fundamentally different predictions about how the effects of education will be distributed, their methodological implications are the same. If the health returns to education vary in meaningful ways across subgroups, estimates of the ATE—which represent a weighted average of all group-specific estimates—would obscure this variation and potentially miss nonzero (or null) effects within certain segments of the population.
Finally, it is not known why twin-based studies have produced results that diverge from findings obtained using more conventional covariate adjustment designs and/or alternative identification strategies. In their conclusion, Behrman et al. (2011:1367) stated that education may serve as “a marker for parental family and individual-specific endowments that are uncontrolled in the usual estimates,” but they did not provide additional information about what those endowments might be. Because twins share the same (or most of the same) genetic, family, neighborhood, and school characteristics, finely grained analyses of individual confounds (i.e., the additional variables included in Fig. 2) are generally not feasible. Although this does not diminish the overall contribution of their research, it does lead to a fairly coarse reduced-form assessment of the underlying causal model.
Data and Measures
We address the aforementioned issues using a unique and untapped data resource: the digitized complete-count U.S. Censuses for 1920 and 1940. With support from the U.S. National Science Foundation and the U.S. National Institutes of Health, and in collaboration with Ancestry.com, the Minnesota Population Center has now finished work on complete-count versions of the 1850–1940 census files (Ruggles et al. 2019). These data are freely available at ipums.org.
From the 1920 U.S. Census, we extracted records for all male children born in the United States between 1910 and 1920 (n = 11,749,361).1 Then, using techniques described shortly, we linked those records to the 1940 U.S. Census, from which we obtained information about educational attainment. Of the 11,749,361 U.S.-born boys in 1920, we were able to confidently and uniquely link 35% (n = 4,153,206) to 1940 census records. Finally, information about age at death was obtained by linking 1920–1940 census records to death records in the NUMIDENT and to the Social Security Death Master File (SSDMF). Of the 4,153,206 males linked across 1920 and 1940, we were able to link 41% (n = 1,720,980) to mortality records. In all, then, we were able to fully and confidently link records for about 15% of the baseline population. Although simple side-by-side comparisons can be misleading (because of differences in the underlying quality of record linkages), we believe that these results stand up favorably to other historical record-linking efforts (see, e.g., Beach et al. 2016; Ferrie 1996). In the following subsections, we (1) describe our record-linking procedures in more detail, (2) compare the characteristics of successfully linked cases with the characteristics of the baseline population, and (3) discuss the strategies we used to make adjustments for nonrandom selection into our fully linked sample.
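The linkage rates in this paragraph follow directly from the three reported counts. The short check below recomputes them; the variable and function names are ours, chosen only for illustration.

```python
# Counts reported in the text.
baseline_1920 = 11_749_361   # U.S.-born boys enumerated in the 1920 census
linked_1940 = 4_153_206      # of those, uniquely linked to a 1940 census record
linked_deaths = 1_720_980    # of those, also linked to a mortality record


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


census_rate = pct(linked_1940, baseline_1920)     # 1920 -> 1940 linkage
death_rate = pct(linked_deaths, linked_1940)      # 1940 -> death record linkage
overall_rate = pct(linked_deaths, baseline_1920)  # fully linked share of baseline

print(f"1920 -> 1940 link rate: {census_rate:.1f}%")
print(f"1940 -> death link rate: {death_rate:.1f}%")
print(f"Fully linked share of baseline: {overall_rate:.1f}%")
```

The three rates come out at roughly 35.3%, 41.4%, and 14.6%, respectively.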
Linking 1920 to 1940 U.S. Census Records
Linking the 1920 baseline population of 11,749,361 boys to the 1940 census requires that we first define the universe of potential matches. To make the task computationally tractable, we restricted the population of potential matches to records that display identical or similar characteristics on features that should be consistent over time (e.g., gender and place of birth).2 As an example, when attempting to find Michael Corcoran, male, born in Massachusetts according to the 1920 census, we would limit the population of potential matches in 1940 to males who, in 1940, reported Massachusetts as their place of birth. Given that age is reported in the 1920 and 1940 censuses rather than date of birth, and because of reporting inaccuracies, we allowed for deviations in birth year across data sources. Specifically, we stipulated that birth years must be within +/– 3 years, implying that each unique individual who, according to the 1920 census, was born in 1919, will be compared with and possibly linked to individuals who in the 1940 census, conditional on sex and place of birth being the same, were recorded as being born between 1916 and 1922.
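The blocking step described above can be sketched as a simple filter. This is a minimal illustration rather than the procedure actually used: the `Person` record and the `candidate_matches` helper are hypothetical names, and we assume exact agreement on sex and birthplace plus the stated birth-year window of three years in either direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical minimal census record used only for this sketch."""
    first: str
    last: str
    sex: str
    birthplace: str
    birth_year: int


def candidate_matches(target, pool, window=3):
    """Keep 1940 records that agree with the 1920 record on sex and
    birthplace and whose implied birth year is within `window` years."""
    return [
        c for c in pool
        if c.sex == target.sex
        and c.birthplace == target.birthplace
        and abs(c.birth_year - target.birth_year) <= window
    ]


target = Person("Michael", "Corcoran", "M", "Massachusetts", 1919)
pool_1940 = [
    Person("Michael", "Corcoran", "M", "Massachusetts", 1916),  # kept (edge of window)
    Person("Michael", "Corcoran", "M", "Massachusetts", 1922),  # kept (edge of window)
    Person("Michael", "Corcoran", "M", "Massachusetts", 1923),  # dropped: outside window
    Person("Michael", "Corcoran", "M", "New York", 1919),       # dropped: birthplace differs
]
candidates = candidate_matches(target, pool_1940)
```

Widening `window` enlarges the candidate pool, which is the source of the trade-off between false negatives and false positives.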
Allowing for a broad range of potential matches has advantages as well as disadvantages. One key advantage is the ability to link individuals for whom year of birth in either census was reported, enumerated, or digitized incorrectly by up to a few years. This reduces the chances of false negatives (i.e., rejecting a candidate match that is in fact correct). The main disadvantage is the increased risk of false positives (i.e., declaring a match when the match is incorrect). If our hypothetical Michael Corcoran, born in 1919 and observed in the 1920 census, died at age 2, he would not be enumerated in the 1940 census. If his parents had another male child, born in 1922, and decided to also name him Michael, this identically named but different individual would be one of the candidate matches. More generally, the wider the birth year window, the larger the pool of potential matches and the higher the probability of (1) finding the right individual and (2) making an incorrect link. This is an issue that we return to shortly.
Employing a probabilistic method of record linkage means that an algorithm is trained to recognize patterns in a data set of potential matches that are consistent with a true match. We used a modification of Feigenbaum’s (2016) probit regression approach, which, like other methods of supervised machine learning, requires input from training data. The training data represent a subsample of the population that one wishes to link, but where links have been declared by a trained human to ensure that confirmed links are as accurate as possible. We used the training data not only to calibrate the linking algorithm but also to evaluate how well it performs at declaring matches and avoiding false positives.
To start, we randomly selected 1,000 individuals from the 1920 sample who were linked to a similarly defined universe of possible 1940 matches. Here, the universe of potential matches was limited to cases where the name similarity scores on both the first and last name (using the Jaro-Winkler algorithm) were at least .8 (e.g., “Bertus Wilson” and “Burtis Watson”). We assessed all potential matches using the wealth of digitized historical information available from Ancestry.com. Historical information about parents, siblings, and the focal individual’s place of residence from the time between the 1920 and the 1940 census allowed us to make confident assessments regarding the validity of potential matches, and death records allowed us to cut down on false positives. Using these procedures, we were able to manually declare 50.2% of the training data sample as uniquely matched across 1920 and 1940.
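For readers unfamiliar with the name similarity score, the following is a minimal sketch of the Jaro-Winkler measure. It assumes the conventional prefix scale of 0.1 and a maximum prefix length of 4, and it uppercases both names before comparing them; the text does not state which parameter values or preprocessing steps were used, so these are our assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings (0 = no similarity, 1 = identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity with a bonus for a shared prefix (assumed defaults)."""
    s1, s2 = s1.upper(), s2.upper()
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


first_name_score = jaro_winkler("Bertus", "Burtis")
last_name_score = jaro_winkler("Wilson", "Watson")
```

Under these assumptions, both name pairs in the example from the text score approximately .80, which places them right at the stated cutoff.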
To calibrate our linking algorithm, we implemented a train-test-split procedure using our training data (in which true matches are known). In the first part of the procedure, we split our training data into two equally sized parts. To train the algorithm, we fit a probit regression model on one-half of the sample and then evaluated its out-of-sample performance on the other. The model specification we used is similar to the one proposed by Feigenbaum (2016), but we added additional individual- and household-level covariates to reduce the risk of false positives. Results from the model inform the algorithm as to which, if any, of the plausible set of matches should be considered a valid link. The algorithm declared a unique link based on (1) the greatest similarity between any 1-to-1 match (technically the predicted probability based on the probit regression estimates) and (2) the relative difference between the best and second-best possible match. By looping multiple times over a range of realistic values on both parameters, we were able to choose values for (1) and (2) that optimized the overall performance of the linking algorithm. We judged overall performance by the algorithm’s ability to minimize false positives (incorrectly linked cases) while maximizing true positives (correctly linked cases) and true negatives (correctly unlinked cases).
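The two-parameter decision rule and the search over parameter values can be sketched as follows. This is an illustration under stated assumptions, not the calibration code itself: we express the "relative difference" as the ratio of the best to the second-best predicted probability, and the scoring function (true positives plus true negatives, minus a penalty for false positives) is a hypothetical stand-in for the performance criterion described above. All function names are ours.

```python
from itertools import product


def declare_link(probs, min_prob, min_ratio):
    """Return the index of the declared link among candidate probabilities,
    or None if no unique link is declared."""
    if not probs:
        return None
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best = probs[order[0]]
    if best < min_prob:
        return None  # best candidate is not similar enough
    if len(order) > 1:
        second = probs[order[1]]
        if second > 0 and best / second < min_ratio:
            return None  # best candidate is not clearly better than the runner-up
    return order[0]


def evaluate(cases, min_prob, min_ratio):
    """Count (tp, fp, tn, fn) on hand-labeled cases.
    Each case is (candidate_probabilities, index_of_true_match_or_None)."""
    tp = fp = tn = fn = 0
    for probs, truth in cases:
        pick = declare_link(probs, min_prob, min_ratio)
        if pick is None:
            if truth is None:
                tn += 1
            else:
                fn += 1
        elif pick == truth:
            tp += 1
        else:
            fp += 1
    return tp, fp, tn, fn


def grid_search(cases, prob_grid, ratio_grid, fp_penalty=2.0):
    """Pick the (min_prob, min_ratio) pair with the best hypothetical score."""
    def score(params):
        tp, fp, tn, _ = evaluate(cases, *params)
        return tp + tn - fp_penalty * fp
    return max(product(prob_grid, ratio_grid), key=score)


# Tiny made-up held-out set, for illustration only.
held_out = [
    ([0.95, 0.10], 0),     # one clearly dominant, correct candidate
    ([0.60, 0.55], None),  # two near-ties, no true match
    ([0.30], None),        # single weak candidate, no true match
    ([0.85, 0.20], 0),     # dominant, correct candidate
]
best_params = grid_search(held_out, [0.2, 0.5, 0.9], [1.0, 2.0])
```

Raising `fp_penalty` pushes the search toward stricter thresholds, trading linked cases for fewer incorrect links.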
Linking Census Records to Mortality Records
Our strategy for linking the 1920–1940 census sample to mortality records proceeded in a similar manner, with some unavoidable differences due to variable availability. The primary source for death records was a merged version of the Social Security Administration (SSA) NUMIDENT data files containing Social Security claims data (N ≈ 35,000,000 for both men and women). NUMIDENT includes several pieces of information not provided by the Social Security Death Master File (SSDMF). In particular, it includes state of birth, the person’s gender, and mother’s and father’s first and last names. We used this information to improve the precision of the training data file and create linking features for the machine learning algorithm to use. To provide better coverage of deaths (the NUMIDENT data contains no deaths after 2007 and very few deaths prior to 1973), we supplemented the NUMIDENT records with a secondary source of mortality data: the publicly available SSDMF. The SSDMF—which records mortality information based on reports from funeral directors, family members, financial institutions, the post office, and various government agencies—includes deaths occurring as recently as May 2013 (N ≈ 93,000,000 for both men and women).
In both data files, the first recorded deaths are from the early 1900s, but coverage of deaths through the 1960s and into the 1970s is less complete. A comparison of our death data to published life tables from the SSA (Bell and Miller 2005) suggests that our count of the cumulative percentage dead by 1960 is about 6 percentage points too low, our count of the cumulative percentage dead by 1970 is about 11 percentage points too low, and about 2.5% of our sample should have survived beyond 2013. If we make the reasonable assumption that early deaths were more (less) likely to occur among those with lower (higher) levels of education, then our estimates of the effects of education on mortality should be attenuated toward 0. In supplementary analyses, we evaluated the extent of this bias using a reweighting procedure that aligned our observed distribution of deaths to the distribution inferred from published tabulations (see the online appendix). Results from these analyses suggest that our within-pair estimates may be downwardly biased by as much as 8%, which makes our estimates of education effects conservative.
To be included in our analyses, both members of pairs of twins, non-twin siblings, neighbors, or unrelated individuals had to be fully linked across the 1920 and 1940 censuses and mortality records. We also dropped a small subset of pairs for whom ages at death were implausible or in which at least one member of the pair was missing information on education. For the twins, displayed in Table 1, this restriction resulted in a sample size of n = 5,216 unique individuals in 2,608 complete twin pairs.3 For our subsamples of random individuals, nonrelative neighbors, and non-twin siblings, the final sample sizes are n = 1,658,836, 1,604,936, and 328,352, respectively.
One concern is that sample selection—occurring either through incomplete record linkage or missing data—results in an analytic sample that differs from the target population in nontrivial ways. Table 1 indeed shows differences by race and geographic region in the likelihood of remaining in sample after our various selection filters are in place (parallel tables for the non-twin subsamples can be found in the online appendix, Tables A1–A3). Non-White individuals and individuals born in the South are less likely to be in the linked sample than in the original 1920 sample, with the opposite applying to Whites and individuals from the northeastern part of the country. Apart from this, there does not seem to be any selection into the linked sample on the basis of observed characteristics.4 As described in the online appendix, in our analyses, we reweight the data to adjust for discrepancies between our analytic sample and the population we are trying to describe. Weights were generated by calculating the inverse of the probability of successful linkage, where the probability of linkage was determined using a simple logit model. Predictors in the model included race, region, family size, and householder’s occupation category.5
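The weighting step can be illustrated with a short sketch. The logit coefficients below are hypothetical values chosen for illustration, not estimates from the linkage model, and only two of the predictors named above are included.

```python
import math


def linkage_probability(features, coefs, intercept):
    """Predicted probability of successful linkage from a logit model."""
    xb = intercept + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-xb))


def inverse_probability_weight(p_link):
    """Weight each linked case by the inverse of its linkage probability."""
    if not 0.0 < p_link <= 1.0:
        raise ValueError("linkage probability must be in (0, 1]")
    return 1.0 / p_link


# Hypothetical coefficients: groups that are harder to link get negative values.
coefs = {"nonwhite": -0.6, "south": -0.4}
intercept = -0.5

records = [
    {"nonwhite": 0, "south": 0},  # easier-to-link profile
    {"nonwhite": 1, "south": 1},  # harder-to-link profile
]
weights = [
    inverse_probability_weight(linkage_probability(r, coefs, intercept))
    for r in records
]
```

Cases with a low predicted probability of linkage receive larger weights, so underrepresented groups count for more in the weighted analyses.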
This modeling strategy can be modified in three respects to assess the degree (and nature) of omitted variable bias in prior U.S.-based work that uses data on unrelated individuals and more standard estimation procedures. First, we can estimate unpaired ordinary least squares (OLS) models of age at death for our subsample of unrelated males who lived in different neighborhoods, and parallel models for our subsamples of neighbors, siblings, and twins.6 In these analyses, we include covariates for family socioeconomic origins, race, family structure and composition, nativity status, and geography; we expect the results to reproduce findings from prior research. Second, we can estimate within-neighborhood fixed-effects models for our sample of unrelated pairs of males who lived in the same neighborhood. These models allow us to consider the degree to which the association between education and mortality is confounded by geographic and neighborhood factors that might be unobserved using a more conventional covariate adjustment design. Third, we can estimate within-family fixed-effects models for our sample of pairs of male-male non-twin siblings. These models assess the degree to which the association between education and mortality is confounded by shared environmental, neighborhood, geographic, family, and genetic conditions, but they provide a less stringent test than the within-twin pair analyses because non-twin siblings generally share fewer genetic endowments. Together with the estimates obtained using our sample of twins, these analyses provide useful information about the magnitude of education effects on mortality and the role played by different sets of theoretically relevant confounds that are typically hard to observe directly.
In Table 2, we present key descriptive statistics for education and mortality variables for each of the four analytic subsamples: unrelated non-neighbors, unrelated neighbors, non-twin siblings, and twins. The mean years of education were 10.13 to 10.40 across analytic samples. Within pairs, the rate of educational discordance—the percentage of pairs in which the two differed in their years of schooling completed—was high among unrelated non-neighbors (84%) and unrelated neighbors (80%), lower among non-twin siblings (63%), and lowest among twins (40%). Likewise, the mean absolute difference in years of schooling within pairs was highest among unrelated non-neighbors (3.2 years) and unrelated neighbors (2.7 years), lower among non-twin siblings (1.6 years), and lowest among twins (0.9 years). The mean age at death was between 74.8 and 76.0 across the four groups.7
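The two within-pair descriptive measures reported here are straightforward to compute. The sketch below uses made-up schooling values and a hypothetical helper name.

```python
def pair_discordance(pairs):
    """Return (share of pairs with unequal schooling,
    mean absolute within-pair difference in years of schooling)."""
    if not pairs:
        raise ValueError("need at least one pair")
    n = len(pairs)
    discordant = sum(1 for a, b in pairs if a != b)
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / n
    return discordant / n, mean_abs_diff


# Made-up years of schooling for five pairs (illustrative only).
schooling_pairs = [(12, 12), (10, 12), (8, 8), (16, 12), (9, 9)]
share_discordant, mean_gap = pair_discordance(schooling_pairs)
```

In this toy example, two of five pairs are discordant (40%) and the mean absolute gap is 1.2 years. Concordant pairs contribute zeros to the mean gap, which is why the gap shrinks along with the discordance rate.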
In Table 3, we present estimates of the effect of education—expressed as years of schooling completed—on age at death. All models, as noted earlier, were weighted to account for differential probabilities of selection into the final linked sample.8 In the four leftmost columns of results, we present unpaired (OLS) models for unrelated non-neighbors, unrelated neighbors, non-twin siblings, and twins. In these models, we treat members of each pair as individuals and ignore pair structures (standard errors are clustered at the pair level to account for nonindependence). All unpaired models adjust for the demographic variables listed in Table 2, plus state of residence in 1920. As expected, there is a positive and significant association between education and age at death. For each additional year of schooling an individual completes, they live about four-tenths of a year (or 4.8 months) longer, on average. The fact that this result is so consistent across subsamples suggests that when unobserved similarities are ignored, twins and non-twin siblings are unremarkable relative to each other and relative to subsamples composed of unrelated individuals. We take this as a sign of external validity.
In the right four columns of results in Table 3, we present paired models, corresponding to Eq. (3), for pairs of unrelated non-neighbors, unrelated neighbors, non-twin siblings, and twins. The model for unrelated non-neighbors adjusts for the same set of covariates as the unpaired model. The model for unrelated neighbors makes the same adjustments while also differencing out all aspects of the neighborhood environment that neighbor pairs have in common. The models for non-twin siblings and twins difference out all aspects of the shared neighborhood environment, all aspects of the shared family environment, and any other endowment both members of the pair possess. For unrelated non-neighbor pairs and unrelated neighbor pairs, the within-pair estimates of the effect of years of schooling on age at death (= 0.40 in both cases) are about the same as the unpaired versions (= 0.39 in both cases). For siblings and twins, the estimates are modestly attenuated (= 0.34 and 0.35) but are still nonzero and significant.9 These estimates suggest that a conventional covariate adjustment design may modestly overstate the magnitude of the education-mortality relationship, insofar as it omits important but hard-to-observe family or individual endowments, but that the causal path from education to survival remains intact.
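To make the logic of the paired estimates concrete, the sketch below shows how differencing within pairs removes anything the two members share. It assumes a pair fixed-effects specification with no additional within-pair covariates and no weights, in which case the estimator reduces to a regression of the within-pair difference in age at death on the within-pair difference in schooling. The data are synthetic, the function names are ours, and the slope of 0.35 is borrowed from the twin estimate in the text purely for illustration.

```python
def within_pair_slope(pairs):
    """Pair fixed-effects slope for two-member pairs.
    Each pair is ((schooling_1, age_at_death_1), (schooling_2, age_at_death_2))."""
    num = den = 0.0
    for (e1, y1), (e2, y2) in pairs:
        d_e, d_y = e1 - e2, y1 - y2  # shared pair-level factors cancel here
        num += d_e * d_y
        den += d_e * d_e
    if den == 0.0:
        raise ValueError("no pair is discordant on schooling")
    return num / den


def make_pair(shared_endowment, e1, e2, beta=0.35):
    """Synthetic pair: age at death = shared endowment + beta * schooling."""
    return ((e1, shared_endowment + beta * e1), (e2, shared_endowment + beta * e2))


synthetic_pairs = [
    make_pair(70.0, 12, 10),
    make_pair(75.0, 8, 8),    # concordant pair: contributes nothing to the estimate
    make_pair(72.0, 16, 11),
]
estimate = within_pair_slope(synthetic_pairs)
```

The shared endowments differ across pairs (70, 75, 72) yet drop out of the estimate, which recovers the slope of 0.35 from the discordant pairs alone. This also shows why low discordance among twins leaves less variation to identify the effect.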
In Table 4, we repeat the analyses in Table 3 using a categorical parameterization of education to allow for possible nonlinearities (Montez et al. 2012). Here, people are classified as having completed either fewer than 12 years of schooling or 12 years of schooling or more, which corresponds to the margin between completing and not completing high school.10 The results are virtually identical to those presented previously. In the unpaired models, there are sizable and significant differences in life expectancy by level of education. Compared with those who did not complete secondary school, those who did lived between 2.2 years (the unrelated non-neighbor, unrelated neighbor, and non-twin sibling subsamples) and 2.5 years (the twin subsample) longer, on average. We see the same pattern in the paired fixed-effect estimates but with attenuated coefficients for non-twin siblings and twins. Once we difference out everything twins and siblings have in common, the coefficient for completing at least 12 years of schooling is reduced to approximately 1.6 years. All coefficients retain their significance at the p < .10 level or better.
Limitations and Robustness Checks
The results presented to this point provide evidence of a relationship between education and survival, but justifying a stronger causal interpretation of our estimates requires certain assumptions. Questions about measurement error, residual within-pair variation, and outliers have all been raised in response to prior studies, especially those that involve twins (Boardman and Fletcher 2015; Bound and Solon 1999; Gilman and Loucks 2014; Kaufman and Glymour 2011). To assess the sensitivity of our estimates to these concerns, we carried out a series of additional robustness checks.
Random Measurement Error
Attenuation bias is more pronounced in fixed-effects models because of the weaker signal-to-noise ratio (Ashenfelter and Krueger 1994; Griliches 1979). For our purposes, this leads to (1) an increased chance of making a Type II error as we move from an unpaired (OLS) estimator to paired fixed-effects models; and (2) possible underestimates (but not overestimates) of the true effect of education on length of life—particularly in our within-pair twin models, where the signal we wish to detect is at its weakest. In principle, we would be more concerned about this issue if we saw evidence of substantial attenuation across subsamples (e.g., an estimate centered over zero for twins and a positively signed nonzero estimate for siblings), but this is not the case. When we formally compared the fixed-effects estimates obtained for twins (where attenuation resulting from measurement error should be more pronounced) and non-twin siblings (where attenuation should be less pronounced), we were unable to reject the null that they are equal (p = .82). The same was true when we compared estimates for neighbors to estimates for unrelated pairs (p = .81). We interpret this to mean that measurement error is likely to be minimal.
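The reason attenuation grows under differencing can be made concrete with a short sketch. It assumes classical measurement error (noise in reported schooling that is uncorrelated with true schooling and across pair members) and no other covariates. Under those assumptions, the unpaired slope is scaled by the reliability of reported schooling, and the within-pair slope is scaled by (reliability − ρ) / (1 − ρ), where ρ is the within-pair correlation in reported schooling. The function name and the numeric inputs below are hypothetical and are not estimates from our data.

```python
def attenuation_factors(reliability, within_pair_corr):
    """Return (unpaired, within-pair) attenuation factors.

    reliability: share of variance in reported schooling that is true signal.
    within_pair_corr: correlation of reported schooling between pair members.
    Assumes classical measurement error that is independent across members.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must lie in (0, 1]")
    if not within_pair_corr < reliability:
        raise ValueError("within-pair correlation must be below reliability")
    unpaired = reliability
    within_pair = (reliability - within_pair_corr) / (1 - within_pair_corr)
    return unpaired, within_pair


# Hypothetical inputs: same reliability, higher within-pair correlation for twins.
for label, rho in [("non-twin siblings", 0.5), ("twins", 0.7)]:
    unpaired, within = attenuation_factors(reliability=0.9, within_pair_corr=rho)
    print(f"{label}: unpaired x{unpaired:.2f}, within-pair x{within:.2f}")
```

Because differencing removes more true signal when pair members resemble each other more closely, the factor shrinks as ρ rises, which is why attenuation is expected to be more pronounced for twins than for non-twin siblings.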
A second and more important concern relates to unobserved differences within pairs of twins. The model specified in Eq. (3) differences out all characteristics that are shared by both members of a twin (or non-twin) pair, but it does not account for characteristics that vary between members within pairs. This could bias our estimates if there are unobserved individual-specific factors, Zij, that are correlated with both amount of education completed and longevity. Differences in childhood health, intelligence, personality characteristics, and/or genetic endowments (because of the presence of DZ twins in our twin sample) are all possibilities.11 The direction of the bias depends on the nature of the relationship. If Zij correlates with schooling and longevity in the same direction (e.g., rz,x > 0 and rz,y > 0), our estimate of β will be an overestimate of the true education effect. If Zij correlates with schooling and longevity in opposite directions (e.g., rz,x > 0 and rz,y < 0), it will be an underestimate (Kohler et al. 2011). To evaluate the risk that such bias poses for our analyses, we carried out a simple Monte Carlo–style simulation study. In the simulation, we randomly generated an unobserved variable, Z, whose correlations with years of schooling and age at death followed a prespecified structure (we allowed the pairwise correlations to run from –.30 to .30 in increments of .10). We then added Z to our within-twin pair specification, collected the resulting point estimate for years of schooling, and averaged across 1,000 replications to stabilize the results.12
Findings from this exercise, which we have arranged into a simple matrix, are provided in Table 5. In scenarios where the unobserved confound is unrelated to age at death (rz,y = 0), education (rz,x = 0), or both (rz,y = 0 and rz,x = 0), we see little to no movement in our point estimate relative to the estimate presented in Table 3. This makes good intuitive sense given that Z is not a confound under these conditions. In scenarios where Z and years of schooling are positively (negatively) related but Z and age at death are negatively (positively) related, we see evidence of a suppression effect that ranges in magnitude according to the strength of the correlations. We are less concerned about this possibility because it is difficult to think of an unobserved variable that exhibits this correlation structure. What we are more concerned about is the final scenario where Z correlates with education and age at death in the same way (rz,x and rz,y > 0 or rz,x and rz,y < 0). Results from the simulation suggest that under this scenario, our within-twin pair estimate will be too large, but that the size of the overestimate is likely to be modest in absolute terms. Only under fairly extreme conditions (rz,x = –.30 and rz,y = –.30, or rz,x = .30 and rz,y = .30) do we obtain coefficients that approach 0. Although we cannot definitively rule out the existence of an unobserved confound that fits this description, we can say with certainty that no observed variable in our data set comes close. Most observed measures that produce the required correlation with years of schooling completed (e.g., householder’s socioeconomic status) are only weakly related to age at death (e.g., the correlation between householder’s socioeconomic status and age at death is .01 in our sample of twins) and vice versa. We think that these results provide reassurance against the threat of residual variation.
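The qualitative pattern described above can be reproduced with a small sketch. Note that it swaps the Monte Carlo procedure for the closed-form population analogue: with standardized variables and no other covariates, the coefficient on x after adding z is (r_xy − r_zx·r_zy) / (1 − r_zx²). The value assigned to `R_XY` is a hypothetical placeholder rather than a quantity from our twin sample, so the numbers it produces are illustrative and do not correspond to Table 5.

```python
def adjusted_slope(r_xy, r_zx, r_zy):
    """Standardized coefficient on x after adding z to a regression of y on x."""
    return (r_xy - r_zx * r_zy) / (1 - r_zx ** 2)


R_XY = 0.10  # hypothetical within-pair correlation of schooling and age at death

# Correlations from -.30 to .30 in steps of .10, as in the text.
grid = [k / 10 for k in range(-3, 4)]
table = {
    (r_zx, r_zy): adjusted_slope(R_XY, r_zx, r_zy)
    for r_zx in grid
    for r_zy in grid
}

# Rows index r_zx, columns index r_zy.
print("r_zx\\r_zy " + " ".join(f"{r:+.1f}  " for r in grid))
for r_zx in grid:
    row = " ".join(f"{table[(r_zx, r_zy)]:+.3f}" for r_zy in grid)
    print(f"{r_zx:+.1f}      {row}")
```

Cells where r_zx = 0 leave the coefficient unchanged, cells where the two correlations have opposite signs raise it (suppression), and cells where they share a sign lower it, with the corner cells bringing it close to zero under the assumed `R_XY`.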
A third concern raised—particularly about twin studies that use a within-pair estimator like ours—is that non-null results could be driven by the presence of extreme values on key explanatory variables (Amin 2011; Lundborg et al. 2016). To consider this possibility, we pooled our twin and non-twin sibling subsamples (to maximize power) and then dropped all pairs where the within-pair difference in education was greater than or equal to four years of schooling (eliminating about 15% of all observations). Imposing this constraint did not diminish our point estimate for education (= .39, p < .01) as one would expect if extreme values were driving the results.13 Instead, in the pooled sample—and also in supplementary analyses where we disaggregated by subsample—our estimate for years of schooling remained the same or even increased marginally in size.
The findings to this point suggest that the effects of education on mortality are positive, on average, and that methodological complications are unlikely explanations for the observed relationship. Whether the same pattern holds across population subgroups, as defined by their socioeconomic status, is an open and important question. Prior theoretical work—mostly focusing on educational gradients in physical health—has developed competing theories for who stands to gain the most from completing additional schooling. One possibility is that the biggest returns go to individuals with the fewest advantages because their socioeconomic success depends more critically on their educational attainment. This argument can be traced to Ross and Mirowsky’s (1989) work on resource substitution theory. Another possibility is that the biggest returns go to individuals who are the most advantaged because they are in a better position to leverage and consolidate the multiple social, economic, and health-related resources that education is thought to provide. This argument, which implies the presence of a cumulative advantage process, is typically referred to as the resource multiplication hypothesis (Andersson and Vaughan 2017; Ross and Mirowsky 2011; Schafer et al. 2013).
To test these propositions, we fit an augmented version of our within-pair model that included an interaction between the respondent’s completed education (expressed using a linear measure of years of schooling completed) and a measure of their parents’ occupational standing in 1920 (derived from a constructed variable that assigns occupational income scores to each occupation based on the median income within that occupation).14 Although the main effect of occupational standing cannot be estimated in our models (it does not vary within pairs and is therefore perfectly collinear with the within-family fixed effects), the coefficient on its interaction with education is estimable and provides information about effect heterogeneity across the distribution of socioeconomic status. For these analyses, we pooled our sibling and twin subsamples (again, to maximize power) and used the same set of inverse propensity of linkage weights as used earlier. All other aspects of our model specification remained unchanged.
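Why the interaction remains identified while the main effect does not can be seen in a short derivation. The notation here is assumed for illustration and may differ from that of Eq. (3): i indexes pair members, j indexes pairs, x is years of schooling, s_j is the pair-level occupational income score, and α_j is the pair fixed effect. Other covariates are omitted for brevity.

```latex
\begin{aligned}
y_{ij} &= \beta x_{ij} + \gamma \,(x_{ij} \times s_j) + \delta s_j + \alpha_j + \varepsilon_{ij} \\
y_{1j} - y_{2j} &= (\beta + \gamma s_j)\,(x_{1j} - x_{2j}) + (\varepsilon_{1j} - \varepsilon_{2j})
\end{aligned}
```

Differencing removes both α_j and δ s_j because neither varies within a pair, but the interaction survives because schooling does. Under this sketch, the implied effect of an additional year of schooling at occupational income score s is β + γs.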
Figure 3 visualizes the main result. Occupational income scores are plotted along the x-axis; estimated effects, measuring the expected change in life expectancy associated with an additional year of schooling, are given by the y-axis. The shaded regions, going from light to dark, provide the 95%, 75%, and 50% confidence intervals around the estimated effect at each level of occupational income. That the estimates presented in the graph slope upward suggests that the longevity returns to additional schooling are not uniform with respect to social background but instead grow (nearly doubling in size) as one moves from the very low to very high ends of the occupational income distribution.15 This pattern is broadly consistent with the idea of resource multiplication. Completing additional schooling seems to have had beneficial effects regardless of a person’s social background, but the benefits appear to be most pronounced for those who were raised in more advantaged circumstances.
Educational gradients in mortality are strong and well documented (Hummer and Hernandez 2013), but recent work has raised questions about their etiology. One possibility is that the link between education and mortality is causal: completing additional schooling promotes the acquisition of skills, resources, and knowledge that as a package increase a person’s chances of survival (Phelan et al. 2004). Another possibility is that the two variables share common causes (Fuchs 1982), confounding effect estimates and inducing a spurious (or partially spurious) relationship. In this project, we sought to adjudicate between these possibilities using an approach that allows for credible estimates of causal effects. In a series of increasingly stringent model specifications, we were able to difference out all features of the neighborhood, family, and genetic endowment that strategically paired members of our sample had in common. What we were left with was a slightly attenuated but still strong and significant relationship between education and survival, with unobserved aspects of family environment acting as the most important confound. This result held across alternative parametrizations, persisted except under fairly extreme empirical conditions, and does not appear to be an artifact of errors in our data, censoring, and/or other methodological considerations.
These findings help to extend the already well-developed literature on education and mortality. Determining whether educational attainment is a cause or simply a correlate of survival requires specific data and a strong research design. Scholars working in the U.S. context have made considerable headway using a combination of observational and quasi-experimental approaches (Link et al. 2008; Lleras-Muney 2005; Montez and Hayward 2014), but concerns regarding identification have lingered. Although the twin-differencing strategy that we employed in our analysis has been used in prior studies to address this issue (Behrman et al. 2011; Ericsson et al. 2019; Lundborg et al. 2016; Madsen et al. 2010; Søndergaard et al. 2012; van den Berg et al. 2015), applications in the United States have not been possible because of a lack of appropriate data. Using a supervised machine-learning algorithm, we were able to link large samples of U.S.-based twins, non-twin siblings, unrelated neighbors, and unrelated people living in different neighborhoods across censuses and to administrative records containing information on the timing of their deaths. This new data resource—which includes nearly 2 million fully linked records all told—allowed for a careful consideration of confounding across several levels of analysis and new estimates of effect heterogeneity.
Although we consider these to be valuable contributions, we also recognize the need for caution. Prior research on education and mortality suggests that the strength of the relationship—and the extent to which it derives from a true causal process—may vary substantially across time, space, and populations (Cutler et al. 2015; Galama et al. 2018; Gathmann et al. 2015; Hayward et al. 2015; Kunst and Mackenbach 1994; Smith et al. 2015). In our analyses, we considered variation in educational effects across the distribution of social background (operationalized in terms of father’s occupational status) using one of the first birth cohorts to experience increased access to education (Goldin 1998), but similar comparisons across birth cohorts and/or by race/ethnicity or gender were not possible given the nature of our data.16 It may very well be that the patterns we observed for mostly White boys living in the United States during the first part of the twentieth century do not hold for girls or minorities from the same birth cohort, for earlier or subsequent cohorts of Americans, or for individuals living in other countries.17 The good news is that some of these questions may be answerable in the near future. In ongoing work, we are (1) using parallel machine-learning procedures to link boys who were enumerated as a part of the 1900 and 1910 censuses (born between 1890 and 1910), and (2) developing specialized routines (that capitalize on parental surname information included as a part of NUMIDENT) to link large subsamples of girls across censuses and to mortality records. Our hope is that these efforts will facilitate new analyses of the causal relationship between education and mortality and the way it is conditioned by specific historical, social, demographic, and epidemiological factors.18
Caution is also warranted with respect to internal validity. Within-family designs, including within-twin pair and within-sibling designs, have a number of well-known methodological issues (McGue et al. 2010). The most important one for us has to do with identification. The within-pair estimator that we used allowed us to difference out the influence of unobserved factors operating at the family and neighborhood levels, but there is no guarantee that residual within-pair differences (in specific environmental exposures, early-life health conditions and illnesses, personality characteristics, and/or genetic endowments) did not remain. In our analyses, we did what we could to assess the severity of this threat via targeted simulations. Results from this exercise suggest that the amount of residual variation would have to be extensive, and of a certain type, in order to invalidate our inferences regarding the effects of education on mortality. Although this does not confirm that education is uncorrelated with the individual-level error term in our main estimating equation (and thus unconfounded by residual differences that exist within pairs of twins and non-twin siblings), it does help to provide a plausible lower bound on the effects we are estimating. We think this is about the best one can do using observational data.
Just over 45 years ago, Kitagawa and Hauser (1973) published results from a large-scale record-linking project that matched a sample of death records to microdata from the 1960 census long form. In their analyses, they found that education and life expectancy were positively correlated and that this association existed, to varying degrees, within different subgroups of the population. In the years since, there has been a push to extend Kitagawa and Hauser’s findings in ways that allow for stronger statements regarding causality (Montez and Friedman 2015). We believe that this line of inquiry is crucial. If education and mortality are causally related, then intervening in a way that promotes schooling could have tangible benefits for survival and other health-related outcomes (Hummer and Hernandez 2013). Our own analyses, which were also based on a large-scale record-linking project, provide at least some reason for optimism in this regard. Men in the United States who were born between 1910 and 1920 tended to live longer if they completed additional schooling, and this pattern was not readily explained by differences in environmental exposures during childhood or variation in other hard-to-observe endowments. Although we prefer to be as circumspect as possible when making causal inferences, we think that these results are at the very least consistent with the notion of a true education effect.
This research was supported by a grant (1R21AG054824-01A1) from the Eunice Kennedy Shriver National Institute for Child Health and Human Development (NICHD). Research support was also provided by the Minnesota Population Center, which receives core funding (P2CHD041023) from NICHD. Thanks are due to participants at several conferences and seminars for their constructive feedback and comments, and to the anonymous reviewers. All errors and omissions, however, are the responsibility of the authors.
Halpern-Manners, Warren, Roberts, and Helgertz conceived of the project; Helgertz designed and implemented the record-linking procedure; and Halpern-Manners carried out the analyses and drafted the manuscript, with input and contributions from the other three authors. All authors read and approved the final manuscript.
The complete count 1920 and 1940 censuses are publicly available at https://www.ipums.org. The linked data files used in this project are available upon request.
Compliance With Ethical Standards
Ethics and Consent
No ethical approval or consent was required for this study, as all records are public, and nearly all participants are deceased.
Conflict of Interest
The authors report no conflicts of interest.
In ongoing work, we are exploring the feasibility of implementing similar machine-linking procedures for a subsample of female children (in female-female sibling and twin pairs and female-male sibling and twin pairs). Unfortunately, the technical challenges involved in obtaining reliable links for girls are much steeper because of more frequent name changes at marriage. We discuss this issue in more detail in our conclusion.
The assumption regarding place of birth is probably not entirely accurate, but the implications thereof should not be important. Furthermore, the potential benefits of relaxing this assumption should be weighed against the obvious downside of increasing the population of potential matches and, thereby, also the risk of declaring false positives.
We could not distinguish MZ from DZ twins in our analyses, but a publication from the period in question, which estimates that among same-sex twin pairs born between 1922 and 1930, 50% were MZ (Hamlett 1935), provides a rough guide. It is possible that the actual percentage we end up with in our analytic sample is somewhat lower (because pairs have to be discordant on education to contribute to our preferred within-pair estimates and rates of discordancy are likely to be lower among MZ twins), but we do not expect the difference to be especially large. Supplementary analyses of data from the Virginia Twin Registry show that among male MZ twins born between 1910 and 1920 (and still alive in 1987), the rate of discordancy was 36%. The same figure for male-male DZ twins born during the same period was 45%. If we take these percentages at face value, and assume equal numbers of MZ and DZ pairs to begin with, they imply that roughly 44% of discordant pairs [0.36 / (0.36 + 0.45) × 100% ≈ 44%] in our twin sample are likely to be MZ.
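The arithmetic behind this figure can be checked with a few lines of code. The sketch below takes the three inputs quoted in this note (a 50% MZ share among same-sex pairs and discordancy rates of 36% and 45%); the function name is hypothetical.

```python
def mz_share_among_discordant(p_mz, disc_mz, disc_dz):
    """Share of education-discordant pairs that are MZ.

    p_mz: share of all same-sex twin pairs that are MZ.
    disc_mz, disc_dz: rates of educational discordancy among MZ and DZ pairs.
    """
    mz = p_mz * disc_mz
    dz = (1 - p_mz) * disc_dz
    return mz / (mz + dz)


share = mz_share_among_discordant(p_mz=0.50, disc_mz=0.36, disc_dz=0.45)
print(f"{share:.3f}")  # prints 0.444
```

With equal starting shares of MZ and DZ pairs, the prior cancels and the expression reduces to 0.36 / (0.36 + 0.45).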
The primary determinant of successful linkage was name commonality (i.e., the number of people living in the same state with the same first and last name). In supplementary analyses, described in the online appendix, we show that name commonality is orthogonal to educational attainment net of basic sociodemographic and geographic controls.
There is room for debate about whether such weighting adjustments are necessary in the first place (Amin et al. 2015b; Boardman and Fletcher 2015). Our within-pair models provide protection against differential selection during the linkage stage (and other related concerns about external validity) by adjusting for all characteristics (observed or otherwise) that are shared within pairs. We suspect that this is why weighted and unweighted estimates closely agree with one another.
Fixed-effects models for pairs of unrelated individuals will produce point estimates (but not variance estimates) that are equivalent to an unpaired model with identical controls. We opted to use pairs for this subsample to ensure consistency with our treatment of the other subsamples.
The intrapair correlations presented in Table 2 for twins and non-twin siblings can be used to back out a rough estimate of broad sense heritability. If we assume that the twin sample is approximately 50% MZ and 50% DZ—and if we invoke the usual assumptions regarding equal environments, minimal gene-environment interactions, and comparable shared environments within pairs—then Falconer’s (1960) formula suggests that the broad sense heritability of age at death for members of this cohort was approximately 1.5 × (rtwins – rsiblings) = 1.5 × (0.21 – 0.12) = 0.14. We reiterate that this is a rough estimate.
The unweighted estimates (not shown) were substantively identical.
In supplementary analyses, we pooled the non-twin sibling and twin subsamples and fit a model interacting an indicator of subsample membership and years of schooling. The results suggest that the sibling and twin estimates are not significantly different from each other (p = .82). The same is not true for a comparison of the sibling and neighbor estimates, which produces significant differences at the p < .01 level.
We also experimented with a three-category measure of education, where education was coded as less than 12 years, 12 years, and more than 12 years of schooling. The three-category version produced a very similar (and statistically significant) educational gradient in age at death. We present results from the two-category version because cell sizes for some of the comparisons in the three-category version (e.g., more than 12 years vs. less than 12 years of education) are small in the twin subsample.
The estimates presented in Tables 3 and 4 give at least some reason to think that unobserved differences in genetic endowments within twin pairs may be less consequential for our analyses. The within-twin pair estimates that we provide represent a weighted average of estimates for MZ and DZ twins (Conley et al. 2006). Prior research, as noted earlier, suggests that male-male twin pairs born during this period were approximately 50% MZ and 50% DZ (Hamlett 1935). If we set the DZ estimates equal to the age-adjusted estimates that we obtain for non-twin siblings (who, like DZ twins, share 50% of their genes), we can calculate the MZ contribution to our within-twin pair results using Weinberg’s (1901) method. For the within-pair model that uses a linear parameterization of education, we get a coefficient of [0.347 − 0.338 × (1 − 0.5)]/0.5 = 0.356. The fact that we do not see much of a difference between siblings and twins (and between the sibling estimates and our inferred estimates for MZ twins, who are genetically identical) does not imply that genes are somehow irrelevant to a person’s educational attainment or longevity. It simply suggests that the additional endowments we are differencing out as we move from a within-sibling to within-twin pair model are not predictive of educational outcomes and survival. Prior work in other contexts has reached similar conclusions (Lundborg et al. 2016).
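The back-of-the-envelope calculation in this note treats the within-twin estimate as a weighted average of MZ and DZ estimates and solves for the MZ component. A minimal sketch, using the coefficients quoted above and a hypothetical function name, also shows how the implied MZ estimate moves if the assumed MZ share is varied. The alternative shares are illustrative assumptions.

```python
def implied_mz_estimate(twin_est, dz_est, mz_share):
    """Solve twin_est = mz_share * mz_est + (1 - mz_share) * dz_est for mz_est."""
    if not 0 < mz_share <= 1:
        raise ValueError("mz_share must lie in (0, 1]")
    return (twin_est - dz_est * (1 - mz_share)) / mz_share


TWIN_EST = 0.347     # within-twin pair coefficient quoted in the text
SIBLING_EST = 0.338  # within-sibling coefficient, used as a stand-in for DZ twins

baseline = implied_mz_estimate(TWIN_EST, SIBLING_EST, mz_share=0.5)
print(f"implied MZ estimate at 50% MZ: {baseline:.3f}")  # prints 0.356

# Sensitivity to the assumed MZ share (illustrative values only).
for share in (0.40, 0.45, 0.50, 0.55):
    est = implied_mz_estimate(TWIN_EST, SIBLING_EST, share)
    print(f"MZ share {share:.2f}: implied MZ estimate {est:.3f}")
```

Because the twin and sibling coefficients are so close, the implied MZ estimate changes little over this range of assumed shares.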
This setup is conceptually similar to the type of bounding analysis performed in Rosenbaum and Rubin (1983) and Rosenbaum (1995) except that we are deploying it within the context of a within–twin pair fixed-effects model.
These results are available upon request.
We used the occupational income score of the householder (Hauser and Warren 1997), which in most cases meant the focal individual’s father as opposed to mother.
The p value on the interaction term is .03.
If it is the case that education (and, in particular, higher levels of education) has become an increasingly important vehicle for obtaining valuable health-enhancing resources—as work by Hayward et al. (2015), Masters et al. (2012), Sasson (2016), and others clearly suggests—then we would expect to see larger and potentially more discontinuous education effects for later cohorts of adults (e.g., Baby Boomers).
Rates of smoking could also contribute to cross-cohort differences. The 1910–1920 cohort shared with its predecessors and immediate successors high rates of smoking initiation and continuation (Preston and Wang 2006), with little variation by education (Escobedo and Peddicord 1996). If anything, this should suppress education effects relative to later cohorts, where educational gradients in smoking were more pronounced (Ho and Fenelon 2015).
Another possible extension would be to link to the National Death Index (NDI), which provides information on cause of death. Based on the conceptual model presented in Fig. 1, we would expect to see a more robust relationship between education and deaths that were caused by chronic diseases linked to unhealthy lifestyles (Masters et al. 2015; Phelan et al. 2004), as opposed to deaths from less preventable causes where education (and the various personal and social resources it affords) should be of less benefit.