When investigating relationships between education and health, one has to take age into account. Conditioning on age entails conditioning on surviving, which has been argued to lead to a potential selection bias. In this note, I argue that surviving should be considered as a necessary precondition for the relationships of interest and, therefore, not as a possible source of bias. I criticize models of health trajectories that do not condition on surviving.
When investigating relationships between education and health, one has to take age into account. Conditioning on age entails conditioning on surviving, which has been argued to lead to a potential selection bias and should be coped with in some way (e.g., Beckett 2000; Chen et al. 2010; Kim and Durden 2007; Lynch 2003). In this note, I argue that surviving should be considered as a necessary precondition for the relationships of interest and, therefore, not as a possible source of bias.
I begin with a brief introduction of the conceptual framework and then consider mean health trajectories, which are defined conditional on surviving. I criticize the argument that references to selective mortality can help to explain differences of mean health trajectories of persons with different educational levels. I then consider age-specific changes of health and argue that these changes must also be defined conditional on surviving. Subsequently, I show that this condition is no hindrance to a causal interpretation of the relationship between education and health. Finally, I briefly consider growth curve models. If estimated in a temporally local way that allows conditioning on surviving, growth curve models are tools for modeling mean health trajectories. In contrast, results of hierarchical growth curve models are difficult to interpret and are potentially misleading because these models implicitly assume that all individual trajectories are defined for a common temporal domain. I end with a brief conclusion.
Comparing Mean Health Trajectories
Several studies found evidence for an age-as-leveler hypothesis, meaning that values of $\Delta_t$ become smaller at higher ages (e.g., Beckett 2000; Dupre 2007; Herd 2006). This hypothesis motivated a discussion of whether a leveling effect of age could be explained by selective mortality. The basic question concerns how the subsets $C_x$ are changed through mortality. Obviously, their size decreases monotonically, at a rate that depends on the educational level $x$, as described by the survival probabilities $\Pr(L \geq t \mid X = x)$. Because surviving depends on health, one can also expect mortality to change the distribution of health in the surviving population.
This is illustrated in Fig. 1, based on 30 individual trajectories $h_{it} = \alpha_i + \beta_i (t - 30)$.³ Values of $\alpha_i$ are random draws, uniformly distributed in the interval (0, 4), and $\beta_i = -0.05$. In accordance with the definition of $H_t$, individuals are assumed to be dead if $h_{it} < 0$. The bold line shows the mean health of the surviving individuals.
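A minimal Python sketch of the Fig. 1 setup may help. It assumes only what the text states (30 persons, $\alpha_i$ uniform on (0, 4), common slope −0.05, death once health falls below 0); the helper names, the seed, and the upper age of 110 are illustrative choices, not taken from the original. It computes the survivors' mean health at each age, i.e., the quantity drawn as the bold line.

```python
import random
from statistics import mean

# Hypothetical re-creation of the toy setup described for Fig. 1:
# h_it = alpha_i + beta * (t - 30), alpha_i ~ Uniform(0, 4), beta = -0.05,
# and a person counts as dead from the first age at which h_it < 0.
T0, T_MAX = 30, 110  # T_MAX is an illustrative choice (everyone is dead by then)


def simulate(n=30, beta=-0.05, seed=1):
    """Return {age: [health of each person, alive or not]} for ages T0..T_MAX."""
    rng = random.Random(seed)
    alphas = [rng.uniform(0.0, 4.0) for _ in range(n)]
    return {t: [a + beta * (t - T0) for a in alphas] for t in range(T0, T_MAX + 1)}


def alive(values):
    # With a negative common slope, h < 0 at age t means dead from then on,
    # so filtering on h >= 0 at each age selects exactly the survivors.
    return [h for h in values if h >= 0]


def mean_trajectory(traj):
    """Mean health of survivors at each age (conditional on surviving to t)."""
    return {t: mean(alive(v)) for t, v in traj.items() if alive(v)}


if __name__ == "__main__":
    traj = simulate()
    means = mean_trajectory(traj)
    for t in range(T0, T_MAX + 1, 10):
        n_alive = len(alive(traj[t]))
        m = f"{means[t]:.3f}" if t in means else "n/a"
        print(f"age {t}: {n_alive:2d} alive, mean health {m}")
```

Because those who drop out at each age are the ones with the lowest health, the survivors' mean never falls by more than 0.05 per year, even though every individual falls by exactly 0.05.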
However, the implications of mortality illustrated in Fig. 1 do not allow any definite conclusions to be drawn for the comparison of the mean health trajectories of two groups, $C_{x'}$ and $C_{x''}$. This is illustrated in Fig. 2. $C_{x''}$ consists of the 30 trajectories shown in Fig. 1; $C_{x'}$ comprises 30 trajectories that equal those in $C_{x''}$ at $t_0$ but decline with slope −0.07 (instead of −0.05). Clearly, the gap between the two mean health trajectories widens with age.
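The two-group comparison can be sketched in the same way. Under the stated setup, the survivors of a group with slope $b$ have health uniformly distributed on $(0, 4 + b(t - 30))$, so their expected mean is $(4 + b(t - 30))/2$, and the expected gap between slopes −0.05 and −0.07 is $0.01(t - 30)$ for as long as both groups have survivors. The code below (hypothetical helper names, arbitrary seed) computes the realized gap for one draw of shared intercepts.

```python
import random
from statistics import mean

# Hypothetical sketch of the Fig. 2 comparison: both groups share the same
# intercepts alpha_i ~ Uniform(0, 4) at t0 = 30; one group declines with
# slope -0.05, the other with slope -0.07. Health < 0 means dead.
T0 = 30


def survivor_mean(alphas, slope, t):
    """Mean health at age t among persons still alive (h >= 0), or None if all died."""
    alive = [a + slope * (t - T0) for a in alphas if a + slope * (t - T0) >= 0]
    return mean(alive) if alive else None


def mean_gap(alphas, t, slope_slow=-0.05, slope_fast=-0.07):
    """Survivor-mean health of the slower-declining group minus the faster one."""
    m_slow = survivor_mean(alphas, slope_slow, t)
    m_fast = survivor_mean(alphas, slope_fast, t)
    if m_slow is None or m_fast is None:
        return None  # one group has no survivors left at age t
    return m_slow - m_fast


if __name__ == "__main__":
    rng = random.Random(1)
    alphas = [rng.uniform(0.0, 4.0) for _ in range(30)]  # shared intercepts
    for t in (30, 40, 50, 60, 70, 80):
        g = mean_gap(alphas, t)
        print(t, "n/a" if g is None else round(g, 3))
```

With only 30 persons the realized gap fluctuates around $0.01(t - 30)$; with many persons it tracks that line closely, so selective mortality here goes together with a widening, not a leveling, of the gap.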
Changes of Health
While $\delta_t(x)$ is the mean of the age-specific changes of the individual health trajectories, $\delta_t^*(x)$ is the age-specific change of the mean health trajectory. The difference is immediately visible in Fig. 1. In this example, all individual trajectories change in the same way, independent of age. The mean of these changes is simply the gradient −0.05, which obviously differs from the age-dependent gradients of the mean trajectory.
The difference between the two ways of assessing change is due to mortality. To think of an individual's change of health between $t - 1$ and $t$ requires that the individual survives at least until $t$. In contrast, the mean health trajectory relates to a group of individuals that changes continuously through mortality. However, these changes are not a source of bias: $\delta_t(x)$ and $\delta_t^*(x)$ are simply different concepts, both providing relevant information.
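To make the distinction concrete, the sketch below computes both quantities for the Fig. 1 toy data. It reads $\delta_t$ as the mean of $h_{it} - h_{i,t-1}$ over persons alive at $t$, and $\delta_t^*$ as the survivors' mean at $t$ minus the survivors' mean at $t - 1$, following the verbal descriptions above; the formal definitions are not reproduced here, and the function names are hypothetical. In this setup $\delta_t$ is −0.05 at every age, whereas $\delta_t^*$ is about −0.025 in expectation, because the expected survivor mean $(4 - 0.05(t - 30))/2$ falls at half the individual rate.

```python
import random
from statistics import mean

# Hypothetical sketch contrasting the two notions of change:
#   delta(t)      : mean of individual changes h_it - h_i,t-1 among those alive at t
#   delta_star(t) : change of the survivor-mean trajectory between t-1 and t
# Toy data follow the Fig. 1 setup (alpha_i ~ Uniform(0, 4), slope -0.05, t0 = 30).
T0, SLOPE = 30, -0.05


def health(alpha, t):
    return alpha + SLOPE * (t - T0)


def delta(alphas, t):
    """Mean individual change from t-1 to t, conditional on surviving to t."""
    alive_t = [a for a in alphas if health(a, t) >= 0]
    if not alive_t:
        return None
    return mean(health(a, t) - health(a, t - 1) for a in alive_t)


def delta_star(alphas, t):
    """Change of the mean trajectory: survivors at t versus survivors at t-1."""
    now = [health(a, t) for a in alphas if health(a, t) >= 0]
    before = [health(a, t - 1) for a in alphas if health(a, t - 1) >= 0]
    if not now or not before:
        return None
    return mean(now) - mean(before)


if __name__ == "__main__":
    rng = random.Random(3)
    alphas = [rng.uniform(0.0, 4.0) for _ in range(30)]
    for t in (40, 60, 80):
        print(t, delta(alphas, t), delta_star(alphas, t))
```

Neither number is a biased version of the other: the first follows individuals, the second follows a population whose membership shrinks with age.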
I have argued that a person’s surviving is a necessary precondition for a meaningful reference to health. In this section, I briefly point to a consequence for a causal interpretation of the relationship between education and health.
Health is a time-varying variable, and health at age t + 1 depends on health at age t. Considerations of causally relevant conditions of health therefore require a temporally local (age-specific) approach. The following diagram relates to age t.
I use $X$ (education) and $H_t$ (health) as defined earlier. In addition, the variable $D_t$ represents the survival status at age $t$ (1 = dead, 0 = still alive). Of course, a reference to relationships between education and health at age $t$ presupposes survival at least until this age ($D_t = 0$).
However, the expectation of $H_{t+1}$ conditional on $D_{t+1} = 1$ is deterministically known to be −1, and it is independent of $X$ and $H_t$. In other words, conditional on $D_{t+1} = 1$, neither $X$ nor $H_t$ can be attributed a causal effect on $H_{t+1}$. This has an important consequence: there is no indirect effect of education on health mediated by survival; only the direct effect has a meaningful causal interpretation.
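A short worked decomposition may clarify the point. Assuming, as the text implies, that health is coded $H_{t+1} = -1$ for a person who has died by age $t + 1$, the law of total expectation gives, for a person alive at $t$ (all probabilities and expectations on the right are also conditional on $X = x$, $H_t = h$, $D_t = 0$):

$$
\begin{aligned}
\mathrm{E}(H_{t+1} \mid X = x, H_t = h, D_t = 0)
  &= \Pr(D_{t+1} = 0)\,\mathrm{E}(H_{t+1} \mid D_{t+1} = 0) \\
  &\quad + \Pr(D_{t+1} = 1)\cdot(-1)
\end{aligned}
$$

The second term carries no information about health beyond the death probability itself, so an education gradient in the left-hand side can arise purely from differences in $\Pr(D_{t+1} = 1)$. Only the survivor-conditional expectation in the first term describes how $X$ and $H_t$ relate to health at $t + 1$.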
Growth Curve Models
Without explicitly assuming a distribution for the residuals, the model can be estimated with ordinary least squares (OLS). The resulting growth curves are then parametric models of mean health trajectories. Fig. 4 illustrates this with the growth curve obtained from OLS estimation of Eq. (6), with $x = 0$, for the 30 trajectories in Fig. 1.
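A minimal OLS sketch for this case follows. Eq. (6) is not reproduced here, so the code assumes its simplest form for $x = 0$, a straight line in age, fitted to all person-age observations of survivors (each residual is defined only for a person alive at that age); the function names and seed are hypothetical. Under the Fig. 1 setup the survivors' expected mean health is linear in age with slope −0.025, so the fitted slope should lie near that value rather than near the individual slope −0.05.

```python
import random

# Hypothetical sketch: assumes the simplest special case of the growth curve
# for x = 0, H_t = a + b * t + e, fitted by OLS to every person-age observation
# of a survivor. Toy data follow the Fig. 1 setup.
T0, T_MAX = 30, 110


def survivor_observations(alphas, slope=-0.05):
    """Pooled (age, health) pairs for persons alive at each age (h >= 0)."""
    obs = []
    for t in range(T0, T_MAX + 1):
        for a in alphas:
            h = a + slope * (t - T0)
            if h >= 0:
                obs.append((t, h))
    return obs


def ols_line(obs):
    """Closed-form simple OLS of health on age: returns (intercept, slope)."""
    n = len(obs)
    t_bar = sum(t for t, _ in obs) / n
    h_bar = sum(h for _, h in obs) / n
    sxy = sum((t - t_bar) * (h - h_bar) for t, h in obs)
    sxx = sum((t - t_bar) ** 2 for t, _ in obs)
    b = sxy / sxx
    return h_bar - b * t_bar, b


if __name__ == "__main__":
    rng = random.Random(1)
    alphas = [rng.uniform(0.0, 4.0) for _ in range(30)]
    a_hat, b_hat = ols_line(survivor_observations(alphas))
    print(f"intercept {a_hat:.3f}, slope {b_hat:.4f}")
```

Because each age contributes only its survivors, the fitted line is a parametric summary of the mean health trajectory, not of any single person's trajectory.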
This structural core of the model is identical with Eq. (6) and entails a single expected health trajectory for all persons with the same educational level.
To estimate expected health trajectories with a hierarchical growth curve model, I again consider the 30 trajectories from Fig. 1. Because all individual trajectories have the same time-constant slope, it suffices to consider the model $H_t^* = \alpha + t\beta + \nu_\alpha + \varepsilon$. The solid line in Fig. 5 shows the estimated growth curve ($\hat{\alpha} = 3.364$, $\hat{\beta} = -0.05$). This curve is obviously neither a possible individual trajectory nor some mean of the individual trajectories. It might be interpreted as a fictitious reference that allows one to think of the individual trajectories as random deviations (as defined by the stochastic part of the model).⁶
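The following sketch does not fit the hierarchical model itself, which would require a mixed-model routine. As a plainly named stand-in, it uses the within-person (fixed-effects) estimator, which in this noise-free setup with a common slope pins the slope at exactly −0.05, consistent with the reported $\hat{\beta}$. For contrast it also computes the pooled survivor regression, which is flatter because persons with higher intercepts survive longer. The function names and seed are hypothetical.

```python
import random

# Hypothetical sketch (not the hierarchical model): contrasts the within-person
# slope, which follows individual trajectories, with the pooled survivor slope,
# which follows the mean trajectory. Toy data follow the Fig. 1 setup.
T0, T_MAX, SLOPE = 30, 110, -0.05


def person_observations(alphas):
    """Per-person lists of (age, health) while alive (h >= 0)."""
    panel = []
    for a in alphas:
        obs = [(t, a + SLOPE * (t - T0)) for t in range(T0, T_MAX + 1)]
        panel.append([(t, h) for t, h in obs if h >= 0])
    return panel


def within_slope(panel):
    """Fixed-effects slope: person-demeaned health on person-demeaned age."""
    sxy = sxx = 0.0
    for obs in panel:
        if len(obs) < 2:
            continue  # a single observation carries no within-person information
        t_bar = sum(t for t, _ in obs) / len(obs)
        h_bar = sum(h for _, h in obs) / len(obs)
        sxy += sum((t - t_bar) * (h - h_bar) for t, h in obs)
        sxx += sum((t - t_bar) ** 2 for t, _ in obs)
    return sxy / sxx


def pooled_slope(panel):
    """Pooled OLS slope over all survivor observations (ignores who is who)."""
    obs = [o for person in panel for o in person]
    t_bar = sum(t for t, _ in obs) / len(obs)
    h_bar = sum(h for _, h in obs) / len(obs)
    sxy = sum((t - t_bar) * (h - h_bar) for t, h in obs)
    sxx = sum((t - t_bar) ** 2 for t, _ in obs)
    return sxy / sxx


if __name__ == "__main__":
    rng = random.Random(1)
    panel = person_observations([rng.uniform(0.0, 4.0) for _ in range(30)])
    print("within:", round(within_slope(panel), 4))
    print("pooled:", round(pooled_slope(panel), 4))
```

The within-person slope describes how each trajectory changes while it exists; extending one such line over the whole age range, as the hierarchical curve does, treats every trajectory as if it were defined for the common temporal domain.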
In this article, I argue that a person's surviving is a necessary precondition for a meaningful reference to her health. Statements about the dependence of health on age, education, and/or other covariates must be understood as being conditional on surviving. Selective mortality should therefore not be considered a source of bias that could be removed by hypothetically dismissing the conditioning on survival.⁷
Given this understanding, mean health trajectories that are defined conditional on surviving provide meaningful descriptions of the health of the surviving members of a cohort. However, age-dependent changes of a mean health trajectory must be distinguished from mean values of health changes of individual persons.
Selective mortality also has consequences for the understanding of growth curve models. Simple growth curve models (in which residuals have a temporally local definition) can be understood as tools for investigating how mean health trajectories depend on education and/or other covariates. Hierarchical growth curve models, in contrast, implicitly assume that all individual health trajectories have an identical temporal extension. Thus, growth curves estimated with these models do not represent the actually observed health trajectories. Because they do not condition on surviving, these models also misrepresent age-dependent changes of health.
Additional problems occur when cohorts are based on a broad range of birth years; for some discussion, see Lauderdale (2001).
This model, of course, is extremely simplified. Real individual health trajectories show a wide variety of different, generally nonlinear, and often nonmonotonic forms.
The plot is inspired by aging-vector graphs as used by Kim and Durden (2007).
For discussion of this question, see also Kurland et al. (2009). They treat the hierarchical growth curve model as an unconditional model (with regard to surviving), which requires values of the dependent variable for deceased persons as well. This model is contrasted with a partly conditional model, which relates to the surviving members of a cohort and is basically equal to Eq. (6).
Of course, it is possible for omitted variables to distort an assessment of the relationship between education and health: for example, if an omitted variable affects both health and mortality such that, conditional on surviving, its correlation with education changes. However, this problem cannot be avoided by hypothetically dismissing the conditioning on survival.