For more than a century, researchers from a wide range of disciplines have sought to estimate the unique contributions of age, period, and cohort (APC) effects on a variety of outcomes. A key obstacle to these efforts is the linear dependence among the three time scales. Various methods have been proposed to address this issue, but they have suffered from either ad hoc assumptions or extreme sensitivity to small differences in model specification. After briefly reviewing past work, we outline a new approach for identifying temporal effects in population-level data. Fundamental to our framework is the recognition that it is only the slopes of an APC model that are unidentified, not the nonlinearities or particular combinations of the linear effects. One can thus use constraints implied by the data along with explicit theoretical claims to bound one or more of the APC effects. Bounds on these parameters may be nearly as informative as point estimates, even with relatively weak assumptions. To demonstrate the usefulness of our approach, we examine temporal effects in prostate cancer incidence and homicide rates. We conclude with a discussion of guidelines for further research on APC effects.
Researchers in a wide range of fields have long sought to understand social and cultural change by identifying the separate contributions of age, period, and cohort (APC) effects1 on various outcomes.2 The core idea is that any temporal change can be attributed to three kinds of processes (Glenn 2005; Yang and Land 2013): (1) changes over the life course of individuals, or age effects; (2) changes due to events in the year the outcome has been observed, or period effects; and (3) changes arising from the replacement of older cohorts of individuals with younger ones with different characteristics, or cohort effects. However, in what has been called the APC identification problem (Mason and Fienberg 1985), the slopes3 of an APC model cannot be uniquely estimated because of the linear dependency among the age, period, and cohort variables. Researchers have used a variety of techniques in their attempts to overcome the APC identification problem, with recent attempts based on constraining the temporal effects through hierarchical (or multilevel) models or variations of the Moore-Penrose generalized inverse (Fu 2000, 2016; Fu and Land 2015; Fu et al. 2011; Knight and Fu 2000; Land et al. 2016; Yang 2008; Yang and Land 2006, 2013; Yang et al. 2004).
In this article, we propose an alternative approach based on placing bounds on one or more of the temporal effects from an APC model. Our approach has important similarities to Charles Manski’s work on partial identification (1990, 1993, 2003). In contrast to problems of statistical inference, which involve understanding how sampling variability can affect conclusions based on samples of limited size, problems of identification entail understanding what conclusions can be drawn even with a sample of infinite size. The lack of a unique solution for the linear APC effects is a classic identification problem because it cannot be resolved by gathering larger samples (Manski 2003). Our approach begins by examining what the data alone, with as few constraints as possible, can tell us. Then, given that full identification of the parameters from an APC model is not possible, we proceed with the goal of partially identifying the APC effects using explicit theoretical considerations that are based on the expected size, direction, or overall shape of one or more of the temporal effects.
The rest of this article is organized as follows. First, we offer a brief history of APC analysis, summarizing recent approaches to the APC identification problem in sociology, epidemiology, and demography. Second, we discuss in detail the formal model of APC analysis, showing how the identification problem is restricted to the linear components. Third, we demonstrate how, despite the nonidentifiability of the individual slopes, the data provide important information that allows us to derive formal bounds on the linear effects of APC models. Fourth, based on a geometric derivation of the identification problem, we outline a novel graphical tool that summarizes what can be known about the linear effects from an APC model. Fifth, we discuss various strategies for bounding APC effects by specifying the size and direction of one or more slopes, fixing only the sign of one or more slopes, or applying a shape constraint using data on the nonlinear components. Sixth, we provide two examples of our approach, examining temporal effects in prostate cancer incidence and homicide rates. Finally, we conclude with a summary of our framework and an outline for further research on APC effects.
The literature on APC effects has interdisciplinary origins, with seminal works arising in epidemiology and medicine in the 1930s (e.g., Frost 1939/1995) and later in sociology and demography in the 1950s and 1960s (Mannheim 1952; Ryder 1965).4 Since these early works, a dizzying array of methods have been proposed to address the APC identification problem (for reviews of the literature, see O’Brien 2015a; Yang and Land 2013). In sociology and demography, these proposed solutions have appeared in two distinct waves of articles and books. The first set of solutions to the identification problem, developed in the 1970s and 1980s (see Mason and Fienberg 1985), consists of omitting one of the linear components of an APC model; using a proxy variable in place of age, period, or cohort; or constraining some set of the parameters to be equal (Yang and Land 2013). Among the most widely used of these techniques is the equality constraints model, in which the researcher specifies each age, period, and cohort group as a dummy variable in a regression model but collapses two groups into a single group (Fienberg and Mason 1979; Mason et al. 1973; Mason and Fienberg 1985). As some researchers have noted (Glenn 1976; Palmore 1978; Rodgers 1982a, b), models using these kinds of equality constraints are problematic in that they are very sensitive to minor differences in model specification and impose implicit assumptions about the size and direction of the underlying age, period, and cohort slopes (O’Brien 2015a).5 Moreover, despite the caution by Fienberg and colleagues that equality constraints should be based on overt theoretical assumptions (Mason and Fienberg 1985; Smith et al. 1982), in practice, researchers have used such constraints in arbitrary and often atheoretical ways.6 Furthermore, researchers may be misled by failing to recognize that models with differing equality constraints have identical fit statistics (O’Brien 2015a; Yang and Land 2013).
Over the past two decades, a second set of models for disentangling the unique contributions of age, period, and cohort effects has emerged in sociology and demography (Fu 2000; Fu et al. 2011; Yang and Land 2006). These approaches have led to a proliferation of prominent studies on various topics, including verbal ability (Yang and Land 2006), infant mortality (Powers 2013), heart disease (Lee and Park 2012), obesity (Reither et al. 2009), and perceived happiness (Yang 2008). To achieve identification, these models generally use either shrinkage (or regularization) of the parameters or some variant of the Moore-Penrose generalized inverse. The two most widely used techniques are the intrinsic estimator (IE) and the hierarchical age-period-cohort (HAPC), but similar results can be obtained from ridge regression, partial least squares regression, and principal components regression (O’Brien 2011a, 2015b). These techniques share many of the same shortcomings as earlier approaches, as demonstrated by a spate of recent studies (e.g., see Bell and Jones 2014a, 2014b, 2015a, 2015b; Fienberg et al. 2015; Luo et al. 2016; Pelzer et al. 2014).7 Like earlier techniques, these methods require researchers to impose implicit constraints on the model parameters, relying on potentially strong assumptions that may fail to consistently estimate the true underlying age, period, and cohort parameters.8 Moreover, recent studies have shown that these new methods can be very sensitive to model parameterization in ways that are not likely to be obvious to applied researchers (Bell and Jones 2015b; Luo 2013; Luo et al. 2016; Pelzer et al. 2014).
In short, some progress has been made since the early works on APC analysis, but problems remain. Either the assumptions used to identify an APC model are ad hoc or the estimates are highly sensitive to the exact model specification (Bell and Jones 2015b; Luo et al. 2016; O’Brien 2015a; Rodgers 1982a). Furthermore, many researchers appear to be unaware of the identification problem altogether, dropping one or more of the temporal variables without an explicit proxy (O’Brien 2015a). As we show in the next section, this entails an unnecessarily strong assumption about one or more of the APC effects, given that the nonlinear components are fully identifiable.
Modeling Temporal Effects
The parameterization shown in Eq. (1) is very flexible, allowing the age, period, and cohort effects to be highly nonlinear because there is one parameter for each age, period, and cohort category (Mason et al. 1973). In an age-period array, each cell is represented by a unique set of parameters (O’Brien 2011a). However, the C-APC model suffers from a fundamental identification problem due to linear dependence in the columns of the design matrix (O’Brien 2015a; Yang and Land 2013). As a result, the model in Eq. (1) cannot be estimated using conventional statistical techniques.
The C-APC and L-APC are equivalent representations of temporal data grouped by age, period, and cohort.11 As with the C-APC, each cell in an age-period array is modeled by a unique combination of parameters under the usual zero-sum constraints. For example, the ith age effect in the C-APC is represented in the L-APC by the overall linear age effect along with a unique parameter for the ith age nonlinearity: αi = (i − i*)α + α̃i, where i* is the midpoint of the age index. In other words, each age effect αi is decomposed into the sum of a common parameter α representing the age slope for the entire array, with a value shifting across rows (or age categories) as a function of the age index i, and a unique parameter α̃i, which is a nonlinearity specific to each row (or age category). We can similarly decompose each of the period and cohort effects into linear and nonlinear components.
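As a minimal sketch of this decomposition, the snippet below splits a set of zero-sum category effects into a single slope plus category-specific nonlinearities. It rests on two assumptions that are not stated in the passage above: the category index is centered at its midpoint, and the slope is taken as the least-squares projection of the effects on that centered index, so that the nonlinearities are orthogonal to the linear trend. The function name `split_linear_nonlinear` and the example values are hypothetical.

```python
def split_linear_nonlinear(effects):
    """Split zero-sum category effects into a slope and nonlinear deviations.

    Assumption: the linear component is the least-squares projection of the
    effects on the midpoint-centered category index.
    """
    n = len(effects)
    midpoint = (n + 1) / 2
    # Centered index (i - midpoint) for categories i = 1..n
    index = [i - midpoint for i in range(1, n + 1)]
    slope = sum(x * e for x, e in zip(index, effects)) / sum(x * x for x in index)
    # Whatever the linear trend does not explain is the nonlinearity
    nonlinear = [e - slope * x for x, e in zip(index, effects)]
    return slope, nonlinear


# Hypothetical age effects for four age groups (they sum to zero)
age_effects = [-2.0, -2.0, 0.0, 4.0]
slope, nonlinear = split_linear_nonlinear(age_effects)
print(slope)      # 2.0
print(nonlinear)  # [1.0, -1.0, -1.0, 1.0]
```

Under these assumptions the nonlinearities sum to zero and are uncorrelated with the index, which is what allows them to be estimated separately from the slopes.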
Despite widespread use of the APC model in the social and biological sciences, there has been little formal discussion in the literature about how to interpret the coefficients from an APC model. Some researchers appear to interpret these variables as producing causal effects (e.g., Mason and Fienberg 1985; Mason et al. 1973; Rodgers 1990), while others seem to view the variables as providing descriptive comparisons (e.g., Firebaugh 1989; Ryder 1965). Notwithstanding, when researchers talk about age, period, and cohort as having an “effect,” they are generally referring to a set of underlying causal processes that are indexed by the three APC variables (e.g., see Mason and Smith 1985). Putatively, these processes are potentially subject to manipulation and thus have well-defined causal effects in the sense of having relatively unambiguous potential outcomes or counterfactuals (Morgan and Winship 2015).14
The coefficients of an APC model can be interpreted in two main ways. First, the coefficients can be understood as representing the differences in the group means for each of the three APC variables conditional on the other two. However, because of the linear dependence, we cannot directly observe the three conditional mean differences; rather, we can observe only α + π = θ1 and π + γ = θ2. Second, the coefficients can be viewed as representing the causal processes that generate observed mean differences across the cross-classified age, period, and cohort categories. These causal processes are not directly observed but rather are indexed by the APC variables. We provide a more extended discussion of how to interpret the coefficients in the online appendix.
APC Bounding Formulas
Bounds on the APC effects can be represented algebraically or graphically. Readers who prefer a graphical representation may skip to the next section (Graphical Tools for APC Analysis). In this section, we first show how, although the true APC slopes can lie anywhere on the real number line, particular combinations of the slopes are identifiable from the data. From these interdependencies, we derive a set of formal bounds on the size and sign of the slopes. If one is willing to make an informed statement about the size and sign of at least one of the slopes from an APC model, then the data may be quite informative about the range of the remaining parameters.
Interdependencies Among the Slopes
Bounding the Slopes
For example, suppose we estimate an APC model and find that θ1 = 3 and θ2 = –2, such that θ2 – θ1 = –5 and θ1 – θ2 = 5. If we have reason to assume that the age slope has a lower bound of αmin = 0 and an upper bound of αmax = 2, then we can use the θ1 and θ2 to bound the period and cohort slopes as well. Specifically, by setting αmax = 2, we know that the lower bound for the period slope is θ1 – αmax = α + π – 2 = 3 – 2 = 1. Likewise, by setting αmin = 0, we know that the upper bound for the period slope must be θ1 – αmin = α + π – αmin = 3 – 0 = 3. Similarly, we can find the bounds for the cohort slopes. By setting αmin = 0 and αmax = 2, we know that the lower bound of the cohort slope must be θ2 – θ1 + αmin = γ – α + αmin = –5 + 0 = –5, and the upper bound must be θ2 – θ1 + αmax = γ – α + 2 = –5 + 2 = –3. Thus, by restricting the age slope to lie between 0 and 2, we can conclude from the bounding formulas that the period slope must range from 1 to 3, and the cohort slope must range from –5 to –3.
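The arithmetic in this example can be sketched in a few lines. The function below is an illustration only, using the relations π = θ1 − α and γ = (θ2 − θ1) + α from the text; the name `bounds_from_age` and its argument names are hypothetical.

```python
def bounds_from_age(theta1, theta2, age_min, age_max):
    """Bound the period and cohort slopes given a range for the age slope."""
    # period = theta1 - age, so the largest age slope gives the smallest period slope
    period = (theta1 - age_max, theta1 - age_min)
    # cohort = (theta2 - theta1) + age, so cohort moves in the same direction as age
    cohort = (theta2 - theta1 + age_min, theta2 - theta1 + age_max)
    return {"age": (age_min, age_max), "period": period, "cohort": cohort}


result = bounds_from_age(theta1=3, theta2=-2, age_min=0, age_max=2)
print(result)
# {'age': (0, 2), 'period': (1, 3), 'cohort': (-5, -3)}
```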
Graphical Tools for APC Analysis
Although data visualization has a long history in APC modeling (Keiding 2011), there are no widely used graphical tools for evaluating and understanding the slopes of APC models. In this section, we show how the bounding formulas in Table 1 can be aided by a visualization of the interdependencies among the slopes, which we derive from a geometric interpretation of the APC identification problem (see O’Brien 2015a).
The Solution Line
In the absence of data, the age, period, and cohort slopes may take on any combination of values in a three-dimensional space. In this space, the equations θ1 = α + π and θ2 = π + γ each define a two-dimensional plane. Again, suppose we have APC data with θ1 = 3 and θ2 = –2. Panel a of Fig. 1 depicts the age-period plane defined by θ1 = 3, while panel b depicts the period-cohort plane defined by θ2 = –2. The intersection of these two planes then defines a line, as illustrated in panels c and d. This line is known as the solution line because any point on the line is a potentially valid set of estimates for the unknown parameters α, π, and γ (O’Brien 2015a).17 That is, all points on the solution line give exactly the same predicted values of the outcome, and the slopes may take on any real number between negative and positive infinity. As a result, without additional assumptions, the data are uninformative as to which combination of parameter values on this line should be preferred.
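The claim that every point on the solution line yields the same predicted values can be checked numerically. The sketch below assumes the usual form of the dependence, cohort = period − age, and considers only the linear part of the model. The helper names and the small grid of age-period cells are hypothetical.

```python
def linear_predictor(age, period, slopes):
    """Linear part of an APC model for one age-period cell."""
    alpha, pi, gamma = slopes
    cohort = period - age  # assumed form of the exact linear dependence
    return alpha * age + pi * period + gamma * cohort


def shift_along_line(slopes, t):
    """Move along the solution line: age and cohort shift by +t, period by -t."""
    alpha, pi, gamma = slopes
    return (alpha + t, pi - t, gamma + t)


base = (0, 3, -5)  # one point with alpha + pi = 3 and pi + gamma = -2
cells = [(a, p) for a in range(1, 4) for p in range(1, 5)]

for t in (-2, 1, 10):
    moved = shift_along_line(base, t)
    same_fit = all(
        linear_predictor(a, p, moved) == linear_predictor(a, p, base)
        for a, p in cells
    )
    # theta1 and theta2 are unchanged, and so is every predicted value
    print(moved, moved[0] + moved[1], moved[1] + moved[2], same_fit)
```

Each shifted triple reproduces θ1 = 3 and θ2 = −2 and fits every cell identically, which is why the data alone cannot choose among them.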
What is important to understand from Fig. 1 is what, precisely, is accomplished when data are applied to an APC model. In our simple case involving only linear components, the data have taken us from a three-dimensional space in which all parameter values are possible to a one-dimensional space in which only certain combinations of estimates lying on a line are consistent with the data. This same extent of reduction holds even if our data have nonlinearities because they are fully identified. As such, the data have gone a long way toward constraining the possible estimates for the age, period, and cohort slopes, restricting an initial set of parameters that could be anywhere in a three-dimensional space to a set of values lying on a line. The data have been quite informative about our parameter values but not as informative as we might like in the sense of providing unique point estimates for the linear effects.
2D-APC Graph of the Solution Line
Because the solution line is a geometric representation of the APC identification problem, we can use panel d of Fig. 1 to evaluate the consequences of setting bounds on one or more of the slopes. Although possible, this is difficult given that the solution line runs through three dimensions. Fortunately, because the solution line is determined by only two quantities, θ1 = α + π and θ2 = π + γ, we can flatten our three-dimensional representation of the solution line to two dimensions without loss of information. One way of doing this is by having the horizontal axis represent the period slope; the left vertical axis, the age slope; and the right vertical axis, the cohort slope. We call this a 2D-APC graph.
In Fig. 2, we depict a 2D-APC graph based on APC data with values of θ1 = 3 and θ2 = −2. The solution line shown in Fig. 2 is identical to that in panel d of Fig. 1. The left vertical axis refers to the values of the age slope; the top and bottom horizontal axes, to period; and the right vertical axis, to cohort. We will refer to each point in the coordinate space in terms of age, period, and cohort so that, for example, the point (1, 3, –5) refers to α = 1, π = 3, and γ = –5. The solution line runs from the upper left to the bottom right of the 2D-APC graph; the three dashed lines indicate when the age, period, and cohort slopes, respectively, are set to 0. The point at which each dashed line intersects with the solution line gives the estimates for models that set the age, period, or cohort slope to 0. For example, the bottom horizontal dashed line, indicating when α = 0, intersects the solution line at (0, 3, –5). If the true slope for age were 0, then we would know that the period slope is 3 and the cohort slope is –5. The three dashed lines intersect at the origin for the age-period dimension and that for the period-cohort dimension.18 In Fig. 2, we denote the age-period origin and the period-cohort origin with hollow circles at (0, 0, –5) and (5, 0, 0), respectively.
Importantly, Fig. 2 provides information on three key features of the data: (1) the slope of the solution line, which is always –1; (2) the direction and scale of the axes; and (3) the values of θ1, θ2, and their difference. For all APC data, features (1) and (2) will be the same: the slope of the solution line relating period to age and cohort will always be –1; accordingly, the age and cohort axes will always run in the same direction, whereas the period axis will run in the opposite direction.19 However, depending on the unknown parameters α, π, and γ, the data will generate different values of θ1, θ2, and their difference. These values in turn affect the bounding formulas as well as the location of the solution line in the 2D-APC graph relative to the age-period origin and period-cohort origin.
To clarify the connection between the θs and the 2D-APC graph in Fig. 2, it is useful to think of the relations among the APC slopes in terms of simple linear equations.20 Given that θ1 = α + π = 3, we can write α = 3 – π, which can be thought of as a linear equation relating the scale for age (α) to the scale for period (π) with an intercept of 3 and a slope of –1. Because the slope is –1, the age and period scales have flipped signs; and because the intercept is 3, age and period are offset by three units. The vertical (and horizontal) distance from the age-period origin (0, 0, −5) to the solution line is given by θ1 = 3, as shown in Fig. 2. Thus, if the age (or period) slope is set to 0, then the value of the period (or age) slope must be θ1 = 3.
Similarly, because θ2 = π + γ = −2, we can write γ = −2 − π. This is a linear equation relating the scale for cohort (γ) to the scale for period (π) with an intercept of –2 and a slope of –1. Again, because the slope is –1, the cohort and period scales have flipped signs; and because the intercept is –2, cohort and period are offset by two units. As shown in Fig. 2, the vertical (and horizontal) distance from the period-cohort origin (5, 0, 0) to the solution line is given by θ2 = −2 because if the cohort (or period) slope is fixed to 0, then the period (or cohort) slope must be θ2 = −2. Finally, the offset between the age and cohort scales is given by θ2 − θ1 = γ − α = −5. For example, when α = 0 (as indicated by the bottom dashed horizontal line in Fig. 2), then we know γ = −5. Because the scales for age and cohort are offset by a known quantity, we can use the vertical axis to represent the slopes for both age and cohort.
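The two linear equations α = θ1 − π and γ = θ2 − π are enough to locate any point of the 2D-APC graph from its period coordinate. The following sketch, with the hypothetical helper `point_on_solution_line`, recovers the three points at which the solution line crosses the zero lines for age, period, and cohort in the running example.

```python
def point_on_solution_line(theta1, theta2, period):
    """Return (age, period, cohort) on the solution line for a given period slope."""
    return (theta1 - period, period, theta2 - period)


theta1, theta2 = 3, -2

# age = 0 forces period = theta1; cohort = 0 forces period = theta2
age_zero = point_on_solution_line(theta1, theta2, theta1)
period_zero = point_on_solution_line(theta1, theta2, 0)
cohort_zero = point_on_solution_line(theta1, theta2, theta2)

print(age_zero)     # (0, 3, -5)
print(period_zero)  # (3, 0, -2)
print(cohort_zero)  # (5, -2, 0)

# The age and cohort scales are offset by theta2 - theta1 at every point
for age, _, cohort in (age_zero, period_zero, cohort_zero):
    print(cohort - age)  # -5 each time
```

Because the cohort coordinate always equals the age coordinate plus θ2 − θ1, a single vertical direction can carry both scales, as the graph does with its left and right axes.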
Bounds With the 2D-APC Graph
The 2D-APC graph is a visual representation of the solution line and, by extension, the APC identification problem. As a result, the bounding formulas in Table 1 have a one-to-one correspondence with any set of bounds placed over a corresponding 2D-APC graph. Continuing with our example in which θ1 = 3 and θ2 = −2, suppose we have solid theoretical reasons to restrict the range of the cohort slope to −4 ≤ γ ≤ −2. Then, using the bounding formulas in Table 1, we know that (θ1 − θ2) − 4 ≤ α ≤ (θ1 − θ2) − 2 and θ2 + 2 ≤ π ≤ θ2 + 4. Because we know from the data that θ1 = 3 and θ2 = −2, we can conclude that if the cohort slope ranges from −4 to −2, then it must also be the case that 1 ≤ α ≤ 3 and 0 ≤ π ≤ 2. Because the APC identification problem can be represented both mathematically and geometrically, we can visualize the bounds in a 2D-APC graph.
Panel a of Fig. 3 shows the bounds for the cohort slope (−4 ≤ γ ≤ −2) in three-dimensional space. The shaded box denotes the region of the parameter space corresponding to a range of –4 to –2. The solution line cuts through this region. Because we have ruled out cohort slopes greater than –2 and less than –4, we have restricted the set of feasible APC slopes to those lying on the line in the feasible region (i.e., to those points lying on the solid line rather than the dashed line in panel a of Fig. 3).
However, the consequences of assuming these particular bounds for the cohort slope are not clear in panel a of Fig. 3 because of the inherent difficulties of interpreting a three-dimensional visualization on a two-dimensional surface. Panel b of Fig. 3 depicts the same information in a 2D-APC graph, which clarifies the nature of our assumptions. Restricting the cohort slope to range from –4 to –2 reduces the parameter space to the shaded rectangle. Because each of the three APC slopes must lie on the solution line, the data inform us that with this assumption, the two extreme points on the line are (1, 2, −4) and (3, 0, −2). Thus, we can conclude that the age slope must range from 1 to 3 and that the period slope must range from 0 to 2. We can arrive at the same conclusion using the bounding formulas in Table 1, but our 2D-APC graph provides a visual analogue that can help researchers understand the implications of setting one restriction rather than another on the slopes for age, period, and cohort. In the next section, using the bounding formulas and our 2D-APC graph, we provide a more detailed overview of various strategies that can be used to produce informative bounds from an APC model.
The bounding formulas and 2D-APC graph provide tools for evaluating the consequences of restricting the range of one or more slopes in an APC model. In practice, a researcher may not have enough information to specify the size and sign of one or more of the slopes. Moreover, we have said little so far about the constraints involving the nonlinearities, which are identified. Other strategies are possible, some of which may entail considerably weaker assumptions. In this section, we discuss two such strategies: (1) restricting only the direction of one or more APC slopes; and (2) constraining the shape of one or more overall effects, which entails incorporating the nonlinear components. We review these strategies in turn.
Constraints on the Sign
In many empirical examples, a relatively weak assumption is to constrain just the direction of one or more slopes. These bounds are necessarily one-sided, with one end of the interval set at 0 and the other at either positive or negative infinity. Remarkably, by constraining the direction of one temporal slope, we can in many cases make conclusions about the direction and magnitude of at least one of the other slopes. The top part of Table 2 shows the simplified form of the bounding formulas when only the sign of a single linear effect is considered.
To illustrate the application of bounds based only on specifying the sign of a slope, consider APC data with values of θ1 = 1 and θ2 = −1. Suppose we have theoretical reasons to claim that the age slope is nonpositive. Using the one-sided bounding formulas in Table 2, we know that θ1 ≤ π < + ∞ and − ∞ < γ ≤ θ2 − θ1. Because θ2 − θ1 = −2, we can conclude that the period slope is 1 ≤ π < + ∞ and the cohort slope is − ∞ < γ ≤ −2. In other words, by positing the direction of the age linear effect, we can conclude that the cohort slope must be negative. Moreover, even though we just constrained the sign of a single slope, we can make conclusions about the magnitude of the cohort slope; specifically, we know that the cohort slope must be less than or equal to –2 if the age slope is nonpositive. Likewise, we can conclude that the period slope must be greater than or equal to 1.
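One-sided bounds of this kind can be sketched by letting one end of the age interval be infinite. The function below is illustrative only: the name `sign_bounds_from_age` and the string labels are hypothetical, and it does not track whether an endpoint is open or closed.

```python
import math


def sign_bounds_from_age(theta1, theta2, age_sign):
    """One-sided bounds implied by fixing only the sign of the age slope."""
    if age_sign == "nonpositive":
        age = (-math.inf, 0)
    elif age_sign == "nonnegative":
        age = (0, math.inf)
    else:
        raise ValueError(f"unknown sign: {age_sign}")
    # period = theta1 - age (inverse relation); cohort = (theta2 - theta1) + age
    period = (theta1 - age[1], theta1 - age[0])
    cohort = (theta2 - theta1 + age[0], theta2 - theta1 + age[1])
    return {"age": age, "period": period, "cohort": cohort}


bounds = sign_bounds_from_age(theta1=1, theta2=-1, age_sign="nonpositive")
print(bounds["period"])  # (1, inf)
print(bounds["cohort"])  # (-inf, -2)
```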
One drawback of setting the sign of a single slope is that the final set of bounds is not finite. We obtain finite bounds for the slopes only when two one-sided intervals operate in opposite directions. Because the age and cohort slopes have a perfectly direct relationship with each other but a perfectly inverse relationship with the period slope, some sets of one-sided intervals will never result in finite bounds. For example, suppose we assume that α and γ are both nonnegative and that we know from our data that θ1 = 1 and θ2 = −1. By assuming the age slope is nonnegative, we can conclude that 0 ≤ α < + ∞, − ∞ < π ≤ 1, and −2 ≤ γ < + ∞. By assuming that the cohort slope is nonnegative, we know that 2 ≤ α < + ∞, − ∞ < π ≤ −1, and 0 ≤ γ < + ∞. Combining these inequalities gives us final bounds of 2 ≤ α < + ∞, − ∞ < π ≤ −1, and 0 ≤ γ < + ∞. Unfortunately, because the one-sided bounds are infinite on the same side (i.e., the bounds are in the same direction), our final bounds are not finite.
In the lower part of Table 2, we show the combinations of one-sided bounds that can result in final bounds that are finite. Bounds based on these assumptions will result in finite bounds or bounds that can be ruled out as inconsistent with the data. Returning to our example in which we have θ1 = 1 and θ2 = −1, suppose we assume that age and period are both nonnegative. The formulas in Table 2 tell us that 0 ≤ α ≤ 1, 0 ≤ π ≤ 1, and −2 ≤ γ ≤ −1. Consequently, we can conclude that if the age and period slopes are both nonnegative, then the age slope must lie between 0 and 1, the period slope must lie between 0 and 1, and the cohort slope must lie between –2 and –1. Conversely, if we assume that the age and period slopes are both nonpositive, then we know that one or both of these assumptions must be incorrect: if both were true, we would obtain θ1 ≤ α ≤ 0, which is impossible because θ1 = 1.
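Combining several sign constraints amounts to intersecting intervals along the solution line. A minimal sketch follows, assuming each constraint is first translated into an interval for the period slope (using π = θ1 − α and π = θ2 − γ) and then intersected. The function names are hypothetical, Table 2 itself is not reproduced, and open versus closed endpoints are not distinguished.

```python
import math


def period_interval(theta1, theta2, slope, lo, hi):
    """Translate a bound on one slope into the implied interval for the period slope."""
    if slope == "period":
        return (lo, hi)
    if slope == "age":      # period = theta1 - age
        return (theta1 - hi, theta1 - lo)
    if slope == "cohort":   # period = theta2 - cohort
        return (theta2 - hi, theta2 - lo)
    raise ValueError(f"unknown slope: {slope}")


def combine(theta1, theta2, constraints):
    """Intersect constraints; return bounds for all slopes, or None if they conflict."""
    lo, hi = -math.inf, math.inf
    for slope, s_lo, s_hi in constraints:
        p_lo, p_hi = period_interval(theta1, theta2, slope, s_lo, s_hi)
        lo, hi = max(lo, p_lo), min(hi, p_hi)
    if lo > hi:
        return None  # no point on the solution line satisfies every constraint
    return {
        "age": (theta1 - hi, theta1 - lo),
        "period": (lo, hi),
        "cohort": (theta2 - hi, theta2 - lo),
    }


nonneg = (0, math.inf)
nonpos = (-math.inf, 0)

finite = combine(1, -1, [("age", *nonneg), ("period", *nonneg)])
print(finite)    # {'age': (0, 1), 'period': (0, 1), 'cohort': (-2, -1)}

conflict = combine(1, -1, [("age", *nonpos), ("period", *nonpos)])
print(conflict)  # None

same_side = combine(1, -1, [("age", *nonneg), ("cohort", *nonneg)])
print(same_side["period"])  # (-inf, -1): still unbounded on one side
```

Working in terms of the period slope is a design choice of this sketch; any of the three slopes could serve as the common coordinate, because each is a linear function of the others along the solution line.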
The key to our approach entails using the bounding formulas and a 2D-APC graph to analyze the consequences of assumptions regarding the size and sign of the slopes from an APC model. However, so far we have considered only the case in which age, period, and cohort consist solely of linear effects. This is a worst-case scenario because in reality temporal effects are composed of both linear and nonlinear components, the latter of which are fully identified and will help us narrow the bounds even further.
So far, we have considered bounding only the linear effects. However, as shown in Eq. (2), a more general APC model partitions the overall temporal effects into linear and nonlinear components. Because the nonlinearities are identified, these can help us further reduce the overall bounds through the specification of shape constraints. We introduce some additional notation to clarify how we can incorporate nonlinearities into our bounding analyses. Let αm.i. denote a monotonically increasing (m.i.) age slope, and let αm.d. denote a monotonically decreasing (m.d.) age slope. If we have an APC model with only linear components, the situation is simple: a monotonically increasing age effect is equivalent to a positive slope, and a monotonically decreasing effect is equivalent to a negative slope. However, with nonlinear components, we can distinguish between shape (e.g., monotone decreasing or increasing) and direction (e.g., nonpositive or nonnegative), resulting in a considerably expanded set of possible bounding strategies.21
One possibility is to specify that the overall age effect is monotonically increasing over some range of values corresponding to an age slope of αm.i. ≤ α < + ∞. For example, it is reasonable to assume that the probability of developing certain kinds of cancer will either increase or stay the same as one ages (e.g., Ames et al. 1993; Harman 1956; Pryor 1982). A second possibility is to specify an overall effect that is decreasing monotonically, with an age slope of − ∞ < α ≤ αm.d.. For instance, crime rates among men are widely believed to decrease or stay the same after young adulthood (e.g., Farrington 1986; Hirschi and Gottfredson 1983). In some cases, it might be reasonable to assume that an effect is monotonically increasing over one range but monotonically decreasing over another. For example, crime rates among men might be assumed to increase monotonically in adolescence but decrease monotonically in later adulthood, corresponding to what has been called the “age-crime curve” (Loeber and Farrington 2014).
Nonmonotonicity constraints are also possible, as in the assumption that an age effect is not monotonically increasing (− ∞ < α < αm.i.) or not monotonically decreasing (αm.d. < α < + ∞). For example, one might assume that the probability of developing a particular cancer increases with age but allow for declines so long as they are not monotonically decreasing across all age groups. Furthermore, in some instances, it might make sense to assume a temporal effect is neither monotonically increasing nor monotonically decreasing, such that αm.d. < α < αm.i.. For example, when examining unemployment rates, it might make sense to rule out any monotonic effect across periods because it may violate theoretical assumptions regarding a business cycle (e.g., Burns and Mitchell 1946).
We can determine the implications of a monotonically increasing age effect by using our 2D-APC graph. The results for age are shown in panel b of Fig. 5. By assuming the age effect increases monotonically, we can conclude that the period slope must be negative and that the cohort slope must be positive. Although we have determined the direction of the period and cohort slopes, we may want finite bounds for our slopes. One possibility is to assume that the overall period effect is not monotonically decreasing. This is a weaker assumption than assuming the effects are monotonically increasing because we allow downward deviations as long as they are not always decreasing or staying the same across adjacent period groups.
However, in our case, we would like to assume that the overall period effect is not monotonically decreasing. Thus, the solid line in panel c of Fig. 4 is the most downward effect we are willing to consider; that is, we are assuming that the period slope is greater than –1. The consequences of this assumption are shown in panel c of Fig. 5, which restricts the age slope to be less than 3 and the cohort slope to be less than 2. Reflecting the weak assumptions of our constraint, we cannot make any decisive conclusions about the direction of any of the temporal slopes.
Notwithstanding, we can combine our assumptions to obtain finite bounds for all three slopes, as shown in panel d of Fig. 5. By assuming the overall age effect is monotonically increasing and the period effect is not monotonically decreasing, we can conclude that the age slope is between 2 and 3, the period slope is between –1 and 0, and the cohort slope is between 1 and 2.
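The way two one-sided constraints combine into finite bounds can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `propagate_bounds` is hypothetical, the intervals are treated as closed, and the values θ1 = 2 and θ2 = 1 are inferred from the bounds quoted above rather than stated at this point in the text.

```python
import math

def propagate_bounds(theta1, theta2,
                     age=(-math.inf, math.inf),
                     period=(-math.inf, math.inf),
                     cohort=(-math.inf, math.inf)):
    """Intersect interval constraints on the solution line.

    The line is defined by alpha + pi = theta1 and pi + gamma = theta2,
    so every constraint can be rewritten as an interval for pi.
    """
    # alpha = theta1 - pi, so alpha in [lo, hi] means pi in [theta1 - hi, theta1 - lo].
    pi_from_age = (theta1 - age[1], theta1 - age[0])
    # gamma = theta2 - pi, so gamma in [lo, hi] means pi in [theta2 - hi, theta2 - lo].
    pi_from_cohort = (theta2 - cohort[1], theta2 - cohort[0])

    pi_lo = max(period[0], pi_from_age[0], pi_from_cohort[0])
    pi_hi = min(period[1], pi_from_age[1], pi_from_cohort[1])
    if pi_lo > pi_hi:
        raise ValueError("constraints are mutually inconsistent")

    return {
        "age": (theta1 - pi_hi, theta1 - pi_lo),
        "period": (pi_lo, pi_hi),
        "cohort": (theta2 - pi_hi, theta2 - pi_lo),
    }

# Assumed values for the stylized example (inferred, not quoted).
THETA1, THETA2 = 2.0, 1.0

# Age slope at least 2 (monotonically increasing age effect) and
# period slope at least -1 (period effect not monotonically decreasing).
bounds = propagate_bounds(THETA1, THETA2,
                          age=(2.0, math.inf),
                          period=(-1.0, math.inf))
print(bounds)
```

Under these assumed inputs the function returns the three intervals reported in the text. Dropping the `period` argument leaves the period slope bounded only from above and the cohort slope only from below, which mirrors the one-sided result of the age constraint alone.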
The bounding strategies outlined here are flexible and can be combined in a multitude of ways. In the worst-case scenario in which we have no nonlinearities, we can still obtain finite bounds under certain conditions by specifying only the direction of two or more slopes. With the more realistic case of incorporating nonlinearities, we can include assumptions about the shape, size, and direction of the overall effects for specified ranges of each variable, greatly expanding the variety of bounding strategies available.
To demonstrate the applicability of our approach, we provide two empirical analyses. In the first example, we bound the temporal effects of the incidence of prostate cancer; in the second, we examine temporal effects in homicide rates. Both examples have received considerable attention in the APC literature (e.g., see Holford 1983; O’Brien 2015a).
A large body of research shows that the risk of prostate cancer increases with age and that the disease is diagnosed in very few men aged 50 years or younger (Holford 1983; James and Segal 1982). After this age, incidence rates increase greatly. The reason for this increase is not clear, but it is generally thought to be due to an accumulation of exposure risks combined with a loss of effectiveness in cellular repair mechanisms. Untangling temporal effects in prostate cancer is crucial for understanding likely causes, particularly societal or environmental, that may be increasing or decreasing the incidence. In light of this background knowledge, a reasonable assumption for APC modeling is that the risk of prostate cancer among men increases monotonically over at least some range of ages. To examine prostate cancer, we collected data from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute. The SEER program is the primary source of over-time population-based cancer incidence in the United States and is considered the “gold standard” for cancer registration data worldwide (Siegel et al. 2017). We used data on white men and black men with prostate cancer who were diagnosed from 1973 to 2013 among residents of nine geographic areas: Connecticut, Hawaii, Iowa, New Mexico, Utah, Atlanta, Detroit, Seattle–Puget Sound, and San Francisco–Oakland. Given the rarity of prostate cancer among young people, and following previous research, we restricted our analysis to men aged 40–44 to 85–89. We grouped the years into periods from 1970–1974 to 2010–2014.22 The outcome variable is the number of malignant prostate cancer diagnoses per 100,000 men for each age-period cell. We logged the outcome, given that it is highly skewed.
To identify what can be known from the data without prior constraints, we fit a classical linear regression model predicting the logged prostate cancer rate using age, period, and cohort as inputs with a design matrix that partitions the linear and nonlinear components (cf. Eq. (2)).23 From this model, we estimate that θ1 = α + π = 7.344 and θ2 = π + γ = 1.473.24 As discussed in the previous sections, the values of θ1 and θ2 define a solution line, which is visualized in Fig. 6, panel a. The nonlinearities are visualized in panels a–c of Fig. 7, with the horizontal dashed lines denoting the overall (or grand) mean in the data.25 The nonlinearities can be difficult to interpret because they are deviations from unknown linear effects. The easiest way to interpret these deviations is by recognizing that these are the temporal effects we would observe in the data if all the slopes were 0.26
At this point, we have exhausted what can be known from the data and we must use additional constraints to split the estimates of θ1 and θ2 into three separate age, period, and cohort slopes. An initial theoretical assumption might be that prostate cancer incidence increases monotonically from ages 40 to 85, which results in a minimum age slope of 8.110. This putatively weak constraint restricts the parameter space of the linear effects to the shaded region shown in Fig. 6, panel b. Under this assumption, we can conclude that the cohort linear effect is positive and that the period linear effect is negative. Importantly, this initial bounding analysis informs us that more recent birth cohorts are generally at greater risk of being diagnosed with prostate cancer than earlier cohorts.
To obtain finite bounds on the solution line, we need to apply an additional constraint. One plausible assumption is that the period slope is not monotonically decreasing because screening tests for prostate cancer have improved over calendar time, thereby enabling earlier treatment. Most notably in the United States, the prostate-specific antigen (PSA) blood test introduced in the 1980s may have increased the apparent (but not actual) incidence of prostate cancer because of increased detection. The assumption of a period slope that is not monotonically decreasing implies a period slope greater than or equal to –4.456, with corresponding bounds visualized in Fig. 6, panel c. Combining the assumption of a monotonically increasing age effect from 40 to 85 with a period slope that is not monotonically decreasing, we obtain the bounds shown in Fig. 6, panel d. With these two assumptions, we have restricted the temporal slopes to 8.110 ≤ α ≤ 11.800, –4.456 ≤ π ≤ –0.766, and 2.239 ≤ γ ≤ 5.929.
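The arithmetic behind these intervals can be written out explicitly. The derivation below is a sketch that uses only the identified quantities θ1 = 7.344 and θ2 = 1.473 together with the two lower bounds that appear in the reported intervals (α ≥ 8.110 and π ≥ –4.456). Each lower bound on one slope becomes an upper or lower bound on the other two through the identities α + π = θ1 and π + γ = θ2.

```latex
\begin{align*}
\pi &= \theta_1 - \alpha, \qquad \gamma = \theta_2 - \pi,
      \qquad \theta_1 = 7.344,\ \theta_2 = 1.473 \\[4pt]
\alpha \ge 8.110
  &\;\Rightarrow\; \pi \le 7.344 - 8.110 = -0.766
   \;\Rightarrow\; \gamma \ge 1.473 - (-0.766) = 2.239 \\[4pt]
\pi \ge -4.456
  &\;\Rightarrow\; \alpha \le 7.344 - (-4.456) = 11.800
   \;\Rightarrow\; \gamma \le 1.473 - (-4.456) = 5.929
\end{align*}
```

Because the three slopes move one-for-one along the solution line, all three intervals have the same width (3.690 here).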
When we incorporate the constrained slopes with the identifiable nonlinearities, we obtain the overall effects shown in panels d–f of Fig. 7. The shaded areas in these panels indicate the range of estimates consistent with our assumptions; the dashed lines denote the average of the upper and lower bounds for each of the age, period, and cohort slopes. These dashed lines correspond to values of α = 9.955, π = –2.611, and γ = 4.084, respectively. From these results, reflecting basic theories of the causes of cancer over the life course and through calendar time, we can draw several conclusions. First, regarding the period effect, prostate cancer incidence remained relatively flat until the early 1990s, after which it clearly decreased. Second, the cohort effect shows a major increase across birth years, starting with those born in the 1920s. This may reflect the fact that fewer men are dying of preventable diseases and conditions, given that prostate cancer occurs almost exclusively among older age groups. Further analyses, especially research that explicitly incorporates mechanisms for the temporal effects, can help explain these findings as well as assess the plausibility of these assumptions.
However, there are two important caveats regarding these results. First, our APC analysis examines prostate cancer incidence, not prostate cancer mortality, which may lead to different results using the same set of constraints (cf. Braga et al. 2017; Chang et al. 1997; Niclis et al. 2011). For instance, increasing incidence may reflect improved screening techniques, which may, in fact, coincide with an overall decline in prostate cancer mortality due to earlier treatment. Second, our findings are not inconsistent with studies that examine just one or two of the time effects in prostate cancer incidence. For example, Herget et al. (2015) showed that, across recent periods, the incidence of prostate cancer in the United States has decreased. However, because they did not adjust for both age and cohort, the observed differences across periods were confounded with aging and cohort effects. To our knowledge, no recent studies examining prostate cancer incidence have explicitly modeled all three temporal variables.
We now turn to our second empirical example: temporal effects in homicide arrest rates among men. For this analysis, we obtained data on homicide arrest rates collected from 1980 to 2012 by the Federal Bureau of Investigation (FBI) through the Uniform Crime Reporting (UCR) Program. The UCR Program offers the highest-quality historical arrest data in the United States, based on contributions from more than 18,000 agencies. Before analyzing the data, we collapsed the period groups into five-year intervals ranging from 1980–1984 to 2010–2014 and the age groups into five-year intervals ranging from 10–14 to 60–64.
The solution line for the linear components gives estimates of θ1 = α + π = −3.200 and θ2 = π + γ = −1.774.27 Both of these estimates suggest declines in the homicide rate across age and cohort groups, but these linear effects are aliased by the presence of the period slope. However, we can rule out from the data alone particular combinations of the slopes. Specifically, it is impossible that α and π are both positive or that π and γ are both positive. Any valid theory of changes in homicide arrest rates must assume that at least one of the temporal variables has a negative slope. The nonlinearities are shown in panels a–c of Fig. 8.28 As with the previous example, these can be difficult to interpret because they are deviations from an unknown linear effect, but the shape of the age nonlinearities is consistent with an underlying age-crime curve. As visualized in panel a of Fig. 8, if the age slope were 0, then we would indeed observe a sharp increase in the homicide arrest rate followed by a steady decline.
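The claim that some sign combinations are ruled out by the data alone can be checked mechanically. The following is a minimal sketch, assuming strict signs (slopes exactly equal to 0 are ignored) and using a hypothetical helper name, `feasible_sign_patterns`. It enumerates all eight sign patterns and keeps those for which some period slope π satisfies α = θ1 − π and γ = θ2 − π with the required signs.

```python
import math
from itertools import product

def feasible_sign_patterns(theta1, theta2):
    """Return the (age, period, cohort) sign patterns attainable on the solution line."""
    feasible = []
    for s_age, s_period, s_cohort in product("+-", repeat=3):
        # Open interval (lo, hi) of period slopes compatible with the pattern.
        lo, hi = -math.inf, math.inf
        # Period slope pi > 0 or pi < 0.
        if s_period == "+":
            lo = max(lo, 0.0)
        else:
            hi = min(hi, 0.0)
        # Age slope theta1 - pi > 0 is equivalent to pi < theta1.
        if s_age == "+":
            hi = min(hi, theta1)
        else:
            lo = max(lo, theta1)
        # Cohort slope theta2 - pi > 0 is equivalent to pi < theta2.
        if s_cohort == "+":
            hi = min(hi, theta2)
        else:
            lo = max(lo, theta2)
        if lo < hi:  # a nonempty interval means the pattern is attainable
            feasible.append((s_age, s_period, s_cohort))
    return feasible

# Identified combinations reported for the homicide arrest data.
patterns = feasible_sign_patterns(-3.200, -1.774)
for pattern in patterns:
    print(pattern)
```

With the reported estimates, no surviving pattern has age and period both positive or period and cohort both positive, and every surviving pattern contains at least one negative slope, in line with the statements above.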
As discussed previously, violent crime is thought to decrease after young adulthood because of a variety of processes, including declining testosterone, greater economic resources, cognitive and analytical skill development, increased legal costs for criminal behavior, changing social norms, and involvement in long-term relationships (see Ulmer and Steffensmeier 2014). That is, we assume the existence of an age-crime curve, such that the crime rate increases monotonically to age 20–24 and decreases monotonically thereafter. This constraint is grounded in social, biological, and cultural theory, and is only as reliable as the theoretical assumptions on which it is based. If there is an age-crime curve of this form, the age slope lies between −2.885 and 0, inclusive.29 Under this assumption, we can rule out substantial sections of the parameter space. Specifically, we can conclude that the period slope must range from −3.200 to −0.321, and the cohort slope must range from −1.453 to 1.427. The corresponding overall effects are shown in panels d–f of Fig. 8. The shaded regions indicate the upper and lower bounds for each of the temporal effects, and the dashed lines indicate the average of the upper and lower bounds for each of the effects. Specifically, the dashed lines correspond to values of α = −1.440, π = −1.761, and γ = −0.013, respectively. Several important conclusions follow from this initial bounding analysis of homicide arrest rates. As shown in panels e and f of Fig. 8, we can conclude that the overall period effect did not increase from 1980 to 2010 and that homicide rates did not monotonically decrease or increase across cohorts.
So far we have examined bounding only the overall age effect for homicide arrest rates. We can obtain narrower bounds using additional assumptions. The emergence of crack cocaine is widely thought to have increased violent crime rates in the 1980s to early 1990s, including homicides (Fryer et al. 2013). An additional assumption, then, is that homicide arrest rates from the 1980s to mid-1990s did not uniformly decrease. Note that we are not assuming that there is a monotonic increase; rather, we are positing the weaker claim that there is no monotonic decrease, reflecting the assumption that the crime rate increased in some periods but decreased in others. We further weaken this assumption by restricting this constraint to the periods 1980–1984 to 1990–1994. Thus, we are making no assumptions directly about periods after the 1980–1994 epoch. With this assumption, as well as the assumption that the age effect is monotonically increasing up to those aged 20–24, we can again conclude that the slopes are negative for all three temporal variables. Specifically, we know that −2.885 ≤ α ≤ −1.990, −2.151 ≤ π ≤ −0.316, and −1.458 ≤ γ ≤ −0.724. When this information is combined with the nonlinearities (see Fig. 8, panels g, h, and i), we can conclude that age and cohort effects have been the primary sources of temporal variation in homicide arrest rates. The overall decline across cohorts is nearly monotonic, increasing very slightly only among cohorts born in the 1960s to early 1980s. This suggests that, although local variations likely exist, crime rates in the United States have plummeted mainly because of large differences between later and earlier birth cohorts.
Researchers in a variety of fields have sought to understand social change by identifying the effects of age, period, and cohort on a range of different outcomes. A variety of methods have been proposed to address the APC identification problem, most recently the hierarchical age-period-cohort (HAPC) (or multilevel) models and estimators based on the Moore-Penrose generalized inverse. In this article, we outline an alternative framework that entails placing bounds on the temporal effects. Our approach begins with identifying what can be known from the data alone with as few constraints as possible. Because full identification is impossible, we then illustrate how partial identification can be achieved based on constraining the expected size, sign, or overall shape of one or more of the temporal effects.
We make several important contributions in this article. First, following previous work on APC models, we show that although nonlinear components are fully identified, the linear components are not. Second, we demonstrate how the data provide us with identifiable linear combinations of the slopes even though each particular slope is unknown. These linear combinations allow us to derive general bounding formulas on the slopes that in turn form the basis of our analyses. Third, we introduce a novel graphical display, the 2D-APC graph, which can be used to evaluate the consequences of various constraints on the slopes. Finally, we present a variety of new bounding strategies, including not only those based on fixing the size and direction of one or more slopes but also those based on constraining just the sign of one or more slopes or applying a shape constraint to the overall effects. These techniques offer a considerably wider range of identification approaches than commonly used in the literature.
There are, however, two caveats for applied researchers interested in using bounds to partially identify APC effects. First, like all APC methods, the constraints outlined here are not verifiable or falsifiable from the data at hand and can be justified only by appealing to social, biological, or cultural theory or the inclusion of additional data (e.g., see Winship and Harding 2008). To reiterate, the recovery of the true, unknown APC effects is only as reliable as the theoretical assumptions on which the constraints are based. Theories of the underlying temporal processes may be fundamentally flawed, thereby leading to mistaken conclusions about APC effects. There is, in this sense, no ultimate resolution of the APC identification problem. Second, because of our focus on model identification, this article says little about quantifying uncertainty due to sampling variability. Although the data sets APC analysts use typically have large samples, failing to account for sampling variability will nonetheless yield bounds that are too narrow. Especially when using small samples, researchers should use bootstrapping techniques to quantify uncertainty due to random sampling.
Notwithstanding, our approach is superior to existing APC methods in at least four ways. First, our constraints are explicit rather than implicit. As Luo (2013) and O’Brien (2015a) have noted, many currently popular methods for APC analysis, such as the intrinsic estimator (IE) or HAPC model, use constraints that are implicit, resulting in confusion and misuse by methodologists and applied researchers. Second, our method is more general and flexible than other approaches. Rather than specifying just one type of restriction, such as equality constraints, we show how various constraints on the size, shape, or sign of one or more of the parameters can be used to identify the temporal effects. Third, our constraints entail weaker theoretical assumptions. We demonstrate that specifying the sign, size, or shape can yield bounds that are quite narrow yet based on general theoretical assumptions about life course or period effects. Finally, our approach provides a novel visualization of the identification problem that facilitates the comparison of multiple estimators. Because all APC methods must address the identification problem (Fienberg 2013; Mason and Smith 1985), other methods, including the IE and HAPC, produce point estimates that lie within our 2D-APC graphs.
At least two major extensions are possible and warrant further research. First, our framework has a clear Bayesian interpretation. Specifically, rather than zeroing out regions of the parameter space, one could place a prior distribution over one or more of the parameters. For example, one might use a gamma distribution as a prior for the age slope, reflecting the assumption that the slope ranges from 0 to positive infinity with some decreasing probability. To date, Bayesian APC analyses have instead focused primarily on using first- or second-order random walk priors on the APC effects, which can improve forecasts of disease rates (Riebler and Held 2017; Smith and Wakefield 2016; see also Nakamura 1986). Second, one can use our approach to conduct a sensitivity analysis of an APC model that includes mechanisms or proxy variables. Note that a sensitivity analysis and a bounds analysis are closely related (Morgan and Winship 2015); they differ mainly in their starting point. In general, a bounds analysis begins with weak assumptions and then uses stronger assumptions to narrow the bounds (in principle, up to a point estimate). In contrast, a sensitivity analysis begins with strong assumptions (a point estimate) and then weakens those assumptions, yielding a set of upper and lower bounds. Although mechanisms and proxy variables introduce additional information into a traditional APC analysis, thereby allowing for point identification, they are subject to potentially strong assumptions regarding model specification. If a researcher believes that important mechanisms or proxies are absent from the model, a sensitivity analysis can be used to evaluate the robustness of the estimated results.
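The Bayesian extension can be illustrated with a rough sketch that pushes a gamma prior on the age slope through the solution line. This is not a full Bayesian analysis: θ1 and θ2 are treated as known (the prostate cancer estimates from the text), sampling uncertainty is ignored, and the prior's shape and scale values are hypothetical choices made purely for illustration.

```python
import random
import statistics

# Identified combinations from the prostate cancer example in the text.
THETA1 = 7.344  # alpha + pi
THETA2 = 1.473  # pi + gamma

def draw_slopes(n_draws, shape=2.0, scale=5.0, seed=0):
    """Draw age slopes from a gamma prior and map them onto the solution line.

    shape and scale are hypothetical prior settings, not values from the article.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        alpha = rng.gammavariate(shape, scale)  # prior draw on (0, +inf)
        pi = THETA1 - alpha                     # implied period slope
        gamma = THETA2 - pi                     # implied cohort slope
        draws.append((alpha, pi, gamma))
    return draws

draws = draw_slopes(5000)

# Summarize the implied distribution of each slope by its median.
for name, column in zip(("age", "period", "cohort"), zip(*draws)):
    print(name, round(statistics.median(column), 3))
```

Instead of a hard bound such as α ≥ 0, the prior spreads probability over positive age slopes, and the implied distributions for the period and cohort slopes inherit that spread through the two identified combinations.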
Over the past several decades, a great deal of progress has been made on developing new methods for APC analysis. However, much work remains. Given the importance of estimating APC effects for understanding the basic contours of population-level change, it is vital that researchers ground their analyses in overt theoretical considerations and empirically based constraints. Following earlier work by Holford (1985), it is our belief that the bounds analysis presented here:
… avoids the mystery of obtaining a completely different set of parameter estimates each time a different constraint is used. Instead, it points out exactly where the problem is and indicates which inferences can be made without ambiguity and which cannot. By presenting the data in this way, the interested reader can readily modify the results to see the effect of different assumptions, and translate the parameters accordingly. (p. 3)
Following the convention in the APC literature, we use the shorthand of “effects” when referring to age, period, and cohort processes (e.g., Fienberg and Mason 1979; Glenn 1981; Mason et al. 1973; O’Brien 2015a; Yang and Land 2013a). We discuss the issue of interpreting the coefficients from an APC model in the online appendix.
For example, researchers have examined verbal ability (Alwin 1991; Hauser and Huang 1997; Wilson and Gove 1999; Yang and Land 2006), social trust (Clark and Eisenstein 2013; Putnam 1995; Robinson and Jackson 2001; Schwadel and Stout 2012), party identification (Bartels and Jackman 2014; Ghitza and Gelman 2014; Hout and Knoke 1975; Tilley and Evans 2014), religious affiliation (Chaves 1989; Firebaugh and Harley 1991), drug use (Chen et al. 2003; Kerr et al. 2004; O’Malley et al. 1984; Vedøy 2014), obesity (Diouf et al. 2010; Fu and Land 2015; Reither et al. 2009), cancer (Clayton and Schifflers 1987; Liu et al. 2001), and mental health (Lavori et al. 1987; Lewinsohn et al. 1993; Yang 2008).
Researchers have used various terms in the literature to refer to the linear and nonlinear effects of an APC model. In this article, we refer to the linear effects as “slopes” or “linear effects”; conversely, we refer to the nonlinear effects as “nonlinearities” or “deviations.” By “effects” or “overall effects,” we refer to the combination of the linear and nonlinear effects.
However, APC analysis arguably dates back to at least the 1860s, predating the eponymous diagrams of Wilhelm Lexis (see Keiding 2011).
As Rodgers (1982a:785) cautioned, “Although a constraint of the type described by Mason et al. (1973; 1979) seems trivial, in fact it is exquisitely precise and has effects that are multiplied so that even a slight inconsistency between the constraint and reality, or small measurement errors, can have very large effects on estimates.” However, see also the reply by Smith et al. (1982) as well as the rejoinder by Rodgers (1982b).
For example, with reference to the IE, Yang and Land (2013a:119) noted that “the objective of the IE is not to estimate the unidentifiable regression coefficient vector.” That is, the IE finds the point on the solution line closest to the origin in terms of Euclidean distance, but it does not necessarily recover the actual age, period, and cohort effects.
For simplicity of exposition we assume that age and period are aggregated into intervals of equal width. Additional complications arise when the age and period intervals are not equally spaced, because this can generate artifactual cyclical patterns. For approaches to estimating temporal effects when age and period intervals are unequal, see Holford (2006).
Alternatively, one could fix the parameters at one of the levels to 0. By convention, researchers typically fix to 0 the first set of levels (e.g., α_{i=1} = π_{j=1} = γ_{k=1} = 0) or the last set (e.g., α_{i=I} = π_{j=J} = γ_{k=K} = 0), although other sets could be used.
They are equivalent in the sense that as basis vectors they span the same space.
The null vector is unique up to multiplication by a scalar.
A simple linear transformation can be used to convert age_i to i – i* because i – i* = (age_i – age*) / Δage, where age* is the midpoint for all age groups, and Δage is the fixed difference between the midpoints. For example, suppose we have age_1 = 32, age_2 = 37, age_3 = 42, age_4 = 47, and age_5 = 52. The midpoint across all age groups is 42, and the fixed difference between the groups is 5. Thus, we can calculate that age_1 = 32 corresponds to (32 – 42) / 5 = –2, which is equivalent to i – i* = 1 – 3 = –2.
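The transformation in this note can be written as a short function. This is a minimal sketch with a hypothetical name, `centered_index`, and it assumes equally spaced group midpoints, as the note does.

```python
def centered_index(midpoints):
    """Convert equally spaced group midpoints to the centered index i - i*."""
    width = midpoints[1] - midpoints[0]           # fixed difference between midpoints
    center = (midpoints[0] + midpoints[-1]) / 2   # midpoint across all groups
    # Guard against unequal spacing, which this transformation does not handle.
    for left, right in zip(midpoints, midpoints[1:]):
        if right - left != width:
            raise ValueError("midpoints must be equally spaced")
    return [(m - center) / width for m in midpoints]

ages = [32, 37, 42, 47, 52]
print(centered_index(ages))
```

For the five age groups in the note, the first group maps to –2 and the middle group to 0.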
There is considerable disagreement in the social science and statistics literature on the causal status of nonmanipulable variables. As Rubin (1986) and Holland (1986) argued, such variables do not themselves have well-defined causal effects. However, in Pearl’s (2009) framework, these variables may be ascribed a causal status, with corresponding counterfactuals, even though they are not manipulable.
One could also estimate the θs using Y_{ijk} = μ + (period_j)(θ1) + (cohort_k)(θ2 – θ1) + ε_{ijk} or Y_{ijk} = μ + (age_i)(θ1 – θ2) + (period_j)(θ2) + ε_{ijk}.
The estimates when the age slope is constrained to equal 0 are α* = 0, π* = θ1, and γ* = θ2 – θ1, and the corresponding estimates when the cohort slope is fixed to 0 are α* = θ1 – θ2, π* = θ2, and γ* = 0.
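The mapping from θ1 and θ2 to the three slopes under a single equality constraint can be sketched as follows. The function name `slopes_given` is hypothetical, and the period-fixed case is added by the same logic as the two cases in this note. The example values are the prostate cancer estimates reported in the main text.

```python
import math

def slopes_given(theta1, theta2, fixed, value=0.0):
    """Return (alpha, pi, gamma) on the solution line when one slope is fixed.

    theta1 = alpha + pi and theta2 = pi + gamma are the identified combinations.
    """
    if fixed == "age":
        alpha = value
        pi = theta1 - alpha
        gamma = theta2 - pi
    elif fixed == "period":
        pi = value
        alpha = theta1 - pi
        gamma = theta2 - pi
    elif fixed == "cohort":
        gamma = value
        pi = theta2 - gamma
        alpha = theta1 - pi
    else:
        raise ValueError("fixed must be 'age', 'period', or 'cohort'")
    return alpha, pi, gamma

THETA1, THETA2 = 7.344, 1.473  # prostate cancer estimates from the text

for which in ("age", "period", "cohort"):
    print(which, slopes_given(THETA1, THETA2, which))
```

Fixing the age slope to 0 returns π* = θ1 and γ* = θ2 – θ1, and fixing the cohort slope to 0 returns α* = θ1 – θ2 and π* = θ2, matching the expressions in this note. Passing a nonzero `value` moves the solution to another point on the same line.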
Figure 1 is a graphical representation of the APC identification problem. If there were no linear dependency, we would have three planes intersecting at a single point in the parameter space (O’Brien 2015a).
The age-period-cohort origin is (0, 0, 0), but this is not directly visible on the 2D-APC graph unless θ2 − θ1 = 0.
The –1 slope relating period to age and cohort as well as the differing direction of the period axis relative to the age and cohort axes in the 2D-APC graph are reflected in the opposing sign of ν in Eq. (4) for the period slope compared with that for the age and cohort slopes.
Likewise, each plane in Fig. 1 can be thought of as a function of linear equations based on θ1 and θ2. For example, the age-period plane is defined by θ1 = α + π, so it is equivalent to the linear equation π = θ1 − α. Similarly, the period-cohort plane can be thought of as a function of θ2 = π + γ, with π = θ2 − γ.
As a reviewer noted, some of these nonlinearities could be noise. One way to address possible noise is to smooth out the nonlinearities by setting the parameters for the higher-order nonlinearities to 0. Alternatively, one could use natural cubic splines, treating age, period, and cohort as continuous rather than categorical variables (see Heuer 1997).
The raw data are yearly, ranging from 1973 to 2013.
Nearly identical results were obtained with Poisson regression, but for ease of exposition, we present our findings using a classical linear regression with a logged rate outcome.
Estimated values of both θ1 and θ2 are statistically significant at the conventional threshold of .05.
F tests indicate that all three time scales have statistically significant nonlinearities at the conventional threshold of .05.
However, this interpretation is not strictly correct because we can estimate the values of θ1 and θ2. For example, if we assume that the period slope is 0, then the age slope must be 7.344 rather than 0.
Estimated values of both θ1 and θ2 are statistically significant at the conventional threshold of .05.
F tests show that at the conventional threshold of .05, all three time scales exhibit statistically significant nonlinearities.
The 2D-APC graph of the solution line is available in the online appendix.