This article explores an important property of the intrinsic estimator that has received no attention in the literature: the age, period, and cohort estimates of the intrinsic estimator are not unique but vary with the parameterization and reference categories chosen for these variables. We give a formal proof of the non-uniqueness property for effect coding and dummy variable coding. Using data on female mortality in the United States over the years 1960–1999, we show that the variation in the results obtained for different parameterizations and reference categories is substantial and leads to contradictory conclusions. We conclude that the non-uniqueness property is a new argument for not routinely applying the intrinsic estimator.
The December 2013 issue of this journal included contributions addressing the intrinsic estimator (IE). Fienberg (2013), Held and Riebler (2013), Luo (2013), and O’Brien (2013) all seem to agree that the IE is of rather limited value for simultaneously estimating age, period, and cohort effects. Luo (2013), for instance, demonstrated that IE estimates are biased when the true parameters of age, period, and cohort show a linear trend that diverges from the one implied by the IE constraint. She concluded that the IE should not be used. In contrast, in their reply to Luo, the IE developers Yang and Land (2013) remained convinced of the IE’s potential for practical research.
Which conclusion must one draw from these diverging expert viewpoints on the IE? Should researchers still consider using the IE, given that its application is validated by the three-step procedure proposed by Yang and Land, or should they abandon the IE in favor of other models?
In this article, we demonstrate the non-uniqueness of the IE, an important property of the IE that has been overlooked in the aforementioned discussion. This property presents a new perspective on the IE method and may have consequences for future use.
We first explain the IE without the use of matrix algebra. Next, we prove the non-uniqueness property. Finally, we show the different results obtained by applying different IE solutions to fictitious data as well as to real data on female mortality. Based on our findings, we recommend that researchers not apply the IE routinely to their data, even if the three-step procedure suggested by Yang and Land (2013) would justify its use.
The Intrinsic Estimator
The IE Constraint
Plugging this estimate of γ4L into the expressions given in (7) for α1L, β1L, γ1L, and γ2L yields the IE estimates of these four remaining parameters. To summarize, the IE estimates of the regression parameters in (2) are those OLS estimates that have the smallest sum of squares for the coefficients of the five collinear variables A1L, P1L, C1L, C2L, and C4L. In the next section, we will show that the IE estimates are not unique but depend on which categories of age, period, and cohort are omitted and on the type of parameterization that is used.
The Non-uniqueness of the Intrinsic Estimator
In the previous section, we showed that the IE obtains estimates with a minimum sum of squares of those parameters that are not identifiable because of collinearity. The IE estimates, however, depend on both the choice of omitted categories and the type of parameterization applied. As an example of the dependence on omitted categories, we derive the IE estimates when the first categories of age, period, and cohort are omitted instead of the last ones, the latter being the default in Yang et al. (2004).
Effect Coding With First Categories Omitted
(See Appendix A for the proof of (8a).) For the fourth cohort, the deviation from the mean is given by (8a), which differs from the deviation obtained when the last categories are omitted. In contrast, the estimate of the mean is the same regardless of whether it is the last or the first categories that are omitted. In identified models, changing the omitted category has no impact on the deviations from the mean, but here it does. As a demonstration, we applied the default IE with the last categories omitted, as well as the alternative IE with the first categories omitted, to a fictitious data set for three ages, three periods, and five cohorts. For each of the nine age/period combinations, we simulated data for 100 people.1 Table 1 contains the mean of the dependent variable for each combination. We used linear regression, the results of which are shown in Fig. 1.
In Fig. 1, the age and period trends of both IE solutions differ when moving from year 2 to year 3: the solid line shows an increase in the predicted Y value for age and a decrease for period, whereas the dashed line shows the opposite. For cohort, moving from year 1 to year 2 results in an increase or a decrease of the prediction, depending on the omitted categories chosen. Notice that for the midpoints of age, period, and cohort, both IE solutions yield the same predictions. This is congruent with Eq. (6), which shows that the estimates for the middle categories are identifiable without additional restrictions.
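The comparison described above can be reproduced in a few lines. The following is a minimal sketch, not the authors' code. It assumes that the IE equals the minimum-norm least squares solution of the effect-coded design, which matches the description of the IE as the OLS solution with the smallest sum of squares for the collinear coefficients. The cell means are made-up values from an additive model, not the article's Table 1, and all function and variable names are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 3 age-by-period layout; cohort = period - age + 3 gives cohorts 1..5.
age = np.repeat([1, 2, 3], 3)
period = np.tile([1, 2, 3], 3)
cohort = period - age + 3
SIZES = (3, 3, 5)

# Made-up cell means from an additive model (NOT the article's Table 1).
alpha = np.array([-1.0, 0.5, 0.5])
beta = np.array([0.2, -0.4, 0.2])
gamma = np.array([1.0, -0.5, 0.0, -0.5, 0.0])
y = 10.0 + alpha[age - 1] + beta[period - 1] + gamma[cohort - 1]


def effect_columns(levels, k, omitted):
    """Effect-coded columns for categories 1..k, leaving out `omitted`."""
    kept = [c for c in range(1, k + 1) if c != omitted]
    cols = np.zeros((len(levels), len(kept)))
    for j, c in enumerate(kept):
        cols[levels == c, j] = 1.0
    cols[levels == omitted, :] = -1.0  # omitted category is coded -1 on every column
    return cols, kept


def ie_effect_coding(omitted):
    """Minimum-norm least squares fit; effects returned as deviations from the mean."""
    blocks, kept_sets = [], []
    for levels, k, om in zip((age, period, cohort), SIZES, omitted):
        cols, kept = effect_columns(levels, k, om)
        blocks.append(cols)
        kept_sets.append(kept)
    X = np.hstack([np.ones((len(y), 1))] + blocks)
    # For a rank-deficient X, lstsq returns the solution with the smallest 2-norm.
    b = np.linalg.lstsq(X, y, rcond=1e-10)[0]
    effects, start = [], 1
    for k, om, kept in zip(SIZES, omitted, kept_sets):
        coefs = b[start:start + len(kept)]
        full = np.zeros(k)
        full[np.array(kept) - 1] = coefs
        full[om - 1] = -coefs.sum()  # omitted effect follows from the sum-to-zero constraint
        effects.append(full)
        start += len(kept)
    return b[0], effects, X @ b


mean_last, eff_last, fit_last = ie_effect_coding((3, 3, 5))   # last categories omitted
mean_first, eff_first, fit_first = ie_effect_coding((1, 1, 1))  # first categories omitted
for name, last, first in zip(("age", "period", "cohort"), eff_last, eff_first):
    print(name, np.round(last, 4), np.round(first, 4))
```

In this sketch both solutions reproduce the cell means equally well and agree on the mean and on the middle categories, yet they disagree on every other category, which is the pattern the text describes for Fig. 1.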
Because the type of parameterization in the default IE is effect coding with the unweighted mean as reference point, the sensitivity will be lower the more categories of age, period, and cohort are available. This is illustrated in Fig. 2, which is based on the same data set on U.S. female mortality as Yang et al. (2004, 2008) used. These data are from the Berkeley Human Mortality Database and include 19 age categories, 8 periods, and 26 birth cohorts.
For each of the possible 19 × 8 × 26 = 3,952 triplets of omitted categories, we calculated the corresponding set of effect-coded IE estimates, using the Poisson regression model that Yang et al. (2004, 2008) used. In Fig. 2 we show the triplets with the lowest and the highest linear trends in age, period, and cohort, as well as the default IE (last categories omitted).2 To find the triplets with lowest and highest linear trends, we implemented the principal component regression method in the software package R. As Fig. 2 demonstrates, with this many categories it is of almost no importance which triplet of categories is omitted; the general tendency is more or less the same. In sum, the IE estimates with effect coding depend on the choice of which categories are omitted. This undesired sensitivity weakens as more categories of age, period, and cohort are used. That is an important consideration, because data limitations may force researchers to work with a limited set of periods, whereas the number of age and cohort categories is usually less of a problem. In any event, it seems advisable to check for sensitivity to the chosen omitted categories when using the IE and effect coding.
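The bookkeeping behind this search can be sketched as follows. This is an illustration under stated assumptions rather than the authors' R implementation: it only enumerates the triplets and defines a linear trend as the OLS slope of a set of estimates on their category index. The function name `linear_trend` and the example values are hypothetical, and the IE fit itself is not included.

```python
from itertools import product

import numpy as np

N_AGE, N_PERIOD, N_COHORT = 19, 8, 26

# Every possible choice of one omitted category per variable.
triplets = list(product(range(1, N_AGE + 1),
                        range(1, N_PERIOD + 1),
                        range(1, N_COHORT + 1)))
print(len(triplets))  # 19 * 8 * 26


def linear_trend(estimates):
    """OLS slope of the estimates regressed on their category index 1..k."""
    index = np.arange(1, len(estimates) + 1)
    slope, _intercept = np.polyfit(index, estimates, deg=1)
    return slope


# Hypothetical estimates that rise by 0.5 per category.
print(linear_trend([1.0, 1.5, 2.0]))
```

Given a fitting routine that returns the estimates for one triplet, the triplets with the lowest and highest trends follow from applying `min` and `max` to the slopes collected over `triplets`.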
Dummy Variable Coding
We have shown that the IE estimates depend on the choice of which categories are omitted. In more general terms, the IE estimates depend on the design matrix that is chosen before estimation takes place. This matrix is defined by the choice of omitted categories and the type of parameterization one wishes to apply. For instance, a researcher may be interested in developments from the first period, youngest age, and/or oldest cohort onward. In that case, a dummy variable parameterization may be appropriate, with the first categories (rather than the means) as points of reference. In identified models, results from any parameterization can be transformed into the results from any other parameterization. However, for IE models this is not true: a solution obtained with effect coding is different from a solution based on dummy variable coding. Interestingly, although these solutions yield different estimates, they both have the IE properties Yang et al. (2004) outlined: each minimizes its own particular sum of squared estimates, and the associated standard errors of these estimates are minimal. Appendix B contains the proof that IE estimates with dummy variable coding as a rule differ from the default IE estimates using effect coding.
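The dependence on the parameterization can be illustrated with the same kind of toy layout. This is a minimal sketch under several assumptions: the IE is taken to be the minimum-norm least squares solution of whichever design matrix is chosen, the dummy variables are the 0/1 indicators minus 1/k described in Appendix B, and the cell means are made-up values rather than the article's data. All names are hypothetical.

```python
import numpy as np

age = np.repeat([1, 2, 3], 3)
period = np.tile([1, 2, 3], 3)
cohort = period - age + 3
SIZES = (3, 3, 5)

# Made-up cell means from an additive model (NOT the article's data).
alpha = np.array([-1.0, 0.5, 0.5])
beta = np.array([0.2, -0.4, 0.2])
gamma = np.array([1.0, -0.5, 0.0, -0.5, 0.0])
y = 10.0 + alpha[age - 1] + beta[period - 1] + gamma[cohort - 1]


def coded_columns(levels, k, omitted, coding):
    """Columns for categories 1..k without `omitted`, in effect or dummy coding."""
    kept = [c for c in range(1, k + 1) if c != omitted]
    cols = np.zeros((len(levels), len(kept)))
    for j, c in enumerate(kept):
        cols[levels == c, j] = 1.0
    if coding == "effect":
        cols[levels == omitted, :] = -1.0
    else:  # "dummy": 0/1 indicators minus the constant 1/k
        cols -= 1.0 / k
    return cols, kept


def ie_estimates(omitted, coding):
    """Minimum-norm least squares fit, reported as deviations from the mean."""
    blocks, kept_sets = [], []
    for levels, k, om in zip((age, period, cohort), SIZES, omitted):
        cols, kept = coded_columns(levels, k, om, coding)
        blocks.append(cols)
        kept_sets.append(kept)
    X = np.hstack([np.ones((len(y), 1))] + blocks)
    b = np.linalg.lstsq(X, y, rcond=1e-10)[0]
    effects, start = [], 1
    for k, om, kept in zip(SIZES, omitted, kept_sets):
        coefs = b[start:start + len(kept)]
        full = np.zeros(k)
        full[np.array(kept) - 1] = coefs
        if coding == "effect":
            full[om - 1] = -coefs.sum()
        else:
            full -= full.mean()  # contrasts with the reference, re-centered on the mean
        effects.append(full)
        start += len(kept)
    return effects, X @ b


eff_effect, fit_effect = ie_estimates((3, 3, 5), "effect")
eff_dummy, fit_dummy = ie_estimates((3, 3, 5), "dummy")
eff_dummy_first, _ = ie_estimates((1, 1, 1), "dummy")
print(np.round(eff_effect[0], 4), np.round(eff_dummy[0], 4), np.round(eff_dummy_first[0], 4))
```

Re-centering the dummy-coded results on the mean puts all solutions on the same scale. In this sketch the fitted values coincide across codings, while the age, period, and cohort estimates do not, except for the identified middle categories.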
Using the U.S. female mortality database, we will now demonstrate how, with dummy variable coding, the IE estimates vary with the reference categories chosen. Again, for each of all possible 19 × 8 × 26 = 3,952 triplets of omitted categories, we obtained the corresponding set of dummy variable–coded IE estimates. Next, we selected the sets with the lowest and the highest linear tendency in age, period, and cohort, which were found for the triplets (19, 5, 14) and (10, 4, 26). These two sets of IE estimates are plotted in Fig. 3 along with the estimates when the first categories act as points of reference.
Figure 3 shows that for age, and even more so for period and cohort, the differences between the three dummy variable–coded IE solutions are substantial. For age, the dashed curve predicts less mortality at the end of the life cycle than at the beginning, whereas the other two IEs show the opposite. For period, the dotted IE and the IE with the first categories omitted show a significant decline in mortality over the years, whereas for the dashed IE mortality increases. For cohort, the dotted IE estimates show little variation in mortality over the years, whereas the other two IEs suggest a small and a large decline in mortality risk, respectively. Comparing Figs. 2 and 3 shows that dummy variable coding produces more variability in the estimates of age, period, and cohort than does effect coding. Further, a researcher who chooses the first age, period, and cohort year as (starting) points of reference rather than the default IE would obtain different results, particularly with respect to the period trend (a decrease instead of a slight increase in mortality) and the cohort trend (a less steep decline in mortality).
In this article, we showed that there is no unique set of IE estimates but that there exist many, each corresponding to a particular type of parameterization and a particular triplet of omitted categories. Using fictitious and real data, we demonstrated that different IEs can lead to different conclusions about age, period, and cohort trends. With many time points, the IE using effect coding seems relatively robust to the choice of omitted categories. However, with a limited number of time points, which will most likely occur for period in actual research, the effect-coded IE can lead to quite different results for different omitted categories. The IE based on dummy coding seems more sensitive to the choice of categories that act as points of reference, even if the number of time points under study is large.
A consequence of our findings is that IE users may have to consider which IE best fits their needs. From a mathematical point of view, it is difficult, if not impossible, to prefer one particular parameterization and set of omitted categories: each IE has desirable properties with respect to a different set of parameters, determined by the chosen parameterization and omitted categories (Yang et al. 2004). The default IE Yang et al. (2004) proposed is special in the sense that it minimizes the sum of squared deviations from the mean (i.e., the variance of the estimates). Some researchers may consider this a desirable property: if one is completely agnostic about age, period, and cohort influences, one may prefer a “conservative” estimation method, which favors a small variance. Yet, in our opinion, other parameterizations can be equally valid. For example, other researchers may favor estimates that show the smallest (sum of squared) changes compared with some reference year of age, period, and cohort and therefore choose a dummy variable parameterization. Yet other researchers may prefer estimates that show the smallest (sum of squared) changes compared with the immediately preceding time point and hence use so-called repeatedly coded dummy variables.
Our point is that if researchers decide to use the IE, they must be aware of the different possibilities that may lead to different results. For instance, Yang et al. (2004:100) stated that the reason for the slow or nonexistent increase in mortality for period in the middle panel of our Fig. 2 is not clear, and they hypothesized that it may be partly due to increasing rates of cigarette smoking in females. However, with the first categories as references, mortality decreases slightly over the four decades (as shown in the middle panel of Fig. 3), which seems equally, if not more, plausible. Because of such possibly divergent conclusions, we agree with Yang and Land (2013) that the IE should never be used routinely, but for an additional reason: even if the three-step procedure Yang and Land recommended is carefully conducted, the question concerning parameterization and omitted categories remains to be answered before applying the IE. In this respect, the IE is more similar to constrained generalized linear models (CGLIM) than one may think, given that both types of models depend on a constraint that has to be chosen before the analysis. In CGLIM, each pair of categories constrained to be equal corresponds to a different constraint and thus to different estimates. In the IE, each choice of parameterization and/or omitted categories corresponds to a different constraint and hence to different estimates.
Earlier in the article, we noted that one argument in favor of using the default IE is that researchers, from an agnostic point of view, may prefer estimates for the age, period, and cohort categories with the smallest variance possible. The minimization criterion of the default IE, however, does not involve the estimates of the omitted categories. Instead of minimizing (α1L)² + (β1L)² + (γ1L)² + (γ2L)² + (γ4L)² in our fictitious example, one could minimize (α1L)² + (α3L)² + (β1L)² + (β3L)² + (γ1L)² + (γ2L)² + (γ4L)² + (γ5L)², which is the sum of squares of estimates of all parameters that are not identified, including the three omitted categories. It is noteworthy that this criterion leads to estimates that are independent of the omitted categories, as opposed to the IE criterion.
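The invariance of this alternative criterion can be checked numerically. The following is a minimal sketch for the three-age, three-period, five-cohort layout, with made-up effects and hypothetical names. It relies on the fact that all best-fitting sets of deviations differ only by a multiple of one direction `v`, and that minimizing the sum of squares over all categories amounts to removing the component along `v`.

```python
import numpy as np

# Zero-based category indices for the 3 x 3 layout; cohort = period - age + 2.
age = np.repeat([0, 1, 2], 3)
period = np.tile([0, 1, 2], 3)
cohort = period - age + 2

# Direction along which age, period, and cohort deviations can shift together.
v_age = np.array([-1.0, 0.0, 1.0])
v_period = np.array([1.0, 0.0, -1.0])
v_cohort = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
v = np.concatenate([v_age, v_period, v_cohort])

# Shifting along v changes no cell prediction, so the data cannot pin it down.
shift_per_cell = v_age[age] + v_period[period] + v_cohort[cohort]


def minimize_over_all_categories(effects):
    """Pick the solution on the line effects + t * v with the smallest total sum of squares.

    The middle categories have v = 0, so this equals minimizing over the
    eight non-identified deviations, omitted categories included.
    """
    t = -(effects @ v) / (v @ v)
    return effects + t * v


# Made-up deviations from the mean (age, period, cohort), NOT the article's data.
base = np.array([-1.0, 0.5, 0.5, 0.2, -0.4, 0.2, 1.0, -0.5, 0.0, -0.5, 0.0])
solution_a = base + 0.3 * v   # two arbitrary, equally well-fitting solutions
solution_b = base - 1.2 * v

result_a = minimize_over_all_categories(solution_a)
result_b = minimize_over_all_categories(solution_b)
print(np.round(result_a, 4))
print(np.round(result_b, 4))
```

Whichever best-fitting solution one starts from, the criterion returns the same estimates, because no category is left out of the sum being minimized.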
To summarize, we showed that the IE has a non-uniqueness property, raising the question of which IE to choose. When the researcher aims for minimum variance, the IE with effect coding would be the obvious choice; effect coding, however, does not provide the smallest variance across all APC categories because the omitted categories are not part of the constraint. When the researcher aims for the smallest (squared) deviations from some reference year, the dummy variable–coded IE, which we demonstrated in an application in this article, would be more appropriate. Our findings may have implications for past research using IE and for future research considering the use of IE.
We end by noting that in this article, we did not explore the issue of the IE being biased with respect to the “true” data-generating parameters. This has been discussed thoroughly in other contributions (see the December 2013 issue of Demography). Our findings demonstrate bias in the sense that different IEs lead to different results.
Appendix A: Proof of the Non-uniqueness of the IE in Case of Effect Coding
The IE estimate in (8a) depends on the value of − α3F + β3F + γ2F − 2γ5F, whereas the IE estimate in (8) depends on the value of α1L − β1L + 2γ1L + γ2L. In general, with actual data, − α3F + β3F + γ2F − 2γ5F will not be equal to α1L − β1L + 2γ1L + γ2L. To see this, note that the boldfaced parameters in (8) and (8a) represent deviations from the means β0L and β0F in Eqs. (5) and (5a), respectively. Both (5) and (5a) are estimable because of the constraint that the deviation for the fourth cohort is equal to 0; that is, γ4L = 0 and γ4F = 0. As a consequence of using the same constraint in (5) and (5a), all α estimates with the same subscript are equal in both equations; the same holds for all β estimates with the same subscript and for all γ estimates with the same subscript. For example, α2L = α2F. Also, α3L (to be derived as − α1L − α2L) is equal to α3F, as estimated with Eq. (5a). The estimates of the means β0L and β0F are equal as well. Thus, proving that − α3F + β3F + γ2F − 2γ5F ≠ α1L − β1L + 2γ1L + γ2L boils down to proving that − α3L + β3L + γ2L − 2γ5L ≠ α1L − β1L + 2γ1L + γ2L. In this last inequality, the expression to the left of the inequality sign contains different deviations from the mean β0L than the expression to the right. As a consequence, the values of the IE estimates in (8) and (8a) will usually differ for actual data. Also, the parameter estimates that depend on these two values will generally be different.3
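The argument can also be written as a one-parameter minimization. What follows is a sketch in an assumed notation, not a reproduction of the article's Eqs. (7), (8), and (8a): the symbols $e$, $v$, $t_L$, and $t_F$ are introduced here for illustration only. Let $e = (\alpha_1,\alpha_2,\alpha_3,\beta_1,\beta_2,\beta_3,\gamma_1,\dots,\gamma_5)$ be any best-fitting set of deviations from the mean for the three-age, three-period, five-cohort example. Every other best-fitting set has the form $e + t\,v$, where $v$ shifts the three linear trends against one another without changing any prediction.

```latex
\[
v = (\underbrace{-1,\,0,\,1}_{\text{age}},\;
     \underbrace{1,\,0,\,-1}_{\text{period}},\;
     \underbrace{-2,\,-1,\,0,\,1,\,2}_{\text{cohort}})
\]
% Last categories omitted: minimize over t the sum of squares of
% alpha_1, beta_1, gamma_1, gamma_2, gamma_4 evaluated at e + t v.
\[
t_L = \frac{\alpha_1 - \beta_1 + 2\gamma_1 + \gamma_2 - \gamma_4}{8}
\]
% First categories omitted: minimize over t the sum of squares of
% alpha_3, beta_3, gamma_2, gamma_4, gamma_5 evaluated at e + t v.
\[
t_F = \frac{-\alpha_3 + \beta_3 + \gamma_2 - \gamma_4 - 2\gamma_5}{8}
\]
% The two criteria select the same solution only if t_L = t_F.
\[
t_L - t_F = \frac{(\alpha_1 + \alpha_3) - (\beta_1 + \beta_3) + 2(\gamma_1 + \gamma_5)}{8}
\]
```

With the constraint $\gamma_4 = 0$ used in the appendix, the numerators of $t_L$ and $t_F$ reduce to the two expressions compared in the text. The difference $t_L - t_F$ involves only quantities that do not change along $v$, so it is fixed by the data and is nonzero except in special cases.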
Appendix B
In this appendix, we show why with dummy variable coding, the IE estimates generally differ from the “default” IE estimates Yang et al. (2004) presented. Instead of standard 0 and 1 coded dummy variables, we subtract 1 / k from these dummy variables, with k denoting the number of categories—that is, 3 for age and period, and 5 for cohort. Subtracting the constant 1 / k from the original 0 and 1 coded dummy variables does not change the interpretation of their regression coefficients (i.e., the deviation from the omitted category). If we omit the last age category, the dummy variables A1D and A2D for the first two ages have the coding scheme shown in Table 2.
If we compare this IE estimator of γ4D with the estimator of γ4L in Eq. (8), it is obvious that the IE estimate for the fourth cohort effect will usually differ between the effect-coded IE as proposed by Yang et al. (2004) and the dummy variable–coded IE (with the last categories as the omitted ones for both codings). The same holds for the IE estimates of the remaining categories of the three APC variables. Similar to the IE with effect coding, each triplet of reference categories of age, period, and cohort leads to different estimates when dummy variable coding is used, which we do not elaborate here.
For the APC models that we discuss in this article, having the same number of units in each combination of age and period is not required.
We determined the linear trend in, for example, age by a simple ordinary least squares (OLS) regression on the 19 age estimates. We used the highest and lowest linear trends only to demonstrate the variability in IE estimates when different sets of categories are omitted.
The user-written Stata routine apc_ie provides deviation contrast IE estimates with only the last categories omitted. To obtain estimates with the first categories omitted, one could “mirror” the three APC variables so that the highest age, period, and cohort values become the lowest.
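The mirroring step can be sketched as follows. This is an illustration in Python with a hypothetical helper `mirror` and made-up category values, not Stata code and not part of apc_ie. It assumes cohort is defined as period minus age.

```python
import numpy as np


def mirror(values):
    """Reflect values so that the highest becomes the lowest and vice versa."""
    values = np.asarray(values)
    return values.min() + values.max() - values


# Made-up layout: three ages and three periods, cohort = period - age.
age = np.repeat([1, 2, 3], 3)
period = np.tile([1, 2, 3], 3)
cohort = period - age

age_m, period_m, cohort_m = mirror(age), mirror(period), mirror(cohort)
print(mirror([1, 2, 3]))
```

Mirroring all three variables keeps the relation between them intact, so the mirrored cohort still equals the mirrored period minus the mirrored age. The last category of each mirrored variable is the first category of the original one, and the resulting estimates have to be read in reverse order.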
Then, to obtain, for example, the value of γ3D, calculate γ3D = γ3* + 2γ5L.
For any two ages i and j, the predicted values (controlling for period and cohort) based on Eq. (5) are β0L + αiL and β0L + αjL. The difference between these predictions equals αiL − αjL, which represents the deviation of age i from age j and hence is equal to αiD.