This article explores an important property of the intrinsic estimator that has received no attention in the literature: the age, period, and cohort estimates of the intrinsic estimator are not unique but vary with the parameterization and reference categories chosen for these variables. We give a formal proof of the non-uniqueness property for effect coding and dummy variable coding. Using data on female mortality in the United States over the years 1960–1999, we show that the variation in the results obtained for different parameterizations and reference categories is substantial and leads to contradictory conclusions. We conclude that the non-uniqueness property is a new argument for not routinely applying the intrinsic estimator.


The December 2013 issue of this journal included contributions addressing the intrinsic estimator (IE). Fienberg (2013), Held and Riebler (2013), Luo (2013), and O’Brien (2013) all seem to agree that the IE is of rather limited value for simultaneously estimating age, period, and cohort effects. Luo (2013), for instance, demonstrated that IE estimates are biased when the true parameters of age, period, and cohort show a linear trend that diverges from the one implied by the IE constraint. She concluded that the IE should not be used. In contrast, in their reply to Luo, the IE developers Yang and Land (2013) remained convinced about IE’s potential for practical research.

Which conclusion must one draw from these diverging expert viewpoints on the IE? Should researchers still consider using the IE, given that its application is validated by the three-step procedure proposed by Yang and Land, or should they abandon the IE in favor of other models?

In this article, we demonstrate the non-uniqueness of the IE, an important property of the IE that has been overlooked in the aforementioned discussion. This property presents a new perspective on the IE method and may have consequences for future use.

We first explain the IE without the use of matrix algebra. Next, we prove the non-uniqueness property. Finally, we show the different results obtained by applying different IE solutions to fictitious data as well as to real data on female mortality. Based on our findings, we recommend that researchers not apply the IE routinely to their data, even if the three-step procedure suggested by Yang and Land (2013) would justify its use.

The Intrinsic Estimator

To explain how the IE parameters for age, period, and cohort are estimated, we use fictitious data that are limited to three periods, three ages, and five cohorts. With these data, we explain the IE in an accessible way without relying on matrix notation. We start our explanation with the so-called APC accounting, or multiple classification equation, which in regression format reads as follows:

$$Y = \beta_0 + \alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3 + \beta_1 P_1 + \beta_2 P_2 + \beta_3 P_3 + \gamma_1 C_1 + \gamma_2 C_2 + \gamma_3 C_3 + \gamma_4 C_4 + \gamma_5 C_5 + e. \tag{1}$$
In Eq. (1), $Y$ denotes the value of the dependent variable for a given unit (typically a person), and $A_i$, $P_j$, and $C_k$ represent independent dummy variables indicating whether the unit belongs to age $i$, period $j$, and cohort $k$. Further, $\beta_0$ denotes the intercept, and $e$ represents the unit’s error term. To estimate the parameters $\alpha_i$, $\beta_j$, and $\gamma_k$, we follow Yang et al. (2004) and apply the following constraints:

$$\sum_{i=1}^{3} \alpha_i = 0, \qquad \sum_{j=1}^{3} \beta_j = 0, \qquad \sum_{k=1}^{5} \gamma_k = 0.$$
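Given the first constraint, it follows that in Eq. (1), $\alpha_3 = -\alpha_1 - \alpha_2$; hence, we can rewrite the age effects in Eq. (1) as follows:

$$\alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3 = \alpha_1 (A_1 - A_3) + \alpha_2 (A_2 - A_3).$$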
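In this last equation, $\alpha_3$ is omitted; its value can be directly derived from $\alpha_1$ and $\alpha_2$. In addition, the three age dummy variables in (1) have been replaced by the two differences $A_1 - A_3$ and $A_2 - A_3$. In the same way, we can substitute $-\beta_1 - \beta_2$ for $\beta_3$ and $-\gamma_1 - \gamma_2 - \gamma_3 - \gamma_4$ for $\gamma_5$ in (1). The resulting equation then is

$$Y = \beta_0^L + \alpha_1^L A_1^L + \alpha_2^L A_2^L + \beta_1^L P_1^L + \beta_2^L P_2^L + \gamma_1^L C_1^L + \gamma_2^L C_2^L + \gamma_3^L C_3^L + \gamma_4^L C_4^L + e, \tag{2}$$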
where superscript $L$ denotes the elimination of the effects of the last age, period, and cohort categories (i.e., $\alpha_3^L$, $\beta_3^L$, and $\gamma_5^L$) from the equation. Note that $A_i^L = A_i - A_3$, $P_j^L = P_j - P_3$, and $C_k^L = C_k - C_5$. The variables $A_i^L$, $P_j^L$, and $C_k^L$ take the value of 1 if a case belongs to the subscripted age, period, and cohort; the value of 0 if it does not; and the value of $-1$ if the case belongs to the last (omitted) category. According to these codings, the expectation of $Y$ for the three age categories equals $\beta_0^L + \alpha_1^L$, $\beta_0^L + \alpha_2^L$, and $\beta_0^L - \alpha_1^L - \alpha_2^L$ when controlling for period and cohort. The mean of these three expectations equals $\beta_0^L$. The mean of the expectations of $Y$ for the three periods and for the five cohorts also equals $\beta_0^L$. Consequently, the parameters $\alpha_i^L$, $\beta_j^L$, and $\gamma_k^L$ represent deviations from the mean, $\beta_0^L$, for the given categories of age, period, and cohort. This type of parameterization is known as “effect coding” (Hardy 1993).

The IE Constraint

The APC parameters in Eq. (2) still cannot be estimated because of perfect dependency between the independent variables. For example, $C_4^L$ can be written as a perfect linear combination of other variables in (2):

$$C_4^L = A_1^L - P_1^L + 2C_1^L + C_2^L. \tag{3}$$
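If, in Eq. (2), we substitute for $C_4^L$ the expression given in Eq. (3) and rearrange terms, we obtain the following:

$$Y = \beta_0^L + (\alpha_1^L + \gamma_4^L) A_1^L + \alpha_2^L A_2^L + (\beta_1^L - \gamma_4^L) P_1^L + \beta_2^L P_2^L + (\gamma_1^L + 2\gamma_4^L) C_1^L + (\gamma_2^L + \gamma_4^L) C_2^L + \gamma_3^L C_3^L + e. \tag{4}$$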
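Equation (4) can be written more compactly:

$$Y = \boldsymbol{\beta}_0^L + \boldsymbol{\alpha}_1^L A_1^L + \boldsymbol{\alpha}_2^L A_2^L + \boldsymbol{\beta}_1^L P_1^L + \boldsymbol{\beta}_2^L P_2^L + \boldsymbol{\gamma}_1^L C_1^L + \boldsymbol{\gamma}_2^L C_2^L + \boldsymbol{\gamma}_3^L C_3^L + e. \tag{5}$$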
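Because variable $C_4^L$ has been eliminated in Eq. (5), the perfect dependency no longer exists; hence, the regression parameters in (5) can be estimated. To emphasize that the parameters in (5) are estimable, we use boldfaced characters. For the parameters in (4) and (5), the following equalities hold:

$$\beta_0^L = \boldsymbol{\beta}_0^L, \qquad \alpha_2^L = \boldsymbol{\alpha}_2^L, \qquad \beta_2^L = \boldsymbol{\beta}_2^L, \qquad \gamma_3^L = \boldsymbol{\gamma}_3^L. \tag{6}$$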
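The four estimates in (6) are produced by the IE proposed by Yang et al. (2004). Further, $\boldsymbol{\alpha}_1^L$ in (5) equals $\alpha_1^L + \gamma_4^L$ in (4); that is, $\alpha_1^L = \boldsymbol{\alpha}_1^L - \gamma_4^L$. Similarly, the other three parameters ($\beta_1^L$, $\gamma_1^L$, and $\gamma_2^L$) are related to $\gamma_4^L$. In total then, we have

$$\alpha_1^L = \boldsymbol{\alpha}_1^L - \gamma_4^L, \qquad \beta_1^L = \boldsymbol{\beta}_1^L + \gamma_4^L, \qquad \gamma_1^L = \boldsymbol{\gamma}_1^L - 2\gamma_4^L, \qquad \gamma_2^L = \boldsymbol{\gamma}_2^L - \gamma_4^L. \tag{7}$$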
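The expressions in (7) show that when an estimate of $\gamma_4^L$ is found, the estimates of $\alpha_1^L$, $\beta_1^L$, $\gamma_1^L$, and $\gamma_2^L$ simply follow, given the estimates of the boldfaced parameters. However, because of the dependencies in (7), an extra constraint is needed to obtain estimates of $\alpha_1^L$, $\beta_1^L$, $\gamma_1^L$, $\gamma_2^L$, and $\gamma_4^L$. The constraint that the IE applies consists of the minimization of the sum of squares of these five parameters. For the sum of the squared estimates of these parameters, we can write:

$$(\hat{\alpha}_1^L)^2 + (\hat{\beta}_1^L)^2 + (\hat{\gamma}_1^L)^2 + (\hat{\gamma}_2^L)^2 + (\hat{\gamma}_4^L)^2 = (\boldsymbol{\alpha}_1^L - \hat{\gamma}_4^L)^2 + (\boldsymbol{\beta}_1^L + \hat{\gamma}_4^L)^2 + (\boldsymbol{\gamma}_1^L - 2\hat{\gamma}_4^L)^2 + (\boldsymbol{\gamma}_2^L - \hat{\gamma}_4^L)^2 + (\hat{\gamma}_4^L)^2.$$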
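The minimum value of this sum of squares can be found by setting the first-order derivative with respect to $\hat{\gamma}_4^L$ equal to 0, leading to:

$$-2(\boldsymbol{\alpha}_1^L - \hat{\gamma}_4^L) + 2(\boldsymbol{\beta}_1^L + \hat{\gamma}_4^L) - 4(\boldsymbol{\gamma}_1^L - 2\hat{\gamma}_4^L) - 2(\boldsymbol{\gamma}_2^L - \hat{\gamma}_4^L) + 2\hat{\gamma}_4^L = 0,$$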
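and thus to the following IE estimate of the true value of $\gamma_4^L$:

$$\hat{\gamma}_4^L = \frac{\boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L}{8}. \tag{8}$$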

Plugging this estimate of $\hat{\gamma}_4^L$ into the expressions given in (7) for $\alpha_1^L$, $\beta_1^L$, $\gamma_1^L$, and $\gamma_2^L$ leads to the IE estimates for these four remaining parameters. To summarize, the IE estimates of the regression parameters in (2) are those OLS estimates that have the smallest sum of squared coefficients for the five collinear variables $A_1^L$, $P_1^L$, $C_1^L$, $C_2^L$, and $C_4^L$. In the next section, we will show that the IE estimates are not unique but depend on which categories of age, period, and cohort are omitted and on the type of parameterization that is used.
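The steps of this section can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming made-up cell means for the three-by-three example; the variable names and the data are ours and purely illustrative. It verifies the linear dependency among the effect-coded variables, computes the estimable parameters by dropping the fourth cohort variable, applies the closed-form estimate with denominator 8 and the back-substitution described above, and compares the result with the minimum-norm least-squares solution.

```python
import numpy as np

# Cells of the 3 x 3 age-by-period table; cohort index k = j - i + 3 (1..5).
cells = [(i, j, j - i + 3) for i in (1, 2, 3) for j in (1, 2, 3)]

def effect_code(level, included, omitted):
    """Effect coding: 1 for the own category, -1 for the omitted one, else 0."""
    return [1.0 if level == c else -1.0 if level == omitted else 0.0
            for c in included]

# Columns: intercept, A1L, A2L, P1L, P2L, C1L, C2L, C3L, C4L (last categories omitted).
X = np.array([[1.0]
              + effect_code(i, (1, 2), 3)
              + effect_code(j, (1, 2), 3)
              + effect_code(k, (1, 2, 3, 4), 5)
              for i, j, k in cells])

# C4L is an exact linear combination of A1L, P1L, C1L, and C2L.
assert np.allclose(X[:, 8], X[:, 1] - X[:, 3] + 2 * X[:, 5] + X[:, 6])

# Made-up cell means (ages 1-3 in blocks, periods 1-3 within each block).
y = np.array([2.0, 3.5, 3.0, 4.0, 4.5, 6.0, 5.5, 7.0, 6.5])

# Estimable ("boldfaced") parameters: drop C4L and fit by ordinary least squares.
bold = np.linalg.lstsq(X[:, :8], y, rcond=None)[0]
b0, a1, a2, p1, p2, c1, c2, c3 = bold

# Closed-form estimate for cohort 4, then back-substitution for the other four.
g4 = (a1 - p1 + 2 * c1 + c2) / 8
ie_by_hand = np.array([b0, a1 - g4, a2, p1 + g4, p2, c1 - 2 * g4, c2 - g4, c3, g4])

# Minimum-norm least-squares solution of the full, rank-deficient design.
ie_min_norm = np.linalg.lstsq(X, y, rcond=1e-10)[0]

print(np.round(ie_by_hand, 4))
print(np.allclose(ie_by_hand, ie_min_norm))
```

The two routes agree because the null vector of the design has nonzero entries only for the five collinear variables, so minimizing the norm of the whole coefficient vector is the same as minimizing the sum of squares of those five coefficients.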

The Non-uniqueness of the Intrinsic Estimator

In the previous section, we showed that the IE obtains estimates with a minimum sum of squares of those parameters that are not identifiable because of collinearity. The IE estimates, however, depend on both the choice of omitted categories and the type of parameterization applied. As an example of the dependence on omitted categories, we derive the IE estimates when the first categories of age, period, and cohort are omitted instead of the last ones, the latter being the default in Yang et al. (2004).

Effect Coding With First Categories Omitted

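If one omits the first categories of age, period, and cohort, Eq. (8) turns into

$$\hat{\gamma}_4^F = \frac{-\boldsymbol{\alpha}_3^F + \boldsymbol{\beta}_3^F + \boldsymbol{\gamma}_2^F - 2\boldsymbol{\gamma}_5^F}{8}. \tag{8a}$$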

(See Appendix A for the proof of (8a).) For the fourth cohort, the deviation from the mean equals $\hat{\gamma}_4^F$, which differs from the deviation $\hat{\gamma}_4^L$ when the last categories were omitted. In contrast, the estimate of the mean is the same regardless of whether it is the last or the first categories that are omitted. In identified models, changing the omitted category has no impact on the deviations from the mean, but here it does. As a demonstration, we applied the default IE with the last categories omitted, as well as the alternative IE with the first categories omitted, to a fictitious data set for three ages, three periods, and five cohorts. For each of the nine age/period combinations, we simulated data for 100 people.¹ Table 1 contains the mean of the dependent variable for each combination. We used linear regression, the results of which are shown in Fig. 1.

In Fig. 1, the age and period trends of both IE solutions differ when moving from year 2 to year 3: the solid line shows an increase in the predicted Y value for age and a decrease for period, whereas the dashed line shows the opposite. For cohort, moving from year 1 to year 2 results in an increase or a decrease of the prediction, depending on the omitted categories chosen. Notice that for the midpoints of age, period, and cohort, both IE solutions yield the same predictions. This is congruent with Eq. (6), which shows that the estimates for the middle categories are identifiable without additional restrictions.
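The comparison between omitting the last and the first categories can be sketched in a few lines of Python with NumPy. The cell means below are made up and are not the values of Table 1; the function names are ours, and the IE is computed here as the minimum-norm least-squares solution of the effect-coded design, which is an assumption consistent with the minimization criterion described earlier.

```python
import numpy as np

N_AGE, N_PER, N_COH = 3, 3, 5
# Cohort index follows from age i and period j as k = j - i + 3.
cells = [(i, j, j - i + N_AGE)
         for i in range(1, N_AGE + 1) for j in range(1, N_PER + 1)]

def effect_columns(level, n_levels, omitted):
    """Effect-coded entries for one factor, skipping the omitted category."""
    return [1.0 if level == c else -1.0 if level == omitted else 0.0
            for c in range(1, n_levels + 1) if c != omitted]

def full_effects(coefs, omitted):
    """Re-insert the omitted category as minus the sum of the estimated ones."""
    out = list(coefs)
    out.insert(omitted - 1, -sum(coefs))
    return np.array(out)

def ie_effect_coded(y, omit):
    """Minimum-norm fit for a triplet (age, period, cohort) of omitted categories."""
    oa, op, oc = omit
    X = np.array([[1.0]
                  + effect_columns(i, N_AGE, oa)
                  + effect_columns(j, N_PER, op)
                  + effect_columns(k, N_COH, oc)
                  for i, j, k in cells])
    b = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=1e-10)[0]
    return {"mean": b[0],
            "age": full_effects(b[1:3], oa),
            "period": full_effects(b[3:5], op),
            "cohort": full_effects(b[5:9], oc)}

# Made-up cell means (ages 1-3 in blocks, periods 1-3 within each block).
y = [2.0, 3.5, 3.0, 4.0, 4.5, 6.0, 5.5, 7.0, 6.5]
last = ie_effect_coded(y, omit=(3, 3, 5))   # default: last categories omitted
first = ie_effect_coded(y, omit=(1, 1, 1))  # alternative: first categories omitted

for name in ("age", "period", "cohort"):
    print(name, np.round(last[name], 3), np.round(first[name], 3))
```

In this sketch the mean and the middle categories (age 2, period 2, cohort 3) coincide across the two solutions, whereas the remaining deviations from the mean generally do not.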

Because the type of parameterization in the default IE is effect coding with the unweighted mean as reference point, the sensitivity will be lower the more categories of age, period, and cohort are available. This is illustrated in Fig. 2, which is based on the same data set on U.S. female mortality as Yang et al. (2004, 2008) used. These data are from the Berkeley Human Mortality Database and include 19 age categories, 8 periods, and 26 birth cohorts.

For each of the possible 19 × 8 × 26 = 3,952 triplets of omitted categories, we calculated the corresponding set of effect-coded IE estimates, using the Poisson regression model that Yang et al. (2004, 2008) used. In Fig. 2 we show the triplets with the lowest and the highest linear trends in age, period, and cohort, as well as the default IE (last categories omitted).² To find the triplets with lowest and highest linear trends, we implemented the principal component regression method in the software package R. As Fig. 2 demonstrates, it is of almost no importance which triplet of categories is omitted, the general tendency being more or less the same. In sum, the IE estimates with effect coding depend on the choice of which categories are omitted. This undesired sensitivity becomes weaker the more categories of age, cohort, and period are used. This is an important consideration given that data limitations may force researchers to work with a limited set of periods, while the set of age and cohort categories is a lesser problem under normal circumstances. In any event, it seems advisable to check for sensitivity to the chosen omitted categories when using the IE and effect coding.

Dummy Variable Coding

We have shown that the IE estimates depend on the choice of which categories are omitted. In more general terms, the IE estimates depend on the design matrix that is chosen before estimation takes place. This matrix is defined by the choice of omitted categories and the type of parameterization one wishes to apply. For instance, a researcher may be interested in developments from the first period, youngest age, and/or oldest cohort onward. In that case, a dummy variable parameterization may be appropriate, with the first categories (rather than the means) as points of reference. In identified models, results from any parameterization can be transformed into the results from any other parameterization. However, for IE models this is not true: a solution obtained with effect coding is different from a solution based on dummy variable coding. Interestingly, although these solutions yield different estimates, they both have the IE properties Yang et al. (2004) outlined: each minimizes its own particular sum of squared estimates, and the associated standard errors of these estimates are minimal. Appendix B contains the proof that IE estimates with dummy variable coding as a rule differ from the default IE estimates using effect coding.
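To illustrate how the design matrix drives the result, the following sketch in Python with NumPy fits the same made-up cell means under effect coding and under dummy variable coding (with dummies centered by subtracting 1 divided by the number of categories, as in Appendix B). The helper names and data are ours, and the IE is again taken to be the minimum-norm least-squares solution for the chosen design, which is an assumption. Both solutions are expressed as deviations from the unweighted mean so that they can be compared directly.

```python
import numpy as np

LEVELS = {"age": 3, "period": 3, "cohort": 5}
# Cohort index follows from age i and period j as k = j - i + 3.
cells = [(i, j, j - i + 3) for i in (1, 2, 3) for j in (1, 2, 3)]

def design(coding, omit):
    """Intercept plus coded columns; `omit` gives the omitted/reference category per factor."""
    rows = []
    for cell in cells:
        row = [1.0]
        for level, n, o in zip(cell, LEVELS.values(), omit):
            for c in range(1, n + 1):
                if c == o:
                    continue
                if coding == "effect":
                    row.append(1.0 if level == c else -1.0 if level == o else 0.0)
                else:  # centered dummy: 0/1 indicator minus 1/n
                    row.append((1.0 if level == c else 0.0) - 1.0 / n)
        rows.append(row)
    return np.array(rows)

def ie_centered_effects(y, coding, omit):
    """Minimum-norm fit, reported as deviations from the unweighted mean."""
    b = np.linalg.lstsq(design(coding, omit), np.asarray(y, dtype=float),
                        rcond=1e-10)[0]
    out, pos = {}, 1
    for (name, n), o in zip(LEVELS.items(), omit):
        coefs = list(b[pos:pos + n - 1])
        pos += n - 1
        # Omitted category: minus the sum (effect coding) or zero (dummy coding).
        coefs.insert(o - 1, -sum(coefs) if coding == "effect" else 0.0)
        eff = np.array(coefs)
        out[name] = eff - eff.mean()  # leaves effect-coded estimates unchanged
    return out

# Made-up cell means (ages 1-3 in blocks, periods 1-3 within each block).
y = [2.0, 3.5, 3.0, 4.0, 4.5, 6.0, 5.5, 7.0, 6.5]
effect = ie_centered_effects(y, "effect", omit=(3, 3, 5))
dummy = ie_centered_effects(y, "dummy", omit=(3, 3, 5))

for name in LEVELS:
    print(name, np.round(effect[name], 3), np.round(dummy[name], 3))
```

Each solution minimizes its own sum of squared coefficients, so the two sets of deviations from the mean generally differ even though both reproduce the same fitted values.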

Using the U.S. female mortality database, we will now demonstrate how, with dummy variable coding, the IE estimates vary with the reference categories chosen. Again, for each of all possible 19 × 8 × 26 = 3,952 triplets of omitted categories, we obtained the corresponding set of dummy variable–coded IE estimates. Next, we selected the sets with the lowest and the highest linear tendency in age, period, and cohort, which were found for the triplets (19, 5, 14) and (10, 4, 26). These two sets of IE estimates are plotted in Fig. 3 along with the estimates when the first categories act as points of reference.

Figure 3 shows that for age and even more so for period and cohort, the differences between the three dummy variable–coded IE solutions are substantial. For age, the dashed curve predicts less mortality at the end of the life cycle than at the beginning, whereas the other two IEs show the opposite. For period, the dotted IE and the IE with the first categories omitted show a significant decline in mortality over the years, whereas for the dashed IE mortality increases. For cohort, the dotted IE estimates show little variation in mortality over the years, whereas the other two IEs suggest a small and big decline in mortality risk, respectively. Comparing Figs. 2 and 3 shows that dummy variable coding produces more variability in the estimates of age, period, and cohort than does effect coding. Further, choosing the first age, period, and cohort year as (starting) points of reference rather than the default IE, a researcher would obtain different results, particularly with respect to the period (finding a decrease instead of a slight increase in mortality) and the cohort trend (finding a less steep decline in mortality).


In this article, we showed that there is no unique set of IE estimates but that there exist many, each corresponding to a particular type of parameterization and a particular triplet of omitted categories. Using fictitious and real data, we demonstrated that different IEs can lead to different conclusions about age, period, and cohort trends. With many time points, the IE using effect coding seems relatively robust to the choice of omitted categories. However, with a limited number of time points, which will most likely occur for period in actual research, the effect-coded IE can lead to quite different results for different omitted categories. The IE based on dummy coding seems more sensitive to the choice of categories that act as points of reference, even if the number of time points under study is large.

A consequence of our findings is that IE users may have to consider which IE best fits their needs. From a mathematical point of view, it is difficult, if not impossible, to prefer one particular parameterization and set of omitted categories: each IE has desirable properties with respect to a different set of parameters, determined by the chosen parameterization and omitted categories (Yang et al. 2004). The default IE Yang et al. (2004) proposed is special in the sense that it minimizes the sum of squared deviations from the mean (i.e., the variance of the estimates). Some researchers may consider this a desirable property: if one is completely agnostic about age, period, and cohort influences, one may prefer a “conservative” estimation method, which favors a small variance. Yet, in our opinion, other parameterizations can be equally valid. For example, other researchers may favor estimates that show the smallest (sum of squared) changes compared with some reference year of age, period, and cohort and therefore choose a dummy variable parameterization. Yet other researchers may prefer estimates that show the smallest (sum of squared) changes compared with the immediately preceding time point and hence use so-called repeatedly coded dummy variables.

Our point is that if researchers decide to use the IE, they must be aware of the different possibilities that may lead to different results. For instance, Yang et al. (2004:100) stated that the reason for the slow or nonexistent increase in mortality for period in the middle panel of our Fig. 2 is not clear, and they hypothesized that it may be partly due to increasing rates of cigarette smoking in females. However, with the first categories as references, mortality decreases slightly over the four decades (as shown in the middle panel of Fig. 3), which seems equally, if not more, plausible. Because of such possibly divergent conclusions, we agree with Yang and Land (2013) that the IE should never be used routinely, but for an additional reason: even if the three-step procedure Yang and Land recommended is carefully conducted, the question concerning parameterization and omitted categories remains to be answered before applying the IE. In this respect, the IE is more similar to constrained generalized linear models (CGLIM) than one may think, given that both types of models depend on a constraint that has to be chosen before analyzing. In CGLIM, each pair of equaled categories corresponds to a different constraint and thus to different estimates. In the IE, each choice of parameterization and/or omitted categories corresponds to a different constraint and hence to different estimates.

Earlier in the article, we noted that one argument in favor of using the default IE is that researchers, from an agnostic point of view, may prefer estimates for the age, period, and cohort categories with the smallest variance possible. The minimization criterion of the default IE, however, does not involve the estimates of the omitted categories. Instead of minimizing $(\hat{\alpha}_1^L)^2 + (\hat{\beta}_1^L)^2 + (\hat{\gamma}_1^L)^2 + (\hat{\gamma}_2^L)^2 + (\hat{\gamma}_4^L)^2$ in our fictitious example, one could minimize $(\hat{\alpha}_1^L)^2 + (\hat{\alpha}_3^L)^2 + (\hat{\beta}_1^L)^2 + (\hat{\beta}_3^L)^2 + (\hat{\gamma}_1^L)^2 + (\hat{\gamma}_2^L)^2 + (\hat{\gamma}_4^L)^2 + (\hat{\gamma}_5^L)^2$, which is the sum of squares of estimates of all parameters that are not identified, including the three omitted categories. It is noteworthy that this criterion leads to estimates that are independent of the omitted categories, as opposed to the IE criterion.
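A short worked derivation may help to see why the extended criterion does not depend on the omitted categories. It is our own sketch for the fictitious three-by-three example, not a formula taken from the cited literature. Boldfaced symbols denote the estimable deviations from the mean obtained under the constraint that the fourth cohort deviation is 0, with the omitted categories derived from the sum-to-zero constraints, and $g$ stands for the estimate of the fourth cohort deviation.

```latex
\begin{aligned}
S(g) ={}& (\boldsymbol{\alpha}_1^L - g)^2 + (\boldsymbol{\alpha}_3^L + g)^2
        + (\boldsymbol{\beta}_1^L + g)^2 + (\boldsymbol{\beta}_3^L - g)^2 \\
       & + (\boldsymbol{\gamma}_1^L - 2g)^2 + (\boldsymbol{\gamma}_2^L - g)^2
        + g^2 + (\boldsymbol{\gamma}_5^L + 2g)^2, \\[4pt]
\frac{dS}{dg} = 0 \;\Longrightarrow\;
g ={}& \frac{\boldsymbol{\alpha}_1^L - \boldsymbol{\alpha}_3^L
        - \boldsymbol{\beta}_1^L + \boldsymbol{\beta}_3^L
        + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L
        - 2\boldsymbol{\gamma}_5^L}{14}.
\end{aligned}
```

The numerator treats the first and last categories symmetrically, and the boldfaced deviations are the same whichever categories are omitted (see Appendix A), so the resulting estimate is the same for every choice of omitted categories.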

To summarize, we showed that the IE has a non-uniqueness property, raising the question of which IE to choose. When the researcher aims for minimum variance, the IE with effect coding would be the obvious choice; effect coding, however, does not provide the smallest variance across all APC categories because the omitted categories are not part of the constraint. When the researcher aims for the smallest (squared) deviations from some reference year, the dummy variable–coded IE, which we demonstrated in an application in this article, would be more appropriate. Our findings may have implications for past research using IE and for future research considering the use of IE.

We end by noting that in this article, we did not explore the issue of the IE being biased with respect to the “true” data-generating parameters. This has been discussed thoroughly in other contributions (see the December 2013 issue of Demography). Our findings demonstrate bias in the sense that different IEs lead to different results.

Appendix A: Proof of the Non-uniqueness of the IE in Case of Effect Coding

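To arrive at Eq. (2), we used effect coding with the last categories of age, period, and cohort omitted from the equation. In this appendix, we will show that the IE yields different estimates when the first categories are omitted. Equation (2) then changes into

$$Y = \beta_0^F + \alpha_2^F A_2^F + \alpha_3^F A_3^F + \beta_2^F P_2^F + \beta_3^F P_3^F + \gamma_2^F C_2^F + \gamma_3^F C_3^F + \gamma_4^F C_4^F + \gamma_5^F C_5^F + e, \tag{2a}$$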
where superscript $F$ denotes that the first categories of age, period, and cohort are omitted. The independent variables in (2a) differ from those in (2) because they now take value $-1$ for cases in the first category of age, period, or cohort. Again, the variable for the fourth cohort can be expressed in terms of other variables in (2a):

$$C_4^F = -A_3^F + P_3^F + C_2^F - 2C_5^F. \tag{3a}$$
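Substituting into (2a) the expression for $C_4^F$ given in (3a) yields the following:

$$Y = \beta_0^F + \alpha_2^F A_2^F + (\alpha_3^F - \gamma_4^F) A_3^F + \beta_2^F P_2^F + (\beta_3^F + \gamma_4^F) P_3^F + (\gamma_2^F + \gamma_4^F) C_2^F + \gamma_3^F C_3^F + (\gamma_5^F - 2\gamma_4^F) C_5^F + e. \tag{4a}$$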
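The preceding equation can be represented more compactly:

$$Y = \boldsymbol{\beta}_0^F + \boldsymbol{\alpha}_2^F A_2^F + \boldsymbol{\alpha}_3^F A_3^F + \boldsymbol{\beta}_2^F P_2^F + \boldsymbol{\beta}_3^F P_3^F + \boldsymbol{\gamma}_2^F C_2^F + \boldsymbol{\gamma}_3^F C_3^F + \boldsymbol{\gamma}_5^F C_5^F + e. \tag{5a}$$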
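The parameters in (5a) are identified because $C_4^F$ is not part of the equation. Following the same line of reasoning as outlined in the main text for the last categories omitted, we now have to minimize the sum of squares $(\hat{\alpha}_3^F)^2 + (\hat{\beta}_3^F)^2 + (\hat{\gamma}_2^F)^2 + (\hat{\gamma}_4^F)^2 + (\hat{\gamma}_5^F)^2$, which finally leads to the following IE estimator for $\gamma_4^F$:

$$\hat{\gamma}_4^F = \frac{-\boldsymbol{\alpha}_3^F + \boldsymbol{\beta}_3^F + \boldsymbol{\gamma}_2^F - 2\boldsymbol{\gamma}_5^F}{8}. \tag{8a}$$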
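Recall that with the last categories omitted, we found the estimator given in Eq. (8):

$$\hat{\gamma}_4^L = \frac{\boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L}{8}.$$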

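Apparently, the estimate $\hat{\gamma}_4^F$ depends on the value of $-\boldsymbol{\alpha}_3^F + \boldsymbol{\beta}_3^F + \boldsymbol{\gamma}_2^F - 2\boldsymbol{\gamma}_5^F$, whereas the estimate $\hat{\gamma}_4^L$ depends on the value of $\boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L$. In general, with actual data, $-\boldsymbol{\alpha}_3^F + \boldsymbol{\beta}_3^F + \boldsymbol{\gamma}_2^F - 2\boldsymbol{\gamma}_5^F$ will not be equal to $\boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L$. To see this, note that the boldfaced parameters in (8) and (8a) represent deviations from the means $\boldsymbol{\beta}_0^L$ and $\boldsymbol{\beta}_0^F$ in Eqs. (5) and (5a), respectively. Both (5) and (5a) are estimable because of the constraint that the deviation for the fourth cohort is equal to 0; that is, $\gamma_4^L = 0$ and $\gamma_4^F = 0$. As a consequence of using the same constraint in (5) and (5a), all $\boldsymbol{\alpha}$ estimates with the same subscript are equal in both equations; the same holds for all $\boldsymbol{\beta}$ estimates with the same subscript and for all $\boldsymbol{\gamma}$ estimates with the same subscript. For example, $\boldsymbol{\alpha}_2^L = \boldsymbol{\alpha}_2^F$. Also, $\boldsymbol{\alpha}_3^L$ (to be derived as $-\boldsymbol{\alpha}_1^L - \boldsymbol{\alpha}_2^L$) is equal to $\boldsymbol{\alpha}_3^F$, as estimated with Eq. (5a). The estimates of the means $\boldsymbol{\beta}_0^L$ and $\boldsymbol{\beta}_0^F$ are equal as well. Thus, proving that $-\boldsymbol{\alpha}_3^F + \boldsymbol{\beta}_3^F + \boldsymbol{\gamma}_2^F - 2\boldsymbol{\gamma}_5^F \neq \boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L$ boils down to proving that $-\boldsymbol{\alpha}_3^L + \boldsymbol{\beta}_3^L + \boldsymbol{\gamma}_2^L - 2\boldsymbol{\gamma}_5^L \neq \boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L$. In this last inequality, the expression to the left of the inequality sign contains different deviations from the mean $\boldsymbol{\beta}_0^L$ than the expression to the right. As a consequence, the values of $\hat{\gamma}_4^F$ and $\hat{\gamma}_4^L$ will usually differ for actual data. Also, the parameter estimates that depend on the values of $\hat{\gamma}_4^F$ and $\hat{\gamma}_4^L$ will generally be different: for example, $\hat{\alpha}_1^L = \boldsymbol{\alpha}_1^L - \hat{\gamma}_4^L$, whereas, using $\hat{\alpha}_3^F = \boldsymbol{\alpha}_3^F + \hat{\gamma}_4^F$ from the substitution of (3a), $\hat{\alpha}_1^F = -\hat{\alpha}_2^F - \hat{\alpha}_3^F = -\boldsymbol{\alpha}_2^F - \boldsymbol{\alpha}_3^F - \hat{\gamma}_4^F = \boldsymbol{\alpha}_1^F - \hat{\gamma}_4^F = \boldsymbol{\alpha}_1^L - \hat{\gamma}_4^F$.³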

Appendix B

In this appendix, we show why with dummy variable coding, the IE estimates generally differ from the “default” IE estimates Yang et al. (2004) presented. Instead of standard 0 and 1 coded dummy variables, we subtract $1/k$ from these dummy variables, with $k$ denoting the number of categories (that is, 3 for age and period, and 5 for cohort). Subtracting the constant $1/k$ from the original 0 and 1 coded dummy variables does not change the interpretation of their regression coefficients (i.e., the deviation from the omitted category). If we omit the last age category, the dummy variables $A_1^D$ and $A_2^D$ for the first two ages have the coding scheme shown in Table 2.

Note that in Table 2, the sum over the three ages for each of the dummy variables (column sum) is 0, just as with effect coding. As a result, the intercept in Eq. (2b) is equal to the intercept in Eq. (2), both representing the unweighted mean of the three predicted values for ages 1, 2, and 3 (controlling for period and cohort). With the last categories omitted, we obtain the following equation to be estimated:

$$Y = \beta_0^D + \alpha_1^D A_1^D + \alpha_2^D A_2^D + \beta_1^D P_1^D + \beta_2^D P_2^D + \gamma_1^D C_1^D + \gamma_2^D C_2^D + \gamma_3^D C_3^D + \gamma_4^D C_4^D + e. \tag{2b}$$
To keep notation parsimonious, we do not use superscript $DL$ to explicitly indicate that the last dummy variable–coded category is the reference. Like Eqs. (2) and (2a), the new Eq. (2b) also suffers from the identification problem. For example, we can write the following for $C_4^D$:

$$C_4^D = -2A_1^D - A_2^D + 2P_1^D + P_2^D - 4C_1^D - 3C_2^D - 2C_3^D. \tag{3b}$$
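If we substitute in (2b) for $C_4^D$ the expression given in (3b), we obtain

$$Y = \beta_0^D + (\alpha_1^D - 2\gamma_4^D) A_1^D + (\alpha_2^D - \gamma_4^D) A_2^D + (\beta_1^D + 2\gamma_4^D) P_1^D + (\beta_2^D + \gamma_4^D) P_2^D + (\gamma_1^D - 4\gamma_4^D) C_1^D + (\gamma_2^D - 3\gamma_4^D) C_2^D + (\gamma_3^D - 2\gamma_4^D) C_3^D + e. \tag{4b}$$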
Recall that for effect coding with the last category omitted, we used the constraint that $\gamma_4^L = 0$ in Eq. (2), leading to the estimable Eq. (5). This assumption implies that the deviation of cohort 4 from the mean is 0. Because the deviation of cohort 5 from the mean equals $\boldsymbol{\gamma}_5^L$, the assumption $\gamma_4^L = 0$ implies that the deviation of cohort 4 from cohort 5 equals $-\boldsymbol{\gamma}_5^L$ (the value of $-\boldsymbol{\gamma}_5^L$ can be derived from the estimates of Eq. (5): $-\boldsymbol{\gamma}_5^L = \boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L + \boldsymbol{\gamma}_3^L + \gamma_4^L = \boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L + \boldsymbol{\gamma}_3^L$). In terms of the regression coefficients of Eq. (2b), the above implies: $\gamma_4^D = -\boldsymbol{\gamma}_5^L$. If, in Eq. (2b), we plug in $-\boldsymbol{\gamma}_5^L$ for $\gamma_4^D$, substitute the expression for $C_4^D$ given in (3b), and rearrange terms, we obtain the following estimable equation:⁴
Because Eqs. (5) and (5b) are based on the same constraint with respect to cohort 4, the effect coding parameters in (5) can be translated into deviations from the reference categories resulting from estimating (5b). This is relevant when we compare the effect-coded IE with the dummy variable–coded IE later in this section. The parameters in (4b) and (5b) are related as follows:
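Having an estimate for the parameter of the fourth cohort leads to estimates for the remaining parameters (except for $\beta_0^D$). The IE now employs the criterion of minimizing the following sum of squares:

$$(\hat{\alpha}_1^D)^2 + (\hat{\alpha}_2^D)^2 + (\hat{\beta}_1^D)^2 + (\hat{\beta}_2^D)^2 + (\hat{\gamma}_1^D)^2 + (\hat{\gamma}_2^D)^2 + (\hat{\gamma}_3^D)^2 + (\hat{\gamma}_4^D)^2.$$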
This is a completely different criterion than the one used in the effect-coded IE, proposed by Yang et al. (2004). Not only are more parameters (eight instead of five) involved in the sum of squares to be minimized, but the parameters also have a different meaning (i.e., distances to the reference category instead of distances to the mean). For the preceding sum of squares, we can write
Taking the first-order derivative of this sum of squares with respect to $\hat{\gamma}_4^D$ and setting it to 0 finally leads to the following IE estimate for $\gamma_4^D$:
Because the boldfaced parameters in this expression for $\hat{\gamma}_4^D$ are compatible with the ones in Eq. (5), we can formulate $\hat{\gamma}_4^D$ in terms of the effect coding parameters of Eq. (5). For example, coefficient $\boldsymbol{\alpha}_1^D$, which is the deviation of age 1 from reference category age 3, can be written as the difference of the two corresponding effect coding parameters: namely, $\boldsymbol{\alpha}_1^L - \boldsymbol{\alpha}_3^L$.⁵ Building on this relation between the coefficients of effect and dummy coding, we can write the above estimate of $\hat{\gamma}_4^D$ in terms of the effect coding parameters of Eq. (5):
Using our knowledge that $\boldsymbol{\alpha}_3^L = -\boldsymbol{\alpha}_1^L - \boldsymbol{\alpha}_2^L$, $\boldsymbol{\beta}_3^L = -\boldsymbol{\beta}_1^L - \boldsymbol{\beta}_2^L$, and $\boldsymbol{\gamma}_5^L = -\boldsymbol{\gamma}_1^L - \boldsymbol{\gamma}_2^L - \boldsymbol{\gamma}_3^L$, we finally get the following:

If we compare this IE estimator of $\hat{\gamma}_4^D$ with $\hat{\gamma}_4^L = (\boldsymbol{\alpha}_1^L - \boldsymbol{\beta}_1^L + 2\boldsymbol{\gamma}_1^L + \boldsymbol{\gamma}_2^L)/8$ of Eq. (8), it is obvious that the IE estimate for the fourth cohort effect will usually differ between the effect-coded IE as proposed by Yang et al. (2004) and the dummy variable–coded IE (with the last categories as the omitted ones for both codings). The same holds for the IE estimates of the remaining categories of the three APC variables. Similar to the IE with effect coding, each triplet of reference categories of age, period, and cohort leads to different estimates when dummy variable coding is used, which we do not elaborate here.



¹ For the APC models that we discuss in this article, having the same number of units in each combination of age and period is not required.


² We determined the linear trend in, for example, age by a simple ordinary least squares (OLS) regression on the 19 age estimates. We used the highest and lowest linear trends only to demonstrate the variability in IE estimates when different sets of categories are omitted.


³ The user-written Stata routine apc_ie provides deviation contrast IE estimates with only the last categories omitted. To obtain estimates with the first categories omitted, one could “mirror” the three APC variables so that the highest age, period, and cohort values become the lowest.

⁴ To estimate the parameters in Eq. (5b) with superscript D, one could first estimate the following:

Then, to obtain, for example, the value of γ3D, calculate γ3D  =  γ3* + 2γ5L.


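⁵ For any two ages $i$ and $j$, the predicted values (controlling for period and cohort) based on Eq. (5) are $\boldsymbol{\beta}_0^L + \boldsymbol{\alpha}_i^L$ and $\boldsymbol{\beta}_0^L + \boldsymbol{\alpha}_j^L$. The difference between these predictions equals $\boldsymbol{\alpha}_i^L - \boldsymbol{\alpha}_j^L$, which represents the deviation of age $i$ from age $j$ and hence is equal to $\boldsymbol{\alpha}_i^D$.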


Fienberg, S. E. (2013). Cohort analysis’ unholy quest: A discussion. Demography. doi:10.1007/s13524-013-0251-z
Hardy, M. A. (1993). Regression with dummy variables. Newbury Park, CA.
Held, L., & Riebler, A. (2013). Comment on “Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort (APC) problem.” Demography. doi:10.1007/s13524-013-0255-8
Luo, L. (2013). Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort (APC) problem. Demography. doi:10.1007/s13524-013-0243-z
O’Brien, R. M. (2013). Comment of Liying Luo’s article, “Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort (APC) problem.” Demography. doi:10.1007/s13524-013-0250-0
Yang, Y. C., Fu, W. J., & Land, K. C. (2004). A methodological comparison of age-period-cohort models: The intrinsic estimator and conventional generalized linear models. Sociological Methodology. doi:10.1111/j.0081-1750.2004.00148.x
Yang, Y. C., & Land, K. C. (2013). Misunderstandings, mischaracterizations, and the problematic choice of a specific instance in which the IE should never be applied. Demography. doi:10.1007/s13524-013-0254-9
Yang, Y. C., Schulhofer-Wohl, S., Fu, W. J., & Land, K. C. (2008). The intrinsic estimator for age-period-cohort analysis: What it is and how to use it. American Journal of Sociology. doi:10.1086/587154