## Abstract

This article explores an important property of the intrinsic estimator that has received no attention in the literature: the age, period, and cohort estimates of the intrinsic estimator are not unique but vary with the parameterization and reference categories chosen for these variables. We give a formal proof of the non-uniqueness property for effect coding and dummy variable coding. Using data on female mortality in the United States over the years 1960–1999, we show that the variation in the results obtained for different parameterizations and reference categories is substantial and leads to contradictory conclusions. We conclude that the non-uniqueness property is a new argument for not routinely applying the intrinsic estimator.

## Introduction

The December 2013 issue of this journal included contributions addressing the intrinsic estimator (IE). Fienberg (2013), Held and Riebler (2013), Luo (2013), and O’Brien (2013) all seem to agree that the IE is of rather limited value for simultaneously estimating age, period, and cohort effects. Luo (2013), for instance, demonstrated that IE estimates are biased when the true parameters of age, period, and cohort show a linear trend that diverges from the one implied by the IE constraint. She concluded that the IE should not be used. In contrast, in their reply to Luo, the IE developers Yang and Land (2013) remained convinced of the IE’s potential for practical research.

What conclusion should one draw from these diverging expert viewpoints on the IE? Should researchers still consider using the IE, provided that its application is validated by the three-step procedure proposed by Yang and Land, or should they abandon the IE in favor of other models?

In this article, we demonstrate the non-uniqueness of the IE, an important property of the IE that has been overlooked in the aforementioned discussion. This property presents a new perspective on the IE method and may have consequences for future use.

We first explain the IE without the use of matrix algebra. Next, we prove the non-uniqueness property. Finally, we show the different results obtained by applying different IE solutions to fictitious data as well as to real data on female mortality. Based on our findings, we recommend that researchers not apply the IE routinely to their data, even if the three-step procedure suggested by Yang and Land (2013) would justify its use.

## The Intrinsic Estimator

We explain the IE using an example with three age categories, three periods, and five cohorts. The model can be written as

$$Y = \beta_{0} + \sum_{i=1}^{3}\alpha_{i}A_{i} + \sum_{j=1}^{3}\beta_{j}P_{j} + \sum_{k=1}^{5}\gamma_{k}C_{k} + e, \tag{1}$$

where *Y* denotes the value of the dependent variable for a given unit (typically a person), and *A*_{i}, *P*_{j}, and *C*_{k} represent independent dummy variables indicating whether the unit belongs to age *i*, period *j*, and cohort *k*. Further, β_{0} denotes the intercept, and *e* represents the unit’s error term. To estimate the parameters α_{i}, β_{j}, and γ_{k}, we follow Yang et al. (2004) and apply the following constraints:

$$\sum_{i=1}^{3}\alpha_{i} = \sum_{j=1}^{3}\beta_{j} = \sum_{k=1}^{5}\gamma_{k} = 0.$$

For age, this constraint implies α_{3} = − α_{1} − α_{2}; hence, we can rewrite the age effects in Eq. (1) as follows:

$$\alpha_{1}A_{1} + \alpha_{2}A_{2} + \alpha_{3}A_{3} = \alpha_{1}(A_{1} - A_{3}) + \alpha_{2}(A_{2} - A_{3}).$$

In this expression, α_{3} is omitted; its value can be directly derived from α_{1} and α_{2}. In addition, the three age dummy variables in (1) have been replaced by the two differences *A*_{1} – *A*_{3} and *A*_{2} – *A*_{3}. In the same way, we can substitute − β_{1} − β_{2} for β_{3} and − γ_{1} − γ_{2} − γ_{3} − γ_{4} for γ_{5} in (1). The resulting equation then is

$$Y = \beta_{0}^{L} + \alpha_{1}^{L}A_{1}^{L} + \alpha_{2}^{L}A_{2}^{L} + \beta_{1}^{L}P_{1}^{L} + \beta_{2}^{L}P_{2}^{L} + \gamma_{1}^{L}C_{1}^{L} + \gamma_{2}^{L}C_{2}^{L} + \gamma_{3}^{L}C_{3}^{L} + \gamma_{4}^{L}C_{4}^{L} + e, \tag{2}$$

where the superscript *L* denotes the elimination of the effects of the last age, period, and cohort categories (i.e., α_{3}^{L}, β_{3}^{L}, and γ_{5}^{L}) from the equation. Note that *A*_{i}^{L} = *A*_{i} − *A*_{3}, *P*_{j}^{L} = *P*_{j} − *P*_{3}, and *C*_{k}^{L} = *C*_{k} − *C*_{5}. The variables *A*_{i}^{L}, *P*_{j}^{L}, and *C*_{k}^{L} take the value of 1 if a case belongs to the subscripted age, period, and cohort; the value of 0 if it does not; and the value of –1 if the case belongs to the last (omitted) category. According to these codings, the expectation of *Y* for the three age categories equals β_{0}^{L} + α_{1}^{L}, β_{0}^{L} + α_{2}^{L}, and β_{0}^{L} − α_{1}^{L} − α_{2}^{L} when controlling for period and cohort. The mean of these three expectations equals β_{0}^{L}. The mean of the expectations of *Y* for the three periods and for the five cohorts also equals β_{0}^{L}. Consequently, the parameters α_{i}^{L}, β_{j}^{L}, and γ_{k}^{L} represent deviations from the mean, β_{0}^{L}, for the given categories of age, period, and cohort. This type of parameterization is known as “effect coding” (Hardy 1993).

### The IE Constraint

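The parameters in Eq. (2) cannot be estimated uniquely because of the linear dependency between age, period, and cohort. For example, the variable *C*_{4}^{L} can be written as a perfect linear combination of other variables in (2):

$$C_{4}^{L} = A_{1}^{L} - P_{1}^{L} + 2C_{1}^{L} + C_{2}^{L}. \tag{3}$$

If we substitute for *C*_{4}^{L} the expression given in Eq. (3) and rearrange terms, we obtain the following:

$$Y = \beta_{0}^{L} + (\alpha_{1}^{L} + \gamma_{4}^{L})A_{1}^{L} + \alpha_{2}^{L}A_{2}^{L} + (\beta_{1}^{L} - \gamma_{4}^{L})P_{1}^{L} + \beta_{2}^{L}P_{2}^{L} + (\gamma_{1}^{L} + 2\gamma_{4}^{L})C_{1}^{L} + (\gamma_{2}^{L} + \gamma_{4}^{L})C_{2}^{L} + \gamma_{3}^{L}C_{3}^{L} + e, \tag{4}$$

which can be written as

$$Y = \boldsymbol{\beta}_{0}^{L} + \boldsymbol{\alpha}_{1}^{L}A_{1}^{L} + \boldsymbol{\alpha}_{2}^{L}A_{2}^{L} + \boldsymbol{\beta}_{1}^{L}P_{1}^{L} + \boldsymbol{\beta}_{2}^{L}P_{2}^{L} + \boldsymbol{\gamma}_{1}^{L}C_{1}^{L} + \boldsymbol{\gamma}_{2}^{L}C_{2}^{L} + \boldsymbol{\gamma}_{3}^{L}C_{3}^{L} + e. \tag{5}$$

Because *C*_{4}^{L} has been eliminated in Eq. (5), the perfect dependency no longer exists; hence, the regression parameters in (5) can be estimated. To emphasize that the parameters in (5) are estimable, we use boldfaced characters. For the parameters in (4) and (5), the following equalities hold:

$$\begin{aligned}
\boldsymbol{\beta}_{0}^{L} &= \beta_{0}^{L}, & \boldsymbol{\alpha}_{1}^{L} &= \alpha_{1}^{L} + \gamma_{4}^{L}, & \boldsymbol{\alpha}_{2}^{L} &= \alpha_{2}^{L}, & \boldsymbol{\beta}_{1}^{L} &= \beta_{1}^{L} - \gamma_{4}^{L}, \\
\boldsymbol{\beta}_{2}^{L} &= \beta_{2}^{L}, & \boldsymbol{\gamma}_{1}^{L} &= \gamma_{1}^{L} + 2\gamma_{4}^{L}, & \boldsymbol{\gamma}_{2}^{L} &= \gamma_{2}^{L} + \gamma_{4}^{L}, & \boldsymbol{\gamma}_{3}^{L} &= \gamma_{3}^{L}.
\end{aligned} \tag{6}$$

For example, the parameter **α**_{1}^{L} in (5) equals α_{1}^{L} + γ_{4}^{L} in (4), that is, α_{1}^{L} = **α**_{1}^{L} − γ_{4}^{L}. Similarly, the other three parameters (β_{1}^{L}, γ_{1}^{L}, and γ_{2}^{L}) are related to γ_{4}^{L}. In total then, we have

$$\begin{aligned}
\alpha_{1}^{L} &= \boldsymbol{\alpha}_{1}^{L} - \gamma_{4}^{L}, & \beta_{1}^{L} &= \boldsymbol{\beta}_{1}^{L} + \gamma_{4}^{L}, \\
\gamma_{1}^{L} &= \boldsymbol{\gamma}_{1}^{L} - 2\gamma_{4}^{L}, & \gamma_{2}^{L} &= \boldsymbol{\gamma}_{2}^{L} - \gamma_{4}^{L}.
\end{aligned} \tag{7}$$

Once the estimate of γ_{4}^{L} is found, the estimates of α_{1}^{L}, β_{1}^{L}, γ_{1}^{L}, and γ_{2}^{L} simply follow, given the estimates of the boldfaced parameters. However, because of the dependencies in (7), an extra constraint is needed to obtain estimates of α_{1}^{L}, β_{1}^{L}, γ_{1}^{L}, γ_{2}^{L}, and γ_{4}^{L}. The constraint that the IE applies consists of the minimization of the sum of squares of these five parameters. For the sum of the squared estimates of these parameters, we can write:

$$(\hat{\alpha}_{1}^{L})^{2} + (\hat{\beta}_{1}^{L})^{2} + (\hat{\gamma}_{1}^{L})^{2} + (\hat{\gamma}_{2}^{L})^{2} + (\hat{\gamma}_{4}^{L})^{2} = (\boldsymbol{\alpha}_{1}^{L} - \hat{\gamma}_{4}^{L})^{2} + (\boldsymbol{\beta}_{1}^{L} + \hat{\gamma}_{4}^{L})^{2} + (\boldsymbol{\gamma}_{1}^{L} - 2\hat{\gamma}_{4}^{L})^{2} + (\boldsymbol{\gamma}_{2}^{L} - \hat{\gamma}_{4}^{L})^{2} + (\hat{\gamma}_{4}^{L})^{2}.$$

Minimizing this sum of squares yields the following IE estimate for γ_{4}^{L}:

$$\hat{\gamma}_{4}^{L} = \frac{\boldsymbol{\alpha}_{1}^{L} - \boldsymbol{\beta}_{1}^{L} + 2\boldsymbol{\gamma}_{1}^{L} + \boldsymbol{\gamma}_{2}^{L}}{8}. \tag{8}$$

Plugging this estimate of $\hat{\gamma}_{4}^{L}$ into the expressions given in (7) for α_{1}^{L}, β_{1}^{L}, γ_{1}^{L}, and γ_{2}^{L} leads to the IE estimates for these four remaining parameters. To summarize, the IE estimates of the regression parameters in (2) are those OLS estimates that have the smallest sum of squares for the five collinear variables *A*_{1}^{L}, *P*_{1}^{L}, *C*_{1}^{L}, *C*_{2}^{L}, and *C*_{4}^{L}. In the next section, we will show that the IE estimates are not unique but depend on which categories of age, period, and cohort are omitted and on the type of parameterization that is used.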
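The steps of this section can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the three-by-three design with cohort determined by period minus age, uses hypothetical cell values, and obtains the IE as the minimum-norm least-squares solution returned by `numpy.linalg.lstsq`. It then compares the resulting estimate for the fourth cohort with the closed form (**α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L})/8, computed from a regression in which *C*_{4}^{L} is dropped.

```python
import numpy as np

# Toy design: 3 ages x 3 periods; cohort = period - age, shifted to 0..4.
age, per = [g.ravel() for g in np.meshgrid(np.arange(3), np.arange(3), indexing="ij")]
coh = per - age + 2

def effect_block(idx, n_cat):
    """Effect-coded columns with the LAST category omitted."""
    last = n_cat - 1
    return np.column_stack([(idx == c).astype(float) - (idx == last).astype(float)
                            for c in range(last)])

A, P, C = effect_block(age, 3), effect_block(per, 3), effect_block(coh, 5)
X = np.column_stack([np.ones(9), A, P, C])   # columns: 1, A1 A2, P1 P2, C1..C4

# Linear dependency: C4 = A1 - P1 + 2*C1 + C2.
dependency_holds = np.allclose(C[:, 3], A[:, 0] - P[:, 0] + 2 * C[:, 0] + C[:, 1])

# Hypothetical values, one per age/period cell, for illustration only.
y = np.array([11.0, 9.0, 13.0, 12.0, 10.0, 8.0, 14.0, 9.0, 12.0])

# Estimable (boldfaced) parameters: regression without the C4 column.
bold = np.linalg.lstsq(X[:, :8], y, rcond=1e-10)[0]
gamma4_closed_form = (bold[1] - bold[3] + 2 * bold[5] + bold[6]) / 8

# IE: among all least-squares solutions of the full model, the one with
# the smallest norm (lstsq returns it for rank-deficient designs).
ie = np.linalg.lstsq(X, y, rcond=1e-10)[0]

print(dependency_holds, gamma4_closed_form, ie[8])
```

Minimizing the norm of the whole coefficient vector is equivalent to minimizing the sum of squares of the five collinear parameters, because the intercept and the middle categories are the same in every least-squares solution.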

## The Non-uniqueness of the Intrinsic Estimator

In the previous section, we showed that the IE obtains estimates with a minimum sum of squares of those parameters that are not identifiable because of collinearity. The IE estimates, however, depend on both the choice of omitted categories and the type of parameterization applied. As an example of the dependence on omitted categories, we derive the IE estimates when the first categories of age, period, and cohort are omitted instead of the last ones, the latter being the default in Yang et al. (2004).

### Effect Coding With First Categories Omitted

When the first categories of age, period, and cohort are omitted and effect coding is used, the IE estimate of the deviation of the fourth cohort from the mean is

$$\hat{\gamma}_{4}^{F} = \frac{-\boldsymbol{\alpha}_{3}^{F} + \boldsymbol{\beta}_{3}^{F} + \boldsymbol{\gamma}_{2}^{F} - 2\boldsymbol{\gamma}_{5}^{F}}{8}, \tag{8a}$$

where the superscript *F* denotes that the first categories are omitted (see Appendix A for the proof of (8a)). For the fourth cohort, the deviation from the mean equals $\hat{\gamma}_{4}^{F}$, which differs from the deviation $\hat{\gamma}_{4}^{L}$ when the last categories were omitted. In contrast, the estimate of the mean is the same regardless of whether it is the last or the first categories that are omitted. In identified models, changing the omitted category has no impact on the deviations from the mean, but here it does. As a demonstration, we applied the default IE with the last categories omitted, as well as the alternative IE with the first categories omitted, to a fictitious data set for three ages, three periods, and five cohorts. For each of the nine age/period combinations, we simulated data for 100 people.^{1} Table 1 contains the mean of the dependent variable for each combination. We used linear regression, the results of which are shown in Fig. 1.

In Fig. 1, the age and period trends of both IE solutions differ when moving from year 2 to year 3: the solid line shows an increase in the predicted *Y* value for age and a decrease for period, whereas the dashed line shows the opposite. For cohort, moving from year 1 to year 2 results in an increase or a decrease of the prediction, depending on the omitted categories chosen. Notice that for the midpoints of age, period, and cohort, both IE solutions yield the same predictions. This is congruent with Eq. (6), which shows that the estimates for the middle categories are identifiable without additional restrictions.
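The comparison between the two IE solutions can be reproduced in outline with a short sketch. It is an illustration under stated assumptions rather than the analysis behind Fig. 1: the nine cell values are hypothetical (they are not the data of Table 1), the function name `ie_effect_coded` is ours, and the IE is taken to be the minimum-norm least-squares solution for the chosen effect-coded design.

```python
import numpy as np

N = (3, 3, 5)                                   # ages, periods, cohorts
age, per = [g.ravel() for g in np.meshgrid(np.arange(3), np.arange(3), indexing="ij")]
idx = (age, per, per - age + 2)                 # cohort index 0..4

def ie_effect_coded(y, omitted):
    """Effect-coded IE for a triplet of omitted categories (0-based).
    Returns the intercept and the full age, period, and cohort effects."""
    blocks, kept_all = [], []
    for i, n, o in zip(idx, N, omitted):
        kept = [c for c in range(n) if c != o]
        blocks.append(np.column_stack(
            [(i == c).astype(float) - (i == o).astype(float) for c in kept]))
        kept_all.append(kept)
    X = np.column_stack([np.ones(len(y))] + blocks)
    b = np.linalg.lstsq(X, y, rcond=1e-10)[0]   # minimum-norm least-squares solution
    effects, pos = [], 1
    for n, o, kept in zip(N, omitted, kept_all):
        full = np.zeros(n)
        full[kept] = b[pos:pos + len(kept)]
        full[o] = -full.sum()                   # omitted effect from the sum-to-zero constraint
        effects.append(full)
        pos += len(kept)
    return b[0], effects

# Hypothetical cell means, ordered age-major (age 1: periods 1-3, then age 2, ...).
y = np.array([13.0, 12.0, 12.0, 6.0, 11.0, 10.0, 9.0, 7.0, 12.0])

mean_L, (age_L, per_L, coh_L) = ie_effect_coded(y, omitted=(2, 2, 4))  # last omitted
mean_F, (age_F, per_F, coh_F) = ie_effect_coded(y, omitted=(0, 0, 0))  # first omitted

print("mean    :", mean_L, mean_F)
print("age    L:", age_L, " F:", age_F)
print("period L:", per_L, " F:", per_F)
print("cohort L:", coh_L, " F:", coh_F)
```

In this toy example the mean and the middle categories agree across the two solutions, whereas the first and last categories of each variable do not, which mirrors the pattern described for Fig. 1.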

Because the type of parameterization in the default IE is effect coding with the unweighted mean as reference point, the sensitivity will be lower the more categories of age, period, and cohort are available. This is illustrated in Fig. 2, which is based on the same data set on U.S. female mortality as Yang et al. (2004, 2008) used. These data are from the Berkeley Human Mortality Database and include 19 age categories, 8 periods, and 26 birth cohorts.

For each of the possible 19 × 8 × 26 = 3,952 triplets of omitted categories, we calculated the corresponding set of effect-coded IE estimates, using the Poisson regression model that Yang et al. (2004, 2008) used. In Fig. 2 we show the triplets with the lowest and the highest linear trends in age, period, and cohort, as well as the default IE (last categories omitted).^{2} To find the triplets with lowest and highest linear trends, we implemented the principal component regression method in the software package R. As Fig. 2 demonstrates, it is of almost no importance which triplet of categories is omitted, the general tendency being more or less the same. In sum, the IE estimates with effect coding depend on the choice of which categories are omitted. This undesired sensitivity becomes less strong the more categories of age, cohort, and period are being used—an important consideration given that data limitations may force researchers to work with a limited set of periods, while the set of age and cohort categories is a lesser problem under normal circumstances. In any event, it seems advisable to check for sensitivity to the chosen omitted categories when using the IE and effect coding.
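The enumeration over triplets of omitted categories can be sketched for the small three-by-three design. This is a simplified stand-in for the analysis above, not a reproduction of it: the article used Poisson regression on the mortality data and principal component regression in R, whereas the sketch starts from one hypothetical set of sum-to-zero estimates (`theta`) for a linear model and moves along the direction `w` in which least-squares solutions of that design can differ. Both vectors and the function names are ours. For each triplet, the effect-coded IE minimizes the sum of squares over the categories that are not omitted.

```python
import numpy as np
from itertools import product

# One least-squares solution in sum-to-zero form (hypothetical numbers):
# 3 age effects, 3 period effects, 5 cohort effects.
theta = np.array([2, -1, -1,   -1, 0, 1,   1, -2, 2, 0, -1], dtype=float)
# Direction along which solutions of the 3 x 3 design differ
# (centered age, minus centered period, centered cohort, up to sign).
w = np.array([-1, 0, 1,   1, 0, -1,   -2, -1, 0, 1, 2], dtype=float)
offsets = (0, 3, 6)                      # start of the age, period, cohort blocks

def ie_for_triplet(omitted):
    """Effect-coded IE when the given (age, period, cohort) categories are omitted:
    minimize the sum of squares of the estimates of the categories that are kept."""
    keep = np.ones(theta.size, dtype=bool)
    keep[[off + o for off, o in zip(offsets, omitted)]] = False
    t = -(theta[keep] @ w[keep]) / (w[keep] @ w[keep])
    return theta + t * w

def slope(values):
    """OLS slope of the estimates on equally spaced time points."""
    return np.polyfit(np.arange(len(values)), values, 1)[0]

results = {o: ie_for_triplet(o) for o in product(range(3), range(3), range(5))}
age_slopes = {o: slope(est[:3]) for o, est in results.items()}
lowest = min(age_slopes, key=age_slopes.get)
highest = max(age_slopes, key=age_slopes.get)

print(len(results), "triplets")
print("lowest age trend :", lowest, age_slopes[lowest])
print("highest age trend:", highest, age_slopes[highest])
```

With only three categories of age and period, the range of linear trends across triplets is comparatively wide; with many categories, each omitted category carries less weight in the criterion, which is consistent with the smaller sensitivity reported for Fig. 2.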

### Dummy Variable Coding

We have shown that the IE estimates depend on the choice of which categories are omitted. In more general terms, the IE estimates depend on the design matrix that is chosen before estimation takes place. This matrix is defined by the choice of omitted categories and the type of parameterization one wishes to apply. For instance, a researcher may be interested in developments from the first period, youngest age, and/or oldest cohort onward. In that case, a dummy variable parameterization may be appropriate, with the first categories (rather than the means) as points of reference. In identified models, results from any parameterization can be transformed into the results from any other parameterization. However, for IE models this is not true: a solution obtained with effect coding is different from a solution based on dummy variable coding. Interestingly, although these solutions yield different estimates, they both have the IE properties Yang et al. (2004) outlined: each minimizes its own particular sum of squared estimates, and the associated standard errors of these estimates are minimal. Appendix B contains the proof that IE estimates with dummy variable coding as a rule differ from the default IE estimates using effect coding.
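That the design matrix matters can be checked directly. The sketch below is illustrative only: it reuses hypothetical cell values for the three-by-three design, codes the dummy variables as 0/1 minus 1/*k* (as in Appendix B), treats the IE as the minimum-norm least-squares solution, and uses names of our own choosing. It fits the same data under effect coding and under dummy variable coding, both with the last categories omitted, and converts the effect-coded cohort estimates into contrasts with the last cohort so that the two sets can be compared.

```python
import numpy as np

N = (3, 3, 5)                                   # ages, periods, cohorts
age, per = [g.ravel() for g in np.meshgrid(np.arange(3), np.arange(3), indexing="ij")]
idx = (age, per, per - age + 2)

def design(coding):
    """Design matrix with the last categories omitted.
    'effect': kept dummy minus the omitted dummy; 'dummy': kept dummy minus 1/k."""
    cols = [np.ones(9)]
    for i, n in zip(idx, N):
        for c in range(n - 1):
            d = (i == c).astype(float)
            if coding == "effect":
                cols.append(d - (i == n - 1).astype(float))
            else:
                cols.append(d - 1.0 / n)
    return np.column_stack(cols)

def min_norm_fit(X, y):
    """Minimum-norm least-squares solution, used here as the IE for design X."""
    return np.linalg.lstsq(X, y, rcond=1e-10)[0]

# Hypothetical cell means, ordered age-major.
y = np.array([13.0, 12.0, 12.0, 6.0, 11.0, 10.0, 9.0, 7.0, 12.0])

b_eff = min_norm_fit(design("effect"), y)
b_dum = min_norm_fit(design("dummy"), y)

# Columns 5..8 hold the four cohort coefficients in both designs.
coh_eff = b_eff[5:9]
coh5_eff = -coh_eff.sum()                       # omitted cohort under effect coding
contrast_from_effect = coh_eff - coh5_eff       # deviations from the last cohort
contrast_from_dummy = b_dum[5:9]

same_fit = np.allclose(design("effect") @ b_eff, design("dummy") @ b_dum)
print(same_fit)
print(contrast_from_effect)
print(contrast_from_dummy)
```

Both solutions reproduce the same fitted values, yet the cohort contrasts differ. In an identified model the conversion would give identical numbers.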

Using the U.S. female mortality database, we will now demonstrate how, with dummy variable coding, the IE estimates vary with the reference categories chosen. Again, for each of all possible 19 × 8 × 26 = 3,952 triplets of omitted categories, we obtained the corresponding set of dummy variable–coded IE estimates. Next, we selected the sets with the lowest and the highest linear tendency in age, period, and cohort, which were found for the triplets (19, 5, 14) and (10, 4, 26). These two sets of IE estimates are plotted in Fig. 3 along with the estimates when the first categories act as points of reference.

Figure 3 shows that for age and even more so for period and cohort, the differences between the three dummy variable–coded IE solutions are substantial. For age, the dashed curve predicts less mortality at the end of the life cycle than at the beginning, whereas the other two IEs show the opposite. For period, the dotted IE and the IE with the first categories omitted show a significant decline in mortality over the years, whereas for the dashed IE mortality increases. For cohort, the dotted IE estimates show little variation in mortality over the years, whereas the other two IEs suggest a small and big decline in mortality risk, respectively. Comparing Figs. 2 and 3 shows that dummy variable coding produces more variability in the estimates of age, period, and cohort than does effect coding. Further, choosing the first age, period, and cohort year as (starting) points of reference rather than the default IE, a researcher would obtain different results, particularly with respect to the period (finding a decrease instead of a slight increase in mortality) and the cohort trend (finding a less steep decline in mortality).

## Discussion

In this article, we showed that there is no unique set of IE estimates but that there exist many, each corresponding to a particular type of parameterization and a particular triplet of omitted categories. Using fictitious and real data, we demonstrated that different IEs can lead to different conclusions about age, period, and cohort trends. With many time points, the IE using effect coding seems relatively robust to the choice of omitted categories. However, with a limited number of time points, which will most likely occur for period in actual research, the effect-coded IE can lead to quite different results for different omitted categories. The IE based on dummy coding seems more sensitive to the choice of categories that act as points of reference, even if the number of time points under study is large.

A consequence of our findings is that IE users may have to consider which IE best fits their needs. From a mathematical point of view, it is difficult, if not impossible, to prefer one particular parameterization and set of omitted categories: each IE has desirable properties with respect to a different set of parameters, determined by the chosen parameterization and omitted categories (Yang et al. 2004). The default IE Yang et al. (2004) proposed is special in the sense that it minimizes the sum of squared deviations from the mean (i.e., the variance of the estimates). Some researchers may consider this a desirable property: if one is completely agnostic about age, period, and cohort influences, one may prefer a “conservative” estimation method, which favors a small variance. Yet, in our opinion, other parameterizations can be equally valid. For example, other researchers may favor estimates that show the smallest (sum of squared) changes compared with some reference year of age, period, and cohort and therefore choose a dummy variable parameterization. Yet other researchers may prefer estimates that show the smallest (sum of squared) changes compared with the immediately preceding time point and hence use so-called repeatedly coded dummy variables.

Our point is that if researchers decide to use the IE, they must be aware of the different possibilities that may lead to different results. For instance, Yang et al. (2004:100) stated that the reason for the slow or nonexistent increase in mortality for period in the middle panel of our Fig. 2 is not clear, and they hypothesized that it may be partly due to increasing rates of cigarette smoking in females. However, with the first categories as references, mortality decreases slightly over the four decades (as shown in the middle panel of Fig. 3), which seems equally, if not more, plausible. Because of such possibly divergent conclusions, we agree with Yang and Land (2013) that the IE should never be used routinely, but for an additional reason: even if the three-step procedure Yang and Land recommended is carefully conducted, the question concerning parameterization and omitted categories remains to be answered *before* applying the IE. In this respect, the IE is more similar to constrained generalized linear models (CGLIM) than one may think, given that both types of models depend on a constraint that has to be chosen *before* analyzing. In CGLIM, each pair of equaled categories corresponds to a different constraint and thus to different estimates. In the IE, each choice of parameterization and/or omitted categories corresponds to a different constraint and hence to different estimates.

Earlier in the article, we noted that one argument in favor of using the default IE is that researchers, from an agnostic point of view, may prefer estimates for the age, period, and cohort categories with the smallest variance possible. The minimization criterion of the default IE, however, does not involve the estimates of the omitted categories. Instead of minimizing $(\hat{\alpha}_{1}^{L})^{2}+(\hat{\beta}_{1}^{L})^{2}+(\hat{\gamma}_{1}^{L})^{2}+(\hat{\gamma}_{2}^{L})^{2}+(\hat{\gamma}_{4}^{L})^{2}$ in our fictitious example, one could minimize $(\hat{\alpha}_{1}^{L})^{2}+(\hat{\alpha}_{3}^{L})^{2}+(\hat{\beta}_{1}^{L})^{2}+(\hat{\beta}_{3}^{L})^{2}+(\hat{\gamma}_{1}^{L})^{2}+(\hat{\gamma}_{2}^{L})^{2}+(\hat{\gamma}_{4}^{L})^{2}+(\hat{\gamma}_{5}^{L})^{2}$, which is the sum of squares of estimates of *all* parameters that are not identified, including the three omitted categories. It is noteworthy that this criterion leads to estimates that are *independent* of the omitted categories, as opposed to the IE criterion.
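Why the extended criterion does not depend on the omitted categories can be seen in a short derivation for the fictitious example. The notation is ours: $\hat{\theta}^{(0)}$ stacks one arbitrary set of sum-to-zero estimates of the three age, three period, and five cohort effects, $w$ is the direction along which such sets can differ, and $t$ is a free scalar.

$$
\begin{aligned}
\hat{\theta}(t) &= \hat{\theta}^{(0)} + t\,w, \qquad
w = (\underbrace{-1,\ 0,\ 1}_{\text{age}},\ \underbrace{1,\ 0,\ -1}_{\text{period}},\ \underbrace{-2,\ -1,\ 0,\ 1,\ 2}_{\text{cohort}}), \\
S_{\text{all}}(t) &= \sum_{m=1}^{11} \bigl(\hat{\theta}^{(0)}_{m} + t\,w_{m}\bigr)^{2}, \\
\frac{dS_{\text{all}}}{dt} = 0 \;&\Longrightarrow\;
t^{*} = -\frac{\sum_{m} \hat{\theta}^{(0)}_{m} w_{m}}{\sum_{m} w_{m}^{2}}
      = -\frac{\hat{\theta}^{(0)\top} w}{14}.
\end{aligned}
$$

The entries with $w_{m} \neq 0$ are exactly the eight parameters in the extended sum of squares, so the middle categories drop out of the minimization. Nothing in $t^{*}$ refers to a choice of omitted categories. By contrast, the effect-coded IE restricts both sums to the categories that are kept, so its minimizer changes whenever a category with $w_{m} \neq 0$ is omitted.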

To summarize, we showed that the IE has a non-uniqueness property, raising the question of which IE to choose. When the researcher aims for minimum variance, the IE with effect coding would be the obvious choice; effect coding, however, does not provide the smallest variance across all APC categories because the omitted categories are not part of the constraint. When the researcher aims for the smallest (squared) deviations from some reference year, the dummy variable–coded IE, which we demonstrated in an application in this article, would be more appropriate. Our findings may have implications for past research using IE and for future research considering the use of IE.

We end by noting that in this article, we did not explore the issue of the IE being biased with respect to the “true” data-generating parameters. This has been discussed thoroughly in other contributions (see the December 2013 issue of *Demography*). Our findings demonstrate bias in the sense that different IEs lead to different results.

### Appendix A: Proof of the Non-uniqueness of the IE in Case of Effect Coding

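If the first instead of the last categories of age, period, and cohort are omitted, Eq. (2) becomes

$$Y = \beta_{0}^{F} + \alpha_{2}^{F}A_{2}^{F} + \alpha_{3}^{F}A_{3}^{F} + \beta_{2}^{F}P_{2}^{F} + \beta_{3}^{F}P_{3}^{F} + \gamma_{2}^{F}C_{2}^{F} + \gamma_{3}^{F}C_{3}^{F} + \gamma_{4}^{F}C_{4}^{F} + \gamma_{5}^{F}C_{5}^{F} + e, \tag{2a}$$

where the superscript *F* denotes that the first categories of age, period, and cohort are omitted. The independent variables in (2a) differ from those in (2) because they now take value –1 for cases in the first category of age, period, or cohort. Again, the variable for the fourth cohort can be expressed in terms of other variables in (2a):

$$C_{4}^{F} = -A_{3}^{F} + P_{3}^{F} + C_{2}^{F} - 2C_{5}^{F}. \tag{3a}$$

Substituting in (2a) for *C*_{4}^{F} the expression given in (3a) yields the following:

$$Y = \beta_{0}^{F} + \alpha_{2}^{F}A_{2}^{F} + (\alpha_{3}^{F} - \gamma_{4}^{F})A_{3}^{F} + \beta_{2}^{F}P_{2}^{F} + (\beta_{3}^{F} + \gamma_{4}^{F})P_{3}^{F} + (\gamma_{2}^{F} + \gamma_{4}^{F})C_{2}^{F} + \gamma_{3}^{F}C_{3}^{F} + (\gamma_{5}^{F} - 2\gamma_{4}^{F})C_{5}^{F} + e, \tag{4a}$$

which can be written as

$$Y = \boldsymbol{\beta}_{0}^{F} + \boldsymbol{\alpha}_{2}^{F}A_{2}^{F} + \boldsymbol{\alpha}_{3}^{F}A_{3}^{F} + \boldsymbol{\beta}_{2}^{F}P_{2}^{F} + \boldsymbol{\beta}_{3}^{F}P_{3}^{F} + \boldsymbol{\gamma}_{2}^{F}C_{2}^{F} + \boldsymbol{\gamma}_{3}^{F}C_{3}^{F} + \boldsymbol{\gamma}_{5}^{F}C_{5}^{F} + e. \tag{5a}$$

Equation (5a) is estimable because *C*_{4}^{F} is not part of the equation. Following the same line of reasoning as outlined in the main text for the last categories omitted, we now have to minimize the sum of squares $(\hat{\alpha}_{3}^{F})^{2}+(\hat{\beta}_{3}^{F})^{2}+(\hat{\gamma}_{2}^{F})^{2}+(\hat{\gamma}_{4}^{F})^{2}+(\hat{\gamma}_{5}^{F})^{2}$, which finally leads to the following IE estimator for γ_{4}^{F}:

$$\hat{\gamma}_{4}^{F} = \frac{-\boldsymbol{\alpha}_{3}^{F} + \boldsymbol{\beta}_{3}^{F} + \boldsymbol{\gamma}_{2}^{F} - 2\boldsymbol{\gamma}_{5}^{F}}{8}. \tag{8a}$$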

Apparently, the estimate $\hat{\gamma}_{4}^{F}$ depends on the value of − **α**_{3}^{F} + **β**_{3}^{F} + **γ**_{2}^{F} − 2**γ**_{5}^{F}, whereas the estimate $\hat{\gamma}_{4}^{L}$ depends on the value of **α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L}. In general, with actual data, − **α**_{3}^{F} + **β**_{3}^{F} + **γ**_{2}^{F} − 2**γ**_{5}^{F} will not be equal to **α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L}. To see this, note that the boldfaced parameters in (8) and (8a) represent deviations from the means **β**_{0}^{L} and **β**_{0}^{F} in Eqs. (5) and (5a), respectively. Both (5) and (5a) are estimable because of the constraint that the deviation for the fourth cohort is equal to 0; that is, γ_{4}^{L} = 0 and γ_{4}^{F} = 0. As a consequence of using the same constraint in (5) and (5a), all **α** estimates with the same subscript are equal in both equations; the same holds for all **β** estimates with the same subscript and for all **γ** estimates with the same subscript. For example, **α**_{2}^{L} = **α**_{2}^{F}. Also, **α**_{3}^{L} (to be derived as − **α**_{1}^{L} − **α**_{2}^{L}) is equal to **α**_{3}^{F}, as estimated with Eq. (5a). The estimates of the means **β**_{0}^{L} and **β**_{0}^{F} are equal as well. Thus, proving that − **α**_{3}^{F} + **β**_{3}^{F} + **γ**_{2}^{F} − 2**γ**_{5}^{F} ≠ **α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L} boils down to proving that − **α**_{3}^{L} + **β**_{3}^{L} + **γ**_{2}^{L} − 2**γ**_{5}^{L} ≠ **α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L}. In this last inequality, the expression to the left of the inequality sign contains different deviations from the mean **β**_{0}^{L} than the expression to the right. As a consequence, the values of $\hat{\gamma}_{4}^{F}$ and $\hat{\gamma}_{4}^{L}$ will usually differ for actual data.
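Also, the parameter estimates that depend on the values of $\hat{\gamma}_{4}^{F}$ and $\hat{\gamma}_{4}^{L}$ will generally be different: for example, $\hat{\alpha}_{1}^{L}=\boldsymbol{\alpha}_{1}^{L}-\hat{\gamma}_{4}^{L}$, whereas $\hat{\alpha}_{1}^{F}=-\hat{\alpha}_{2}^{F}-\hat{\alpha}_{3}^{F}=-\boldsymbol{\alpha}_{2}^{F}-\boldsymbol{\alpha}_{3}^{F}-\hat{\gamma}_{4}^{F}=\boldsymbol{\alpha}_{1}^{F}-\hat{\gamma}_{4}^{F}=\boldsymbol{\alpha}_{1}^{L}-\hat{\gamma}_{4}^{F}$.^{3}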
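The minimization behind the estimator for γ_{4}^{F} can be written out step by step. The following is our own working under the definitions of this appendix: with $g = \hat{\gamma}_{4}^{F}$, the dependency among the effect-coded variables with the first categories omitted gives $\hat{\alpha}_{3}^{F} = \boldsymbol{\alpha}_{3}^{F} + g$, $\hat{\beta}_{3}^{F} = \boldsymbol{\beta}_{3}^{F} - g$, $\hat{\gamma}_{2}^{F} = \boldsymbol{\gamma}_{2}^{F} - g$, and $\hat{\gamma}_{5}^{F} = \boldsymbol{\gamma}_{5}^{F} + 2g$.

$$
\begin{aligned}
S(g) &= (\boldsymbol{\alpha}_{3}^{F} + g)^{2} + (\boldsymbol{\beta}_{3}^{F} - g)^{2} + (\boldsymbol{\gamma}_{2}^{F} - g)^{2} + (\boldsymbol{\gamma}_{5}^{F} + 2g)^{2} + g^{2}, \\
\frac{dS}{dg} &= 2(\boldsymbol{\alpha}_{3}^{F} + g) - 2(\boldsymbol{\beta}_{3}^{F} - g) - 2(\boldsymbol{\gamma}_{2}^{F} - g) + 4(\boldsymbol{\gamma}_{5}^{F} + 2g) + 2g \\
&= 2\boldsymbol{\alpha}_{3}^{F} - 2\boldsymbol{\beta}_{3}^{F} - 2\boldsymbol{\gamma}_{2}^{F} + 4\boldsymbol{\gamma}_{5}^{F} + 16g = 0 \\
\Longrightarrow\quad g &= \frac{-\boldsymbol{\alpha}_{3}^{F} + \boldsymbol{\beta}_{3}^{F} + \boldsymbol{\gamma}_{2}^{F} - 2\boldsymbol{\gamma}_{5}^{F}}{8}.
\end{aligned}
$$

The second derivative equals 16, so the stationary point is the minimum. The same steps with the last categories omitted give the divisor 8 and the numerator **α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L}.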

### Appendix B

In this appendix, we show why with dummy variable coding, the IE estimates generally differ from the “default” IE estimates Yang et al. (2004) presented. Instead of standard 0 and 1 coded dummy variables, we subtract 1 / *k* from these dummy variables, with *k* denoting the number of categories—that is, 3 for age and period, and 5 for cohort. Subtracting the constant 1 / *k* from the original 0 and 1 coded dummy variables does not change the interpretation of their regression coefficients (i.e., the deviation from the omitted category). If we omit the last age category, the dummy variables *A*_{1}^{D} and *A*_{2}^{D} for the first two ages have the coding scheme shown in Table 2.

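Applying this coding to the age, period, and cohort variables, with the last categories omitted, gives

$$Y = \beta_{0}^{D} + \alpha_{1}^{D}A_{1}^{D} + \alpha_{2}^{D}A_{2}^{D} + \beta_{1}^{D}P_{1}^{D} + \beta_{2}^{D}P_{2}^{D} + \gamma_{1}^{D}C_{1}^{D} + \gamma_{2}^{D}C_{2}^{D} + \gamma_{3}^{D}C_{3}^{D} + \gamma_{4}^{D}C_{4}^{D} + e, \tag{2b}$$

where the superscript *D* is shorthand for *DL*, to explicitly indicate that the last dummy variable–coded category is the reference. Like Eqs. (2) and (2a), the new Eq. (2b) also suffers from the identification problem. For example, we can write the following for *C*_{4}^{D}:

$$C_{4}^{D} = -2A_{1}^{D} - A_{2}^{D} + 2P_{1}^{D} + P_{2}^{D} - 4C_{1}^{D} - 3C_{2}^{D} - 2C_{3}^{D}. \tag{3b}$$

If we substitute for *C*_{4}^{D} the expression given in (3b), we obtain

$$\begin{aligned}
Y = \beta_{0}^{D} &+ (\alpha_{1}^{D} - 2\gamma_{4}^{D})A_{1}^{D} + (\alpha_{2}^{D} - \gamma_{4}^{D})A_{2}^{D} + (\beta_{1}^{D} + 2\gamma_{4}^{D})P_{1}^{D} + (\beta_{2}^{D} + \gamma_{4}^{D})P_{2}^{D} \\
&+ (\gamma_{1}^{D} - 4\gamma_{4}^{D})C_{1}^{D} + (\gamma_{2}^{D} - 3\gamma_{4}^{D})C_{2}^{D} + (\gamma_{3}^{D} - 2\gamma_{4}^{D})C_{3}^{D} + e.
\end{aligned} \tag{4b}$$

In the main text, we assumed γ_{4}^{L} = 0 in Eq. (2), leading to the estimable Eq. (5). This assumption implies that the deviation of cohort 4 from the mean is 0. Because the deviation of cohort 5 from the mean equals **γ**_{5}^{L}, the assumption γ_{4}^{L} = 0 implies that the deviation of cohort 4 from cohort 5 equals − **γ**_{5}^{L} (the value of − **γ**_{5}^{L} can be derived from the estimates of Eq. (5): − **γ**_{5}^{L} = **γ**_{1}^{L} + **γ**_{2}^{L} + **γ**_{3}^{L} + **γ**_{4}^{L} = **γ**_{1}^{L} + **γ**_{2}^{L} + **γ**_{3}^{L}). In terms of the regression coefficients of Eq. (2b), the above implies: γ_{4}^{D} = − **γ**_{5}^{L}. If, in Eq. (2b), we plug in − **γ**_{5}^{L} for γ_{4}^{D}, substitute the expression for *C*_{4}^{D} given in (3b), and rearrange terms, we obtain the following estimable equation:

$$\begin{aligned}
Y = \boldsymbol{\beta}_{0}^{D} &+ (\boldsymbol{\alpha}_{1}^{D} + 2\boldsymbol{\gamma}_{5}^{L})A_{1}^{D} + (\boldsymbol{\alpha}_{2}^{D} + \boldsymbol{\gamma}_{5}^{L})A_{2}^{D} + (\boldsymbol{\beta}_{1}^{D} - 2\boldsymbol{\gamma}_{5}^{L})P_{1}^{D} + (\boldsymbol{\beta}_{2}^{D} - \boldsymbol{\gamma}_{5}^{L})P_{2}^{D} \\
&+ (\boldsymbol{\gamma}_{1}^{D} + 4\boldsymbol{\gamma}_{5}^{L})C_{1}^{D} + (\boldsymbol{\gamma}_{2}^{D} + 3\boldsymbol{\gamma}_{5}^{L})C_{2}^{D} + (\boldsymbol{\gamma}_{3}^{D} + 2\boldsymbol{\gamma}_{5}^{L})C_{3}^{D} + e.
\end{aligned} \tag{5b}$$

The boldfaced parameters in (5b) can be estimated^{4} (as can the intercept, **β**_{0}^{D}). The IE now employs the criterion of minimizing the following sum of squares:

$$\begin{aligned}
&(\hat{\alpha}_{1}^{D})^{2} + (\hat{\alpha}_{2}^{D})^{2} + (\hat{\beta}_{1}^{D})^{2} + (\hat{\beta}_{2}^{D})^{2} + (\hat{\gamma}_{1}^{D})^{2} + (\hat{\gamma}_{2}^{D})^{2} + (\hat{\gamma}_{3}^{D})^{2} + (\hat{\gamma}_{4}^{D})^{2} \\
&\quad = (\boldsymbol{\alpha}_{1}^{D} + 2\boldsymbol{\gamma}_{5}^{L} + 2\hat{\gamma}_{4}^{D})^{2} + (\boldsymbol{\alpha}_{2}^{D} + \boldsymbol{\gamma}_{5}^{L} + \hat{\gamma}_{4}^{D})^{2} + (\boldsymbol{\beta}_{1}^{D} - 2\boldsymbol{\gamma}_{5}^{L} - 2\hat{\gamma}_{4}^{D})^{2} + (\boldsymbol{\beta}_{2}^{D} - \boldsymbol{\gamma}_{5}^{L} - \hat{\gamma}_{4}^{D})^{2} \\
&\qquad + (\boldsymbol{\gamma}_{1}^{D} + 4\boldsymbol{\gamma}_{5}^{L} + 4\hat{\gamma}_{4}^{D})^{2} + (\boldsymbol{\gamma}_{2}^{D} + 3\boldsymbol{\gamma}_{5}^{L} + 3\hat{\gamma}_{4}^{D})^{2} + (\boldsymbol{\gamma}_{3}^{D} + 2\boldsymbol{\gamma}_{5}^{L} + 2\hat{\gamma}_{4}^{D})^{2} + (\hat{\gamma}_{4}^{D})^{2}.
\end{aligned}$$

Minimizing this sum of squares yields the following IE estimate for γ_{4}^{D}:

$$\hat{\gamma}_{4}^{D} = \frac{-2\boldsymbol{\alpha}_{1}^{D} - \boldsymbol{\alpha}_{2}^{D} + 2\boldsymbol{\beta}_{1}^{D} + \boldsymbol{\beta}_{2}^{D} - 4\boldsymbol{\gamma}_{1}^{D} - 3\boldsymbol{\gamma}_{2}^{D} - 2\boldsymbol{\gamma}_{3}^{D} - 39\boldsymbol{\gamma}_{5}^{L}}{40}.$$

The parameter **α**_{1}^{D}, which is the deviation of age 1 from reference category age 3, can be written as the difference of the two corresponding effect coding parameters: namely, **α**_{1}^{L} − **α**_{3}^{L}.^{5} Building on this relation between the coefficients of effect and dummy coding, we can write the above estimate of $\hat{\gamma}_{4}^{D}$ in terms of the effect coding parameters of Eq. (5):

$$\hat{\gamma}_{4}^{D} = \frac{-2\boldsymbol{\alpha}_{1}^{L} - \boldsymbol{\alpha}_{2}^{L} + 3\boldsymbol{\alpha}_{3}^{L} + 2\boldsymbol{\beta}_{1}^{L} + \boldsymbol{\beta}_{2}^{L} - 3\boldsymbol{\beta}_{3}^{L} - 4\boldsymbol{\gamma}_{1}^{L} - 3\boldsymbol{\gamma}_{2}^{L} - 2\boldsymbol{\gamma}_{3}^{L} - 30\boldsymbol{\gamma}_{5}^{L}}{40}.$$

Because **α**_{3}^{L} = − **α**_{1}^{L} − **α**_{2}^{L}, **β**_{3}^{L} = − **β**_{1}^{L} − **β**_{2}^{L}, and **γ**_{5}^{L} = − **γ**_{1}^{L} − **γ**_{2}^{L} − **γ**_{3}^{L}, we finally get the following:

$$\hat{\gamma}_{4}^{D} = \frac{-5\boldsymbol{\alpha}_{1}^{L} - 4\boldsymbol{\alpha}_{2}^{L} + 5\boldsymbol{\beta}_{1}^{L} + 4\boldsymbol{\beta}_{2}^{L} + 26\boldsymbol{\gamma}_{1}^{L} + 27\boldsymbol{\gamma}_{2}^{L} + 28\boldsymbol{\gamma}_{3}^{L}}{40}.$$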

If we compare this IE estimator of $\hat{\gamma}_{4}^{D}$ with $\hat{\gamma}_{4}^{L}=(\boldsymbol{\alpha}_{1}^{L}-\boldsymbol{\beta}_{1}^{L}+2\boldsymbol{\gamma}_{1}^{L}+\boldsymbol{\gamma}_{2}^{L})/8$ of Eq. (8), it is obvious that the IE estimate for the fourth cohort effect will usually differ between the effect-coded IE as proposed by Yang et al. (2004) and the dummy variable–coded IE (with the last categories as the omitted ones for both codings). The same holds for the IE estimates of the remaining categories of the three APC variables. Similar to the IE with effect coding, each triplet of reference categories of age, period, and cohort leads to different estimates when dummy variable coding is used, which we do not elaborate here.
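The algebra of this appendix can be checked in exact arithmetic. The sketch below is our own working, not part of the article: it verifies on all nine age/period cells that the centered dummy variable for the fourth cohort is a linear combination of the other seven, and then computes the dummy variable–coded IE estimate for the fourth cohort from hypothetical values of the boldfaced effect coding parameters. The estimate is obtained once by minimizing along the direction `u` implied by that dependency, and once from the closed form (−5**α**_{1}^{L} − 4**α**_{2}^{L} + 5**β**_{1}^{L} + 4**β**_{2}^{L} + 26**γ**_{1}^{L} + 27**γ**_{2}^{L} + 28**γ**_{3}^{L})/40, which we derived from the definitions above.

```python
from fractions import Fraction as F

def centered(cat, c, k):
    """0/1 dummy for category c (1-based) minus 1/k, for a case in category cat."""
    return F(int(cat == c)) - F(1, k)

# Dependency among the centered dummies, checked on every age/period cell
# (cohort = period - age + 3 in 1-based numbering).
dependency_holds = True
for a in (1, 2, 3):
    for p in (1, 2, 3):
        c = p - a + 3
        A = [centered(a, i, 3) for i in (1, 2)]
        P = [centered(p, j, 3) for j in (1, 2)]
        C = [centered(c, k, 5) for k in (1, 2, 3, 4)]
        rhs = -2 * A[0] - A[1] + 2 * P[0] + P[1] - 4 * C[0] - 3 * C[1] - 2 * C[2]
        dependency_holds = dependency_holds and (C[3] == rhs)

# Hypothetical boldfaced effect coding estimates (fourth cohort fixed at 0).
a1, a2, b1, b2, g1, g2, g3 = map(F, (2, -1, -1, 0, 1, -2, 2))
a3, b3, g5 = -a1 - a2, -b1 - b2, -g1 - g2 - g3

# Same solution expressed as deviations from the last categories.
theta_D = [a1 - a3, a2 - a3, b1 - b3, b2 - b3, g1 - g5, g2 - g5, g3 - g5, -g5]
# Direction along which dummy-coded solutions can differ (from the dependency).
u = [2, 1, -2, -1, 4, 3, 2, 1]

# Minimize the sum of squares of the eight coefficients along that direction.
t = -sum(x * v for x, v in zip(theta_D, u)) / sum(v * v for v in u)
gamma4_D = theta_D[7] + t * u[7]

closed_form = (-5 * a1 - 4 * a2 + 5 * b1 + 4 * b2 + 26 * g1 + 27 * g2 + 28 * g3) / 40
print(dependency_holds, gamma4_D, closed_form)
```

For the same hypothetical values, the effect-coded estimate (**α**_{1}^{L} − **β**_{1}^{L} + 2**γ**_{1}^{L} + **γ**_{2}^{L})/8 equals 3/8 as a deviation from the mean, which corresponds to 5/8 as a deviation from the last cohort, so the two codings disagree.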

## Notes

^{1} For the APC models that we discuss in this article, having the same number of units in each combination of age and period is not required.

^{2} We determined the linear trend in, for example, age by a simple ordinary least squares (OLS) regression on the 19 age estimates. We used the highest and lowest linear trends only to demonstrate the variability in IE estimates when different sets of categories are omitted.

^{3} The user-written Stata routine *apc_ie* provides deviation contrast IE estimates with only the last categories omitted. To obtain estimates with the first categories omitted, one could “mirror” the three APC variables so that the highest age, period, and cohort values become the lowest.

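^{4} To obtain the estimates of the boldfaced parameters with superscript *D*, one could first estimate the following:

$$Y = \boldsymbol{\beta}_{0}^{D} + \boldsymbol{\alpha}_{1}^{*}A_{1}^{D} + \boldsymbol{\alpha}_{2}^{*}A_{2}^{D} + \boldsymbol{\beta}_{1}^{*}P_{1}^{D} + \boldsymbol{\beta}_{2}^{*}P_{2}^{D} + \boldsymbol{\gamma}_{1}^{*}C_{1}^{D} + \boldsymbol{\gamma}_{2}^{*}C_{2}^{D} + \boldsymbol{\gamma}_{3}^{*}C_{3}^{D} + e.$$

Then, to obtain, for example, the value of **γ**_{3}^{D}, calculate **γ**_{3}^{D} = **γ**_{3}^{*} − 2**γ**_{5}^{L}.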

^{5} For any two ages *i* and *j*, the predicted values (controlling for period and cohort) based on Eq. (5) are **β**_{0}^{L} + **α**_{i}^{L} and **β**_{0}^{L} + **α**_{j}^{L}. The difference between these predictions equals **α**_{i}^{L} − **α**_{j}^{L}, which represents the deviation of age *i* from age *j* and hence is equal to **α**_{i}^{D}.