Abstract
Drawing cohort profiles and cohort forecasts from grids of age–period data is common practice in demography. In this research note, we (1) show how demographic measures artificially fluctuate when calculated from the diagonals of age–period rates because of timing and cohort-size bias, (2) estimate the magnitude of these biases, and (3) illustrate how prediction intervals for cohort indicators of mortality may become implausible when drawn from Lee–Carter methods and age–period grids. These biases are surprisingly large, even when the cohort profiles are created from single-age, single-year period data. The danger is that we overinterpret deviations from expected trends that were induced by our own data manipulation.
Introduction
As demographers, we are all aware that demographic events and person-years at risk can be split and aggregated into different formats on the Lexis diagram: age–period squares, cohort parallelograms, or, for even more flexibility, Lexis triangles (Carstensen 2007; Keiding 1990; Wilmoth et al. 2021). But who among us has never created a cohort profile by “taking the diagonals” of an age–period grid? Perhaps this is because demographic data are usually preformatted for period analysis. Possibly we wanted to analyze cohort trends from a methodology that was designed specifically to be used on an age–period grid. Whatever the reason, taking the diagonals is common practice (e.g., Kermack et al. 1934; Myrskylä et al. 2013; Preston and Wang 2006; Shkolnikov et al. 2011; Vogt et al. 2017), even when we suspect that it is not the best practice.
In this research note, we illustrate the dangers of drawing cohort profiles from period data using examples that we have recently encountered. The first two examples describe how demographic measures artificially fluctuate when calculated from the diagonals of age–period rates because of (1) timing and (2) cohort-size biases. The third example illustrates how drawing forecast uncertainty from the upper and lower bounds of a period Lee–Carter time series parameter results in implausible prediction intervals when completing cohorts from period diagonals.
It is well known that the larger the age–period square, the weaker the approximation of a real cohort. As the age–period square gets smaller, cohort patterns drawn from such squares start to approximate real (continuous-time) cohort lines. All of our empirical examples use single-year, single-age data and illustrate that the dangers do not vanish with this fine grid. The examples are fully replicable (https://osf.io/xn5w7/) using data from the Human Mortality Database ([HMD] 2023) and the Human Fertility Database ([HFD] 2023).
Example 1: Cohort Summary Measures From Period Age-Specific Rates—Temporal Bias
Two key biases come into play when we calculate cohort summary measures from the diagonals of age–period data. The first is a bias from the timing of events and exposure, while the second relates to changing cohort sizes.
To illustrate the first of these biases, we turn to the Lexis diagram. In Figure 1, the true cohort data are represented by green parallelograms and “period diagonals” by purple squares. With single-year, single-age data, the upper Lexis triangle of each age–period square always falls one calendar year earlier than the upper triangle of the corresponding age–cohort parallelogram. Widening the age or time intervals increases this discrepancy in timing.
In practice, the extent to which this temporal difference biases the cohort summary measure depends on how the measure is trending. If the processes happen to be stationary, there is no bias. In Figure 2, we contrast cohort completed fertility by age 40, also known as the cohort total fertility rate (TFR), for Japanese females calculated from the two data formats: age–cohort parallelograms and the diagonals of age–period squares. When fertility trends were flat over cohorts (e.g., cohorts 1945–1955), the artificial and true cohort TFRs line up well. When fertility declined sharply (e.g., cohorts 1955–1970), there was a general overestimation of the cohort TFR from the period squares, marked as “A” in Figure 2. The cohort TFR was underestimated from cohort 1975 onward as fertility recovered.
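The mechanism can be sketched with synthetic data. The following Python example is only an illustration under simplifying assumptions that are not taken from the HFD: the age pattern (`SHAPE`), the cohort trend (`true_ctfr`), and all function names are hypothetical; cohorts are assumed to be of equal size; and each cohort's rate at a given age is assumed to be the same in both Lexis triangles. Under these assumptions, each age–period rate is an equal-weight mix of two adjacent cohorts.

```python
import numpy as np

AGES = np.arange(15, 40)  # single ages 15..39, i.e. fertility completed by age 40

# Hypothetical age pattern of fertility, scaled to sum to 1 over AGES
SHAPE = np.exp(-0.5 * ((AGES - 27) / 5.0) ** 2)
SHAPE /= SHAPE.sum()

def true_ctfr(cohort):
    """Hypothetical true cohort TFR: flat until 1955, then falling by 0.03 per cohort."""
    return 2.0 - 0.03 * max(int(cohort) - 1955, 0)

def cohort_rate(cohort, age):
    """True rate of a cohort at a given age (assumed equal in both Lexis triangles)."""
    return true_ctfr(cohort) * SHAPE[int(age) - 15]

def square_rate(year, age):
    """Age-period rate: equal-weight mix of the two cohorts that share the square."""
    lower = cohort_rate(year - age, age)      # cohort reaching `age` during `year`
    upper = cohort_rate(year - age - 1, age)  # cohort that reached `age` the year before
    return 0.5 * (lower + upper)

def diagonal_ctfr(cohort):
    """Cohort TFR read off the diagonal of the age-period grid."""
    return sum(square_rate(cohort + age, age) for age in AGES)

for c in (1950, 1965):
    print(c, round(true_ctfr(c), 3), round(diagonal_ctfr(c), 3))
```

In this stylized setting, the diagonal value for a cohort is the mean of its own true value and that of the preceding cohort. It therefore matches the truth where the trend is flat and overestimates it by half the annual change where fertility is declining (1.715 versus 1.7 for the hypothetical 1965 cohort).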
Example 2: Cohort Summary Measures From Period Age-Specific Rates—Cohort-Size Bias
The second bias results from changing cohort size. The trend in cohort TFR derived from the diagonals of age–period rates shows especially large fluctuations around the 1966 birth cohort (marked as “B” in Figure 2). These fluctuations are only partially real: the true cohort trend is more stable. Changing cohort size explains the difference between the two trends. In 1965, there were 1.8 million births; births dropped to 1.4 million in 1966 (owing to beliefs related to being born in the year of the “Fire-Horse”; Kaku 1975), only to rebound to 1.9 million in 1967. Deriving summary cohort fertility measures from the age–period squares uses rates drawn from events and exposures of the smaller 1966 cohort mixed with events and exposures from the larger neighboring cohorts. These interactions cause an artificial fluctuation in the corresponding estimated cohort trends.
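A stylized sketch of this mixing is given below. It is not a reconstruction of Figure 2: only the three birth counts for 1965 to 1967 come from the text, while the size of all other cohorts (1.8 million), the smooth true trend, the age pattern, and the function names are assumptions made for illustration. Mortality and migration are ignored, so each woman contributes half a year of exposure to each Lexis triangle.

```python
import numpy as np

AGES = np.arange(15, 40)  # single ages 15..39
SHAPE = np.exp(-0.5 * ((AGES - 27) / 5.0) ** 2)  # hypothetical age pattern
SHAPE /= SHAPE.sum()

# Births in millions: 1965-1967 from the text; 1.8 for every other cohort is an assumption
BIRTHS = {1965: 1.8, 1966: 1.4, 1967: 1.9}

def cohort_size(cohort):
    return BIRTHS.get(int(cohort), 1.8)

def true_ctfr(cohort):
    """Hypothetical smooth decline of 0.03 children per cohort."""
    return 2.0 - 0.03 * (int(cohort) - 1960)

def square_rate(year, age):
    """Age-period rate: pooled events over pooled exposure of the two cohorts in the square."""
    events = exposure = 0.0
    for cohort in (year - age, year - age - 1):   # lower-triangle cohort, upper-triangle cohort
        person_years = 0.5 * cohort_size(cohort)  # half a year of exposure per woman
        events += person_years * true_ctfr(cohort) * SHAPE[int(age) - 15]
        exposure += person_years
    return events / exposure

def diagonal_ctfr(cohort):
    return sum(square_rate(cohort + age, age) for age in AGES)

def artifact(cohort):
    """Difference between the diagonal-based and the true cohort TFR."""
    return diagonal_ctfr(cohort) - true_ctfr(cohort)

for c in range(1964, 1969):
    print(c, round(artifact(c), 4))
```

Although the assumed true trend is perfectly smooth, the artifact is not constant: it rises for the small 1966 cohort, whose squares are dominated by the larger 1965 cohort, and falls for the 1967 cohort. The size of the zigzag here depends entirely on the assumed trend and should not be read as an estimate of the bias in the Japanese data.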
How Large Is the Bias on Demographic Measures? Empirical Estimates From the HFD and HMD
The timing and cohort-size biases demonstrated with Japanese fertility data affect any cohort demographic measure estimated from the diagonals of age–period squares. In practice, these two biases are challenging to separate.
We estimated the joint impact of these biases on empirical trends in fertility (cohort TFR) and mortality (cohort life expectancy) for countries with long time series of available data in the Human Fertility Database and the Human Mortality Database (Table 1). On average, cohort TFR calculated from period diagonals differed from the true cohort value by 0.6% in absolute terms; for cohort life expectancy, the average absolute difference was 0.7%. Cohort-from-period fertility was underestimated about one third of the time and overestimated about two thirds of the time, while cohort-from-period life expectancy was usually underestimated. This is because although cohort fertility generally declined during the time frame investigated, there were also periods of recovery, whereas cohort life expectancy increased more consistently.
Example 3: Cohort Lifespan Variation Drawn From Period Lee–Carter Forecasts
Cohort mortality forecasts are often done by “taking the cohort diagonals” of a forecast age–period surface (Alburez-Gutierrez et al. 2021; Andreev and Vaupel 2006; Shkolnikov et al. 2011). Our third example shows that we need to be careful in attributing forecast uncertainty with such an approach.
The Lee–Carter (1992) model¹ and its various extensions have become the dominant mortality forecasting methodology, used worldwide by statistical agencies and international organizations (Basellini et al. 2023). The prediction intervals are derived from the mortality time series parameter $k_t$, usually by either:
(1) simulating trajectories of $k_t$, constructing the life table measure of interest for each trajectory, and calculating the uncertainty from this simulated distribution; or
(2) using the upper and lower bounds of the simulated $k_t$ trajectories, and constructing the life table measure of interest that corresponds to these bounds.
The latter method is computationally simpler,² particularly because it is sometimes feasible to avoid the simulation altogether and calculate the bounds of $k_t$ analytically. This approach is widely used, for instance, in the popular “demography” R package (Hyndman et al. 2023), and was suggested by Lee (2000:83) himself for period (not cohort) life table functions, since the variations in the log death rates will be “perfectly correlated with one another, because all are linear function of the same time-varying parameter k.” For cohort life table measures, however, mortality schedules are completed from $k_t$, $k_{t+1}$, $k_{t+2}$, and so on, and the relationship between the uncertainty bounds of these successive values is complex (Lee 2000).
We illustrate this with a Lee–Carter forecast³ of Swedish female mortality data, where cohort age-specific mortality was completed by employing the diagonals of a forecast age–period grid, using the 95% prediction intervals from the upper and lower bounds of the simulated $k_t$ trajectories (Figure 3, panel c). While the life expectancy prediction intervals (panel a) appear reasonable, the prediction intervals for the interquartile range (panel b) do not. The interval first grows, then shrinks to zero, and then grows again. The decline in uncertainty over cohorts is clearly implausible. These dynamics occur regardless of the chosen index of variation, the input data (age–period or age–cohort), or the population (see Figures A1–A4 in the online appendix).
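The two ways of attributing uncertainty can be compared on synthetic data. The sketch below does not use the Swedish data and is not the code behind Figure 3: the Lee–Carter parameters (`a`, `b`, `DRIFT`, `SIGMA`), the age range 60 to 110, and all names are hypothetical, and whether the bounds-based interval actually collapses depends on these choices. It only shows how the two constructions differ when cohort schedules are read off the diagonal of a forecast age–period surface.

```python
import numpy as np

rng = np.random.default_rng(2024)

AGES = np.arange(60, 111)                 # ages 60..110 (hypothetical range)
N_AGE = AGES.size
COHORT_STARTS = np.arange(0, 31, 5)       # forecast year in which each cohort turns 60
HORIZON = int(COHORT_STARTS[-1]) + N_AGE
N_SIM = 500

# Hypothetical Lee-Carter parameters, not estimated from any data
a = -5.0 + 0.1 * (AGES - 60)              # Gompertz-like log death rates
b = np.full(N_AGE, 1.0 / N_AGE)           # age response, sums to 1
DRIFT, SIGMA = -1.0, 2.0                  # random walk with drift for k_t

# k[s, t]: simulated k_t for trajectory s and forecast year t (jump-off value 0)
k = np.cumsum(rng.normal(DRIFT, SIGMA, size=(N_SIM, HORIZON)), axis=1)

def cohort_iqr(k_path, start):
    """IQR of age at death (conditional on survival to 60) along a cohort diagonal."""
    m = np.exp(a + b * k_path[start:start + N_AGE])   # age 60+i uses forecast year start+i
    cum_hazard = np.concatenate(([0.0], np.cumsum(m)))
    cdf = 1.0 - np.exp(-cum_hazard)                   # share dead by exact ages 60..111
    cdf /= cdf[-1]                                    # close the life table at age 111
    q25, q75 = np.interp([0.25, 0.75], cdf, np.arange(60, 112))
    return q75 - q25

# Bounds approach (the second in the text): plug pointwise 95% bounds of k_t into the life table
k_lo, k_hi = np.percentile(k, [2.5, 97.5], axis=0)
bounds = np.array([[cohort_iqr(k_lo, s), cohort_iqr(k_hi, s)] for s in COHORT_STARTS])

# Simulation approach (the first in the text): one measure per trajectory, then quantiles
sims = np.array([[cohort_iqr(path, s) for s in COHORT_STARTS] for path in k])
sim_lo, sim_hi = np.percentile(sims, [2.5, 97.5], axis=0)

for s, (from_lo, from_hi), lo, hi in zip(COHORT_STARTS, bounds, sim_lo, sim_hi):
    print(f"cohort +{int(s):2d}: bounds-based width {abs(from_hi - from_lo):.3f}, "
          f"simulation-based width {hi - lo:.3f}")
```

The bounds-based values come from two artificial paths in which $k_t$ sits at its 2.5% (or 97.5%) quantile in every year, which no simulated trajectory need resemble. The simulation-based interval is instead a quantile of the measure itself.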
Discussion
Summary
Drawing cohort profiles from age–period data or models can lead to patently false conclusions. First, using age–period data to calculate cohort TFR and cohort life expectancy at birth ($e_0$) resulted in estimates that differed on average by 0.6% and 0.7%, respectively, from those calculated from age–cohort data. The largest deviations were a 4.1% difference in cohort TFR and a −4.8% difference in cohort $e_0$. Second, in Example 3, drawing cohort profiles from period Lee–Carter forecasts resulted in implausible prediction intervals for lifespan variability measures when the intervals were based on the upper and lower bounds of the $k_t$ time series parameter.
Should the artificial fluctuations in cohort fertility and mortality caused by the input data be considered large or small? To put this difference into perspective, the average data-driven artifact estimated for any single cohort TFR amounts to about one third of the recent average yearly drop in U.S. period TFR over the period 2007–2019 (HFD 2023), which is noteworthy for its size and has generated considerable discussion (Guzzo and Hayford 2020; Hartnett and Gemmill 2020). In terms of mortality, the drop in period $e_0$ in either the United States or the United Kingdom from 2014 to 2017 was less than 0.1% (HMD 2023). In the pre-COVID-19 world, these life expectancy declines received extensive attention in the media, as well as in demographic and public health circles (Hiam et al. 2018; Ho and Hendi 2018). Yet the declines were less than one seventh the size of the average data-induced cohort life expectancy artifact.
While our examples relate to fertility and mortality, these issues are equally pertinent in migration research. Migration flows fluctuate by year and season. In Europe, 1.3 million refugees, primarily from Syria, Iraq, and Afghanistan, claimed asylum in 2015 (Eurostat 2016). This was more than double the level seen in any year over the preceding decade. Two thirds of these 2015 asylum seekers arrived in July or later that year. In a Lexis diagram with calendar year on the x-axis (“year”) and years since arrival on the y-axis (“age”), the late-2015 migration cohort would have some weight in “year 2015–age 0,” heavy weight in “year 2016–age 0,” some weight in “year 2016–age 1,” heavy weight in “year 2017–age 1,” and so on, with the fluctuation continuing as this migration cohort is followed throughout its life course. An analysis of decisions on asylum seekers by year of arrival that relies on square-based cohort inference could yield very different results from one based on properly structured age–cohort data.
As a field, demography is slowly moving away from the analysis of aggregate population-level data toward the usage of rich individual-level data (Lee 2001), which we can aggregate ourselves by year of birth. These changes started with the implementation of repeated cross-sectional and panel surveys a few decades ago (Crimmins 1993) and have progressed to include administrative and other big data sources (Kashyap 2021). The longer some of these studies (or registers) run, the more we can expect research designs that examine temporal change over the cohort dimension, as well as methodological development for proper cohort inference (see, e.g., Yang and Land (2013) for a discussion of how individual-level data are shaping methodological developments in the field of age–period–cohort analysis). Nevertheless, analysis of aggregate data remains a core task of demographers, and it is likely we will face aggregated age–period data from statistical offices for some time to come.
Potential Solutions to Mitigate the Problems
Calculating Cohort Summary Measures From Period Data
National statistical agencies often provide data in aggregated age–period formats that are particularly suited to period analysis. When conducting cohort analyses, we should first split the data into Lexis triangles and then reshape them into cohort parallelograms (Carstensen 2007). These methods rest on assumptions, in particular that births (or whichever event defines the cohort) are evenly distributed between the beginning and the end of the calendar year. This assumption is sometimes violated, for instance, after exceptional events such as wars, famines, or epidemics (Aassve et al. 2020; Agadjanian and Prata 2002; Chandra et al. 2018; Lindstrom and Berhanu 1999); in anticipation of loss or gain in social benefits such as baby bonuses (Brunner and Kuhn 2011; Thévenon and Gauthier 2011); depending on cultural beliefs, for example, the period boost in fertility among Chinese populations in dragon years (Goodkind 1991, 1995); or due to seasonal fluctuations such as the 2015 European migration wave. In these cases, exposures can be distributed by month of birth or of the other event defining the cohort, as is now done for the exposure populations in the HMD and HFD (Wilmoth et al. 2021).
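The reshaping step can be sketched as follows. This is a minimal illustration rather than the procedure used by the HMD or HFD: the even 50/50 split in `split_squares` is the simplest possible assumption (it corresponds to evenly distributed cohort-defining events and would be replaced by a better-informed split where one is available), and the function names and array layout are hypothetical. The same reshaping would be applied to event counts and to exposures before rates are formed.

```python
import numpy as np

def split_squares(squares):
    """Split age-period counts [age, year] evenly into lower and upper Lexis triangles."""
    return 0.5 * squares, 0.5 * squares

def to_parallelograms(lower, upper):
    """Recombine Lexis triangles into age-cohort parallelograms.

    Returns (cohorts, counts). `cohorts` holds year index minus age index;
    counts[x, j] is NaN where the grid does not fully cover the parallelogram.
    """
    n_age, n_year = lower.shape
    cohorts = np.arange(-(n_age - 1), n_year - 1)
    counts = np.full((n_age, cohorts.size), np.nan)
    for j, c in enumerate(cohorts):
        for x in range(n_age):
            t = c + x                      # year in which the cohort reaches age x
            if 0 <= t < n_year - 1:
                # lower triangle in year t plus upper triangle in year t + 1
                counts[x, j] = lower[x, t] + upper[x, t + 1]
    return cohorts, counts

squares = np.arange(12, dtype=float).reshape(3, 4)   # toy counts: 3 ages x 4 years
lower, upper = split_squares(squares)
cohorts, counts = to_parallelograms(lower, upper)
print(cohorts)
print(counts)
```

With an even split, each parallelogram is simply the mean of two horizontally adjacent squares, which centers the data on the cohort rather than on the calendar year.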
If the data we have on hand come in the form of age–period rates and not the actual event and exposure counts, we envisage an adjustment of the rates that depends on the changes in cohort sizes, if known.
Completing Cohort Profiles From Period Forecasts
The problem of the crossing prediction intervals is not a general feature of completing cohort mortality from period forecasts, but relates to an often-used approach of attributing uncertainty within the Lee–Carter forecasting methodology that is clearly inappropriate. Lee himself argued against using such an approach for cohort measures (Lee 2000) but never empirically elaborated on the point. Crossovers do not occur when measures are calculated on the implied death rates for each simulated time series trajectory (see Figure A5 in the online appendix). However, this approach has not been empirically tested to determine whether the obtained uncertainty intervals are well calibrated.
Conclusion
Drawing cohort profiles from period data and forecasts can lead to misleading conclusions, including erroneous fluctuation in cohort summary measures and implausible uncertainty in cohort forecasts. No one will be surprised that we find bias. But we expect many to be surprised by the magnitude of bias, even when cohort profiles and forecasts are drawn from a single-age, single-period grid. The danger is that we go down rabbit holes explaining deviations from expected trends when the deviations themselves are an artifact of our own data manipulation.
We argue that cohort summary measures of fertility, mortality, and migration should always be calculated on the basis of age–cohort data. When unavailable, age–period data should be split into Lexis triangles and reformatted to cohort data using the most plausible assumptions. In the case of cohort forecasting, crossing prediction intervals can be avoided by period-based forecasting methodologies that are not based on the Lee–Carter framework, or by simulation approaches within the Lee–Carter framework. However, we are of the opinion that uncertainty can only be properly accounted for with a cohort-based forecasting methodology. We end with a plea for researchers to further test and develop new cohort-based methods and models.
Acknowledgments
We thank Jim Oeppen, Vladimir Shkolnikov, and our reviewers for helpful comments at various stages that greatly improved the manuscript. A.A.vR. and M.R.N. were supported by an ERC Starting Grant (716323), with an extension granted by the Max Planck Society. M.M. was supported by the Strategic Research Council, FLUX consortium, decision numbers 345130 and 345131; by the National Institute on Aging (R01AG075208); by grants to the Max Planck–University of Helsinki Center from the Max Planck Society, Jane and Aatos Erkko Foundation, Faculty of Social Sciences at the University of Helsinki, and Cities of Helsinki, Vantaa, and Espoo; and the European Union (ERC Synergy, BIOSFER, 101071773). The views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
Notes
1. The Lee–Carter model expresses variation in death rates across age and time as $\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}$, where $a_x$ is the pattern of log mortality at age $x$, $k_t$ is an index of the level of mortality at time $t$, $b_x$ describes how much mortality at a given age changes when the overall level of mortality changes, and $\varepsilon_{x,t}$ is the residual.
2. This was far more of a concern in the early 1990s than it is today. With modern computing, the difference in the time needed to calculate prediction intervals from the two methods is trivial. We argue that the first approach should now be used even in period calculations.
3. In our results, following the standard Lee–Carter estimation procedure, the model was fit to data from 1960 to 2019, $a_x$ was set to equal the means over time of $\ln m_{x,t}$, and the parameters $b_x$ and $k_t$ were estimated with singular value decomposition. An adjusted $k_t$ (to match the observed number of deaths in each year) was forecast to decline linearly to 2100 using a random walk with drift.
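The estimation steps described in this note can be sketched in a few lines. The example below is an illustration on synthetic log death rates, not the replication code for Figure 3: the input data, parameter values, and function names are hypothetical; the second-stage adjustment of $k_t$ to match observed deaths is omitted because it requires death and exposure counts; and the analytic bounds ignore uncertainty in the estimated drift.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit a_x, b_x, k_t to an array [age, year] of log death rates via SVD."""
    a = log_m.mean(axis=1)                       # a_x: mean over time of ln m_{x,t}
    u, s, vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
    scale = u[:, 0].sum()
    b = u[:, 0] / scale                          # normalised so that sum(b_x) = 1
    k = s[0] * vt[0] * scale                     # time index; sums to zero by construction
    return a, b, k

def forecast_k(k, horizon, z=1.96):
    """Random walk with drift: central path and analytic 95% bounds for k_t."""
    steps = np.diff(k)
    drift, sigma = steps.mean(), steps.std(ddof=1)
    h = np.arange(1, horizon + 1)
    central = k[-1] + drift * h                  # linear central forecast
    half_width = z * sigma * np.sqrt(h)          # ignores drift estimation error
    return central, central - half_width, central + half_width

# Synthetic, noise-free input (hypothetical values), years 1960-2019
rng = np.random.default_rng(0)
ages, years = np.arange(0, 101), np.arange(1960, 2020)
true_a = -9.0 + 0.09 * ages
true_b = np.full(ages.size, 1.0 / ages.size)
true_k = np.cumsum(rng.normal(-1.0, 1.5, years.size))
log_m = true_a[:, None] + np.outer(true_b, true_k)

a, b, k = fit_lee_carter(log_m)
central, lower, upper = forecast_k(k, horizon=2100 - 2019)
print(round(float(b.sum()), 6), central[:3])
```

The normalisation used here (`b` summing to one, `k` summing to zero) is the conventional identification constraint for the model.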