If there was ever any doubt that demographers love to discuss Lexis diagrams, this has been quashed by the many responses we have received by email, over social media, and in this journal to our article “The Dangers of Drawing Cohort Profiles From Period Data: A Research Note” (van Raalte et al. 2023). Our paper showed the biases that can result from “taking diagonals” of age–period (AP) grids, rather than cohort parallelograms, to construct cohort profiles. We showed how this widely used method (see references in our research note) can introduce a surprisingly large bias.

The responses were constructive and informative. First, we learned about methods developed at INED in the 1980s to account for seasonal fluctuations in births and more severe distortions following events such as wars (Calot 1984; Calot and Sardon 2004:146). Part of this framework has been adopted by the Human Fertility Database (HFD) team in their methodologies for the Short-Term Fertility Fluctuation series (Jdanov et al. 2022). Second, and related to the empirical Japanese cohort fertility example in our paper, we belatedly realized that the data were not based on actual cohort exposures but on Human Mortality Database (HMD)–adjusted cohort exposures. However, the Japanese case is an obvious example of a fertility shock, which still clearly illustrates the implications of using unadjusted period data to estimate cohort profiles. Finally, we spurred top demographers into action to assess ways to mitigate the bias of drawing cohort profiles from AP squares.

In his commentary, Schmertmann (2024) shows that much of the bias is removed by averaging two temporally adjacent AP squares, which he calls the AP2 estimator. We fully agree. Schmertmann's AP2 estimator is intuitive, easy to implement, and, as his beautiful visualizations demonstrate, substantially reduces the bias associated with estimating cohorts from the diagonal of one AP square (which he calls the AP estimator, and which we will refer to as the AP1 estimator). The reasons why AP2 outperforms AP1 are obvious when the age–period squares are superimposed on the true cohort in a Lexis diagram: see figure 1 in van Raalte et al. (2023) and imagine that each purple square becomes a horizontal purple rectangle spanning two adjacent years, across which an average value is calculated. AP2 covers the entire area of the true cohort parallelogram and is a better temporal fit. From this visual perspective, the advantages are so clear that it is surprising that, to our knowledge, AP2 is not commonly used. This is averaging at the input level and is based on demographic insight. In practice, the AP2 approach is similar to attenuating the problem at the outcome level by smoothing, which has been used, for example, in cohort forecasting (Myrskylä et al. 2013).
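The difference between the two estimators can be sketched in a few lines. The following is a minimal illustration and not code from Schmertmann (2024) or from our replication file. It assumes a hypothetical array `rates` of age–period rates on a 1 × 1 grid, indexed as `rates[age_index, year_index]`, and it identifies a cohort by `first_year`, the column index of the calendar year in which the cohort reaches the lowest age in the grid. The function names are ours and purely illustrative.

```python
import numpy as np

def cohort_rates_ap1(rates, first_year):
    """AP1: read one AP square per age along the diagonal (year = first_year + age index)."""
    a = np.arange(rates.shape[0])
    return rates[a, first_year + a]

def cohort_rates_ap2(rates, first_year):
    """AP2: average the two temporally adjacent AP squares that the cohort passes through."""
    a = np.arange(rates.shape[0])
    return 0.5 * (rates[a, first_year + a] + rates[a, first_year + a + 1])

# Illustrative surface: 3 ages x 6 years, rates rising by 0.01 per calendar year.
rates = np.tile(0.10 + 0.01 * np.arange(6), (3, 1))

ap1 = cohort_rates_ap1(rates, first_year=1)
ap2 = cohort_rates_ap2(rates, first_year=1)
print("AP1 rates:", ap1, "sum:", ap1.sum())
print("AP2 rates:", ap2, "sum:", ap2.sum())
```

With a steady period trend, AP1 reads each rate half a year too early on average, whereas AP2 is centered on the cohort parallelogram, which is why the two sums differ in this toy surface.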

Has AP2 solved the problem? We would expect that for many if not most applications from 1 × 1 (one calendar year by one year of age) Lexis data grids, the AP2 estimator would be just fine. For the fertility and mortality examples presented in our paper (see table 1 in van Raalte et al. 2023), where the HFD and HMD cohort total fertility rate (TFR) and life expectancy figures were compared with estimates from the diagonal of the age–period squares, AP2 showed a substantial improvement. While the differences in cohort TFR and life expectancy for the AP1 estimator averaged 0.6% and 0.7%, respectively (in absolute values), these differences decreased to 0.1% and 0.3%, respectively, for the AP2 estimator.

We have created a set of fertility scenarios to understand if and when concern is warranted in using the AP2 estimator. To isolate the bias, we filled an infinite Lexis surface with birth and population exposure counts of the 1950 French birth cohort. In other words, the age-specific fertility in each year, and for each cohort, was the same as the fertility rate experienced by that cohort. The world was stationary. Then we shocked it. Figure 1 shows the percentage difference between the true cohort TFR and the AP1 and AP2 estimates under various scenarios, while online supplementary Figure S1 shows the estimated TFR levels. The replication file to produce both figures is available at https://osf.io/xn5w7/.

In the first scenario (Figure 1, top row), we introduced a period shock in which birth counts dropped by 50% in one year before returning to their preshock level in the following calendar year. We shifted the calendar year of the shock to capture the different impacts of a shock falling at the beginning, middle, or end of a five-year period on the 5 × 5 Lexis grid. On the 1 × 1 Lexis grid, the AP2-estimated cohort TFR captures this shock extremely well, whereas the AP1 estimate for any given cohort that bore children in that year falls, as expected, between the values of adjacent cohorts. On the 5 × 5 grid, as expected, differences from the true cohort TFR were larger than on the 1 × 1 grid. The AP2 estimator was generally better than the AP1 estimator when the shock was toward the beginning or end of a five-year period included in the grid, but it had a larger maximum deviation than the AP1 estimator when the shock was in the middle of the period. Also, both the AP1- and the AP2-estimated cohort TFRs differed from the true one for cohorts around the “half-size” birth cohort because of mixing across triangles of different population sizes. However, the bias is small in both cases.
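A heavily simplified version of this one-year shock on a 1 × 1 grid can be sketched as follows. This is a toy illustration under stated assumptions, not the replication code available at the OSF link above. It uses a hypothetical bell-shaped fertility schedule rather than the 1950 French cohort, applies the 50% drop to rates in one calendar year, and keeps exposures equal in every Lexis triangle. Because exposures never change, the sketch cannot reproduce the small bias around the “half-size” birth cohort, and the parallelogram rates coincide with AP2 by construction. It therefore only illustrates the size of the AP1 error.

```python
import numpy as np

# Hypothetical age schedule for ages 15-49, scaled to a TFR of 2.0 (assumption).
ages = np.arange(15, 50)
shape = np.exp(-0.5 * ((ages - 28) / 5.0) ** 2)
asfr = 2.0 * shape / shape.sum()

# Stationary surface, then a 50% drop in one calendar year (index 60).
n_years, shock_year = 120, 60
factor = np.ones(n_years)
factor[shock_year] = 0.5
rates = asfr[:, None] * factor[None, :]          # rates[age_index, year_index]

# Assumption: each AP square splits into two Lexis triangles of equal exposure.
exposure_tri = 0.5
births_lower = rates * exposure_tri              # cohort's first triangle at each age
births_upper = rates * exposure_tri              # cohort's second triangle, one year later

a = np.arange(len(ages))
first_years = np.arange(n_years - len(ages))     # year index in which a cohort turns 15

# "True" cohort TFR: births over exposure within each cohort parallelogram.
true_tfr = np.array([
    ((births_lower[a, c + a] + births_upper[a, c + a + 1]) / (2 * exposure_tri)).sum()
    for c in first_years
])
ap1_tfr = np.array([rates[a, c + a].sum() for c in first_years])
ap2_tfr = np.array([0.5 * (rates[a, c + a] + rates[a, c + a + 1]).sum()
                    for c in first_years])

pct_ap1 = 100 * (ap1_tfr - true_tfr) / true_tfr
pct_ap2 = 100 * (ap2_tfr - true_tfr) / true_tfr
print("max |AP1 - true| (%):", np.abs(pct_ap1).max())
print("max |AP2 - true| (%):", np.abs(pct_ap2).max())
```

Reproducing the residual AP2 bias described in the text would require letting the shock alter the size of the affected birth cohort and hence the exposure in its triangles, which this sketch deliberately leaves out.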

In a second scenario (Figure 1, middle row), we followed the 50% drop in births in year 1 with an overcompensating recovery: births rose to 40% above the preshock level in year 2 and to 20% above it in year 3 before returning to preshock levels. We repeated this scenario three times, shifting it each time so that the shock years fell differently across the 5 × 5 cells. On a 1 × 1 grid, the AP2 estimate was close to the true cohort estimate. On a 5 × 5 grid, the absolute and relative performance of both estimators differed across the shifting combinations. Moreover, the trend was incorrectly captured by both AP estimates when the fall and recovery in births were situated in different 5 × 5 cells (seen more clearly in online supplementary Figure S1).
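The sensitivity to where the shock falls within the 5 × 5 cells can be explored with a toy sketch along the following lines. It is not our replication code. It assumes a hypothetical bell-shaped schedule, applies the multipliers 0.5, 1.4, and 1.2 to rates in three consecutive years, keeps exposures equal throughout (so 5 × 5 rates are simple averages of the 1 × 1 rates), and takes the mean of the single-cohort parallelogram TFRs as the reference value for a five-year cohort group. The shock start years 75, 77, and 79 are illustrative choices, not the ones used for Figure 1.

```python
import numpy as np

ages = np.arange(15, 50)                          # 35 single ages = 7 five-year groups
shape = np.exp(-0.5 * ((ages - 28) / 5.0) ** 2)
asfr = 2.0 * shape / shape.sum()                  # hypothetical schedule, TFR = 2.0

def simulate(shock_start, multipliers=(0.5, 1.4, 1.2), n_years=150):
    """Stationary 1x1 rate surface with a three-year period shock (toy assumption)."""
    factor = np.ones(n_years)
    factor[shock_start:shock_start + 3] = multipliers
    return asfr[:, None] * factor[None, :]

def block_rates(rates):
    """Aggregate 1x1 rates to a 5x5 grid by averaging (valid under equal exposures)."""
    n_a, n_t = rates.shape
    return rates.reshape(n_a // 5, 5, n_t // 5, 5).mean(axis=(1, 3))

def true_tfr_5(rates, c0):
    """Mean parallelogram TFR of the five cohorts turning 15 in years c0..c0+4."""
    a = np.arange(rates.shape[0])
    return float(np.mean([0.5 * (rates[a, c + a] + rates[a, c + a + 1]).sum()
                          for c in range(c0, c0 + 5)]))

def ap_tfr_5(blocks, k, use_ap2):
    """Diagonal of the 5x5 grid (AP1), or the average of two adjacent periods (AP2)."""
    g = np.arange(blocks.shape[0])
    r = blocks[g, k + g]
    if use_ap2:
        r = 0.5 * (r + blocks[g, k + g + 1])
    return 5.0 * r.sum()

def compare(shock_start, multipliers=(0.5, 1.4, 1.2)):
    rates = simulate(shock_start, multipliers)
    blocks = block_rates(rates)
    ks = range(blocks.shape[1] - blocks.shape[0])  # cohort groups with complete data
    true = np.array([true_tfr_5(rates, 5 * k) for k in ks])
    ap1 = np.array([ap_tfr_5(blocks, k, False) for k in ks])
    ap2 = np.array([ap_tfr_5(blocks, k, True) for k in ks])
    return true, ap1, ap2

for start in (75, 77, 79):                         # shock shifted across 5x5 cells
    true, ap1, ap2 = compare(start)
    print(start,
          "max |AP1| %:", np.abs(100 * (ap1 - true) / true).max(),
          "max |AP2| %:", np.abs(100 * (ap2 - true) / true).max())
```

Because this sketch omits changes in cohort size, its numbers should not be expected to match Figure 1. It is meant only to show how the same shock can produce different AP1 and AP2 deviations depending on its position within the five-year cells.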

A third scenario (Figure 1, bottom row) attempted to mimic a war event, with a sudden large and permanent emigration (a halving of the population at all ages) combined with a fertility shock with birth counts dropping to 30% of prewar levels in the first year, dropping to 20% in the year after, and then recovering to 50% afterward, to match the halved population. The AP2 was a great match to the true cohort on the 1 × 1 grid. On the 5 × 5 grid, AP2 did not estimate the true cohort TFR better than AP1 overall, and the bias was substantial for some cohorts with either estimator.

We could, of course, imagine dozens of more complex scenarios and cherry-pick those that illustrate the danger of generating cohort profiles from period grids. But our takeaway from the Schmertmann (2024) commentary, and further analysis of our own, is that the AP2 estimator will generally perform well on a 1 × 1 grid. On a 5 × 5 grid, caution is still warranted, particularly when faced with large fluctuations in demographic rates.

Unfortunately, the 5 × 5 data grid is still a reality for many demographic applications. True cohort data are unavailable for many if not most countries. In the meantime, demographers should carefully think through the most appropriate way to estimate cohort profiles for their applications and the biases induced by data manipulations.

Acknowledgments

We thank Jean-Paul Sardon for pointing us to some earlier literature dealing with the same issues. A.A.vR. and M.R.N. were supported by an ERC Starting Grant extension granted by the Max Planck Society. M.M. was supported by the Strategic Research Council, FLUX consortium, Decision Nos. 345130 and 345131; by the National Institute on Aging (R01AG075208); by grants to the Max Planck–University of Helsinki Center from the Max Planck Society (Decision No. 5714240218), Jane and Aatos Erkko Foundation, Faculty of Social Sciences at the University of Helsinki, and Cities of Helsinki, Vantaa, and Espoo; and the European Union (ERC Synergy, BIOSFER, 101071773). The views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

References

Calot, G. (1984). La mesure des taux en démographie: Taux par âge en années révolues et taux par âge atteint dans l'année [The measurement of rates in demography: Rate by age in completed years and rate by age reached in the year]. Population (French Edition), 39, 107–146. https://doi.org/10.2307/1532114
Calot, G., & Sardon, J.-P. (2004). Methodology for the calculation of Eurostat's demographic indicators (Population and Social Conditions Report, 3/2003/F/No. 26). Eurostat. Retrieved from https://tinyurl.com/4edmf2d7
Jdanov, D., Sobotka, T., Zeman, K., Jasilioniene, A., Alustiza Galarza, A., Németh, L., & Winkler-Dworak, M. (2022). Short-term fertility fluctuations data series (STFF)—Methodological note (HFD Note, Version 30.03.2022). Human Fertility Database. Retrieved from https://www.humanfertility.org/File/GetDocumentFree/Docs/STFFnote.pdf
Myrskylä, M., Goldstein, J. R., & Cheng, Y. A. (2013). New cohort fertility forecasts for the developed world: Rises, falls, and reversals. Population and Development Review, 39, 31–56.
Schmertmann, C. (2024). Commentary on van Raalte et al.’s “The dangers of drawing cohort profiles from period data: A research note.” Demography, 61, 967–971. https://doi.org/10.1215/00703370-11484875
van Raalte, A. A., Basellini, U., Camarda, C. G., Nepomuceno, M. R., & Myrskylä, M. (2023). The dangers of drawing cohort profiles from period data: A research note. Demography, 60, 1689–1698. https://doi.org/10.1215/00703370-11067917
Freely available online through the Demography open access option.
