Abstract

van Raalte et al. (2023) alerted demographers to the potential dangers of calculating cohort measures from the “diagonals” of gridded age–period (AP) data. In the case of cohort fertility, however, a minor change to the estimation procedure can mitigate the trend and cohort size biases that the authors identify. With an appropriate algorithm, researchers can estimate cohort fertility indices from AP data quite well.

Introduction

In the first example in their research note, van Raalte et al. (2023) addressed the problem of estimating completed cohort fertility levels from a 1 × 1 grid of age–period (AP) rates $[f_{xt}]$. If one uses $\phi_{xc}$ to denote the fertility rate at integer age $x$ of the cohort born in calendar year $c$ (an age–cohort rate that applies to a parallelogram on the Lexis diagram), the true value of a cohort's completed fertility at exact age 40 is

$$\Phi_c = \sum_{x<40} \phi_{xc}. \qquad (1)$$

They investigated the consequences of approximating $\Phi_c$ by adding AP rates from the Human Fertility Database (HFD; 2023) over the “diagonal” of the Lexis grid. That is, for the cohort born in year $c$, they calculated

$$F_c = \sum_{x<40} f_{x,\,c+x}, \qquad (2)$$

which I will call the AP estimator. They compared $F_{1935},\dots,F_{1982}$ to the Japanese cohort values published by the HFD. For this example, they showed that the proportional errors $(F_c-\Phi_c)/\Phi_c$ range from −2.1% to +4.1%, that $F_c$ systematically overestimates cohort fertility when $\Phi_c$ is decreasing from one cohort to the next (and underestimates it when $\Phi_c$ is increasing), and that the errors can be magnified by size differences between adjacent birth cohorts.
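The AP estimator and the proportional error used to assess it can be sketched in a few lines of Python. This is a minimal sketch, not the authors' or the replication code: the dict keyed by (age, calendar year) and the names `ap_estimator` and `proportional_error` are hypothetical, and the caller supplies the age range (for example, ages below 40).

```python
def ap_estimator(f, cohort, ages):
    """AP estimator F_c: sum the 1 x 1 AP rates f[(x, c + x)] along the Lexis diagonal."""
    return sum(f[(x, cohort + x)] for x in ages)


def proportional_error(estimate, target):
    """Proportional error (estimate - target) / target."""
    return (estimate - target) / target


# Hypothetical usage: rates that rise with calendar year, so the diagonal for
# the 1980 cohort picks up the year-2000 value at age 20 and 2001 at age 21.
f = {(x, t): 0.001 * (t - 1990) for x in (20, 21) for t in range(1995, 2005)}
F_1980 = ap_estimator(f, 1980, ages=(20, 21))
```

Note that each age contributes exactly one AP cell, the one in calendar year $c+x$.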

In this commentary, I make two points about their cautionary fertility example:

  • Raw HFD input data for Japan are in age–period format, so the target $\Phi_c$ values for Japan were themselves estimated from a grid of Lexis squares and $[f_{xt}]$ values, using the multistep procedure described in the HFD Methods Protocol (HFD 2023). The errors in their example therefore measure failure to match another algorithm, rather than failure to match true cohort values.

  • There is a simple variant of the AP estimator that has a stronger demographic rationale and better approximates the HFD algorithm. This alternative procedure also has less bias and smaller errors when tested against true $\Phi_c$ values calculated from Lexis triangle data.

As a consequence, estimating cohort fertility from age–period data is less dangerous than implied in van Raalte et al.’s (2023) cautionary note.

Alternative AP Diagonalization

Formula (2) looks reasonable, because it is a straightforward discretization of the integral formula for cohort fertility on a Lexis rate surface $f(x,t)$ with continuous ages and times. Comparing (1) and (2), we see that the AP estimator simply plugs in $f_{x,c+x}$ to approximate the age–cohort rate $\phi_{xc}$. With discrete 1 × 1 AP squares, however, a cohort passes through a single-year age interval over two calendar years, not one.

Figure 1 illustrates this with an example. The fertility rate of 1980-born women between exact ages 20 and 21 depends on births and exposure from parts of two AP cells: [20, 2000] and [20, 2001]. The AP estimate $F_{1980}$ calculated from (2) includes data from only the first of the two relevant calendar years. This causes the backward-looking temporal bias noted by the authors.
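A stylized calculation makes the direction of this bias explicit. Assume, purely for illustration, that every Lexis triangle holds the same exposure and that the rate in each triangle equals the age–cohort rate of the cohort occupying it. AP cell $[x,t]$ contains the lower triangle of the cohort born in $t-x$ and the upper triangle of the cohort born in $t-x-1$, so under these assumptions

$$f_{x,t} = \tfrac12\left(\phi_{x,\,t-x} + \phi_{x,\,t-x-1}\right) \quad\Longrightarrow\quad F_c = \sum_{x<40} f_{x,\,c+x} = \tfrac12 \sum_{x<40} \left(\phi_{xc} + \phi_{x,\,c-1}\right) = \tfrac12\left(\Phi_c + \Phi_{c-1}\right).$$

In this simplified setting $F_c$ is an equal mix of the target cohort and the cohort born one year earlier, so it overstates $\Phi_c$ when completed fertility falls across cohorts and understates it when fertility rises.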

Intuitively, including data from both of the AP squares in the figure should produce a better approximation to the age–cohort rate, and that intuition is correct. Generalizing the example in Figure 1, the true fertility rate at age $x$ for cohort $c$ is the ratio of births to exposure over Lexis triangles $L_{x,c+x}$ and $U_{x,c+x+1}$. Call the births in those triangles $B_L$ and $B_U$, respectively, and the corresponding exposures $N_L$ and $N_U$. The true age–cohort rate is then

$$\phi_{xc} = \frac{B_L + B_U}{N_L + N_U}. \qquad (3)$$

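The HFD and Human Mortality Database Methods Protocols (HFD 2023; Human Mortality Database 2023) describe careful procedures for allocating the births and exposure observed in AP cells to the underlying Lexis triangles. But a simple approximation to those procedures yields good estimates of cohort rates and completed fertility: assume that half of the births and half of the exposure in each of the two relevant AP cells belong to the cohort of interest. Writing $B_{xt}$ and $N_{xt}$ for the births and exposure in AP cell $[x,t]$ (so that $f_{xt} = B_{xt}/N_{xt}$), we then have

$$\phi_{xc} \approx \frac{\tfrac12 B_{x,c+x} + \tfrac12 B_{x,c+x+1}}{\tfrac12 N_{x,c+x} + \tfrac12 N_{x,c+x+1}} = \frac{B_{x,c+x} + B_{x,c+x+1}}{N_{x,c+x} + N_{x,c+x+1}}, \qquad (4)$$

and we can construct an AP2 estimator for completed cohort fertility by summing over ages:

$$\tilde{F}_c = \sum_{x<40} \frac{B_{x,c+x} + B_{x,c+x+1}}{N_{x,c+x} + N_{x,c+x+1}}. \qquad (5)$$
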
$\tilde{F}_c$ will not exactly equal $\Phi_c$, for several reasons (such as changing fertility rates within one-year age intervals and differences in cohort size), but it will typically be a very good approximation. Most importantly, it is less vulnerable than the AP estimator $F_c$ to temporal bias, because it is centered on the age–cohort parallelograms of interest.
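A minimal Python sketch of the AP2 estimator follows. It assumes, hypothetically, that AP births and exposure are available as dicts keyed by (age, calendar year); the function names are illustrative and are not taken from the replication code. At each age it pools half of the births and half of the exposure from the two AP cells that cohort $c$ crosses, so the factors of one-half cancel.

```python
def ap2_rate(births, exposure, x, c):
    """Approximate phi_{xc} from AP cells [x, c+x] and [x, c+x+1].

    Half of each cell's births and exposure is assigned to cohort c;
    the halves cancel, leaving pooled births over pooled exposure.
    """
    t0, t1 = c + x, c + x + 1
    return (births[(x, t0)] + births[(x, t1)]) / (exposure[(x, t0)] + exposure[(x, t1)])


def ap2_estimator(births, exposure, cohort, ages):
    """AP2 estimate of completed cohort fertility: sum the centered rates over ages."""
    return sum(ap2_rate(births, exposure, x, cohort) for x in ages)


# Hypothetical usage for the 1980 cohort at ages 20 and 21 (made-up counts).
births = {(20, 2000): 30.0, (20, 2001): 20.0, (21, 2001): 40.0, (21, 2002): 40.0}
exposure = {(20, 2000): 1000.0, (20, 2001): 1000.0, (21, 2001): 1000.0, (21, 2002): 1000.0}
F2_1980 = ap2_estimator(births, exposure, 1980, ages=(20, 21))
```

Because births and exposure are pooled, the two cells are weighted by their exposure rather than equally; pooling and a simple average of the two AP rates coincide only when the cells have equal exposure.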

Empirical Comparisons

Figure 2 compares the AP estimator from Eq. (2) to the AP2 estimator from Eq. (5). The top left panel reproduces the cautionary Japanese example in the research note and adds the AP2 estimates as red dots connected by line segments.

The temporal centering of the AP2 estimator greatly reduces the bias that concerned the authors. Over the 1955–1970 cohorts, for example, the AP2 estimator does not have a notable positive bias. In fact, it comes much closer than the AP estimator to matching the HFD estimates. Similarly, the AP2 estimator avoids the AP underestimates for the cohorts born after 1975. As a bonus, the errors associated with the unusually small 1966 “Fire-Horse” cohort are much smaller with the temporally centered AP2 estimates.
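The direction of the bias can also be checked on synthetic data. The toy below is a sketch under stylized, hypothetical assumptions (equal exposure of 0.5 person-years in every Lexis triangle, triangle rates equal to the age–cohort rate of the occupying cohort, and a linear decline in completed fertility across cohorts); it is not an analysis of HFD data. It builds 1 × 1 AP births and exposure from these triangles and compares the true $\Phi_c$ with the AP and AP2 estimates.

```python
# Stylized, hypothetical assumptions chosen for illustration only:
#   * each Lexis triangle holds 0.5 person-years of exposure;
#   * the rate in each triangle is the age-cohort rate phi(x, c) of its cohort;
#   * completed fertility falls linearly across birth cohorts.
AGES = range(20, 30)         # hypothetical reproductive ages
COHORTS = range(1940, 1981)  # hypothetical birth cohorts


def phi(x, c):
    """Hypothetical age-cohort rate: flat over age, down 0.001 per cohort."""
    return 0.2 - 0.001 * (c - 1950)


def true_cohort_fertility(c):
    """Phi_c: sum of the age-cohort rates over AGES."""
    return sum(phi(x, c) for x in AGES)


# 1 x 1 AP cells. Cell [x, t] holds the lower triangle of cohort t - x
# and the upper triangle of cohort t - x - 1.
births, exposure = {}, {}
for x in AGES:
    for t in range(min(COHORTS) + min(AGES), max(COHORTS) + max(AGES) + 2):
        births[(x, t)] = 0.5 * phi(x, t - x) + 0.5 * phi(x, t - x - 1)
        exposure[(x, t)] = 1.0


def ap(c):
    """AP estimator: rates from the single diagonal of cells [x, c+x]."""
    return sum(births[(x, c + x)] / exposure[(x, c + x)] for x in AGES)


def ap2(c):
    """AP2 estimator: pooled births / pooled exposure over [x, c+x] and [x, c+x+1]."""
    return sum(
        (births[(x, c + x)] + births[(x, c + x + 1)])
        / (exposure[(x, c + x)] + exposure[(x, c + x + 1)])
        for x in AGES
    )


for c in (1950, 1960, 1970):
    print(c, round(true_cohort_fertility(c), 4), round(ap(c), 4), round(ap2(c), 4))
```

In this toy, completed fertility falls by 0.01 per cohort, so the AP values exceed the true ones by 0.005 (half of one cohort's decline), while the AP2 values match them. That is a property of the stylized setup, not an empirical result.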

The top right panel of Figure 2 shows the cumulative distribution of differences between the HFD estimates of Japanese cohort fertility and the two alternatives. Large errors are much less common with the AP2 procedure, and the mean error for the AP2 estimator is close to zero (+0.1%).
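The error summaries described here, an empirical cumulative distribution of errors and a mean error, can be sketched as follows. The sketch assumes that estimates and targets are dicts keyed by birth cohort, and it builds the cumulative distribution from absolute proportional errors; using absolute values is an assumption of this sketch, not necessarily what Figure 2 plots. The numbers in the usage lines are made up.

```python
from bisect import bisect_right


def proportional_errors(estimates, targets):
    """(estimate - target) / target for every cohort present in both dicts."""
    return {c: (estimates[c] - targets[c]) / targets[c] for c in estimates if c in targets}


def mean_error(errors):
    """Average of the proportional errors."""
    values = list(errors.values())
    return sum(values) / len(values)


def ecdf(values):
    """Empirical CDF: returns a function giving the share of values <= v."""
    xs = sorted(values)
    return lambda v: bisect_right(xs, v) / len(xs)


# Hypothetical usage with made-up numbers (not HFD data).
est = {1960: 2.02, 1961: 1.98, 1962: 2.10}
tgt = {1960: 2.00, 1961: 2.00, 1962: 2.00}
errs = proportional_errors(est, tgt)
cdf_abs = ecdf(abs(e) for e in errs.values())
```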

The bottom panels of Figure 2 show the equivalent results for France, a country for which the raw HFD data include year of mother's birth. For French females, cohort fertility rates are observed rather than estimated. Thus, in these bottom panels we compare AP and AP2 estimates to the true cohort completed fertility levels. Results are very similar to those for Japan: the AP2 estimates do not have the temporal bias of the AP estimates when there are strong trends across cohorts, AP2 errors are typically smaller than AP errors in absolute value, and the mean AP2 error is very close to zero (+0.2%).
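When births and exposure are available by Lexis triangle, as for France, the true target values follow directly from the triangle-based rate and the sum over ages. A minimal sketch, assuming a hypothetical dict layout keyed by (age, calendar year, triangle), with the labels "L" (lower) and "U" (upper) chosen for illustration:

```python
def cohort_rate(births, exposure, x, c):
    """True phi_{xc} = (B_L + B_U) / (N_L + N_U) over triangles L_{x,c+x} and U_{x,c+x+1}."""
    lower, upper = (x, c + x, "L"), (x, c + x + 1, "U")
    return (births[lower] + births[upper]) / (exposure[lower] + exposure[upper])


def completed_cohort_fertility(births, exposure, cohort, ages):
    """Phi_c: sum of true age-cohort rates over the supplied ages (e.g. below 40)."""
    return sum(cohort_rate(births, exposure, x, cohort) for x in ages)


# Hypothetical usage: the 1980 cohort at age 20 (made-up triangle counts).
births = {(20, 2000, "L"): 10.0, (20, 2001, "U"): 15.0}
exposure = {(20, 2000, "L"): 500.0, (20, 2001, "U"): 500.0}
Phi_1980 = completed_cohort_fertility(births, exposure, 1980, ages=(20,))
```

The AP2 estimator replaces these unobserved triangle counts with half of the counts in the corresponding AP cells.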

Conclusion

The van Raalte et al. (2023) research note demonstrated that treating AP diagonal estimates as if they were true cohort measures can be problematic. It is important to carefully consider the dangers of this procedure. In the specific context of cohort fertility, however, the dangers are not as large as their research note suggests. Temporally centering the estimates by averaging two diagonals eliminates most of the bias problems that they highlight and greatly mitigates the cohort-size effects. In the Japan case, the centered estimates match the HFD cohort allocation procedure very well. In the France case, they match the actual cohort data very well. With proper caution, demographers can safely “diagonalize” AP fertility rates to learn about cohorts.

Supplementary Material

R code and data for replicating the calculations in this commentary are available at https://github.com/schmert/cohort-diagonals.

References

Human Fertility Database. (2023). Rostock, Germany: Max Planck Institute for Demographic Research; Vienna, Austria: Vienna Institute of Demography. Available from https://humanfertility.org

Human Mortality Database. (2023). Rostock, Germany: Max Planck Institute for Demographic Research; Berkeley, CA, USA: University of California, Berkeley; Paris, France: French Institute for Demographic Studies. Available from https://www.mortality.org

van Raalte, A. A., Basellini, U., Camarda, C. G., Nepomuceno, M. R., & Myrskylä, M. (2023). The dangers of drawing cohort profiles from period data: A research note. Demography, 60, 1689–1698. https://doi.org/10.1215/00703370-11067917