## Abstract

Research on mortality modeling of multiple populations focuses mainly on extrapolating past mortality trends and summarizing these trends by one or more common latent factors. This article proposes a multipopulation stochastic mortality model that uses the explanatory power of economic growth. In particular, we extend the Li and Lee model (Li and Lee 2005) by including economic growth, represented by the real gross domestic product (GDP) per capita, to capture the common mortality trend for a group of populations with similar socioeconomic conditions. We find that our proposed model provides a better in-sample fit and better out-of-sample forecast performance. Moreover, it generates lower (higher) forecasted period life expectancy for countries with high (low) GDP per capita than the Li and Lee model.

## Introduction

The twentieth century witnessed a substantial increase in human life expectancy. By the beginning of the twenty-first century, the record female life expectancy had reached approximately age 85 for Japan, compared with age 60 at the beginning of the twentieth century for the New Zealand non-Maori population (Oeppen and Vaupel 2002). Many actuarial practices rely largely on projections of future life expectancies in more than one population. For instance, the pricing of index-based, longevity-linked derivatives (such as the catastrophe (CAT) bond issued by the Swiss Reinsurance Company (Swiss Re)) and the risk management of life insurance companies with businesses in multiple countries require accurate forecasts of the joint mortality movements of the relevant populations. Therefore, the projection of future mortality experiences, especially the joint dynamics of multiple populations, should be carefully addressed. Even if one were interested only in the forecast for a particular country, a multipopulation model has the advantage of incorporating more data. As Li and Lee (2005) noted, improving the mortality forecasts for individual populations should be possible by taking into account the common patterns in a larger group of populations.

Most mortality models in the literature are *extrapolative*: they forecast future mortality rates based solely on historical trends (see, e.g., Cairns et al. 2011a). In particular, it is popular to model the past mortality trends by one or more latent factors and to forecast future mortality rates by extrapolating these latent factors. However, the interpretation of these latent factors is not straightforward. For example, the underlying factors that drive historical trends and whether these trends will continue remain unclear. On the other hand, empirical studies have suggested long-run correlations between mortality developments and observable trends. In particular, one of the most heavily studied trends is economic growth. For example, Brenner (2005) noted that economic growth, cumulatively over at least a decade, has been the central factor in mortality declines in the United States over the twentieth century.^{1} In a more recent study, Niu and Melenberg (2014) performed Johansen cointegration tests on the latent mortality factor extracted from the Lee-Carter model (Lee and Carter 1992) and the real gross domestic product (GDP) per capita for six OECD countries. Niu and Melenberg found a long-run relationship between these two trends and showed that they have comparable performance in terms of fitting historical mortality rates.

Relationships between economic growth and mortality developments have typically been discussed in a single-population framework—that is, a separate model for every population. In this article, we extend the literature to a setting with multiple populations. In particular, we study whether long-term relationships exist between economic growth and mortality declines for a group of closely related populations. Moreover, and more importantly, we propose mortality forecasts for individual populations by taking these relationships into account. A desirable feature when dealing with multiple populations is that mortality forecasts for a group of closely related populations do not diverge in the long run. In other words, the mortality forecasts should be coherent (Li and Lee 2005). Separate single-population models fitted to individual populations may fail to generate coherent forecasts because the resulting mortality trend may be different for each population. To the contrary, many multipopulation models ensure coherent mortality forecasts between different populations (D’Amato et al. 2014; Dowd et al. 2011; Hyndman et al. 2013; Li and Lee 2005). In this article, we extend the Li-Lee model (Li and Lee 2005) to include both latent factors and observable variables related to economic growth, and fit the extended model to populations with similar socioeconomic characteristics.^{2} Similar to the original Li-Lee model, mortality forecasts are coherent in our extended model. For the variables related to economic growth, we follow Niu and Melenberg (2014) and use the logarithm of the real GDP per capita (hereafter referred to as simply *GDP*) for each country.

In the empirical study, we focus on four groups of populations: 14 low-mortality countries around the world, 6 Eastern European countries, 6 former Soviet Union countries, and the male and female populations of Sweden. In our analysis, we find that at most two principal components are sufficient to capture more than 98% of the variation in the GDP data. Therefore, we include up to two principal components of GDP in the extended model instead of the original GDP series. We obtain mortality forecasts by extrapolating the principal components of GDP. We find that although we use the observable GDP series instead of the common latent mortality factor, the proportions of in-sample variation of the mortality data explained by our model are very close to the ones explained by the original Li-Lee model. Hence, GDP may serve as a reasonable substitute for the common latent factor in a multipopulation context. Moreover, the in-sample fit, measured by the Akaike information criterion (AIC) ratio and the Bayesian information criterion (BIC) ratio, is better for our proposed model because the number of free parameters in our proposed model is significantly smaller than that in the original Li-Lee model. The more parsimonious structure of our proposed model may help to avoid overfitting of historical data. Finally, we perform out-of-sample forecasts for each group with various jump-off years and find that the proposed model generates, on average, better forecasting performance than the original Li-Lee model and the rotational Lee-Carter model (Li et al. 2013). Therefore, an important contribution of this study is the finding that GDP is indeed useful in predicting mortality rates in a multipopulation framework.

## Mortality Developments and Economic Growth

The cross-sectional relationship between the level of economic development and the level of mortality has been discussed for several decades. In his seminal paper, Preston (1975) showed that countries with a higher national income have, on average, a higher life expectancy. This positive relation is stronger for countries with lower national incomes and weakens for richer countries. This finding is referred to as the *Preston curve*.

In recent decades, there have been ongoing debates on the dynamic relationships between changes in economic conditions (economic growth) and changes in mortality rates (mortality developments). For example, Brenner (2005) suggested that economic growth has been an important factor for mortality declines in the United States over the twentieth century, and Birchenall (2007) reached the same conclusion for a wider range of countries. Moreover, French and O’Hare (2014) and Niu and Melenberg (2014) incorporated macroeconomic factors in stochastic mortality models and suggested that these factors are helpful in predicting future mortality rates. In contrast, Tapia Granados and Roux (2009) showed that mortality declines in the United States accelerate during economic recessions and slow during economic expansions. See Ruhm (2005) for an overview of studies on the relationship between economic growth and mortality changes.

Findings on the alternative direction (that is, the effect of health and life expectancy on economic growth) have been mixed. For example, Bloom and Canning (2005) found that health has a positive effect on economic growth, both at the macro and the micro level. Moreover, Bhargava et al. (2001) suggested a positive effect of life expectancy on economic growth in low-income countries. However, Acemoglu and Johnson (2007) found no evidence that exogenous increases in life expectancy lead to per capita economic growth.

In this article, we study the long-term relationship between the economic growth and the mortality declines of a group of closely related populations. Therefore, in addition to the coherence assumption on mortality, the economic growth of countries within a group should not diverge. The convergence of the economic growth in different countries follows from the convergence hypothesis, which various empirical studies have supported (Ben-David 1996; Mankiw and Weil 1992). Moreover, despite the controversy in the literature, our model is based on a hypothesis similar to that proposed by Niu and Melenberg (2014). In particular, our underlying assumption is that the economic growth and the mortality declines in a group of closely related populations do not diverge in the long run. We require no assumptions on the causality between these two trends.

## Model

In their seminal paper, Lee and Carter (1992) made linear extrapolation one of the most popular methods of mortality modeling. Since then, various extensions of the Lee and Carter (Lee-Carter) model have been proposed (Booth et al. 2002, 2006; Hyndman et al. 2013; Renshaw and Haberman 2003). However, most of these studies are based on a single-population perspective. Li and Lee (2005) extended the Lee-Carter model to a multipopulation framework by taking into account a common mortality pattern. In particular, they imposed a coherence assumption on the multipopulation model, meaning that forecasts of life expectancy in different populations could not diverge in the long run. Recently, Niu and Melenberg (2014) extended the Lee-Carter model by including the logarithm of real GDP per capita as an additional factor. They found a good fit, showing that movements in GDP are positively correlated with life expectancy for six OECD countries. This article follows the idea of Niu and Melenberg (2014) and extends the Li-Lee model by including the real GDP per capita. To that end, here we introduce the general model specification and its estimation procedure.

To formulate our model, we first describe the Li-Lee model. Denote by *I* the number of populations. In each population, there are *N* ages and *T* periods. Let *m*_{i,x,t} be the one-year central death rate for an individual aged *x* in year *t* in population *i*. A population could be a unisex national population or a national population of a certain gender. In the Li-Lee model, the logarithm of the one-year central death rate log *m*_{i,x,t} is explained by a common latent factor and a population-specific latent factor. The common factor is estimated by the singular value decomposition applied to the average of the demeaned logarithm of the central death rates, $\frac{1}{I}\sum_{i=1}^{I} \log m_{i,x,t}$. After the common factor is obtained, the population-specific factors are estimated by applying the singular value decomposition again to the population-specific residuals of the mortality rates that are not explained by the common factor.
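The two singular value decompositions described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `first_svd_component` and `fit_li_lee`, the array layout (populations × ages × years), and the sign normalization are assumptions made for the example.

```python
import numpy as np

def first_svd_component(Z):
    """Rank-1 approximation Z ~ outer(b, k) from the leading singular triplet.

    Assumed normalization: b has unit length and a non-negative sum.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    b, k = U[:, 0], s[0] * Vt[0, :]
    sign = 1.0 if b.sum() >= 0 else -1.0
    return sign * b, sign * k

def fit_li_lee(log_m):
    """Two-step Li-Lee style fit; log_m has shape (I, N, T)."""
    a = log_m.mean(axis=2)                      # a[i, x]: time average of log m
    demeaned = log_m - a[:, :, None]
    # Step 1: common factor from the population average of demeaned log rates.
    B, K = first_svd_component(demeaned.mean(axis=0))
    common = np.outer(B, K)
    # Step 2: population-specific factor from each population's residuals.
    b = np.empty_like(a)
    k = np.empty((log_m.shape[0], log_m.shape[2]))
    for i in range(log_m.shape[0]):
        b[i], k[i] = first_svd_component(demeaned[i] - common)
    return a, B, K, b, k
```

Because each row of the demeaned matrix sums to zero over time, the estimated factors also sum to zero over the sample period.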

We extend the Li-Lee model by including the logarithm of the real GDP per capita. Without loss of generality, we correct the logarithm of the real GDP per capita in each country to have a mean of 0. With slight abuse of terminology, we refer to the demeaned logarithm of the real GDP per capita as simply *GDP*. We are interested in the correlation between the economic growth and the mortality improvements in a group of populations with similar socioeconomic conditions. We assume that the GDPs of the countries within a group do not diverge in the long run. Therefore, instead of looking at the population-specific GDP data, we consider the common trends in GDP for the whole group. In particular, we include the minimal number of principal components (PCs) of GDP such that the PCs explain at least 95% of the variation in GDP.
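The selection rule for the number of principal components can be illustrated with a short sketch. It assumes the GDP data are arranged as a years × countries array of log real GDP per capita; the function name `gdp_principal_components` and the use of an SVD to obtain the components are choices made for this example only.

```python
import numpy as np

def gdp_principal_components(gdp, threshold=0.95):
    """Return the smallest set of PCs explaining at least `threshold` of the variation.

    gdp: array of shape (T, I), one column of log real GDP per capita per country.
    """
    X = gdp - gdp.mean(axis=0)                  # demean each country's series
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    L = int(np.searchsorted(explained, threshold)) + 1
    L = min(L, len(s))                          # guard against rounding at threshold ~ 1
    scores = U[:, :L] * s[:L]                   # scores[t, l] plays the role of g_{l,t}
    return scores, explained[:L]
```

The columns of `scores` are mutually orthogonal by construction, so their sample covariances are zero.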

The extended model is given by

$$\log m_{i,x,t} = a_{i,x} + \sum_{j=1}^{J} B_{j,x} K_{j,t} + \sum_{\ell=1}^{L} \gamma_{\ell,x}\, g_{\ell,t} + b_{i,x} k_{i,t} + \varepsilon_{i,x,t}, \qquad (1)$$

where *a*_{i,x}, *b*_{i,x}, and *k*_{i,t} are the population-specific parameters; *K*_{j,t} is the *j*th common latent mortality trend; *g*_{ℓ,t} is the *ℓ*th principal component of GDP for the *I* populations; and *B*_{j,x} and γ_{ℓ,x} are their respective age-specific loadings. The model includes *L* principal components of GDP, with *L* ≤ *I*. Although these principal components are estimated, we treat them as observable because they do not depend on the mortality rates. Moreover, the model includes *J* common latent factors. The population-specific parameters *a*_{i,x}, *b*_{i,x}, and *k*_{i,t} have an interpretation similar to that in Li and Lee (2005). In particular, for any *i* and *x*, *a*_{i,x} is the mortality level (i.e., the average over time of log *m*_{i,x,t}), *k*_{i,t} is the population-specific latent factor, and *b*_{i,x} is the corresponding age-specific loading.

Principal components *g*_{ℓ,t} satisfy by construction that the sample covariance of (*g*_{ℓ,t}, *g*_{m,t}) is 0 for all *ℓ* ≠ *m*; that is, $\sum_{t=1}^{T} g_{\ell,t}\, g_{m,t} = 0$. The population-specific factors *k*_{i,t} are assumed to follow stationary processes to ensure coherent forecasts (Li and Lee 2005). When *J* = 1 and *L* = 0, Eq. (1) reduces to the original Li-Lee model. Moreover, the case where *J* = 0, *I* = 1, and *L* = 1 leads to Model (6) discussed in Niu and Melenberg (2014).

When fitting Eq. (1) to historical mortality data, one can achieve an increasingly better in-sample fit by using larger values of *J* and *L*. However, as Lee and Miller (2001) suggested, it is preferable to retain a simpler model specification, which runs less risk of overfitting and thus projecting changes that may be transitory. In fact, as Li and Lee (2005) demonstrated, one common latent factor already produces a rather good fit. Therefore, we consider at most one common latent factor: *J* = 0 or *J* = 1. Moreover, to further ensure simplicity of the model, we let *J* = 0 when GDP data are used. In other words, we assume that there is no significant mortality trend that is common to all populations after the economic growth is accounted for. Instead, the population-specific mortality factors are able to explain a sufficiently large part of the remaining variation in the data. In a later discussion, we show the robustness of our model by considering a more general model that includes both a common latent factor and principal components of GDP.

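Of the two cases *J* = 0 or *J* = 1, we introduce the estimation procedure for *J* = 1; that is, *L* ≥ 1. We first estimate *a*_{i,x} as the average over time of log *m*_{i,x,t}:

$$\hat{a}_{i,x} = \frac{1}{T}\sum_{t=1}^{T} \log m_{i,x,t}.$$

After *a*_{i,x} is specified, we estimate the model in two steps: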

1. We estimate the population-invariant parameters, *B*_{x}, *K*_{t}, and γ_{ℓ,x}, from the average of the logarithm of the central death rates over populations.
2. We then estimate the population-specific parameters, *b*_{i,x} and *k*_{i,t}, using the population-specific residuals of the logarithm of the central death rates after the common part is subtracted.

Finally, we estimate the parameter $\sigma_{\varepsilon_{i,x}}$, which measures the volatility of the error terms ε_{i,x}.
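The two steps can be sketched for the simpler case *J* = 0, in which the common part consists only of the GDP principal components. The sketch below is an illustration under stated assumptions, not the procedure of the article: the common loadings are obtained by ordinary least squares age by age, and the function name `fit_gdp_ll` and the array layout are hypothetical.

```python
import numpy as np

def fit_gdp_ll(log_m, g):
    """Two-step fit with J = 0 (assumption: OLS for the GDP loadings).

    log_m: (I, N, T) log central death rates.
    g:     (T, L) principal components of GDP, each with mean zero over time.
    """
    I, N, T = log_m.shape
    a = log_m.mean(axis=2)                           # mortality level a[i, x]
    demeaned = log_m - a[:, :, None]
    avg = demeaned.mean(axis=0)                      # (N, T) average over populations

    # Step 1: regress the population average on the GDP components, age by age.
    gamma, *_ = np.linalg.lstsq(g, avg.T, rcond=None)  # gamma has shape (L, N)
    common = (g @ gamma).T                           # (N, T) common part

    # Step 2: leading SVD component of each population's remaining residuals.
    b = np.empty((I, N))
    k = np.empty((I, T))
    for i in range(I):
        U, s, Vt = np.linalg.svd(demeaned[i] - common, full_matrices=False)
        b[i], k[i] = U[:, 0], s[0] * Vt[0, :]

    # Volatility of the error terms, per population and age.
    fitted = a[:, :, None] + common[None] + b[:, :, None] * k[:, None, :]
    sigma = (log_m - fitted).std(axis=2)
    return a, gamma, b, k, sigma
```

With *J* = 1, the first step would additionally extract a latent factor subject to identification constraints, which this sketch does not attempt.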

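Let *u*_{x,t} be the systematic part of $\widetilde{\log m}_{x,t}$. By taking the average over populations, we use a direct extension of the single-population estimation procedure of Niu and Melenberg (2014) for the population-invariant parameters. Denote by **θ** the population-invariant parameters, as given by Eq. (4). The parameters **θ** are not identified without additional constraints. For example, with *c* ∈ ℝ^{L} and *d* ≠ 0, we have

$$B_x K_t + \sum_{\ell=1}^{L} \gamma_{\ell,x}\, g_{\ell,t} = \frac{B_x}{d}\, d\Bigl(K_t + \sum_{\ell=1}^{L} c_\ell\, g_{\ell,t}\Bigr) + \sum_{\ell=1}^{L} \bigl(\gamma_{\ell,x} - c_\ell B_x\bigr) g_{\ell,t}.$$

We therefore impose the normalization constraints in Eqs. (7)–(10), under which the sample covariance of (*K*_{t}, *g*_{ℓ,t}) is 0.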

Following Nielsen and Nielsen (2010) and Niu and Melenberg (2014), we show in the following theorem that our constraints identify the population-invariant parameters uniquely.

**Theorem 1.** Let *u* = (*u*_{x,t})_{x = 1, …, N; t = 1, …, T}, where *u* = *u*(θ) satisfies $u_{x,t} = B_x K_t + \sum_{\ell=1}^{L} \gamma_{\ell,x}\, g_{\ell,t}$ for some θ as given by Eq. (4). Then the parametrization θ^{0} under the normalization constraints shown in Eqs. (7)–(10) satisfies the following:

1. θ^{0} is a function of θ.
2. *u* is a function of θ through θ^{0}.
3. The parametrization of *u* by θ^{0} is exactly identified. That is, if θ^{1} ≠ θ^{2} are two sets of parameters satisfying the normalization constraints shown in Eqs. (7)–(10), then *u*(θ^{1}) ≠ *u*(θ^{2}).

The proof of Theorem 1 is relegated to Section A of Online Resource 1.

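For the *i*th population, *b*_{i,x} and *k*_{i,t} are obtained by applying the singular value decomposition to the population-specific residuals of mortality rates after the common part and *a*_{i,x} are subtracted; that is, to the *N* × *T* matrix with entries

$$\log m_{i,x,t} - \hat{a}_{i,x} - \hat{B}_x \hat{K}_t - \sum_{\ell=1}^{L} \hat{\gamma}_{\ell,x}\, g_{\ell,t}, \qquad x = 1, \ldots, N,\; t = 1, \ldots, T.$$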

## Application to Mortality and GDP Data

In this section, we apply the model discussed in the preceding section to historical mortality data and compare the results with those obtained from the Li-Lee model.

### Data

We use mortality data from the Human Mortality Database (HMD n.d.) for five-year age groups: 0, 1–4, 5–9, . . . , 95–99. For the empirical study, we consider four groups of populations. The first group comprises 14 low-mortality countries. The sample period for Group 1 is 1947–2010. The second group comprises six Eastern European countries, with the sample period being 1969–2009.^{3} (A complete list of the countries in Groups 1 and 2 is shown in Table S3 in Online Resource 1.) Group 3 consists of six former Soviet Union countries,^{4} with the sample period being 1969–2010. We use unisex mortality data for the first three groups. The final group consists of the male and female populations of Sweden, with the sample period being 1947–2010. This final group allows us to understand the comovements in mortality rates for different genders.

The GDP data for Groups 1 and 4 are from the Maddison Project on World Economy (Maddison-Project n.d.), and the GDP data for Groups 2 and 3 are from the Economic Research Service (ERS) International Macroeconomic Data Set (ERS n.d.). For the populations in Groups 1–3, we use the national real GDP per capita for that country except for East Germany, for which only the combined GDP data of East Germany and West Germany are available. For Group 4, we use the national real GDP per capita of Sweden. All GDP data are based on purchasing power parity (PPP).^{5}

### GDP and Life Expectancy

Before we estimate the model, we take a preliminary look at the developments of GDP and the life expectancy of the populations in each group. Figure 1 displays the GDP and the life expectancy for the countries in Groups 1–3. For readability, we show only the first five countries of Group 1 (countries are sorted in alphabetical order), and we provide this figure for all other countries of Group 1 and for Group 4 in Online Resource 1 (Figs. S3 and S4). For Group 1, we see that GDP and life expectancy are increasing over time for each country. Therefore, GDP is clearly positively correlated with life expectancy for the populations in Group 1. The same applies to Group 4. For Groups 2 and 3, we observe a decline of GDP during 1990–2000 for all countries except for East Germany. Meanwhile, in this period, life expectancy decreased for all countries except for East Germany, especially between 1990 and 1995. The primary cause of the decreases in both GDP and life expectancy may be the collapse of the Soviet Union, with the resulting collapse of the health care systems and other changes. Hence, GDP seems to be correlated with life expectancy at and after the collapse of the Soviet Union because of its correlation with omitted variables, such as the collapse of the health care system.

### The Number of Principal Components of GDP

To select an appropriate model for each group of populations, we select the minimal number of principal components that explain at least 95% of the variation in the GDP data. For Group 1, we find that the first principal component explains approximately 99% of the variation in the data. For Groups 2 and 3, the first principal component explains approximately 80% of the variation in the data, while the first two principal components explain approximately 98%. Because Group 4 has only one country (Sweden), we directly use the GDP series of this country. Hence, we let *L* = 1 for Groups 1 and 4, and *L* = 2 for Groups 2 and 3. We refer to this model specification as the *GDP-LL model* for each group.

### Estimation Results

We estimate the GDP-LL model for each group using the method proposed earlier. For comparison, we also report the estimation results of the Li-Lee model. We plot the common factors (*g*_{1,t} and *K*_{t}) and their loadings (γ_{1,x} and *B*_{x}) for Groups 1 and 3 using the Li-Lee model and the GDP-LL model in Figs. 2 and 3, respectively. Figure 2 shows that the first principal component of GDP, *g*_{1,t}, is increasing over time. Moreover, γ_{1,x} is negative, and so it follows that growth in GDP is negatively associated with the central death rates. In addition, a larger estimated γ_{1,x} in absolute value suggests that this association is more substantial for younger ages.^{6} When we compare this with the Li-Lee model, we find consistent patterns. For the Li-Lee model, the mortality declines at younger ages are more sensitive to the common latent factor. Note that the normalization in Eq. (7) leads to positive estimates of *B*_{x}; therefore, the estimated *K*_{t} is decreasing.

The results for Group 3 are shown in Fig. 3. The figure shows that in contrast to Group 1, the GDP series for the countries in Group 3 are not monotonically increasing. Instead, GDP decreased during the period 1990–1998 and increased thereafter.

Figures 1 and 3 show that both principal components of GDP have patterns similar to that of the GDP series themselves. Moreover, we see that the second principal component is smoother. More importantly, γ_{2,x}, the loading of the age-specific mortality rates with respect to the second principal component of GDP, has a pattern roughly opposite to that of γ_{1,x}. In fact, because the two principal components have a similar pattern, the association of the first principal component with mortality is mitigated for the ages where γ_{1,x} and γ_{2,x} have opposite signs. For ages 45–65, we see that the estimates of γ_{1,x} are positive because mortality rates are increasing from 1969 to 1990 for these ages in most of the countries. This period captures the Cold War period and directly thereafter. See Field (1995) for an overview of the positive effect of the Cold War on mortality rates.

A positive estimate of γ_{1,x} implies that the increasing trend of *g*_{1,t} would result in increasing forecasted mortality rates for these ages. These forecasts are obviously unrealistic. However, the estimated γ_{2,x} is negative for these ages and has larger absolute values. In fact, if we combine the effects of the two principal components of GDP, we still get decreasing forecasted mortality rates for these ages. The forecasting performance of the GDP-LL model will be discussed in the upcoming Forecasting Performance section.

For the Li-Lee model applied to Group 3, the estimates of *B*_{x} are negative for ages around 25–65. Combining this finding with the observation that the factor *K*_{t} is decreasing yields that the forecasted mortality rates for these ages are increasing. Therefore, it seems that the Li-Lee model is not the appropriate specification for this group given that it generates unreasonable forecasts.

The population-invariant parameters for Group 2 and Group 4 are qualitatively similar to the parameters for Group 3 and Group 1, respectively (see Figs. S7 and S8 in Online Resource 1). In other words, the explanatory power of the GDP on mortality rates is similar for the low-mortality countries (Group 1) and the two populations in Sweden (Group 4), and for the Eastern European countries (Group 2) and the former Soviet Union countries (Group 3).

Figure 4 displays the population-specific parameters for five countries of Group 1. The estimated values of *k*_{i,t} are similar, but the estimated values of *b*_{i,x} are rather different from each other. The estimated values of *b*_{i,x} from the GDP-LL model are smoother, but the ones from the Li-Lee model are rather volatile. Thus, in the GDP-LL model, the forecasted mortality rates would be smoother across ages, especially in the short term, because the differences between *b*_{i,x} across ages are smaller for each country. This observation also holds for the other countries in Groups 1 and 4. For Groups 2 and 3, the population-specific parameters from the Li-Lee model and the GDP-LL model are rather similar. The rest of the population-specific parameter estimates are reported in Online Resource 1 (Figs. S9–S13 therein).

### The Explanation Ratios *R*_{C}(*i*) and *R*_{AC}(*i*)

To assess the fit for each population, we use the explanation ratios *R*_{C}(*i*) and *R*_{AC}(*i*), as considered by Li and Lee (2005). The ratio *R*_{C}(*i*) is the proportion of variation explained by only the common factors; that is, *R*_{C}(*i*) measures how well the common factors fit for the *i*th population. Moreover, the ratio *R*_{AC}(*i*) is the proportion of variation explained by all factors; that is, *R*_{AC}(*i*) measures how well the full model fits for the *i*th population.
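A small sketch may help fix ideas about the two ratios. It assumes the usual "one minus residual sum of squares over total sum of squares" form, in the spirit of Li and Lee (2005); the exact expressions used in the article are not reproduced in this text, and the function name `explanation_ratios` and its arguments are hypothetical.

```python
import numpy as np

def explanation_ratios(log_m_i, a_i, common, b_i, k_i):
    """Return (R_C, R_AC) for one population under the assumed 1 - SSE/SST form.

    log_m_i: (N, T) log central death rates of population i.
    a_i:     (N,)   mortality level of population i.
    common:  (N, T) fitted common part (latent factor and/or GDP components).
    b_i,k_i: (N,), (T,) population-specific loading and factor.
    """
    total = np.sum((log_m_i - a_i[:, None]) ** 2)
    res_common = log_m_i - a_i[:, None] - common     # after common part only
    res_full = res_common - np.outer(b_i, k_i)       # after all factors
    r_c = 1.0 - np.sum(res_common ** 2) / total
    r_ac = 1.0 - np.sum(res_full ** 2) / total
    return r_c, r_ac
```

Under this form, the common-factor ratio can be negative when the common part fits a population worse than its mortality level alone.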

Figure 5 shows ratios from both the GDP-LL model and the Li-Lee model for the four groups. The *R*_{C} ratios for the GDP-LL model are smaller than the *R*_{C} ratios for the Li-Lee model for the majority of populations. This finding is not surprising because the common factors in the GDP-LL model are (the PCs of) observable GDP data, while the common factor in the Li-Lee model is the latent factor that minimizes the overall fitting error in the historical mortality data. However, the differences between these two sets of ratios are small. Thus, GDP is indeed correlated with mortality developments and is able to explain a substantial part of the variation in the mortality data.

Note that the *R*_{C} ratios from the GDP-LL model are larger for Belarus, Estonia, Latvia, and Russia in Group 3. In other words, GDP produces a better fit for the mortality rates in these former Soviet Union countries. The *R*_{AC} ratios for the GDP-LL model are only marginally smaller than the ones for the Li-Lee model, thus indicating that when a population-specific latent factor is allowed, the observable factors of GDP provide only a marginally worse fit than the optimal latent factor.

### BIC and AIC Ratios

In the BIC ratio,^{7} *m* is the number of free parameters of the model, and *M* is the total number of samples for the group considered. The AIC ratio follows Akaike (1973).

A higher AIC or BIC means that the model has a better in-sample fit. The difference between the AIC and the BIC is that the BIC ratio imposes a higher penalty for the number of free parameters. A popular reference that explains the differences between the AIC and BIC ratios is Yang (2005).
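For illustration only, the following sketch uses one common "log-likelihood minus penalty" convention in which higher values indicate a better fit. The Gaussian likelihood, the penalty terms, and the function names `gaussian_loglik`, `aic_ratio`, and `bic_ratio` are assumptions made for this example; the article's own definitions may differ.

```python
import numpy as np

def gaussian_loglik(residuals, sigma):
    """Gaussian log-likelihood of the residuals.

    sigma is the error standard deviation and must broadcast against residuals,
    e.g. residuals of shape (I, N, T) with sigma of shape (I, N, 1).
    """
    residuals = np.asarray(residuals, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    terms = -0.5 * np.log(2.0 * np.pi * sigma ** 2) - residuals ** 2 / (2.0 * sigma ** 2)
    return float(np.sum(terms))

def aic_ratio(loglik, m):
    """Assumed form: log-likelihood penalized by the number of free parameters m."""
    return loglik - m

def bic_ratio(loglik, m, M):
    """Assumed form: heavier penalty that grows with the number of samples M."""
    return loglik - 0.5 * m * np.log(M)
```

Under these assumed forms, the BIC-type penalty exceeds the AIC-type penalty whenever *M* is larger than about 7.4, which matches the statement that the BIC ratio penalizes free parameters more heavily.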

In Eq. (18), which gives the total number of parameters, *I* is the number of populations, *N* the number of age groups in each population, and *T* the number of periods. The values of *I*, *T*, *J*, and *L* can be different for each group and model. The number of free parameters, *m*, is the number of total parameters in Eq. (18) minus the number of constraints given in Eqs. (7)–(11).

Table 1 displays the number of free parameters and the ratios for each group. We find that the AIC and BIC ratios of the GDP-LL model are larger than the ones of the Li-Lee model, indicating a better in-sample fit. The only exception is the AIC ratio for Group 3, where the ratio is almost identical for the two models. This is not surprising because, compared with the Li-Lee model, the GDP-LL model has only a marginally lower explanation ratio *R*_{AC}(*i*) for each *i*. However, the GDP-LL model has fewer free parameters because it involves (the PCs of) the observable GDP data instead of a common latent factor, which needs to be estimated.

## Forecasting Performance

In this section, we evaluate the forecasting performance of the GDP-LL and Li-Lee models. We first explain the time series specifications for the time-varying factors, and then we compare the out-of-sample forecast accuracy. Finally, we discuss the forecasted period life expectancies at birth using both the GDP-LL model and the Li-Lee model.

### The Time Series Processes

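We model each common factor as a random walk with drift:

$$y_t = \delta + y_{t-1} + \eta_t, \qquad (19)$$

where *y*_{t} is the process *K*_{t} or *g*_{ℓ,t} for *ℓ* ∈ {1, …, *L*}, δ is the drift, and η_{t} is the innovation. Therefore, the common factors are assumed to be nonstationary with a linear trend. To ensure coherence of the forecasted mortality rates, we assume that the population-specific processes are stationary. In particular, we fit an *AR*(1) specification for each *k*_{i,t}:

$$k_{i,t} = c_{i,0} + c_{i,1}\, k_{i,t-1} + \omega_{i,t}.$$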

As Li and Lee (2005) noted, an *AR*(1) specification ensures coherent forecasts of different populations. Because the process *k*_{i,t} is orthogonal to *K*_{t} or *g*_{ℓ,t}, ω_{i,t} is uncorrelated with η_{t} in Eq. (19).

If *c*_{i,1} ≥ 1 for some population *i*, then the process *k*_{i,t} is not stationary. In this case, the coherence assumption is not satisfied, and Li and Lee (2005) suggested that population *i* should thus be excluded from the forecasting analysis.
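The two time series specifications and the stationarity check can be sketched as follows. The estimators (sample mean of first differences for the drift, ordinary least squares for the *AR*(1) coefficients) are standard choices assumed for this example, and the function names are hypothetical. The check below uses the absolute value of the slope, which is slightly stricter than the condition stated in the text.

```python
import numpy as np

def fit_rw_drift(y):
    """Random walk with drift: drift and innovation s.d. from first differences."""
    d = np.diff(np.asarray(y, dtype=float))
    return d.mean(), d.std(ddof=1)

def fit_ar1(k):
    """AR(1) with intercept, k_t = c0 + c1 * k_{t-1} + error, fitted by OLS."""
    k = np.asarray(k, dtype=float)
    X = np.column_stack([np.ones(len(k) - 1), k[:-1]])
    (c0, c1), *_ = np.linalg.lstsq(X, k[1:], rcond=None)
    resid = k[1:] - (c0 + c1 * k[:-1])
    return c0, c1, resid.std(ddof=2)

def is_stationary(c1):
    """A population is kept for coherent forecasting only if |c1| < 1."""
    return abs(c1) < 1.0
```

A population whose fitted slope fails `is_stationary` would be excluded from the forecasting analysis, in line with the rule described above.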

### Structural Breaks in the Common Factors

Research on extrapolative mortality modeling has widely made the linearity assumption of mortality trends in Eq. (19) (e.g., Cairns et al. 2011a). However, many recent studies have suggested that the validity of the linearity assumption depends on the calibration windows of the nonstationary factors and have proposed different methods to select an appropriate calibration window. For example, Booth et al. (2002) constructed a ratio for the Lee-Carter model to measure the reduction in the model fit arising from assuming a completely linear mortality trend.^{8} They then proposed to use the longest calibration window over which this ratio is reasonably small. Moreover, Coelho and Nunes (2011) and Van Berkum et al. (2016) proposed statistical tests for structural breaks in the mortality trends of various stochastic mortality models to determine the optimal calibration window over which the linearity assumption of the mortality trend is appropriate.

The importance of choosing an appropriate calibration window applies to our analysis as well. From Fig. 2, we observe that the estimated *g*_{1,t} and *K*_{t} for Group 1 are (visually) close to linear. However, Fig. 3 shows that there may exist structural changes in the principal components of GDP. In particular, the principal components both increased from 1970 to 1990, then dropped drastically between 1990 and around 1998, and finally switched to an increasing trend again, one that was steeper than before. In this case, observations before the final trend break very likely contain little information regarding the future developments of GDP.

To select the appropriate calibration window of the common factors for forecasting, we implement the test for structural breaks on *K*_{t} and *g*_{ℓ,t} for each group, following Van Berkum et al. (2016). For the theoretical background and the detailed procedure of implementing this test for multiple breaks in a univariate time series process, see Van Berkum et al. (2016) and the references therein. The latest break points of the common factors for each group and model are given in Table S1 in Online Resource 1.

### Forecasted Period Life Expectancies

We proceed with forecasting the period life expectancies. Keeping the parameters of the Li-Lee model and GDP-LL model fixed, we fit the random walk with drift in Eq. (19) to the common factors (*K*_{t} and *g*_{ℓ,t}) after the most recent break point for each group. Then we forecast the future life expectancy using the estimated parameters. The estimated parameters for the population-invariant and the population-specific processes are reported in Online Resource 1 (Tables S2 and S3, respectively). In our analysis, *c*_{i,1} ≥ 1 holds in at least one model for the Czech Republic and East Germany in Group 2 and for Estonia, Latvia, and Ukraine in Group 3. For these populations, the population-specific process is nonstationary, and so we exclude them from our forecasting analysis.

For Group 2, the *R*_{C} ratios are negative for Russia in both the Li-Lee model and the GDP-LL model. Li and Lee (2005) obtained a value of 0.46 for the *R*_{AC} ratio and suggested that the coherence assumption does not apply to Russia; they therefore excluded Russia from their forecasting analysis. However, with the inclusion of more recent observations,^{9} the *R*_{AC} ratios for Russia (Group 2) are approximately 0.86 and 0.83 for the Li-Lee model and the GDP-LL model, respectively. Therefore, we provide forecasts for Russia as well.

For each group, we project period life expectancies at birth 30 years ahead for all populations. We display one population for each group in Fig. 6 and relegate a full overview of the forecasts to Online Resource 1 (Figs. S14–S19). The mean and the confidence intervals for each population are calculated numerically using 1,000 simulated paths of future life expectancies. We find that the mean forecasted life expectancies generated by the GDP-LL model are lower for the United States (Group 1), higher for Hungary (Group 2) and Russia (Group 3), and almost the same for the Swedish female population (Group 4). For the low-mortality populations, it therefore seems that there are events that are common to the countries in Group 1 and that have positive effects on mortality decline but are not fully reflected in GDP growth. Possible examples include increases in specialist visits, drug prescriptions, hospital admissions, and surgical procedures among the elderly (for a case in the Netherlands, see Mackenbach et al. 2011). In the Li-Lee model, such events affect the estimate of the drift term in the common latent factor and thus have a long-run impact on mortality forecasts. In the GDP-LL model, by contrast, these changes are absorbed by the *AR*(1) process and affect mortality forecasts only in the short term.
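The numerical step from simulated mortality rates to a mean and an interval for life expectancy can be sketched as follows. The life-table construction is an assumption, because the text does not state which method is used: the sketch takes single-year central death rates, assumes a constant force of mortality within each age, and closes the table at the last age. The function names are hypothetical.

```python
import math


def period_life_expectancy(death_rates):
    """Period life expectancy at birth from single-year central death rates
    m_0, m_1, ... (all positive), assuming a constant force of mortality
    within each age and closing the table at the last age with 1 / m."""
    survivors, e0 = 1.0, 0.0
    for m in death_rates[:-1]:
        # Person-years lived in the age interval under a constant hazard.
        e0 += survivors * (1.0 - math.exp(-m)) / m
        survivors *= math.exp(-m)
    e0 += survivors / death_rates[-1]  # open-ended last age group
    return e0


def mean_and_interval(samples, level=0.95):
    """Mean and an equal-tailed empirical interval from simulated values."""
    ordered = sorted(samples)
    n = len(ordered)
    lo = ordered[math.floor((1 - level) / 2 * (n - 1))]
    hi = ordered[math.ceil((1 + level) / 2 * (n - 1))]
    return sum(ordered) / n, lo, hi


# Sanity check: a constant hazard of 0.02 at every age implies e0 = 1 / 0.02.
e0_constant = period_life_expectancy([0.02] * 101)

# Summarising 1,000 simulated values (here simply 0, 1, ..., 999).
summary = mean_and_interval(list(range(1000)))
```

In an application, `period_life_expectancy` would be evaluated on each simulated path and forecast year, and `mean_and_interval` would then be applied across the 1,000 resulting values for that year.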

For Hungary and Russia, the GDP-LL model leads to higher forecasted future period life expectancies. One possible reason is that, as discussed earlier, the loadings *B*_{x} for ages 25–65 are negative for Group 3. Therefore, with a decreasing *K*_{t} (see Fig. 3), the forecasted mortality rates for these ages are increasing, which results in decreasing future life expectancies. On the other hand, the combination of the two principal components of GDP and their loadings γ_{1,x} and γ_{2,x} results in a decreasing trend in the mortality rates for all ages and thus yields increasing trends in forecasted life expectancies for Russia. Moreover, the confidence intervals from the GDP-LL model are wider than those from the Li-Lee model, except for Russia.

### Out-of-Sample Forecast Performance

We next evaluate the out-of-sample forecast performance of the GDP-LL model. In particular, we compare the forecast accuracy of the GDP-LL model with both the Li-Lee model and an extension of the Lee-Carter model proposed by Li et al. (2013).

In the extension of the Lee-Carter model proposed by Li et al. (2013), the age pattern of mortality decline (commonly denoted by *b*_{x} in the literature) begins to rotate when the realized (or projected) period life expectancy reaches a predetermined threshold and ultimately becomes flat for the majority of ages (e.g., for ages 0–75). We refer to this model as the *rotational* Lee-Carter (R-LC) model. The R-LC model is motivated by the empirical evidence that, over the past several decades, mortality decline in low-mortality countries has been decelerating for infants and children and accelerating at old ages (Li and Gerland 2011). Because the age pattern of mortality decline is rotated, the mortality projections at adjacent ages do not diverge in the long run, except at very old ages. The United Nations Population Division uses the R-LC model to produce country-specific life expectancy projections (Ševčíková et al. 2016). A brief description of the R-LC model is given in the appendix.

[…] *i*, age *x*, and year *t*. The relative RMSFE of mortality rates for population *i* and jump-off year $\hat{u}$ is given by

[equation missing in the source]

where *U*_{i} denotes the end of the sample for population *i*. The relative RMSFE measures the relative forecasting errors for all future years in the sample and all ages. We also study the relative RMSFE for period life expectancies. The relative RMSFE for the period life expectancy for population *i* with jump-off year $\hat{u}$ is given by

[equation missing in the source]

where *l*_{i,u} is the observed period life expectancy at birth for population *i* in year *u*, and $\hat{l}_{i,u}$ is the corresponding forecasted value.
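As an illustration of the accuracy measure, the sketch below computes a root mean squared relative forecast error over all observed and forecasted pairs after the jump-off year. The exact formula used in this article is not reproduced here, so the form below is an assumption: each error is divided by the observed value, and the squared relative errors are averaged before the square root is taken. Whether the measure is applied to mortality rates or to their logarithms is left to the caller. The function name `relative_rmsfe` is hypothetical.

```python
import math


def relative_rmsfe(observed, forecast):
    """Root mean squared relative forecast error over paired values.

    observed, forecast: flat sequences of equal length covering every
    (age, year) cell after the jump-off year, or every year after the
    jump-off year for life expectancy. Observed values must be nonzero.
    """
    if len(observed) != len(forecast) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [((f - o) / o) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Toy life expectancies (hypothetical): forecasts are off by +5% and -5%.
example = relative_rmsfe([80.0, 80.0], [84.0, 76.0])
```

Because the errors are scaled by the observed values, the measure is comparable across populations with different mortality levels, which is what allows an average over all populations in a group.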

Figures 7 and 8 display the relative RMSFE for all groups and jump-off years for the three models. For each group, the average relative RMSFE of all populations is shown. We see that the forecasting accuracies of the three models are rather similar for Group 1. Hence, for the 14 low-mortality countries, all three methods capture the mortality improvements well. For Group 4, the relative RMSFEs for log mortality rates are close for the three models, but the relative RMSFEs of the life expectancies from the Li-Lee model and the R-LC model are often higher and vary more than those from the GDP-LL model. One possible explanation is that the observed mortality data for the Swedish population are more volatile than the GDP data, and thus mortality trends extrapolated purely from mortality data change more drastically when the sample size varies. For Groups 2 and 3, the forecasts from the GDP-LL model are generally more accurate than those from the other two models. The only exceptions are several jump-off years very close to the end of the sample, for which the R-LC model performs better than the GDP-LL model. However, because the R-LC model is a single-population model and is fitted to each population separately, the estimated mortality trend is population-specific, and the resulting forecasts may not be coherent within each group of populations.

In the forecasting analysis, we also evaluate whether each model can produce reasonable confidence intervals. In particular, for each model, population, and jump-off year, we calculate the 95% confidence intervals of the projected period life expectancy and then calculate how frequently the realized period life expectancies fall inside the predicted 95% confidence intervals. We find that the confidence intervals produced by all models capture the realized life expectancy in most cases. The exact percentages for the three models are shown in Table S4 of Online Resource 1.
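The coverage calculation itself is straightforward and can be sketched as below. The function name `interval_coverage` and the numbers are hypothetical and serve only to show the bookkeeping.

```python
def interval_coverage(realized, lower, upper):
    """Share of realized values lying inside [lower, upper], year by year."""
    inside = sum(1 for r, lo, hi in zip(realized, lower, upper) if lo <= r <= hi)
    return inside / len(realized)


# Toy example: four forecast years, the last realization exceeds its bound.
coverage = interval_coverage(
    realized=[78.1, 78.4, 79.0, 79.9],
    lower=[77.5, 77.6, 77.7, 77.8],
    upper=[78.6, 78.9, 79.2, 79.5],
)
```

For well-calibrated 95% intervals, this share should be close to 0.95 when pooled over many populations and jump-off years.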

## Robustness Analysis

In this article, we assume that there is no common latent mortality trend when we include GDP. In this section, we study the robustness of this model specification by including a common latent factor in the GDP-LL model, and we evaluate the goodness of fit of this augmented model.

We let *J* = 1 for each group and use the same optimal number of principal components of GDP as in the section Application to Mortality and GDP Data. The augmented model is estimated using the method proposed in the Model section. The AIC and BIC ratios for each group are reported in Table 2. Comparing these with the ratios in Table 1, we find that the augmented model has the lowest AIC and BIC ratios for all groups. The reason is that the augmented model has more free parameters, whereas its fit is not substantially better than that of the Li-Lee model or the GDP-LL model. Moreover, because the GDP-LL model is nested in the augmented model, we perform a likelihood-ratio test (Cox and Hinkley 1974) on the augmented and GDP-LL models. The *p* values of the likelihood-ratio test for each group are also shown in Table 2. All *p* values are close to 1, meaning that the augmented model does not significantly improve the goodness of fit. Hence, we find no evidence that adding an extra common latent factor to the GDP-LL model improves the model fit. Moreover, the *R*_{C}(*i*) and *R*_{AC}(*i*) ratios for the augmented model are only marginally larger than those for the GDP-LL model and are almost the same as those for the Li-Lee model. The *R*_{C}(*i*) and *R*_{AC}(*i*) ratios are reported in Online Resource 1 (Tables S5 and S6).
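For readers who want to reproduce this kind of comparison, the sketch below computes a likelihood-ratio statistic for two nested models and its asymptotic *p* value from a chi-squared distribution with as many degrees of freedom as there are extra free parameters. The log-likelihood values are hypothetical, and `chi2_sf` is a small stdlib-only helper written for this illustration; it uses the closed-form tail probabilities available for integer degrees of freedom and is not numerically robust for very large degrees of freedom, where a statistics library would be preferable.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd degrees of freedom: normal tail plus a finite correction sum.
    tail = math.erfc(math.sqrt(y))
    tail += math.exp(-y) * sum(
        y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1)
    )
    return tail


def likelihood_ratio_test(loglik_nested, loglik_full, extra_params):
    """LR statistic and asymptotic p value for a nested model comparison."""
    stat = 2.0 * (loglik_full - loglik_nested)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: the larger model gains little fit for
# four extra parameters, so the p value is far from significance.
stat, p_value = likelihood_ratio_test(-1000.0, -999.0, extra_params=4)
```

A *p* value close to 1, as reported in Table 2, arises when the gain in log-likelihood is small relative to the number of added parameters.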

In Section B of Online Resource 1, we introduce an alternative model in which we include real GDP per capita data in the product-ratio model proposed by Hyndman et al. (2013). The product-ratio model of Hyndman et al. (2013) can be seen as an extension of the Li-Lee model in which more common and population-specific latent factors are included and more sophisticated specifications of the time-varying factors are employed. Consistent with the Li-Lee framework, we find that the average forecast performance is improved when GDP data are included in the model.

## Conclusion

In this article, we propose a stochastic model for mortality rates in a multipopulation context that includes economic growth. In particular, we extend the Li and Lee (2005) model by including the principal components of real GDP per capita. We apply our proposed model to four groups of populations: low-mortality countries, eastern European countries, former Soviet Union countries, and the male and female populations of Sweden. Whereas the Li-Lee model selects the common latent factor with maximal explanatory power for the common variation in mortality rates across populations, the observable factors of GDP yield a very similar model fit. Moreover, BIC and AIC ratios are better for the proposed model, indicating that the more parsimonious structure of the proposed model may lead to better goodness of fit. In an out-of-sample forecast analysis, we find that the proposed model yields, on average, more accurate forecasts than the original Li-Lee model and the extended Lee-Carter model proposed by Li et al. (2013) and Ševčíková et al. (2016) for the four groups of populations.

Our model sheds light on the interpretation of the common mortality developments in a multipopulation context. For extrapolative mortality models, the common latent factors are not easy to interpret, and it is not clear whether the trends captured by these latent factors would continue in the future. In our model, the common mortality declines in a group of populations are linked to the economic growth in the group, and the future mortality changes are forecasted by extracting the common trend of economic growth. Even if one were interested only in mortality forecasts for a particular population, our proposed model has the advantage of using the common trends of economic growth in a larger group of populations, which may improve the mortality forecasts for the population of interest.

## Acknowledgments

The authors thank three anonymous referees, the editors, Geng Niu, Michel Vellekoop, seminar participants at the 20th International Congress on Insurance: Mathematics and Economics, Longevity 12, University of Kent, and Universitat de Barcelona for valuable comments.

### Appendix

#### The Rotational Lee-Carter Model

For population *i*, the rotational Lee-Carter model is given by

[equation missing in the source]

For population *i* and age *x*, *a*_{i,x} is the mortality level, that is, the average of log *m*_{i,x,t} over time. Moreover, *k*_{i,t} is the population-specific latent factor, and *b*_{i,x,t} is the age-specific loading of *k*_{i,t}. The latent factor *k*_{i,t} is modeled by a random walk with drift as in Eq. (19). Li et al. (2013) referred to the loadings *b*_{i,x,t} as the age pattern of mortality-decline rates. They are given by

[Eq. (24) missing in the source]

In Eq. (24), $l_i^O$ and $l_i^U$ are the threshold and the ultimate life expectancy for population *i*, respectively. Moreover, $b_{i,x}^O$ is the original age pattern of the mortality-decline rates, which is estimated from the original Lee-Carter model, and $b_{i,x}^U$ is the ultimate age pattern of mortality-decline rates, which is flat for most ages and has the same (downward) trend for the old ages as $b_{i,x}^O$. Finally, *w*_{i}(*t*) is a time-varying weighting function, which varies from 0 to 1.

The R-LC model is used to project future life expectancies. Initially, when the projected life expectancy in year *t* is smaller than the threshold life expectancy, *w*_{i}(*t*) equals 0, and the mortality-decline rates are set to the original rates, $b_{i,x}^O$. When the projected life expectancy exceeds the threshold life expectancy, *w*_{i}(*t*) increases, and the actual age pattern of mortality-decline rates gradually rotates toward the ultimate rates. Finally, when the projected life expectancy exceeds the ultimate life expectancy, *w*_{i}(*t*) equals 1, and the actual age pattern of mortality-decline rates is the same as the ultimate rates. In our estimation, we use the same parameter specifications as Li et al. (2013). So, for each population, we let $l_i^O$ = 80 and $l_i^U$ = 102, and $b_{i,x}^U$ is flat for age groups 0 to 70–74. For the construction of the ultimate age pattern of mortality-decline rates and the weighting function, see Li et al. (2013).
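The rotation mechanism described above can be sketched in a few lines. The sketch assumes that the loadings are a weighted average of the original and the ultimate age patterns, with the weight moving from 0 to 1 as projected life expectancy moves from the threshold (80) to the ultimate level (102). The linear ramp used for the weight is a simplifying assumption; Li et al. (2013) define their own weighting function, which is not reproduced here. The function names and the toy loadings are hypothetical.

```python
def rotation_weight(life_exp, threshold=80.0, ultimate=102.0):
    """Weight in [0, 1]; a linear ramp is assumed here for simplicity."""
    if life_exp < threshold:
        return 0.0
    if life_exp >= ultimate:
        return 1.0
    return (life_exp - threshold) / (ultimate - threshold)


def rotated_loadings(b_original, b_ultimate, life_exp,
                     threshold=80.0, ultimate=102.0):
    """Blend the original and ultimate age patterns of mortality decline."""
    w = rotation_weight(life_exp, threshold, ultimate)
    return [(1.0 - w) * bo + w * bu for bo, bu in zip(b_original, b_ultimate)]


# Toy loadings for two age groups (hypothetical values): halfway between
# the threshold and the ultimate life expectancy, the patterns are averaged.
halfway = rotated_loadings([0.04, 0.02], [0.03, 0.03], life_exp=91.0)
```

Below the threshold the function returns the original loadings unchanged, and at or above the ultimate life expectancy it returns the ultimate pattern, matching the three regimes described in the text.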

## Notes

^{1} For an overview of the related literature, we refer to the section, “Mortality Developments and Economic Growth.”

^{2} In Section B of Online Resource 1, we extend the discussion to a more general stochastic mortality model proposed by Hyndman et al. (2013) and find qualitatively similar results. Other multipopulation mortality models, such as those of Cairns et al. (2011b), Dowd et al. (2011), D’Amato et al. (2014), and Salhi and Loisel (2017), can be naturally incorporated in our framework.

^{3} The first group includes the same populations as the “low-mortality countries” discussed in Li and Lee (2005) except for West Germany. Because mortality data for West Germany are not available before 1956, the inclusion of West Germany would lead to a shorter calibration window for the group. The second group has the same populations as the set of eastern European countries in Li and Lee (2005).

^{4} Belarus, Estonia, Latvia, Lithuania, Russia, and Ukraine.

^{5} The Maddison Project data are expressed as 1990 international dollars, whereas the ERS data are expressed as 2010 international dollars.

^{6} The increasing trend of the first principal component of GDP reflects the common trend in GDP. In Online Resource 1, we plot the principal component(s) of GDP and the corresponding values of γ_{x} for each group, both multiplied by –1 (Figs. S5 and S6 therein). In that case, we see that the principal component(s) of GDP have a trend similar to that of *K*_{t} in the Li-Lee model.

^{7} Our notation corresponds with the definition in Schwarz (1978), and a higher BIC implies a better model fit. The BIC ratios are also used in the literature with a negative sign:

$\widetilde{\mathrm{BIC}} = -2 \log \hat{L} + m \cdot \log M.$

Then, a lower value of the $\widetilde{\mathrm{BIC}}$ ratio indicates a better model fit.

^{8} This mortality trend is denoted by κ in Lee and Carter (1992).

^{9} The calibration period used in Li and Lee (2005) is from 1952 to 1996.