## Abstract

The majority of countries in Africa and nearly one-third of all countries require mortality models to infer the complete age schedules of mortality that are required to conduct population estimates, projections/forecasts, and other tasks in demography and epidemiology. Models that relate child mortality to mortality at other ages are important because almost all countries have measures of child mortality. A general, parameterizable component model (SVD-Comp) of mortality is defined using the singular value decomposition and calibrated to the relationship between child or child/adult mortality and mortality at other ages in the observed mortality schedules of the Human Mortality Database. Cross-validation is used to validate the model, and the predictive performance of the model is compared with that of the log-quadratic (Log-Quad) model, which is designed to do the same thing. Prediction and cross-validation tests indicate that the child mortality–calibrated SVD-Comp is able to accurately represent the observed mortality schedules in the Human Mortality Database, is robust to the selection of mortality schedules used for calibration, and performs better than the Log-Quad model. The child mortality–calibrated SVD-Comp can be used where and when child mortality is available but mortality at other ages is unknown.

## Introduction

Complete age-specific mortality schedules are necessary inputs to a wide variety of formal demographic and epidemiological methods. A key example is the biennial *World Population Prospects* (WPP) produced by the United Nations Population Division (United Nations, Department of Economic and Social Affairs, Population Division 2015b). These are generally considered the reference population indicators and are widely used by other domestic and international agencies as inputs to estimation and modeling exercises. The WPP contains estimates of time-, sex-, and age-specific mortality, fertility, and population size from 1950 to the present and forecasts of the same quantities to 2100 for all countries of the world. Consequently, each WPP update must contain full age-specific mortality schedules covering the period 1950–2100.

Some countries in the developing world, particularly in Africa, do not yet have civil registration and vital statistics systems that function well enough to report accurately on either fertility or mortality. Focusing on mortality, Table 1 displays the number of countries or world regions for which no information is available on either child mortality or adult mortality, with Africa broken out. Because of the exhaustive coverage of household surveys investigating fertility and maternal/child health, essentially the whole world has at least some recent information on child mortality (Li 2015). In contrast, 50 countries around the world with a total population of nearly 1 billion people have no information on adult mortality, with the bulk of those in Africa—33 countries with a total population of 666 million people.

Mortality models are used to solve this problem and produce full age schedules of mortality. Table 2 describes the number of countries or world regions for which the U.N. Population Division must use mortality models of some kind to produce either estimates of life expectancy at birth *e*_{0} or full age schedules of mortality. Most African countries require mortality models for both, and 38.6 % of countries globally require a model for *e*_{0} and 32.6 % for age-specific mortality.

The standard approach to generating complete age schedules of mortality for countries and areas with insufficient data is to take advantage of the fact that they do have information on child mortality. Typically, model life tables are used to extrapolate full mortality schedules from _{5}*q*_{0}—this is what the U.N. Population Division does (making heavy use of the traditional Coale and Demeny (1966) model life tables), and the Institute for Health Metrics and Evaluation (IHME) uses variations on the modified logit (Mod-Logit) model (Murray et al. 2003) to do the same.

The commonly used model life table systems—regional model life tables and stable populations (Coale and Demeny 1966), life tables for developing countries (United Nations 1982), modified logit life table system (Mod-Logit) (Murray et al. 2003; Wang et al. 2013), and flexible two-dimensional mortality model (Log-Quad) (Wilmoth et al. 2012)—combine a specific model structure and defined variable parameters with a set of fixed parameters that summarize the relationships between mortality at different ages in a set of observed life tables. All are *empirical* models in the sense that they summarize observed mortality and use that summary to produce predicted mortality schedules that are consistent with observed mortality. They come in both regional and continuous forms. The regional models identify and replicate commonly observed mortality patterns associated with geographic regions (and *de facto* periods) and allow mortality to vary continuously within each region-specific pattern. In contrast, the continuous models generate mortality patterns that vary smoothly. Both approaches are essentially two-parameter models. The regional models first identify a discrete region and then use effectively continuously varying life expectancy within each region to adjust the level of region-specific mortality. The continuous models have two continuously varying parameters (e.g., life expectancy, child mortality, or adult mortality).

Murray et al. (2003) enumerated three characteristics required of mortality models: (1) simplicity and ease of use; (2) comprehensive representation of the true variability in sex- and age-specific mortality observed in real populations; and (3) validity that is well quantified by comparing age schedules of mortality predicted by the model with corresponding observed life tables. To those I would add (4) generality with respect to the underlying model structure; (5) flexibility in terms of input parameters; and (6) an ability to handle a wide range of age groups, including very narrow ones, without having to fundamentally alter the structure of the model.

This work defines and describes a new SVD component–based mortality modeling framework that satisfies all of those requirements. The SVD-component framework provides a general, flexible way to model any demographic age schedule as a function of covariates or predictors that are related to age-specific variation in the age schedule. Here, the SVD-component framework is demonstrated by creating a mortality model that predicts single-year-of-age mortality schedules using either _{5}*q*_{0} or both _{5}*q*_{0} and _{45}*q*_{15} as predictors, similar to both the Mod-Logit and Log-Quad models. The resulting model can be used to produce single-year-of-age mortality schedules from _{5}*q*_{0} alone that are consistent with observed mortality schedules, and this could be useful for those like the U.N. Population Division who must manipulate full age schedules of mortality but have observed values only for _{5}*q*_{0}. The resulting SVD-component model performs better than the current state-of-the-art two-parameter model (Log-Quad), provides predictions by single year of age, and is easily extensible to include additional predictors beyond child and adult mortality.

## Mortality Models

Traditional model life tables (e.g., Coale and Demeny 1966; Ledermann 1969; Murray et al. 2003; United Nations 1955, 1982; Wang et al. 2013; Wilmoth et al. 2012) take an inductive, empirically driven approach to identify and parsimoniously express the regularity of mortality with age based on observed relationships in large collections of high-quality life tables. Some fertility models (e.g., Coale and Trussell 1974; Lee 1993) do the same. An alternative, sometimes deductive approach can be found in the wide variety of parametric or functional-form mortality models (e.g., Gompertz 1825; Heligman and Pollard 1980; Li and Anderson 2009; Makeham 1860) that define age-specific measures of mortality in an analytical form, sometimes with interpretable parameters. Brass (1971) developed a new approach with his two-parameter relational model that has been extended and refined in many ways (for example, Murray et al. 2003; Zaba 1979). More recently, the Log-Quad model of Wilmoth et al. (2012) combines empirical and functional-form approaches to mortality models.

Population forecasting has motivated another important family of related mortality models. Forecasting generates many iterations of age-specific mortality and fertility into the future, and those are usually based on a summary of the corresponding age-specific mortality and fertility in the past. Hence, there is an immediate need to represent full age schedules and their dynamics *compactly*. This has led to the widespread use of dimension-reduction or data-compression techniques to reduce the dimensionality of the problem so that only a few parameters are necessary to represent age schedules and their dynamics. Ledermann and Breas (1959) appear to have been the first to use principal components analysis (PCA) to summarize age-specific mortality and generate model life tables, and many subsequent investigators refined this approach (e.g., Bourgeois-Pichat 1962, 1990; Ledermann 1969; United Nations 1982). Following the early use of PCA to build model life tables, PCA and related methods, such as the singular value decomposition (SVD) (e.g., Good 1969; Stewart 1993; Strang 2009), have been widely used and refined by forecasters to create time series models of mortality and fertility (e.g., Bozik and Bell 1987; Lee 1993; Lee and Carter 1992). See Bell (1997) for a comprehensive summary of this line of development in various fields, dominated by actuarial science and applications in forecasting.

The Lee-Carter model (Lee and Carter 1992) has the form

$$\ln(\mathbf{m}_{xt}) = \mathbf{a}_{x} + \mathbf{b}_{x}\mathbf{k}_{t} + \varepsilon_{xt}, \tag{1}$$

where *x* is age, *t* is time, **m** is a matrix of age- and time-specific mortality rates, **a** is the time-constant vector of mean (over columns of **m**) logged age-specific mortality rates through time, **b** is the time-constant first left singular vector from an SVD decomposition of the matrix of residuals generated by subtracting **a** from each column of **m**, and **k**_{t} is a (column) vector of time-varying weights for **b**_{x}.

Fitting the model requires three separate steps: (1) calculate **a**_{x}; (2) calculate the residuals *r*_{xt} = ln(**m**_{xt}) – **a**_{x}; and (3) extract the first left singular vector from the SVD of **r** and calculate a value for each element of **k**_{t} corresponding to each column of **m** that minimizes the elements ε_{xt} (the elements of **k**_{t} are essentially the elements of the first right singular vector multiplied by the first singular value of this SVD).
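The three fitting steps can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic rates, not the original authors' code: the function name `fit_lee_carter`, the toy dimensions, and the values of `a_true`, `b_true`, and `k_true` are all assumptions made for the example, and the usual normalization of **b** and **k** is omitted.

```python
import numpy as np

def fit_lee_carter(m):
    """Fit a_x, b_x, k_t to an (ages x years) matrix of mortality rates."""
    log_m = np.log(m)
    a = log_m.mean(axis=1)                      # step 1: mean logged rate at each age
    r = log_m - a[:, None]                      # step 2: residuals r_xt = ln(m_xt) - a_x
    u, s, vt = np.linalg.svd(r, full_matrices=False)  # step 3: SVD of the residuals
    b = u[:, 0]                                 # first left singular vector
    k = s[0] * vt[0, :]                         # first singular value times first right singular vector
    return a, b, k

# Synthetic rates that follow the rank-1 structure exactly (illustrative values only)
ages = np.arange(10)
years = np.arange(6)
a_true = -6.0 + 0.4 * ages
b_true = np.linspace(0.2, 0.05, ages.size)
k_true = np.linspace(1.5, -1.5, years.size)     # centered on zero, so row means equal a_true
m = np.exp(a_true[:, None] + np.outer(b_true, k_true))

a, b, k = fit_lee_carter(m)
fitted = a[:, None] + np.outer(b, k)
max_abs_error = np.max(np.abs(fitted - np.log(m)))
```

Because the singular vectors are identified only up to sign, `b` and `k` may each be flipped relative to the values used to generate the data; their product is what the fit recovers.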

The Lee-Carter model contains two conceptually separate elements: (1) a one-parameter (i.e., **k**_{t}) model of the full age-specific mortality or fertility schedule, and (2) a time series model for that parameter. The temporal sequence of values taken by the elements of **k**_{t} is the focus of a stochastic time series model that is responsible for the temporal dynamics of the method, including the forecasts. Development of the time series model is previewed in earlier work by the authors (Carter and Lee 1986).

Setting aside the time series model for **k**_{t}, the structure of the Lee-Carter model appears to be a simplified version of the more complex age-period-cohort mortality model conceived earlier by Wilmoth and elaborated over a number of years (Wilmoth 1990; Wilmoth and Caselli 1987; Wilmoth et al. 1989).^{1} Wilmoth’s model is designed to separate and identify age, period, and cohort effects in an age and time matrix of mortality rates. The basic structure is log(*m*_{x}) = (mean model) + (residual model), with the final form

$$f_{ij} = \alpha_{i} + \beta_{j} + \sum_{m=1}^{\rho} \varphi_{m}\gamma_{im}\delta_{jm} + \theta_{k} + \varepsilon_{ij}, \tag{2}$$

where *i* is age, *j* is period, *k* = (*j* – *i*) indexes cohorts, *f* is logged age- and period-specific mortality (log(*m*)), α is an age effect, β is a period effect, the sum $\sum_{m=1}^{\rho} \varphi_{m}\gamma_{im}\delta_{jm}$ is over a set of ρ rank-1 matrices from the SVD of the residuals remaining after the main effects are subtracted from *f*, and θ_{k} is a residual cohort effect remaining after subtracting both the main effects and the SVD approximation of the first residuals from *f*. This form first appears in Wilmoth et al. (1989).

The model is fit in three steps, effectively explaining ever more nuanced variation in a sequence of residuals: (1) calculate α_{i} and β_{j} such that they minimize the first residuals *r*_{ij} = *f*_{ij} – (α_{i} + β_{j}); (2) use the first ρ terms from the SVD of the matrix of residuals **r** to calculate the second residual $s_{ij} = r_{ij} - \sum_{m=1}^{\rho} \varphi_{m}\gamma_{im}\delta_{jm}$; and (3) calculate values for the elements of θ_{k} such that they minimize *s*_{ij} – θ_{k} = ε_{ij}. The SVD or multiplicative term $\sum_{m=1}^{\rho} \varphi_{m}\gamma_{im}\delta_{jm}$ took shape over several publications (Wilmoth 1990; Wilmoth and Caselli 1987; Wilmoth et al. 1989) to eventually be the standard SVD form that appears in the final model, with the SVD first appearing in Wilmoth et al. (1989).

An examination of Eqs. (1) and (2) reveals the relationship between the Wilmoth and Lee-Carter models. Moving from Wilmoth to Lee-Carter requires the following steps: (1) remove the main period effect β_{j} and the cohort effect θ_{k}, and (2) take only the first term in the SVD approximation of the first residual. The SVD term then becomes ϕ_{1}γ_{i1}δ_{j1} or, dropping the *m* = 1 index, γ_{i}(ϕδ_{j}). Replacing Wilmoth’s *i* and *j* with Lee-Carter’s *x* and *t* and letting *k* = ϕδ make the equivalence clear. Lee and Carter (1992) acknowledged that their model has much in common with the Wilmoth model. They cited Wilmoth by way of explaining the SVD solution to calculating the elements of **b**, whereas this is just the simplest rank-1 form of the time-varying term in the model Wilmoth proposed.
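The reduction described in this paragraph can be written out line by line. The following is a sketch assembled from the definitions in the text; the identifications *a*_{x} = α_{i} and *b*_{x} = γ_{i} are implied by the argument rather than stated explicitly, and the error term on each line absorbs whatever the dropped terms previously explained.

```latex
\begin{align*}
f_{ij} &= \alpha_i + \beta_j + \sum_{m=1}^{\rho} \varphi_m \gamma_{im} \delta_{jm} + \theta_k + \varepsilon_{ij}
  && \text{(Wilmoth)} \\
f_{ij} &= \alpha_i + \sum_{m=1}^{\rho} \varphi_m \gamma_{im} \delta_{jm} + \varepsilon_{ij}
  && \text{(remove } \beta_j \text{ and } \theta_k\text{)} \\
f_{ij} &= \alpha_i + \gamma_{i}\,(\varphi\,\delta_{j}) + \varepsilon_{ij}
  && \text{(keep only the } m = 1 \text{ term)} \\
\ln(m_{xt}) &= a_x + b_x k_t + \varepsilon_{xt}
  && \text{(} i \to x,\ j \to t,\ a_x = \alpha_i,\ b_x = \gamma_i,\ k_t = \varphi\,\delta_j \text{)}
\end{align*}
```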

The flexible two-dimensional mortality model of Wilmoth et al. (2012) predicts age-specific mortality from either _{5}*q*_{0} or (_{5}*q*_{0}, _{45}*q*_{15}). Adopting the nomenclature from log-linear models, this log-quadratic (Log-Quad) model has the following form:

$$\log(m_{x}) = a_{x} + b_{x}h + c_{x}h^{2} + v_{x}k, \tag{3}$$

where *x* is age; *m* is age-specific mortality; *a*, *b*, and *c* are constant age-specific coefficients for the quadratic mean model; *h* is the input value of log(_{5}*q*_{0}); *v* is an age-specific correction factor; and *k* is a coefficient for *v*. Correction factor values *v*_{x} are identified by calculating the SVD of the matrix of residuals that remain after the quadratic portion of the model is subtracted from life tables that are part of the Human Mortality Database (HMD) (University of California, Berkeley and Max Planck Institute for Demographic Research n.d.) and using the resulting first left singular vector as a starting point.^{2}

Thus, the Log-Quad model has the now familiar mean/residual form of the original Wilmoth model, and the structure of the residual model is a one-term version of the SVD form originally proposed by Wilmoth et al. (1989). The Log-Quad’s contribution is an innovative mean model that takes advantage of the empirically observed curvilinear relationship between child mortality and mortality at other ages. The Log-Quad model is elegant, simple, and parsimonious, with one (_{5}*q*_{0}) or two (_{5}*q*_{0} and *k*)^{3} parameters, and it performs well, accurately representing a wide range of life tables, including life tables with very low mortality, and generally outperforming all other model life tables (Wilmoth et al. 2012).

The component model of mortality is a weighted sum of age-varying components:

$$\mathbf{m} = \sum_{i=1}^{c} w_{i}\mathbf{u}_{i} + \mathbf{r}, \tag{4}$$

where **m** is a vector of age-specific mortality rates, **u**_{i} are a set of *c* vectors containing age-varying values identified in a set of observed mortality rates, *w*_{i} are weights, and **r** is a vector of residuals. This is similar to Ledermann’s original use of factor analysis to build a system of model life tables based on factors resulting from a PCA decomposition of a matrix of age-specific mortality rates (Ledermann 1969; Ledermann and Breas 1959) and the PCA-based model underlying the U.N. model life tables (United Nations 1982), both of which have the mean/residual structure of the Wilmoth models because they use PCA operating on a *centered* data cloud. The component model has been used to summarize mortality data from the INDEPTH Network using PCA-derived components (Clark 2001; Clark et al. 2009; INDEPTH Network 2002), similarly for the HMD (Clark and Sharrow 2011a,b), and more recently in work on small-area estimates of mortality (Alexander et al. 2017). This approach combines a simple linear model with PCA, SVD, or similar methods to concentrate information along a few dimensions; see Clark (2015) for a detailed discussion.

The component model is similar to the SVD-inspired *first residual model* term in Wilmoth’s Eq. (2). However, neither Wilmoth nor subsequent investigators identified or developed the relationship between the SVD decomposition of a matrix of mortality rates and the columnwise weighted-sum model in Eq. (4). A key conceptual difference between the two approaches is that Eq. (4) does not have a mean model. Consequently, the factors identified by the SVD model everything, not just the residual as in all the Wilmoth-inspired models. The first component, **u**_{1}, is effectively the mean age-specific mortality schedule, and its weight reflects the overall level of mortality. The remaining components, **u**_{i} for *i >* 1, define deviations from the average age pattern, independent of level. All this follows directly from the properties of the SVD and a substantive interpretation of both the left and right singular vectors when applied to demographic age schedules (Clark 2015). Additionally, the weights are viewed as continuously varying parameters that can be the object or output of additional models—for example, clustered using objective clustering methods to identify groups of similar age schedules, estimation using either traditional or Bayesian methods, or predicted from covariates that vary systematically with age schedules, as this article demonstrates.

Finally, along with other researchers, I applied the component model to HIV-related mortality in countries with large HIV epidemics (Sharrow et al. 2014). In that article, we demonstrated that the weights in Eq. (4) vary systematically with HIV prevalence. We took advantage of that fact to build a model that predicts three weights as a function of HIV prevalence and then predicts mortality age schedules from the predicted weights using Eq. (4). The resulting HIV-calibrated component model uses the weights as a link between HIV prevalence and full age schedules of mortality.

In this article I describe how the SVD can be used to develop a general modeling framework for demographic age schedules. This framework has the important advantages of being (1) straightforward and easy to understand and use; (2) general and applicable to any demographic age schedule; (3) able to incorporate covariates or predictors in a unified way; and (4) able to handle age groups of any granularity (e.g., one year or five years) in the same way. I demonstrate this framework by creating and validating an accurate one- or two-parameter mortality model based on age-patterns of mortality contained in the HMD.

## Data

### Human Mortality Database Life Tables

The HMD contains rigorously cleaned, checked, and validated information on deaths and exposure from a number of mainly developed countries “where death registration and census data are virtually complete.” The data are aggregated and presented in a wide variety of formats. The objective of this analysis is to capture and characterize as much variability in age-specific mortality as possible, and consequently I use the 1 × 1 HMD life tables for each sex. Those provide all columns of a standard life table for single calendar years by single year of age from 0 to 110+. Each country provides data for different historical periods, and some countries are subdivided into more specific subpopulations. In the latter situation, a national population life table is typically provided that aggregates across the subgroups. Both the national and subgroup populations are included in this analysis to maximize the variability in age-specific mortality schedules in the overall data set. A few of the 1 × 1 life tables from the HMD contain problems: (1) the life tables for Belgium 1914–1918 for both sexes contain no data; and (2) the female life tables for Iceland in 1852 and the Maori Population of New Zealand in 1949, 1956, and 1959 display implausible mortality at older ages. All those life tables are excluded. Table 3 contains an organized list of the life tables included in this analysis: 4,610 life tables for each sex and 9,220 in total. The HMD data used in this analysis were downloaded on Friday November 2, 2018 from the HMD web site (http://www.mortality.org/hmd/zip/all_hmd/hmd_statistics.zip).

### Model Scales

This analysis is conducted on life table probabilities of dying for those who survive to the beginning of each one-year age group. Single-year probabilities, _{1}*q*_{x}, are taken directly from the HMD life tables; five-year probabilities, _{5}*q*_{x}, are calculated as ${}_{5}q_{x} = 1 - \prod_{a=x}^{x+4}\left(1 - {}_{1}q_{a}\right)$; and _{45}*q*_{15} is calculated as ${}_{45}q_{15} = 1 - \prod_{a=15}^{59}\left(1 - {}_{1}q_{a}\right)$. *Child mortality* refers to _{5}*q*_{0}, and *adult mortality* refers to _{45}*q*_{15}.

The natural scale of the models is the full real line, so life table probabilities of dying, *q*, are transformed using the logit function $\mathrm{logit}(x) = \ln\left(\frac{x}{1-x}\right)$ so that their transformed values occupy the full real line. Outputs from the models are transformed back to the probability scale with range (0, 1) using the expit function $\mathrm{expit}(x) = \frac{e^{x}}{1+e^{x}}$, the inverse of the logit.
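These scale definitions translate directly into code. The sketch below uses only the Python standard library; the helper name `prob_dying` and the single-year values in `q1` are hypothetical and chosen purely for illustration (they are not HMD values).

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of the logit: map a real number back to (0, 1)."""
    return math.exp(x) / (1.0 + math.exp(x))

def prob_dying(q1, start, n):
    """Probability of dying between ages start and start + n,
    given single-year probabilities q1[a] for each age a."""
    survival = 1.0
    for a in range(start, start + n):
        survival *= 1.0 - q1[a]      # survive each single year in turn
    return 1.0 - survival

# Hypothetical single-year schedule for ages 0..59 (illustrative values only)
q1 = [0.05, 0.01, 0.005, 0.003, 0.002] + [0.002] * 55

child = prob_dying(q1, 0, 5)     # child mortality, ages 0..4
adult = prob_dying(q1, 15, 45)   # adult mortality, ages 15..59
round_trip = expit(logit(child)) # logit followed by expit returns the input
```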

## Methods

### Relevant Characteristics of the SVD

The SVD decomposes a matrix **X** into three matrix factors with special properties:

$$\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^{T}. \tag{5}$$

**U** is a matrix of left singular vectors (LSVs) arranged in columns, **V** is a matrix of right singular vectors (RSVs) arranged in columns, and **S** is a diagonal matrix of singular values (SVs). The LSVs and RSVs are independent and have unit length. If one views the columns of **X** as a set of dimensions, then the rows of **X** locate points defined along those dimensions—the data cloud. The RSVs define a new set of dimensions that line up with the axes of most variation in the data cloud. The first RSV points from the origin to the data cloud, or if the cloud is around the origin, then it points along the line of maximum variation within the cloud. The remaining RSVs are orthogonal to the first and each other and line up with successively less variable dimensions within the cloud. The elements of the LSVs are values that correspond to the projection of each point along the new dimensions defined by the RSVs. The SVs effectively stretch the new dimensions defined by the RSVs in accordance with the variation in the cloud along each RSV. The numeric value of each SV is the square root of the sum of squared distances from the origin to each point along the corresponding SVD dimension, and their squares sum to the total sum of squared distances from the origin to each point along all of the original dimensions.

The SVD can also be written as a sum over its singular vectors:

$$\mathbf{X} = \sum_{i=1}^{\rho} s_{i}\mathbf{u}_{i}\mathbf{v}_{i}^{T}, \tag{6}$$

$$\mathbf{x}_{\ell} = \sum_{i=1}^{\rho} s_{i}v_{\ell i}\mathbf{u}_{i}, \tag{7}$$

where **u**_{i} are LSVs, **v**_{i} are RSVs, *s*_{i} are SVs, ρ is the rank of **X**, **x**_{ℓ} are columns of **X**, and *v*_{ℓi} are the elements of RSV **v**_{i} (see the online appendix, section A). Equation (6) says that **X** can be written as a sum of rank-1 matrices, each created from one of the LSVs by applying weights in the form of the elements of the corresponding RSV. Equivalently, Eq. (7) says that each column **x**_{ℓ} of **X** can be written as the weighted sum of the LSVs with the weight for each being the *ℓ*th element of the corresponding RSV.^{4} The LSVs and SVs are constant, so the weights are the variables in these expressions, and their values determine how much of each LSV is added to the mixture to represent the original data. Finally, because the LSVs are independent, ordinary least squares (OLS) regression can be used to estimate models that relate **x**_{ℓ} to the LSVs. If the constant is constrained to be 0, then the coefficients are equal to *s*_{i}*v*_{ℓi}.

Because the RSVs define successively less variable dimensions in the data cloud, the first term in Eqs. (6) and (7) contains the most information, and subsequent terms contain less and less (Golub et al. 1987). Including all ρ terms replicates the original data matrix **X** or any of its columns **x**_{ℓ} exactly, while including only the first few terms provides a good approximation.
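The properties described in this section can be checked numerically. The following NumPy sketch uses a small random matrix as a stand-in for real data (an assumption for illustration); it verifies the rank-1 sum, the column-wise weighted sum, the no-intercept OLS coefficients, the sum-of-squares identity, and the fact that truncating the sum gives an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))                       # hypothetical 8 x 5 data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) V^T
V = Vt.T                                          # columns of V are the RSVs

# X as a sum of rank-1 matrices s_i * u_i v_i^T
X_sum = sum(s[i] * np.outer(U[:, i], V[:, i]) for i in range(s.size))

# One column of X as a weighted sum of the LSVs, with weights s_i * v_{l,i}
l = 2
x_l = sum(s[i] * V[l, i] * U[:, i] for i in range(s.size))

# OLS of the column on the LSVs with no intercept recovers s_i * v_{l,i}
coef, *_ = np.linalg.lstsq(U, X[:, l], rcond=None)

# Squared singular values sum to the total sum of squares of X
total_ss = np.sum(X ** 2)

# Keeping only the first two terms gives an approximation, not an exact copy
X_2 = (U[:, :2] * s[:2]) @ Vt[:2, :]
err_2 = np.linalg.norm(X - X_2)
err_full = np.linalg.norm(X - X_sum)
```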

### SVD Component (SVD-Comp) Model

For an *A × L* matrix, **Q**_{z}, of mortality schedules for each sex, calculate the $\mathrm{SVD}(\mathbf{Q}_{z}) = \mathbf{U}_{z}\mathbf{S}_{z}\mathbf{V}_{z}^{T}$. Using the resulting factors as in Eq. (7), each *A*-element mortality schedule, **q**_{zℓ}, is approximated as the *c*-term sum,

$$\mathbf{q}_{z\ell} \approx \sum_{i=1}^{c} s_{zi}v_{z\ell i}\mathbf{u}_{zi}, \tag{8}$$

where *A* is the number of age groups and rows in **Q**_{z}; *L* is the number of life tables and columns in **Q**_{z}; *z* ∈ {female, male}; *c ≤* ρ, the rank of **Q**_{z}; and *ℓ* ∈ {1, . . . , *L*} indexes mortality schedules (Golub et al. 1987). The *A*-element LSVs, **u**_{zi}, and the SVs, *s*_{zi}, are constant across all mortality schedules. Because *c* ≤ ρ, the sum on the right is an approximation of the mortality schedule, as indicated by the ≈. As is clear in the upcoming section on calibration of SVD-Comp, *c* = 4 is sufficient to make the approximation almost perfect across the entire HMD. If viewed as a data compression technique, all 4,610 sex-specific mortality schedules in the HMD can be very closely approximated with just four age-varying components, a greater than 99.9 % reduction in the volume of data required to represent the HMD. The elements that vary among mortality schedules are the RSVs, **v**_{zi}, whose elements, *v*_{zℓi}, are the weights in the sum. This is a continuously varying model, like Mod-Logit (Murray et al. 2003) and Log-Quad (Wilmoth et al. 2012), rather than a regional model, such as the Coale and Demeny (1966) and U.N. model life tables (United Nations 1982).

Figure 2, presented later in the article, displays the scaled LSVs, *s*_{zi}**u**_{zi}, obtained from the SVD of the matrix of logit-scale _{1}*q*_{x} values contained in the HMD. The SVD-Comp model is simply a weighted sum of those components. The first component represents the average shape and scale of human mortality by age, and the remaining three components add age-specific modifications to that basic shape; that is, all values of the first component are negative (because of the logit transformation), whereas the second through fourth components cross the *x*-axis.

When the *v*_{zℓi} are replaced by values that can be related to covariates, as they are just below in Eqs. (9)–(11), the modeling framework becomes highly flexible. Like traditional model life tables, this framework can be used inductively to produce a mortality model that generates age schedules of mortality that are consistent with a collection of observed mortality schedules, or it can be used deductively to generate new age schedules based on a theoretical understanding of how a covariate should affect each component in the model. In general, the age pattern of the scaled LSVs in the sum can be interpreted and manipulated theoretically; see upcoming Fig. 2 and the results discussed in the section Factors of the SVD.

### Parameterization Using _{5}*q*_{0} and (_{5}*q*_{0}, _{45}*q*_{15})

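The weights *v*_{zℓi} in Eq. (8) are modeled as functions of child mortality or of both child and adult mortality:

$$v_{z\ell i} = f_{zi}\left({}_{5}q_{0\,z\ell}\right), \tag{9}$$

$$v_{z\ell i} = f_{zi}\left({}_{5}q_{0\,z\ell},\ {}_{45}q_{15\,z\ell}\right), \tag{10}$$

where *z* ∈ {female, male}; *i ≤* ρ indexes the RSVs; and *ℓ* ∈ {1, . . . , *L*} indexes both the elements of the RSVs and the values of child and adult mortality, one for each sex-specific mortality schedule. Each sex-specific RSV has its own separate model, *f*_{zi}, that can be used to produce predicted values for the weights in Eq. (8) using new values for _{5}*q*_{0 z} and _{45}*q*_{15 z}.

The predicted mortality schedule, $\hat{\mathbf{q}}_{z}$, associated with given values for a set of weights $\hat{w}_{zi} = f_{zi}\left({}_{5}q_{0z}\right)$ or $\hat{w}_{zi} = f_{zi}\left({}_{5}q_{0z},\ {}_{45}q_{15z}\right)$ is

$$\hat{\mathbf{q}}_{z} = \sum_{i=1}^{c} s_{zi}\hat{w}_{zi}\mathbf{u}_{zi}. \tag{11}$$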

Equation (11) relates either child mortality (_{5}*q*_{0}) or both child and adult mortality (_{5}*q*_{0}, _{45}*q*_{15}) to full age schedules of mortality according to the patterns of those relationships that exist in the original set of HMD life tables, **Q**, using a very compact approximation.

This is a fully general approach to predicting mortality or any other demographic age schedules. Equations (9) and (10) can be replaced with models that summarize the relationships between any covariate and elements of the RSVs and weights, and age can be aggregated into any age group; doing so requires simply recalculating the SVD on the age-aggregated data set.
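To make the general recipe concrete, the sketch below runs it end to end on synthetic data: decompose a matrix of logit-scale schedules, regress the elements of each retained RSV on a function of logit(_{5}*q*_{0}), predict weights for a new input, and rebuild a schedule as a weighted sum of scaled LSVs. Everything specific here is an assumption made for illustration: the synthetic schedules, the 20 single-year ages, the choice of two components, and in particular the quadratic form of the weight model, which is a stand-in and not the model calibrated in this article.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
A, L, c = 20, 200, 2                       # ages, schedules, retained components

# Synthetic logit-scale schedules driven by one latent mortality level
ages = np.arange(A)
base = -6.0 + 3.0 * np.exp(-ages) + 0.05 * ages
slope = np.linspace(1.0, 0.5, A)
level = rng.uniform(-1.0, 2.0, size=L)
Q = base[:, None] + np.outer(slope, level) + rng.normal(scale=0.02, size=(A, L))

# Child mortality implied by each synthetic schedule, on the logit scale
q5_0 = 1.0 - np.prod(1.0 - expit(Q[:5, :]), axis=0)
h = logit(q5_0)

# SVD of the schedule matrix; row l of V holds the weights for schedule l
U, s, Vt = np.linalg.svd(Q, full_matrices=False)
V = Vt.T

# Hypothetical weight model: each RSV regressed on 1, h, h^2 by OLS
design = np.column_stack([np.ones(L), h, h ** 2])
coefs, *_ = np.linalg.lstsq(design, V[:, :c], rcond=None)   # shape (3, c)

def predict_schedule(q5_0_new):
    """Predict single-year probabilities of dying from child mortality alone."""
    h_new = logit(q5_0_new)
    w_hat = np.array([1.0, h_new, h_new ** 2]) @ coefs      # predicted weights
    logit_q = (U[:, :c] * s[:c]) @ w_hat                    # weighted sum of scaled LSVs
    return expit(logit_q)

q_hat = predict_schedule(0.10)
implied_5q0 = 1.0 - np.prod(1.0 - q_hat[:5])                # child mortality of the prediction
```

Swapping in a different covariate only changes the `design` matrix and the inputs to `predict_schedule`; the decomposition and the weighted sum stay the same.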

### Calibrating SVD-Comp to the Relationship Between _{5}*q*_{0} and Mortality at Other Ages in the HMD

All computation is carried out using the R statistical programming environment (R Foundation for Statistical Computing 2016).

#### Calibration SVDs

The life tables of the HMD are arranged into two *A × L* matrices (**Q**_{z}) of single-year, age-specific life table probabilities of dying (_{1}*q*_{x}), one for each sex. *A* = number of age groups = 110; *L* = number of life tables = 4,610; and *z* ∈ {female, male}. The SVD^{5} of each **Q**_{z} yields ρ LSVs, **u**_{zi}; RSVs, **v**_{zi}; and SVs, **s**_{z}. To ensure that all age groups have approximately the same influence when calculating the SVDs, each mortality schedule is offset from the origin^{6} by –10, and the offset is added back to predicted mortality schedules. Four of the new dimensions identified by each SVD are retained—that is, *c* = 4 in Eq. (11). For females, those account for 0.998328, 0.000936, 0.000071, and 0.000058 of the total sum of squares, respectively, or together 0.999392. Corresponding figures for males are 0.998595, 0.000824, 0.000103, and 0.000052, and together 0.999575. Section C of the online appendix contains additional information on the total sum of squares explained by each component of the SVD.
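The share of the total sum of squares attributed to each SVD dimension is the squared singular value divided by the sum of all squared singular values. The sketch below computes these shares for a synthetic matrix with and without an offset, and then undoes the offset after a rank-*c* approximation. It reads the offset as adding –10 to every logit-scale value, which is an assumption about the procedure, and the random matrix is a stand-in for the HMD schedules.

```python
import numpy as np

rng = np.random.default_rng(2)
A, L, c = 30, 50, 4
offset = -10.0                                    # value quoted in the text

# Hypothetical logit-scale schedules (synthetic, not HMD data)
Q = rng.normal(loc=-5.0, scale=1.0, size=(A, L))

def ss_shares(M):
    """Share of the total sum of squares carried by each SVD dimension."""
    sv = np.linalg.svd(M, compute_uv=False)
    return sv ** 2 / np.sum(sv ** 2)

shares_raw = ss_shares(Q)
shares_offset = ss_shares(Q + offset)

# Rank-c approximation on the offset scale, then undo the shift
U, s, Vt = np.linalg.svd(Q + offset, full_matrices=False)
Q_hat = (U[:, :c] * s[:c]) @ Vt[:c, :] - offset
```

Moving the data cloud farther from the origin increases the share carried by the first dimension, which is the pattern seen in the shares reported above.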

#### Models for Predicting Weights

Regression models relate the elements of the RSVs **v**_{zi} to _{5}*q*_{0 z} and _{45}*q*_{15 z}. Scatterplots of the elements of the RSVs versus logit(_{5}*q*_{0}) in Figs. E1 and E2 in the online appendix make it clear that the relationships are not linear or simple. With no theory to guide the choice of predictors, I tried all combinations of simple transformations of logit(_{5}*q*_{0}) and logit(_{45}*q*_{15}) and their interactions. The resulting models explain almost all the variance in the elements of **v**_{1} (*R*^{2} *≈* 97 % for both sexes), the vast majority of the variance in the elements of **v**_{2} (*R*^{2} *≈* 87 % for both sexes), and one-third to one-half the variance in the elements of **v**_{3} and **v**_{4}. Additionally, I tried to avoid overfitting or creating odd boundary effects in the predicted values that would have made out-of-sample predictions immediately implausible. These models behave sensibly up to the edges of the sample. The final models are given in Eq. (12), where *i* ∈ {1, . . . , 4} indexes the SVD dimensions, and *ℓ* indexes mortality schedules and elements of **v**_{zi}. OLS regression is used to estimate coefficients for the eight regression models defined in Eq. (12), and the estimated values are contained in online appendix D, Tables D1 and D2. With new values for both _{5}*q*_{0} and _{45}*q*_{15} as inputs, these models are used to predict values for the weights in Eq. (11). That is, for prediction, *v*_{zℓi} on the left-hand side is replaced with $\hat{w}_{zi}$.

#### Models for Adult Mortality

So that the model can be used with only _{5}*q*_{0} as an input, I define a regression model that relates adult mortality logit(_{45}*q*_{15})_{z} to child mortality _{5}*q*_{0 z}. The scatterplot of logit(_{45}*q*_{15}) versus logit(_{5}*q*_{0}) in Fig. E3 in the online appendix reveals a slightly complicated relationship that is neither linear nor systematically curvilinear. Again, without theory as a guide, I tried a variety of models, including various simple transformations of _{5}*q*_{0}. The resulting models explain most of the variance in logit(_{45}*q*_{15}) (*R*^{2} = 93 % for females, and 79 % for males). The final models are given in Eq. (13).

OLS regression is used to estimate coefficients for the two regression models defined by Eq. (13), and the estimated coefficients are contained in Table D3 in the online appendix. This model is used to predict values for _{45}*q*_{15} when only _{5}*q*_{0} is supplied as an input. Then both the input value for _{5}*q*_{0} and the predicted value for _{45}*q*_{15} are used in Eq. (12) to predict the weights in Eq. (11).

#### Models for Mortality in the First Year of Life

Figure E4 in the online appendix displays the relationship between logit(_{1}*q*_{0}) and logit(_{5}*q*_{0}). Mortality falls very rapidly in the first few years of life. Using the child mortality rate (_{5}*q*_{0}), a five-year summary of mortality between ages 0 and 5, as a predictor of single-year mortality within that same five-year age group is relatively uninformative. Using the SVD procedure, experimentation reveals that _{5}*q*_{0} predicts _{1}*q*_{1} through _{1}*q*_{4} well and _{1}*q*_{0} slightly less well. The prediction of _{1}*q*_{0} can be improved by modeling the relationship between logit(_{1}*q*_{0}) and logit(_{5}*q*_{0}) separately, as specified in Eq. (14).

OLS regression is used to estimate the coefficients of this model, displayed in Table D4 of the online appendix. The model explains essentially all the variance in logit(_{1}*q*_{0}) (*R*^{2} > 99 % for both sexes) and is used to predict values for _{1}*q*_{0} directly from the input value of _{5}*q*_{0}.

### Using the Model

The full model is used as follows:

1. Identify input values for _{5}*q*_{0} and optionally _{45}*q*_{15}, and transform them to the logit scale. If _{45}*q*_{15} is not available, predict logit(_{45}*q*_{15}) using the input value for _{5}*q*_{0} and the regression coefficients corresponding to Eq. (13).
2. Use the input values for logit(_{5}*q*_{0}) and logit(_{45}*q*_{15}) obtained in Step 1 and the regression coefficients estimated using Eq. (12) to predict values for the weights $\hat{w}_{zi}$ defined in Eq. (11).
3. Insert the weights predicted in Step 2 into Eq. (11) to calculate a predicted age schedule of mortality probabilities, $\hat{q}$, on the logit scale.
4. If desired, improve the prediction of logit(_{1}*q*_{0}) using the regression coefficients corresponding to Eq. (14) to directly predict logit(_{1}*q*_{0}) from the input value of logit(_{5}*q*_{0}) from Step 1. Replace the first element of $\hat{q}$ with this predicted value for logit(_{1}*q*_{0}).
5. Add 10 to each element of $\hat{q}$ to account for the offset used when calculating the SVDs of the HMD mortality schedules.
6. Take the expit of $\hat{q}$ to yield single-year age-specific probabilities of dying on the probability scale.
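
The weighted-sum, offset, and expit steps can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the R implementation in the repository: the function name `predict_schedule`, the toy component values, and the toy weights are hypothetical; the weights are taken as already predicted (predicting them requires the coefficients in online appendix D); and the optional direct prediction of _{1}*q*_{0} is left out.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the logit scale."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit: map a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_schedule(components, weights, offset=10.0):
    """Weighted sum of components, add the offset, then take the expit.

    components: list of lists, one scaled left singular vector per
                component, each with one value per age (hypothetical input).
    weights:    one predicted weight per component (assumed given).
    offset:     value added back to undo the offset used before the SVD.
    """
    n_ages = len(components[0])
    q_hat = []
    for x in range(n_ages):
        # Weighted sum of the components at age x (offset logit scale).
        value = sum(w * comp[x] for w, comp in zip(weights, components))
        # Add the offset, then return to the probability scale.
        q_hat.append(expit(value + offset))
    return q_hat


# Toy example with two components and three ages (made-up numbers).
toy_components = [[-1.0, -1.2, -1.1], [0.5, 0.0, -0.5]]
toy_weights = [12.0, 1.0]
toy_q = predict_schedule(toy_components, toy_weights)
```

Because the first component is negative at every age, a larger first weight lowers the whole predicted schedule, which is how the weights control the overall level of mortality.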

### Model Validation

The general sensitivity of the model to exactly which mortality schedules are used for calibration is assessed using a cross-validation approach. Fifty random samples of 50 % of the HMD mortality schedules are drawn, the model is calibrated with each using the previously described calibration process, and all the HMD mortality schedules are predicted. For each of the 50 models, prediction errors are calculated for all mortality schedules as the difference $q_{\ell} - \hat{q}_{\ell}$. The error distributions of the in-sample and out-of-sample mortality schedules are summarized and compared.

To investigate the sensitivity of the overall modeling approach to the number of mortality schedules used to calibrate the model, I conduct another cross-validation exercise with varying sample sizes. For each sample fraction from 10 % to 90 % in 20 % increments, 50 random samples are drawn from the HMD life tables. As described just above, I calibrate the model using each sample, and I predict all the HMD mortality schedules, calculate errors, and summarize and compare error distributions for in- and out-of-sample mortality schedules.
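
The sampling design of the two cross-validation exercises can be sketched as follows. This is only an illustration of how the in-sample and out-of-sample groups might be drawn: the function name `cross_validation_splits`, the seed, and the toy count of 200 schedules are assumptions, and the calibration and prediction steps applied to each split are not shown.

```python
import random


def cross_validation_splits(n_schedules, fraction, n_samples=50, seed=0):
    """Draw repeated random in-sample / out-of-sample index splits.

    Returns a list of (in_sample, out_of_sample) pairs of index lists.
    """
    rng = random.Random(seed)
    all_indices = list(range(n_schedules))
    n_in = round(fraction * n_schedules)
    splits = []
    for _ in range(n_samples):
        # Sample without replacement; the rest form the out-of-sample group.
        in_sample = sorted(rng.sample(all_indices, n_in))
        chosen = set(in_sample)
        out_of_sample = [i for i in all_indices if i not in chosen]
        splits.append((in_sample, out_of_sample))
    return splits


# First exercise: 50 samples of 50 % of the schedules (toy total of 200).
splits_half = cross_validation_splits(n_schedules=200, fraction=0.5)

# Second exercise: sample fractions from 10 % to 90 % in 20 % increments.
fractions = [0.1, 0.3, 0.5, 0.7, 0.9]
splits_by_fraction = {f: cross_validation_splits(200, f) for f in fractions}
```

For each split, the model would be calibrated on the in-sample schedules only, all schedules would be predicted, and the errors would then be summarized separately for the two groups.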

### Comparing Performance of SVD-Comp and the Log-Quad Model

The Log-Quad model (Wilmoth et al. 2012) is the state-of-the-art mortality model relating child and adult mortality to full age schedules of mortality. I compare prediction errors produced by both the Log-Quad and SVD-Comp models. I use the Log-Quad model as published and the R code provided by Wilmoth et al. (2012) to produce predicted _{5}*q*_{x} values for each of the HMD mortality schedules using either _{5}*q*_{0} or both _{5}*q*_{0} and _{45}*q*_{15} as inputs. The Log-Quad model predicts mortality in five-year age groups. To accommodate the one-year age groups (_{1}*q*_{x}) predicted by the SVD-Comp model, I use standard life table methods to transform the predicted single-year _{1}*q*_{x} values to five-year _{5}*q*_{x} values. I summarize the distribution of errors, $q_{\ell} - \hat{q}_{\ell}$, produced by both models in various ways. Comparisons are made only for predictions using the same inputs for both models, either _{5}*q*_{0} alone or both _{5}*q*_{0} and _{45}*q*_{15}.
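
One standard way to move from single-year to grouped probabilities of dying is to multiply the single-year survival probabilities within each group. The sketch below illustrates that identity only; the function name `aggregate_qx` is hypothetical, and the uniform grouping does not reproduce any special handling of the youngest ages (for example, separate _{1}*q*_{0} and _{4}*q*_{1} groups) that a particular comparison might need.

```python
def aggregate_qx(q_single, width=5):
    """Combine single-year death probabilities into wider age groups.

    Survival over a group is the product of the single-year survival
    probabilities, so n_q_x = 1 - prod(1 - q_a) over the ages a in the group.
    A final shorter group is formed if the ages do not divide evenly.
    """
    grouped = []
    for start in range(0, len(q_single), width):
        survival = 1.0
        for q in q_single[start:start + width]:
            survival *= 1.0 - q
        grouped.append(1.0 - survival)
    return grouped


# Ten single-year probabilities of 0.1 give two five-year groups.
five_year = aggregate_qx([0.1] * 10)
```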

I also summarize the overall error produced by each model across all the mortality schedules in the HMD. This is done by taking the absolute value of each year-, sex-, and age-specific error and then summing the resulting absolute errors across all ages and years for each sex. This produces a single number—the total absolute error—that indicates the overall difference between the predicted and actual values for all years and ages. In addition to this I present total absolute errors in *e*_{0}.
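
As a small illustration of the total absolute error for one sex, the sketch below sums absolute differences across all schedules and ages. The function name `total_absolute_error` and the toy values are made up for illustration and are not taken from the HMD.

```python
def total_absolute_error(observed, predicted):
    """Sum |observed - predicted| over all schedules and ages for one sex.

    observed, predicted: lists of schedules, each a list of age-specific
    values in the same order.
    """
    total = 0.0
    for obs_schedule, pred_schedule in zip(observed, predicted):
        for obs, pred in zip(obs_schedule, pred_schedule):
            total += abs(obs - pred)
    return total


# Two toy schedules (years) with three ages each.
observed_f = [[0.10, 0.02, 0.30], [0.08, 0.01, 0.25]]
predicted_f = [[0.12, 0.02, 0.27], [0.07, 0.02, 0.25]]
tae_f = total_absolute_error(observed_f, predicted_f)
```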

To assess age-specific errors in $\hat{q}$ and life table quantities derived from $\hat{q}$, I predict $\hat{q}_{\ell}$ with both SVD-Comp and Log-Quad using _{5}*q*_{0} from each HMD life table as input. I construct full life tables from $\hat{q}_{\ell}$ and compare them with the life tables in the HMD.^{7} I construct age-specific weights from the *l*_{x} columns of the HMD life tables by summing *l*_{x} across all HMD life tables in five-year age intervals and then dividing each age-specific sum by the total across all ages. The resulting weights correspond to the proportionate *l*_{x} age structure of the HMD life tables. I calculate weighted age-specific absolute errors in $\hat{q}$ and $\hat{e}$ by summing absolute errors in ${}_{5}\hat{q}_{x}$ and $\hat{e}_{x}$ at five-year age intervals across all life tables in the HMD and then multiplying by the corresponding age-specific weight. The weighted age-specific errors in ${}_{5}\hat{q}_{x}$ are a refinement on the overall errors in ${}_{5}\hat{q}_{x}$, as described earlier, and reveal how close each model comes to replicating _{5}*q*_{x} at each age. The weighted age-specific errors in $\hat{e}_{x}$ provide an age-specific summary of the errors at each age in the derived life table columns that are necessary to calculate *e*_{x}, that is, all the columns.
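
The weighting scheme can be sketched in a few lines. The function names `lx_weights` and `weighted_age_errors` and all numbers below are hypothetical; the sketch assumes every life table reports values at the same five-year ages.

```python
def lx_weights(lx_tables):
    """Age-specific weights: summed l_x per age divided by the grand total.

    lx_tables: list of life tables, each a list of l_x values at the same
               five-year ages.
    """
    n_ages = len(lx_tables[0])
    sums = [sum(table[x] for table in lx_tables) for x in range(n_ages)]
    grand_total = sum(sums)
    return [s / grand_total for s in sums]


def weighted_age_errors(observed, predicted, weights):
    """Sum absolute errors across life tables at each age, then weight."""
    result = []
    for x in range(len(weights)):
        abs_sum = sum(abs(o[x] - p[x]) for o, p in zip(observed, predicted))
        result.append(weights[x] * abs_sum)
    return result


# Two toy life tables with three ages each.
toy_lx = [[100000, 98000, 90000], [100000, 96000, 80000]]
toy_weights = lx_weights(toy_lx)

toy_observed = [[0.02, 0.01, 0.10], [0.04, 0.02, 0.20]]
toy_predicted = [[0.03, 0.01, 0.08], [0.04, 0.03, 0.25]]
toy_errors = weighted_age_errors(toy_observed, toy_predicted, toy_weights)
```

Because *l*_{x} declines with age, the weights give less influence to errors at the oldest ages, where few survivors remain.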

### Application to Mexico and South Africa

SVD-Comp and Log-Quad are used to predict age-specific mortality rates for Mexico in 1983–1985 and South Africa in 2005 using both child and adult mortality as inputs. Data for Mexico come from the Human Life Table Database (Max Planck Institute for Demographic Research et al. n.d.), and data for South Africa from the World Health Organization’s Global Health Observatory data repository (World Health Organization n.d.)—both downloaded on August 21, 2018.

Mexico was chosen because it is a developing country with reasonable data and generally low but otherwise unremarkable mortality. South Africa was chosen because it is a developing country with a unique age-specific mortality schedule during the late 1990s and early 2000s. HIV/AIDS caused many deaths at very young and adult ages, giving rise to a characteristic bulge in mortality at adult ages. Because both Log-Quad and SVD-Comp are calibrated using the HMD, which does not contain life tables with HIV/AIDS–related mortality, both models are expected to perform reasonably well for Mexico, but neither is expected to follow the HIV/AIDS–related mortality bulge in South Africa.

## Results

### Data and Fits

To provide a sense of the mortality data contained in the HMD and the fits produced by the SVD-Comp model, Fig. 1 displays _{1}*q*_{x} on the logit scale for Sweden in 1751 and Austria in 1990, with both data and predicted values produced by SVD-Comp using _{5}*q*_{0} alone as an input.

### Factors of the SVD

Figure 2 and Table B1 (online appendix) present the sex-specific LSVs from the SVD of the full set of HMD mortality schedules scaled by their corresponding SVs, *s*_{i}**u**_{i} (ignoring the index for sex *z*). All elements of *s*_{1}**u**_{1} are negative so that *s*_{1}**u**_{1} captures the underlying average shape of the mortality profile with age. Weights applied to *s*_{1}**u**_{1} move this underlying mortality profile up and down and hence control the overall level of mortality. The remaining *s*_{i}**u**_{i} cross the *x*-axis and therefore represent age-specific deviations from the overall underlying pattern. These scaled LSVs are the components used in the weighted sum in Eq. (11). Figure 2 also displays smoothed^{8} versions of the scaled LSVs. The smoothed versions can be used to make the predicted mortality schedules smoother.

### Calibration Relationships

Figures E1–E4 (online appendix) display the data and predicted values from the models in Eqs. (12), (13), and (14). The corresponding estimated coefficients based on the whole HMD and used to calculate the predicted values in the figures are contained in Tables D1–D4 (online appendix). Figures E1 and E2 (online appendix) contain scatterplots of the RSV element values versus logit(_{5}*q*_{0}). The figures display both data and values predicted from Eq. (12) using logit(_{5}*q*_{0}) and logit(_{45}*q*_{15}) predicted from the model in Eq. (13) as inputs. There are clear, quasilinear relationships between the elements of the RSVs and logit(_{5}*q*_{0}). Figure E3 (online appendix) displays logit(_{45}*q*_{15}) versus logit(_{5}*q*_{0}), along with the predicted values from Eq. (13). Finally, Figure E4 (online appendix) displays logit(_{1}*q*_{0}) versus logit(_{5}*q*_{0}), along with predicted values from Eq. (14).

### Cross-Validation Prediction Errors

Figure 3 displays sex- and age-specific boxplots of the error distribution for one-year age groups from the first cross-validation using 50 samples of 50 % of the HMD to calibrate the SVD-Comp model. The errors are generally very small and centered on 0 through roughly age 60. At older ages, the size of the errors increases, and the median drifts slightly away from 0 in a positive direction, especially at ages older than 90. However, the median error is never much more than 0.01, and as displayed in Fig. 5, median errors are significantly smaller than those produced by the Log-Quad model at the same ages. The error distributions of the in-sample and out-of-sample predictions are indistinguishable at all ages, indicating that the SVD-Comp model is not sensitive to exactly which mortality schedules are used for calibration when half of them are used.

### Varying Sample Size Cross-Validation Prediction Errors

Figures 4 and E6 (online appendix) contain the second set of cross-validation results investigating the effect of varying the number of mortality schedules used to calibrate the SVD-Comp model. Both figures summarize the overall prediction error distributions (all ages and years combined) for the SVD-Comp model by sample status (i.e., in-sample versus out-of-sample mortality schedules). The sample fraction varies from 10 % to 90 % in increments of 20 %. Figure 4 displays boxplots of the median of the 50 sample medians of overall error. These medians are very similar for in-sample and out-of-sample mortality schedules for both sexes across all sample fractions. In all cases, a slight positive bias results from the positive bias in errors at older ages (see Fig. 3). A similar situation exists for the distributions of the interquartile range of overall errors (Fig. E6, online appendix). The only systematic change in these distributions by sample fraction is that the interquartile range of the indicators calculated from the sample decreases as the sample fraction increases, as expected. Inversely, there is a weak trend toward increases in the interquartile range calculated in the out-of-sample group as the sample fraction increases, also as expected. In general, the SVD-Comp model appears to be remarkably robust as the number of mortality schedules used for calibration decreases. Performance is satisfactory all the way down to the 10 % sample and is good all the way down to 30 %.

### Comparison Between SVD-Comp and Log-Quad Prediction Errors

Figure 5 displays sex-age-specific boxplots of the distribution of prediction errors for both the SVD-Comp and Log-Quad models. The median error by sex and age is close to 0 for both models through roughly age 70. At ages older than 70 the median error for the Log-Quad model is systematically substantially larger than 0, while for the SVD-Comp model the median error stays at 0. The sex- and age-specific interquartile ranges are similar for both models, very small through roughly age 40, growing slowly between 40 and roughly 85 and then shrinking again through 110. In general, at ages older than 45 the error distribution is biased in a positive direction for the Log-Quad model but is centered on 0 at all ages for the SVD-Comp model.

Table 4 displays the total absolute errors on the natural scale for the SVD-Comp and Log-Quad models for predictions based on either _{5}*q*_{0} alone or both _{5}*q*_{0} and _{45}*q*_{15}. The table also presents differences between the total absolute errors for the two models in both additive (Log-Quad – SVD-Comp) and proportional form ((Log-Quad – SVD-Comp) / SVD-Comp). In all cases, the SVD-Comp model predictions are globally closer to the HMD life tables.

Tables F1 and F2 (online appendix) display the weighted sum of age-specific absolute errors in $\hat{q}_{\ell}$ and $\hat{e}_{\ell}$ across all 4,610 life tables in the HMD. The last row in each displays the sum across all ages. The unweighted total absolute errors in $\hat{e}_{0}$ for SVD-Comp calculated using one through four components are presented in Table F3 (online appendix). Predicted values for life expectancy at birth, $\hat{e}_{0}$, reflect predictions at all ages so that errors in $\hat{e}_{0}$ describe the cumulative effect of prediction errors at all ages. With each additional component, the total absolute errors in $\hat{e}_{0}$ are reduced, and four components are required for SVD-Comp to perform better than Log-Quad. This is true in spite of the fact that the models used to predict the weights for the third and fourth components are not as predictive as those used to predict the weights for the first two components (Eq. (12) and Tables D1 and D2, online appendix).

Finally, Fig. E5 (online appendix) displays predicted _{1}*q*_{x} from the SVD-Comp using _{5}*q*_{0} alone for three different levels of _{5}*q*_{0}.

### Application to Mexico and South Africa

Figure 6 displays data and predictions from both Log-Quad and SVD-Comp in standard five-year age groups for Mexico in 1983–1985 and South Africa in 2005 using both child and adult mortality as predictors. The two models produce essentially the same predictions for Mexico, and both adequately follow the data given that they are effectively two-parameter models. The situation for South Africa is different. As expected, neither model is able to follow the HIV/AIDS–related bulge at adult ages. Both models thread the predictions through the male age schedule reasonably well, overstating the mortality of adolescents and young adults and understating the mortality of middle-aged adults. For males, both models produce plausible predictions but are unable to reproduce the bulge. SVD-Comp does the same for females, essentially cutting off the bulge; however, Log-Quad produces an implausible age pattern of mortality, with extremely high mortality for older children, adolescents, and young to middle-aged adults. The predictions for South Africa reveal a fundamental limitation of all empirically based mortality models: they cannot represent mortality age profiles that are fundamentally different from those contained in the data used to create them. The solution to this is to identify or create new empirical life tables that represent the age profiles in question and include them in the data used to create the models.

## Discussion

The SVD-Comp model is a simple framework for building mortality models. Its key advantages are (1) a simple linear structure that does not need to be changed for the model to be used in a variety of ways; (2) a general interface—that is, the weights in Eq. (11)—through which input parameters can affect the age pattern of mortality; (3) an ability to handle arbitrarily defined age groups without having to alter the fundamental structure of the model, such as the one-year age groups used here; and (4) through its structure, an inherent constraint that ensures that mortality at each age is related to mortality at each other age according to the age patterns reflected in each of the components. In addition to these advantages, the model also satisfies the combined list of desired characteristics for a mortality model enumerated in the Introduction.

This approach is general and allows all-age mortality schedules (in arbitrarily fine age groups) to be predicted from any covariates that are related to age-specific mortality. This general relationship is quantified in the models (Eq. (12)) that relate the weights in Eq. (11) to the covariates, given that the relationship of each age to all others is maintained through the constant components derived from the SVD, and those intra-age relationships are affected all together through the weights on the components. This constrains the intra-age relationships and relates them to the covariates in a simple, flexible way.

When the weights are modeled as functions of child mortality and calibrated using the relationship between the empirical weights (*v*_{zℓi} in Eq. (8)) and child mortality in the HMD, the model serves the same purpose as the Log-Quad model (Wilmoth et al. 2012), and it performs slightly better in a direct comparison while having the advantage of directly producing mortality schedules by single year of age. Note that this comparison is conducted with the Log-Quad as presented in Wilmoth et al. (2012). In that article, the authors explicitly favored an estimation technique that would, they claimed, reduce estimation bias at the cost of having (slightly) larger prediction errors when evaluated against the historical data set—a fact that is apparent in Fig. 5. The published Log-Quad was calibrated to the slightly different and smaller set of HMD life tables that existed at the time and met the authors’ criteria for inclusion. Consequently, the results of the comparison would likely change if the Log-Quad were recalibrated using the same set of HMD life tables described and used here. However, given the robustness of the SVD-Comp to the set of life tables used in calibration (see the sections Cross-Validation Prediction Errors, and Varying Sample Size Cross-Validation Prediction Errors), this potential difference is unlikely to be large.

Concerning calibration and complexity, the cross-validation results clearly demonstrate that the calibration to the HMD is robust with respect to exactly which and how many mortality schedules are used, and SVD-Comp is no more complex than Log-Quad. SVD-Comp requires one SVD calculation and six regression models (four in Eq. (12), one in Eq. (13), and one in Eq. (14)) for each sex to capture the relationship between child mortality and mortality at other ages in the HMD—12 regression models in total. Log-Quad requires one SVD calculation and one log-quadratic model of the general form log(_{5}*m*_{x}) ~ log(_{5}*q*_{0}) + log(_{5}*q*_{0})^{2} for each five-year age group and another to refine the prediction of _{1}*q*_{0} for each sex—46 regression models in total. The total number of regression coefficients required by each model (for each sex) is: 44 for SVD-Comp and 70 for Log-Quad. The total number of discrete values required for prediction (for each sex) is 484 (4.4 per age group) for SVD-Comp and 92 (3.8 per age group) for Log-Quad. SVD-Comp makes predictions in single-year age groups, and Log-Quad makes predictions in five-year age groups. Comparing the complexity of the models is not easy and depends on where one focuses, but it is clear that neither is obviously more or less complex than the other. Perhaps the only important difference in this respect is that there is nothing in the overall Log-Quad model to directly constrain the relationship of mortality at one age to another except for the quadratic form of the relationship between mortality at each age and _{5}*q*_{0}, whereas SVD-Comp manipulates a linear combination of age-specific vectors, so that the relationships between ages are constrained to fall within the four-dimensional space defined by the four components used by SVD-Comp.

Together with my earlier work with others on an HIV-calibrated version of SVD-Comp (Sharrow et al. 2014), this demonstration suggests that it is reasonable to expect that SVD-Comp could be calibrated in a variety of additional ways to produce useful models that relate age-specific mortality to, for example, life expectancy at birth (or some other age), GDP, geographic region, period, epidemiological indicators (as in Sharrow et al. 2014), a combination of any of these, or something else. Moreover, subtle effects on the age structure of mortality, such as the rotation in age-specific mortality identified by Li and Gerland (2011), could be incorporated by adding the necessary elements to the models for the weights. The same approach could be applied to develop models for the difference between underlying age-specific mortality and age-specific mortality affected by specific shocks, such as natural disasters, conflicts, or epidemic diseases (e.g., HIV). It is even possible to refine the Lee-Carter model in Eq. (1) by adding more components to the SVD-derived **b**_{x}**k**_{t} term so that the enhanced model could represent a wide range of age patterns instead of the constant age pattern included in the existing formulation. This would add more parameters to the model, but the payoff might be sufficient to make that worthwhile. Going further, the entire Lee-Carter model could be replaced by the SVD-Comp model, which would give it the ability to model changing levels and age patterns of mortality independently and generally be more flexible.

The general SVD-Comp model in Eq. (11) can be used in another way to interpolate or smooth incomplete or noisy age schedules by simply using OLS regression of the incomplete mortality schedule against the corresponding elements of the first few components, *s*_{zi}**u**_{zi}, with the constant constrained to be 0, and then predicting the full mortality schedule from all elements of the components and the coefficients estimated by the regression. Bayesian estimation can also be used to estimate the weights and their uncertainty, similar to Sharrow et al. (2013).
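
The interpolation idea can be sketched as a least-squares fit with no intercept on the observed ages only, followed by prediction at all ages. The sketch below is illustrative and makes several assumptions: the function names are hypothetical, the toy components are made-up numbers rather than HMD-derived vectors, the incomplete schedule is assumed to be on the same scale as the components (logit scale with the offset applied), and the normal equations are solved directly rather than with a regression routine.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_weights(components, schedule):
    """Least-squares weights with no intercept, using observed ages only.

    components: list of component vectors (one value per age).
    schedule:   values on the same scale as the components, with None
                marking ages that are missing.
    """
    ages = [x for x, y in enumerate(schedule) if y is not None]
    k = len(components)
    # Normal equations (C'C) w = C'y restricted to the observed ages.
    a = [[sum(components[i][x] * components[j][x] for x in ages)
          for j in range(k)] for i in range(k)]
    b = [sum(components[i][x] * schedule[x] for x in ages) for i in range(k)]
    return solve_linear_system(a, b)


def fill_schedule(components, weights):
    """Predict the full schedule from all elements of the components."""
    n_ages = len(components[0])
    return [sum(w * c[x] for w, c in zip(weights, components))
            for x in range(n_ages)]


# Toy example: two components, six ages, three ages missing.
toy_components = [[-1.0, -1.1, -1.3, -1.2, -1.0, -0.8],
                  [0.4, 0.1, -0.3, -0.2, 0.1, 0.5]]
true_weights = [12.0, 2.0]
full = fill_schedule(toy_components, true_weights)
incomplete = [full[0], None, full[2], None, None, full[5]]
fitted_weights = fit_weights(toy_components, incomplete)
filled = fill_schedule(toy_components, fitted_weights)
```

In this noise-free toy case the fitted weights recover the weights used to build the schedule, so the missing ages are filled in exactly; with noisy data the same fit acts as a smoother.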

The application to Mexico and South Africa confirmed that the HMD-calibrated SVD-Comp works at least as well as Log-Quad when applied to mortality schedules in populations well outside of the HMD. For South Africa, neither model was able to reproduce the HIV/AIDS–related mortality bulge at adult ages. SVD-Comp produced plausible mortality schedules for both sexes that were as close as possible to South Africa’s, given that it could not reproduce the bulge. In contrast, Log-Quad produced a plausible mortality schedule for males but a nonsensical schedule for females. These results reveal an urgent need to increase the diversity of mortality schedules available in freely accessible archives, such as HMD, and in particular, an important need to compile much better mortality data for Africa and other developing world regions where age schedules of mortality are different from what has been observed in the developed world. Additionally, the application to South Africa suggests that SVD-Comp may provide a stable framework to begin building mortality models that include epidemiological (e.g., HIV prevalence and antiretroviral therapy coverage) and other predictors. Earlier work using modeled data (Sharrow et al. 2014) is a start. However, because building models using modeled data is of limited value, reasonably large, high-quality empirical mortality data sets must be assembled from the places where models such as Log-Quad and SVD-Comp are most useful.

## Software and Reproducibility Materials

A GitHub repository contains all the code necessary to reproduce the results presented in this manuscript (https://github.com/sinafala/svd-comp). Both the appendices and a PDF rendered from the R Markdown file (on GitHub) that produces the results are available online.

An R package (R Foundation for Statistical Computing 2016) implementing the HMD child or child/adult mortality–calibrated version of SVD-Comp presented above is available as fully open source and free software to download directly from the GitHub repository using the *devtools* R package and command: install_github(repo = "sinafala/svdComp5q0")

## Acknowledgments

This work was supported in part by Grants R01 HD086227 and R01 HD054511 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The funder had no part in the design, execution, or interpretation of the work. Tables of regression coefficients were formatted using the LaTeX package *stargazer* (Hlavac 2015).

## Notes

^{1} The core ideas underlying the Wilmoth model appear in his doctoral dissertation (Wilmoth 1988), with further refinement in the following years, culminating in the English-language summary (Wilmoth 1990).

^{2} The first left singular vector of the HMD residuals is massaged slightly to ensure all elements of *v* are positive and smooth.

^{3} If desired, *k* is chosen so that the resulting mortality schedule matches an input value _{45}*q*_{15}.

^{4} This is the expression used to model the first residual in Wilmoth’s age/period/cohort model, shown in Eq. (2).

^{5} SVDs are calculated using the *svd* function in the base package of R.

^{6} This ensures that the whole data cloud is separated from the origin by an amount that is substantially greater than the typical value of each logit-transformed mortality rate, and therefore each age group has roughly equivalent leverage in the optimization required to identify the first new dimension of the SVD. The remaining dimensions are effectively identified on a centered data cloud.

^{7} The SVD-Comp life tables are constructed using standard procedures in one-year age groups with _{n}*a*_{x} values taken from the HMD life tables. The Log-Quad life tables are constructed using R code provided by Wilmoth et al. (2012) in five-year age groups.

^{8} Components *i* ∈ {2, 3, 4} are smoothed using a kernel smoother with a Gaussian kernel and bandwidth = *i* + 1 for ages *i* and older.

## References

## Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.