## Abstract

Reliable subnational mortality estimates are essential in the study of health inequalities within a country. One of the difficulties in producing such estimates is the presence of small populations among which the stochastic variation in death counts is relatively high, and thus the underlying mortality levels are unclear. We present a Bayesian hierarchical model to estimate mortality at the subnational level. The model builds on characteristic age patterns in mortality curves, which are constructed using principal components from a set of reference mortality curves. Information on mortality rates are pooled across geographic space and are smoothed over time. Testing of the model shows reasonable estimates and uncertainty levels when it is applied both to simulated data that mimic U.S. counties and to real data for French *départements*. The model estimates have direct applications to the study of subregional health patterns and disparities.

## Introduction

To effectively study health disparities within a country, one must obtain reliable subnational mortality estimates to quantify geographic differences accurately. There is a large demand for estimates of small-area mortality as indicators of overall health and well-being as well as for natural experiments that exploit policy changes at local levels. Reliable mortality estimates for regional populations could help researchers to better understand how place of residence and communities can affect health status through both compositional and contextual mechanisms (Macintyre et al. 2002).

One of the difficulties in producing mortality estimates for subnational areas is the presence of small populations in which the stochastic variation in death counts is relatively high. For example, 10 % (approximately 300) of U.S. counties have populations of less than 5,000, and 1 % of counties have populations less than 1,000 (U.S. Census Bureau 2015). The resulting mortality rates in small areas are often highly erratic and may have zero death counts, meaning that the underlying true mortality schedules are unclear.

The aim of this study is to formulate a model for estimating subnational mortality rates across geographic areas with a wide variety of population sizes and death counts. Resulting estimates would be useful for guiding future policy efforts to improve the health of populations and to investigate the historical effect of public health interventions and changes in the structure of local health programs. In this article, we focus on developing the methodology to produce age- and sex-specific mortality rates. We test our approach on simulated data that mimic U.S. counties and on real data for French *départements*. However, the model is flexible enough to be used in a wide range of situations.

*y*

_{a,x}in a population in area

*a*at age

*x*are Poisson-distributed:

*y*

_{a,x}

*~*Poisson(

*P*

_{a,x}·

*m*

_{a,x}), where

*P*

_{a,x}is the population at risk, and

*m*

_{a,x}is the mortality rate. The maximum likelihood estimate of the mortality rate for area

*a*at age

*x*is

This approach essentially involves estimating as many fixed-effect parameters *m*_{a,x} as there are data points. In addition, this estimation process makes no reference to or use of the information about mortality rates at other ages, in other areas, or at other time points. Confidence intervals can be derived based on the distribution of deaths. For small populations, though, stochastic variation is high, leading to large confidence intervals and standard errors. Estimating mortality rates in small populations therefore requires different types of approaches.

To avoid issues that arise in small-area mortality estimation, a common approach is to aggregate mortality data across multiple years or across space to form larger geographic areas. For example, recent work by Chetty et al. (2016) and Currie and Schwandt (2016) looked at mortality inequalities in the United States using deaths and income measures at the county level, but either data are aggregated across space and time (Currie and Schwandt 2016) or results are not published for smaller populations (in the case of Chetty et al. (2016), where the minimum population size is 20,000). However, given the information lost in aggregating data from smaller areas, there is value in employing other techniques to infer mortality levels and trends.

One option that has been employed is to treat each small population as a stand-alone population and model accordingly using traditional model life table approaches. For example, Bravo and Malta (2010) outlined an approach for estimating life tables in small populations, which they applied to regional areas of Portugal. They estimated Gompertz-Makeham functions via generalized linear models, with an adjustment at older ages. Another approach by Jarner and Kryger (2011) involved estimating old-age mortality in small populations by first estimating parameters of a frailty model using a larger reference population. However, approaches that treat small populations separately do not account for the likely relationships between the regional population estimates or patterns over time. Other approaches in the United States have used county-level covariates, such as socioeconomic status and education to predict county-level life expectancy (Dwyer-Lindgren et al. 2016; Ezzati et al. 2008; Kindig and Cheng 2013; Kulkarni et al. 2011; Srebotnjak et al. 2010). However, issues arise with using the resulting estimates to infer relationships among health, poverty rates, and education without endogeneity concerns.

In this article, we propose a new model that relies on a Bayesian hierarchical framework, allowing information on mortality to be shared across time and through space. This approach helps to inform the mortality patterns in smaller geographic areas, for which uncertainty around the data is high. The modeling process produces uncertainty intervals around the mortality estimates, which can then be translated into uncertainty around other life table quantities (for example, life expectancy). In addition to producing uncertainty intervals around the final estimates, the modeling process also involves the estimation of other meaningful variance parameters that may relate to variation in mortality within, or across, states. A Bayesian hierarchical approach has been used previously in estimating mortality in small populations (see, e.g., Congdon 2014). However, our modeling approach is novel in that it combines demographic knowledge about regularities in age-specific mortality with the flexibility of a Bayesian hierarchical structure.

## Method

We propose a model with an underlying functional form that captures regularities in age patterns in mortality. We then build on this functional form within a Bayesian hierarchical framework, penalizing departures from the characteristic shapes across age as well as sharing information across geographic areas and ensuring a relatively smooth trend in mortality over time.

Bayesian hierarchical models have been used in a wide range of demographic applications. For example, Bayesian hierarchical models are used by the United Nations Population Division to produce probabilistic projections of population and life expectancy (Raftery et al. 2012, 2013). Alkema and New (2014) developed a Bayesian hierarchical model for estimating country-specific under-5 mortality rates. Other examples can be found in the study of mortality, fertility, and migration (e.g., Alkema et al. 2012; Bijak 2008; Congdon 2009; King and Soneji 2011; Sharrow et al. 2013). Our approach has similarities to applications by authors in cause-of-death mortality estimation (Girosi and King 2008) and cohort fertility projection (Schmertmann et al. 2014) but with a focus on addressing small-area estimation issues rather than forecasting.

### Model Set-up

*y*

_{x,a,t}be the deaths at age

*x*in area

*a*at time

*t*. We assume that deaths are Poisson-distributed:

*m*

_{x,a,t}is the mortality rate at age

*x*, area

*a*, and time

*t*; and

*P*

_{x,a,t}is the population at risk at age

*x*, area

*a*, and time

*t*. We estimate mortality rates at ages 0, 1, 5, and then in five-year intervals up to 85+.

*m*

_{x,a,t}are modeled on the log scale as follows:

*Y*

_{px}is the

*p*th principal component of some set of standard mortality curves, and

*u*

_{x,a,t}is a random effect. The use of principal components has similarities with the Lee-Carter approach (Lee and Carter 1992). Principal components create an underlying structure of the model in which regularities in age patterns of human mortality can be expressed. Many different kinds of mortality curve shapes can be expressed as a combination of the components. Incorporating more than one principal component allows for greater flexibility in the underlying shape of the mortality age schedule.

**X**be a

*N*×

*G*matrix of log mortality rates, where

*N*is the number of observations, and

*G*is the number of age groups. In the U.S. states case, we have

*N*= 50 × 31 = 1,550 observations of

*G*= 19 age groups. The SVD of

**X**is

**U**is a

*N*×

*N*matrix,

**D**is a

*N*×

*G*matrix, and

**V**is a

*G*×

*G*matrix. The first three columns of

**V**(the first three right-singular values of

**X**) are

*Y*

_{1x},

*Y*

_{2x}, and

*Y*

_{3x}.

^{1}

The first three principal components for U.S. state mortality curves from 1980–2010 are shown in Fig. 1.^{2} They are based on mortality curves on the log scale, three of which are shown in the top-left graph: Florida in 1980, Hawaii in 1991, and California in 2010. In total, there are 50 *×* 31 = 1,550 of these curves, from which the principal components are derived. Broadly, the first principal component describes the overall mortality curve. The second principal component allows mortality at younger ages to be higher in relation to adult mortality. The third principal component allows adult mortality to be higher in relation to mortality at young and old ages. For example, in some regions of a country, child mortality might be relatively higher than the baseline schedule. In other regions, where prevalence of deaths due to accidents is higher, adult mortality would be higher than the baseline pattern.

The components capture overall patterns of mortality well; a wide range of mortality curves can be expressed as a linear combination of these three components. It is possible to include more or less than three principal components in the model for log(*m*_{x,a,t}). The more principal components used, the more flexible the fit. However, after experimenting with a wide range of standard mortality curves, we found that higher-order principal components generally did not display any regular patterns across age but instead picked up on residual variance in the data set, which has limited use for modeling purposes. Including the first three principal components in the model allows for a reasonably flexible fit while including only components that have some demographic interpretation.

*u*

_{x,a,t}, in the expression for log(

*m*

_{x,a,t}) accounts for potential overdispersion of deaths: that is, the case in which the variance in deaths is greater than the mean, which otherwise would not be expected given the assumption of Poisson-distributed deaths (Congdon 2009). These random effects are assumed to be centered at 0, with an associated variance:

### Pooling Information Across Geographic Area

_{p,a,t}for a particular area are drawn from a common distribution centered on a state (or country) mean:

*p*indicates the principal component (

*p*= 1, 2, 3). Larger areas in terms of population size (and death counts) have a bigger effect on the overall means. The fewer data available on deaths in an area, the closer the parameter estimates are to the mean parameter value. In this way, mortality patterns in smaller areas are partially informed by mortality patterns in larger areas. At the same time, mortality patterns in larger areas borrow little information from the pooling process and are largely determined by their own observed death counts.

The influence of the geographic pooling is illustrated in Fig. 2. The charts illustrate observed, true, and fitted log mortality rates for a hypothetical county with a population of 5,000 males. The dashed line is the true underlying log mortality rate. The circles represent the observed log mortality rates; these were simulated from the true rates using Eq. (2). Gaps represent an observed death count of 0. The solid line and associated shaded area are the fit and 95 % credible intervals, respectively. The graph on the left shows a fit without geographic pooling. The graph on the right shows a fit with geographic pooling and has an additional mean line derived from the mean parameters, $\mu \beta p,t$, which are informed by all counties in the state. The effect of pooling is seen most in log mortality rates at younger ages, among which rates are low. Given that many of the younger age groups have observed 0 death counts, the unpooled model estimates log mortality rates that are much lower than the true rates. The pooled model estimate is pulled up closer to the mean estimate, benefiting from information on young-age mortality from other counties.

In practice, the mean parameter values $\mu \beta p,t$ could be determined from any plausible group of areas that share similar characteristics. We tested the model using state-level mean parameter values, but other options include grouping areas by a smaller location scale, by rural/urban area within state, or by a common age distribution.

### Smoothing Across Time

_{p,a,t}, change gradually and in a relatively regular pattern over time. We impose this smoothing by penalizing the second-order differences across time in the mean parameters:

*p*= 1, 2, 3. This setup is penalizing differences from a linear trend in the mean parameters. Smoothing the mean parameters, rather than the actual parameters, β

_{p,a,t}, still allows for mortality trends to depart from a smooth trajectory if suggested by the data. For example, if a particular area suffered from an influenza outbreak—thus making mortality higher than in previous years—the β

_{p,a,t}terms would allow for higher mortality.

The effect of smoothing parameters over time is shown in Fig. 3. The graph shows the estimated median value of the parameter $\mu \beta 1,t$ over 31 years in a simulated U.S. county model. The solid line shows the estimates without smoothing, and the dashed line shows the effect of smoothing.

### Adding Constraints to the Model for Total Areas

*A*is the total number of areas.

We assume that the deaths *y*_{x,a,t} are conditionally independent; that is, given values of *P*_{x,a,t} and *m*_{x,a,t}, the *y*_{x,a,t} are independent, and thus the sum is also Poisson-distributed. In practice, the added constraint makes little difference to the estimates for any specific area. However, it ensures consistency between regional and national estimates. We believe that this is an important feature of the model that is relevant for most applications. Figure 4 shows estimates of life expectancy at birth for females for each French *département* in 2008. (Data are described in the upcoming section, Application to French *Départements*.) The resulting estimates show only negligible differences between the constrained and unconstrained models.

### Priors and Implementation

The model was fitted in a Bayesian framework using the statistical software R. Samples were taken from the posterior distributions of the parameters via a Markov Chain Monte Carlo (MCMC) algorithm. This was performed using JAGS software (Plummer 2003). Standard diagnostic checks using trace plots and the Gelman and Rubin diagnostic (Gelman and Rubin 1992) were used to check convergence.

### Model Summary

*m*

_{x,a,t}). The second level (Eqs. (14) and (15)) specifies distributions for the parameters in the first level (β

_{p,a,t}and

*u*

_{x,a,t}). Finally, the third level (Eqs. (16)–(19)) specifies the distribution for the parameters in the second level ($\mu \beta p,t$ and the variance terms).

## Results

### Simulated Data

*l*

_{x}is the survivorship at age

*x*in the county, and

*Y*

_{x}is the survivorship at age

*x*in the state of interest. To alter the shape of the survivorship curve for a particular county, the values of α and β were changed. The values of α and β were chosen randomly from the ranges [

*−*0

*.*75, 0

*.*75] and [0

*.*7, 1

*.*3], respectively, given that these ranges translate to a reasonable range of commonly observed age-specific mortality curves. We converted the survivorship rates to mortality rates using standard life table relationships.

After we obtained the true mortality rate schedules, we simulated deaths according to the relationship shown in Eq. (2). We tested a range of population sizes, with the minimum county size being 1,000 people of a particular sex. At this small population size, many simulated death counts for particular age groups are equal to 0.

Figure 5 shows the true, simulated (observed) data and estimated mortality rates on the log scale in three hypothetical counties that were in the same state but had different population sizes. The points show the observed data, which are simulated from the true underlying mortality rate, shown by the dashed line. For the smallest county, which has 1,000 people, many of the observed death counts are 0, so the data do not show up on the log scale. The solid line shows the estimated log mortality rates, and the corresponding shaded area shows the 95 % credible intervals. As the size of the county increases, the mortality pattern in the observed data becomes more regular. As such, the uncertainty around the estimates decreases with increased population size.

#### Evaluation of Model Performance

*x*, $mx\u2217$ is the true mortality rate, and

*G*is the number of age groups.

Table 1 shows the average RMSE for the three fits to a simulated data set containing 60 counties over 31 years (12 counties per size group). In all cases, the RMSE decreases as county size increases. This is intuitive because as the county population increases, there are fewer 0 death counts and thus more information about the shape of the mortality curve. The average RMSE for the model is always lower than Loess or Brass, irrespective of county size. Although the Brass RMSE seems reasonable, it is most likely because the data were generated using a Brass relational model.

*G*is the number of age groups; $mx\u2217$ is the true mortality rate for the

*x*th age group; and

*l*

_{x}and

*r*

_{x}are, respectively, the lower and upper bounds of the credible intervals for the

*x*th age group. Coverage at the 80 %, 90 %, and 95 % levels was considered. Table 2 shows the average coverage for the proposed model, fit to 60 counties over 31 years. In general, actual coverage levels are close to the nominal level, indicating that the model is well calibrated. The coverage level tends to decrease as county size increases.

### Application to French *Départements*

We also tested the model on real mortality data, applied to death and population counts by sex in French *départements* from the period 1975–2008.^{3} The annual life tables were constructed by the *Division des statistiques régionales, locales et urbaines* (Regional, Local and Urban Statistics Division) of the French *Institut National de la Statistique et des Études Économiques* (National Institute for Statistics and Economic Studies (INSEE)). The life tables were built from the vital statistics and census data also collected and processed by INSEE. There are 96 mainland French *départements* ranging in population size from approximately 35,000 to 1.5 million (for one sex).

We used national France mortality curves from 1975–2008 to form the set of principal components used in estimation. The model was applied to both sexes simultaneously. For illustration, Figs. 6 and 7, respectively, show the observed and estimated log mortality rates for males in the departments Lozère and Somme in 1975 and 2008. Lozère has a male population of approximately 38,000, and Somme has a male population of approximately 275,000. For both *départements*, mortality rates decreased from 1975 to 2008. As a consequence, more 0 death counts are observed for 2008 than for 1975, corresponding to more uncertainty around estimates in the more recent year. Additionally, there is less uncertainty around the Somme estimates because the population size is approximately seven times the population in Lozère.

After age-specific mortality rates are estimated, other mortality measures and associated uncertainty can be calculated. Figure 8 shows estimates of life expectancy at birth (*e*_{0}) for males in 2008 by *départements*. Life expectancy is estimated to be highest in areas around Paris and for the Midi-Pyrenees area and lowest in the northern part of the country—a well-documented pattern (Barbieri 2013).

Uncertainty in life expectancy estimates is also easily obtained. Life expectancy is calculated for each of the posterior samples of age-specific mortality rates. A 95 % uncertainty interval (UI) is then obtained by calculating the 2.5th and 97.5th percentiles. For example, the estimate for males’ life expectancy at birth in Paris in 2008 is 80.5 years (95 % UI: [79.4, 81.7]). For a smaller *département*, such as Lozère, the estimate is 79.1 years (95 % UI: [77.2, 80.8]). Complete results for all French *départements* are available online.^{4} Life expectancy estimates produced by our method are reasonably close to estimates produced by Institute National d’Études Démographiques (INED) (Barbieri 2013), with the average difference in life expectancy at birth across areas and years being approximately 0.5 years.

Another aspect of the results that could be of interest are the estimated variance parameters. It is assumed that the β_{p,t} parameters are normally distributed with mean μ_{p,t} and variance $\sigma \beta p,t2$. The variance terms may tell us something about how the spread of mortality outcomes is changing over time. Figure 9 shows the median and 95 % credible intervals of the standard deviation parameters associated with β_{1} and β_{2} over the period of estimation. Although there is no discernible trend in $\sigma \beta 1,t$, the parameter for the β_{2} term, $\sigma \beta 2,t$, appears to be increasing over time. The β_{2} term is related to the principal component that alters the magnitude of infant and child mortality compared with mortality at older ages (see Fig. 1).

An increase in the variance parameter suggests that *départements* are diverging more over time with respect to child mortality versus older mortality. Figure 10 illustrates two *départements* that have relatively high and low values of β_{2}. For Tarn, β_{2} is high, which results from infant mortality being relatively low compared with adult mortality. For Seine-Saint-Denis, the opposite is true.

## Conclusion

We presented a novel method to estimate mortality rates by age and sex at the subnational level. In our approach, we built on characteristic age patterns in mortality curves, pooling information across geographic space and smoothing over time within the framework of a Bayesian hierarchical model. When tested against simulated data, the model outperformed estimates from a simple Loess smoother and Brass model, especially for areas with smaller populations. The uncertainty in our estimates, reflected in the confidence intervals, is well calibrated. An application to real data for France illustrates how various parameter estimates from the model help to assess trends in overall mortality levels and inequalities within a country. The estimates produced by the model have direct applications to the study of subregional health patterns and disparities and how these evolve over time.

The model we outlined is proposed as a general framework for estimating mortality rates in subpopulations. The user can easily alter the framework to best suit the situation in which rates are being estimated. For example, the mean parameter values $\mu \beta p,t$ need not be defined on a state/county basis but may instead be defined by a smaller geographic area or based on some other characteristics, such as age distribution or rurality of an area. Additionally, it is possible to alter the random effects to be spatially structured, assuming some correlation in random effects by distance or of adjacent areas.

The focus of this project has been on estimation of past and present mortality trends rather than future ones. However, forecasting of age- and sex-specific mortality rates in a particular area can be obtained directly from model outputs. The mean parameters $\mu \beta p,t$ can be projected forward given the assumed linear time trend, which forms a basis to infer other parameters for areas of interest. Given that the relevant variance parameters are also estimated in the process, uncertainty around forecasts can also be inferred.

One of the contributions of our work is methodological. Estimates of mortality measures for small areas, most notably life expectancy, have been proposed previously. For example, Ezzati et al. (2008) used information about number of deaths, together with covariates related to socioeconomic status, to estimate mortality rates. Congdon (2014) developed a random-effects model to estimate life expectancy for subnational areas. Our method builds on some elements presented in the literature while incorporating demographic knowledge about regularities in the Lexis surface of mortality rates by age and over time. More specifically, we supplemented a Bayesian hierarchical modeling framework to pool information across space and time, with a classic demographic approach to borrow information across age groups. In particular, our use of principal components of schedules of log mortality rates was informed by a long tradition of demographic modeling of mortality and can be considered an extension of the Lee-Carter approach (Lee and Carter 1992).

In this article, we developed a general approach to complement classic demographic modeling ideas within a solid statistical framework. The model that we proposed and tested is quite minimalistic, relying on fairly simple rules for pooling across space and smoothing over time. However, the geographic scale at which spatial pooling will be implemented may depend on specific circumstances of different countries. Likewise, the type of time smoothing may vary. A number of rules can be accommodated within the framework that we propose.

## Acknowledgments

This study was made possible by an award from the Center on the Economics and Demography of Aging (National Institute on Aging grant P30-AG012839) at the University of California, Berkeley. However, any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors alone and do not necessarily represent the official views of the National Institute on Aging. The authors would like to thank two anonymous reviewers for helpful comments on an earlier version of this manuscript.

## Notes

^{1}

Throughout this article, we refer to the *Y*_{px} as principal components for simplicity, even though technically *Y*_{px} is the *p*th vector of principal component loadings.

^{2}

Data on age-specific mortality rates are available through the Centers for Disease Control and Prevention (CDC) Wide-ranging Online Data for Epidemiologic Research (WONDER) tool (https://wonder.cdc.gov/).

^{3}

Personal communication to Magali Barbieri by the *Division des statistiques régionales, locales et urbaines*, INSEE (May 28, 2013). We chose to use French *départements* data because at the time of writing, data for all U.S. counties were not readily available.

## References

*département*