Abstract

Population projections provide predictions of future population sizes for an area. Historically, most population projections have been produced using deterministic or scenario-based approaches and have not assessed uncertainty about future population change. Starting in 2015, however, the United Nations (UN) has produced probabilistic population projections for all countries using a Bayesian approach. There is also considerable interest in subnational probabilistic population projections, but the UN's national approach cannot be used directly for this purpose, because within-country correlations in fertility and mortality are generally larger than between-country ones, migration is not constrained in the same way, and there is a need to account for college and other special populations, particularly at the county level. We propose a Bayesian method for producing subnational population projections, including migration and accounting for college populations, by building on but modifying the UN approach. We illustrate our approach by applying it to the counties of Washington State and comparing the results with extant deterministic projections produced by Washington State demographers. Out-of-sample experiments show that our method gives accurate and well-calibrated forecasts and forecast intervals. In most cases, our intervals were narrower than the growth-based intervals issued by the state, particularly for shorter time horizons.

Introduction

Population projections or forecasts are used by governments at all levels for planning, by the private sector for strategic decision-making, and by researchers, particularly in the health and social sciences (Chi and Voss 2011; Rayer 2015; Shryock et al. 1976; Smith et al. 2001; Wilson et al. 2022). Until recently, population projections have mostly been produced using deterministic methods, notably the cohort-component method (Preston et al. 2001).

Such methods do not organically produce statements of uncertainty. Uncertainty estimates are often desired to get a general idea of how precise a forecast is, to assess the reality of changes in trends, or to make decisions that involve risks, costs, and benefits, such as whether to close or open schools. Typically, uncertainty has been communicated using deterministic scenarios. These approaches have been widely criticized as lacking a probabilistic basis and for giving results that are implausible over multiple geographic areas or time periods in the future. Instead, it has been recommended to produce probabilistic population projections (Bongaarts and Bulatao 2000; Keyfitz 1972).

The United Nations (UN) is the major organization that regularly produces national population projections for all countries by age and sex (Lutz and KC 2010), generally updated every two years in the World Population Prospects, in light of new data and better methods. The most recent update was completed in 2022 (United Nations 2022a). Since 2015, the UN's official projections have been probabilistic, based on Bayesian hierarchical models for fertility and mortality (Raftery, Alkema et al. 2014; Raftery et al. 2012; United Nations 2022b). While these methods have been extended to include probabilistic migration (Azose and Raftery 2015; Azose et al. 2016), probabilistic migration forecasts have not yet been incorporated into the official UN projections. Furthermore, probabilistic migration methods have not previously been incorporated into any Bayesian models for subnational or small area population forecasts, to our knowledge, and this approach represents one of the important contributions of this study.

Bayesian approaches have several advantages for this purpose. They organically produce uncertainty intervals for any demographic outcome of interest, which are well calibrated if the model is correct (e.g., 95% intervals contain the truth 95% of the time). They can be estimated using Markov chain Monte Carlo (MCMC) methods, which can be implemented even for the relatively large, complex models involved, and more easily than non-Bayesian alternatives, such as maximum likelihood estimation. Bayesian hierarchical approaches also automatically use information from other countries when producing estimates for a country of interest, which can be particularly useful when the information from the country of interest is of poor quality or less informative than that from other, similar countries.

Most of the development of these projection methods has been implemented at the national level. However, there is a considerable need and demand for subnational projections, for example, in the United States at the state and county levels (Rayer 2015; Smith et al. 2001; Wilson et al. 2022). These are needed for planning by local governments and the private sector, as well as for health and social science research on subnational variation and inequality. Even at the international level there is increased interest in subnational outcomes, exemplified by the slogan of the UN's 2030 Sustainable Development Goals—“Leave No One Behind.” While some efforts have been made to produce population projections for subnational geographies, such as U.S. counties (Hauer 2019), census tracts (Baker et al. 2021; Swanson et al. 2010), and Japanese prefectures (Inoue 2017), the methods employed—a cohort change difference model using Leslie matrices, the Hamilton‒Perry method, and a modified Hamilton‒Perry method with smoothing, respectively—are primarily deterministic and do not organically produce measures of uncertainty.

Indeed, the literature in this field indicates a growing need for subnational population forecasting models with uncertainty bounds. However, such forecasts can require extensive expertise, localized knowledge, and detailed data, making probabilistic population projections challenging to produce. Although some attempts have been made (Ševčíková et al. 2015; Wilson et al. 2022), few such forecasts exist. Exceptions include Swanson and Beck (1994), who produced short-term (10- and 20-year) forecasts for Washington counties, and Swanson and Tayman (2014), who produced state-level forecasts for four states. In both studies, uncertainties were generated using a least-squares regression estimator.

In this study we seek to extend the UN method for national probabilistic population projections to the subnational level. Initially, it might seem that this could be done simply by applying the existing international framework and treating a country (or state) as if it were the “world” and the states (or counties) as if they were “countries.” However, it turns out that the subnational context is different from the national one in several important respects, requiring substantial changes to the extant framework. These changes are the main focus and contribution of this article.

First, subnational between-area correlations in fertility and mortality tend to be much larger than the corresponding between-country correlations (Ševčíková and Raftery 2021; Ševčíková et al. 2018), and so different models are needed for the subnational context. Second, international migration is constrained to total zero across the globe, whereas subnational migration has no such constraint, given the existence of migration into and out of the overall area (state or county). Third, subpopulations that are largely defined by age, such as college students, can have a large impact within subnational jurisdictions and need to be treated specially, which is usually not necessary for countries as a whole. Finally, some subnational areas can have populations so small that stochastic variation in numbers of vital events, usually ignored in population projections, can have a significant impact on that area’s population forecast.

We propose a method for probabilistic subnational population projections that attempts to resolve these challenging issues. For projecting fertility, we use the method of Ševčíková et al. (2018), while for projecting mortality we use the method of Ševčíková and Raftery (2021). For the probabilistic projection of migration, we adapt the method of Azose and Raftery (2015) and Azose et al. (2016) to avoid the sum-to-zero constraint. Finally, we develop a method to allow for group quarters populations, such as college students, that have a large impact on the population dynamics within particular subnational areas.

To illustrate the method, we apply it to the counties of Washington State. Washington is a fairly typical state in the United States, so we believe that a method that works well for it will also work well for other states, and indeed potentially for subnational units in other countries. For example, the population of Washington State is close to the average population of a U.S. state, while its demography is similar to the demography of the United States as a whole. The average age of both is 38 to the nearest whole year. With 39 counties, Washington has more counties (or county equivalents) than 17 other U.S. states, which places it near the median of that distribution, so it is not atypical in this regard either.

Washington has experienced large fluctuations in migration in recent years, in part because of the growth of the technology sector. These fluctuations account for a large proportion of population change in several counties, particularly the most populous ones (Felt 2017; Puget Sound Regional Council 2020; State of Washington Office of Financial Management (OFM) 2022a). These kinds of population dynamics are particularly hard to forecast, and so they also account for a large part of the uncertainty about future population. When we assess the validity of our population forecasts, we include an assessment of the success of the method in forecasting migration, which is in turn critical for success in forecasting population. Our validation assessment is a further contribution of this article, since validation assessments are frequently omitted from published work on subnational population forecasts; for an exception for deterministic subnational projections, see Baker et al. (2021).

Moreover, several counties in Washington have college populations that represent a large proportion of the total, and because of their peaked age profile, ignoring them would lead to large distortions in projected age distributions. We therefore develop a method specifically for college populations, though it can also be used to account for other group quarters populations.

We compare our probabilistic forecasts to Washington State's regularly produced deterministic projections. As required by the Washington State Growth Management Act, the State of Washington OFM produces county-level population projections for three different growth trajectories (low, medium, high) using deterministic approaches. These approaches are based on historic growth trends and assumptions about future growth (OFM 2018). The medium series is considered the most likely future trajectory, while the low and high series correspond to low- and high-growth situations. Even though the low and high trajectories can be thought of as bounds on predicted population growth, they are constructed using assumptions about how the population in each county might change in the future and do not represent statistical uncertainty. Because the OFM variants do not have a straightforward interpretation, it is not easy to meaningfully quantify the differences between bounds. For planning purposes, we argue that it may be more helpful to have projections that provide statistically valid bounds on future population and components of population change with given probabilities.

Our article is organized as follows. The next two sections present the methods and data we used to produce probabilistic subnational projections of fertility, life expectancy at birth, migration, and population. These are followed by a description of the results for the main components for the counties of Washington State. Next, we include an out-of-sample validation analysis for both the overall population projections and the migration projections. We conclude with a discussion of the strengths and weaknesses of the approach, as well as suggestions for future research.

Methods

Probabilistic population projections provide statistical uncertainty measures around future population totals. In contrast to deterministic and scenario-based population projections, which can be somewhat subjective as they reflect the opinions of experts on how populations are expected to change, probabilistic projections provide quantitative measures of uncertainty with a statistical interpretation. These uncertainty measures are based on observed data and reflect conditions specific to the unit being examined. At the subnational level, this means that each area (county) has its own uncertainty measures.

The production of fully probabilistic population projections requires probabilistic forecasts of each component of population change—fertility, mortality, and migration. These projected components can then be used to project total population via the cohort-component method (Raftery et al. 2012). Existing probabilistic projections, such as the ones produced by the United Nations, have been produced for countries and various aggregates of multiple countries (Alkema et al. 2015; Raftery, Alkema et al. 2014; Raftery et al. 2012; United Nations 2022a). The UN's probabilistic projections use as input probabilistic projections of fertility (Alkema et al. 2011) and mortality (Raftery et al. 2013). Although the UN's current approach does not include probabilistic projections of migration, Azose and Raftery (2015) and Azose et al. (2016) have developed a Bayesian hierarchical model to predict future migration trends with uncertainty at the country level. A contribution of this study is the integration of the probabilistic migration method into subnational Bayesian population forecasts.

National Projections

To produce subnational probabilistic population projections, we start with the existing national-level forecasting approach, which uses Bayesian hierarchical models for all countries for each of total fertility rate (TFR), life expectancy at birth (e0), and net international migration (Alkema et al. 2011; Azose and Raftery 2015; Azose et al. 2016; Raftery et al. 2013).

Each projection model results in a sample of trajectories, with each trajectory representing a potential future of the respective quantity (Raftery et al. 2012). After disaggregating each trajectory into sex- and age-specific values (Ševčíková et al. 2016), the cohort-component method can be applied to obtain a sample of sex- and age-specific population trajectories. The sample can then be used to obtain any predictive quantile of any demographic quantity of interest, such as its median or the lower and upper bounds of the 80% and 95% probability intervals.
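
As an illustration of the last step, the following minimal Python sketch summarizes a sample of trajectory values for a single future year by its median and the bounds of the 80% and 95% intervals. The function name predictive_summary and the simulated input are hypothetical and are not part of the packages used in this study; the sketch assumes equally weighted trajectories.

```python
import random
import statistics


def predictive_summary(trajectory_values):
    """Median and 80%/95% interval bounds for one year's trajectory sample."""
    # With n=40, statistics.quantiles returns 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(trajectory_values, n=40, method="inclusive")
    return {
        "lower95": cuts[0],   # 2.5% quantile
        "lower80": cuts[3],   # 10% quantile
        "median": cuts[19],   # 50% quantile
        "upper80": cuts[35],  # 90% quantile
        "upper95": cuts[38],  # 97.5% quantile
    }


# Hypothetical sample: 1,000 simulated values of a county's population in one year.
random.seed(1)
sample = [random.gauss(60000, 4000) for _ in range(1000)]
summary = predictive_summary(sample)
```

The same function can be applied year by year, and to any derived quantity (births, net migrants, an age group), because each quantity is computed within a trajectory before the quantiles are taken.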

Subnational Projection of TFR and e0

Extensions of the national models for fertility and mortality to the subnational level have been implemented by Ševčíková et al. (2018) and Ševčíková and Raftery (2021), respectively. They found that a direct extension of the national models that treats the country as if it were the world, and the subnational units as if they were countries, did not work well. This is because the model was unable to represent the observed strong within-country correlations between subnational units. These are much larger than the correlations between countries.

Instead, the extensions are based on the observation that both TFR and e0 for subnational units are highly correlated with the national averages, with only slight variations. More specifically, it was found that subnational TFR was well predicted by a model that scales the national predictive distribution with a factor that follows a first-order autoregressive (AR(1)) process. For e0, good subnational predictions were derived using the national prediction shifted by an AR(1) process. The parameters of these AR(1) processes were estimated using subnational units from many countries, including the United States.
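
The scale and shift constructions can be sketched as follows. This is a simplified illustration, not the implementation in bayesTFR or bayesLife: the function names, the autoregressive coefficients, and the error standard deviations are hypothetical, and we assume for illustration that the scaling factor reverts toward 1 and the shift toward 0.

```python
import random


def ar1_path(n_steps, phi, sigma, start, mean, rng):
    """Simulate x[t] = mean + phi * (x[t-1] - mean) + Normal(0, sigma)."""
    path, x = [], start
    for _ in range(n_steps):
        x = mean + phi * (x - mean) + rng.gauss(0.0, sigma)
        path.append(x)
    return path


def county_tfr(national_tfr, county_now, national_now, rng, phi=0.9, sigma=0.03):
    """Scale one national TFR trajectory by an AR(1) factor (hypothetical parameters)."""
    start = county_now / national_now  # observed ratio at the start of the projection
    factors = ar1_path(len(national_tfr), phi, sigma, start, 1.0, rng)
    return [f * nat for f, nat in zip(factors, national_tfr)]


def county_e0(national_e0, county_now, national_now, rng, phi=0.95, sigma=0.2):
    """Shift one national e0 trajectory by an AR(1) gap (hypothetical parameters)."""
    start = county_now - national_now  # observed gap at the start of the projection
    shifts = ar1_path(len(national_e0), phi, sigma, start, 0.0, rng)
    return [nat + s for s, nat in zip(shifts, national_e0)]


# Hypothetical national trajectories over three five-year periods.
rng = random.Random(7)
tfr_path = county_tfr([1.70, 1.72, 1.74], county_now=1.55, national_now=1.66, rng=rng)
e0_path = county_e0([82.0, 82.6, 83.1], county_now=83.0, national_now=81.5, rng=rng)
```

Repeating these calls once per national trajectory gives a sample of county trajectories that inherits the national uncertainty and adds county-specific variation.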

Thus, to apply the methods to predict TFR and e0 for Washington counties, the respective AR(1) process is applied to each future U.S. trajectory of the respective quantity. In the case of e0, the method is applied to predict female e0. Then the female‒male gap model of Raftery, Lalic et al. (2014) is applied to predict male e0. This process is implemented in the bayesTFR (Ševčíková et al. 2011) and bayesLife (Ševčíková et al. 2021) R packages, respectively, which makes it easy for practitioners to generate such projections. These methods have the advantage that they are simple and do not require any historical data on subnational units other than for the “present” time point from which the projection starts. The national predictions that are required here can be either generated using the bayesTFR and bayesLife R packages or downloaded from bayespop.csss.washington.edu. However, we found that a modification of the method for TFR is needed to account for college populations, which is described later in this section.

The methodological approach is illustrated in the pink and green boxes in Figure 1. For projecting TFR and e0, we start with the national model and generate trajectories for the United States, which we then use as input to the county-level projections. An alternative would be to base the county-level projections on state-level projections, but we did not find a substantial difference between the two approaches.

Subnational Migration

To produce subnational migration projections, we extend the methodology used to create national migration projections by Azose and Raftery (2015) and Azose et al. (2016) to the subnational context. The input data needed to produce these migration projections are observed net migration rates (NMR), which can be constructed from net migration counts and total population data.

A Bayesian hierarchical autoregressive model is used to predict net migration rates for each subnational unit. In the hierarchical model setting, the historical data from one county impact the estimates for other counties. We did not allow the 12 least populated counties in Washington (populations below 25,000) to impact the hierarchical parameters. This approach yields estimates that are not impacted by the volatility of migration data in small counties. However, the framework still yields county-specific parameters for all counties, regardless of their size. These parameters, which represent the long-term average of NMR and the rate of convergence to the long-term average for each county, are in turn used to create a set of future trajectories of net migration rates. This approach is implemented in the bayesMig R package (Azose et al. 2022).
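
To make the roles of the county-specific parameters concrete, the sketch below simulates net migration rate trajectories from a first-order autoregressive process that reverts to a long-term average. It is an illustration only and does not reproduce bayesMig: the function names are hypothetical, and the posterior draws of the long-term average (mu), the autoregressive coefficient (phi), and the error standard deviation (sigma) are assumed to be supplied by a model fit that is not shown.

```python
import random


def net_migration_rate(net_migrants, population):
    """Net migration rate from a net migration count and a total population."""
    return net_migrants / population


def simulate_nmr(last_rate, posterior_draws, n_years, seed=0):
    """One trajectory per posterior draw (mu, phi, sigma) for a single county."""
    rng = random.Random(seed)
    trajectories = []
    for mu, phi, sigma in posterior_draws:
        rate, path = last_rate, []
        for _ in range(n_years):
            # Move toward the long-term average mu at a speed governed by phi.
            rate = mu + phi * (rate - mu) + rng.gauss(0.0, sigma)
            path.append(rate)
        trajectories.append(path)
    return trajectories


# Hypothetical inputs: last observed rate and three posterior draws.
last = net_migration_rate(net_migrants=900, population=50000)
draws = [(0.005, 0.6, 0.004), (0.006, 0.5, 0.005), (0.004, 0.7, 0.003)]
nmr_trajectories = simulate_nmr(last, draws, n_years=30)
```

Using a different posterior draw for each trajectory carries parameter uncertainty, as well as year-to-year variation, into the projected rates.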

While TFR and e0 are projected for five-year time intervals, for migration we found that applying the model to annual migration data yielded more accurate predictions. Thus, we first projected migration in one-year time intervals, and then aggregated the results into five-year intervals. However, if annual migration data are not available, the migration model can be applied directly to five-year data, in which case no aggregation is necessary. The yellow boxes in Figure 1 depict the procedure.
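
The aggregation step can be sketched as below, under the assumption that the annual values are additive, as net migration counts are; for rates, summing annual values is only an approximation. The function name is hypothetical.

```python
def aggregate_to_five_year(annual_values, width=5):
    """Sum consecutive blocks of `width` annual values within one trajectory."""
    if len(annual_values) % width != 0:
        raise ValueError("number of years must be a multiple of the interval width")
    return [
        sum(annual_values[i:i + width])
        for i in range(0, len(annual_values), width)
    ]


# Hypothetical annual net migration counts for one trajectory over ten years.
annual = [210, 250, 190, 230, 240, 220, 200, 260, 180, 210]
five_year = aggregate_to_five_year(annual)
```

The aggregation is applied within each trajectory, so that the five-year values retain the year-to-year dependence simulated by the annual model.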

Accounting for the College Population

We found that in some counties it was necessary to account specifically for a significant subpopulation with unique population dynamics: in this case, the college student population. The population structure and growth patterns in counties where college students form a substantial proportion of the population are often highly skewed, with disproportionate numbers of individuals in the 15–19 and 20–24 five-year age groups. Moreover, these distinctive population structures are typically stable over time, as college population sizes are regulated by institutional enrollment policies and other factors, including physical capacity and zoning laws. In other words, it is expected that the age distribution of the student population in these “college counties” will remain relatively stable over time. Thus, this subpopulation needs to be separately accounted for in our projection model. This can be done by holding the size of the college population approximately constant over the course of the projection period. For these counties we estimated the subpopulation of college students aged 15–19 and 20–24 by age and sex. This population is excluded from the cohort-component method in the population projection process, but added back into the population forecasts.
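
The hold-out logic can be sketched as follows. The function names and data layout are hypothetical, and the projection step is a placeholder standing in for one cohort-component step; the point is only that the college counts are removed before projection and restored unchanged afterward.

```python
def project_with_college_held_out(population, college, project_step):
    """Project the noncollege population, then add back a constant college population.

    population, college: dicts mapping (sex, age_group) to counts.
    project_step: function taking and returning such a dict (placeholder here).
    """
    noncollege = {k: population[k] - college.get(k, 0) for k in population}
    projected = project_step(noncollege)
    return {k: projected[k] + college.get(k, 0) for k in projected}


def toy_step(pop):
    """Placeholder projection step (hypothetical): uniform 2% growth."""
    return {k: v * 1.02 for k, v in pop.items()}


# Hypothetical counts for a college county.
population = {("F", "15-19"): 3000, ("F", "20-24"): 6000, ("F", "25-29"): 1800}
college = {("F", "15-19"): 1500, ("F", "20-24"): 4200}
result = project_with_college_held_out(population, college, toy_step)
```

Because the college counts are added back to the same age groups, the characteristic peak at ages 15–24 persists instead of being aged forward into older groups.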

College students tend to have low levels of fertility relative to their peers not enrolled in college (OFM 2018). To ensure that our fertility projections are not artificially inflated by the presence of women of reproductive age enrolled in college with low levels of fertility, the historical fertility data were adjusted to reflect the fertility schedule of noncollege females resident in the county. We call this adjusted fertility schedule the noncollege fertility schedule and the resulting total fertility rate the noncollege TFR (see the Data section for the data sources we used for the adjustments). The noncollege TFR was used as the starting point for applying the subnational TFR method described earlier. Thus, the counties' TFR projections represent the predictive distribution of the noncollege TFR.
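
One plausible form of this adjustment is sketched below: births to college women are removed from the numerator and college women from the denominator of each age-specific fertility rate, and the rates over five-year age groups are then summed and multiplied by five. The exact adjustment used in this study may differ; the function names and all counts are hypothetical.

```python
def tfr(births, women, width=5):
    """TFR from annual births and women by five-year age group."""
    return width * sum(births[a] / women[a] for a in births)


def noncollege_tfr(births, women, college_births, college_women, width=5):
    """TFR after removing college women and their births (illustrative form)."""
    adj_births = {a: births[a] - college_births.get(a, 0) for a in births}
    adj_women = {a: women[a] - college_women.get(a, 0) for a in births}
    return tfr(adj_births, adj_women, width)


# Hypothetical annual counts for two age groups in a college county.
births = {"20-24": 100, "25-29": 200}
women = {"20-24": 4000, "25-29": 2000}
college_births = {"20-24": 10}
college_women = {"20-24": 3000}

unadjusted = tfr(births, women)
adjusted = noncollege_tfr(births, women, college_births, college_women)
```

In this toy example the rate for ages 20–24 rises from 0.025 to 0.09 once the large, low-fertility college group is removed from the denominator, which is the distortion the adjustment is meant to correct.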

The college population could be viewed as an instance of a group quarters population. There are other group quarters institutionalized populations, such as those in prisons, on military bases, or in long-term care facilities. We did not find these to have impacts on population projections as large as the college population, so we did not take account of them explicitly. However, this could be done.

Population Projections

Application of the foregoing methods yields a set of future trajectories for noncollege TFR, female and male e0, and net migration rates for every county in Washington State. These trajectories are then converted into sex- and age-specific rates (Ševčíková et al. 2016) using county-level historic data on age-specific fertility and age- and sex-specific mortality rates. To convert the NMR trajectories, we apply a Rogers‒Castro model migration schedule (Rogers and Castro 1981). The cohort-component method is applied to each converted trajectory. For trajectories in counties where there is a substantial college population, the college subpopulation is excluded, with the subpopulation added back after the projection. This yields a set of sex- and age-specific population trajectories over future five-year time intervals for each county, from which uncertainty measures can be derived, such as the 80th and 95th percentiles. This part of the framework is depicted in Figure 1 by the blue box.
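
The conversion of a net migration total into age-specific counts, and its use in one projection step, can be sketched as follows. The sketch uses the standard seven-parameter Rogers‒Castro form (without the retirement peak) and a heavily simplified, female-only cohort-component step. All parameter values and function names are hypothetical and are not those used in this study.

```python
import math


def rogers_castro(age, a1=0.02, alpha1=0.1, a2=0.06, alpha2=0.1,
                  mu2=20.0, lambda2=0.4, c=0.003):
    """Seven-parameter Rogers-Castro schedule (hypothetical parameter values)."""
    child = a1 * math.exp(-alpha1 * age)
    labor = a2 * math.exp(-alpha2 * (age - mu2) - math.exp(-lambda2 * (age - mu2)))
    return child + labor + c


def migration_age_shares(age_starts, width=5):
    """Normalize the schedule at age-group midpoints so the shares sum to 1."""
    raw = [rogers_castro(a + width / 2) for a in age_starts]
    total = sum(raw)
    return [r / total for r in raw]


def cohort_component_step(pop, survival, asfr, net_migrants, shares,
                          width=5, frac_female=0.488):
    """One five-year step for a female population by five-year age group."""
    n = len(pop)
    new = [0.0] * n
    # Age each cohort forward one group; the last group is open-ended.
    for i in range(n - 1):
        new[i + 1] += pop[i] * survival[i]
    new[n - 1] += pop[n - 1] * survival[n - 1]
    # Female births over the interval, using start-of-period women and
    # ignoring infant mortality (simplifications).
    new[0] = width * frac_female * sum(f * p for f, p in zip(asfr, pop))
    # Distribute the net migrants across age groups.
    return [x + net_migrants * s for x, s in zip(new, shares)]


shares = migration_age_shares(list(range(0, 85, 5)))
```

In the full method this step is applied to every trajectory, with the fertility, mortality, and migration inputs for that trajectory, so that the resulting population trajectories reflect uncertainty in all three components.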

Data

The data for evaluating our proposed method come from Washington State, which is located in the Pacific Northwest region of the United States. Approximately 7.8 million people, or roughly 2.3% of the total U.S. population, reside in Washington (OFM 2022a; OFM 2022b; U.S. Census Bureau 2021). There are 39 counties in the state, ranging in total population from 2.3 million (King County, where Seattle is located) to 2,300 (rural Garfield County) (OFM 2022a). In recent years, Washington State has experienced large population growth as a result of sustained in-migration. However, this growth has been unevenly distributed across the state. Many urban areas in the western part of the state have seen rapid population growth, while rural areas in the eastern part of the state have had slow growth or even population decline (OFM 2022a).

For county-level input data (shown in Figure 1 by the shaded parallelograms), we use publicly available data from the Washington State Department of Health (DOH) and the OFM. Specifically, we use DOH's county-level raw vital statistics provided in their Births and Deaths data dashboards (DOH 2021a; DOH 2021b) and OFM's county-level population estimates with age and sex details (OFM 2020). For comparison and validation purposes, we use OFM's county-level population projections with age and sex details from 2012 and 2017 (OFM 2012; OFM 2017b).

To adjust TFR to represent noncollege TFR, as well as to compile the college data set, we use American Community Survey (ACS) data and publicly available institutional enrollment counts from the major public and private colleges and universities in Washington State. We use state-level estimates from the 2015‒2019 five-year ACS and the 2015‒2019 one-year ACS Public Use Microdata Sample data (U.S. Census Bureau 2020a, 2020b) to estimate the following quantities for each county with a sizable college population: the size of the college population aged 15‒19 and 20‒24, the number of mothers among the undergraduate population aged 15‒19 and 20‒24, and the number of their dependent children, by age and sex. See online appendix Table A1 for a list of counties and institutions where we made these adjustments.

Results

We now present our projections from 2020 to 2050 and compare them to the OFM projections produced in 2017 (OFM 2017b). We begin by showing results for total population counts, then net migration counts, followed by the projection of births and an example of age-specific results. For population and migration, we first show the outcomes for Washington State, derived as an aggregation of the county-level projections results. We then illustrate variations in our results across three selected counties—King, Whitman, and Ferry—after which we show projections for all counties across the entire state.

Projected Population Totals

Figure 2 presents the projected total population of Washington State and the selected counties. The OFM deterministic projections are shown in blue and include the medium projection (solid line) as well as the low and high series (dot-dashed lines). Our probabilistic projections are presented in red, with 80% and 95% prediction intervals surrounding the median projection (solid red line). The intervals, shown as dashed and dotted red lines, respectively, represent the ranges within which the population totals are predicted to fall with 80% and 95% probability. For Washington State, shown in the top left panel, our probabilistic projections yield much narrower 80% and 95% prediction intervals than the bounds from the OFM projections. This is due partly to an implicit correlation of the aggregated low and high OFM variants. They represent trajectories that would occur if the low or high variant were experienced by all counties in all time periods. In the probabilistic projections, where the aggregation happens over a set of trajectories, the prediction intervals show that such situations are considered to be unlikely. It is more likely that counties over time will experience population that falls somewhere between their respective bounds, resulting in much narrower intervals when aggregated to the statewide projection.
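
The effect of aggregating bounds rather than trajectories can be illustrated with a small simulation. The sketch below assumes, purely for illustration, 39 independent counties with identical normal predictive distributions; real counties are correlated, which would reduce but not remove the difference. All names and values are hypothetical.

```python
import random


def upper_quantile(values, p=0.975):
    """Simple empirical quantile: an order statistic of the sorted sample."""
    ordered = sorted(values)
    return ordered[int(p * (len(ordered) - 1))]


rng = random.Random(42)
n_counties, n_traj = 39, 2000

# Hypothetical county populations (thousands): mean 100, standard deviation 10.
county_samples = [
    [rng.gauss(100, 10) for _ in range(n_traj)] for _ in range(n_counties)
]

# Scenario-style aggregation: add up every county's own upper bound.
sum_of_upper_bounds = sum(upper_quantile(s) for s in county_samples)

# Trajectory-based aggregation: add counties within each trajectory,
# then take the quantile of the statewide totals.
state_totals = [sum(s[j] for s in county_samples) for j in range(n_traj)]
upper_bound_of_sum = upper_quantile(state_totals)
```

Under these assumptions the sum of the county upper bounds lies far above the upper bound of the summed trajectories, because it is very unlikely that all counties reach their high values in the same trajectory.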

Overall, the total population in Washington State is expected to continue to grow over the course of the projection period. The probabilistic projections predict more growth than the deterministic OFM projections. This may partly reflect the fact that the probabilistic model incorporates observed population data through 2019, while the OFM 2017 projections use observed data through approximately 2017 (OFM 2017a). The period from 2017 to 2019 was one of unusually high in-migration, which was not fully captured in OFM's net migration projections.

In the top right panel of Figure 2 are the results for King County, which is the most populous county in Washington, with almost 2.3 million residents (OFM 2022a). It has experienced rapid population growth in recent years from continued positive net migration driven by the thriving technology and related industries in the Seattle metropolitan area (Felt 2017; Puget Sound Regional Council 2020; OFM 2022a). However, these high levels of positive net migration are not expected to be sustained. Instead, migration is expected to peak and eventually return to a lower, long-term average level (OFM 2018; OFM 2022a). Indeed, all sets of projections for King County forecast slower growth over time. However, as indicated by the growing width of the dotted 95% probabilistic prediction interval over the projection period, the future population size of King County is very uncertain, with possibilities of rapid, increasing growth as well as sluggish growth. It is interesting to note that the dashed 80% prediction interval tracks with the low and high deterministic projections produced by OFM, especially toward the end of the projection period.

The bottom left panel of Figure 2 shows the results for Whitman County, which is located in the eastern part of the state and has a population of approximately 50,000 (OFM 2022a). The age structure in Whitman County is dominated by the presence of a large, college-age population. As described earlier, we adjusted our models for counties like Whitman to account for these conditions. For all sets of projections, the total population in Whitman County is expected to increase slowly over the course of the projection period. Our forecasts suggest that, while there is some uncertainty about future population growth, the population will approach 60,000 by 2050, with a 95% prediction interval of 53,000 to 68,000. In contrast, the OFM deterministic bounds are much wider.

Finally, Ferry County is shown in the lower right panel of Figure 2. It is a landlocked, largely mountainous county in the northeastern part of the state. One of the least densely populated counties in Washington State, Ferry County had just over 7,200 inhabitants in 2021 (OFM 2022a). Given its small population, the uncertainty around our projections is quite large, especially further out on the time horizon. Our median projection indicates a slight upward growth trajectory toward the end of the projection period, while OFM's medium trajectory trends slightly downward. Moreover, our 80% prediction intervals are narrower than OFM's upper and lower bounds. While the upper bound of the 95% interval exceeds OFM's high-trajectory estimates as the projections approach 2050, the lower bound of the 95% interval follows OFM's low-trajectory estimates closely.

Figure 3 shows the predicted total population for all 39 counties in Washington State. The counties are arranged to approximate their geographic location in the state. For most, the probabilistic and deterministic forecasts suggest similar trajectories of population growth. However, the deterministic OFM projections have far wider ranges in projected population sizes than do our forecasts. Our narrower prediction intervals may offer planners and policymakers greater precision in their planning process, especially in the near term. In a few cases the probabilistic intervals are wider, which can be attributed to volatility in past migration and thus significant uncertainty in migration forecasts, often seen in less populated counties.

Projected Net Migration

Net migration has been the primary driver of population growth in Washington State in recent decades. Indeed, approximately 70% of the recent growth in Washington can be attributed to positive net migration, with the remaining 30% arising from natural increase (OFM 2022b).

While migration is expected to remain an important contributor to population growth for the state, the level of in-migration is expected to decline (OFM 2018). However, it is unclear how quickly the migration rate might decline, making deterministic projections of migration levels in Washington State difficult to produce and assess. Nevertheless, the effort is worthwhile, given that estimating net migration rates and integrating them into probabilistic forecasts has not previously been done at the subnational level. This is particularly challenging at the county level in the U.S. context, since migration flows to and within U.S. states are not evenly distributed. In Washington, most migration in-flows are concentrated in the five largest counties (OFM 2022a). This makes probabilistic county-level migration projections appealing, as they provide county-specific measures of uncertainty.

As discussed in the Methods section, our migration model yields a set of trajectories of future NMR. These are then passed to the cohort-component method, where they are converted to net migration counts. Figure 4 displays results of such projected counts for the state overall and for King, Whitman, and Ferry Counties. In reviewing the results, it should be noted that the OFM migration estimates rely on observed data through 2017, while our forecasts rely on observed data through 2019. The two-year period between 2017 and 2019 was a period of high in-migration, which is reflected in our models but not OFM's.
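The conversion from rates to counts can be illustrated with the demographic balancing equation applied to total population. This is a deliberately simplified sketch, not the age- and sex-specific cohort-component calculation used in our projections: it assumes the NMR is expressed per person per year and is applied to the population at the start of a five-year period, and the function name and all input values are hypothetical.

```python
def project_population(start_pop, births, deaths, nmr_per_year, years=5):
    """One projection step on totals: P(t+n) = P(t) + B - D + M.

    Assumption: net migrants M are approximated as
    (annual net migration rate) x (period length) x (starting population).
    """
    net_migrants = nmr_per_year * years * start_pop
    end_pop = start_pop + births - deaths + net_migrants
    return net_migrants, end_pop


# Hypothetical county: 10,000 residents, 500 births and 450 deaths per period.
start_pop = 10_000
births, deaths = 500, 450

# Hypothetical NMR trajectories (per person per year) for one five-year period.
nmr_trajectories = [-0.004, 0.0, 0.006, 0.012]

# Each rate trajectory yields its own migration count and end-of-period population.
results = [
    project_population(start_pop, births, deaths, rate)
    for rate in nmr_trajectories
]

for rate, (migrants, pop) in zip(nmr_trajectories, results):
    print(f"NMR {rate:+.3f}: net migrants {migrants:+.0f}, population {pop:.0f}")
```

Because every trajectory is carried through separately, a negative rate in one trajectory produces a negative count and a smaller end population in that trajectory only, which is how uncertainty in the NMR propagates to the projected counts.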

In the top left panel of Figure 4, all sets of forecasts show that net migration is projected to remain positive for Washington State between 2020 and 2050. OFM does not yet provide upper and lower bound estimates for migration, providing little guidance for planners, whereas our probabilistic projections do provide an assessment of the uncertainty around future migration. Our 80% and 95% prediction intervals are wide, indicating considerable uncertainty about future net migration in Washington State over the projection period. The 95% prediction interval suggests that by 2050, the total net migration per five-year period may range from about 250,000 to 730,000, with a median projection of around 450,000.
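The median and the prediction intervals reported here are summaries of a set of simulated trajectories. A minimal sketch of such a summary follows, assuming equal-tailed intervals taken as empirical quantiles of the trajectory values for a single period; the helper names and the synthetic trajectory values are hypothetical and are not output from our models.

```python
import random


def quantile(sorted_vals, p):
    """Empirical quantile of an ascending list, using linear interpolation."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize(trajectory_values, levels=(0.80, 0.95)):
    """Return the median and equal-tailed prediction intervals."""
    vals = sorted(trajectory_values)
    summary = {"median": quantile(vals, 0.5)}
    for level in levels:
        tail = (1 - level) / 2  # probability left in each tail
        summary[level] = (quantile(vals, tail), quantile(vals, 1 - tail))
    return summary


# Synthetic values standing in for one period's simulated net migration counts.
rng = random.Random(0)
toy_trajectories = [rng.gauss(100.0, 30.0) for _ in range(2000)]

summary = summarize(toy_trajectories)
print("median:", round(summary["median"], 1))
print("80% interval:", tuple(round(x, 1) for x in summary[0.80]))
print("95% interval:", tuple(round(x, 1) for x in summary[0.95]))
```

By construction the 95% interval contains the 80% interval, which in turn contains the median.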

In King County (top right panel), the probabilistic median forecast is that net migration would begin tapering off, stabilizing toward a median level of around 90,000 to 100,000 net migrants per five-year period. However, there is considerable uncertainty around this projection, as indicated by the wide 80% and 95% prediction intervals. Notably, the prediction intervals also suggest the possibility of observing negative net migration.

For both Whitman and Ferry Counties (bottom panels), our probabilistic projections suggest that net migration will fluctuate around a long-term average level, which is estimated from each county's historical levels of net migration. In Whitman County, net migration of about 1,200 is expected per five-year period over the projection interval, while net migration of around 300 per five-year period is expected in Ferry County. As the 95% prediction intervals indicate, there is a possibility of negative net migration in these counties. Although there is considerable uncertainty about projected net migration per five-year period, the uncertainty ranges offer additional, quantitative information about how levels of migration might change and affect population growth.

In Figure 5 we illustrate the predicted number of net migrants for all counties in Washington State (arranged in their approximate geographic location). Both the probabilistic and deterministic approaches predict net migration per five-year period to be positive for most counties. All probabilistic projections of net migration involve considerable uncertainty. The state's deterministic approach offers no insight about upper and lower bound estimates, but it is notable that the median probabilistic and medium deterministic projections are similar in magnitude for most counties in Washington.

Births and Age-Specific Projections

Figure 6 presents the projected number of births for all 39 counties in the state. The projected number of births increases for most counties, albeit with considerable uncertainty. This arises from the uncertainty around future fertility rates and numbers of net migrants, as many migrants are of childbearing age.

Our probabilistic population projections include age and sex details with uncertainty bounds for each subgroup. Figure 7 illustrates an example of a probabilistic population pyramid for King County, showing that the population is expected to grow in all age groups from 2020 to 2050, with much of this growth concentrated in the older age groups. However, the younger population is also expected to grow in size. Note that the uncertainty regarding the population in the older age groups is smaller than that regarding the younger ones. This stems from the fact that older individuals have already completed their fertility and most of their migration by 2020 and so are merely being projected forward, whereas for younger individuals, the projected numbers can be highly influenced by changing conditions in fertility and net migration.

In addition to age and sex details for total population, our probabilistic projections also include other indicators, such as sex- and age-specific counts of deaths, mortality rates, and any other population quantities of interest that are functions of age and sex.

Out-of-Sample Validation

To validate our methodology, we conducted several out-of-sample validation exercises. Because migration is in some ways the most important component of the population projection, we validated our methodology for both migration and total population. For comparison, we used the OFM projections published in 2012 (OFM 2012), in which the projection starts in the time period 2015‒2020. However, since OFM used observed data only until 2012, we also treat the time period 2010‒2015 as projected.

To produce projections with our methodology that are comparable with OFM 2012, we used annual observed data on the net migration rate from 1990 to 2012 to train the migration model and to generate annual migration rates from 2013 to 2020. We aggregated each migration trajectory to five-year time periods, as discussed earlier, resulting in two projected time periods, namely, 2010‒2015 and 2015‒2020. For TFR and e0, we reestimated the national as well as the subnational models after removing post-2010 data and produced projections for 2010‒2015 and 2015‒2020. Using the subnational projections of TFR, e0, and migration for these two time periods, we generated population projections for the years 2015 and 2020. We then compared our projections and the OFM 2012 projections of net migration counts and total population in each county to the actual observed values in these two time periods. The low and high variants of the OFM population projection were also included in the assessment of the predictive performance.

Table 1 presents the results of this validation. It includes the bias and the mean absolute error (MAE), as well as the half-width and the coverage of the 95% prediction interval. For MAE and bias, which compare the projection medians to the actual values, the smaller the better. In both cases, for migration as well as population, the probabilistic approach yields better results on these measures than the OFM projections.

Because the OFM migration projections are deterministic, one cannot examine the half-width and the coverage for them. The coverage of a prediction interval is defined as the proportion of the time that the truth lies in the interval, and an ideal method would match the corresponding nominal percentage to within sampling error. We found that the observed migration counts fell into our 95% prediction interval in 92.3% of the cases, and into our 80% prediction interval (not shown) in 82.1% of the cases. The actual observed population counts fell into our 95% and 80% prediction intervals in 93.6% and 85.9%, respectively, of the cases. Note that the low and high variants of the OFM population projections are not associated with any statistical interpretation and so we would not expect them to match any particular percentages. However, it can be seen from the half-widths and the coverage measures in the table that the spread of the OFM variants is much wider than that of the probabilistic projections. The coverage results suggest that the probabilistic projections are reasonably well calibrated. Figure A1 in the online appendix shows the actual projections from the out-of-sample simulation for four selected counties.
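The four validation measures can be written down compactly. The following is a minimal sketch under stated assumptions: bias is taken as the mean of projected median minus observed value (the sign convention is an assumption), the half-width is averaged over cases, and the function name and toy numbers are hypothetical rather than values from Table 1.

```python
def validation_metrics(medians, lowers, uppers, observed):
    """Bias, MAE, mean interval half-width, and empirical coverage.

    Each argument is a list with one entry per case (e.g., county-period).
    """
    n = len(observed)
    errors = [m - y for m, y in zip(medians, observed)]
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    half_width = sum((u - l) / 2 for l, u in zip(lowers, uppers)) / n
    # Coverage: share of cases whose observed value lies inside the interval.
    covered = sum(1 for l, u, y in zip(lowers, uppers, observed) if l <= y <= u)
    return {
        "bias": bias,
        "mae": mae,
        "half_width": half_width,
        "coverage": covered / n,
    }


# Toy example with four cases (hypothetical numbers).
metrics = validation_metrics(
    medians=[100, 200, 300, 400],
    lowers=[80, 150, 280, 300],
    uppers=[120, 250, 320, 500],
    observed=[110, 190, 330, 400],
)
print(metrics)
```

In the toy example three of the four observed values fall inside their intervals, giving a coverage of 0.75. If each of the 39 counties contributes one case per projected period, there are 78 cases, and a coverage of 92.3% would correspond to 72 of 78; this count is an inference from the reported percentages, not a figure stated in the text above.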

Discussion

We have developed a method for probabilistic subnational projections of population, fertility, mortality, and net migration. To assess the method, we applied it to the counties of Washington State, which is fairly typical of U.S. states in terms of population, demography, and number of counties. This leads us to believe the results may be informative for potential applications to other U.S. states and subnational units.

Our proposed approach extends the national probabilistic population projection method used by the UN, which is based on Bayesian hierarchical models for the components of population change. However, a direct application of the country-level model was not appropriate, and we introduce several innovations. The fertility and mortality models were not direct extensions of the world model, but rather accounted for the high observed within-country correlations by models that scaled the country probabilistic projections in a stochastic way. Our approach adjusts this process by calibrating subnational projections to the national-level rate. Since net migration is the biggest contributor to both population change and uncertainty about future population, we incorporated uncertainty about future net migration into our method, the first time this has been done, to our knowledge. Finally, college populations have a big impact in several counties, and we developed a method for accounting for their potential outsized impact on components of population dynamics. Our approach for this special population subgroup could be extended to other special populations if their population composition suggests that they might distort components of population change.

In an out-of-sample validation study, we found that the resulting probabilistic projections were both reasonably accurate and well calibrated. In terms of point forecasts, they were broadly comparable with those by the OFM. However, our prediction intervals were in most cases much tighter than those of OFM (the latter based on expert assumptions rather than being probabilistic), while still being well calibrated. Since the OFM variants do not have a straightforward interpretation and it is not easy to quantify the significance of those differences, one should exercise some caution in comparing these sets of estimates. Migration is both difficult to forecast and a major contributor to future population and uncertainty and, as noted, our forecasts were well calibrated. Overall, we had higher forecasts of migration than OFM, both in the past and going forward.

Several counties in Washington State have very small populations, and this raised the question of whether the present method, which does not include stochastic variation in the numbers of births, deaths, and net migration, could forecast them adequately, in particular whether uncertainty would be appropriately assessed. Perhaps surprisingly, we found that our method was well calibrated even for these small counties. It would be possible to extend the method to account for this additional source of variability, but this would complicate the method considerably and our results suggest that it may not be necessary.

In our probabilistic projections, we believe that we have accounted for the most important sources of uncertainty. However, other sources of uncertainty are not explicitly accounted for by our method. Perhaps most important of these is uncertainty about current and past values of population by age and sex, fertility, mortality, and migration, which are taken as known and accurate. These other sources of uncertainty would likely be significant for countries without good vital registration systems. However, for countries with longstanding vital registration systems, like the United States, estimates of current fertility and mortality are typically of high quality, and so the associated uncertainty is very small. Estimates of current population are also reliable and accurate, as they are typically based on census data or intercensal estimates that draw on other data sources such as administrative records. Perhaps the most important source of uncertainty in the U.S. context is that concerning current and past migration, since estimates of current and recent migration to and from counties are less precise than those of fertility and mortality. Accounting for this uncertainty would be a goal for future research.

Another important limitation of our method is that, while it produces forecasts of population, births, deaths, and migration by age and sex, it does not yet produce these measures by race and ethnicity (or, in other settings, other significant social categories of difference). Such estimates are important for many policy and planning decisions when decision-makers are seeking to redress inequalities and mitigate disparities. In principle, the generation of such measures could be accomplished by disaggregating the total numbers produced for each trajectory by observed racial and ethnic distributions. Precisely how best to do so is another topic for future research.
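As a rough illustration of the disaggregation idea, the sketch below splits each trajectory's total by a fixed set of observed shares. It assumes the shares are constant over the projection period and identical across trajectories, which is a strong simplification; the function name, group labels, shares, and totals are all hypothetical.

```python
def disaggregate(trajectory_totals, shares):
    """Split each trajectory's total population by fixed group shares.

    Assumption: `shares` maps group label -> proportion, summing to 1,
    and is held constant over time and across trajectories.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return [
        {group: total * share for group, share in shares.items()}
        for total in trajectory_totals
    ]


# Hypothetical observed distribution and three simulated totals for one year.
observed_shares = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
totals = [9_500.0, 10_000.0, 10_800.0]

by_group = disaggregate(totals, observed_shares)
for total, groups in zip(totals, by_group):
    print(total, groups)
```

With fixed shares, uncertainty in each group simply mirrors uncertainty in the total. Allowing the shares themselves to vary by trajectory would be one way to represent uncertainty about future composition, and is part of what remains to be worked out.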

Acknowledgments

We thank Mike Mohrman, Erica Gardner, and Robert Kemp from the Washington State Office of Financial Management for helpful discussions about data inputs and methodology. Partial support for this research came from a Shanahan Endowment Fellowship, T32 grant HD101442-01 and grant P2C HD042828 to the Center for Studies in Demography and Ecology at the University of Washington, and grant R01 HD070936 (A. Raftery, P.I.), all from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

Note

1. We use the terms population projections and population forecasts interchangeably.

References

Alkema, L., Gerland, P., Raftery, A., & Wilmoth, J. (2015). The United Nations probabilistic population projections: An introduction to demographic forecasting with uncertainty. Foresight, 2015(37), 19–24.
Alkema, L., Raftery, A. E., Gerland, P., Clark, S. J., Pelletier, F., Buettner, T., & Heilig, G. K. (2011). Probabilistic projections of the total fertility rate for all countries. Demography, 48, 815–839.
Azose, J., Ševčíková, H., & Raftery, A. (2022). bayesMig: Bayesian projection of migration (R package version 0.2-3) [Computer software]. Available at https://github.com/PPgp/bayesMig
Azose, J. J., & Raftery, A. E. (2015). Bayesian probabilistic projection of international migration. Demography, 52, 1627–1650.
Azose, J. J., Ševčíková, H., & Raftery, A. E. (2016). Probabilistic population projections with migration uncertainty. Proceedings of the National Academy of Sciences, 113, 6460–6465.
Baker, J., Swanson, D., & Tayman, J. (2021). The accuracy of Hamilton–Perry population projections for census tracts in the United States. Population Research and Policy Review, 40, 1341–1354.
Bongaarts, J., & Bulatao, R. A. (Eds.). (2000). Beyond six billion: Forecasting the world's population. Washington, DC: National Academy Press.
Chi, G., & Voss, P. R. (2011). Small-area population forecasting: Borrowing strength across space and time. Population, Space and Place, 17, 505–520.
Felt, C. (2017). King County's changing demographics: Investigating our increasing diversity [PowerPoint slides]. King County Office of Performance, Strategy and Budget. Retrieved from https://kingcounty.gov/~/media/depts/executive/performance-strategy-budget/documents/pdf/RLSJC/2017/Feb23/KingCountyDemographics022317
Hauer, M. E. (2019). Population projections for U.S. counties by age, sex, and race controlled to shared socioeconomic pathway. Scientific Data, 6, 190005. https://doi.org/10.1038/sdata.2019.5
Inoue, T. (2017). A new method for estimating small area demographics and its application to long-term population projection. In Swanson, D. A. (Ed.), Applied demography series: Vol. 9. The frontiers of applied demography (pp. 473–489). Cham, Switzerland: Springer International Publishing.
Keyfitz, N. (1972). On future population. Journal of the American Statistical Association, 67, 347–363.
Lutz, W., & KC, S. (2010). Dimensions of global population projections: What do we know about future population trends and structures? Philosophical Transactions of the Royal Society B, 365, 2779–2791.
Preston, S. H., Heuveline, P., & Guillot, M. (2001). Demography: Measuring and modeling population processes. Oxford, UK: Blackwell Publishers.
Puget Sound Regional Council. (2020). Population change and migration (Puget Sound Trends report). Seattle, WA: Puget Sound Regional Council. Retrieved from https://www.psrc.org/media/7275
Raftery, A. E., Alkema, L., & Gerland, P. (2014). Bayesian population projections for the United Nations. Statistical Science, 29, 58–68.
Raftery, A. E., Chunn, J. L., Gerland, P., & Ševčíková, H. (2013). Bayesian probabilistic projections of life expectancy for all countries. Demography, 50, 777–801.
Raftery, A. E., Lalic, N., & Gerland, P. (2014). Joint probabilistic projection of female and male life expectancy. Demographic Research, 30, 795–822. https://doi.org/10.4054/DemRes.2014.30.27
Raftery, A. E., Li, N., Ševčíková, H., Gerland, P., & Heilig, G. K. (2012). Bayesian probabilistic population projections for all countries. Proceedings of the National Academy of Sciences, 109, 13915–13921.
Rayer, S. (2015). Demographic techniques: Small-area estimates and projections. In Wright, J. D. (Ed.), International encyclopedia of the social & behavioral sciences (2nd ed., pp. 162–169). Oxford, UK: Elsevier Science.
Rogers, A., & Castro, L. J. (1981). Model migration schedules (Report No. RR-81-030). Laxenberg, Austria: International Institute for Applied Systems Analysis.
Ševčíková, H., Alkema, L., & Raftery, A. (2011). bayesTFR: An R package for probabilistic projections of the total fertility rate. Journal of Statistical Software, 43(1), 1–29.
Ševčíková, H., Li, N., Kantorová, V., Gerland, P., & Raftery, A. E. (2016). Age-specific mortality and fertility rates for probabilistic population projections. In Schoen, R. (Ed.), The Springer series on demographic methods and population analysis: Vol. 39. Dynamic demographic analysis (pp. 285–310). Cham, Switzerland: Springer International Publishing.
Ševčíková, H., & Raftery, A. E. (2021). Probabilistic projection of subnational life expectancy. Journal of Official Statistics, 37, 591–610.
Ševčíková, H., Raftery, A., & Chunn, J. (2021). bayesLife: Bayesian projection of life expectancy (R package version 5.0-3) [Computer software]. https://CRAN.R-project.org/package=bayesLife
Ševčíková, H., Raftery, A. E., & Gerland, P. (2018). Probabilistic projection of subnational total fertility rates. Demographic Research, 38, 1843–1884. https://doi.org/10.4054/DemRes.2018.38.60
Ševčíková, H., Simonson, M., & Jensen, M. (2015). Assessing and integrating uncertainty into land-use forecasting. Journal of Transport and Land Use, 8(3), 57–70.
Shryock, H. S., Siegel, J. S., & Stockwell, E. G. (1976). The methods and materials of demography (Condensed ed.). New York, NY: Academic Press.
Smith, S. K., Tayman, J., & Swanson, D. A. (2001). State and local population projections: Methodology and analysis. New York, NY: Kluwer Academic/Plenum Publishers.
State of Washington Office of Financial Management. (2012). 2012 projections: Growth Management Act population projections for counties: 2010 to 2040 [Data set]. Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/washington-data-research/population-demographics/population-forecasts-and-projections/growth-management-act-county-projections/growth-management-act-population-projections-counties-2010-2040
State of Washington Office of Financial Management. (2017a). 2017 projections: Growth Management Act population projections for counties: 2010 to 2040 [Data set]. Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/washington-data-research/population-demographics/population-forecasts-and-projections/growth-management-act-county-projections/growth-management-act-population-projections-counties-2010-2040-0
State of Washington Office of Financial Management. (2017b). 2017 Growth Management Act county population projections (Technical documentation). Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/sites/default/files/public/dataresearch/pop/GMA/projections17/gma_2017_tech_doc.pdf
State of Washington Office of Financial Management. (2018). 2017 projections: County growth management population projections by age and sex: 2010–40 (Report). Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/sites/default/files/public/dataresearch/pop/GMA/projections17/GMA_2017_county_pop_projections.pdf
State of Washington Office of Financial Management. (2020). Components of population change [Data set]. Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/washington-data-research/population-demographics/population-estimates/components-population-change
State of Washington Office of Financial Management. (2022a). State of Washington: 2022 population trends (Report). Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/sites/default/files/public/dataresearch/pop/april1/ofm_april1_poptrends.pdf
State of Washington Office of Financial Management. (2022b, June 29). Washington tops 7.8 million residents in 2022. Olympia: State of Washington, Office of Financial Management, Forecasting & Research Division. Retrieved from https://ofm.wa.gov/about/news/2022/06/washington-tops-78-million-residents-2022
Swanson, D. A., & Beck, D. M. (1994). A new short-term county population projection method. Journal of Economic and Social Measurement, 20, 25–50.
Swanson, D. A., Schlottmann, A., & Schmidt, B. (2010). Forecasting the population of census tracts by age and sex: An example of the Hamilton–Perry method in action. Population Research and Policy Review, 29, 47–63.
Swanson, D. A., & Tayman, J. (2014). Measuring uncertainty in population forecasts: A new approach. In Marsili, M. & Capacci, G. (Eds.), Proceedings of the sixth Eurostat/UNECE work session on demographic projections (pp. 203–215). Rome, Italy: Italian National Institute of Statistics.
United Nations. (2022a). World population prospects 2022 (Report No. UN DESA/POP/2022/DC). New York, NY: United Nations, Department of Economic and Social Affairs, Population Division. Retrieved from https://population.un.org/wpp/
United Nations. (2022b). World population prospects 2022: Methodology of the United Nations population estimates and projections (Report No. UN DESA/POP/2022/DC/NO.6). New York, NY: United Nations, Department of Economic and Social Affairs, Population Division. Retrieved from https://population.un.org/wpp/Publications/Files/WPP2022_Methodology.pdf
U.S. Census Bureau. (2020a). American Community Survey table B14004: Sex by college or graduate school enrollment by type of school by age for the population 15 years and over [Data set]. Retrieved from https://data.census.gov/table?q=B14004&g=050XX00US53033,53037,53053,53063,53071,53073,53075&tid=ACSDT5Y2019.B14004
U.S. Census Bureau. (2020b). American Community Survey: Public use microdata sample (PUMS) [Data set]. Retrieved from https://www.census.gov/programs-surveys/acs/microdata.html
U.S. Census Bureau. (2021). State population totals and components of change: 2020–2021 [Data set]. Retrieved from https://www.census.gov/data/tables/time-series/demo/popest/2020s-state-total.html
Washington State Department of Health. (2021a). All births dashboard - County [Data set]. Olympia: Washington State Department of Health, Center for Health Statistics. Retrieved from https://doh.wa.gov/data-statistical-reports/washington-tracking-network-wtn/birth-outcomes/county-all-births-dashboard-0
Washington State Department of Health. (2021b). All deaths - County and state dashboards [Data set]. Olympia: Washington State Department of Health, Center for Health Statistics. Retrieved from https://doh.wa.gov/data-statistical-reports/washington-tracking-network-wtn/death/county-all-deaths-dashboard
Wilson, T., Grossman, I., Alexander, M., Rees, P., & Temple, J. (2022). Methods for small area population forecasts: State-of-the-art and research needs. Population Research and Policy Review, 41, 865–898.
This is an open access article distributed under the terms of a Creative Commons license (CC BY-NC-ND 4.0).
