Abstract
Empirical tests of migration systems theory require consistent and complete data on international migration flows. Publicly available data, however, represent an inconsistent and incomplete set of measurements obtained from a variety of national data collection systems. We overcome these obstacles by standardizing the available migration reports of sending and receiving countries in the European Union and Norway each year from 2003 to 2007 and by estimating the remaining missing flows. The resulting harmonized estimates are then used to test migration systems theory. First, locating thresholds in the size of flows over time, we identify three migration systems within the European Union and Norway. Second, examining the key determinants of flows with respect to the predictions of migration systems theory, our results highlight the importance of shared experiences of nation-state formation, geography, and accession status in the European Union. Our findings lend support to migration systems theory and demonstrate that knowledge of migration systems may improve the accuracy of migration forecasts and thereby help manage the impacts of migration as a source of social change in Europe.
Introduction
Migration systems theory (MST) situates international migration against a backdrop of the ties shared between sending and receiving countries (Kritz et al. 1992; Mabogunje 1970). It is a theoretically encompassing perspective (Massey et al. 1998), but efforts to substantiate MST empirically are few. If “international migration were perfectly measureable, migration systems might be identified by examining the matrices of in-flows, out-flows, and net-flows between all countries as they evolved through time” (Zlotnik 1992:20). Publicly available migration data, however, lack a consistent metric given diverse national data collection systems and timing criteria used to validate migrations.
Discrepancies between the migration reports of sending and receiving countries are well documented (Bilsborrow et al. 1997; Kupiszewska and Nowok 2008; Lemaitre 2005; Poulain et al. 2006). For example, Sweden and the United Kingdom (UK) each use a one-year timing criterion to validate the migrations of nationals and foreigners, but rely on different data collection systems (a population register and a passenger survey, respectively). Other countries (e.g., Romania) count only permanent moves as migrations and use separate registers for nationals and foreigners, tracking only their respective emigrations and immigrations. These sorts of problems render publicly available migration data inconsistent and the identification of migration systems tenuous (Zlotnik 1992).
Funded by Eurostat and coordinated by the Netherlands Interdisciplinary Demographic Institute, the MIgration MOdeling for Statistical Analysis (MIMOSA) project addressed these issues by standardizing the available migration reports of sending and receiving countries and estimating missing flows for countries neither collecting nor providing these data to Eurostat.1 Harmonized estimates were developed for flows among 31 countries in the European Union (EU) and European Free Trade Association (EFTA) each year from 2002 to 2007 using an optimization procedure (de Beer et al. 2010; Raymer et al. 2011).2
In the early stages of the MIMOSA project, one method of standardization considered involved ranking countries by the quality of their migration data, and calculating country-specific immigration and emigration adjustment ratios to scale the reports of countries with less reliable data (van der Erf and van der Gaag 2007). Missing flows were then estimated in a regression framework using covariate information (Raymer and Abel 2008). Although the MIMOSA project ultimately settled on an optimization procedure for standardization, the strategy for estimating missing flows remained the same (Raymer et al. 2011).
In this article, our methodological contribution is to extend the method proposed by van der Erf and van der Gaag (2007). Because relative data quality cannot be fully known a priori (Poulain et al. 2006), our approach to standardizing the available migration reports of sending and receiving countries incorporates uncertainty with respect to country rank. We then estimate the remaining missing flows using a technique new to this area of research, the k-nearest neighbor (kNN) algorithm. These two steps yield harmonized estimates of flows among EU-27 countries and Norway each year from 2003 to 2007.
Our empirical contribution uses these harmonized estimates to test MST. Broadly defined as “a group of countries that exchange relatively large numbers of migrants” (Kritz and Zlotnik 1992:2), migration systems have typically been identified through two approaches. The first attempts to locate thresholds in the size of flows, where “any submatrix whose entries remained above the threshold during five or ten years would indicate the potential existence of a system” (Zlotnik 1992:20). The second attempts to isolate the key ties shared by sending and receiving countries that influence the size of flows (Boyd 1989). With respect to the latter, the MIMOSA estimates are potentially unsuited for this task because the project relied on covariate information, which overlaps the indicators required to test MST (Fawcett 1989; Zlotnik 1992). Our use of the kNN algorithm overcomes this issue, estimating missing flows exclusively from the standardized estimates developed in the first step of the harmonization process.
International migration has captured the attention of analysts and policy makers on topics ranging from population aging and public pensions (Bongaarts 2004) to labor market pressures associated with EU expansion into Central and Eastern Europe (Bauer and Zimmermann 1999). Knowledge of migration systems can help unpack these issues by improving migration forecasts, thereby enhancing the ability to determine the impact of migration as a catalyst for social change (Bijak 2006).
Background
MST is an encompassing perspective, combining elements of neoclassical economics, the new economics of migration, world systems theory, bifurcated labor market theory, and social capital theory (Jennissen 2004). Viewed by Massey et al. (1998) as international labor markets, migration systems are characterized by the unique set of ties shared by sending and receiving countries (Bonifazi et al. 2008; Boyd 1989; Kritz et al. 1992). The first stylized account of MST detailed three such linkages (Fawcett 1989). Relational ties include historical and cultural similarities, and their implications for the integration of migrants in receiving countries. Regulatory ties include congruent migration policies rooted in shared economic and political memberships (e.g., the EU). Tangible ties include capital and trade flows, and encompass the notions of economic and relative economic advantage (Greenwood and McDowell 1991). Summarizing these linkages, Zlotnik (1992:20) argued that shared geography, “comparable levels of development . . . [and] cultural affinity” are the essential features of migration systems, further distilled by Andrienko and Guriev (2004:2) as “geography, initial conditions and legacies.”
Migration stems from push factors at origin, pull factors at destination, and “shared community” ties linking sending and receiving countries (Greenwood 1997; van Tubergen et al. 2004:705). MST highlights the third type of ties. Empirical work has demonstrated the importance of such relational linkages as shared colonial histories and common language(s) for the size of flows (Kim and Cohen 2010; Pedersen et al. 2008). Geographic isolation serves to strengthen regional and national expressions of solidarity, indicated by the positive association between country contiguity and the size of flows (Karemera et al. 2000). And tangible linkages, such as relative economic advantage, increase flows when, for example, the GDP per capita ratio of sending to receiving countries favors destinations (Greenwood and McDowell 1991; Leblang et al. 2009).
Despite the encompassing potential of MST, few efforts have substantiated its claims empirically because of problems with publicly available migration data. A product of different national data collection systems and timing criteria used to validate migrations, these data represent an inconsistent and incomplete set of measurements and are unsuited for cross-national comparison (Lemaitre 2005). As a consequence, research on MST has relied heavily on data on birthplace-specific migrant stocks (Zlotnik 1992), which are problematic because they confound mortality and naturalization with migration (Massey et al. 1998:112) and obscure the distinction between past and recent migrants (Rogers 2008). Labor force surveys are likewise inadequate, given insufficient sample sizes to capture flows between all pairs of sending and receiving countries (Nowok et al. 2006:212).
Country-specific data on immigration and emigration flows offer a more promising avenue but are not immune to problems. To illustrate, we present in Fig. 1 the migration reports of sending and receiving countries in the EU-15 in 2003, obtained from Eurostat’s New Cronos database (Kupiszewska and Nowok 2008:43–45). Of 420 possible reports, only 72 flows are reported by both sending and receiving countries, and rarely do these agree.3 The rest are reported by only one country or are missing.
The problems of inconsistent data reflect different conventions used to track and measure migration (Kupiszewska and Nowok 2008; Poulain et al. 2006). A population registration system can be compared against many alternatives: for example, separate registration systems for nationals and foreigners in Slovenia, residence permits for foreigners in Hungary, and passenger surveys in Ireland and the UK. Timing criteria also vary, ranging from none specified to permanence, with 3-, 6-, and 12-month variants in between. Ideally, migration reports should reflect a single timing criterion: for example, the 12-month long-term criterion recommended by the United Nations (1998). Finally, additional problems arise because persons have fewer incentives to deregister when migrating abroad, resulting in emigration reports that are downwardly biased (Kupiszewska and Nowok 2008). Together, these problems raise questions about the adequacy of publicly available migration data in empirical work, including on MST.
In this article, we address these issues by extending a method to harmonize data on migration flows developed by van der Erf and van der Gaag (2007). Their approach assumes that countries can be ranked by the quality of their migration data. The migration reports of countries with less reliable data are then scaled to reporting conventions of countries with more reliable data using a set of immigration and emigration adjustment ratios. However, our assessment of the available information by which to rank countries suggests that relative data quality cannot be fully known a priori. We assume that only groups of countries can be ranked by relative data quality. Within each group, we permit country rank to vary randomly over 10,000 permutations. By averaging immigration and emigration adjustment ratios across permutations, we account for uncertainty in country rank in the standardization process. We then use the kNN algorithm to estimate the missing flows without the use of covariate information. Harmonized (i.e., standardized and complete) estimates are developed for flows among EU-27 countries and Norway each year from 2003 to 2007 and are suitable for testing MST.
Harmonization of Flow Data
Methodology
Data on migration flows within Europe are publicly available from Eurostat’s New Cronos database.4 We use the reports of sending and receiving countries classified by next and previous country of residence. The harmonization method detailed herein is divided into two steps: (1) standardization of available flow reports and (2) estimation of missing flows.
Standardization of Available Flow Reports
The starting point for standardizing the available migration reports of sending and receiving countries is to organize them into two matrices—reported immigration and reported emigration—for each year. Elements of these matrices correspond to flows between pairs of sending and receiving countries. We illustrate the method developed by van der Erf and van der Gaag (2007) in Fig. 2, using a hypothetical example of flows among six countries (A, B, C, D, E, and F) in a single year. Countries A and B are assumed to have reliable data. Country C is assumed to have less reliable data than A and B, but more reliable data than D. Countries E and F provide no data. Relative data quality is reflected in the order in which countries are listed in the matrices in Step 1.
First, we identify the most reliable immigration reports. In Step 1 of Fig. 2, these are the immigration reports of Countries A and B, and are considered fixed with immigration adjustment ratios of 1.00. Second, the emigration reports of Countries A and B are adjusted in Step 2 of Fig. 2. The emigration adjustment ratio for Country A is obtained by dividing the fixed immigration report of B (20) by the corresponding emigration report of A (15), for a ratio of 20 / 15 = 1.33. This ratio for Country B is 100 / 80 = 1.25. The emigration reports of Countries A and B are then standardized using these ratios.5
We proceed iteratively in Step 3 of Fig. 2 and adjust the migration reports of Country C. The immigration adjustment ratio for Country C is obtained by dividing the sum of the two standardized flows to this point (27 and 31) by the corresponding immigration reports of C (25 and 30), for a ratio of 1.05. This ratio is used to standardize the immigration reports of Country C from D, E, and F.6 The emigration adjustment ratio for Country C is calculated by dividing the sum of the two standardized flows (125 and 30) by the corresponding emigration reports of C (120 and 30), for a ratio of 1.03. The emigration reports of Country C to D, E, and F are then standardized using this ratio.7
Following standardization in Step 4 of Fig. 2, immigration from Country E to D is reduced slightly from 25 to 24, whereas emigration from D to E increases substantially from 40 to 153. The adjustment ratios used to produce these results are 0.94 and 3.83, respectively. This method is unable to estimate the missing flows between Countries E and F because data on these flows were not reported.
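The arithmetic behind the adjustment ratios quoted in the worked example can be retraced in a few lines. The sketch below is illustrative only: the function names are ours, not part of the original method, and the inputs are limited to the figures quoted in the text (the full matrices of Fig. 2 are not reproduced here).

```python
def adjustment_ratio(benchmark_flows, reported_flows):
    """Ratio that scales a country's reports to the benchmark flows.

    Both arguments list the same flows: once as measured by the more
    reliable (or already standardized) source, once as reported by the
    country being adjusted.
    """
    return sum(benchmark_flows) / sum(reported_flows)


def standardize(report, ratio):
    """Scale a single reported flow and round to whole persons."""
    return round(report * ratio)


# Step 2: emigration ratios for A and B, benchmarked on fixed immigration reports.
ratio_em_a = adjustment_ratio([20], [15])
ratio_em_b = adjustment_ratio([100], [80])

# Step 3: ratios for C, benchmarked on flows already standardized for A and B.
ratio_im_c = adjustment_ratio([27, 31], [25, 30])
ratio_em_c = adjustment_ratio([125, 30], [120, 30])

# Step 4: emigration from D to E, scaled by the ratio of 3.83 quoted in the text.
flow_d_to_e = standardize(40, 3.83)

for name, value in [("A emigration", ratio_em_a), ("B emigration", ratio_em_b),
                    ("C immigration", ratio_im_c), ("C emigration", ratio_em_c)]:
    print(f"{name}: {value:.2f}")
print("D to E after standardization:", flow_d_to_e)
```

Rounded to two decimals, the four ratios reproduce the values 1.33, 1.25, 1.05, and 1.03 given above.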
The method developed by van der Erf and van der Gaag (2007) rests on a key assumption: that countries can be ranked by the relative quality of their migration data, which presumes sufficient a priori knowledge of data quality. Country rank is important because countries with more reliable reports are treated as the standard against which less reliable reports are benchmarked. Relative data quality, however, cannot be fully known a priori. The available taxonomies detailing the sets of conventions used to track and measure migration permit only informed guesses (Kupiszewska and Nowok 2008; Poulain et al. 2006). To overcome this problem, we extend the preceding method by treating country rank, and thus the order of countries in the initial immigration and emigration matrices (hereafter, “rank-order”), as a permutation problem.
We analyze the migration reports of EU-27 countries plus Norway each year from 2003 to 2007. Absent information on data quality, there are 28! (roughly 3 × 10^29) possible rank-orderings of these countries. Because this is computationally unmanageable, we proceed first by combining the data over the 2003–2007 period. Using the information provided by Poulain et al. (2006:222–227), we then assign each country to one of four groups, rank-ordered by the comprehensiveness of the data collection system and proximity to a 12-month timing criterion (United Nations 1998). From most to least reliable, these groups include (1) Nordic countries, (2) non-Nordic countries with reliable data, (3) non-Nordic countries with semi-reliable data, and (4) non-Nordic countries with unreliable data.8
No assumptions are made about relative data quality within each group. Instead, we generate 10,000 permutations, preserving the rank-order of the four groups but permitting the rank-order of countries within each group to vary randomly. For each permutation, we implement the standardization procedure detailed in Fig. 2 and calculate immigration and emigration adjustment ratios for each country. We then average each ratio across all permutations and apply these to standardize the migration reports of sending and receiving countries for each year, one at a time. We use the harmonic average because in theory, each permutation should produce estimates of the same flow. Average adjustment ratios should reflect this by mitigating the impact of large values and aggravating the impact of small ones. These ratios produce smoother, less fluctuating patterns over time, and therefore represent a more conservative approach.
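A minimal sketch of the permutation step follows, under stated assumptions: the group memberships and the three adjustment ratios are hypothetical placeholders, the function name is ours, and the standardization routine that would turn each rank-order into a set of ratios is omitted. The sketch shows only how the order of groups is preserved while countries are shuffled within groups, and how a harmonic mean pulls the average toward smaller ratios relative to an arithmetic mean.

```python
import random
from statistics import harmonic_mean, mean


def permute_within_groups(groups, rng):
    """Return one rank-order: group order is fixed, countries are shuffled within groups."""
    order = []
    for group in groups:
        members = list(group)
        rng.shuffle(members)
        order.extend(members)
    return order


# Hypothetical groups, listed from most to least reliable (not the paper's assignment).
groups = [["A", "B"], ["C", "D", "E"], ["F"]]
rng = random.Random(0)  # seeded so that the sketch is reproducible
orders = [permute_within_groups(groups, rng) for _ in range(1000)]

# Hypothetical adjustment ratios for one country across three permutations.
ratios = [1.0, 2.0, 4.0]
harmonic = harmonic_mean(ratios)   # 3 / (1/1 + 1/2 + 1/4)
arithmetic = mean(ratios)

print(orders[0])
print(f"harmonic mean: {harmonic:.3f}, arithmetic mean: {arithmetic:.3f}")
```

In a full implementation, each rank-order in `orders` would be passed to the standardization procedure of Fig. 2, and the resulting ratios for each country would be averaged as in the last lines.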
Estimation of Missing Flows
With the available migration reports of sending and receiving countries standardized to the conventions of Nordic countries each year from 2003 to 2007, we estimate the remaining missing flows for countries that neither collect nor provide these data to Eurostat using the kNN algorithm. This algorithm uses neighboring observations (defined herein) to impute and smooth data points. The basic steps are (1) locating observations in a defined space; (2) setting the parameter k, the number of nearest neighbors; (3) calculating the distance to each neighbor for each observation; (4) adjusting each observation using an inverse distance weighted average of neighboring observations; and (5) repeating until convergence (Cover and Hart 1967).
We implement three variants of the kNN algorithm and illustrate our approach in Fig. 3. In Step 1, we extend the example in Fig. 2 by adding two matrices of hypothetical flows, yielding three years of standardized estimates among six countries. Flows with Countries A, B, and C as either the sending or receiving country are complete at all time points. Two sets of flows—E to F, and F to E—are missing at all time points. And one set of flows, D to F, is partially complete during the period because one flow is missing at Time 2. Our approach to estimating missing flows begins by imputing flows for pairs of sending and receiving countries with partially complete data.
The matrices in Step 1 of Fig. 3 contain one partial row sum and one partial column sum. These sums represent total flows from Country D to A and B (row sum) and from Countries A and B to D (column sum). As we did in Fig. 2, we assume that Countries A and B have the most reliable data, thereby providing a consistent metric to implement the kNN algorithm. The partial row and column sums can be viewed as x and y coordinates, respectively, to locate the flows from Country D to F in two-dimensional space. Step 2 of Fig. 3 displays these coordinates.9 We then calculate the Euclidean distance between the missing flow at Time 2 and each flow at Times 1 and 3. These distances are 20.59 and 81.41, respectively; with two neighbors, the parameter k equals two.10 The inverse of each distance is then used as a weight, and the missing flow at Time 2 is estimated at 126 persons.11 The matrices in Step 3 of Fig. 3 now contain pairs of countries with either complete or missing data over the three-year period.
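The inverse distance weighting used in this step can be sketched as follows. The function names are ours, and the coordinates and flows passed to `impute_flow` are hypothetical values chosen to keep the arithmetic simple; only the two distances (20.59 and 81.41) come from the text. The sketch assumes all distances are strictly positive.

```python
import math


def inverse_distance_weights(distances):
    """Normalize inverse distances so that the weights sum to one."""
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


def impute_flow(target_xy, neighbors):
    """Estimate a missing flow from (coordinates, flow) pairs of its neighbors."""
    distances = [math.dist(target_xy, xy) for xy, _ in neighbors]
    weights = inverse_distance_weights(distances)
    return sum(w * flow for w, (_, flow) in zip(weights, neighbors))


# Distances reported in the text for the flows at Times 1 and 3.
weights = inverse_distance_weights([20.59, 81.41])
print([round(w, 2) for w in weights])

# Hypothetical neighbors: ((partial row sum, partial column sum), flow).
estimate = impute_flow((0, 0), [((3, 4), 100), ((6, 8), 200)])
print(round(estimate, 1))
```

With the reported distances, the nearer flow (Time 1) receives roughly 80 % of the weight and the more distant flow (Time 3) roughly 20 %.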
Because this approach is potentially sensitive to period fluctuations, we smooth flows for pairs of countries with complete data over the period (not shown). This step is similar to the last; however, the partial row and column sums now include flows from and to Countries A, B, C, and D. As before, parameter k = 2 because each pair of countries has three years of data and thus two neighbors per flow. We then calculate the appropriate Euclidean distances and weights. Because these flows are complete over the period, the flow being smoothed is assigned a weight of 0.50. Smoothing thus retains 50 % of the original standardized estimate. The remaining 50 % is a distance weighted average of neighboring flows.
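The smoothing rule described above (half of the original estimate, half of a distance weighted average of its neighbors) can be written compactly. This is a sketch under assumptions: the function name and the example coordinates and flows are hypothetical, and neighbors are assumed to lie at a nonzero distance from the flow being smoothed.

```python
import math


def smooth_flow(own_xy, own_flow, neighbors, own_weight=0.5):
    """Blend a flow with the inverse-distance-weighted average of its neighbors."""
    inverse = [1.0 / math.dist(own_xy, xy) for xy, _ in neighbors]
    total = sum(inverse)
    neighbor_average = sum(w * flow for w, (_, flow) in zip(inverse, neighbors)) / total
    return own_weight * own_flow + (1.0 - own_weight) * neighbor_average


# Hypothetical flow of 100 with two neighboring years (k = 2).
smoothed = smooth_flow((0, 0), 100, [((3, 4), 120), ((6, 8), 90)])
print(smoothed)
```

Here the neighbor average is 110 (the nearer year counts twice as much as the farther one), so the smoothed value lies halfway between 100 and 110.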
The last step is to estimate flows between sending and receiving countries missing data at all time points. In Step 3 of Fig. 3, we again calculate partial row and column sums to include flows from and to Countries A, B, C, and D. Unlike in Step 2, however, we cannot use neighboring observations to impute missing flows because these data are missing at all time points. Instead, we use information from other pairs of countries with complete data. Treating the partial row and column sums in Step 3 as x and y coordinates, respectively, we define a plane in two-dimensional space for each pair of sending and receiving countries missing data over the period. The coordinates defining each plane are given in Step 4 of Fig. 3. Flows from Country E to F have x and y coordinates at Times 1, 2, and 3 of (397, 339), (437, 374), and (596, 510), respectively. The coordinates for flows from Country F to E are (443, 358), (489, 393), and (666, 537), respectively. The corresponding planes are thus bound by the points (397, 510), (596, 510), (596, 339), and (397, 339) for flows from Country E to F; and by (443, 537), (666, 537), (666, 358), and (443, 358) for flows from Country F to E.
We define as k-nearest neighbors those flows displayed in the matrices in Step 3 of Fig. 3 that occupy the aforementioned planes. Were one to plot these 84 flows using the partial row and column sums, one would identify eight neighbors for flows from Country E to F and four neighbors for flows from Country F to E (k = 8; k = 4).12 In Step 5 of Fig. 3, we then calculate the appropriate Euclidean distances and inverse distance weights.13 Missing flows are estimated as a distance weighted average of neighboring flows. These estimates are displayed at the bottom of Fig. 3.
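The neighbor selection just described amounts to a bounding-box test. The sketch below uses the coordinates reported for flows between Countries E and F; the candidate flows are hypothetical, the function names are ours, and treating points on the boundary as inside the plane is an assumption.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle containing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


def neighbors_in_box(box, candidates):
    """Keep the (coordinates, flow) pairs whose coordinates fall inside the box."""
    x_min, x_max, y_min, y_max = box
    return [(xy, flow) for xy, flow in candidates
            if x_min <= xy[0] <= x_max and y_min <= xy[1] <= y_max]


# Coordinates at Times 1, 2, and 3, as reported in the text.
box_e_to_f = bounding_box([(397, 339), (437, 374), (596, 510)])
box_f_to_e = bounding_box([(443, 358), (489, 393), (666, 537)])

# Hypothetical candidate flows: ((partial row sum, partial column sum), flow).
candidates = [((400, 350), 30), ((600, 400), 45), ((450, 500), 25)]
selected = neighbors_in_box(box_e_to_f, candidates)

print(box_e_to_f, box_f_to_e)
print(selected)
```

The flows retained in `selected` would then be combined with inverse distance weights to estimate the missing flow.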
The standardization and estimation procedures detailed herein yield a harmonized set of flows among EU-27 countries and Norway each year from 2003 to 2007. The estimates are consistent because they are standardized to the conventions of Nordic countries. They are complete because they rely on information from neighboring observations to estimate missing flows and for data smoothing.
Results
Average immigration and emigration adjustment ratios are displayed in Table 1. These ratios were generated from 10,000 random permutations of countries rank ordered in the initial immigration and emigration matrices, and are the first to be developed with a corresponding measure of dispersion. The immigration adjustment ratios for Nordic countries are equal to 1.00, reflecting our treatment of these migration reports as the standard at the outset. The corresponding emigration adjustment ratios are slightly greater than 1.00. The ratios for countries in the remaining three groups exhibit considerable variation. For example, the emigration adjustment ratio for Spain is 4.94, and reflects the average level of scaling required to standardize Spain’s emigration reports to the conventions of Nordic countries, Austria, Germany, and the Netherlands.
In Table 2, we assess our harmonized estimates relative to those developed by the MIMOSA project (de Beer et al. 2010; Raymer et al. 2011).14 Our total estimate of migration among countries in the EU-27 and Norway is 7.9 million persons over the 2003–2007 period. The corresponding MIMOSA estimate is 7.7 million. Means and standard deviations each year and over the period are also largely consistent. The largest discrepancy occurs in 2003, for which our mean estimate of migration (1,895) among these 28 countries is 8 % higher than the corresponding MIMOSA estimate (1,754). The congruence of our estimates and the MIMOSA estimates is due to similar immigration and emigration adjustment ratios (de Beer et al. 2010:471–473), with the two sets of estimates picking up a positive trend in the volume of migration flows over the period.
To further examine whether our estimates are reasonable, we display in Fig. 4 the emigration and immigration reports for selected sending and receiving countries, respectively, and corresponding harmonized estimates developed in this and the MIMOSA projects. For flows from Denmark to Sweden, the MIMOSA estimates are identical to the latter’s immigration reports, whereas our estimates are slightly lower given smoothing prior to estimating missing flows.
The emigration and immigration reports of Austria and Germany, respectively, lack the agreement evident among Nordic countries. Excluding 2007, our estimates fall between the reports of Austria and Germany and are lower than the MIMOSA project’s estimates. We selected this example to show that even minor discrepancies in adjustment ratios can produce notable differences in harmonized estimates. We calculated emigration and immigration adjustment ratios for Austria of 1.13 and 0.86, respectively; those from the MIMOSA project are 1.35 and 1.17, respectively (de Beer et al. 2010:473). Likewise, our emigration and immigration adjustment ratios for Germany are 0.87 and 0.71, respectively, compared with 0.71 and 0.81 in the MIMOSA project.
The bottom panels in Fig. 4 display reported and harmonized flows from Belgium to the Netherlands and from Portugal to Spain. The emigration reports of Belgium and Portugal are not collected by or provided to Eurostat. The immigration reports of the Netherlands and the two sets of harmonized estimates pick up the same positive trend in flows from Belgium. In contrast, Spain’s reported immigration from Portugal is substantially higher than the two sets of harmonized estimates. Here, our estimates in 2003 and 2004 are similar to those from the MIMOSA project but diverge thereafter. The divergence is due to the fact that the MIMOSA project imputed these flows, whereas we used emigration data covering only foreigners to calculate the adjustment ratios.
Identification of Migration Systems
Methodology
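We identify migration systems using average linkage clustering, in which the distance between two clusters is the average of all pairwise distances between their observations:

$$D_{XY} = \frac{1}{N_X N_Y} \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} d(x_i, y_j) \qquad (1)$$

where $D_{XY}$ is the average distance between clusters X and Y. The number of observations in each cluster is denoted by $N_X$ and $N_Y$, respectively, and $d(x_i, y_j)$ is the distance between observation $x_i$ in cluster X and $y_j$ in cluster Y. Eq. (1) can be expanded to accommodate k clusters.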
To illustrate this method, designate the flow from Portugal to Spain in 2003 as cluster X, which in Fig. 4 is xi = 8,235 persons. Cluster Y is composed of the remaining flows, yj (j = 1,2, . . ., 3,779).15 Average linkage clustering calculates the absolute distance between the flow in cluster X and each flow in cluster Y, and records the average. At each iteration, one flow from cluster Y is reallocated to X, and the average distance recorded. Clusters X and Y are identified (in our case, as migration systems) when the average distance between X and Y is maximized.
Cluster analysis lacks a likelihood-based goodness-of-fit measure for determining the optimal number of clusters; thus, a set of stopping rules is typically employed. Milligan and Cooper (1985) examined 30 such rules and found via Monte Carlo simulations that the Calinski and Duda-Hart Indexes perform best in the analysis of continuous data. Large values of the Calinski Index, combined with large values of the Duda-Hart Index and corresponding small pseudo t-squared ratios, jointly determine the optimal number of clusters and are employed in our analysis (Milligan and Cooper 1985; Rabe-Hesketh and Everett 2006).
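For readers unfamiliar with these stopping rules, the Calinski Index is commonly defined as the ratio of between-cluster to within-cluster variation, each divided by its degrees of freedom. The sketch below implements that common definition for one-dimensional data; it is not drawn from the analysis reported here, the example values are hypothetical, and the function assumes at least two clusters with some within-cluster variation.

```python
def calinski_index(clusters):
    """Pseudo-F statistic: [B / (k - 1)] / [W / (n - k)] for one-dimensional clusters."""
    n = sum(len(c) for c in clusters)
    k = len(clusters)
    grand_mean = sum(sum(c) for c in clusters) / n
    means = [sum(c) / len(c) for c in clusters]
    between = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(clusters, means))
    within = sum((v - m) ** 2 for c, m in zip(clusters, means) for v in c)
    return (between / (k - 1)) / (within / (n - k))


# Two well-separated hypothetical clusters produce a large index.
well_separated = calinski_index([[1, 2, 3], [11, 12, 13]])
# The same values split badly produce a small one.
poorly_separated = calinski_index([[1, 12, 3], [11, 2, 13]])

print(well_separated, round(poorly_separated, 3))
```

Larger values indicate tighter, better-separated clusters, which is why the index is read jointly with the Duda-Hart Index when choosing the number of clusters.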
We perform average linkage clustering on a natural logarithmic transformation of our harmonized estimates after excluding flows equal to 0 (n = 20). We exclude these flows because migration systems require some volume of migration to be identified (Zlotnik 1992). Given the exploratory nature of this analysis, this fixes our attention on the 3,760 remaining flows (of 3,780) among EU-27 countries and Norway over the 2003–2007 period.16
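The sample size follows from simple counting, which the sketch below retraces alongside the transformation. The function name and the three example flows are hypothetical; the counts (28 countries, five years, 20 zero flows) are those given in the text.

```python
import math


def log_positive_flows(flows):
    """Drop zero flows, then take natural logs of the rest."""
    return [math.log(flow) for flow in flows if flow > 0]


total_flows = 28 * 27 * 5      # ordered country pairs times years
retained = total_flows - 20    # after dropping the 20 zero flows

logged = log_positive_flows([0, 1, 8235])   # hypothetical input
print(total_flows, retained, len(logged))
```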
Results
From exploratory cluster analysis, we find evidence for three migration systems among countries in the EU-27 and Norway from 2003 to 2007 after imposing two additional constraints from the literature on MST.17 First, flows between pairs of sending and receiving countries must fall within the same cluster during the five-year period (Zlotnik 1992:20). Second, countries must exchange (i.e., send and receive) flows with all or most countries sharing the same cluster to be considered part of the migration system (Zlotnik 1992:39).
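The two constraints can be expressed as filters on cluster labels. The sketch below is one possible reading, not the procedure used here: the data structure, the function names, the toy labels for four countries over two years, and the cutoff `min_share` (our stand-in for "all or most") are all assumptions.

```python
from collections import defaultdict


def stable_pairs(labels, years):
    """Constraint 1: keep (origin, destination) pairs with one cluster label in every year."""
    by_pair = defaultdict(dict)
    for (origin, dest, year), cluster in labels.items():
        by_pair[(origin, dest)][year] = cluster
    stable = {}
    for pair, by_year in by_pair.items():
        if set(by_year) == set(years) and len(set(by_year.values())) == 1:
            stable[pair] = by_year[years[0]]
    return stable


def system_members(stable, cluster, candidates, min_share=0.5):
    """Constraint 2: keep countries that send to and receive from most other candidates."""
    members = []
    for country in candidates:
        others = [o for o in candidates if o != country]
        mutual = sum(1 for o in others
                     if stable.get((country, o)) == cluster
                     and stable.get((o, country)) == cluster)
        if others and mutual / len(others) > min_share:
            members.append(country)
    return members


# Toy labels: A, B, and C exchange cluster-1 flows; D mostly does not.
years = [2003, 2004]
core = ["A", "B", "C"]
labels = {}
for year in years:
    for origin in core:
        for dest in core:
            if origin != dest:
                labels[(origin, dest, year)] = 1
        labels[(origin, "D", year)] = 2
        labels[("D", origin, year)] = 2
    labels[("D", "A", year)] = 1
labels[("A", "D", 2003)] = 1   # A to D changes cluster between years

stable = stable_pairs(labels, years)
members = system_members(stable, 1, ["A", "B", "C", "D"])
print(members)
```

In this toy case, Country D is excluded because its flows with the other three countries are either unstable over time or fall in a different cluster.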
After we impose these restrictions, the clusters identified earlier coalesce into three more or less geographically distinct migration systems: flows primarily among (1) five countries in the core of Europe (France, Germany, Italy, Spain, and the UK); (2) 14 countries located largely at the periphery (Bulgaria, Cyprus, the Czech Republic, Estonia, Greece, Hungary, Latvia, Lithuania, Luxembourg, Malta, Poland, Romania, Slovakia, and Slovenia); and (3) eight countries in the intermediate region (Austria, Belgium, Denmark, Finland, Ireland, the Netherlands, Norway, and Sweden). Of the 28 countries considered in our analysis, only Portugal could not be assigned to one of these three systems; despite exchanging large flows with Spain, migration between Portugal and France, Germany, Italy, and the UK is not sufficiently large to warrant inclusion in the core migration system.
The three migration systems, displayed in Fig. 5, are consistent with Salt’s (2001:31) conclusion that migration systems are, to some extent, “geographically discrete.” Although Massey et al. (1998:110) invoked the institutional ties shared by countries—for example, the Treaties of Rome—to argue for a single migration system (see also, Salt 1989), our evidence for the three aforementioned systems is derived purely from our harmonized estimates of migration flows and the set of criteria for identifying migration systems provided by MST.
Flows among the five countries in the core migration system are large (Castles and Miller 2003). The lower threshold with respect to the size of flows defining this system is 6,722 persons per year, which is reasonable given that countries with larger populations tend to produce larger flows in both absolute (Kim and Cohen 2010) and relative (DeWaard and Raymer 2012) terms. At the other extreme, flows among periphery countries are quite small but nonetheless consistent over the period, with an upper bound of 175 persons per year. Despite concerns about the implications of East-West migration in Europe (Bauer and Zimmermann 1999), Kaczmarczyk and Okólski (2005:4) noted that periphery migration does not necessarily “spill over the region’s boundaries, especially to the West, but to a large extent [is] contained within” them. Although these flows are small, their consistency over the period supports this idea and thus supports viewing these flows as a distinct migration system.
These systems are more or less geographically discrete, but they are not closed. Castles and Miller (2003), for example, argued that Italy and Spain are relatively recent migrant destinations for persons seeking entrance into the core of Europe through what Calavita (2003:347) terms the “back door” (see also, Zolberg 2006:22). Although our data do not permit examination of these patterns, they do provide an indirect glimpse of a related feature of migration systems: namely, step, return, and circular migration. Typically described within the framework of cumulative causation, these patterns obtain when migration “tends to sustain itself” (Massey et al. 1998:45). Informed by MST, the two restrictions imposed at the outset of our analysis effectively require that flows between sending and receiving countries in a migration system be consistent over time. To the extent that patterns of step, return, and circular migration require consistency, the countries that make up the three migration systems cited herein are likely candidates for these processes.
Despite the exploratory nature of our analysis, the three systems identified represent an important step in the evolution of MST. Identifying migration systems is a “hard task, considering the complexity of . . . economic, social and political interactions” between sending and receiving countries (Bonifazi et al. 2008:123). Restricting—or, more aptly, reorienting—one’s efforts to the harmonized data themselves represents a viable starting point for empirical assessments of MST toward resolving debates on the existence, quantity, and character of migration systems in Europe (Massey et al. 1998; Salt 1989, 2001; Zlotnik 1992).
Determinants of Migration Flows
Methodology
Several recent efforts have attempted to identify the key determinants of international migration flows (Cohen et al. 2008; Kim and Cohen 2010; Mayda 2005; Pedersen et al. 2008); however, none have connected their work to MST or used harmonized flow data. Using a gravity model approach (Greenwood 1997), we aim to identify the key ties shared by sending and receiving countries that influence the size of flows. Although we consider the relevant push and pull factors of sending and receiving countries, respectively, our focus is on the “shared community” ties that link countries (van Tubergen et al. 2004:705).
Recalling Fawcett’s (1989) trichotomy, shared relational ties include historical similarities and intersections of sending and receiving countries with respect to nation-state formation and past colonial relationships. Shared official or national language has also been widely cited as a relevant relational linkage (Kim and Cohen 2010; Mayda 2005). Regulatory ties include geographic isolation, typically measured by country contiguity or shared region (Pedersen et al. 2008), as well as economic and political memberships. Raymer et al. (2011), for example, considered new accession status to the EU as a relevant predictor of flows. Finally, tangible ties are usually taken to mean the volume of trade flows between countries (Pedersen et al. 2008). Thinking more broadly, however, these also include features of relative economic well-being and standards of living in sending and receiving countries (Greenwood and McDowell 1991).
Descriptions of the measures used to capture the above linkages are provided in the Appendix table. To take one example, consider the indicator of shared national or colonial origins. We combined two variables to express whether sending and receiving countries were ever the same country or in a colonial relationship for a period of 75 years or more, up to and including the nineteenth century, or for 25–50 years during the twentieth century, thereby capturing such historical features as the Austro-Hungarian Empire and the division of Czechoslovakia in 1993.
We also consider the following push and pull factors in sending and receiving countries, respectively: population size, percentage of the population living in urban areas, old-age dependency ratio, change in labor force participation from the prior year, and social expenditures per household head. Additionally, as is typical in gravity models, we include a measure of the geographic distance between sending and receiving countries.
Our unit of analysis is the harmonized migration flow between a pair of sending and receiving countries in a single year. Using the natural logarithm of these flows as the dependent variable, we estimate a set of generalized estimating equations (GEE), lagging all time-varying measures by one year. Recent work on gravity models of migration flows has increasingly made use of GEE, given their ability to treat both clustering and autocorrelation (Kim and Cohen 2010; Pedersen et al. 2008). With respect to the latter, GEE provide explicit treatment of the within-panel correlation structure but do not impose predetermined assumptions about the origins of these dependencies (Hardin and Hilbe 2003; Liang and Zeger 1986).
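The data preparation implied by this setup can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the `send_`/`recv_` prefixes are assumptions made for the example. Each flow in year t is paired with sending- and receiving-country covariates from year t − 1, and the flow is logged; pair-years with no lagged covariates or with a zero flow (for which the logarithm is undefined) are skipped.

```python
import math


def build_lagged_rows(flows, covariates):
    """Pair each flow in year t with covariates measured in year t - 1.

    flows:      {(origin, destination, year): flow}
    covariates: {(country, year): {name: value}}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    rows = []
    for (origin, dest, year), flow in sorted(flows.items()):
        push = covariates.get((origin, year - 1))  # sending-country factors
        pull = covariates.get((dest, year - 1))    # receiving-country factors
        if push is None or pull is None or flow <= 0:
            continue  # no lagged data, or log undefined for a zero flow
        row = {"origin": origin, "dest": dest, "year": year,
               "log_flow": math.log(flow)}
        row.update({f"send_{name}": value for name, value in push.items()})
        row.update({f"recv_{name}": value for name, value in pull.items()})
        rows.append(row)
    return rows
```

The resulting rows, grouped by country pair, are the kind of panel that a GEE routine would take as input.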
Because GEE are not based on maximum likelihood theory, conventional indices of model fit do not apply (Cui 2007; Pan 2001). We therefore use the quasi-likelihood information criterion (QIC) and the marginal R² statistic (Hardin and Hilbe 2003; Zheng 2000). As with the Bayesian information criterion (BIC), smaller QIC values indicate better model fit relative to nested models. Use of the marginal R² is intended to overcome the problem raised by Cui (2007): namely, that there exist no rules for model selection like those suggested by Raftery (1995) using the BIC. The marginal R² can be interpreted like the conventional R².18
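As an illustration, the marginal R² is commonly written as one minus the ratio of the residual sum of squares to the total sum of squares, pooled over all pair-years. The sketch below assumes that form; the function name is illustrative, and the inputs would be the observed and fitted values of the logged flows.

```python
def marginal_r2(observed, fitted):
    """1 - SS_residual / SS_total, pooled over all observations."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    return 1.0 - ss_residual / ss_total
```

A model that reproduces the observed values exactly returns 1, and a model that predicts only the overall mean returns 0, which is why the statistic reads like the conventional R².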
Results
GEE results are presented in Table 3. Model 1 includes only the relevant push and pull factors associated with sending and receiving countries, respectively, and serves as the baseline model. The year coefficients show a steady increase in the size of flows over the period, from 0.7 % in 2004 to 10.8 % in 2007, relative to 2003.19 It is worth remembering, however, that the EU expanded considerably in 2004 from 15 to 25 countries. The lack of a significant difference between the size of flows in 2003 and 2004 may therefore reflect this period of adjustment.
Hallmarks of gravity models are measures of the distance between sending and receiving countries and the population size of each. Because distance acts as a proxy for the monetary and psychic costs of migration (Greenwood 1997), it is not surprising that flows decline at a rate of 5.8 % when the distance separating potential migrants from their desired destinations increases by 10 %.20 In contrast, population size in both sending and receiving countries promotes migration, as does the percentage of the population living in urban areas. Neumayer (2005) envisioned these as simultaneous processes, whereby potential migrants gravitate to more populated urban areas, where migrant networks are relatively dense and information is more accessible. As forms of social capital, these networks ultimately increase the likelihood of migration (Massey et al. 1998).
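Percentage effects of this kind follow from standard conversions for a model with a logged dependent variable. The sketch below assumes those conversions: an indicator with coefficient β changes flows by 100 × (exp(β) − 1) %, and a logged covariate with coefficient β changes flows by 100 × ((1 + p)^β − 1) % when the covariate rises by a proportion p. The function names are illustrative, and the distance elasticity is back-calculated from the reported 5.8 % decline rather than read from Table 3.

```python
import math


def pct_change_dummy(beta):
    """Percentage change in flows when an indicator switches from 0 to 1."""
    return 100.0 * (math.exp(beta) - 1.0)


def pct_change_loglog(beta, pct_increase):
    """Percentage change in flows when a logged covariate rises by pct_increase %."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** beta - 1.0)


# Elasticity implied by a 5.8 % decline in flows for a 10 % increase in distance.
implied_distance_elasticity = math.log(1.0 - 0.058) / math.log(1.10)
```

Under these assumptions the implied distance elasticity is roughly −0.63.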
Beyond demographic factors, economic conditions in sending and receiving countries are also important predictors of flows (Todaro 1976). Growth in labor force participation relative to the prior year signals economic growth but is associated with only small increases in the size of flows: less than one-tenth of 1 % for a 10 % change in the former. In a seemingly anomalous finding, the rate of total unemployment in sending countries deters flows. Andrienko and Guriev (2004) explained this in the context of internal migration among Russian territories by suggesting that migration requires some minimal level of human and financial capital. In our case, to the extent that EU expansions into central and eastern Europe in 2004 and 2007 enlarged the pool of potential migrants too economically vulnerable to realize their migration intentions, the negative association in Model 1 is theoretically plausible. A similar negative association is found for the rate of total unemployment in receiving countries, which is expected and aligns with prior research (Mayda 2005; Pedersen et al. 2008). Finally, to the extent that social welfare systems in sending and receiving countries attenuate the aforementioned dynamics, their contribution is largely offsetting, repelling and attracting flows by rates of −1.6 % and 2.6 %, respectively, when social expenditures are increased by 10 % (Svaton and Warin 2008).
In Model 2, we examine the ties shared by sending and receiving countries as informed by MST. In her analysis of publicly available immigration reports for 14 Organisation for Economic Co-operation and Development (OECD) countries from 1980–1995, Mayda (2005:17) concluded that “past colonial relationships do not appear to significantly affect migration.” Our findings contradict this conclusion. Shared national and colonial origins reflect continuity and overlap with respect to the historical contexts of migration (Massey et al. 1998:112). Castles and Miller (2003), for example, noted that nation-state formation is intimately connected to the institutional and legal frameworks of jus sanguinis and jus soli as criteria for citizenship. Our results show that these linkages increase flows by a factor of 2.81. Considering only the shared colonial ties of sending and receiving countries, Kim and Cohen (2010:912–915) reported similar results in their analysis of publicly available emigration and immigration reports for 13 and 17 Western countries, respectively, from 1950 to 2007.
With respect to sharing an official or national language, our results confirm Mayda’s (2005:17) observation that “common language . . . is not always statistically significant” (emphasis ours). Although she attributes this to model specification (Pedersen et al. 2008), it is an open question whether findings from prior research, both in support of and against shared language, are products of model specification or of reliance on publicly available migration reports. Unfortunately, there exists no set of harmonized flows comparable to the ones employed here with which to examine this claim.21
Moving from the relational to regulatory ties linking sending and receiving countries together, “state-to-state” relations are enhanced and even insulated by geographic isolation (Fawcett 1989:673–675). For example, the Inter-Nordic Migration Agreement ensures that persons can be registered in only one country at a time by requiring that migrants carry an inter-Nordic migration certificate (Poulain et al. 2006). These sorts of agreements ultimately shape migration in the region as a whole, not to mention the positive benefits for the accuracy of migration data across countries. Shared geographic region represents one approach to examining this dynamic. As could only be suggested by Fig. 5, flows between sending and receiving countries in the same European region are 78.8 % higher than those between countries lacking a common geography. As Fawcett (1989:674) noted, these ties catalyze similar migration policies, and thereby the broader “social acceptance” of migrants.
Despite geographic similarity, sending and receiving countries also differ with respect to their institutional affiliations. Raymer et al. (2011) suggested that accession status in the EU is a relevant predictor of migration flows. Our results confirm this observation, and suggest that flows between new and old accession countries are significantly higher than those between countries with similar accession dates. For example, flows between countries joining the EU in 2007 (i.e., Bulgaria and Romania) and member countries from the signing and later enforcement of the Maastricht Treaty in 1993 are 11.2 % higher than flows among the latter set of countries. As the EU considers further expansions to include Croatia, the former Yugoslav Republic of Macedonia, and Turkey (to name a few), these accessions are likely to fuel increased migration and anxieties about their potential labor market implications (Bauer and Zimmermann 1999).
Turning to the tangible ties shared by sending and receiving countries, neoclassical economics has made a strong case for the role of relative economic advantage in both micro- and macro-level accounts of migration (Todaro 1976). Although the coefficient for the GDP per capita ratio in Model 2 is in the expected direction, it is not statistically significant. This finding does not imply that material ties shared by sending and receiving countries are altogether unimportant; instead, in the current context, the primacy of economic factors is not substantiated.
Two measures of model fit displayed at the bottom of Table 3 show that ties shared by sending and receiving countries are relevant for the size of flows. Our efforts broadly substantiate the tenets of MST, but the fit of Model 2 improves only modestly by 4.0 % and 2.7 % as judged by the QIC and marginal R², respectively. Nonetheless, as markers of potential migration systems, consideration of these ties is of increasing importance in efforts to forecast migration given the potential for future EU expansions. Knowledge of migration systems provides a unique set of tools to gauge these trends toward managing the impact of migration as a catalyst for social change (Bijak 2006).
Discussion and Conclusion
In 2007, the European Parliament established clear definitions of emigration and immigration, and required that member countries provide these data to Eurostat (Regulation (EC) No 862 2007). These regulations, however, are indeterminate with respect to the methods used to produce estimates of flows (Fassmann 2009). Until a uniform set of conventions is in place, efforts to harmonize migration data are crucial to ensure that flows reflect a common and meaningful metric.
Because “lack of comparable data on migration flows hinders the demarcation of [a migration] system” (Zlotnik 1992:32), this article makes two contributions. First, we extended the harmonization method proposed by van der Erf and van der Gaag (2007) to account for uncertainty in rank ordering countries by data quality. The emigration and immigration adjustment ratios provided in Table 1 are the first to be accompanied by a corresponding measure of dispersion. Second, the resulting harmonized estimates were used to test MST empirically. Exploratory cluster analysis revealed three migration systems within Europe, each more or less geographically bound. Because these systems are congruent with prior research (Kaczmarczyk and Okólski 2005; Salt 2001), our contribution lies in using harmonized migration data to arrive at these findings per the expectations of MST. We also examined the key ties shared by sending and receiving countries that shape migration flows. Consistent with prior research (Kim and Cohen 2010; Raymer et al. 2011), relational and regulatory ties shared by sending and receiving countries emerged as paramount, thereby lending support to the tenets of MST (Fawcett 1989).
Empirical work on MST is a natural extension of efforts to harmonize migration flow data. This article provides an initial synthesis of these efforts and a template for future work. Subsequent efforts should consider the following potential issues with our approach. First, although we attempted to be systematic in assigning countries to one of four groups ranked by data quality, our classification is open to question. It also invites alternative approaches. One currently in progress is the Integrated Modeling of European Migration (IMEM) project, which combines expert judgments and Bayesian methods to compensate for inconsistencies and incompleteness in migration reports (Raymer et al. 2010).22 Other approaches may wish to consider expanding the set of countries to include those outside the EU-27 and Norway.
Second, because there are few references in MST about how to go about “examining the matrices of in-flows, out-flows, and net-flows” to identify migration systems (Zlotnik 1992:20), our exploratory use of cluster analysis could be revised to include a more rigorous set of theoretical expectations and methodological strategies amenable to hypothesis testing. Similarly, the use of explanatory models to identify the key ties shared by sending and receiving countries would benefit from a more theoretically explicit set of hypotheses and indicators. Among these, Kim and Cohen (2010) cited the dearth of data on the congruence of policy measures between sending and receiving countries over time (Mayda 2005).
Acknowledgments
Jack DeWaard is supported by NICHD Training Grant T32-HD07014 and Center Grant R24-HD047873 to the Center for Demography and Ecology at the University of Wisconsin–Madison. James Raymer received support from the ESRC Research Centre for Population Change (Grant Reference RES-625-28-0001). The authors acknowledge the MIgration MOdeling for Statistical Analysis (MIMOSA) project in providing harmonized flow data for comparison, and comments from Theodore P. Gerber, Katherine J. Curtis, Jenna Nobles, Mary M. Kritz, Douglas T. Gurak, Joel E. Cohen, Stewart Tolnay, and three anonymous reviewers. Previous versions of this article were presented at the annual meeting of the Population Association of America on April 15, 2010 and the Integrated Modeling of European Migration (IMEM) workshop on May 27, 2011.
Appendix
Notes
The methodology and estimates are available online (www.nidi.knaw.nl/en/projects/230211/).
We use the term harmonize to mean both standardization of available migration data and estimation of the remaining missing flows. We distinguish these and the methods associated with each throughout this article.
420 = 15 sending countries × 14 receiving countries × 2 reports per flow (i.e., sender and receiver).
Retrieved online (http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=migr_imm5prv&lang=en and http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=migr_emi3nxt&lang=en).
For emigration flows from Country A to C, D, E, and F, the standardized figures are 20 × 1.33 = 27, 175 × 1.33 = 233, 35 × 1.33 = 47, and 40 × 1.33 = 53, respectively. The standardized figures for Country B to C, D, E, and F are 25 × 1.25 = 31, 40 × 1.25 = 50, 65 × 1.25 = 81, and 100 × 1.25 = 125, respectively.
The standardized flows are 55 × 1.05 = 58, 65 × 1.05 = 68, and 100 × 1.05 = 105, respectively.
The standardized flows are 90 × 1.03 = 93, 75 × 1.03 = 77, and 45 × 1.03 = 46, respectively.
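The standardization arithmetic in the three preceding notes can be reproduced by multiplying each reported flow by the relevant adjustment ratio and rounding to the nearest whole person. The sketch below uses only the figures given in those notes; the function and variable names are illustrative.

```python
def standardize(reported_flows, adjustment_ratio):
    """Scale reported flows by an adjustment ratio and round to whole persons."""
    return [round(flow * adjustment_ratio) for flow in reported_flows]


# Emigration from Countries A and B to C, D, E, and F.
country_a = standardize([20, 175, 35, 40], 1.33)
country_b = standardize([25, 40, 65, 100], 1.25)

# The two remaining sets of flows, with ratios of 1.05 and 1.03.
ratio_105 = standardize([55, 65, 100], 1.05)
ratio_103 = standardize([90, 75, 45], 1.03)
```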
Nordic countries: Denmark, Finland, Norway, and Sweden. Non-Nordic countries with reliable data: Austria, Germany, Netherlands, and Spain. Non-Nordic countries with semi-reliable data: Cyprus, Czech Republic, Italy, Latvia, Lithuania, Luxembourg, Poland, Slovakia, Slovenia, and the United Kingdom. Non-Nordic countries with unreliable data: Ireland, Portugal, and Romania (emigration only). Countries with missing data: Belgium, Bulgaria, Estonia, France, Greece, Hungary, and Malta.
The x and y coordinates are (95, 178), (105, 196), and (143, 268) at Times 1, 2, and 3, respectively.
In our analysis, 1 ≤ k ≤ 4 because pairs of sending and receiving countries with partially complete data have between one and four years of valid data from 2003 to 2007.
[Equation not preserved in the source text.]
In our analysis, the maximum value of k is 125.
To save space, we show only the results of these calculations, which can be replicated by expanding the equation in Step 2. For example, the missing flow from Country F to E at Time 1 is estimated as follows: [equation not preserved in the source text].
Abel (2010) and Poulain (1993, 1999) developed harmonized migration estimates, but for fewer sending and receiving countries.
NY = 3,779 = 28 sending countries × 27 receiving countries × 5 years of data – 1 flow in cluster X.
3,760 = 28 sending countries × 27 receiving countries × 5 years of data – 20 zero flows.
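The counts in the two preceding notes, and the earlier count of 420 reports, rest on the number of directed country pairs, n × (n − 1), because each country can send to every other country but not to itself. A short check, with illustrative names:

```python
def directed_pairs(n_countries):
    """Number of ordered (sender, receiver) pairs, excluding self-pairs."""
    return n_countries * (n_countries - 1)


pair_years = directed_pairs(28) * 5        # 28 x 27 x 5 years of data
after_cluster_x = pair_years - 1           # minus 1 flow in cluster X
after_zero_flows = pair_years - 20         # minus 20 zero flows
reports_15_countries = directed_pairs(15) * 2  # sender and receiver reports
```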
The Calinski Index for the three-cluster solution is 6,743.92; the Duda-Hart Index and its corresponding pseudo t-squared ratio are 0.387 and 443.80, respectively. Relative to a four-cluster (or higher-cluster) solution, with values of 4,634.51 on the Calinski Index and 0.335 and 3,252.41 for the Duda-Hart Index and pseudo t-squared ratio, respectively, the stopping rules employed suggest three optimal clusters (Milligan and Cooper 1985; Rabe-Hesketh and Everett 2006).
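For readers unfamiliar with the first of these stopping rules, the Calinski index is the ratio of between-cluster to within-cluster variance, each divided by its degrees of freedom; larger values indicate more distinct clusters. The sketch below is a minimal one-dimensional version, offered under the assumption that observations are single values such as flow sizes. It is not the software routine used to produce the figures above.

```python
def calinski_harabasz(values, labels):
    """Pseudo-F: [between SS / (k - 1)] / [within SS / (n - k)]."""
    n = len(values)
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    k = len(clusters)
    grand_mean = sum(values) / n
    # Between-cluster sum of squares, weighted by cluster size.
    between = sum(len(members) * (sum(members) / len(members) - grand_mean) ** 2
                  for members in clusters.values())
    # Within-cluster sum of squares around each cluster mean.
    within = sum((value - sum(members) / len(members)) ** 2
                 for members in clusters.values() for value in members)
    return (between / (k - 1)) / (within / (n - k))
```

Comparing this index across candidate solutions, as in the note above, favors the number of clusters with the largest value.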
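[Equation not preserved in the source text], where
[equation not preserved in the source text].
[Equation not preserved in the source text.]
[Equation not preserved in the source text.]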
Recall that the MIMOSA project used covariate information, including shared language family, to estimate missing flows (Raymer and Abel 2008; Raymer et al. 2011).
A summary of the IMEM is available online (http://www.norface.org/migration12.html).