Abstract
This research note reinvestigates Abdellaoui et al.’s (2019) findings that genetically selective migration may lead to persistent and accumulating socioeconomic and health inequalities between types (coal mining or non–coal mining) of places in the United Kingdom. Their migration measure classified migrants who moved to the same type of place (coal mining to coal mining or non–coal mining to non–coal mining) into “stay” categories, preventing them from distinguishing migrants from nonmigrants. We reinvestigate the question of genetically selective migration by examining migration patterns between places rather than place types and find genetic selectivity in whether people migrate and where. For example, with our measure of migration, we find evidence of positive selection among people who moved from non–coal mining to coal mining places: they carry genetic variants correlated with better education. Such findings were obscured in earlier work that could not distinguish nonmigrants from migrants.
Introduction
The geographic distribution of genetic predispositions is not random. Individuals with genetic traits correlated with higher educational attainment (EA) are more likely to live in socially advantaged areas than their counterparts with genetic traits correlated with lower EA (Belsky et al. 2019; Domingue et al. 2015). Geographic mobility (i.e., migration) is a process that shapes the nonrandom geographic distribution of genetic predisposition. Recent work by Abdellaoui et al. (2019) (henceforth, AA) demonstrated the role of migration in the geographic clustering of genetic variants, particularly alleles associated with EA. Specifically, those who moved to places with low socioeconomic status (SES; represented by coal mining) carry genetic variants correlated with low EA, whereas those who have genetic traits correlated with high EA were selected to leave low-SES places.1 These migration patterns are consistent with the findings that skilled individuals are more likely to migrate to seek better education and occupational opportunities (Coulter and Scott 2015), whereas the durable and affordable housing and public transportation networks in places with coal mining attract poor, low-skilled migrants (Rodden 2010). AA concluded that the patterns of migration selection they found exacerbate geographic inequalities in genetic variants.
AA uncovered an important role of geographic mobility in genetic research on sociogeographic patterning. However, we note a limitation in their migration measure: AA's analysis abstracted from typical measures of migration to consider only flows between types of places (i.e., coal mining vs. non–coal mining). Therefore, people who migrated between non–coal mining places or between coal mining places were classified as having “stayed” in (non–)coal mining places. This operationalization of migration blurs the interpretation of their analysis because their “stayer” groups included migrants and nonmigrants, who can be substantially dissimilar, as demography and migration studies have shown.
The healthy migrant hypothesis posits that skilled and healthy individuals are more likely to be selected to migrate than their less-skilled and less-healthy counterparts (Jasso et al. 2004; Palloni and Arias 2004; Palloni and Morenoff 2006). Ample empirical evidence supports the healthy migrant hypothesis in the context of international (Crimmins et al. 2005; Fuller-Thomson et al. 2015; Landale et al. 2000; Riosmena et al. 2013; Rubalcava et al. 2008) and internal migration (Lu 2008; Ma et al. 2020; Molloy et al. 2011; Nauman et al. 2015; Tong and Piotrowski 2012; Wilding et al. 2016). These consistent findings of migrant–nonmigrant differences imply the importance of clearly distinguishing movers from nonmovers in migration research, which AA's migration measure does not accomplish. AA's combined comparison category prevented them from distinguishing between genetic selection of whether to migrate versus where to migrate.
Our analysis tests whether genetics differ by whether respondents migrate, where they migrate, or both. We focus on reassessing AA's work with a revised migration measure that distinguishes migrants from nonmigrants. Beyond this reassessment, our study contributes to research on migration and social genomics. Despite theoretical and empirical reasons to expect genetic differences by whether and where people migrate, to our knowledge, no research has examined these two migration processes simultaneously.2 Therefore, which migration process is the primary driver of genetic differences remains uncertain. Our study fills this gap. Results with the revised migration measure reveal important differences among groups who were combined in AA's work. Contrary to a simple story of cumulative advantage through migration, our findings show that migrants from non–coal mining to coal mining places have genetic variants correlated with higher EA than those who remain (nonmigrants) in non–coal mining places. Our findings thereby demonstrate that genetics select whether people migrate, rather than where they migrate, even for those migrating to economically deprived places.
Data and Methods
Data and Sample
We used data from the UK Biobank (UKB), data on geographic boundaries of coal fields, and local authorities for the 2011 census in the United Kingdom.3 The UKB is a large-scale, population-based, longitudinal biobank study of more than 500,000 people that collected baseline data from 2006 to 2010. The UKB recruited the baseline sample through an invitation letter sent to individuals aged 40–69 who registered with a National Health Service General Practitioner and lived reasonably close to one of the 22 catchment areas where UKB assessment centers were located. Of the respondents who completed the study (n = 502,505), we excluded individuals of non-European ancestry (n = 93,549) to increase the ancestral homogeneity of our sample.4 We also excluded those without data on place of birth (n = 23,421) and genetics (n = 678), leaving us an analytic sample of 384,857 respondents.
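The sample restrictions above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch using the counts reported in the text; the dictionary keys are descriptive labels, not UKB variable names.

```python
# Counts as reported in the text; labels are descriptive, not UKB field names.
completed = 502_505
exclusions = {
    "non-European ancestry": 93_549,
    "missing place of birth": 23_421,
    "missing genetic data": 678,
}

# Apply the exclusions sequentially to reach the analytic sample.
analytic_sample = completed - sum(exclusions.values())
print(analytic_sample)
```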
Variables
Genetic Measurements
We used polygenic indices (PGIs) as a genetic marker to reevaluate how migration patterns are associated with genetics. A PGI is a summary measure representing cumulative correlations between independent genomic loci and an outcome of interest (e.g., EA) (Conley 2016). We used published results of genome-wide association study (GWAS) summary statistics to construct 13 PGIs: EA (Lee et al. 2018), cognitive function (Rietveld et al. 2014), body mass index (BMI) (Locke et al. 2015), waist-to-hip ratio (Shungin et al. 2015), attention-deficit/hyperactivity disorder (ADHD) (Demontis et al. 2019), coronary artery disease (CAD) (Nikpay et al. 2015), ever having smoked (Tobacco and Genetics Consortium 2010), height (Wood et al. 2014), age at menopause (Day et al. 2015), number of cigarettes smoked per day, age at smoking initiation, smoking cessation (Tobacco and Genetics Consortium 2010), and bipolar disorder (Stahl et al. 2019). To avoid overfitting, we excluded UKB samples from GWAS summary data used for PGI training.5 Following the standard procedure to construct a PGI, we clumped single-nucleotide polymorphisms using Phase 3 European samples from the 1000 Genomes Project as the linkage disequilibrium reference. We used a linkage disequilibrium window size of 1 megabase (Mb) and a pairwise r² threshold of 0.1. No p-value thresholding was applied for variant selection. The PGIs were calculated using PRSice-2 software (Choi and O'Reilly 2019) and were standardized to a mean of 0 and a variance of 1 in downstream analyses.
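To make the definition of a PGI concrete, the following minimal sketch computes a weighted sum of effect-allele dosages over already-clumped variants and standardizes it to a mean of 0 and a variance of 1. It is not the PRSice-2 pipeline: the function name `polygenic_index`, the toy dosage matrix, and the toy effect sizes are all hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def polygenic_index(dosages, weights):
    """Weighted sum of allele dosages, standardized to mean 0 and variance 1.

    dosages: (n_people, n_snps) array of effect-allele counts (0, 1, or 2).
    weights: (n_snps,) array of GWAS effect sizes for the clumped SNPs.
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    raw = dosages @ weights               # one raw score per person
    return (raw - raw.mean()) / raw.std() # standardize across the sample

# Toy data (hypothetical): 5 people, 3 clumped SNPs.
toy_dosages = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
    [1, 0, 0],
])
toy_weights = np.array([0.02, -0.01, 0.03])
pgi = polygenic_index(toy_dosages, toy_weights)
```

Because the scores are standardized within the sample, regression coefficients on the PGI can be read in standard deviation units.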
Migration Patterns
To construct a migration measure, we used the coordinates of respondents' places of birth and current residence at the time of the initial survey interview (2006–2010). For the few respondents (n = 11) who did not provide this information in the initial survey but provided it in a follow-up survey, we used the coordinates from the follow-up survey.6 Replicating AA's work as closely as possible, we first classified UKB respondents into four migration groups: (1) moved away from a coal mining place; (2) stayed out of a coal mining place; (3) moved to a coal mining place; and (4) stayed in a coal mining place. The second and fourth (“stay”) categories combine respondents who did not move and those who moved but remained in the same type of place (e.g., moved between two coal mining places). Thus, the four-type classification cannot separate genetic differences by where UKB respondents moved from genetic differences by whether they moved (or stayed).
We then expanded AA's four-category migration measure to a six-category migration measure: (A) moved from a coal mining to a non–coal mining place; (B) moved between two non–coal mining places; (C) stayed in the same non–coal mining place; (D) moved from a non–coal mining to a coal mining place; (E) moved between two coal mining places; and (F) stayed in the same coal mining place. Note that AA classified Groups B and E as having “stayed.” To construct the six-category migration measure, we first identified whether respondents' birthplace and place of current residence are coal mining or non–coal mining places by using geographic boundary data for coal mining places. We then classified respondents as nonmigrants if they lived in the local authority where they were born and as migrants otherwise. We employed the local authorities used in the 2011 census as the geographic unit because their geographic boundary data are publicly available. The 2011 census covered 404 local authorities, and the average land area of local authority districts in 2015 was approximately 240 square miles.7 Table 1 summarizes the migration classification and displays the sample sizes of the migration groups.
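The classification rules above can be sketched as a small function. This is an illustration rather than the code used for the analysis: the function names, the local authority codes, and the boolean coal mining flags are hypothetical inputs, and the sketch assumes that a nonmigrant's birthplace and current residence share the same place type.

```python
def classify_migration(birth_la, current_la, birth_is_coal, current_is_coal):
    """Return the six-category group label (A-F) for one respondent.

    birth_la, current_la: local authority identifiers (hypothetical codes).
    birth_is_coal, current_is_coal: True if the place is a coal mining place.
    """
    if birth_la == current_la:
        # Nonmigrant: still living in the local authority of birth.
        # Assumption: place type is the same at birth and at the survey.
        return "F" if current_is_coal else "C"
    if birth_is_coal and not current_is_coal:
        return "A"  # coal mining -> non-coal mining
    if not birth_is_coal and not current_is_coal:
        return "B"  # non-coal mining -> another non-coal mining place
    if not birth_is_coal and current_is_coal:
        return "D"  # non-coal mining -> coal mining
    return "E"      # coal mining -> another coal mining place

def collapse_to_four(group):
    """Map the six categories onto the four-category measure (1-4)."""
    return {"A": 1, "B": 2, "C": 2, "D": 3, "E": 4, "F": 4}[group]

# Example: a move between two coal mining local authorities is a migrant
# (Group E) here but a "stayer" (category 4) in the four-category measure.
example = classify_migration("LA_001", "LA_002", True, True)
```

The mapping in `collapse_to_four` shows where the two measures diverge: Groups B and E are migrants in the six-category measure but fall into the “stay” categories of the four-category measure.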
Analytic Strategy
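We began by investigating how migration and types of places are associated with genetic predispositions. Using an ordinary least-squares regression, we estimated the equation
\[
\mathrm{PGI}_i = \beta_0 + \beta_1 \mathrm{Migrant}_i + \beta_2 \mathrm{CoalResidence}_i + \beta_3 \mathrm{CoalBirth}_i + \mathbf{PC}_i \boldsymbol{\gamma} + \varepsilon_i, \tag{1}
\]
where $\mathrm{PGI}_i$ is the polygenic index. $\mathrm{Migrant}_i$, $\mathrm{CoalResidence}_i$, and $\mathrm{CoalBirth}_i$ are dichotomous measures distinguishing migrants from nonmigrants (Groups A, B, D, and E vs. others), identifying those living in a coal mining place (Groups D, E, and F vs. others), and indicating those born in a coal mining place (Groups A, E, and F vs. others). $\mathbf{PC}_i$ is a vector of the first 20 genetic principal components that account for population structure–related confounding effects (Price et al. 2006).
We then reassessed AA's work with a revised migration measure. The regression equation for this test can be expressed as
\[
\mathrm{PGI}_i = \alpha_0 + \mathbf{Group}_i \boldsymbol{\alpha} + \mathbf{PC}_i \boldsymbol{\delta} + u_i, \tag{2}
\]
where $\mathbf{Group}_i$ is a dummy-coded six-category migration group assignment. Standard errors are robust to heteroskedasticity.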
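As a rough illustration of this kind of model, the sketch below fits an ordinary least-squares regression of a PGI on three indicators plus 20 principal components and computes heteroskedasticity-robust standard errors. Several things here are assumptions rather than details from the text: the HC1 variant of the robust estimator, the function name `ols_robust`, and the simulated data with arbitrary effect sizes.

```python
import numpy as np

def ols_robust(y, X):
    """OLS coefficients with a heteroskedasticity-robust (HC1) covariance.

    HC1 is an assumption of this sketch; the text only states that
    standard errors are robust to heteroskedasticity.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n/(n-k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return beta, cov

# Simulated toy data (hypothetical effect sizes, not estimates from the UKB).
rng = np.random.default_rng(0)
n = 2000
migrant = rng.integers(0, 2, n)
coal_residence = rng.integers(0, 2, n)
coal_birth = rng.integers(0, 2, n)
pcs = rng.normal(size=(n, 20))  # stand-in for 20 genetic principal components
pgi = (0.2 * migrant - 0.1 * coal_residence - 0.05 * coal_birth
       + rng.normal(size=n))

X = np.column_stack([np.ones(n), migrant, coal_residence, coal_birth, pcs])
beta, cov = ols_robust(pgi, X)
se = np.sqrt(np.diag(cov))  # robust standard errors, one per coefficient
```

Replacing the three indicator columns with five dummy variables for the six migration groups (one group omitted as the reference) gives the corresponding design matrix for the six-category model.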
Results
Migration and EA PGI
Figure 1 illustrates differences in EA PGI based on migration status, current place of residence, and birthplace; estimated regression coefficients are available in Table A1. We observe the largest difference in EA PGI between migrants and nonmigrants, followed by the difference by the current place of residence and then the difference by birthplace. The postestimation Wald tests show that the estimated coefficient for the difference in EA PGI between migrants and nonmigrants is significantly larger than the other coefficients at the 0.1% level, suggesting migration's stronger signal for genetic selectivity over geographic characteristics.
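A postestimation Wald test of whether two coefficients from the same regression are equal can be sketched as follows. The numbers passed in are hypothetical, not estimates from Table A1, and the sketch tests simple equality of two coefficients; whether the comparison in the text was made on signed or absolute coefficients is not specified here.

```python
import math

def wald_equal(b1, b2, var1, var2, cov12):
    """Wald test of H0: b1 == b2 for two coefficients of one regression.

    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    diff = b1 - b2
    var_diff = var1 + var2 - 2.0 * cov12
    w = diff ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p

# Hypothetical inputs: two coefficients with equal variances and no covariance.
w, p = wald_equal(0.20, 0.10, 0.0004, 0.0004, 0.0)
```

The covariance term matters in practice: coefficients estimated in the same model are generally correlated, so the off-diagonal element of the robust covariance matrix should be used rather than set to zero as in this toy call.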
Predicted EA PGI by the six-category migration group is presented in Figure 2 (for regression coefficients, see Table A2). Consistent with the findings displayed in Figure 1, these results show clear differences between migrants and nonmigrants. For example, among those living in a non–coal mining place, EA PGI is substantially lower for nonmigrants (Group C) than for migrants (Groups A and B). Similar migrant–nonmigrant differences are evident for those living in a coal mining place (Groups D–E vs. Group F).
Migrants and nonmigrants also differ by birthplace. Specifically, among those born in a non–coal mining place, migrants (Groups B and D) carry a significantly and substantially higher EA PGI than nonmigrants (Group C). Similarly, among those born in a coal mining place, EA PGI is substantially higher for migrants (Groups A and E) than for nonmigrants (Group F). Furthermore, nonmigrants in a non–coal mining place (Group C) have a significantly lower EA PGI than migrants from one coal mining place to another (Group E). Overall, these results suggest that regardless of birthplace and current place of residence types, migrants have genetic variants correlated with higher EA than nonmigrants.
Results of the six-category migration group also demonstrate differences among migrants. For migrants from a coal mining place, those who moved to a non–coal mining place (Group A) have a significantly higher EA PGI than those who moved to another coal mining place (Group E). Likewise, EA PGI significantly differs by the current place of residence type among migrants born in a non–coal mining place (Groups B and D). Differences by birthplace are also evident (Group A vs. B, and Group D vs. E). The difference is not as large as the differences in EA PGI by where migrants moved, but migrants born in a non–coal mining place have higher EA PGI than migrants born in a coal mining place.
Migration and PGI for Health-Related Traits
Because migration is also a selective process in health (Jasso et al. 2004; Palloni and Arias 2004; Palloni and Morenoff 2006), we conducted parallel analyses with 12 health-related PGIs. While the evidence found for EA PGIs was clearer, Figure 3 also illustrates migration's stronger signal for genetic selectivity over geographic characteristics in health-related PGIs (see Table A1 for regression coefficients). Figure 4 summarizes the relationships between migration and 12 health-related PGIs (estimated regression coefficients are available in Table A2). Although less clear than results with EA PGIs, these results are generally consistent with the findings for EA PGIs. Specifically, nonmigrants who stayed out of coal mining places (Group C) have PGIs correlated with lower cognitive function, higher BMI and waist-to-hip ratio, shorter height, higher risks of ADHD and CAD, higher likelihood of ever having smoked, and more frequent cigarette smoking than migrants who moved to a coal mining place (Group D), at the 5% level. Further, nonmigrants in coal mining places (Group F) have genetic variants correlated with the poorest health relative to any migrant group. Importantly, genetic risks of health issues are higher for nonmigrants than for migrants, except for the genetic risk of bipolar disorder, which is higher for migrants than for nonmigrants.
We also observe differences among migrants. For migrants born in a coal mining place, those who moved to a non–coal mining place (Group A) have significantly higher cognitive function, height, and age at menopause PGIs and lower BMI, waist-to-hip ratio, ADHD, CAD, and ever having smoked PGIs than those who moved between two coal mining places (Group E), at the 5% level. Similarly, those who moved between two non–coal mining places (Group B) have significantly higher cognition, height, and smoking cessation PGIs and lower PGIs correlated with BMI, ADHD, CAD, ever having smoked, and number of cigarettes smoked per day relative to those who moved from a non–coal mining to a coal mining place (Group D). Further, among migrants living in a non–coal mining place, those born in a non–coal mining place (Group B) generally have genetic variants correlated with better health than their counterparts born in a coal mining place (Group A). Similar differences are evident among migrants living in a coal mining place (Groups D and E).
Additional Analyses
Overrepresentation of Well-Educated UK Residents
The overrepresentation of well-educated UK residents in the UKB is widely known (Munafò et al. 2018). To assess the potential consequences of this overrepresentation on our findings, we conducted parallel analyses without college graduates (n = 115,432, 30% of the analytic sample).8 This robustness check reduces the data's cases of migration to pursue higher education.
Panel a of Figure 5 shows that, on average, the EA PGI for those who stayed in a non–coal mining place (Group C) is 0.049 standard deviations lower than for those who moved to a coal mining place (Group D), with a 95% confidence interval of 0.035–0.063 standard deviations (estimated regression coefficients are available in Table A3). This finding suggests that non–coal mining places do not send a group of individuals with lowest EA PGI to coal mining places, consistent with the main findings. Further, nonmigrants in coal mining areas (Group F) have significantly lower EA PGI than migrants within coal mining areas (Group E). These results, combined with the findings that nonmigrants' lowest EA PGI is within non–coal mining areas, imply that genetic predispositions differ by whether people migrate, rather than where they migrate, even after college graduates are excluded.
We also conducted robustness checks for health-related PGIs (see Figure A2 and Table A3). Results for the cognitive function PGI are very similar to those for the EA PGI. Nonetheless, the differences in the other health-related PGIs between nonmigrants in non–coal mining areas (Group C) and migrants to coal mining areas (Group D) are imprecisely estimated. One exceptional case is the PGI for ever having smoked, which shows that nonmigrants in non–coal mining areas have a significantly lower genetic risk of ever having smoked than migrants to coal mining areas, at the 5% level. By contrast, clear migrant–nonmigrant differences can be seen in coal mining areas (Groups E and F). Relative to the nonmigrants, migrants have genetic variants correlated with better cognition, lower BMI and waist-to-hip ratio, a lower risk of CAD, taller height, and an older age at smoking initiation. Overall, the results of these robustness checks do not call the main findings into question.
Birthplace as a Confounder
It is now evident that birthplace is a nonignorable confounder in genetic research (Abdellaoui et al. 2022). To mitigate the confounding effect of birthplace on our findings, we estimated relationships between the six-category migration measure and PGIs with birthplace fixed effects. Specifically, we created birthplace dummy variables at the Middle Layer Super Output Area (MSOA) level and included them in regression Eq. (2).9
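The logic of a birthplace fixed effect can be illustrated with a within-birthplace transformation, which gives the same point estimates as including one dummy variable per birthplace. The sketch below is a toy example under stated assumptions: the function name `demean_by_group`, the simulated birthplaces, and the effect sizes are hypothetical, and a single migrant indicator stands in for the six-category measure.

```python
import numpy as np

def demean_by_group(values, groups):
    """Subtract group means (within transformation) from a vector or matrix."""
    values = np.asarray(values, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    counts = np.bincount(idx)
    cols = values.reshape(len(values), -1)
    out = np.empty_like(cols)
    for j in range(cols.shape[1]):
        means = np.bincount(idx, weights=cols[:, j]) / counts
        out[:, j] = cols[:, j] - means[idx]
    return out.reshape(values.shape)

# Simulated toy data: birthplace shifts both the PGI and the chance of migrating.
rng = np.random.default_rng(1)
n, n_places = 5000, 50
birthplace = rng.integers(0, n_places, n)
place_effect = rng.normal(size=n_places)
p_migrate = 1.0 / (1.0 + np.exp(-place_effect[birthplace]))
migrant = (rng.random(n) < p_migrate).astype(float)
pgi = 0.3 * migrant + place_effect[birthplace] + rng.normal(size=n)

# Fixed-effects estimate: regress demeaned PGI on demeaned migrant indicator.
y_w = demean_by_group(pgi, birthplace)
x_w = demean_by_group(migrant, birthplace)
b_fe = (x_w @ y_w) / (x_w @ x_w)

# Naive estimate ignoring birthplace, for comparison.
x_c = migrant - migrant.mean()
b_naive = (x_c @ (pgi - pgi.mean())) / (x_c @ x_c)
```

In this toy setup the naive estimate absorbs the birthplace differences, whereas the within-birthplace estimate compares migrants and nonmigrants born in the same place.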
Predicted EA PGIs by migration groups with the birthplace fixed effect are presented in panel b of Figure 5 (estimated regression coefficients are available in Table A4). We find that the EA PGI is significantly lower for nonmigrants in a non–coal mining place (Group C) than for migrants, except for migrants who moved within coal mining places (Group E) (p = .105). By contrast, nonmigrants in a coal mining place (Group F), on average, have significantly lower EA PGI than migrants, regardless of birthplace and current place of residence (Groups A, B, D, and E). Further, PGIs for cognitive function, ADHD, CAD, ever having smoked, and height demonstrate that nonmigrants in a non–coal mining place (Group C) have significantly higher genetic health risks than migrants to coal mining places (Group D) (see Figure A3). Additionally, nonmigrants in coal mining places (Group F) have genetic traits correlated with worse health than any migrant group, except for the genetic risk of bipolar disorder. Overall, our main findings—larger genetic selectivity, especially in EA PGI, by whether rather than where to migrate—hold even after we control for the birthplace fixed effect.
Discussion
Geographic mobility is one mechanism shaping the geographic distribution of genetic predispositions. AA demonstrated the role of geographic mobility in the geographic distribution of genetic predisposition; however, their migration measure did not clearly distinguish migrants from nonmigrants. We aimed to deepen our understanding of the relationships among migration, social stratification, and genetic predispositions by reassessing AA's work with an updated migration measure distinguishing migrants from nonmigrants.
Results of the six-category migration group demonstrate important differences among groups who were combined in AA's work. AA's result suggests a clear gradient among the four groups: the group with the highest EA PGI left coal mining areas, followed by the group remaining in non–coal mining areas, the group moving to coal mining areas, and the group remaining in coal mining areas. This pattern demonstrates a cumulative process over time, whereby coal mining areas become more disadvantaged by losing individuals with high EA PGI and receiving individuals with low EA PGI. The results of our study that further split these groups suggest a reordering and reinterpretation prioritizing mover–nonmover comparisons, regardless of destination. We find that each of the mover groups has a higher EA PGI than the stayer groups, including those who stayed in economically less deprived areas. Contrary to AA's interpretation, these results do not align with the recent SES-driven migration patterns (Coulter and Scott 2015; Rodden 2010). Because non–coal mining places do not send a group of individuals whose PGIs are correlated with the lowest EA, our results do not support the finding that the durable and affordable housing and public transportation networks in coal mining places attract poor, low-skilled migrants (Rodden 2010). Alternatively, given the strong association between EA and health (Barrow and Malamud 2015; Elo 2009; Hout 2012), our results are more consistent with the healthy migrant literature (Jasso et al. 2004; Palloni and Arias 2004; Palloni and Morenoff 2006), suggesting that skilled and healthy individuals are more selected to migrate than less-skilled and unhealthy counterparts. Overall, each comparison of EA PGI by a six-category migration group suggests that the decision to migrate is the core dimension of genetic stratification and that migration from non–coal mining to coal mining places is less consistent with the place-based cumulative process.
We also investigated relationships between migration patterns and health-related PGIs. Similar to the results of EA PGI, we found that migrants who moved out of a coal mining place have PGIs correlated with better health, but non–coal mining places do not send a group of individuals with genetic variants correlated with the worst health to coal mining places. Whereas migration out from a coal mining place facilitates the geographic clustering of PGIs correlated with better health in non–coal mining places, geographic clustering of PGIs correlated with worse health in coal mining areas through migration is less supported. Importantly, as AA also reported, we found a higher genetic risk of bipolar disorder among migrants relative to nonmigrants, probably because skilled individuals are more likely to suffer from bipolar disorder (MacCabe et al. 2010; Tiihonen et al. 2005). Thus, non–coal mining places attract skilled individuals who also have a higher genetic risk of bipolar disorder.
This study is not without limitations. First, the UKB is not a nationally representative survey. The UKB recruited only those living reasonably close to assessment centers and oversampled well-educated and healthy UK residents (Fry et al. 2017; Munafò et al. 2018). Therefore, our findings may not be representative of the relationships between migration and genetic variants in the United Kingdom. Second, excluding individuals of non-European ancestry prevents us from generalizing our findings because we cannot infer the relationship between migration and genetic predispositions to non-European ancestral groups. This limitation is important, given that unequal advancement of genetic research by ancestral groups may exacerbate disparities between Europeans and other ancestral groups (Martin et al. 2019). Finally, our migration measure relied on the places of birth and current residence at the time of the survey, but this snapshot precludes us from capturing migration histories. Respondents who migrated to pursue college education in a non–coal mining place but returned to their coal mining birthplace to work may carry different genetic traits than those who stayed in their coal mining birthplace over the life course. Failure to capture migration histories is a well-known measurement error in migration research (Molloy et al. 2011), and lifetime migration histories would allow us to evaluate the relationship between geographic mobility and genetic variants in greater detail. However, such data do not exist.
With our extended migration measure that distinguishes migrants from nonmigrants, we shed light on unobserved but unignorable differences in genetic traits by whether people migrate. We find limited support for the cumulative disadvantage of coal mining places through in-migration of those with low EA PGI from non–coal mining places because a group of individuals with the lowest EA PGI in non–coal mining places stay in non–coal mining places. Nevertheless, these results do not indicate the absence of exacerbated genetic-related social inequalities between places through migration because people with genetic variants correlated with higher SES are leaving coal mining places, consistent with AA's findings. Findings for the exacerbation of genetic-related social inequalities through migration highlight the need for continued scholarly investigations of the role of migration in the geographic inequality of genetic variants.
Acknowledgments
The authors gratefully acknowledge the use of the facilities of the Center for Demography of Health and Aging (P30 AG016266) and the Center for Demography and Ecology (P2C HD067873) at the University of Wisconsin–Madison. We thank the University of Wisconsin's Social Genomics Research Group members for helpful comments. We also thank Abdel Abdellaoui and colleagues for providing programming code and geographic boundary data to reassess their work. They also provided invaluable comments. An earlier version of this paper was presented at the 2021 annual meeting of the Population Association of America and the 2021 Integrating Genetics and the Social Sciences conference (R13-AG062366). This research was conducted using the UK Biobank Resource under application number 57284 (http://www.ukbiobank.ac.uk/). Q. Lu and J. M. Fletcher codirected this research.
Notes
1. Because genetic variants are determined at conception and thus before migration, genetic migrant–nonmigrant differences result from selection (i.e., those with certain genetic traits are selected to migrate), but not from causation (i.e., migration does not change genetic traits).
2. Belsky et al. (2016) found that migrants’ genetic traits are correlated with higher EA than nonmigrants’ among New Zealand origins, and Belsky et al. (2019) demonstrated that those with genetic traits correlated with lower EA are more likely to move to disadvantaged neighborhoods. However, neither study examined how genetics differ by both whether and where people migrate simultaneously.
3. See http://infuse.ukdataservice.ac.uk/help/definitions/2011geographies/index.html. The geographic boundary data on coal fields are no longer publicly available. We thank Abdellaoui and colleagues for providing the programming code and geographic boundary data to reassess their work.
4. Pooling multiple ancestral groups induces false associations in genetic research (Conley 2016). Given the UKB’s limited sample of non-European ancestral participants, we focus on European ancestry.
5. Although more recent GWAS summary statistics exist, we did not use them because they include UKB respondents.
6. The UKB used different approaches to collect coordinate information for respondents’ places of residence at the time of survey and birthplace. Place of residence was derived from respondents’ residential postcode (see https://biobank.ndph.ox.ac.uk/showcase/ukb/docs/UKgrid.pdf). Birthplace was based on the town or district that respondents selected from roughly 43,000 options organized in a tree structure. Although the many options offer a high granularity (see the geographic distribution of UKB respondents’ birthplace in Figure A1; all tables and figures designated with an “A” are available in the online appendix), respondents might have been able to select not only specific end nodes but also parent nodes within the hierarchy. Therefore, some respondents might have reported a less-precise birthplace (e.g., “London” or “Liverpool”). We tested the sensitivity of our findings to this issue and found no empirical evidence that it substantially affects our main findings (see section 2 of the online appendix).
7. This land area is roughly the average of the county land areas in Rhode Island (207 square miles) and Virginia (295 square miles), the states with the second and third smallest average county land areas, just following the District of Columbia.
8. We also excluded those missing educational attainment information (n = 3,622, 0.94% of the analytic sample) in this robustness check.
9. The boundary data at the MSOA level are publicly available (http://infuse.ukdataservice.ac.uk/help/definitions/2011geographies/index.html). The MSOA is a smaller geographic unit than the local authority district. In the 2011 census, England and Wales comprised 7,201 MSOAs but only 324 local authority districts.