Abstract
Migration is selective, resulting in inequalities between migrants and nonmigrants. However, investigating migration selection is empirically challenging because combined pre- and post-migration data are rarely available. We propose an alternative approach to assessing internal migration selection by integrating genetic data, enabling an investigation of migration selection with cross-sectional data collected post-migration. Using data from the UK Biobank, we utilized standard tools from statistical genetics to conduct a genome-wide association study (GWAS) for migration distance. We then calculated genetic correlations to compare GWAS results for migration with those for other characteristics. Given that individual genetics are determined at conception, these analyses allow a unique exploration of the association between pre-migration characteristics and migration. Results are generally consistent with the healthy migrant literature: genetics correlated with longer migration distance are associated with higher socioeconomic status and better health. We also extended the analysis to 53 traits and found novel correlations between migration and several physical health, mental health, personality, and sociodemographic traits.
Introduction
Multiple theories posit that migrants are not randomly selected from a population. Examples include the healthy migrant hypothesis (Jasso et al. 2004; Palloni and Arias 2004; Palloni and Morenoff 2006), Borjas' (1987) application of the Roy (1951) model of selection in the economics literature to migration, and Ravenstein's (1885) law of migration. One illustrative migration selection process is that skilled and healthy individuals are more likely to migrate than less skilled and unhealthy counterparts because these qualities are necessary for the benefits of migration to outweigh its economic, personal, physical, and psychological costs (see Feliciano 2020).
Much research has presented empirical evidence supporting this positive migration selection. For example, immigrants in the United States, particularly long-distance migrants, tend to be more educated than those remaining in their home countries (Feliciano 2005). Earlier research also found better health among U.S. immigrants relative to nonmigrants residing in countries of origin, consistent with positive health selectivity of migration (Bostean 2013; Crimmins et al. 2005; Morey et al. 2020; Ro et al. 2016; Rubalcava et al. 2008).1 Similarly, research in Europe has demonstrated that migrants have higher childhood socioeconomic status (SES) and health than nonmigrants in their sending countries (Fuller-Thomson et al. 2015; Schmidt et al. 2022), suggesting positive migration selection. Importantly, migration selection on SES and health can be found in internal migration contexts (Borjas et al. 1992; Lu 2008; Nauman et al. 2015; Rauscher and Oh 2021; Wilding et al. 2016).2 Overall, international and internal migration are highly selective along many dimensions of SES and health.
Despite these established theoretical frameworks, data availability is a crucial limitation for studies examining migration selection. Given the effects of migration on migrants' SES and health (Lu 2010), a simple comparison of SES and health between migrants and nonmigrants reflects both selection and causation of migration. Innovative research has examined migration selection using longitudinal data that include both pre- and post-migration information (Abramitzky et al. 2012; Fuller-Thomson et al. 2015; Lu 2008; Nauman et al. 2015; Rubalcava et al. 2008), but such data are rarely available. This data constraint generally prevents scholars from separating migration selection and migration effects (Darlington et al. 2015).
The issue of data availability in migration studies goes beyond the lack of longitudinal data tracking migration behaviors. Prior research examining the healthy migrant hypothesis relied on subjective health assessments (Akresh and Frank 2008; Mehta and Elo 2012; Nauman et al. 2015).3 However, these measures might partially reflect systematic differences in reporting tendencies between sociodemographic groups (Altman et al. 2016; Grol-Prokopczyk et al. 2011; Rossouw et al. 2018). An alternative to self-assessment is biomarker data, which can represent objective measures of risks of future diseases (Crimmins et al. 2010; Harris and Schorpp 2018). Thus, biomarker measures might uncover migration selection in latent health risks that do not appear in subjective health assessments. This feature of biomarkers is important in a case such as internal migration in the United Kingdom, where migration is concentrated among young adults (Bernard et al. 2016), who are less likely to perceive health issues.
We propose a novel approach to assess migration selection using a combination of standard genomic analysis toolkits: a genome-wide association study (GWAS) and genetic correlation analysis.4 We first explore genetic variants correlated with migration through a GWAS and then use a genetic correlation analysis to assess whether and how these genetic variants correlated with migration are also associated with SES and health. Given that skilled and healthy individuals are selected to migrate, migrants are expected to have genetic traits correlated with higher SES and better health than nonmigrants: genetic variants correlated with migration will also be correlated with higher SES and better health.
The framework takes advantage of the fact that genetic variants are determined at conception, remain unchanged throughout the life course, and thus cannot be affected by migration, SES, or self-assessed health.5 These qualities allow us to rule out migration effects even though the genetic measures are collected post-migration. The broad scope of prior genetic analyses also enables us to consider many traits in our genetic correlation analysis, even those that typically affect individuals older than those in our sample. Together, these methods allow us to estimate novel correlations between migration and other traits that are not typically measured (or cannot be measured). Additionally, future research can use our GWAS findings for migration in downstream analysis of migration in smaller datasets that contain genetic data, such as the Health and Retirement Study and the English Longitudinal Study of Aging. Furthermore, our results will directly show the role of genetics in migration selection, which demographers have suggested (Palloni and Arias 2004). Overall, our exploration of genetic correlations between migration and SES and health sheds light on understudied but potentially important dimensions of migration selection and integrates a social genomics approach into the migration literature.
Data and Methods
The UK Biobank (UKB) is a large-scale biobank study of more than 500,000 people that collected baseline data in 2006–2010. The UKB recruited the baseline sample through an invitation letter sent to individuals aged 40–69 who were living reasonably close to one of the 22 catchment areas where UKB assessment centers were located (see Figure 1). The UKB is suitable for our purposes because it includes a large sample of genotyped individuals, allowing us to implement a GWAS. The UKB also collected coordinate information on places of birth and current residence (at the time of the survey), which are required to construct a migration measure (described later). Of the respondents who completed the study (n = 502,505), we excluded those with no migration distance data (n = 61,672) and those of non-European ancestries (n = 50,024). After additional quality control, 359,571 samples remained.6
We used single-nucleotide polymorphisms (SNPs) as genetic markers.7 Our dependent variable is migration distance, representing the routing distance between the self-reported coordinates for respondents' places of birth and current residence. We measured migration distance as a continuous outcome because it does not require an arbitrary classification of respondents as (internal) migrants versus nonmigrants or long-distance versus short-distance migrants. Because preliminary analyses showed that more genetic variants correlated with logged migration distance than with untransformed migration distance, we focus on logged migration distance.8 Additionally, we included the following control variables: age; sex; the type of chip used for genotyping; and the first 20 principal components, which account for population structure–related confounding (Price et al. 2006).
We performed a GWAS for logged migration distance using Hail, a software tool for genetic analysis (https://hail.is/). GWAS runs millions of regressions to investigate how the variation of an outcome variable is associated with each genetic variant. Following conventions in the GWAS literature (e.g., Loh et al. 2015), we removed SNPs with a missing call rate greater than 0.01, a minor allele frequency less than 0.01, and a Hardy–Weinberg equilibrium test p value <1.0e–6. To control type I error, we used genomic control estimates (i.e., intercept) in linkage disequilibrium score (LDSC) regression (Bulik-Sullivan, Loh et al. 2015) to inflate standard errors for GWAS associations. Next, we calculated genetic correlations between our GWAS findings for logged migration distance and GWAS findings for 53 traits from other published studies. These traits include some genetic components that are direct (i.e., operate through inherited genetic variants) and some that are indirect (i.e., operate through the family environment) (Wu et al. 2021). This decomposition of genetic correlation allowed us to assess underlying mechanisms for the association between genetic variants and migration.9 We used LDSC to estimate genetic correlations (Bulik-Sullivan, Finucane et al. 2015) and adjusted the significance cutoff using Bonferroni correction to account for multiple testing. GWAS summary statistics for the 53 traits are shown in Table A1 (tables and figures designated with an “A” are in the online appendix).
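The variant filters, the standard-error inflation, and the multiple-testing cutoff described above can be illustrated with a short Python sketch. It is not the Hail pipeline used in the analysis: the function names and the example inputs are hypothetical, and the sketch assumes the common genomic control convention of multiplying each standard error by the square root of the LDSC intercept.

```python
import math

# Thresholds stated in the text; all other values below are illustrative.
MAX_MISSING_CALL_RATE = 0.01
MIN_MINOR_ALLELE_FREQ = 0.01
MIN_HWE_P_VALUE = 1.0e-6


def passes_qc(missing_call_rate, minor_allele_freq, hwe_p_value):
    """Return True if a SNP survives all three quality-control filters."""
    return (
        missing_call_rate <= MAX_MISSING_CALL_RATE
        and minor_allele_freq >= MIN_MINOR_ALLELE_FREQ
        and hwe_p_value >= MIN_HWE_P_VALUE
    )


def inflate_standard_error(se, ldsc_intercept):
    """Assumed convention: scale the SE by sqrt(LDSC intercept)."""
    return se * math.sqrt(ldsc_intercept)


def two_sided_p_value(beta, se):
    """Two-sided normal p value for the z statistic beta / se."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical SNP: effect estimate, raw SE, and an LDSC intercept of 1.05.
adjusted_se = inflate_standard_error(0.010, 1.05)
print(passes_qc(0.002, 0.15, 0.4))
print(two_sided_p_value(0.06, adjusted_se))
print(bonferroni_cutoff(0.05, 53))  # cutoff for the 53 genetic correlations
```

Inflating the standard error leaves the effect estimate unchanged and makes each association test more conservative whenever the intercept exceeds 1.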
Results
Main Findings
Illustrating GWAS results, Figure 2 shows a Manhattan plot of 1,858 SNPs from 21 independent loci that reach the genome-wide significance level (p < 5.0e–8); genetic researchers use this very low p-value threshold to control false-positive findings across the hundreds of thousands of associations estimated.10 These SNPs are also associated with several SES and health outcomes. For example, outcomes associated with the SNP with the lowest p value in our migration analysis include educational attainment (Davies et al. 2016), cognitive performance (Lee et al. 2018), and anorexia nervosa (Peyrot and Price 2021). A measure of overall genetic contribution (SNP heritability) to logged migration distance is 0.0629 (standard error [SE] = 0.003); that is, about 6% of the variation in logged migration distance is attributable to commonly measured genetic variation.11
Figure 3 and Table A2 summarize genetic correlations (rg) between logged migration distance and 53 traits. We find a strong positive genetic correlation between logged migration distance and educational attainment (rg = 0.886); this level of genetic correlation is among the highest reported in the literature, exceeding that for cognitive performance. Further, our results of direct (i.e., own genetics) and indirect (i.e., parental/family genetics) components demonstrate strong genetic correlations with both dimensions but indicate a stronger genetic correlation with the indirect component (rg = 0.856) than with the direct component (rg = 0.624).
The results also reveal significant negative genetic correlations with several health-related issues, including coronary artery disease, Type 2 diabetes, major depressive disorder, neuroticism, and attention-deficit/hyperactivity disorder. Among fertility-related outcomes, genetic correlations with age at first birth and age at menopause are positive; the genetic correlation with the number of children is negative. Finally, several genetic correlations with health outcomes were unanticipated. Specifically, positive genetic correlations with anorexia nervosa, autism spectrum disorder, and bipolar disorder suggest higher genetic risks of these mental disorders among migrants relative to nonmigrants.12
Robustness Checks
Additional GWAS and Genetic Correlation Analyses
We conducted several robustness checks to assess the impacts of UKB's sampling design on our findings. Specifically, we investigated (1) the consequences of the overrepresentation of well-educated UK residents in the UKB (Munafò et al. 2018), (2) the impacts of the potential oversampling of health professionals,13 and (3) the effects of sampling selection based on migration distance. To test the robustness of our findings on the first issue, we reimplemented GWAS while excluding those with professional education and college graduates. Our goal was to reduce the number of cases in the data in which respondents migrated to pursue higher education. Similarly, we also ran GWAS excluding health professionals to eliminate the impacts of health professionals' migration into places around the medical assessment centers.14 Regarding the third issue, a key sampling feature of the UKB is that only people living close to one of the 22 assessment centers were asked to participate, which may truncate some internal migration distances relative to the full UK population. We therefore split the 22 catchment areas into two groups on the basis of place-specific median migration distance, performed GWAS separately for the two groups, and examined the similarity of the results between the two subsamples. That is, we further truncated migration distance in each subsample to gauge whether the unknown truncation due to the UKB sampling strategy is likely to have affected our main results.
Figures A4–A7 show that genetic correlations between migration distance and 53 traits are generally consistent across the ways we select the analytic sample. The correlation coefficient of genetic correlations between the sample with and without professional or college education is 0.98. Likewise, the correlation coefficient of genetic correlations between the sample with and without health professionals is 0.99. Further, the correlation coefficient of genetic correlations between our two subsamples based on migration distance is 0.97. Regression slopes in Figures A5–A7 are close to 1, ranging from 0.943 (SE = 0.022) to 1.165 (SE = 0.031). Hence, these results do not present empirical evidence that sampling selection issues in the UKB substantially alter our findings.
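The two summary statistics used in this comparison, the correlation coefficient between two sets of genetic correlations and the slope from regressing one set on the other, can be computed as in the following Python sketch. The genetic correlation values are hypothetical placeholders rather than estimates from the appendix figures, and the sketch uses a plain unweighted regression, which may differ from the specification behind the reported slopes and standard errors.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length lists of estimates."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope from an unweighted least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical genetic correlations for five traits in two analytic samples.
rg_full_sample = [0.89, -0.35, 0.42, -0.20, 0.15]
rg_restricted_sample = [0.85, -0.30, 0.45, -0.22, 0.10]

print(pearson_correlation(rg_full_sample, rg_restricted_sample))
print(ols_slope(rg_full_sample, rg_restricted_sample))
```

A correlation near 1 together with a slope near 1 indicates that the restricted sample reproduces both the ordering and the magnitude of the genetic correlations from the full sample.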
Within-Sibling Analyses
Differences in genetic ancestries have non-negligible impacts on GWAS findings when genetic ancestries affect both genetic variants and an outcome of interest. To account for this population stratification issue, we restricted the analytic sample to individuals of European ancestries and included genetic principal components in GWAS. However, principal components might not fully account for population stratification (Howe et al. 2022). To further eliminate the impacts of population stratification on our main findings, we conducted within-sibling GWAS, which compares genetic variants between siblings. This approach ensures that differences in genetic variants are not due to population stratification because siblings share genetic ancestries (Raffington et al. 2020).
Although within-sibling GWAS effectively reduces the threat of population stratification, this approach has limitations. First, within-sibling GWAS includes only UKB respondents whose siblings also participated in the UKB and therefore has a much smaller sample size (16,220 pairs and 32,440 individuals) than population GWAS. Second, within-sibling GWAS accounts for not only population stratification but also any other shared traits between siblings, such as the family of origin's socioeconomic background and childhood neighborhood environments. Because these shared traits explain some variance of an outcome measure, the remaining variance that genetic predispositions can explain is small in within-sibling GWAS. These limitations result in larger standard errors for genetic correlations in within-sibling GWAS than for population GWAS. Therefore, we primarily focused on sign concordance of genetic covariances for associations between genetic variants correlated with migration distance and other phenotypes. Finally, differences between population and within-sibling GWAS should be interpreted with caution. Consistent signs between population and within-sibling GWAS support our main findings with the least threat of population stratification. However, different signs do not imply that population stratification induces biased estimates in population GWAS because the within-sibling comparison accounts for all shared traits between siblings, including but not limited to population stratification.
Results of sign tests are presented in Table 1. Among the 33 traits that reached the 5% significance level after Bonferroni correction in the genetic correlation analysis, genetic covariances of 28 traits (85% of the 33 traits) from within-sibling GWAS have signs consistent with those from population GWAS (p = , binomial test).15 These results suggest that significant genetic correlations for migration distance with SES and health in the main analyses are generally robust to population stratification.
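The logic of this sign test can be illustrated with a short Python sketch that computes an exact binomial tail probability for 28 concordant signs among 33 traits under a null probability of 0.5. The sketch is illustrative and does not substitute for the reported p value; in particular, whether the reported test is one-sided or two-sided is an assumption left open here, so both versions are shown. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


n_traits = 33       # traits significant after Bonferroni correction
n_concordant = 28   # traits whose within-sibling sign matches the population sign

one_sided = binomial_upper_tail(n_concordant, n_traits)
# With a null probability of 0.5 the distribution is symmetric, so the
# two-sided p value is twice the one-sided tail (capped at 1).
two_sided = min(1.0, 2.0 * one_sided)

print(n_concordant / n_traits)
print(one_sided, two_sided)
```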
By contrast, the signs of genetic covariances of the other five traits from within-sibling GWAS differ from those in population GWAS. These traits are autism spectrum disorder, body mass index, HDL cholesterol, height, and Type 2 diabetes. Although these differences are probably due to population stratification in population GWAS, we cannot reject an alternative scenario that other shared traits between siblings alter the signs of genetic covariances of these characteristics. Furthermore, these genetic covariances are not statistically significant even without Bonferroni correction, suggesting that the inconsistent signs may result from low statistical power and imprecise estimation. Because most of the traits show consistent directions of genetic covariances across within-sibling and population GWAS, we conclude that within-sibling GWAS does not present empirical evidence casting doubt on our main findings in the genetic correlation analysis.
Analyses With U.S. Data
To further validate our findings, we conducted similar analyses with different samples. However, we are unaware of datasets that provide a migration distance measure and genetic data with a sufficiently large sample size to implement a GWAS and genetic correlation analysis.16 As an alternative, we used our GWAS findings to create a migration distance polygenic index (PGI): a summary measure that aggregates the small correlations of many independent genomic loci with migration distance. We then assessed how migration distance PGI is associated with migration distance, health, SES, and skills in a U.S. population.17 On the basis of our main findings, we expected that those with genetic variants correlated with longer migration distances (i.e., higher migration distance PGIs) move longer distances, are healthier, and have more socioeconomic resources than those with lower migration distance PGIs. Such results would provide additional support validating the findings in the GWAS and genetic correlation analysis.
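Conceptually, a PGI of this kind is a weighted sum of an individual's allele counts, with weights taken from GWAS effect estimates, which is then standardized across the sample. The Python sketch below illustrates that idea only. It is not the PRSice-2 procedure used to build the index: the genotypes and weights are hypothetical, the SNPs are assumed to be already clumped, and the function names are invented for illustration.

```python
import statistics


def polygenic_index(allele_counts, weights):
    """Weighted sum of allele counts (0, 1, or 2) across SNPs for one person."""
    return sum(g * w for g, w in zip(allele_counts, weights))


def standardize(values):
    """Rescale values to mean 0 and variance 1 across the sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical GWAS effect estimates for three independent SNPs.
gwas_weights = [0.10, -0.20, 0.30]

# Hypothetical allele counts for four individuals (rows) at those SNPs.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 1],
]

raw_pgi = [polygenic_index(row, gwas_weights) for row in genotypes]
standardized_pgi = standardize(raw_pgi)
print(standardized_pgi)
```

Because the index is standardized before it enters downstream models, a regression coefficient on the PGI can be read as the change in the outcome associated with a one-standard-deviation difference in the index.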
We analyzed data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) and the Health and Retirement Study (HRS), which collected genetic data. These datasets allowed us to construct migration distance PGIs. Further, because these datasets cover different age and birth cohorts, we could assess whether the associations of the migration distance PGI with health, SES, and skills depend on these demographic characteristics. We used Add Health Wave I to explore the association between the migration distance PGI and phenotypic traits among adolescents.18 Additionally, we analyzed Add Health Wave IV and the 2012 round of HRS, which collected data for young adults and older people, because these survey waves collected completed genetic data.
Migration distance is measured by the distance of locations between Waves I and III in Add Health.19 We created health, SES, and skill measures by using Add Health Waves I and IV and the 2012 round of HRS. Our health outcome measures include self-reported health, height, body mass index, and depression (assessed with the Center for Epidemiologic Studies Depression Scale). For Wave I respondents, we also used picture vocabulary test scores and grades in English, math, social studies, and science to measure respondents' abilities and skills. With Add Health Wave IV and HRS data, we measured respondents' SES as educational attainment and log-transformed individual and household income (Table A4 details the operationalizations of these outcome measures).
Table 2 summarizes the results of the association between migration distance PGI and the location distance between Add Health Waves I and III. Net of age, sex, and the first 20 principal components, a higher migration distance PGI is significantly associated with a longer migration distance. This finding suggests that genetic variants correlated with a longer migration distance among the UKB participants are also associated with a longer migration distance among the Add Health participants.
We then examined the associations with health, SES, and skills. Table 3 demonstrates that net of age, sex, and the first 20 principal components, a higher migration distance PGI is associated with better health and higher skills and SES, regardless of age groups and birth cohorts. These results are consistent with the positive genetic correlations between migration distance and health, SES, and skills in the main findings. Overall, the additional analyses with two U.S. datasets suggest that our UKB results generalize to other contexts.
Discussion
This study provides novel assessments of migration selection using genetic analytic tools. We found many genetic variants associated with logged migration distance. Because genetic variants are not affected by migration, these results provide direct evidence of genetic migration selection: those with certain genetic variants are more likely to migrate than those without these variants. These findings support Palloni and Arias' (2004) speculation of the presence of migration selection at the genetic level.
Further, we found that genetic variants correlated with migration distance are also associated with many dimensions of SES and health outcomes. These results imply that migration selection at the genetic level is tied to many other characteristics. The positive genetic correlations with educational attainment, income, and cognitive performance suggest that skilled individuals are more likely to migrate and that long-distance migrants pursue better educational and occupational opportunities. These results show that the Roy model (Borjas 1987; Borjas et al. 1992) and the law of migration (Ravenstein 1885) have implications for the genetic profiles of migrants compared with nonmigrants. In the case of educational attainment–related genetics, our decomposition of genetic correlations into direct and indirect components provides some insights into mechanisms: genetics correlated with family environments related to higher educational attainment contribute to genetic migration selection more than genetics correlated with own skills and abilities for successful educational attainment.
One major advantage of genetic measures is the wide coverage of genetic correlations, especially with health outcomes. This feature allows us to examine a broader set of outcomes that are rarely available in most datasets. Indeed, this wide coverage in genetic correlations provides several important theoretical implications for the healthy migrant hypothesis (Jasso et al. 2004; Palloni and Arias 2004; Palloni and Morenoff 2006). First, the healthy migrant hypothesis is valid for health conditions and risks that people may not perceive before migration. For example, genetic correlations with chronic diseases usually appearing at middle or older ages (e.g., coronary artery disease) uncover the likelihood of migration selection in latent health risks, given that internal migration in the United Kingdom is concentrated at young adult ages (Bernard et al. 2016). By contrast, our findings also provide nuance to the healthy migrant hypothesis and suggest the need for additional research. Specifically, significant positive genetic correlations between migration distance and bipolar disorder and anorexia nervosa suggest that those with higher genetic risks of these mental conditions are more likely to migrate. These genetic correlations are counterintuitive to the theoretical explanation that healthy individuals (in this case, those with lower risks of mental disorders) are more likely to migrate. One possible interpretation for these unanticipated results is that those with a high genetic risk of these mental conditions may be skilled individuals. This scenario is consistent with our genetic correlations between educational attainment and these mental disorders, as well as prior epidemiological research (MacCabe et al. 2010; Tiihonen et al. 2005). Because skilled individuals are more likely to migrate, those with genetic variants correlated with a higher risk of these mental disorders may also be more likely to migrate. 
Overall, these findings lead us to hypothesize that the healthy migrant hypothesis may not apply to some mental conditions that are positively correlated with SES.
We acknowledge several limitations in this study. First, UKB is not a nationally representative survey and does not provide sampling weights to adjust for its unique sampling strategy. Although we conducted many robustness checks to assess the potential impacts of sampling selection, subsequent research with large, nationally representative data can further explore the consequences of this limitation. Second, we excluded respondents of non-European ancestries to increase ancestral homogeneity. This exclusion limits the generalizability of our findings of genetic migration selection. Finally, the genetic correlation analysis does not fully reveal the underlying mechanisms of migration selection. The higher genetic correlation with the indirect component of educational attainment than with the direct component provides insights into the mechanisms, but we remain uncertain about what specific family or nurturing environments contribute to migration selection by educational attainment.
Despite these limitations, our study makes valuable contributions to the study of migration selection, which is typically constrained by data availability. By leveraging the unique features of genetic measurements, we documented the presence of migration selection at the genetic level. Although we showed that genetic migration selection is generally consistent with theories and empirical evidence on migration selection, we also found genetic migration selection counter to our theoretical expectations. These unanticipated results generate novel hypotheses, and subsequent tests of these hypotheses will shed light on understudied aspects of migration selection.
Acknowledgments
The authors acknowledge the use of the facilities of the Center for Demography of Health and Aging (P30 AG016266) and the Center for Demography and Ecology (P2C HD067873) at the University of Wisconsin–Madison. We thank the University of Wisconsin's Social Genomics Research Group members for helpful comments. An earlier version of this paper was presented at the 2021 annual meeting of the Population Association of America and at the National Institute on Aging and the 2021 Integrating Genetics and Social Sciences Conference (R13-AG062366). This research was conducted using the UK Biobank Resource under application number 57284 (http://www.ukbiobank.ac.uk/). GWAS summary statistics for (logged) migration distance are available at http://qlu-lab.org/data.html. Q. Lu and J. M. Fletcher codirected this research.
Notes
Some of these studies showed negative migration selection on self-reported health (Bostean 2013; Rubalcava et al. 2008).
For example, in the United States between 1880 and 1990, Black migrants from the South to the North had higher educational attainment than Black nonmigrants in the South (Tolnay 1998).
Some studies used biomarkers (Beltrán-Sánchez et al. 2016; Crimmins et al. 2005; Riosmena et al. 2013; Rubalcava et al. 2008), but their biomarker variation was limited.
A GWAS is a hypothesis-free scan of the genome that estimates statistical associations between each genetic location (variant) and an outcome of interest. Estimates from a GWAS can then be used in several types of downstream analysis. Genetic correlation analysis compares the similarity of GWAS estimates for one outcome (in this case, migration) with GWAS estimates from other outcomes (here, SES and health) to assess an overall genetic correlation among the outcomes. Alternatively, GWAS estimates can be combined into a polygenic index (PGI) at the individual (respondent) level. A few studies have compared educational attainment PGIs between migrants and nonmigrants (Abdellaoui et al. 2022; Abdellaoui et al. 2019; Belsky et al. 2019; Belsky et al. 2016), but we are unaware of research performing GWAS for migration outcomes.
Migration and SES (and probably health outcomes) are distal phenotypes, suggesting that proximate variables mediate the associations of genetic traits with these outcomes. However, the presence of mediating factors does not eliminate the value of this study's unique contributions, which are discussed in the remainder of this paragraph.
For example, we randomly selected individuals among those in second-degree relative dyads by using KING (https://www.kingrelatedness.com/) to calculate the genetic relatedness of all UKB respondents, as is standard in genetic analysis.
An SNP is a genetic variation in a single base pair at a specific location in DNA.
Results of non-log-transformed migration distance are similar to our main findings (see section 2 of the online appendix).
One way to evaluate underlying mechanisms is to conduct a mediation analysis. However, in our case, a conventional mediation analysis is difficult to implement because we do not know the timing of migration. We therefore assess underlying mechanisms by decomposing genetic correlations into direct and indirect components.
See Figure A2 for the quantile–quantile plot.
To test potential mechanisms of the association between genetic variants and migration distance, we assessed whether sex and birth cohort (the 1940s, 1950s, and 1960s cohorts) moderate this relationship. We found no empirical evidence that these axes of social stratification moderate the relationship between genetic variants and migration distance (results available upon request).
Similarly, Figure A3 shows that genetic correlations between educational attainment and these health measures are also positive.
We expect health professionals to be overrepresented in the sample because the UKB recruited individuals living close to one of 22 medical assessment centers.
We identified health professionals using an employment history question in the UKB inquiring about paid jobs and apprenticeships held, in alignment with the international classification of health workers provided by the World Health Organization (https://www.who.int/publications/m/item/classifying-health-workers). We then excluded individuals categorized as health professionals and health associate professionals. Table A3 summarizes the job code in the UKB and the occupational classification.
Because there are positive and negative signs, we expect that half of the genetic covariances from within-sibling GWAS show inconsistent signs with the genetic covariances from population GWAS if the signs of genetic covariances from within-sibling GWAS are random. Therefore, the null hypothesis in this binomial test is that the probability that the signs of genetic covariances from within-sibling GWAS are consistent with those from population GWAS is 0.5.
For example, Add Health provides migration distance information and genetic data, but the sample size (N = 4,508 after quality checks) is too small to run a GWAS and genetic correlation analysis.
To create the migration distance PGI, we used the results of the logged migration distance GWAS of UKB data. Following the standard procedure to construct PGI, we clumped SNPs using Phase 3 European samples from the 1,000 Genomes Project as linkage disequilibrium reference. The linkage disequilibrium window size and a pairwise R2 threshold were set at 1 megabase (Mb) and 0.1, respectively. We did not use p value thresholding for variant selection. We calculated migration distance PGI with PRSice-2 software (Choi and O’Reilly 2019) and standardized with a mean of 0 and a variance of 1 in downstream analyses.
Add Health Wave II also provides data for adolescents, but the sample size is somewhat smaller in Wave II than Wave I.
Add Health also provides the location distance between Waves I and II and between Waves II and III. However, only small variations in the location distance exist between Waves I and II because Wave II collected data one or two years after Wave I. Further, the location distances between Waves II and III are similar to those between Waves I and III, but Wave II has fewer observations than Wave I. Therefore, we used the location distance between Waves I and III.