We test the effectiveness of a link-tracing sampling approach—network sampling with memory (NSM)—to recruit samples of rare immigrant populations with an application among Chinese immigrants in the Raleigh-Durham area of North Carolina. NSM uses the population network revealed by data from the survey to improve the efficiency of link-tracing sampling and has been shown to substantially reduce design effects in simulated sampling. Our goals are to (1) show that it is possible to recruit a probability sample of a locally rare immigrant group using NSM and achieve high response rates; (2) demonstrate the feasibility of the collection and benefits of new forms of network data that transcend kinship networks in existing surveys and can address unresolved questions about the role of social networks in migration decisions, the maintenance of transnationalism, and the process of social incorporation; and (3) test the accuracy of the NSM approach for recruiting immigrant samples by comparison with the American Community Survey. Our results indicate feasibility, high performance, cost-effectiveness, and accuracy of the NSM approach to sample immigrants for studies of local immigrant communities. This approach can also be extended to recruit multisite samples of immigrants at origin and destination.
Demographers have long relied on and driven innovations in survey methods that facilitate research into core demographic concepts (e.g., the World Fertility Surveys). The study of migration is no exception (e.g., the Mexican Migration Project, MMP). In recent years, specialized regional surveys dedicated to understanding migration have generated new evidence, study designs, and measurement techniques (e.g., the Latin American Migration Project, LAMP; the Migration between Africa and Europe, MAFE; the Mexican Family Life Survey, MxFLS; the NIDI–Eurostat Push-Pulls International Migration project). These efforts complement attempts to measure migration in population-monitoring surveys (e.g., the American Community Survey, ACS); in administrative, commercial, or other novel data sources (Caballero et al. 2018; Cesare et al. 2018); or through ethnographic or in-depth interview-based studies of migrant communities (Hagan 1998; Lee and Zhou 2015; Mountz and Wright 1996; Parrado et al. 2005).
In the United States, changes in immigrants' profiles and spatial distribution have increased the importance of these multifaceted attempts, but they also raise new challenges for researchers interested in migration and its dynamics. Immigrants in the United States are increasingly heterogeneous in their origins and demographic profiles. Among recent cohorts, Asian immigrants outnumber Mexican immigrants, with China and India representing the top two origin countries of Asian immigrants (Budiman 2020). Immigration trends from Central America (Cohn et al. 2017) and Africa (Elo et al. 2015) further increase the complexity of immigration. Within migrant sending countries, more subnational areas are sending large numbers of migrants to the United States, and compositional shifts are evident in the urban–rural gradient of migrant sending communities (Liang and Morooka 2004; Riosmena and Massey 2012). Recent cohorts of Mexican immigrants are also more likely to work outside agriculture (Garip 2012, 2016), and more than half of Asian immigrants who arrive as adults have at least a bachelor's degree (Pew Research Center 2013). This heterogeneity strongly suggests the need to collect sufficiently large samples by country of origin rather than relying on overly broad race and panethnic categories, such as “Asian” or “Hispanic.”
Additionally, immigrants in the United States are increasingly dispersed: more live in suburban areas (Lee and Kye 2016; National Academies of Sciences, Engineering, and Medicine (NAS) 2015) and new destination areas (Flippen and Kim 2015; Sakamoto et al. 2013), and fewer are concentrated in ethnic neighborhoods in the largest cities of traditional gateway areas of California, Texas, Illinois, and the Northeast. These new patterns of residential settlement reduce segregation within metro areas and across regions, with important implications for our understanding of immigrant incorporation and the role of social networks in this process (NAS 2015). The spatial dispersion of immigrants also implies that the foreign-born represent smaller proportions of local populations, making the sampling of rare immigrant population groups with conventional place-based household sampling strategies more challenging and costly. These factors limit demographers' ability to study specific immigrant groups in new destinations, where rapidly growing local immigrant communities remain understudied (Parrado et al. 2005). Compounding these concerns are high nonresponse and noninterview rates in contexts with harsh immigration policies, especially those requiring legal documentation (Van Hook et al. 2014).
Beyond the need for new approaches to collecting information, evidentiary gaps in the understanding of fundamental migration processes (such as the decision to migrate or the long-term dynamics of social and economic incorporation) have required a broad range of research designs that include longitudinal and multisited perspectives. These approaches capture the migration process and enable analyses of the linkages between origins and destinations in a migration system as well as comparisons of movers and stayers (Beauchemin 2014; Bilsborrow 2016; Fawcett and Arnold 1987). These advances have been accompanied by critical efforts to document migrant networks, which are an important part of the meso-level social structure that connects migrants at destination to individuals at origin and fosters more migration (Boyd 1989; de Haas 2010; Massey 1990). Researchers agree that migration is a network-related social process (Bashi 2007; Garip and Asad 2016) and that migrants create transnational social fields that connect origin and destination through their activities and relationships (Levitt and Glick Schiller 2004; Lubbers et al. 2020; Waldinger 2013). However, the measurement of social networks relevant to migration and the reliance on networks to recruit population-representative samples of migrants have not been a privileged domain in traditional surveys of migration.
Here, we illustrate the application of a network sampling methodology to study migration: network sampling with memory (NSM) (Merli et al. 2016; Mouw and Verdery 2012). NSM relies on migrants' social networks to efficiently recruit samples of rare populations of migrants. It incorporates a network-based probability sampling framework, a peer referral process that encourages survey participation, and an efficient and cost-effective sample recruitment strategy that reduces the screening costs associated with sampling rare populations. Further, this approach collects social network data that can improve the understanding of the mechanisms shaping migration decisions and preserving coethnic ties across interconnected local, origin, and global networks.
We apply this method to recruit a sample of a local immigrant community in a U.S. destination for the Chinese Immigrants in Raleigh-Durham (ChIRDU) Study. We evaluate NSM performance for the feasibility and practicality of collecting the required network data within a survey instrument that focuses on migration and immigration experiences. We illustrate the potential of the ancillary network data to investigate immigrant incorporation and transnationalism and the utility of these data to assess the role of peer referral embedded in NSM's link-tracing design in achieving high response rates. To assess the accuracy of the sampling method, we compare the characteristics of the ChIRDU Study sample recruited with NSM techniques with those of the ACS. The ACS is an annual survey that collects information on the demographic, social, and housing characteristics of 1% of U.S. households. We conclude with a discussion of the power of NSM and the associated network data collection for studying migration in the twenty-first century.
Networks and Migration
A migrant network perspective is rooted in the original concept of chain migration, whereby the decision to migrate is a function of a potential migrant's individual characteristics and a broader system of social linkages between migrants and nonmigrants connecting origin and destination (Boyd 1989; MacDonald and MacDonald 1964). Migration researchers have noted that friends and relatives are key sources of information, assistance, and motivation, which affect individuals' propensity to migrate (Boyd 1989; Curran et al. 2005; de Haas 2010; Dolfin and Genicot 2010; MacDonald and MacDonald 1964; Massey et al. 1993; Munshi 2020; Palloni et al. 2001; Tilly and Brown 1967). Scholars also argue that social network dynamics play a central role in migrants' social and economic incorporation in their host societies (Lancee 2012; Nee et al. 1994; Ryan et al. 2008) and that the corresponding set of ties connecting migrants to their origin communities is an important indicator of transnationalism (Haug 2008; Levitt and Glick Schiller 2004; Lubbers et al. 2020). Nonetheless, a recurring challenge in studying the role of networks in migration is the need to collect high-quality network data that measure variably meaningful links among migrants, potential migrants, and nonmigrants in origin and destination communities (Boyd 1989; Fawcett 1989; Haug 2008; Lubbers et al. 2020).
Several large community-based demographic studies—such as the MMP (Fussell and Massey 2004; Massey et al. 1987), the LAMP (Donato et al. 2010), the Nang Rong Projects (Jampaklay et al. 2007; Korinek et al. 2005), and the MAFE (Beauchemin 2015, 2018)—have contributed substantially to the collection and analyses of data on networks and migration. The strength of these studies lies in their use of survey methods to capture individual migration histories with which to compute retrospective measures of social networks connecting kin, as well as shared households and communities, in sending and receiving locations. As a significant innovation, the MAFE and Nang Rong Projects also collected data on egocentric personal networks connecting migrant respondents to friends and kin at origin using name generators motivated by specific migration-related questions and, in the Nang Rong case, augmented by a longitudinal study design (Rindfuss et al. 2004).
One useful illustration of the progress made by the MMP, LAMP, Nang Rong Projects, and MAFE is their advancement of empirical evidence in support of the theory of cumulative causation in migration (Baizan and Gonzalez-Ferrer 2016; Côté et al. 2015; Durand et al. 2001; Fussell and Massey 2004; Garip and Asad 2016; Massey and Espinosa 1997; Massey and García-España 1987; Massey and Zenteno 1999; Palloni et al. 2001; Taylor 1986, 1987; Toma 2016). This theory explains how migration emerges, builds momentum, and is sustained through the network-based institutions that facilitate the flow of resources and information about destination opportunities, reduce the risks and costs of migration, and fuel imagination about the possibilities of migration (Massey 1988; Massey, Arango et al. 1994).
Despite considerable progress in understanding the role of networks in migration, substantial questions remain. De Haas (2010) suggested that the literature on cumulative causation neglects the contingent nature of networks, which depend crucially on network members' willingness to help. He argued that this willingness explains why only some pioneer migrants initiate a network-driven system of self-sustaining migration flows (e.g., MacDonald and MacDonald 1964) and why well-established migration systems decline over time as destination communities become disengaged from ties and connections to the origin. Similarly, others have argued that migration networks should not be thought of as an “undifferentiated resource” but, instead, as structured by the labor demand of employers (Krissman 2005) or reflecting fundamental regional, class, and gender differences (Curran et al. 2005; Curran and Rivero-Fuentes 2003; Guarnizo et al. 1999; Toma and Vause 2014). Yet others have stressed the importance of better understanding the mechanisms behind network effects (Dolfin and Genicot 2010; Garip and Asad 2016), noting the downside of social capital caused by potentially exclusionary and exploitative network ties (de Haas 2010; Gold 2005; Hill 2018; Hoang 2015; Menjívar 2000; Portes 1998). Finally, researchers have called attention to the fact that migrants may need to reconstruct their networks because the disruptive act of migration can change the composition of networks postmigration (Lubbers et al. 2010; Ryan 2011) and constrain migrants' social capital (Lee 2015).
Some of these critiques reflect how the limited collection of social network data has constrained demographers' understanding of the heterogeneous role of social networks in migration. Although previous research has focused on the types and nature of migrants' networks that fuel migration behaviors (Curran et al. 2005; Curran and Rivero-Fuentes 2003; Davis et al. 2002; Palloni et al. 2001), the literature is dominated by proxy measures of migrant networks. Social ties to previous migrants have been operationalized through aggregate measures of migrant prevalence in the sending community (e.g., Massey and Aysa-Lastra 2011; Massey, Goldring, and Durand 1994; Taylor et al. 2003; Zhao 2003) or through counts in household rosters that miss other relevant social contexts (e.g., friends, family, colleagues) (Palloni et al. 2001). Similarly, proxy measures are often used to determine tie strength. For example, village members may be considered weak ties, whereas family or household members are considered strong ties (Garip 2008); alternatively, blood proximity or the generation of the family tie may indicate strong ties, whereas friendships may indicate weaker ties (Liu 2013). Absent from most migration surveys are aspects of social ties that shape migration decisions, such as strength (amount of time spent together, level of closeness, and communication frequency), type of interaction (mutual confiding and help/services rendered), and flow (resources or information shared). Reflecting on the limitations of existing data, Haug (2008:600) argued that “what is lacking is an elaborated method to collect data on social networks of migrants at relatively low cost in order to be able to investigate network structures in migration contexts.”
A fuller understanding of how migrant networks shape migration and incorporation requires more granular network data. Lubbers et al. (2010) and Vacca et al. (2020) illustrated the benefits of detailed network data on migration: they collected small-scale surveys of personal networks with up to 45 kin and friend contacts per respondent, enabling sophisticated analysis of the temporal dynamics of network change and the impact of features of immigrants' network structure (e.g., density, composition, network position) on their social incorporation. Although larger surveys must consider a trade-off between the costs (time, respondent burden, and nonresponse) involved in eliciting many network contacts and the amount of information sought per contact, these studies suggest that it is possible to collect network data that go beyond the kinship networks available in the main demographic migration studies (Lubbers et al. 2020).
Link-Tracing Designs for the Recruitment of Migrant Samples: Network Sampling With Memory
Link-tracing designs, in which new respondents are referred by previous respondents, hold several advantages for the study of migration and immigration experiences. Evidence of a friend's survey participation may reduce potential respondents' survey confidentiality concerns, perhaps making them more likely to participate. Peer referral to promote community survey participation may be particularly important for hard-to-survey and hidden populations, such as undocumented migrants; in comparison to surveys where the first point of contact is an unknown researcher, it can increase the survey credibility and the willingness of reluctant potential respondents to participate.
Samples recruited with link-tracing designs, often referred to as snowball or chain-referral samples, are generally regarded as convenience samples because of the absence of a sampling frame and the lack of a probability-based sampling strategy (Biernacki and Waldorf 1981; Sudman and Kalton 1986). Although such methods have important uses in qualitative research, allowing for case study logic rather than sampling logic (Small 2009), they are at odds with orthodox demographic sensibilities that demand generalization from samples to populations. Respondent-driven sampling (RDS) (Heckathorn 1997, 2002) has sought to make chain-referral sampling more generalizable by tracing the links in the social network of hidden populations and proposing an inference strategy for estimating population parameters (Salganik and Heckathorn 2004; Volz and Heckathorn 2008). This inference strategy relies on assumptions about the network structure and the peer-driven process of sample recruitment over the network. When these assumptions are not met, RDS estimates may be far from population parameters, and RDS estimators exhibit large sampling variance in mean estimates and large design effects.
NSM (Mouw and Verdery 2012) improves on RDS: it directly collects detailed network information as part of the survey through multiple waves of respondents' nominations and referrals (i.e., provision of contact information) and the recruitment of new respondents to improve the sampling efficiency and precision of link-tracing designs. Sample recruitment starts with the identification of a limited number of initial respondents (or seeds) known to the researcher. Seeds and subsequent waves of respondents are asked to nominate their social contacts who are target population members (referred to as alters) by providing minimally identifying information. The amount and type of minimally identifying information sought from respondents about their alters may include initials of surnames, full first names, last four digits of a cell phone number, or any other sufficiently detailed information. Whereas peers select and recruit the next wave of respondents in RDS, researchers control the sampling process in NSM. Respondents are asked to refer their nominated alters to the researcher, who will select and contact a subset for a survey interview.
NSM uses the information that respondents provide about their nominated alters (nodes in the network) to reveal the network and employs a sampling algorithm to determine which of those currently nominated should be contacted for a survey interview. The algorithm directs the sampling process to spread through the network by placing a higher sampling probability on nodes that were nominated less frequently by previously sampled respondents. NSM seeks to balance two competing interests: finding all members of the social network and sampling from the social network to yield representative insights. As the sampling progresses, the reconstructed network mapped from the minimally identifying information on nominated alters increasingly resembles the true network that links members of the target population. More information on the NSM sampling process can be found in section 1 of the online appendix.
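The weighting idea in the preceding paragraph can be illustrated with a minimal sketch. It is not the published NSM algorithm (for which see Mouw and Verdery 2012 and section 1 of the online appendix). The function name `select_next_alters` and the inverse-count weights are assumptions made purely for illustration: each nominated alter who has not yet been sampled is drawn with probability proportional to the reciprocal of the number of times that alter has been nominated so far.

```python
import random
from collections import Counter


def select_next_alters(nominations, already_sampled, k, rng=None):
    """Pick up to k not-yet-sampled alters, favoring rarely nominated ones.

    nominations: iterable of (respondent_id, alter_id) pairs collected so far.
    already_sampled: set of alter ids that must not be selected again.
    Assumption: weight = 1 / nomination count. The actual NSM weights
    are defined differently; this only illustrates the direction of the bias.
    """
    rng = rng or random.Random()
    counts = Counter(alter for _, alter in nominations)
    candidates = [a for a in counts if a not in already_sampled]
    chosen = []
    while candidates and len(chosen) < k:
        weights = [1.0 / counts[a] for a in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return chosen
```

In this toy version, an alter named by three respondents is one third as likely to be drawn as an alter named once, which pushes recruitment toward less explored parts of the revealed network.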
Research using simulations to evaluate NSM's statistical performance has yielded three noteworthy results. First, in simulated sampling based on known networks from Add Health and Facebook, NSM reduces design effects by 98.5% relative to RDS, while generating asymptotically unbiased population estimates (Mouw and Verdery 2012). Thus, at a given sample size, one can achieve the same or better statistical precision (and hence narrower confidence intervals) using NSM. Because design effects are a function of sample size, these results also imply that researchers using NSM need samples only in the hundreds rather than the thousands, as would be required by RDS for similarly precise and accurate estimates (Goel and Salganik 2010; Mouw and Verdery 2012; Spiller et al. 2018; Wejnert et al. 2012). Second, NSM is practical: the number of nominations and referrals required is not large. Mouw, Verdery et al. (2014) showed with simulations using the largest Add Health network that collecting at least six nominations and three referrals per sampled respondent does not compromise NSM's sample precision, reducing sampling efficiency only slightly relative to simple random sampling. Finally, NSM collects explicit social network data that can be concurrently used to test hypotheses about substantive migration topics (e.g., Merli et al. 2016).
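To make the sample-size implication concrete, the following sketch applies the standard relation between the design effect and the effective sample size, n_eff = n / DEFF. The RDS design effect of 80 is a hypothetical value chosen for illustration, not an estimate from this study or from the cited simulations, and the sketch treats the design effect as fixed even though, as noted above, it varies with sample size.

```python
def effective_sample_size(n, deff):
    """Effective sample size under the standard relation n_eff = n / DEFF."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return n / deff


def required_sample_size(n_srs, deff):
    """Sample size needed to match the precision of a simple random sample of size n_srs."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return n_srs * deff


# Hypothetical design effect for RDS (illustration only).
rds_deff = 80.0
# Apply the 98.5% reduction reported for NSM in the simulations cited above.
nsm_deff = rds_deff * (1 - 0.985)

# Sample sizes needed to match a simple random sample of 200 respondents.
rds_needed = required_sample_size(200, rds_deff)
nsm_needed = required_sample_size(200, nsm_deff)
```

Under these assumed values, matching a simple random sample of 200 would take a sample in the thousands with RDS and in the hundreds with NSM, which is the order-of-magnitude contrast described in the text.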
Less is known about how NSM performs in the field. In what follows, we evaluate the first field application of NSM and its data collection protocol.
The Context: Chinese Immigrants in the Raleigh-Durham Area of North Carolina
Asian immigrants are dispersing across many new destinations in the United States. In particular, the South is experiencing some of the fastest immigrant growth rates but lacks established coethnic settlements typical of immigration gateways and other, more traditional immigrant destinations (Flippen and Kim 2015). In North Carolina, Asians are now the fastest-growing population group in the state, with a documented increase of 5.1% between 2016 and 2017 (Tippett 2018). Contrary to the negative Asian–White wage difference observed elsewhere in the United States, Asian immigrants in the South have higher levels of education and hourly wages than Whites. Compared with Asian immigrants in other U.S. regions, those in the South have higher levels of education, hourly wages, and employment in professional and technical occupations (Sakamoto et al. 2013). The Raleigh-Durham area of North Carolina—which comprises Orange, Durham, and Wake counties—is home to three major universities, large university health systems as well as many IT and pharmaceutical companies, and a rapidly growing population of Chinese and Indian immigrants (Tippett 2018). However, Chinese immigrants account for only 1% of the area's residents and are spatially dispersed, posing challenges for the recruitment of sufficiently large samples using conventional probability sampling. Moreover, Chinese immigrants in this area provide an ideal population to evaluate the accuracy of NSM samples in comparison with the ACS because they are highly educated and more likely to be adequately covered by the ACS.
The ChIRDU Study: Sample, Survey, and Data
The ChIRDU Study was conducted between March 2018 and January 2019. Eligibility for participation included being born in mainland China, Taiwan, or Hong Kong; being age 18 or older; and currently residing in Durham, Orange, or Wake counties. Individuals on long-term temporary visas, such as international students, were not eligible. Sampled respondents who agreed to participate were interviewed in person, by phone, or via the web. As detailed in section 2 of the online appendix, we determined that a sample size of 200 respondents per survey mode was sufficiently large to detect significant differences between key demographic characteristics of the NSM samples and the ACS sample. Institutional review boards of Duke University and the University of North Carolina at Chapel Hill conducted an ethical review of the study.
The study's in-person and phone arms were launched first, in March 2018. Four bilingual field interviewers who were independently recruited from the Raleigh-Durham area and had been trained in interviewing techniques conducted the interviews in Chinese or English (depending on respondent's preference). Recruitment started with the selection of seven seed respondents known to the project's principal investigators or to the interviewers. Seed selection featured stratification by socioeconomic status, gender, county of residence, and birthplace. All seven seeds were interviewed in person to establish rapport and to motivate them to recruit their network alters into the study. The in-person and phone survey modes proceeded in parallel such that the same interviewer used the same mode to interview referrals from each mode.
The web survey was launched in June 2018. It was preceded by a pilot to test core content, response missingness, and usability issues. Three new seed respondents, stratified by gender and county of residence, were administered the web survey and were asked to nominate their network alters. Email survey invitations to alters selected for participation contained a unique link to a Qualtrics survey. When those invited clicked on the survey link, a web page opened with the informed consent form, a request for approval of participation, eligibility questions, and a request to complete the survey in the preferred language (simplified Chinese used in mainland China, traditional Chinese used in Taiwan and Hong Kong, or English). Each page of the web survey displayed the logos of the authors' institutions so that the survey would appear professional and engaging.
All respondents, regardless of survey mode, were invited to complete a series of questionnaires. An individual questionnaire captured respondents' sociodemographic characteristics, household composition, education, job and migration histories, future migration intentions, earnings, legal status, physical and psychological well-being, experiences with discrimination, and dimensions of acculturation. Upon completion of the individual questionnaire, respondents were asked to complete a first network roster (Network Roster A) in which they provided minimally identifying information for six alters by answering the following question: “Please provide the first initial of the last name, the first name in Pinyin and Chinese character(s) and, where applicable, the English name of 6 people you know who were born in China, Taiwan, or Hong Kong, who are 18 or older and who reside in Durham, Orange or Wake county. These are people whose name you know and who know yours and with whom you might stop and talk at least for a moment if you ran into them on the street.” This question defines the target population of our sampling approach: Chinese immigrants living in the Raleigh-Durham area. This type of question, referred to as a name generator, is a common and well-studied method of eliciting socially relevant peers with reasonably high levels of validity despite some biases toward nominating peers with whom respondents interact more frequently (Campbell and Lee 1991; Marin 2004; Marsden 1993; McCarty et al. 2019; Straits 2000).
The name generator in Roster A was followed by name interpreter questions aimed at collecting alter attributes (gender; education; country of birth; and, if born in mainland China, province of origin) and attributes of the relationships between the ego (respondent) and the alter, with a focus on the mutually exclusive relationship type (kin, friends, neighbors, or coworkers) and the strength of the tie (determined by mode, duration, and communication frequency). Respondents were then asked to provide referral information (email, phone, or WeChat) of each alter. Information on alters nominated in Roster A was shared globally in the NSM sampling process described in section 1 of the online appendix. When nominated by multiple survey modes, ties were assigned to the mode that generated the nomination first. Alters with referral information who were selected by the NSM algorithm for survey participation were sent a personalized invitation by text, email, or WeChat; this invitation was followed by up to four reminders sent to nonrespondents at four-day intervals.
To explore questions on immigrant incorporation and transnationalism, the ChIRDU survey collected two additional network rosters referring to respondents' social contacts beyond the population of interest to the NSM sampling process. Network Roster B elicited network alters with the name generator, “Please provide the first initials of three people you know who were born in the U.S. and reside in the Raleigh-Durham area.” Network Roster C elicited alters with the name generator, “Please provide the first initials of three Chinese people you know who reside outside of the U.S. including in your country of origin.” Name interpreter questions similar to those in Roster A were then asked to elicit Rosters B and C alters' attributes (with the current country of residence elicited for alters in Roster C) and attributes of the relationship with the respondent.
In the in-person and phone modes, the individual questionnaire was administered through a computer-assisted interview in Qualtrics. However, interviewers entered the three network rosters and contact information of Roster A–nominated alters on paper-and-pencil questionnaires to keep this information separate from the respondent's information, ensuring confidentiality. Full in-person interviews lasted approximately 50 minutes, whereas phone interviews were slightly shorter.
After respondents in the self-administered web survey completed the three network rosters, a new page opened alerting them to expect an email or text from the research team within 24 hours seeking referral information for the alters nominated in Roster A. Web respondents received up to two reminder emails to provide this information before the survey was considered closed (four weeks after the first invitation). A full web interview took approximately 35–40 minutes to complete. The ChIRDU Study used a number of strategies to facilitate respondents' provision of nominations and alter referral information, to encourage the participation of referred alters selected for an interview, and to preserve data confidentiality (see sections 3 and 4 of the online appendix).
Table 1 shows information on sample recruitment, the number of alters obtained in the three rosters, and the estimated cost per interview. The in-person and phone interviews were completed in September 2018, when the target of 200 completed interviews per mode was achieved. Key indicators of NSM feasibility are the number of nominations and referrals per respondent (Mouw and Verdery 2012). In the ChIRDU Study, referrals are a subset of nominations because some respondents did not provide contact information for some or all of their nominated alters. High mean numbers of nominations and referrals in in-person and phone interviews were achieved primarily through the strategies enacted to obtain them and the field interviewers' personalized rapport with the respondents. We attribute the smaller numbers of nominations and referrals provided in phone interviews to respondent fatigue because nominations and referrals were sought from respondents at the end of the interview. The ChIRDU web-based survey respondents nominated and referred the smallest number of alters. We identify a number of reasons for this. First, some web respondents reported that they were reluctant to provide nominations because of concerns about privacy and survey length; privacy concerns are not surprising, given that Americans do not trust institutions to protect their digitized personal data (Olmstead and Smith 2017). Second, a lack of resources to follow up with web nonrespondents by mail or phone prevented us from building rapport and trust with respondents. Our detailed informed consent form containing strong assurances of data confidentiality and the legitimacy of the research team was insufficient to alleviate respondents' concerns. Third, because our recruitment protocol required the survey mode of the parent and referred respondents to be the same, the lower mean number of web nominations and referrals compared with other modes meant a slow accumulation of new web nominations, a slower selection of the next wave of web respondents, and, ultimately, below-target sample size recruited through the web mode.
The cost of a completed in-person interview was the highest, at $177—approximately $50 higher than a telephone interview. In line with other studies (Sinclair et al. 2012), the cost of a self-administered web interview was considerably lower, at $84.
The Network of Chinese Immigrants in the Raleigh-Durham Area and Their Social Ties
Figure 1 depicts the sample network of Chinese in the Raleigh-Durham area revealed by NSM and generated by the network data collected in Network Roster A. This graph consists of 1,982 social ties of ChIRDU respondents to 1,644 unique local Chinese alters, for a total of 1,654 nodes in the network. Of these nodes, 97.58% are mapped to one giant connected component. The identifying information collected in Network Roster A (first name in Chinese characters and its transliteration, English first name, first initial of surname, gender, and contact information) allowed us to connect nominations shared by multiple respondents while minimizing challenges with the disambiguation of high-frequency Chinese names. The colors of the network nodes denote the NSM sample recruitment steps and the disposition of each nominated alter. Red square nodes represent the 10 seed respondents, and circle nodes represent nominations. Nominations include those who were (1) not selected by the NSM algorithm (white); (2) selected, referred, and contacted by the study team and participated (orange); (3) selected, referred, and contacted but did not respond (purple); (4) selected, referred, and contacted but refused to participate (blue); (5) selected, referred, and contacted but deemed ineligible after contact (black); and (6) selected and referred with invalid contact information (green). All nominations contribute information about the network configuration and improve the NSM sampling algorithm's efficiency.
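The share of nodes in the giant connected component can be computed directly from an edge list of respondent-to-alter nominations. The following is a minimal sketch on made-up toy data; the function name, node labels, and edges are hypothetical and are not drawn from the ChIRDU data.

```python
from collections import defaultdict, deque


def largest_component_share(edges):
    """Return the share of nodes that fall in the largest connected component."""
    # Treat each nomination as an undirected tie between respondent and alter.
    adjacency = defaultdict(set)
    for ego, alter in edges:
        adjacency[ego].add(alter)
        adjacency[alter].add(ego)

    seen = set()
    largest = 0
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search over one component, counting its nodes.
        queue = deque([start])
        seen.add(start)
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        largest = max(largest, size)
    return largest / len(adjacency)


# Hypothetical toy network: respondents r1-r3 nominate alters a1-a4.
toy_edges = [("r1", "a1"), ("r1", "a2"), ("r2", "a2"), ("r2", "a3"), ("r3", "a4")]
share = largest_component_share(toy_edges)
print(f"Share of nodes in the largest component: {share:.2%}")
```

In this toy example, two respondents are linked through a shared alter, which is the same mechanism by which matched nominations join egocentric networks into one sample network.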
Table 2 provides an illustration of the information on network ties collected in all three network rosters, each corresponding to a different social context: local ties to other Chinese immigrants in Roster A, local ties to natives in Roster B, and transnational ties to Chinese living outside the United States in Roster C. In fact, of the ties listed in Roster C, 80% are with someone in the respondent's country of origin.
Panel A of Table 2 depicts the respondent's relationship with the alter. A substantial proportion of local ties to natives are through work (.364), particularly compared with local and transnational Chinese ties. The data on relationship strength, as measured by communication frequency, are shown in panel B. Forty-one percent (6.9% + 34.1%) of the transnational ties in panel B involve daily or weekly communication, suggesting robust connections to Chinese outside the United States. Data on these cross-border ties are important, given that some researchers have argued that new forms of Chinese migration networks are emerging that are increasingly diverse in education and geography (Chan and Koh 2018; Yin 2007). These networks are linked to the maintenance of business and career opportunities in both the United States and China (Grossman 2010; Wong 2005; Wong and Tan 2018) and are based on new forms of repeated, return, and circular migration (Biao 2011; Wong and Tan 2018; Yang 2013). Finally, panel C of Table 2 shows that ties with Whites dominate the ties of Chinese in Raleigh-Durham with natives. The relative frequency and intensity of ties to particular demographic groups vis-à-vis those to other Chinese immigrants can offer clues about the broader process of social incorporation (Zhou and Liu 2016).
Survey Performance by Mode
Table 3 presents ChIRDU survey outcome rates (American Association for Public Opinion Research 2016) by interview mode computed from information on the disposition of nominated alters in Figure 1. Response rates for the in-person and phone modes were quite high: 72.6% and 69.4%, respectively. Although finding suitable comparisons is not straightforward and confidentiality concerns surrounding studies of immigration were heightened in the national political climate prevailing during data collection, our study's overall response rate falls in the upper end of the 61% to 76% range of the main U.S. nationally representative in-person household surveys of 2014 (Williams and Brick 2018). The 2003 New Immigrant Survey, which conducted in-person interviews with a cohort of 12,500 newly admitted legal permanent residents, reached a response rate of 68.6%. The response rate for the web arm of ChIRDU is 51%, lower than those achieved by the other modes. This finding is consistent with findings that web surveys achieve response rates that are, on average, 11% lower than those of other survey modes (Manfreda et al. 2008). However, the contact rate for the web survey is only slightly higher than the response rate, revealing the challenges of effectively reaching respondents in web surveys (barring mail or phone follow-up of nonrespondents). In addition, other than highlighting our affiliation with institutions of higher learning in the sender's address and message text of the invitation email, we did not make special efforts to distinguish ourselves from spammers in ways that might have helped avoid spam filters, reach the designated recipient, strengthen the legitimacy of our survey in the recipient's eyes, and increase the response rate. Despite these limitations, our web-mode response rate is higher than the mean response rate of 39.6% (SD=19.6%) in a highly cited meta-analysis of 68 web surveys across 49 studies (Cook et al. 2000).
The Role of Peer Referral in Response Rates
In Network Roster A, name interpreter questions elicited from respondents the attributes of alters nominated and referred to the study (e.g., gender, education) as well as attributes of the relationship with them. Referred alters selected by the NSM sampling algorithm were invited to take the survey. If they agreed to participate, they were asked to self-report their own characteristics during the interview. With this information, we can evaluate the reliability of proxy reports by assessing their consistency with self-reports. We find that in 97% and 76% of cases, respectively, referred alters' self-reported gender and education match their referrers' proxy reports (kappa=.94 and .59, respectively). This level of agreement gives us confidence in the quality of referrers' proxy reports on nonrespondents and enables an analysis of the effect of peer referral on the likelihood of survey response.
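Percent agreement and Cohen's kappa between self-reports and proxy reports can be computed as in the sketch below. It uses invented toy data and a hypothetical function name; it illustrates the standard kappa formula and is not the study's own code.

```python
from collections import Counter


def agreement_and_kappa(self_reports, proxy_reports):
    """Return (observed agreement, Cohen's kappa) for two paired label lists."""
    n = len(self_reports)
    # Observed agreement: share of pairs where proxy and self-report match.
    observed = sum(s == p for s, p in zip(self_reports, proxy_reports)) / n
    # Chance agreement: product of the two marginal shares, summed over categories.
    self_counts = Counter(self_reports)
    proxy_counts = Counter(proxy_reports)
    expected = sum(self_counts[c] * proxy_counts[c] for c in self_counts) / n**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical toy data: four referred alters' gender (self-report vs. referrer's proxy).
toy_self = ["F", "F", "M", "M"]
toy_proxy = ["F", "F", "M", "F"]
observed, kappa = agreement_and_kappa(toy_self, toy_proxy)
print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts agreement expected by chance, which is why it can be noticeably lower than raw percent agreement when one category is common.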
These data are inherently multilevel, such that respondents are clustered within referrers, given that some referrers are likely better than others at “recruiting” their peers into the NSM sample. We use multilevel random intercept models to assess whether any attributes of the invited individual or the referrer or any relationship-level attributes are associated with the likelihood of survey participation. These models allow us to incorporate variability between and within referrers and test whether referrer's gender and education are associated with invited alters' survey uptake.
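A random intercept logit model of this kind can be written, in one common form, as shown below. The notation is an assumption introduced here for illustration and does not appear in the study: $y_{ij}$ indicates whether invited alter $i$ referred by referrer $j$ took the survey, $\mathbf{x}_{ij}$ holds the invited alter's attributes and survey mode, $\mathbf{z}_j$ the referrer's attributes, $\mathbf{d}_{ij}$ the dyad-level (relationship) attributes, and $u_j$ the referrer-level random intercept.

```latex
% Requires amsmath for \operatorname.
\[
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0
  + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
  + \mathbf{z}_{j}^{\top}\boldsymbol{\gamma}
  + \mathbf{d}_{ij}^{\top}\boldsymbol{\delta}
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
\]
```

In this formulation, a test of $\sigma_u^2 = 0$ corresponds to asking whether there is unexplained variation in response between referrers.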
Table 4 shows the log odds of taking the survey as a function of the survey mode alone (Model 1), the survey mode and either the invited alter's or the referrer's characteristics (Models 2 and 3), and survey mode and both the invited alter's and the referrer's characteristics (Model 4). In Model 4, the odds of responding to the web survey relative to the in-person survey (base category) are approximately 75% lower (with an odds ratio of exp(–1.39) = 0.25; p=.08); the odds of response to a phone interview are indistinguishable from the odds of an in-person interview (p=.11). The invited alters' and the referrers' attributes are not significantly associated with the probability of response, with the exception of the referrer's education. When referrers have more than a college education, invited alters have lower odds of response (exp(–1.25) = 0.29, p=.06) than when referrers have a high school education or less. As further evidence of the appropriateness of our multilevel modeling strategy, there is unexplained variation at the referrer level (p < .001 across all models) even after we control for demographic attributes of both the invited alter and referrer.
Table 5 shows the log odds of taking the survey as a function of homophily on the gender and education of the invited alter–referrer dyad and of their relationship attributes: strength (referrer-reported contact frequency) and social context (referrer-reported as friend, relative, coworker, or neighbor). All models control for survey mode (results not shown) because it is a meaningful determinant of survey uptake across models. The model with the largest explanatory power is the full model (Model 7), which includes both individual and dyad-level attributes. This model suggests that the likelihood of response is significantly associated with the similarity of the invited alter–referrer dyad and attributes of their relationship but not with the invited alter's individual attributes. Compared with dyads having different education levels, the odds of survey response are higher when the referrer and invited alter both have a high school education or less (exp(4.68) = 107.77; p < .01), but the odds of response are smaller when they are both college educated (exp(–1.51) = 0.22; p=.09). Chinese immigrants with low education may be similar across other characteristics related to the likelihood of response, whereas college-educated individuals represent a more heterogeneous group. Communication frequency also matters: invited alters in dyads who rarely communicate (once or twice per year) are less likely to respond than those in dyads with frequent communication (at least weekly; exp(–0.94) = 0.39; p=.07). When referrer and invited alter dyads are neighbors, the odds of responding to the survey are two thirds lower (exp(–1.12) = 0.33; p=.08) than when they are friends. Overall, the results of our multilevel models suggest meaningful peer effects on response likelihood. Peer referral is most effective if the referrer and referred alter communicate frequently or when both belong to homogeneous education groups.
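The odds ratios quoted for Tables 4 and 5 are the exponentiated log-odds coefficients. The short sketch below reproduces that conversion for the coefficients reported in the text; the dictionary labels and function names are ours, chosen only for illustration.

```python
import math

# Log-odds coefficients as quoted in the text (Tables 4 and 5).
coefficients = {
    "web vs. in-person": -1.39,
    "referrer more than college vs. high school or less": -1.25,
    "both high school or less vs. different education": 4.68,
    "both college educated vs. different education": -1.51,
    "rare vs. frequent communication": -0.94,
    "neighbor vs. friend": -1.12,
}


def odds_ratio(log_odds):
    """Exponentiate a log-odds coefficient to obtain an odds ratio."""
    return math.exp(log_odds)


def percent_change_in_odds(log_odds):
    """Percentage change in the odds implied by a log-odds coefficient."""
    return 100.0 * (math.exp(log_odds) - 1.0)


for label, b in coefficients.items():
    print(f"{label}: OR = {odds_ratio(b):.2f}, "
          f"change in odds = {percent_change_in_odds(b):+.0f}%")
```

An odds ratio below 1 corresponds to a negative percentage change in the odds, so a coefficient of –1.39 implies odds roughly three quarters lower than in the base category.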
Assessing the Representativeness of NSM Samples: The ChIRDU Versus the ACS
To evaluate the extent to which NSM can generate samples for accurate population description and representation, we now turn to the comparison of key demographic characteristics of the ChIRDU sample with those of the ACS. We restrict the ACS comparison sample to those who were residents of Durham, Orange, and Wake counties of North Carolina; were born in China, Taiwan, or Hong Kong; were not U.S. citizens at birth; were at least 18 years old; and were not enrolled in school at the time of interview. We use the combined five-year 2014–2018 ACS sample files (Ruggles et al. 2020). Comparisons of ChIRDU and ACS estimates are reported in Table 6, with means and standard errors grouped by mode and source. Our comparisons focus on estimating group proportions for six demographic characteristics: gender, age, marital status, education, U.S. citizenship, and length of time in the United States. All estimates from the NSM and ACS samples are weighted. We calculate ACS estimates, standard errors, and confidence intervals using the ACS person-level replication weights available in the ACS PUMS file and using the successive differences replication method described in the ACS design (IPUMS USA 2019). We compute weights and standard errors for the ChIRDU proportion estimates using a bootstrap resampling approach similar to those proposed by Thompson (2020) and Gile and Handcock (2015). These calculations rely on the topology of the observed sample network in Figure 1 to select many resamples using the NSM sampling protocol used in the survey. Additional details on the ChIRDU sampling weights and standard errors are provided in sections 5 and 6 of the online appendix.
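For readers unfamiliar with the successive differences replication method, the sketch below shows the usual form of the calculation for a weighted proportion: the estimate is recomputed under each set of replicate weights, and the variance is 4/R times the sum of squared deviations from the full-sample estimate, where R is the number of replicates (80 in the ACS PUMS). The data, the four toy replicates, and the function names are hypothetical.

```python
import math


def weighted_proportion(indicator, weights):
    """Weighted share of cases for which the indicator is true."""
    return sum(w for x, w in zip(indicator, weights) if x) / sum(weights)


def sdr_standard_error(indicator, full_weights, replicate_weights):
    """Return (estimate, standard error) using successive differences replication."""
    estimate = weighted_proportion(indicator, full_weights)
    squared_deviations = [
        (weighted_proportion(indicator, rw) - estimate) ** 2
        for rw in replicate_weights
    ]
    # SDR variance: 4 / R times the sum of squared replicate deviations.
    variance = 4.0 / len(replicate_weights) * sum(squared_deviations)
    return estimate, math.sqrt(variance)


# Hypothetical toy data: four persons, full-sample weights, four replicate weight sets.
toy_indicator = [1, 0, 1, 0]
toy_full = [10, 10, 10, 10]
toy_replicates = [
    [12, 8, 10, 10],
    [8, 12, 10, 10],
    [10, 10, 12, 8],
    [10, 10, 8, 12],
]
estimate, se = sdr_standard_error(toy_indicator, toy_full, toy_replicates)
print(f"estimate = {estimate:.3f}, SE = {se:.3f}")
```

With real ACS data, the full-sample and replicate weights would come from the person-level weight variables in the PUMS file rather than from hand-built lists.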
After applying weights, we use t ratios to test differences (1) across ChIRDU sample estimates grouped by mode, (2) between each ChIRDU mode and ACS estimates, and (3) between all ChIRDU modes combined and ACS estimates. First, comparing across pairs of ChIRDU samples by mode, we find the largest gaps between in-person and phone samples (columns 2 and 3): proportionately more in-person respondents are younger (more in the 30–39 age category and fewer in the 60+ category) and more educated (fewer with just college and more with more than college), and proportionately fewer report U.S. citizenship than in the phone sample (t ratios > 1.96).
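A t ratio for the difference between two estimated proportions can be sketched as follows. The formula assumes the two estimates come from independent samples, which is our simplifying assumption for illustration; the numbers and the function name are invented and do not come from Table 6.

```python
import math


def t_ratio(estimate_a, se_a, estimate_b, se_b):
    """t ratio for the difference of two independent estimates."""
    # The variance of a difference of independent estimates is the sum of variances.
    return (estimate_a - estimate_b) / math.sqrt(se_a**2 + se_b**2)


# Hypothetical proportions and standard errors for two samples.
t = t_ratio(0.50, 0.03, 0.40, 0.04)
significant = abs(t) > 1.96  # two-sided test at the 5% level
print(f"t ratio = {t:.2f}, exceeds 1.96 in absolute value: {significant}")
```

The comparison uses the absolute value of the t ratio because a difference in either direction counts as a departure between the two samples.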
Second, comparing each ChIRDU sample mode with the ACS, we find that proportionately fewer people are 60+ and report U.S. citizenship in the in-person sample than in the ACS (t ratios > 1.96). Compared with the ACS, proportionately more people in the phone sample have a college education, and proportionately fewer have more than a college education (t ratios > 1.96). Proportionately fewer respondents in the web sample (compared with the ACS) have high school education (t ratio > 1.96). Although this result is plausible given the digital divide based on educational attainment and income, the web sample size is smaller than the threshold required to detect moderate differences (online appendix, section 2). Thus, we have insufficient power to draw meaningful conclusions from the difference between the web sample and the ACS.
Finally, when all modes are combined, the gender, age, and education distribution of the ChIRDU sample is strikingly close to that of the ACS: the only significant differences are higher proportions who are college educated and lower proportions reporting U.S. citizenship in the ChIRDU sample compared with the ACS. The first difference is driven by the greater proportion college educated and the lower proportion with more than a college education in the phone arm of our study relative to the ACS, which could be due to the misreporting of educational credentials during phone interviews. The negative sign of the ChIRDU–ACS difference in U.S. citizenship is consistent with studies finding higher coverage of naturalized citizens in the ACS compared with administrative data (Van Hook and Bachmeier 2013). Nonetheless, the overall comparison between the ChIRDU and the ACS shows striking similarity, giving us confidence in the representativeness of the ChIRDU sample recruited with NSM techniques.
Discussion and Conclusions
Immigration to the United States has diversified in terms of place of origin and demographic profile. Immigrants are now more spatially dispersed, and the number of immigrants who spend their economic and social lives across international borders is an important aspect of migration. Concurrent with this changing portrait of immigration, accurate descriptions of immigrant origin groups and meaningful comparisons across them have faced new challenges. These groups are often too rare as a proportion of the total population or too spatially dispersed to be sufficiently covered in standard probability sampling designs, barring high screening costs required to ensure the inclusion of a minimum number of group members in analytic samples for inference. At the same time, studies of migration and immigration processes have highlighted that migrant networks facilitate decisions to migrate or to return home, social integration in the receiving society, and the maintenance of transnational ties. Yet, the measurement of social networks relevant to these processes has not been a privileged domain of large-scale surveys focusing on migration. A recurring challenge in investigating the heterogeneous role of networks in migration has been the collection of high-quality data that measure the kin and nonkin ties between migrants and potential migrants, return migrants, and nonmigrants in origin communities and with natives in destination communities. Studies that focus on small samples of respondents and their personal networks have highlighted the trade-off between the significant costs (respondent fatigue, attrition, and nonresponse) of studying personal networks that span multiple social domains and the advantages offered by large samples with a limited number of network indicators.
This article describes a network-based sampling approach—network sampling with memory—that relies on social networks to recruit samples of rare populations of migrants and collects ancillary data that allow researchers to explore the role of networks in migration. We applied NSM techniques with three modes of data collection (in person, phone, and web) to recruit and survey a sample of the rapidly growing Chinese immigrant community of North Carolina for the Chinese Immigrants in Raleigh-Durham (ChIRDU) Study. Our aims were to illustrate the feasibility of collecting detailed network data as part of a broader migration survey, to demonstrate the role of peer referral in response rates, and to test NSM's ability to cost-effectively recruit a representative sample of a rare population of immigrants. We demonstrated that NSM is a practical and cost-effective means of generating large-scale, population-representative samples of rare immigrant groups for inference. Although obtaining contact information for link-tracing referrals can be a slow and complex process (Beauchemin and Gonzalez-Ferrer 2011) and low response to surveys poses well-known challenges (Tourangeau and Plewes 2013), we show that a combination of known models of survey participation with the peer referral element of link-tracing designs can yield sufficiently large numbers of nominations and referrals and high response rates to recruit a sample that accurately describes the characteristics of the population of interest when compared with the ACS. In the case of the ChIRDU Study, we chose to sample a population that was likely to have good coverage in the ACS so that we could compare selected demographic measures between the two surveys as a test of NSM. Future researchers might use NSM to study rare or hidden populations that do not have good coverage in public data or for which the costs involved in screening interviews make conventional sampling techniques prohibitively expensive.
The NSM approach has several advantages for the study of networks and migration. First, it directly measures migrant network characteristics. We used multiple name generators and name interpreters to elicit lists of a respondent's ties grouped according to the context of the immigration experience—local coethnic ties at destination, local ties with natives at destination, and global coethnic ties, including with the country of origin—and the type and intensity of these interactions. In future analyses of the ChIRDU individual and network data, we will explore the operations of immigrant networks in a context of immigrants' spatial dispersion, the process of social incorporation in new destinations, and the role of communication networks in sustaining transnational ties. For example, the name generator that elicits global ties is particularly suited to considering the social and economic activities of transnational Chinese migrants—especially highly educated professionals, such as those who, similar to the Chinese community in Silicon Valley (Wong 2005), dominate the profile of the Chinese community in the Raleigh-Durham area and contribute to the increasing geographic and educational diversity of Chinese immigrants to the United States. Second, the link-tracing nature of the NSM sample relies on minimally identifying information to generate ties across sampled and unsampled alters linked by multiple nominations across egocentric networks. Similar to other studies that have collected information on ties that extend across respondents' personal networks with alters in common (e.g., Vacca et al. 2020; Vacca et al. forthcoming), our approach allows researchers to trace out a sample of a network. This sampled network can then be completed with model-based approaches to impute missing network ties, such as those that rely on exponential random graph models to reconstruct the features of the true network (Gile and Handcock 2017; Handcock and Gile 2010; Smith 2012). 
These approaches can, under certain conditions, provide valid estimates of the network's structural properties (Smith et al. 2022), which can then be related to measures of immigrant incorporation.
Although our present application of NSM techniques is limited to a local sample, this approach can be extended to the recruitment of binational or multisited samples, starting from the community of origin or destination and soliciting cross-border nominations. For example, among the three name generators used in the ChIRDU Study, the one eliciting global ties captured links to migrant-sending locales, suggesting the feasibility of tracing links back to the area of origin from lists of contacts elicited from migrants at destination. Another illustration is the use of a simple snowball technique in the 2010 Network Survey of Immigration and Transnationalism, which surveyed 600 members of a Mexican migrant community with network nominations that spanned three regions: the Raleigh-Durham area of North Carolina; Houston, Texas; and Guanajuato, Mexico. The sampling started with 17 seed immigrants in North Carolina and Houston, relied on respondents' referrals to residents of these sites from the same origin, and extended the sample to recruit network members in Guanajuato. The study demonstrated the feasibility of obtaining a large number of nominations and referrals from respondents to recruit multisited samples, with the significant advantage of collecting network data directly from migrants at destination linked to the community of origin (Mouw, Chavez et al. 2014; Verdery et al. 2018). These illustrations are promising and call for further investigations of the benefits, costs, and trade-offs of sampling from global networks to explore the operation of migrants' transnational ties.
The domain of NSM is not just immigrants but any networked population. NSM can have broad applicability to other rare or hard-to-survey populations connected via a social network when survey respondents are willing to answer questions about their networks and provide referrals to randomly selected network contacts. Earlier studies have successfully collected this type of network information among stigmatized populations, such as methamphetamine users in New York City (Dombrowski et al. 2012) and female sex workers in China (Merli et al. 2015; Weir et al. 2012; Yamanis et al. 2013). It is also possible to maximize respondents' anonymity in network rosters, while making network matching techniques feasible using schemes to encode identifying information (e.g., Dombrowski et al. 2012).
Our evaluation of the feasibility of data collection methods that rely on peer referral and respondents' social networks is also relevant to those interested in seeking alternatives to conventional data collection schemes (Brick 2011). Response rates to government and privately sponsored household surveys for social science research have been falling in high-income countries (Czajka and Beyler 2016; De Leeuw and De Heer 2002), including the United States (Tourangeau and Plewes 2013), with reasons attributed to growing technological (call screening) and physical (gated entryway) barriers and a lower propensity of potential respondents to participate when contacted by researchers (Miller 2017). The costs of attempting to raise response rates, minimize the risk of bias in estimates, and preserve research designs are high (Curtin et al. 2005; Tourangeau and Plewes 2013). The identification of new approaches that increase respondents' motivation to participate in surveys is central to developing better approaches to increase survey response (Tourangeau and Plewes 2013). Network-based approaches in which the interaction between the interviewer and respondent is filtered by peer recruitment are promising, and the ancillary network data collected by such methods can be used to study network-based social processes.
The data collection described here was made possible by NICHD grant R21HD086738 to Duke University and the University of North Carolina (Merli and Mouw MPI), with additional support for a pilot study to Merli via the Duke Population Research Center (NICHD grant P2CHD065563). We thank Francesca Florey-Eischen for expert project management; Chunyu Xiao, Jun Xing, Hui Yan, and Nan Zhou for their enthusiasm and hard work as field interviewers; and Sara Curran, Emilio Parrado, Herbert Smith, Ashton Verdery, and four anonymous reviewers for constructive comments.
This point is best illustrated by an example: consider the Immigrant and Intergenerational Mobility in Metropolitan Los Angeles (IIMMLA) survey, which collected data from a sample of 2,822 first- and second-generation immigrants. Participants were screened with random digit dialing (RDD), which required dialing 263,783 telephone numbers to recruit 10,893 potential respondents (Rumbaut et al. 2006:450). This suggests high monetary and time costs required to recruit an immigrant sample even in an immigrant destination community as large as Los Angeles. Furthermore, the IIMMLA collected data on all immigrant groups in the city. If a researcher wished to focus on a single origin group, the relative number of screening interviews would be even larger.
Evaluations of the statistical performance of RDS (point estimates and variance) include simulation studies on simulated networks (Gile and Handcock 2010); simulation studies on previously mapped realistic networks (Goel and Salganik 2010; Mouw and Verdery 2012); and empirical evaluations of RDS in real-world hidden populations, including comparisons with populations with known characteristics (McCreesh et al. 2012; Merli et al. 2015; Verdery et al. 2015; Wejnert 2009; Yamanis et al. 2013).
Evidence of the success of web-based network sampling among the general population is limited to one internet-based RDS study (Schonlau et al. 2014). Thus, the inclusion of a web data collection mode in the ChIRDU Study also presents an opportunity to learn about the benefits and costs of web network sampling in comparison with in-person and telephone modes of data collection.
Interviewers were instructed to elicit the primary context of the relationship to each alter.
Because the odds of response to the in-person survey (with all else held constant) are exp(3.21) = 24.78 and the odds of response to the web survey are exp(–1.39) × exp(3.21) = 6.17, the odds ratio is 6.17 / 24.78 = 0.25.
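The same arithmetic can be written compactly to show why the baseline odds cancel. The symbols $\beta_0$ (the intercept, 3.21) and $\beta_{\text{web}}$ (the web-mode coefficient, –1.39) are labels we introduce for illustration.

```latex
\[
\mathrm{OR}_{\text{web vs. in-person}}
  = \frac{\exp(\beta_0 + \beta_{\text{web}})}{\exp(\beta_0)}
  = \exp(\beta_{\text{web}})
  = \exp(-1.39) \approx 0.25
\]
```

Because the intercept appears in both numerator and denominator, the odds ratio depends only on the web-mode coefficient, whatever the baseline odds are.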