We study the economic assimilation of childhood immigrants to the United States. The linguistic distance between English and the predominant language in one’s country of birth interacted with age at arrival is shown to be closely connected to occupational sorting in adulthood. By applying big-data techniques to occupations’ detailed skill requirements, we provide evidence that childhood immigrants from English-distant countries who arrived after the primary school years reveal comparative advantages in tasks distinct from those for which (close to) Anglophone immigrants are better suited. Meanwhile, those who arrive at younger ages specialize in a bundle of skills very similar to that supplied by observationally equivalent workers. These patterns emerge even after we net out the effects of formal education. Such findings are compatible with the existence of different degrees of complementarity between relative English-learning potential at arrival and the acquisition of multiple capabilities demanded in the U.S. labor market (math/logic, socioemotional, physical, and communication skills). Consistent with the investment-complementarity argument, we show that linguistic distance and age at arrival also play a significant role in the choice of college major within this population.
Little is known about the link between immigrant assimilation and the ability to accumulate the different skills required and rewarded in the U.S. labor market.1 This dearth is particularly surprising given the growing literature recognizing that workforce skills and abilities are multiple in nature and have sizable direct effects on workers’ wages (Heckman and Mosso 2014; Heckman and Rubinstein 2001; Heckman et al. 2014; Knudsen et al. 2006), as well as the central role of comparative advantages for understanding the economic progress of individuals (Roy 1951). We fill this knowledge gap in the present article.
We bypass the central difficulty in assessing immigrants’ multiple skills assimilation by assembling suitable data from the Dictionary of Occupational Titles (DOT) and Occupational Information Network (O*NET), merged with the 1990 and 2000 U.S. Censuses and the American Community Surveys (ACS) 2009–2013 (Ruggles et al. 2015). The DOT and O*NET offer detailed characterization of skill requirements for multiple occupations and are based on ratings by trained occupational analysts regarding how jobs are performed in establishments across the nation. From these occupational skill requirements, we carefully compute four broad categories of worker skills: communication, math/logic, socioemotional, and physical. We focus our empirical analysis by pooling these data for cohorts of childhood immigrants who arrived in the United States between 1960 and 2005. We also use novel data on the Levenshtein distance of one’s country of origin toward the English language to classify immigrants’ linguistic origins.2 Finally, we capitalize on the insight of Lenneberg (1967) and other contributions to the child development literature (Birdsong 2006; Pinker 1994) that point to mid-childhood as a major turning point in linguistic, cognitive, and behavioral development to examine immigrants arriving in the United States at different ages.
Our empirical strategy is to examine these data using a difference-in-differences (DID) approach. The significance of initial linguistic endowment in child migrants’ acquisition of multiple skills is identified by comparing the difference in workplace skills of those who arrived at relatively older ages (10 or older) versus younger ages from linguistically distant countries, relative to the same age difference for child migrants whose mother tongues are closer to English.3 We find that children who arrive at an earlier age (before 10) employ skills in adulthood that are similar to those of observationally equivalent immigrant workers from (close to) Anglophone origins. This is true for the communication skill requirements of occupations and also for the math/logic, socioemotional, and physical skill dimensions. In contrast, those who arrive at relatively older ages, and whose mother tongues are distant from English, specialize in occupational tasks that are markedly different from those in which younger and (close to) Anglophone immigrants specialize. Childhood immigrants born in English-distant countries arriving after the primary school years employ more physical skills as adults (approximately 0.2 of a standard deviation on that skill distribution) and fewer communication, math/logic, and socioemotional skills (0.3, 0.2, and 0.2 of a standard deviation, respectively). Importantly, when focusing on relative differences in skills, we find that late arrivals from English-distant countries develop a comparative advantage in math/logic, socioemotional, and physical skills relative to communication skills.
These findings are compatible with the idea that a child’s stage of linguistic development at arrival (relative to other dimensions of development) is one of the main drivers of differential investments in math/logic, socioemotional, and physical capabilities, in addition to adulthood language abilities. Our contribution highlights the multidimensional aspects of human capital and the complementary nature of investments in their acquisition—a concept that lies at the heart of influential work by Cunha and Heckman (2007, 2008). In a multistage human capital development process, the relative initial endowment of a given skill has direct impact over the costs and benefits of investments in the further acquisition of workplace skills, above and beyond the acquisition of formal educational credentials. Hence, faced with such technological restrictions, some U.S. immigrants have found it more attractive to invest in brawn rather than in brain—in mathematics rather than poetry, in science rather than history, in logical reasoning rather than persuasion, and so forth. Ultimately, this bundle of skills determines their comparative advantages and specialization as adults, occupational trajectory, and economic assimilation.
Compared with younger arrivals or those from (close to) Anglophone origins, child migrants from English-distant countries who immigrated after middle childhood find it difficult to learn a second language, having been exposed to numeracy training and school-level socialization in their home countries. These migrants likely find it more reasonable to specialize in skills applicable to more physical occupations and simultaneously refrain from investing heavily in the accumulation of communication skills relative to the development of math/logic and socioemotional skills. Quite reasonably, these effects are reduced to approximately one-half their original size when we account for the influence of formal education attainment. Nonetheless, we can still assert that linguistic distance and age at arrival do have significant interactive and independent impacts over adult skill accumulation even after holding constant individual schooling.
To more directly examine the connection between background and specialization, we analyze a subsample of childhood immigrants who graduated from college while in the United States. First, we show that the workplace skills accumulation patterns and comparative advantage development seen for all immigrants are also present among these highly educated (and U.S.-trained) immigrants. Second, by capitalizing on the information about college major choices collected in the most recent waves of the ACS, we identify that college-educated childhood immigrants from English-distant countries who arrived in the United States after the primary school years are relatively more likely to major in science, technology, engineering, and math (STEM) fields and less likely to major in the social sciences or humanities when compared with (closer to) Anglophone immigrants. This finding is again consistent with our reasoning that individuals facing different costs for the acquisition of additional skills end up directing effort toward specialization in systematic ways. This is likely to be true even before college, when teenagers choose vocational training or courses to take while in high school or when they decide how much effort to put into studying different disciplines, for example.4
Our work also complements very recent contributions to the population studies literature on occupational segregation and workplace concentration (Andersson et al. 2014; del Rio and Alonso-Villar 2015; Stromgren et al. 2014). We believe that our genuine contribution is the multidimensional characterization of workplace skills of jobs that immigrants sort into, rather than occupation labels per se. Occupation labels are not sufficiently segregated, and they do not lend themselves to a natural ordering. Moreover, skill dimensions are more informative, particularly if the goal is to inform the policy-oriented debate on the types of human capital investments that should be enhanced or incentivized to aid the assimilation of immigrants. Throughout the United States, many schools are facing the challenge of better preparing immigrant students in multiple areas of the curriculum. There is also a recent trend of school authorities attaching value to helping English learners maintain skills in their first language while also becoming proficient in English (Council of Chief State School Officers 2012). In documenting the skills assimilation of childhood immigrants, we demonstrate how the skills among those who immigrate as children may determine their economic success and may ultimately influence the shape of the distribution of earnings in the population. In other words, we see our approach as focusing on the fundamental elements of immigrants’ economic assimilation.
Background and Related Literature
Assimilation theory and other empirical studies have suggested that acquisition of language fluency accounts for immigrants’ faster wage growth compared with native-born workers.5 Indeed, Borjas (2015) used data from the 1970–2010 decennial censuses and traced the patterns of wage convergence to those of English fluency.6 Cohorts that entered prior to the 1980s experienced a 12 percentage point increase in English fluency during their first decade in the United States, whereas the cohorts arriving after the 1980s showed only a 4 percentage point improvement. Concomitantly, the earnings disadvantage of immigrants (relative to U.S.-born workers) who entered the United States before the 1980s narrowed by 15 percentage points in their first two decades after arrival, whereas those who arrived after the 1980s have a much lower rate of wage convergence.
The estimation of returns to English-language fluency among immigrants is intuitively appealing; however, it poses many empirical difficulties. First, language skills may be correlated with confounding factors, such as unobserved innate ability, parental background, or other accumulated (and unmeasured) skills. Second, typical measures for language skills are coarse and are likely subject to measurement error. For example, in the U.S. Census, the measure employed is a self-assessment of the respondent’s ability to speak English: very well, well, not well, or not at all. Previous studies have addressed the problems of endogeneity and measurement error of language skills by employing instrumental variable strategies. In particular, Bleakley and Chin (2004) turned to the extensive literatures in cognitive sciences and psychobiology for the well-documented relationship between age and language acquisition facility. Their creative identifying variation for the wage return to language skills is an interaction of age at arrival with Anglophone versus non-Anglophone country of origin. The implementation of this strategy essentially attempts to identify the language skills (labor productivity) from the difference in the age-at-immigration gradient in English proficiency (wages) for immigrants born in English-speaking and non-English-speaking countries. Moreover, the interaction of a child migrant’s age at arrival and Anglophone origin has also been shown to influence adult attainment in schooling, which itself has direct effects on wages (as recognized by Bleakley and Chin 2004). For example, Beck et al. (2012) computed that the probability of being a high school dropout increases significantly each year for childhood immigrants who arrive after age 8.
Third, and central to our contribution, language proficiency in adulthood is only one of the many skills that individuals acquire over their lives abroad and while in the U.S. education system and labor markets. By focusing only on English proficiency, we are missing a more complete picture of the assimilation process. We also claim that this is not resolved by looking at aggregated measures of human capital, such as years of schooling, because these measures do not capture the different types of skills developed within school. In essence, the multidimensional and dynamic human capital investment framework on which we base our study argues against attributing differences in earnings between younger and older (at arrival) Spanish-speaking Dominican immigrants relative to the difference among Anglophone Jamaicans exclusively to language fluency in adulthood or formal educational attainment. According to our framework, language-acquisition facility (relative to other skills) is an endowment that operates as a catalyst for learning other capabilities. Therefore, an examination of how adult outcomes such as wages relate to age at arrival and Anglophone origin should reflect multiple skills and not language proficiency only.
As in the work of Bleakley and Chin (2004) and Beck et al. (2012), we base our framework on compelling empirical evidence in economics, developmental psychology, and neurobiology, which makes the case that different skills are malleable at different ages and points to critical periods in child development (Knudsen et al. 2006). The child development literature itself proposes a major turning point during middle childhood (between ages 7 and 11) in cognitive skills development. Jean Piaget, one of the first psychologists to study child cognitive development, called this the “concrete operational stage,” when children begin thinking logically about concrete events but still have difficulty with abstract concepts (Ginsburg and Opper 1988). Therefore, during primary school ages, a childhood immigrant’s cognitive and socioemotional skills are crucially shaped, providing a base for later schooling, and eventually determining which skills individuals elect to develop before and during accrual of experience in the labor market.7
To interpret our findings, we invoke the conceptual model of skill formation developed in Cunha and Heckman (2007, 2008), who built a model recognizing multiple stages of childhood, with inputs at different stages being complements (i.e., earlier investments affect the attractiveness of later ones). Children who acquire greater stocks of (cognitive and socioemotional) skills in early childhood are more efficient at learning in later childhood. In other words, skill begets skill. Consistent with the framework, we hypothesize that age at arrival and linguistic distance to English have an interactive effect on the skill formation of child migrants. Those who arrive prior to middle childhood have the highest rate of assimilation, and the age differential is greatest for those who come from linguistically far countries of origin. In particular, those with a lower level of initial language-learning potential who arrive in late childhood or as adolescents end up with relatively lower levels of communication skills as adults and, given the circumstances of the processes involved in multiple skill acquisition, invest relatively more in skills whose acquisition is not complemented by language. In principle, we would expect that those exposed to international education systems before arriving in the United States would necessarily have an endowment tilted away from English but not necessarily so in terms of other skills, such as numeracy or socialization ability.
In examining the data through these lenses, we highlight a broader mechanism for immigrants’ earnings assimilation that goes beyond language skills in adulthood. Incorporating these nuances is important. Our results provide evidence that immigrants fine-tune their strategies with respect to investments in multiple skills acquisition, and these results have implications for both assimilation theory and immigration policy. With the doubling in the number of children in immigrant families (those under age 18 who are foreign-born or who live with at least one foreign-born parent) since 1990 (Mather 2009), it is important to focus assimilation studies on human capital accumulation trajectories. These trends also clearly put the spotlight not only on immigration policy reforms but also on the U.S. public education system that has to absorb these child migrants.8 If economic assimilation of migrants is to be expected, it will have to come either from child migrants’ skills converging to the levels observed among natives or from dramatic changes in how specific skills are rewarded in the U.S. economy.
Occupational Skills Databases
Occupational information in the Dictionary of Occupational Titles (DOT) and in the Occupational Information Network (O*NET) is the result of comprehensive studies of how jobs are performed in establishments across the nation and is collected from multiple sources: surveys filled out by workers performing the job, members of trade and professional associations, and site visits by trained occupational analysts. We employ big-data techniques (specifically, principal component analysis) to compute job skill measures as composites of such data, and merge them with worker-level information using a crosswalk matching DOT and O*NET occupation codes to 1990 census occupation codes from the National Crosswalk Service Center included on all IPUMS data sets.9
The period covered in our study overlaps well with occupational information from both the DOT and the O*NET. Occupation skills information in the revised fourth edition of the DOT was collected between 1978 and 1990 and published in 1991 (U.S. Department of Labor et al. 1991). The O*NET program replaced the DOT and began data collection in June 2001 (U.S. Department of Labor 2008). The O*NET 13.0 database includes data collected between 2001 and 2007. Releases earlier than version 13.0 of O*NET exist, but they mainly contain extrapolated data from the DOT.10 Thus, in using both the DOT and the O*NET, we are able to capture the occupational tasks and skills of childhood immigrants who were in the labor force during 1990–2010.
As in previous studies that used information from occupational databases, it is not possible to use all the variables capturing job skills simultaneously, because high collinearity makes precise estimation impossible. Instead, we use the textual definitions of DOT and O*NET variables and the O*NET Content Model to construct interpretable measures of worker skills in four broad categories: communication or language, math/logic, socioemotional, and physical skills. These skill indices are created using principal component (factor) analysis. The indices are constructed from the first factor and are (population-weighted) rescaled to have a mean of 10 and a standard deviation of 1 using the entire population of workers 25 to 38 years of age (including natives).11 The specific DOT and O*NET variables included for each skill measure are described in panels A and B, respectively, of Table 5 in the appendix.
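The index construction described above can be illustrated with a minimal sketch. It assumes plain principal component analysis on weight-standardized descriptors (the text mentions "principal component (factor) analysis" without further detail), and the function name `skill_index`, its arguments, and the sign convention are hypothetical choices made for illustration only.

```python
import numpy as np

def skill_index(X, weights, target_mean=10.0, target_sd=1.0):
    """First principal component of occupational descriptors, rescaled.

    X       : (n_workers, n_descriptors) ratings attached to each worker's occupation.
    weights : (n_workers,) person weights (assumed positive).
    Returns an index with weighted mean `target_mean` and weighted sd `target_sd`.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    # Weighted standardization of every descriptor.
    mu = w @ X
    sd = np.sqrt(w @ (X - mu) ** 2)
    Z = (X - mu) / sd

    # Weighted correlation matrix; eigh returns eigenvalues in ascending order,
    # so the last eigenvector is the first principal component.
    corr = (Z * w[:, None]).T @ Z
    _, eigvecs = np.linalg.eigh(corr)
    loading = eigvecs[:, -1]

    # Assumed sign convention: a higher index means higher descriptor ratings.
    if loading.sum() < 0:
        loading = -loading

    score = Z @ loading
    s_mu = w @ score
    s_sd = np.sqrt(w @ (score - s_mu) ** 2)
    return target_mean + target_sd * (score - s_mu) / s_sd
```

Applied separately to the descriptors grouped under each of the four skill categories, a routine of this kind would yield one index per category on a common scale.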
We turn to previous studies using the DOT (e.g., Bacolod and Blum 2010; Bacolod et al. 2009) for constructing indices from the DOT. For example, the measure of language skills from the DOT is constructed from variables such as gedl, with five levels to capture reading, writing, and speaking skills required to perform the job. At high gedl levels, workers are required to read literature and write critiques; at low gedl levels, workers need only write and speak simple sentences. Similarly, several variables from the O*NET Content Model are used to construct a language skill index. These relate to a worker’s developed capacities in language skills and verbal abilities and come from the survey question, “How important is _______ (e.g., the variable Oral Expression) to your current job?” Respondents rate the skill on a 1–5 scale, with 1 being “not important” and 5 being “extremely important.” At the occupation (SOC) level, each O*NET skill is a weighted average of respondents’ ratings; on average, there are 31 raters per occupation.
Several variables from the O*NET are used to construct the math/logic skills. These are the non-language-related variables categorized under Basic Skills and Complex Skills in the O*NET Content Model, and relate to a worker’s developed capacities that facilitate learning or more rapid acquisition of knowledge related to work performance. A high value on the index for these skills indicates higher usage of critical thinking, mathematics, and problem-solving when carrying out the tasks associated with that job. The DOT math/logic skills index captures the same aspects of cognition.
Our measure of socioemotional skills is constructed with variables that indicate the skills required for workers to perform their jobs in relation to people to achieve goals in the workplace. These variables indicate how aware a worker has to be of others’ reactions, adaptability, ability to work under stress, and persuasion, among others. Meanwhile, the physical skills measures from the DOT and O*NET are constructed to indicate the physical demands required for job performance. For instance, in the DOT, five levels are used to capture the degree of strength requirements as measured by the job’s involvement in standing, walking, sitting, and lifting and carrying objects.
Measure of Initial English Endowment
To capture English-language facility at arrival in the United States, we use the Levenshtein distance to measure the linguistic distance of the predominant language in one’s country of origin from English. The Max Planck Institute for Evolutionary Anthropology in Germany developed and computed this measure to study geographical diversity of languages and the historical process behind their divergence.12 The measure is based on a simple way of measuring the similarity of character strings between pairs of words with the same meaning in two different languages in their phonetic representations (Levenshtein 1966).
The implementation is based on a specific phonetic alphabet that employs characters within standard ASCII to represent common sounds. This method uses 41 symbols representing 7 vowels and 34 consonants. The contrast between words is the number of sounds that have to be substituted, added, or removed to transform one word into the other (Holman et al. 2011). The words used in the approach are taken from a list of 40 words that are common in nearly all the world’s languages, including parts of the human body or expressions for common things in the environment (see Swadesh 1952). It is the average dissimilarity within this set of words that is taken to be our measure for the linguistic distance between a given language and English.13 Because lexical similarity may be influenced by chance (Bakker et al. 2009), such as an overlap in the phoneme inventories, this quantity is normalized by using the average disparity of all N(N − 1)/2 pairings of words with different meanings.
In essence, this measure of distance (LDND) gives an approximation of the number of cognates between English and another language. More cognates indicate that languages are closer to having common ancestries. Therefore, a smaller Levenshtein distance indicates a higher probability that a language shares characteristics with English and is likely correlated with the ease of acquiring English as a second language. This measure is discussed in more detail in Isphording and Otten (2014).
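The computation described above can be sketched as follows. This is an illustration under stated assumptions, not the institute's implementation: each word-pair distance is divided by the length of the longer word, different-meaning pairs are taken as word i in one list against word j in the other for i < j (which gives N(N − 1)/2 pairings), and the ratio is multiplied by 100 to match the magnitudes reported in this article. The function names are hypothetical, and all words are assumed to be non-empty.

```python
from itertools import combinations

def levenshtein(a, b):
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

def ldn(a, b):
    """Edit distance divided by the longer word's length (assumed normalization)."""
    return levenshtein(a, b) / max(len(a), len(b))

def ldnd(words_a, words_b):
    """words_a[i] and words_b[i] are phonetic transcriptions of the same meaning i."""
    n = len(words_a)
    same = sum(ldn(words_a[i], words_b[i]) for i in range(n)) / n
    pairs = list(combinations(range(n), 2))  # N(N - 1)/2 different-meaning pairings
    diff = sum(ldn(words_a[i], words_b[j]) for i, j in pairs) / len(pairs)
    return 100 * same / diff
```

Because the denominator is the average distance between unrelated words, values near or above 100 indicate that same-meaning words are no more alike than words paired at random.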
Figure S1 in Online Resource 1 provides information on the distribution of the measure of linguistic distances for the countries of origin represented within the IPUMS samples. For example, English-speaking countries such as Canada and New Zealand have LDND = 0. The closest non-Anglophone immigrant language in our data is Vincentian Creole English (LDND = 41.57), and the farthest language is Vietnamese (LDND = 104.06). In an attempt to provide a better understanding of this scale, we present in Fig. S2 of Online Resource 1 alternative illustrations of correlated quantities. First we estimate probit models using linguistic distance as the only predictor for Anglophone origin based on the binary classification of official languages according to the World Almanac and Book of Facts (1999) and adopted by Bleakley and Chin (2004). Then we plot the relationship between the linguistic distance measure and model-based projected probabilities. In the same figure, we also plot the local polynomial relation between linguistic distances and scores of the 2005/2006 Test of English as a Foreign Language (TOEFL), which is applied worldwide.14 The figure clearly reveals that our measure of linguistic distance does contain information that is relevant for our application.
Given the lack of continuous variation in the index within our population of interest, we use the measure to define two subgroups (above and below average LDND) in the analyses that follow. We identify (1) countries considered Anglophone or linguistically close to English, for which linguistic distance ranges from 0 to less than 85 (local-averaged TOEFL above 90); and (2) countries considered linguistically far from English, for which linguistic distance is 85 and above (TOEFL scores below 90). We experiment with alternative classifications of countries for which English is an official language albeit not the predominant one. We list the countries of origin in our working sample and their linguistic classification in Table 6 in the appendix.
Data on Childhood Immigrants and Native-born U.S. Workers
Our main sample is composed of workers aged 25 to 38 at the time of the interview for the 1990 and 2000 U.S. Censuses and the 2009–2013 ACS, and weights are adjusted to reflect differences in sampling between the former (5 % sample) and the latter (1 % sample).15 The main subgroup of interest is childhood immigrants younger than age 18 at the time of entry into the United States and for whom we have well-defined country-of-birth information. Importantly, in our sample, workers are similar with respect to location within the experience-earnings profile.
We capitalize on the age at immigration to generate identifiers for immigrants who arrived in the United States early (before age 10) or late (age 10 or older). Finally, we also single out a group of child immigrants born abroad (in Anglophone countries or not) to U.S.-born parents, who are thus granted automatic citizenship yet have experiences abroad that may mimic those of other immigrants.
Interestingly, in Table S1 (Online Resource 1), we document that some patterns observed among all childhood immigrants are also present among those who were born abroad to U.S.-born parents, suggesting at least some impact of the environment surrounding these individuals before immigration to the United States. We further explore this reasoning in upcoming sensitivity analyses.
Table 8 (in the appendix) and Table S2 (in Online Resource 1) reproduce the analysis by focusing exclusively on the variables of main interest in the present article: skill indices (z scores with a mean of 10 and a variance of 1). Once again, we observe strong gradients, with linguistic distance and age at immigration both associated with lower communication, math/logic, and socioemotional skills, as well as greater physical skills’ accumulation. Similar but more muted patterns characterize immigrants born abroad to U.S.-born parents.
The objective of our estimation strategy is to identify the combined role of linguistic distance and age at arrival in determining skill levels in adulthood of those who were childhood immigrants. This exercise corresponds to the estimation of differences in an individual’s skill (outcome of interest) under two distinct conditions—low and high—regarding English learning potential relative to other basic skills. However, for each childhood immigrant, only one learning condition is indeed observed. Therefore, we develop comparisons between subgroups that can plausibly illustrate the counterfactual trajectory of skill accumulation.
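A specification consistent with the parameter definitions that follow can be sketched as below. The notation beyond those parameters is an assumption: $Y_{ic}$ denotes a skill index for individual $i$ born in country $c$, $\mathit{Late}_{ic}$ indicates arrival at age 10 or older, $\mathit{Far}_{c}$ indicates a linguistically far country of origin, $X_{ic}$ collects observed covariates, and $\alpha_0$ and $\alpha_2$ stand for an intercept and the main effect of linguistic distance.

```latex
Y_{ic} = \alpha_0 + \alpha_1\,\mathit{Late}_{ic} + \alpha_2\,\mathit{Far}_{c}
       + \tau_{far}\,\bigl(\mathit{Late}_{ic} \times \mathit{Far}_{c}\bigr)
       + X_{ic}'\beta + \theta_c + \eta_{ic},
```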
where the effects of immigrating after age 9 (α1) and observed covariates (β) are common to all individuals, and θc + ηic collapses all unobservable characteristics (origin-level and individual-level). τfar is the DID parameter of interest: it represents the additional impact of migrating later in childhood from a country linguistically far from English relative to the same age differential for sending countries that are (close to) Anglophone.
We estimate τfar by standard ordinary least squares (OLS), clustering standard errors by country of birth to allow for the presence of a country-level effect. We also extend the analysis by including country-of-birth fixed effects on the pooled cross-sectional samples of the treatment and control populations. Identifying variation in this latter case comes from the existence of younger and older immigrants from the same country of origin. Differences in push and pull factors are thus controlled for, as long as they do not vary within a source country across children who migrate at different ages. We also examine the robustness of results to the inclusion of premarket skills measures captured in the years of schooling of individual immigrants.16
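The estimation step can be illustrated with a minimal sketch: OLS with a late-arrival indicator, a linguistically-far indicator, and their interaction, with a cluster-robust sandwich variance by country of birth. The function name `did_ols_clustered`, the argument names, and the small-sample adjustment are assumptions for illustration; the sketch omits country fixed effects and sampling weights.

```python
import numpy as np

def did_ols_clustered(y, late, far, clusters, controls=None):
    """Return (tau_hat, clustered_se) for the late x far interaction."""
    y = np.asarray(y, dtype=float)
    late = np.asarray(late, dtype=float)
    far = np.asarray(far, dtype=float)

    # Design matrix: intercept, late, far, late x far, then optional controls.
    cols = [np.ones_like(y), late, far, late * far]
    if controls is not None:
        cols.append(np.asarray(controls, dtype=float).reshape(len(y), -1))
    X = np.column_stack(cols)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: outer products of within-cluster score sums.
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in labels:
        mask = clusters == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)

    # A common finite-sample adjustment (assumed, not taken from the article).
    G = len(labels)
    adj = (G / (G - 1)) * ((n - 1) / (n - k))
    V = adj * XtX_inv @ meat @ XtX_inv
    return beta[3], np.sqrt(V[3, 3])
```

Clustering at the country level matters here because the linguistically-far indicator varies only across countries, so individuals from the same origin do not contribute independent information about it.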
An alternative strategy to examine whether the conventional adjustment using observed covariates is sufficient to eliminate compositional biases, as Rosenbaum (1987) suggested, consists of exploring additional control groups. We do so by restricting attention to those immigrants from linguistically far countries. Child immigrants from the same countries but who are born to U.S.-born parents are employed as a comparison group in this case. The assumption here is that these individuals are subject to similar immigration push factors and share the same sending-country environment before migration to the United States, yet the latter are exposed to English at home, giving them a relative edge in language and communication development upon arrival. In other words, the assumption is that those born to U.S.-born parents would have exposure to English that makes the linguistic distance associated with country of birth less relevant, but that they would still be subject to country-specific elements that affect skill accumulation (e.g., the quality of schools or level and timing of economic development).
Workplace Skills and Assimilation
Table 1 presents the results that are central to our analyses. We estimate DID parameters by focusing on those from linguistically far origins arriving after age 9 as a treatment group, and using both Anglophone and linguistically close immigrants as the control group. We report results based on both DOT and O*NET classifications, focusing on communication, math/logic, socioemotional, and physical skills (in different panels). We find substantial and significant negative impacts of lower English-learning potential (relative to other skills) at arrival on the acquisition of communication, math/logic, and socioemotional skills, and a positive impact on physical skills.
In columns 1 and 2, and 7 and 8, we present unconditional averages of the dependent variable. Columns 3 and 9 present age-group differences without accounting for covariates for DOT and O*NET classifications, respectively. In columns 4 and 10, these quantities are reestimated, including demographic characteristics and country fixed effects as regression controls. We see that for both groups of countries of origin, those arriving at a later age accumulate fewer communication, math/logic, and socioemotional skills. At the same time, they seem to be employing more physical skills. Despite the same direction of effects, the age differences are significantly larger among those from English-distant countries of origin compared with the control group. We confirm this by reporting DID estimates in columns 5 and 11. In terms of communication skills, the DID estimate is 0.22 (DOT) or 0.28 (O*NET) of 1 standard deviation lower than for their counterparts (considering the population distribution of the trait). These differences are significant even if we consider a more stringent criterion that takes into account the large sample sizes used in the estimation (Schwarz’s Bayesian criterion). Clearly, older-at-entry immigrants arriving with lower relative linguistic endowments end up accumulating and using relatively lower levels of communication skills when performing their jobs.
The same pattern is present when we focus on nonlanguage cognitive skills. For mathematical and logical reasoning skills, the effects range from −0.18 to −0.13 of 1 standard deviation, depending on the classification used. Similarly, the results regarding socioemotional skills in panel C of Table 1 indicate a negative relation corresponding to 0.14 (DOT) and 0.13 (O*NET) of 1 standard deviation in the population distribution of those traits. Finally, panel D shows the opposite pattern for physical skills. Individuals arriving in the United States as children who have a relatively lower endowment of English-learning potential end up accumulating relatively more physical skills than their counterparts from (close to) Anglophone countries. These relative differences correspond to 0.19 to 0.22 of 1 standard deviation in the distribution of such skills as measured by DOT and O*NET classification systems, respectively. Taken together with the aforementioned results, we can safely say that these immigrants indeed developed a bundle of skills tilted in favor of physical rather than intellectual and socioemotional skills. In rather simple terms, they seem to have accumulated brawn rather than brain.
In columns 6 and 12 of Table 1, we also show that these measures of DID are somewhat conservative. When we explore an alternative classification (one based solely on linguistic distance) of countries in our sample that have English as an official language, we see that all the estimated parameters are larger in absolute terms.
Figure 1 summarizes these findings by portraying DID patterns. To obtain the DIDs, we contrast individuals from (close to) Anglophone and linguistically far origins by first computing skill levels relative to those immigrants arriving before completing the first year of age. We then combine this within-group difference with an across-group comparison per age-at-immigration level. What we plot is the result of these two operations. As before, we perform all calculations conditioning on demographics. The results show a clear pattern of skill accumulation with age at arrival across all four skill dimensions.
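The two operations behind Figure 1 can be sketched as follows. The sketch assumes that (conditional) mean skill levels by age at arrival are already available for each group as dictionaries; the function name, the input layout, and the example values are hypothetical and serve only to illustrate the calculation.

```python
def did_profile(far_means, close_means, baseline_age=0):
    """DID by age at arrival.

    For each age, compute the skill level relative to the baseline age
    within each group, then difference the two groups.
    """
    profile = {}
    for age in sorted(set(far_means) & set(close_means)):
        within_far = far_means[age] - far_means[baseline_age]
        within_close = close_means[age] - close_means[baseline_age]
        profile[age] = within_far - within_close
    return profile


# Hypothetical mean skill levels by age at arrival (illustration only).
far = {0: 0.0, 5: -0.1, 12: -0.5}
close = {0: 0.2, 5: 0.15, 12: 0.1}
for age, value in did_profile(far, close).items():
    print(age, round(value, 2))
```

By construction the profile equals 0 at the baseline age, so each plotted point is a deviation from the arrival-before-age-1 comparison.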
Here we examine the sensitivity of our results. We first investigate whether observable characteristics of parents systematically vary by Anglophone origin and age at arrival, which would generate unobserved heterogeneity biases in our estimates. To do so, we turn to a different data set with enough observations of childhood immigrants with parent background information. We use the National Longitudinal Survey of Youth 1979 (NLSY79) because these data cover most of the immigrants in our sample: those arriving in the United States in the 1960s and 1970s and employed in the labor market in the 1990s and 2000s. Of 12,686 NLSY79 respondents, 874 are foreign-born. Although the baseline 1979 interview asked about nativity, year of first entry into the United States was not collected until the 1990 survey. In addition, the public release version of the data does not have information on country of birth for foreign-born respondents. To assign respondents to the treatment or control group, we take advantage of responses to the question, “When you were a child, was any language other than English spoken in your home?” One may argue that this is actually a better indicator of linguistic endowment, given that indicating foreign language spoken at home during childhood is more informative than Anglophone/non-Anglophone country of birth.
Table S3 in Online Resource 1 reports results from regressions of childhood immigrants’ fathers’ and mothers’ education on (1) age at arrival, specified in two alternative ways; (2) the indicator for foreign language spoken at home; and (3) the interaction of the two. In columns 1 and 2, age at arrival is indicated as “Age >9,” consistent with this article; in columns 3 and 4, age at arrival is specified as max(0, age − 11). This table shows that better-educated parents are more likely to migrate to the United States when their children are younger. For instance, on average, fathers have 1.4 more years of education (not statistically significant; see column 1), and mothers have 1.99 more years of education (mildly significant) among early arrivals. In addition, parents have significantly fewer years of education among migrants with lower linguistic endowment (as indicated by speaking a foreign language at home as a child). However, the estimates show that parental background does not systematically differ by linguistic endowment across child’s age at arrival in the United States. None of the interaction terms is statistically different from 0, which supports the identification hypothesis behind our main estimation strategy.
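One way to write the parental-background check is the following regression. The notation is ours and is introduced only for illustration; any additional controls in Table S3 are omitted.

```latex
E_i = \alpha + \beta\, A_i + \gamma\, F_i + \delta\,(A_i \times F_i) + \varepsilon_i
```

Here $E_i$ is the father’s or mother’s years of education, $A_i$ is the age-at-arrival term (either the indicator $\mathbf{1}[\text{age}_i > 9]$ or the spline $\max(0, \text{age}_i - 11)$), and $F_i$ indicates that a foreign language was spoken at home during childhood. Under this notation, the identification hypothesis corresponds to $\delta$ being statistically indistinguishable from 0.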
To further address concerns on this front, we report DID estimates that employ an alternative comparison group, in the spirit of Rosenbaum (1987) and other research on causal inferences. We focus on individuals migrating from English-distant countries only. Instead of contrasting their trajectories with those of immigrants from other countries, we identify childhood immigrants who were born in the same countries but to U.S.-born parents. Despite obvious differences between these two groups, this alternative estimation is intended to identify whether results are similar to the original ones even when we employ a comparison group that is likely not subject to the same compositional biases as the original control group. Moreover, one may even argue that by using immigrants who leave the same country at the same age and time, we are better able to control for pull and push factors that lead to the migration decision, including the economic conditions or the quality of the education system, for example. In Table S4 of Online Resource 1, we report findings based on this strategy, revealing that results are qualitatively unaltered—and, if anything, a little larger in absolute terms. Bolstered by these findings, we include immigrants born abroad to U.S.-born parents in the control group for the additional estimations that follow: that is, they are considered close to Anglophone immigrants.17
Second, we provide evidence that our main results are not being driven by particular subsections of our data. In Table S5 in Online Resource 1, we show that alternative definitions of the sample based on excluding older immigrants (aged 16 and 17) or those from particular countries/regions do not produce changes in the conclusions that we draw from our inference. This is the case even when we exclude Mexico, by far the largest sending country, from the data set. Most results are indeed smaller in absolute terms, but this does not alter the main conclusions that we draw. Similarly, Table S6 (Online Resource 1) reports that our findings are prevalent among men and women and older and younger workers in the data set.
Third, we examine the possibility that our results can be explained by aggregate measures of human capital that are usually employed in the literature. In particular, we examine whether including measures of educational attainment eliminates our results. Table 2 presents these results in two ways: (1) including educational credentials as regressors, and (2) estimating the model by subgroups defined by highest degree completed. Although educational attainment is not an ideal control given the endogenous accumulation of schooling, we can still examine these findings taking into consideration that these measures represent the summary of premarket investments in human capital: in that sense, education would be a state variable. That said, skill differentials observed with education held constant would represent those that are generated by the process of on-the-job “learning by doing” that initiates after individuals leave school and join the labor force. Of course, this is not the only interpretation possible. Education measures in the census are too general and do not capture nuances related to the quality of schooling nor—and key for our argument—the differential performance across disciplines and school subjects.
Ultimately, our skill differentials also reflect some of these subject-specific investments made before engagement in labor market activities. In Table 2, we present evidence that despite the fact that higher completed education levels are associated with greater math/logic, language, and social skills, as well as fewer physical skills, education does not explain all the differences we observe across our treatment and control groups. The effects measured are approximately one-half the size of the original ones, but we can safely assert that initial language capital and age at arrival continue to have significant interactive and independent effects on adult skill attainment.
This can also be seen in the sample stratifications. In qualitative terms, the impact of relative English-learning potential at arrival is still seen among those who are high school graduates (but have no more education than that) and among college graduates. For the latter, the patterns of skill accumulation are still sensitive to linguistic endowment, but effects are mostly concentrated on communication and socioemotional skills. It is interesting that the disadvantages for these two skills are not seen for math/logic skills, suggesting that highly educated childhood immigrants may have invested quite selectively.
Finally, we examine the possibility that our findings are generated by the effect of a general and unidimensional ability bias that is omitted from our estimations. Our strategy here is based on the idea that if this were the case, differences between skills should have no relation with the interaction of age at entry and linguistic-distance indicators. Alternatively, one could also hold constant workplace communication skills and estimate the effects on other skills of initial relative linguistic endowment conditional on adult language skills. Table 3 reports findings based on this strategy. Interestingly, we find that the effects on math/logic, socioemotional, and physical skills are generally indicative that individuals arriving late from English-distant countries develop a relative disadvantage in communication skills. Thus, the variation in the data does not represent an absolute and unidirectional impact of the learning endowment. Although the effect of linguistic endowment on physical skills seems unsurprising given the signs of the changes we observed earlier, the estimations here provide support to the interpretation that childhood immigrants with relative difficulties in acquiring language skills end up developing comparative advantages in math/logic and socioemotional skills as workers. This is confirmed in Table S7 (Online Resource 1), where we reestimate the models by employing the z score of the ratio between two skills as our dependent variable.
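The dependent variable used in Table S7, the z score of the ratio between two skills, can be illustrated with a short Python sketch. It assumes strictly positive raw skill indices and an unweighted population standard deviation; both are assumptions made for this illustration, and the function name and example values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_of_ratio(skill_a, skill_b):
    """Standardize the worker-level ratio skill_a / skill_b.

    Assumes skill_b is strictly positive and the ratios are not all equal.
    """
    ratios = [a / b for a, b in zip(skill_a, skill_b)]
    mu, sigma = mean(ratios), pstdev(ratios)
    return [(r - mu) / sigma for r in ratios]


# Hypothetical raw indices for three workers (illustration only).
math_logic = [2.0, 3.0, 4.0]
communication = [4.0, 3.0, 2.0]
print([round(z, 2) for z in zscore_of_ratio(math_logic, communication)])
```

A higher value marks a worker whose occupation is relatively more intensive in the numerator skill, which is the sense in which the ratio captures comparative rather than absolute advantage.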
The finding that childhood immigrants with relatively lower linguistic endowment end up developing comparative advantages in math/logic motivates one additional empirical exercise focused on childhood immigrants with a U.S. college degree. To explore this angle, we return to the 2009–2013 ACS data and reselect a sample based on all bachelor’s degree holders who reported their choice of major in that survey. We use the same sample restriction in terms of age at immigration but remove the upper limit on current age and working status in order to draw most of the data available.18 We also employ an econometric strategy identical to the one used for the measures of skill presented earlier in order to verify differential patterns of major choice among college-educated immigrants.
In Table 4, we show the statistical and substantive significance of the impact of English-learning potential at immigration with respect to the choice of STEM majors over social sciences and humanities majors. Immigrants from linguistically far countries who arrived at older ages are 3.8 percentage points more likely to graduate with a STEM major, and they are 2.0 percentage points less likely to major in social sciences or humanities. These patterns are also depicted in Figure 2, revealing a remarkable similarity with the skill figures shown earlier. This is a novel result indicating the role of an innate comparative English-language advantage for the choice of college major. The same reasoning can also be applied to understanding course choice in high school or even effort allocation across disciplines. These results raise questions about the profound long-run impacts of programs that teach English to immigrants or dual-language schooling.
Summary and Conclusions
Our analysis of U.S. Censuses and ACS, distance of one’s mother tongue to the English language, and skill content of occupations (DOT and O*NET) reveals important fundamental elements of the economic assimilation of childhood immigrants in the U.S. labor market. In documenting assimilation of childhood immigrants at the skill level, we demonstrate how the skills among those who immigrate as children may determine their economic success vis-à-vis U.S.-born workers and ultimately influence the shape of the distribution of earnings in the population. In this way, our work is related to a large literature in economics on the wage effects of immigration on the U.S. labor market, which shows that the effect of immigration on the wages of native-born workers is slight to nonsignificant (e.g., Borjas and Katz 2007; Card 1990, 2009; among many others) while having a relatively large effect on previous immigrants (e.g., Cortes 2008; Ottaviano and Peri 2012). Their findings are compatible with the idea that immigrants and U.S.-born workers—even those with similar education and work experience—possess unique skills that lead them to specialize in different occupations; and this pursuit of their advantages in skill leads, in equilibrium, to a mitigation of locals’ wage losses from immigration.19
We show that children who arrive at an earlier age from English-distant countries sort themselves as adults into tasks/occupations that are more similar to the ones chosen by observationally equivalent (close to) Anglophone immigrant workers. In contrast, those who arrive at older ages—particularly after the primary school years—are in tasks/occupations that are more physically intensive relative to those who migrated at younger ages or who have Anglophone origins. The significance of this interaction of age at arrival and linguistic distance—which we dub relative learning potential—in determining adult attainment of communication, math/logic, physical, and socioemotional skills of child migrants also goes above and beyond the effect of migrants’ acquisition of formal education.
We view this as evidence for the complementary nature of comparative advantages in human capital and skills acquisition, and their dynamic interplay. Relative English-learning potential at arrival is a major determinant of differential investments in multiple skills. Specifically, the greater a child migrant’s English-learning facility (relative to other skills) at arrival, the more the child invests in communication skills than in math/logic, socioemotional, and physical skills. This reasoning is confirmed by an investigation of college major choice, which suggests that lower endowments of English-learning potential at arrival directly influence the tendency to choose STEM concentrations over social sciences or humanities, both of which require greater language and communication skills. These findings indicate that individuals face different costs for the acquisition of additional skills and thus direct effort toward specialization in very systematic ways.
Our work contributes to the recent population studies literature on occupational segregation and workplace concentration among immigrants (Andersson et al. 2014; del Rio and Alonso-Villar 2015; Stromgren et al. 2014). The empirical evidence presented here helps characterize the skill dimensions of jobs into which immigrants sort, going much deeper than occupation labels. In that sense, our work is more informative for the policy-oriented debate on the types of human capital investments that should be enhanced or incentivized in order to aid the assimilation of immigrants. It also complements other empirical studies on assimilation theory suggesting that acquisition of language fluency (and only that skill dimension) accounts for immigrants’ faster earnings growth compared with native-born or Anglophone workers (Bleakley and Chin 2004; Borjas 2015; Guven and Islam 2015). We provide evidence that this is an incomplete view of the assimilation process. By looking at childhood immigrants’ acquisition of human capital focusing on a multidimensional vector of skills, we provide evidence that early comparative disadvantage in English begets different degrees of comparative advantage in math/logic, socioemotional, and physical skills.
We are grateful to Ingo Isphording and Sebastian Otten for sharing their data of the Levenshtein linguistic-distance measure. Sections of the analysis were performed while Rangel was a visiting scholar at Princeton University’s Program in Latin American Studies. He is thankful for their hospitality. We would also like to thank participants of the All-California Labor Economics Conference, Southern Economic Association meetings, Population Association of America meetings, and workshops at Duke University for helpful comments. Thanks also to Bernardo Blum, Charles Clotfelter, Elizabeth Frankenberg, and anonymous referees for detailed and insightful suggestions. All remaining errors are our own. The views expressed herein are our own and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
We acknowledge that assimilation and integration are more general processes by which immigrants become fully fledged members of their host societies (see Akerlof and Kranton 2010; Fukuyama 2006; Portes and Rumbaut 2006). Here we focus only on the human capital and worker productivity elements of immigrants’ economic assimilation.
This measure was originally developed by the Max Planck Institute of Evolutionary Anthropology and has been previously employed by Clarke and Isphording (2016) to assess investments in health by immigrants; Isphording and Otten (2014) to assess language acquisition by immigrants; Isphording and Otten (2013) to examine international trade patterns; and Adsera and Pytlikova (2012) to discuss international migration patterns.
Throughout the article, we consider that an individual’s mother tongue is the predominant language spoken in her country of birth, as listed at https://www.ethnologue.com/.
We see these as important avenues for future research. Rangel and Shi (2016) found that age at immigration and country of origin influence credit accumulation in different high school disciplines for respondents of the National Longitudinal Study of Adolescent to Adult Health (Add Health). Stiefel et al. (2010) uncovered that among high schoolers in New York City, late-arrival immigrants perform better than natives in math but not in English. Bohlmark (2008) reported similar findings in Sweden.
See also Angrist and Lavy (1997), Berman et al. (2003), Chiswick and Miller (2002, 2007, 2010), Bleakley and Chin (2004, 2010), Dustmann and Fabri (2003), Dustmann and van Soest (2001, 2002), and Guven and Islam (2015), among others.
Another important consideration is the increase in immigration flows from countries in Asia and Africa where the main language is not English. See Elo et al. (2015).
We restrict ourselves to occupations that are listed in all calendar years covered in our pooled data. In that way, we do not address the impact of the emergence of new occupations over our outcomes of interest.
Approximately 100 occupations per year were gradually transitioned from extrapolated DOT data. By version 13.0 of the O*NET database, occupational data collected between 2001 and 2007 from more than 128,000 workers in 95,000 establishments are included.
The first factor accounts for 96 % to 100 % (O*NET) of variation in all the variables included in each index.
See details at http://www.eva.mpg.de.
As described by Isphording and Otten (2014), only one consonant has to be substituted between the English word yu and German word du. Meanwhile, to transform maunt3n (the transcription of mountain) into bErk (the German transcription), one has to remove or substitute each consonant and vowel.
We use the reports downloaded from the Educational Testing Service (ETS) web page with tables that list average scores per country of test-taker’s nativity (https://www.ets.org/s/toefl/pdf/94227_unlweb.pdf).
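The two word pairs above can be checked with a basic edit-distance routine. The sketch below is a standard dynamic-programming implementation of the Levenshtein distance and is not the authors’ code; the published linguistic-distance measure involves further aggregation across word lists, which is not reproduced here.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein("yu", "du"))         # one substitution
print(levenshtein("maunt3n", "bErk"))  # no shared symbols
```

Because maunt3n and bErk share no symbols, every one of the seven source characters must be substituted or removed, so the distance equals the length of the longer transcription.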
Data for this are made available in the Integrated Public Use Microdata Series (IPUMS, Ruggles et al. 2015).
In the economics literature, premarket skills refer to investments in human capital that predate the entry into the labor force.
In nonreported estimates, we also find that the pattern of skill accumulation is not an interactive function of parental origin and age of immigration among those coming from countries that are close to Anglophone.
This new working sample has descriptive statistics presented in Table S8 (Online Resource 1).
Recent studies have argued for this imperfect substitutability in the United States (Lewis 2013; Ottaviano and Peri 2012; Peri and Sparber 2009), Germany (D’Amuri et al. 2010), Spain (Amuedo-Dorantes and de la Rica 2011), and the UK (Manacorda et al. 2012).