Abstract
Self-employment plays a crucial role in immigrants’ economic assimilation. Previous studies examining immigrants’ self-employment relied on estimates obtained from national surveys, which could contain measurement error. In this research note, we compare estimates of immigrant men's self-employment obtained from the Current Population Survey (CPS) with those from data linking respondents to their tax records. Our findings indicate that the CPS substantially underestimates the immigrant–native gap in self-employment. In some cases, the rate of self-employment for immigrants from administrative data is nearly double that obtained from survey data alone. Measurement error also appears to distort estimated differences in self-employment among immigrants by race, ethnicity, and national origin. The results highlight the greater importance of self-employment for the labor market integration of immigrant men than was previously known on the basis of survey data alone.
Introduction
Researchers have long been interested in the role of self-employment in immigrants’ economic assimilation (Fairlie and Lofstrom 2015; Portes and Zhou 1996; Waldinger et al. 1990; Zhou 2004). Recent estimates show that immigrants are more likely to be self-employed than native-born workers in the United States (Christnacht et al. 2018; Hipple and Hammond 2016). A variety of factors may account for immigrants’ higher self-employment rate. One strand of work has focused on self-employment as an avenue to overcome labor market disadvantages, such as language barriers, having educational credentials that are undervalued by U.S. employers, and labor market discrimination (Akresh 2007). Studies have also considered how strong family ties, social capital, and coethnic networks shape immigrants’ self-employment and its associated financial returns (Fairlie and Meyer 2003; Portes and Manning 2019; Portes and Zhou 1996). Researchers have also discussed self-employment among the children of immigrants (Zhou 1997) and the possible role of intergenerational transmission of entrepreneurial skills and values (Andersson and Hammarstedt 2011; Hout and Rosen 2000).
Previous analyses of immigrants’ self-employment have largely relied on survey data, which may have limitations. A growing body of research using survey data linked to administrative sources has examined measurement error for an array of survey-based outcomes, including earnings (Kim and Tamborini 2012, 2014), household income (Thompson and Tamborini 2023), and food stamps (Meyer et al. 2022). Recent evidence also shows substantial measurement error in self-employment status (Abraham et al. 2021). However, little is known about how measurement error regarding self-employment might influence our understanding of immigrants’ employment. If measurement error differs by nativity or across subgroups within the immigrant population, it can bias population estimates of the immigrant–native gap in self-employment and distort our understanding of immigrants’ work experience.
In this research note, we aim to contribute to the literature on immigrant labor market integration and assimilation by investigating the prevalence of self-employment among immigrant men in the United States. Using national survey data linked to administrative tax records, we first test the hypothesis that survey measurement error in self-employment status is greater for immigrant men than for native-born men. Next, we examine whether the systematic variation in the pattern of error in measuring self-employment by nativity and within immigrant populations distorts our estimates of immigrant self-employment.
We use restricted data that match respondents of the Annual Social and Economic Supplement (ASEC) from the most recent decade of the Current Population Survey (CPS) to their administrative tax records from the Social Security Administration (SSA) and Internal Revenue Service (IRS). The CPS is a national survey that is widely used to estimate immigrants’ labor market status. The linked administrative records include respondents’ self-employment earnings from tax records, allowing us to compare a person's self-employment status in the CPS with their official tax records.
There are many reasons why immigrants might exhibit greater measurement error in survey-reported self-employment status, defined as a disagreement regarding a person's self-employment classification based on survey data versus administrative records. The kinds of jobs in which immigrants are employed might be more prone to misclassification in surveys. Differences in income levels between immigrant and native-born workers might play a role. Workers with lower incomes might be more incentivized to file self-employment tax forms to qualify for the Earned Income Tax Credit (EITC). On the other hand, some types of self-employment jobs might be more likely to go unreported to the tax authorities, especially low-paying or sporadic self-employment. Cultural differences and language barriers might lead immigrants to more frequently misclassify their work in national surveys.
Data and Methods
Our analysis draws from a restricted-use dataset that matches respondents of the CPS-ASEC to their administrative tax records compiled by the SSA (Davies and Fisher 2009; Duleep and Dowhan 2002). The ASEC is administered in the first quarter of every year and provides information on respondents’ labor market activity for the prior calendar year, including self-employment. To obtain sufficient statistical power to analyze subgroups of immigrants, we pooled respondents from the CPS-ASEC for 2011–2020 (see note 1).
We match CPS respondents with the Detailed Earnings Record (DER) file compiled by the SSA, which contains their annual earnings for all jobs reported to the IRS for the same year as the survey, as well as for the year before and after the survey. Wage and salary earnings come from employer-reported W-2 forms. Self-employment earnings are derived from Form 1040 Schedule SE, which the IRS electronically sends to the SSA (Olsen and Hudson 2009).
The rate of successful matches between CPS respondents and administrative records is high, at 85%. Even though the risk of bias due to unsuccessful matches is low (Czajka et al. 2008), we adjust the CPS sampling weights to ensure that our results are representative of the U.S. population. Following previous research (Cheng et al. 2019; Villarreal and Tamborini 2018, 2023), we multiply the CPS person weight by the inverse probability of a successful match given a person's characteristics (i.e., age, race and ethnicity, nativity, household income, and educational level).
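The weighting adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the note does not say how the match probability was estimated, so the sketch approximates it with the weighted match rate inside hypothetical covariate cells. The field names `cell`, `weight`, and `matched` are invented for the example.

```python
from collections import defaultdict


def match_adjusted_weights(records):
    """Return matched records with an inverse-probability-adjusted weight.

    Each record is a dict with hypothetical keys:
      'cell'    - a label for the covariate cell (e.g., age x nativity x education)
      'weight'  - the CPS person weight (assumed positive)
      'matched' - True if the respondent was linked to administrative records
    """
    total = defaultdict(float)
    matched = defaultdict(float)
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["matched"]:
            matched[r["cell"]] += r["weight"]

    adjusted = []
    for r in records:
        if not r["matched"]:
            continue  # unmatched respondents drop out of the analytic sample
        # Assumption: match probability = weighted match rate in the cell.
        p_match = matched[r["cell"]] / total[r["cell"]]
        adjusted.append({**r, "adj_weight": r["weight"] / p_match})
    return adjusted
```

With this construction, the adjusted weights of matched respondents in each cell sum to the cell's original weighted total, which is the sense in which the matched sample stays representative on the characteristics used to form the cells.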
Our analytic sample is restricted to matched men aged 25–65 at the time of the CPS interview. We begin at age 25 to ensure that most respondents have completed their formal education. We limit our sample to those aged 65 or younger to avoid problems associated with selective retirement. Sensitivity tests examining different age ranges (25–61 and 25–54) and including an age-squared term showed results similar to those reported here. Our main analysis includes only men because a full examination of women's self-employment would require us to consider additional factors associated with their selectivity into the labor force that are often not available (Aronson 2019; Sanders and Nee 1996). However, in supplementary analysis, the pattern we found among women was similar to that reported here for men (see Tables C1–C3 in the online appendix).
Our analytic sample is further restricted to respondents with positive CPS and administrative earnings from any source for the same year. We exclude self-employed farm owners by removing respondents whose main job in the CPS is a farming occupation (see note 2). We avoid duplicates in our pooled sample by retaining only those respondents who are in the last four of their eight CPS interviews. These criteria result in a sample of 159,275 male workers. Table 1 compares the social and demographic characteristics of the full CPS sample with the CPS-matched sample using our adjusted weights. Notably, we find a close correspondence in our estimates from the two samples.
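The sample restrictions described in this section can be summarized as a single filter. The sketch below is illustrative only; all field names (`matched`, `sex`, `age`, `cps_earnings`, `admin_earnings`, `farm_occupation`, `interview_number`) are hypothetical and do not correspond to actual CPS or DER variable names.

```python
def in_analytic_sample(person):
    """Apply the sample restrictions described in the text to one respondent.

    `person` is a dict with hypothetical keys; see the lead-in for caveats.
    """
    return (
        person["matched"]                        # linked to administrative records
        and person["sex"] == "male"              # main analysis covers men only
        and 25 <= person["age"] <= 65            # age at the CPS interview
        and person["cps_earnings"] > 0           # positive survey earnings
        and person["admin_earnings"] > 0         # positive tax-record earnings
        and not person["farm_occupation"]        # main CPS job is not farming
        and 5 <= person["interview_number"] <= 8 # last four of eight interviews
    )
```

Keeping only interviews five through eight means that a respondent who appears in two consecutive survey years enters the pooled sample once.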
Measures of Self-employment
We constructed conceptually similar measures of self-employment using survey (CPS) and administrative (DER) information. Both measures focus on unincorporated self-employment (i.e., on self-employed workers whose business is not officially registered as a corporation) because corporation owners’ earnings will typically be recorded as salary or as corporate dividends for tax purposes (Hipple and Hammond 2016). In line with Abraham et al. (2021), our CPS-based measure first classifies a person as self-employed if their longest job over the past year was unincorporated self-employment. Persons not reporting self-employment as their longest job can still be classified as self-employed if they reported self-employment income from any job over the year. The CPS question does not allow us to distinguish between incorporated and unincorporated self-employment for jobs other than the longest held job. However, this issue is of minimal concern because self-employment that is not part of a person's main job is likely to be unincorporated. In some specifications, we distinguish between primary and secondary self-employment by comparing a person's wage/salary and self-employment earnings levels in the same year. The CPS asks respondents to report their self-employment earnings net of expenses.
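The two-step CPS classification rule can be sketched as below. The class-of-worker labels and argument names are hypothetical, and treating any nonzero amount as "reported" self-employment income is an assumption of this sketch rather than something the text specifies.

```python
def cps_self_employed(longest_job_class, se_income_any_job):
    """Classify a respondent as self-employed under the CPS-based measure.

    longest_job_class  - hypothetical label for the longest job held last year,
                         e.g. 'self_employed_unincorporated', 'wage_salary'
    se_income_any_job  - self-employment income reported from any job
                         (net of expenses, so it may be negative)
    """
    # Step 1: longest job was unincorporated self-employment.
    if longest_job_class == "self_employed_unincorporated":
        return True
    # Step 2: otherwise, any reported self-employment income qualifies.
    # Assumption: "reported" means a nonzero amount.
    return se_income_any_job != 0
```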
Our administrative measure classifies individuals as self-employed if they had positive earnings in their linked Schedule SE tax record for the same year covered by the CPS. This measure is highly likely to reflect unincorporated forms of self-employment because individuals at an incorporated business would pay themselves salaries that would be reported on W-2 forms. As with the CPS measure, we distinguish between primary and secondary self-employment in the administrative data. Primary self-employment is identified if a person's annual self-employment earnings are higher than their wage/salary earnings. Also like the CPS measure, the DER measure captures self-employment earnings net of expenses.
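A minimal sketch of the administrative classification, including the primary/secondary distinction, is given below. The function and argument names are hypothetical; the rule itself follows the definitions in the text (positive Schedule SE earnings, and primary status when those earnings exceed wage/salary earnings).

```python
def der_self_employment(schedule_se_earnings, w2_earnings):
    """Classify self-employment status from administrative earnings.

    Returns 'none', 'primary', or 'secondary'.
    """
    if schedule_se_earnings <= 0:
        return "none"  # no positive Schedule SE earnings in the reference year
    if schedule_se_earnings > w2_earnings:
        return "primary"  # self-employment earnings exceed wage/salary earnings
    return "secondary"  # ties fall here because the text says "higher than"
```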
For simplicity, we refer to the discrepancy between a person's self-employment status in the CPS and the administrative data as measurement error because we think the administrative data are more accurate. However, there is no “true” self-employment measure, and each measure could fail to capture self-employment for different reasons. The survey data might miss self-employment not reported by the respondent or might misclassify workers, such as coding independent contractors or freelancers as wage/salary workers (Abraham et al. 2021; Abraham et al. 2023). On the other hand, the administrative data might miss self-employment earnings that are not reported in tax forms or are underreported.
Statistical Models
Discrepancy Between the Survey and Administrative Self-employment
We conduct two types of analyses. First, we examine measurement error by estimating linear probability models of self-employment in one source, given that the individual is classified as self-employed in the other (1 = agreement, 0 = disagreement). Respondents’ immigrant generation is the key predictor in our models. We define immigrants (first generation) as individuals born outside the United States to non-American parents. Individuals born in Puerto Rico or other U.S. territories are not considered immigrants because they are U.S. citizens. We use survey information about respondents’ parents’ country of birth to identify the second generation, defined as any U.S.-born individual who has at least one parent born abroad. All remaining individuals are classified as third-plus generation. Note that because survey linkages with administrative records require respondents to have a corresponding Social Security number in SSA's Numident file, unauthorized immigrants are essentially excluded from our analytic sample. In our analysis, we can thus rule out immigrants’ legal status as an explanation for any differences between immigrants and natives.
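The generation coding can be sketched as a simple decision rule. The argument names are hypothetical, and two choices are assumptions of this sketch: people born in U.S. territories are treated as U.S.-born when applying the second-generation rule, and people born abroad to at least one American parent fall into the residual third-plus category, following the text's "all remaining individuals" wording.

```python
def immigrant_generation(born_in_us_or_territory, has_american_parent,
                         any_parent_born_abroad):
    """Assign 'first', 'second', or 'third_plus' generation.

    born_in_us_or_territory - born in a U.S. state, Puerto Rico, or another territory
    has_american_parent     - at least one parent is American
    any_parent_born_abroad  - at least one parent was born outside the United States
    """
    # First generation: born abroad to non-American parents.
    if not born_in_us_or_territory and not has_american_parent:
        return "first"
    # Second generation: U.S.-born with at least one foreign-born parent.
    if born_in_us_or_territory and any_parent_born_abroad:
        return "second"
    # Everyone else is third-plus generation.
    return "third_plus"
```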
A series of control variables account for factors that might lead to disagreement in self-employment status between the survey and administrative records. Because the age profile of immigrants might differ from that of native-born workers (Lofstrom 2002), we introduce age as a predictor in all our models. We also include a dummy variable indicating whether a respondent has a bachelor's degree or higher. In the online appendix (section A), we present separate models by education, which yielded consistent results. We include racial and ethnic categories as predictors in some models using respondents’ self-identification in the CPS (see note 3). We also control for marital status and the number of children under age 18 living in the household.
Some models include respondents’ self-employment earnings levels, specified in quintiles. We use either the CPS or administrative measure of earnings, depending on the sample. Additionally, some models include an indicator of sporadic self-employment drawn from the administrative data. Sporadic self-employment is defined as having self-employment earnings in the tax records in the CPS reference year but not in the prior or subsequent year. Some models also control for the number of jobs an individual held during the year. For the CPS data, we sum the number of jobs reported over the reference year; for the administrative records, we count the number of unique Employer Identification Numbers (EIN). Finally, we test models including occupational fixed effects. We use three-digit occupational categories based on the 1990 census classification system using the crosswalk provided by IPUMS (Flood et al. 2022). Because occupation is available in the CPS data only for the longest job held during the year, we include occupational fixed effects in models of primary self-employment.
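Two of the administrative controls defined above lend themselves to a short sketch. The function names are hypothetical. Reading "not in the prior or subsequent year" as "in neither adjacent year" is an interpretation made for this example.

```python
def is_sporadic(se_prior_year, se_reference_year, se_next_year):
    """Flag sporadic self-employment in the tax records.

    True when there are positive self-employment earnings in the CPS
    reference year but in neither the prior nor the subsequent year
    (assumed reading of the definition in the text).
    """
    return se_reference_year > 0 and se_prior_year <= 0 and se_next_year <= 0


def count_admin_jobs(employer_ids):
    """Count jobs in the administrative data as unique employer identifiers."""
    return len(set(employer_ids))
```

For example, a worker with two W-2 records from the same employer and one from another would be counted as holding two jobs.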
Additional controls include a dummy variable indicating whether the respondent received the EITC according to the CPS. This control is important because some self-employed workers might have an incentive to file a Schedule SE form to qualify for the EITC. We also account for survey nonresponse by including dummy variables indicating a census-based imputation of labor market status and a proxy response. Finally, some models include a person's state of residence at the time of the interview, and all models include fixed effects for the survey year.
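For readers who prefer an explicit specification, the models of agreement can be written roughly as below. The notation is ours and is only a sketch: $SE^{CPS}_i$ and $SE^{DER}_i$ are indicators of self-employment in each source, third-plus-generation men are the reference group, $X_i$ collects whichever controls a given model includes, and $\tau_{t(i)}$ is the survey-year fixed effect.

```latex
% Linear probability model, estimated on the subsample with SE^{DER}_i = 1.
% The reverse-direction models swap the roles of the two sources.
SE^{CPS}_{i} = \alpha
  + \beta_1\,\mathrm{FirstGen}_i
  + \beta_2\,\mathrm{SecondGen}_i
  + X_i'\gamma
  + \tau_{t(i)}
  + \varepsilon_i
```

Under this setup, $\beta_1$ is the difference, in probability units, between immigrants and third-plus-generation men in the chance that the survey agrees with the tax records.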
The Impact of Measurement Error on Immigrant Self-employment
The second part of our analysis assesses the impact of measurement error by comparing the estimated immigrant–native gap in self-employment between the CPS and the administrative data. We estimate two models using our matched sample of men who have positive earnings in both the CPS and their tax records. The first model estimates the probability of self-employment according to the survey data. The second model estimates the probability of self-employment using the same sample, but this time on the basis of administrative records. These models include similar control variables as our models of measurement error described earlier, but they remove the economic variables that could be a consequence of self-employment, such as earnings. In this part of our analysis, we also examine differences among immigrants by race and ethnicity and by national origin because these are common focal points in studies of immigrant entrepreneurship (e.g., Zhou et al. 2016).
Results
Descriptive Results
Figure 1 shows the percentage of male workers in our sample classified as self-employed according to the CPS and administrative records by immigrant generation. The administrative records show substantially higher self-employment rates for all men. However, the differences between the survey and administrative data are larger for immigrant men than for native men. For instance, the administrative records show a self-employment rate that is 79% higher than the CPS for third-plus-generation men and 123% higher for first-generation immigrant men. The CPS appears to understate self-employment for immigrants more than for native-born workers. As a result, immigrant–native differences in self-employment are narrower in the CPS than in the tax data.
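To make the percentages concrete, and assuming "x% higher" denotes the relative difference $(p^{DER} - p^{CPS})/p^{CPS}$ between the two rates (our notation), the implied ratios of administrative to survey self-employment rates are:

```latex
\frac{p^{DER}}{p^{CPS}} = 1 + 0.79 = 1.79 \quad \text{(third-plus generation)}, \qquad
\frac{p^{DER}}{p^{CPS}} = 1 + 1.23 = 2.23 \quad \text{(first generation)}
```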
Table 2 further breaks down the level of disagreement between men's self-employment classification for each immigrant generation according to our two data sources. The off-diagonal cells indicate the level of disagreement between administrative records and the CPS for immigrant men relative to native men: 10.7% of native men are classified differently by the two sources, compared with 16.1% of immigrants. In other words, there is a greater disagreement between the CPS and tax records about the self-employment status of immigrants than that of native-born workers. The disagreement between the two sources is particularly large for men who are self-employed according to their tax records but not the CPS (i.e., the lower left-hand cells in all panels of Table 2).
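The disagreement rate reported here is the combined share of the two off-diagonal cells in a two-by-two cross-classification. A minimal sketch follows; the function name is hypothetical, and the numbers in the usage note are invented for illustration and are not taken from Table 2.

```python
def disagreement_rate(both, cps_only, der_only, neither):
    """Share of workers classified differently by the two sources.

    Arguments are (weighted) counts or shares for the four cells:
      both     - self-employed in the CPS and in the tax records
      cps_only - self-employed in the CPS only
      der_only - self-employed in the tax records only
      neither  - self-employed in neither source
    """
    total = both + cps_only + der_only + neither
    return (cps_only + der_only) / total
```

With made-up cell shares of 5, 2, 8, and 85 percent, the disagreement rate is 10 percent, most of it coming from workers who are self-employed in the tax records but not in the survey.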
Multivariate Results of Measurement Error in Self-employment
Our first set of regression models examines the probability of agreement between men's self-employment classification in the survey and in their tax records. Table 3 shows the results of our models predicting the probability of being classified as self-employed in the CPS for individuals who were classified as self-employed in their administrative records. An outcome of 1 indicates agreement between the two data sources, whereas an outcome of 0 indicates disagreement and suggests a false negative for self-employment in the CPS.
The results reveal that being an immigrant is associated with a lower probability of being classified as self-employed by the CPS among the sample of men classified as self-employed in the administrative records. Model 1 shows that after we control for age, immigrants are 4.2 percentage points more likely than third-plus-generation men to be misclassified as not self-employed in the CPS. The probability of misclassification increases for immigrants in Model 2, which controls for demographic and job characteristics; it also increases in Model 3, which adds controls for survey response characteristics, EITC receipt, and the state of residence. Thus, demographic, job, survey, and other characteristics do not account for immigrants’ higher likelihood of misclassification by the CPS among self-employed male workers in the administrative records.
Model 4 uses the same specification as Model 3 but imposes a stricter definition of self-employment that includes only primary self-employed workers according to the tax records. The results show that the immigrant–native discrepancy is even larger in this subsample. Among workers whose main job is self-employment according to the tax records, immigrants are 8.4 percentage points more likely not to be classified as self-employed in the CPS than third-plus-generation native-born men. By contrast, the coefficient for second-generation men is not significant in any of the models. Finally, Model 5 includes fixed effects for men's occupation. The results continue to show a higher probability of misclassification for immigrant men working in the same occupation as natives (see note 4).
Table 4 examines misclassification in the opposite direction from Table 3 by examining the probability that self-employed men in the CPS are also classified as self-employed according to their tax records. The results are remarkably different from those in Table 3. Among self-employed men in the CPS, immigrant status is not significantly related to the probability of being classified as self-employed in the tax records in any model specification. In combination with the prior results, these findings suggest an underestimation of immigrant self-employment in the survey data relative to the administrative records but not the other way around. The underreporting of self-employment earnings to tax authorities (“under-the-table” earnings) does not appear to account for the measurement error we observe; if it did, we would observe a higher rate of immigrants with self-employment earnings in the CPS not reporting self-employment earnings in their tax records.
Impact of Measurement Error on Estimates of Self-employment Among Immigrants and Natives
In this section, we present results from regression models that examine the consequences of measurement error for estimates of immigrants’ self-employment status. The analytic sample for these models includes all workers. Table 5 shows the results of separate regression models predicting self-employment based on the CPS and administrative records including the same set of control variables (see the table notes). The final column in the table shows the estimated difference in the coefficients from the two models, along with a test of its statistical significance. In both data sources, immigrants have a higher probability of self-employment than native men. However, the survey data severely underestimate immigrants’ self-employment: immigrants are 2.5 percentage points more likely to engage in self-employment than natives according to the CPS measure but are 7.7 percentage points more likely to engage in self-employment according to the administrative records. The difference of 5.2 percentage points is statistically significant.
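The comparison in the final column of Table 5 amounts to the following arithmetic, written in our own notation, where $\hat{\beta}^{DER}$ and $\hat{\beta}^{CPS}$ denote the immigrant coefficients (in percentage points) from the administrative and survey models:

```latex
\hat{\beta}^{DER} - \hat{\beta}^{CPS} = 7.7 - 2.5 = 5.2 \ \text{percentage points},
\qquad
\frac{\hat{\beta}^{CPS}}{\hat{\beta}^{DER}} = \frac{2.5}{7.7} \approx 0.32
```

Read this way, the survey-based estimate captures roughly one third of the immigrant–native gap found in the administrative records.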
Table 6 examines differences by race and ethnicity and by national origin. We exclude second-generation men because we found no significant differences in their self-employment status across the two data sources. The left panel of Table 6 shows significant differences between the survey and administrative data for immigrants of different racial and ethnic groups. For example, whereas the model using the CPS measure indicates that Hispanic immigrant men are not significantly more likely to be self-employed than White natives, the model using the administrative measure indicates that Hispanic immigrant men are significantly more likely to engage in self-employment (5.3 percentage points higher).
The right panel of Table 6 shows the results by national origin groups. These estimates are broadly consistent with those from our models separating immigrants by race and ethnicity, which is not surprising given the strong correspondence between race and ethnicity and national origin. For instance, the self-employment of immigrants from Mexico, Central America, and other Latin American countries is significantly understated in the CPS relative to the administrative records. Immigrant men from Central America are not significantly more likely to be self-employed than native Whites according to the CPS, but they are 9.7 percentage points more likely to be self-employed according to the administrative records. Reliance on survey estimates of self-employment leads to an incorrect perception regarding the relative level of self-employment among Hispanic immigrants of certain national origins.
The administrative data also tell quite a different story for Asian men. Estimates from the survey data indicate that the probability of self-employment among Vietnamese immigrant men is not statistically different from that of third-plus-generation White men. By contrast, the administrative data indicate that Vietnamese immigrant men have a 10.8-percentage-point higher probability of being self-employed than third-plus-generation White men. We also observe an underestimation of African immigrants’ self-employment using the survey data. The probability of self-employment among immigrants from Africa does not differ significantly from that of third-plus-generation White men in the CPS but is 10.9 percentage points higher in the administrative records.
Sensitivity Analyses
We conducted sensitivity tests to evaluate the robustness of our findings. First, we tested separate models for individuals with and without a college degree to examine differences in measurement error and self-employment classification of immigrants by level of education. The results show a broadly similar impact of measurement error on estimates of self-employment for men in both educational categories (Table A3, online appendix). Second, we explored whether the increase in jobs in the gig economy in recent years could explain the observed disparities in self-employment measurement error. To do so, we estimated our models using data from 1994–2006, which predate the start of the gig economy (see note 5). The results were fully consistent with those presented earlier (see Tables B1–B3, online appendix), suggesting that the growth of the gig economy does not account for the observed patterns.
Third, to explore the sensitivity of our results to how long immigrants have lived in the United States, we replicated our analyses, distinguishing between immigrants who arrived in the country less than 10 years ago and those who arrived earlier. Our findings (presented in Tables D1–D3, online appendix) were consistent with those presented here. Both categories of immigrants based on their time since arrival suffer from a larger underestimation of their self-employment status in the survey data relative to native-born workers. Finally, we examined whether measurement error differed among immigrants by their U.S. citizenship status. We found that the CPS underestimates the self-employment of immigrants with and those without U.S. citizenship relative to natives.
Conclusions
We find systematic differences in measurement error between immigrants and nonimmigrants in the CPS. Relative to tax records, the CPS is more likely to miss the self-employment of immigrants than that of native-born workers. This pattern leads to a substantial underestimation of immigrants’ self-employment. In some cases, immigrants’ self-employment rate in the administrative data is nearly double that obtained from survey data. The underestimation of immigrants’ self-employment activity is important because many studies rely on estimates from the CPS.
Our findings also suggest that survey measurement error distorts our understanding of self-employment among immigrants by race and ethnicity and by national origin. For example, after we control for other relevant factors, estimates from CPS data indicate that the probability of self-employment among Central American and Cuban male workers does not significantly differ from that of later-generation Whites. Yet, findings from the administrative tax data show that Central American and Cuban immigrant men are 9.7 and 7.4 percentage points more likely to be self-employed than native Whites, respectively. Altogether, the results highlight the greater importance of self-employment for the labor market integration of immigrant men of different ethnoracial and national origin groups than was previously known.
Our regression analysis shows that the types of jobs in which immigrants and natives are employed and other social and demographic differences do not fully explain the underestimation of immigrants’ self-employment in the survey data. Because the key drivers of differences in estimates of self-employment by nativity between survey and administrative data remain unclear, the lessons for improving surveys to reduce measurement error are not straightforward. One important factor that might drive immigrant–native differences could be unobserved cultural differences and language barriers that make immigrants (or interviewers) more confused about their labor market activity during the interview. Immigrants might be more likely to misunderstand the CPS self-employment question, or cultural differences could influence how jobs are conceptualized and presented to others. Likewise, immigrants might feel less comfortable participating in survey interviews, have lower trust in the government, and exhibit different social desirability biases relative to native-born respondents. Alternatively, immigrants might have a greater incentive to file self-employment tax forms to signal their contribution to the economy, which could be relevant to their visa status. Because our analytic sample includes only matched immigrants, we can rule out immigrants’ legal status as a possible explanation.
Less survey accuracy in measuring immigrants’ self-employment status might also relate to freelance or independent contracting work being categorized as wage and salary work in surveys. Contract work might be misreported by survey respondents (or misinterpreted by interviewers) because it could involve work patterns similar to those of wage/salary employees rather than a person who owns a business (Abraham et al. 2023). Surveys might want to develop questions that distinguish between different forms of self-employment and ask respondents to report the tax forms they filed in the reference year. Higher rates of freelance work among immigrants would also have relevance for our understanding of economic assimilation because independent contracting can provide fewer worker protections, such as unemployment insurance, workers’ compensation, and the right to unionize (Abraham et al. 2023).
Finally, although developing methods to account for differences in survey accuracy in self-employment by nativity is beyond this study's scope, some brief comments are warranted. Calls to use only administrative linked data to study self-employment among immigrants are not advisable or feasible, given that most researchers do not have access to such restricted-use data. Future work might seek to develop adjustments based on information gleaned from estimates of linked survey–administrative data disseminated to the public by researchers with access to such data. Such corrections might be especially useful when the dependent variable is self-employment status. In the short term, however, our findings suggest that more caution should be applied in interpreting the size of immigrant–native differences in self-employment gleaned from survey data alone.
Acknowledgments
The views expressed here are those of the authors and do not represent the views of the Social Security Administration (SSA) or any federal agency. Access to SSA data linked to CPS is subject to restrictions imposed by Title 13 of the U.S. Code. The data are accessible at a secure site, and all findings must undergo disclosure review before their release.
Notes
1. We control for the CPS year in all models. Results of models interacting the CPS year with immigrant status indicate similar immigrant–native differences over the observation period.
2. Sensitivity tests examining a sample that includes farm workers show results similar to those presented here.
3. Multiracial individuals are classified in the largest non-White category with which they identify. Multiracial individuals never exceed 2.5% in any CPS year. The category “Other” mainly includes American Indian and Alaskan Native peoples.
4. Alternative models in which we added industry fixed effects yielded similar results, suggesting that the industries in which immigrants and natives are employed do not explain the observed differences in measurement error.
5. For example, the ride-share company Uber started in 2009, and Airbnb began operations in 2008.