Abstract
Immigration economists often disagree about whether comparably skilled immigrants and natives are perfect substitutes in the United States and other developed countries, leading these scholars to different assessments of the labor market impacts of immigration and to different policy recommendations. This article attempts to provide theoretical bases for understanding immigrant-native substitution and to introduce machine learning techniques to resolve the empirical debate. Using the male subsample from the US Census and American Community Survey, it shows that differences in covariate selection explain substantial disagreements in estimating immigrant-native substitution. Given the difficulties in providing compelling theoretical justifications for the covariates selected, this article proposes estimating via Lasso-type (least absolute shrinkage and selection operator) estimators. My Lasso-based estimation rejects perfect substitution, but it also implies easier substitution than that preferred by Ottaviano and Peri, suggesting more direct immigrant-native competition. By extending the sample to women, I find similar immigrant-native substitution across gender. Therefore, this article casts doubt on previous immigration impact assessments. Indeed, my simulation suggests considerable precision gains concerning immigration's wage impacts on immigrants themselves. Furthermore, this article identifies immigrant segregation as a critical source of the national-level imperfect substitution, which decreases within progressively smaller regions and almost disappears within the same city. By introducing the Lasso-type estimators into migration studies, this article makes solid progress toward evaluating and understanding imperfect immigrant-native substitution and its socioeconomic consequences.
Introduction
Within immigration economics, leading scholars heatedly debate whether comparably skilled immigrants and natives are perfect substitutes in the United States and other developed countries (e.g., Ottaviano and Peri 2012; Borjas, Grogger, and Hanson 2012 for the United States; D’Amuri, Ottaviano, and Peri 2010 for western Germany; Manacorda, Manning, and Wadsworth 2012 for the United Kingdom). In estimating immigration's labor market impacts at the national level, Borjas (2003) implicitly assumed that they were perfect substitutes in the United States. Ottaviano and Peri (2012) challenged Borjas by showing that comparably skilled immigrants and natives are imperfect substitutes and suggesting a preferred immigrant-native elasticity of substitution of around 20. 1 To defend this earlier work, Borjas, Grogger, and Hanson (2012) revisited and confirmed the perfect substitution assumption in a highly saturated specification. To date, the debate on immigrant-native substitution has not reached a broad consensus.
The above debate is crucial for evaluating the direction, size, and distribution of immigration impacts on destination labor markets. As Card (2009, 2) puts it, “if immigrants and natives in the same skill category are imperfect substitutes, the competitive effects of additional immigrant influx are concentrated among immigrants themselves, lessening the impacts on natives.” Some economists, thus, consider such imperfectness as a principal explanation for why a large body of empirical studies, starting with Card (1990), cannot identify any sizeable negative effect of immigration on native workers’ labor market performances (e.g., Lewis and Peri 2015). They further argue that once the imperfect immigrant-native substitution is taken into account, the negative wage effects of immigration, reported in Borjas (2003), diminish. By contrast, immigrants already settled in the United States could face more competition from new immigrants (e.g., Ottaviano and Peri 2012).
The same debate also contributes to understandings of the social structures in which immigrant-native labor market competition is embedded. Social scientists have long been fascinated by structural factors such as labor market segmentation (e.g., Piore 1979) and spatial segregation (e.g., Duncan and Lieberson 1959). 2 These factors tend to mitigate direct competition between immigrants and natives (Piore 1979), facilitate specialization, foster complementarity, and thus attenuate the negative immigration labor market impacts on natives (e.g., Bean, Gonzalez-Baker, and Capps 2001; Peri and Sparber 2009). Estimating the immigrant-native substitutability could offer a parsimonious measure of these complex and often-intersecting socioeconomic forces.
A firm grasp of evidence of immigration's labor market impacts is essential for policy-making. Moreover, since immigration's labor market impacts are not evenly distributed, we can identify the “winners” and “losers” of immigration. Those substantially affected by immigration may need assistance in ways that raise fiscal burdens borne by governments (Blau and Mackie 2017; Smith and Edmonston 1997). With imperfect immigrant-native substitution, economic theory predicts that less-skilled immigrants will be vulnerable to immigration influx (e.g., Ottaviano and Peri 2012). If such a prediction is valid, policies alleviating poverty and promoting the assimilation of these immigrants and their children should be of high priority.
Given the significance and policy relevance of the immigrant-native substitution debate, this article employs Lasso-type (least absolute shrinkage and selection operator) estimators, one of the machine learning (ML) techniques with which social scientists are most familiar (Mullainathan and Spiess 2017), 3 to shed fresh light on the debate. Using the male samples of the US Censuses and American Community Surveys (ACS) since 1960, I demonstrate that a substantial fraction of the debate can be attributed to researchers’ distinct covariate choices, none of which can be fully justified by existing socioeconomic theories. Fortunately, the Lasso is known for its strength in variable selection (Tibshirani 1996). Its recent advances, especially the Post-Double-Selection (PDS) Lasso estimator developed by Belloni, Chernozhukov, and Hansen (2014a), further adapt the Lasso into a causal inference method. Applying the Lasso-type estimators eliminates the biases caused by omitted labor demand shocks that trigger immigration and helps us cope with the challenges of endogenous immigration. Therefore, the Lasso-type estimators are well suited to detecting imperfect immigrant-native substitution.
Indeed, the applications of the Lasso-type estimators proposed here yield novel findings on immigrant-native substitution that are sufficiently different from both Borjas (2003) and Ottaviano and Peri (2012). Building on these estimates, I further simulate immigration's wage impacts on immigrants and natives from 1990 to 2006. In agreement with imperfect substitution, I find that immigrants, especially low-skilled immigrants, were more likely to be negatively affected by immigration. My simulation also demonstrates that immigration's wage impacts on immigrants themselves are particularly sensitive to the immigrant-native substitutability, implying substantial precision gains by adopting the PDS Lasso estimator.
Moreover, the Lasso-type estimators’ capability to handle high-dimensional data allows me to explore beyond the pioneering studies on immigrant-native substitution mentioned before. One particularly noticeable finding is that by adopting the PDS Lasso, I identify the critical role of spatial segregation between immigrants and natives in shaping imperfect substitution at the national level. Evidence from various local levels, ranging from census divisions via states to cities, suggests that spatial segregation by nativity provides a very powerful explanation for the national-level imperfect substitution.
The rest of this article is organized as follows. I first review the relevant socioeconomic theories for evaluating and understanding immigrant-native substitution and its labor market implications. Then, I introduce the analytical framework and highlight the empirical challenges. After introducing the data and historical context, I demonstrate the importance of covariate selection and employ the Lasso-type estimators as covariate selectors and a causal inference toolkit. The PDS Lasso estimation yields the main findings. I further discuss the endogeneity issues and provide evidence for imperfect immigrant-native substitution among women. To illustrate the sensitivity of immigration's impacts to immigrant-native substitutability, I simulate the wage impacts of interest under alternative substitution estimates. Finally, by applying the PDS Lasso to US regional-level data, I explore the relationship between spatial segregation and imperfect substitution, suggesting that the national-level imperfect substitution can be largely understood as a consequence of immigrant segregation.
Theories on Immigrant-Native Competition and Imperfect Substitution
While the evaluation of immigration's labor market impacts is primarily an empirical matter (Blau and Mackie 2017), theories from different social sciences provide essential concepts, analytical frameworks, and deep insights that help better understand such impacts (e.g., Bean, Gonzalez-Baker and Capps 2001; Brettell and Hollifield 2015). Hence, I briefly review the related theoretical literature before moving to the empirical investigation to resolve the immigrant-native substitution debate and shed light on the size and distribution of immigration's impacts. 4
Substitutability and Labor Market Impacts of Immigration
For most economists, the concepts of substitutability/complementarity between immigrants and natives offer keys to analyzing immigration's labor market impact (e.g., Blau and Mackie 2017). The influx of a particular type of immigrant is expected to reduce the wages of those who substitute for them while raising the wages of those who complement them (e.g., Borjas 2014). In many sociological and demographic studies, these concepts play equally important roles in studying immigrant-native competition, its consequences for minority groups, and immigrant assimilation (Lin and Weiss 2019; Pais 2013; Waldinger and Lichter 2003; Waters, Kasinitz, and Asad 2014).
Furthermore, these concepts can be operationalized and incorporated into the structural estimation framework to evaluate immigration's labor market impacts (Borjas 2003; Ottaviano and Peri 2012). This framework starts with specifying a production technology describing the relationships between inputs and output. The elasticity of substitution measures the substitutability between any pair of skill-nativity groups. Various labor inputs interact within the framework according to these elasticities, and immigration is mainly viewed as a labor supply shock affecting the size and composition of the labor force at the destination (e.g., Blau and Mackie 2017). To facilitate estimation of immigration's wage impacts derived from the technology, we need two sets of elasticities of substitution (i.e., those across skill groups and those between comparably skilled immigrants and natives). Traditionally, the economic literature has emphasized the former (Card and Lemieux 2001; Katz and Murphy 1992). Partly due to the immigrant-native substitution debate, the latter—within-group substitution—receives increasing attention from researchers (e.g., Ottaviano and Peri 2012). Evidence gathers to support the imperfect within-group immigrant-native substitution in the United States (Blau and Mackie 2017, 206–207). Given this article's objective, I pay more attention to the latter when the literature below allows.
Sources of Imperfect Immigrant-Native Substitution
The review so far has drawn heavily from research on immigrant-native substitution by economists. Researchers from sociology, demography, and geography, however, have also made substantial contributions to the field, although they often seem more interested in providing theoretical bases for understanding immigrant-native substitution and the resulting competition than in measuring them precisely, as economists have sought to do over the past three decades (e.g., Bean, Gonzalez-Baker, and Capps 2001; Borjas 2014, 79–80). In parallel with economists, other social scientists have offered many insights into the sources of imperfect immigrant-native substitution. This subsection discusses selected explanations from this interdisciplinary literature.
Labor market segmentation provides a plausible explanation for why immigrant and native workers do not directly compete in the labor market and are, thus, imperfect substitutes (Piore 1979). According to the segmented labor market (SLM) theories, there are multiple segments in the economy. These segments differ in wages, working conditions, returns to skills, and upward mobilities (e.g., Massey et al. 1998). Immigrants, especially low-skilled and undocumented immigrants, are usually excluded from the primary sector (e.g., Portes and Bach 1985). The SLM theories also argue that substantial barriers caused by institutional, political-economic, or psychological factors prevent intersectoral mobilities (Piore 1979; Portes and Rumbaut 2014). The ethnic social network widely used in recruitment and prevalent in the workplace strengthens the labor market segmentation between immigrants and natives (Waldinger 1994; Waldinger and Lichter 2003).
Labor market segmentation need not conflict with the neoclassical economic paradigm, and many economists recognize the relevance of labor market segmentation to the modern economy (Blau and Mackie 2017, 178; Taubman and Wachter 1986). However, economists are more likely to interpret segmentation as an outcome of individuals’ self-selection in the sense that individuals pursue their own (skill- or task-based) relative advantages and make occupational/sectoral choices (Borjas 1987; Peri and Sparber 2009; Roy 1951). Institutional factors often receive less explicit attention from economists, but they can still affect individual choices by influencing benefit–cost comparisons (Todaro and Maruszko 1987). Regardless of its concrete causes, labor market segmentation can foster imperfect immigrant-native substitution (e.g., Bean, Gonzalez-Baker, and Capps 2001).
Spatial segregation between immigrants and natives at multiple scales provides another plausible yet clearly underexplored explanation for national-level imperfect substitution (see Smith and Edmonston 1997 for a notable exception). Traditionally, immigrants in the United States were more spatially concentrated than natives and more likely to live in a few gateway states and large metropolitan areas/cities (Lewis and Peri 2015; Waters and Pineau 2015). In the past three decades, immigrants’ concentration has decreased as immigrants have continued to disperse (Waters and Pineau 2015). Despite the progress toward assimilation, regional-level spatial segregation remains critical for imperfect immigrant-native substitution in the United States.
In addition, immigrants continue to face challenges from residential segregation within old and new destinations (Waters and Pineau 2015). Researchers dispute the overall trend of within-city residential segregation (Cutler, Glaeser, and Vigdor 2008; Lichter, Parisi, and Taquino 2015). Investigations do reveal declining within-neighborhood ethnic segregation, but the trend is partly offset by rising between-neighborhood segregation caused by “white flight” (Lichter, Parisi, and Taquino 2015; Logan and Zhang 2010; Saiz and Wachter 2011). Hence, within-city immigrant segregation might still account for imperfect substitution.
I further discuss two skill measurement issues that help explain imperfect substitution between seemingly comparably skilled immigrants and natives. (1) In immigration economics, the term “skill” can be understood as education, education combined with experience (Borjas 2003; Ottaviano and Peri 2012), or occupation (Card 2001), among others, reflecting that skill is multidimensional. Thus, no matter which skill measure researchers use, other omitted skills could account for the imperfect substitution between workers with comparable skills. A notable example of an omitted skill is English proficiency when skill is measured by education (Lewis 2013). (2) Even for standard skill metrics such as education and experience, measurement can be challenging because of the lack of human capital transferability (Friedberg 2000; Sanromá, Ramos, and Simón 2015). Researchers have found skill downgrading at the initial stage after immigration in the United States and other developed economies. Immigrants tend to receive lower returns for the same measured skills than natives when these skills are acquired abroad (Dustmann, Schönberg, and Stuhler 2016). Skill downgrading leads to skill group misclassification and estimation biases. In sum, the theoretical literature surveyed above suggests that labor market segmentation, spatial segregation, and skill mismeasurement could contribute to imperfect immigrant-native substitution.
Analytical Framework
Estimating immigrant-native substitutability involves numerous empirical details. Borjas, Grogger, and Hanson (2012) offer an extensive list of these details, including sample selection criteria, the dependent variable's definition, weighting schemes, and covariate selection. This article, while acknowledging the importance of the other issues, emphasizes the one that proves to lie at the center of the debate: the covariates used in estimating the elasticity of substitution.
To motivate the estimation of immigrant-native elasticity of substitution and the evaluation of immigration's wage impacts that follows, I formally introduce a structural estimation framework by specifying a national nested-CES (constant elasticity of substitution) technology. 5
The technology uses capital (K) and labor supplies (Ljkt) from all education(j)–experience(k)–nativity groups as inputs to produce output (y) in each period (t). Notice that in line with the literature on immigration's economic impacts (e.g., Ottaviano and Peri 2012), most discussions here focus on the US national labor market segmented along the lines of skills and nativity. Identification of immigration's wage impacts is achieved by exploiting variations in labor supply shocks due to immigration and wages across skill groups and over time. I return to the regional-level analysis when exploring the relationship between spatial segregation and imperfect substitution. For now, the national-level production function takes the form of
Under the assumptions of competitive markets, which imply equalization between wage and marginal value products, I obtain the following regression equation used to estimate the immigrant-native elasticity of substitution.
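For orientation, the relative-wage equation that a nested-CES technology with competitive markets typically delivers in this literature can be sketched as follows. The notation is an assumption made for illustration and may differ from the article's own equation: the superscripts F and D (foreign-born and native), the relative productivity terms theta, and the symbol sigma_N for the immigrant-native elasticity of substitution are all introduced here.

```latex
% Sketch under assumed notation (not necessarily the article's own):
%   w^F_{jkt}, w^D_{jkt} : average wages of immigrants and natives in
%                          education group j, experience group k, year t
%   F_{jkt}, D_{jkt}     : the corresponding labor supplies
%   \theta^F, \theta^D   : relative productivity (labor demand) terms
%   \sigma_N             : immigrant-native elasticity of substitution
\ln\left(\frac{w^{F}_{jkt}}{w^{D}_{jkt}}\right)
  = \ln\left(\frac{\theta^{F}_{jkt}}{\theta^{D}_{jkt}}\right)
  - \frac{1}{\sigma_N}\,\ln\left(\frac{F_{jkt}}{D_{jkt}}\right)
```

Under this form, the slope on log relative supply equals minus the inverse of the elasticity. A slope of -0.05 would correspond to an elasticity of 20, while a slope of zero would correspond to perfect substitution (an infinite elasticity).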
Empirical Implementation and its Challenges
Empirically, the elasticity of substitution
To guard against omitted variable bias in estimation, ideally, the relative labor demand shocks
A close reading of Ottaviano and Peri (2008, 2012) and Borjas, Grogger, and Hanson (2012) suggests they made very different decisions about the “best” covariates. Ottaviano and Peri generally prefer more parsimonious specifications than Borjas and his coauthors. For example, the basic specification in Ottaviano and Peri (2008) only contains the education-by-experience interactions (Equation (7)); Ottaviano and Peri (2012) additionally control for the year fixed effects (Equation (8)), while compared to Ottaviano and Peri (2012), Borjas, Grogger, and Hanson (2012) further include the education-by-year and experience-by-year interactions (Equation (9)).
In my opinion, Borjas’ criticism makes some sense, especially when applied to low-skilled workers, such as high-school dropouts, among whom immigrants' and natives' years of schooling may differ substantially. However, Borjas’ proposed remedy of controlling for skill, year fixed effects, and their interactions is neither adequate nor practical. The remedy is inadequate because we cannot entirely rule out three-way interactions. Immigrants and natives within the same education-experience cell could still exhibit some heterogeneity; thus, they may be subject to different labor market experiences. The remedy is impractical because it introduces more covariates than traditional regression techniques can handle. Therefore, the challenges in providing compelling justifications for the covariates selected call for novel empirical solutions. I return to this point after introducing the data and context for the period under study.
Data and Historical Context
For subsequent empirical investigations, I use the same microdata, namely, the US Census, 1960–2000, and the 2006 ACS (Ruggles et al. 2020), 6 to enable comparison between this article, Ottaviano and Peri (2012), and Borjas, Grogger and Hanson (2012). The Census and ACS data are standard sources that have been intensively used across the social sciences to investigate the impacts of immigration, assimilation, and related issues in a US context (Blau and Mackie 2017; Waters and Pineau 2015). 7
I carefully follow Ottaviano and Peri (2012) to construct the samples and define variables. For comparability purposes and analytical simplicity, I focus primarily on men. To mitigate this limitation, I provide key findings on immigrant-native substitution among women and then compare across gender. For each gender, the employment samples include workers aged 18 and older. The wage samples further restrict the observations to employees in the employment samples to produce average wage rates (Katz and Murphy 1992). The variable definitions are also consistent with Ottaviano and Peri (2012). Two measures of human capital (education and experience) are employed to delineate skill cells. 8 Since the Census and ACS do not collect actual experience, I use potential experience, defined as the years since completing school. I further divide these skill groups into immigrant and native subgroups by birthplace. Consequently, each year, workers in the employment and wage samples are sorted into one of the 64 cells jointly defined by four education groups, eight five-year experience intervals, and the immigrant-native dichotomy. Pooling skill-specific employment and wage data from six survey years yields the final dataset for empirical investigation (sample size: 192, that is, 32 education-experience cells observed in each of six survey years).
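The cell arithmetic above can be checked with a short sketch. The group labels and experience bands below are illustrative assumptions; only the counts (four education groups, eight experience intervals, two nativity groups, six survey years) come from the text. The sketch also assumes that each regression observation pairs the immigrant and native subgroups of one education-experience-year cell, which is one reading consistent with the reported sample size of 192.

```python
from itertools import product

# Labels are illustrative assumptions; only the counts come from the text.
EDUCATION = ["dropout", "high_school", "some_college", "college"]   # 4 groups
EXPERIENCE = [(lo, lo + 4) for lo in range(1, 41, 5)]               # 8 five-year bands
NATIVITY = ["immigrant", "native"]
YEARS = [1960, 1970, 1980, 1990, 2000, 2006]                        # 5 censuses + ACS 2006

# Cells into which workers are sorted within a single survey year.
cells_per_year = list(product(EDUCATION, EXPERIENCE, NATIVITY))

# Education-experience skill cells, ignoring nativity.
skill_cells = list(product(EDUCATION, EXPERIENCE))

# Assumed unit of observation: one immigrant/native pair per skill cell and year.
observations = list(product(skill_cells, YEARS))

print(len(cells_per_year), len(skill_cells), len(observations))
```

With these counts, the pooled data contain 64 skill-nativity cells per year and 32 × 6 = 192 cell-year observations.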
It is worth noting that the data I use were collected in a period that roughly coincides with the era of US immigration that began with the enactment of the 1965 Amendment to the Immigration and Nationality Act (Massey et al. 1998; Portes and Rumbaut 2014). The resurgence of mass immigration, accompanied by shifts in immigration's skill mix and origin countries, has significantly affected the size and composition of the US labor supply (Waters and Pineau 2015, 50). Meanwhile, the US economy underwent a profound transformation. A complex set of economic forces, notably technological innovations and globalization, have precipitated industrial restructuring, resulting in substantial changes in labor demands for different skill-nativity groups (Katz and Autor 1999; Portes and Rumbaut 2014). The supply shifts caused by immigration, interplaying with profound demand changes since the 1960s, provide the historical context for immigrant-native labor market competition.
The Central Role of Variable Selection
As argued before, to ensure that the substitution estimates have causal interpretations, the covariates selected must capture all relevant labor demand shocks, without any omission. In practice, however, if the substitution estimates were insensitive to the covariates selected, I could safely ignore these debates. In what follows, I show that variable selection does matter for estimation. Among other estimation details, I find that the distinct covariate choices of Ottaviano and Peri and of Borjas and his coauthors explain a considerable part of the difference in substitution estimates. Depending on the covariates included, ordinary least squares (OLS) regression may produce results ranging from mildly imperfect substitution to perfect substitution.
Table 1 demonstrates the significance of covariate selection for immigrant-native substitution estimation. Table 1, row 1, employs a setting equivalent to Ottaviano and Peri (2012), used as the benchmark below. By inspecting the results rightwards, it is evident that the values of
OLS Estimates of Immigrant-Native Elasticity of Substitution among Men.
Notes: OLS regressions, based on Censuses 1960–2000, and ACS 2006, male subsamples. Robust standard errors are reported in parentheses and clustered by education-experience cells; adjusted R2 are reported in brackets. The weights in row 1 are the total employment in skill groups. The weights in rows 2 and 3 are the inverse of the sampling variance of the dependent variables. Covariates such as “Education-by-experience effects” denote all two-way interactions between the education and experience fixed effects, “Year effects” denote all year fixed effects. * Statistically significant at 10% level; ** at 5% level; *** at 1% level.
Meanwhile, the standard error of
The empirical regularities obtained from the first row of Table 1 continue to hold when specifications other than covariate selection vary. Therefore, Table 1 illustrates that although the difference in covariate selection is not the only explanation for the disagreement on substitution estimates, it doubtlessly lies at the center of the ongoing immigrant-native substitution debate. To resolve the debate, we must identify the appropriate covariates first.
The Lasso-Type Estimators
Despite the importance of covariate selection, the existing economic literature on immigration's labor market impacts does not provide precise suggestions for such selection. In my view, this weakness relates to the vagueness of economic theory. Neoclassical labor demand theory, in particular, is often too abstract to offer practical guidance on covariate selection among numerous skill and year fixed effects and their interactions. 10 Social theories on labor market competition and immigrant-native substitution provide many insights but may not succeed in the above task either. To shed light on this challenging problem, I approach it from a new angle by applying ML techniques to select covariates. The philosophy of my subsequent investigation is to let the data speak for themselves (e.g., Mullainathan and Spiess 2017). Given my dataset's relatively small sample size, dimension reduction becomes a prerequisite for subsequent estimation and causal inference.
The ML techniques employed here to reduce dimensionality and facilitate causal inference are Lasso-type estimators, including the Lasso and its recent development, the PDS Lasso. These estimators have gained growing popularity among economists and other social scientists. Aside from their traditional applications targeting prediction (e.g., Chalfin et al. 2016), these techniques have also been adapted for causal inference and applied to diverse topics in economics and related disciplines, such as minimum wage (Allegretto et al. 2017), education (Angrist and Frandsen 2022), training (Knaus, Lechner, and Strittmatter 2022), and immigration (Abramitzky et al. 2019). All of these studies make full use of Lasso-type estimators, whose strengths as covariate selectors support more credible estimation, more precise statistical inference, and sensitivity analyses. 11
The Lasso minimizes the sum of squared residuals subject to an L1-penalty, which equals the sum of absolute values of all coefficients:
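In generic textbook notation, the Lasso objective in its penalized form can be sketched as below. The symbols are assumptions introduced for illustration, and the 1/(2n) scaling is one common convention; software packages differ in how they scale the loss and weight individual coefficients in the penalty.

```latex
% Generic Lasso objective (assumed notation):
%   y_i     : outcome for observation i = 1, ..., n
%   x_i     : vector of p candidate covariates
%   \beta   : coefficient vector
%   \lambda : tuning parameter, \lambda \ge 0
\hat{\beta}(\lambda) = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i'\beta\right)^2
  + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```

A larger value of the tuning parameter sets more coefficients exactly to zero, which is what makes the Lasso usable as a covariate selector, while a value of zero removes the penalty and returns the least squares problem.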
The PDS Lasso, while inheriting the Lasso's capability in dimension reduction, can alleviate the estimation biases introduced by the Lasso, as I discuss later. Using the PDS Lasso can help detect imperfect substitution between comparably skilled immigrants and natives at the national level and, thus, help decide in whose conclusions, those of Ottaviano and Peri (2012) or those of Borjas, Grogger, and Hanson (2012), we should place more confidence. Furthermore, the introduction of the PDS Lasso could contribute to our understanding of immigration's labor market impacts, a point that I revisit after estimating immigrant-native substitutability.
Preselection and Estimation via the Lasso
To demonstrate the strengths of Lasso-type estimators, I first provide preliminary estimates of immigrant-native substitutability by applying the Lasso to the specifications recommended by Ottaviano and Peri (2012) and Borjas, Grogger, and Hanson (2012). Moreover, unlike conventional regression techniques such as OLS, the Lasso works for high-dimensional data in which the number of observations is smaller than the number of covariates (n < p). Therefore, I also report estimation results using the Lasso to search among the full set of skill and year fixed effects and their two- and three-way interactions. The fully saturated specification nests the Ottaviano and Peri (2012) and Borjas, Grogger, and Hanson (2012) specifications as special cases. Controlling for these fixed effects and interactions may eliminate bias caused by omitting time-varying and skill-specific relative labor demand shocks. 12
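To see why the fully saturated specification is high-dimensional, the sketch below counts candidate indicator columns under each setting. It assumes, for illustration only, that every level of every term receives its own dummy before reference categories or collinear columns are dropped, so the counts are upper bounds rather than the exact number of regressors in any reported specification.

```python
# Dimensions taken from the text: 4 education groups, 8 experience groups,
# 6 survey years, and 192 cell-level observations.
N_EDU, N_EXP, N_YEAR, N_OBS = 4, 8, 6, 192

# Raw indicator counts per term (assumption: one dummy per level, before
# dropping reference categories or collinear columns).
terms = {
    "education": N_EDU,
    "experience": N_EXP,
    "year": N_YEAR,
    "education x experience": N_EDU * N_EXP,
    "education x year": N_EDU * N_YEAR,
    "experience x year": N_EXP * N_YEAR,
    "education x experience x year": N_EDU * N_EXP * N_YEAR,
}

# Settings as described in the text and table notes.
specs = {
    "Ottaviano and Peri (2012)": ["year", "education x experience"],
    "Borjas, Grogger, and Hanson (2012)": [
        "year", "education x experience", "education x year", "experience x year",
    ],
    "FULLSET": list(terms),
}

counts = {name: sum(terms[t] for t in cols) for name, cols in specs.items()}
for name, k in counts.items():
    print(f"{name}: {k} candidate columns, n = {N_OBS}, n < p: {N_OBS < k}")
```

Under this counting, the three-way interactions alone contribute as many columns as there are observations, so the full candidate set exceeds the sample size and cannot be handled by OLS without prior selection.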
Table 2 below compares the OLS and Lasso estimates of
OLS and Lasso Estimates.
Notes: Based on Census/ACS, 1960–2006, male subsamples. Throughout the table, I report the unweighted estimates. Model specifications other than covariate selection are similar to those in Table 1, row 1. The Lasso results are estimated using the Stata package “lassopack” (Ahrens, Hansen, and Schaffer 2018). The setting “Ottaviano and Peri (2012)” controls for the year fixed effects and education-by-experience interactions. The setting “Borjas, Grogger, and Hanson (2012)” controls for year fixed effects and all two-way interactions. The setting “FULLSET” denotes the fully saturated specification, which controls all main effects and their two- and three-way interactions. NA denotes “not available.” The tuning parameter λ is selected using the extended Bayesian Information Criterion, the package's default option.
Notice that Table 2 does not report standard errors for the coefficient
Debiasing the Lasso
Provided that the Lasso could select all covariates with nonzero coefficients (“model selection consistency”), we could simply apply OLS to the covariates selected by the Lasso to obtain unbiased estimates. However, this precondition need not hold in most real applications. Specifically, for the question considered here, the first-stage variable selection tends to omit covariates highly correlated with the independent variable of interest, i.e.,
To alleviate the omitted variable biases, Belloni, Chernozhukov, and Hansen (2014a) developed the PDS Lasso, adapting the Lasso into a valuable causal inference method. The method can be implemented in three steps. First, we use the Lasso to select covariates useful for predicting the independent variable of interest. The variables that are closely related to the independent variable of interest are, thus, kept in the model. Second, we use the Lasso to select covariates that predict the dependent variable. Finally, we estimate the key parameter by running a regression (e.g., OLS) of the dependent variable on the independent variable of interest and the union of the covariates selected in the two preceding selection steps.
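The three steps above can be sketched on simulated data. This is a minimal illustration, not the pdslasso implementation: it uses only the Python standard library, a hand-written coordinate-descent Lasso, a fixed penalty `lam` instead of a data-driven one, and no standard errors. All names (`lasso_select`, `ols`, `alpha`, the simulated `X`, `d`, `y`) are hypothetical. In the article's setting, `y` would play the role of the log relative wage, `d` the log relative employment, and `X` the candidate fixed effects and interactions.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_select(X, y, lam, sweeps=50):
    """Coordinate descent for (1/2n)*RSS + lam*sum|b_j| (no intercept).
    Returns the set of column indices with nonzero coefficients."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    norms = [sum(c * c for c in col) / n for col in cols]
    b = [0.0] * p
    r = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual.
            rho = sum(c * ri for c, ri in zip(cols[j], r)) / n + norms[j] * b[j]
            new = soft_threshold(rho, lam) / norms[j]
            if new != b[j]:
                diff = new - b[j]
                r = [ri - diff * c for ri, c in zip(r, cols[j])]
                b[j] = new
    return {j for j in range(p) if b[j] != 0.0}

def ols(columns, y):
    """OLS coefficients via the normal equations and Gauss-Jordan elimination."""
    k = len(columns)
    A = [[sum(a * c for a, c in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * yi for a, yi in zip(columns[i], y))] for i in range(k)]
    for i in range(k):
        piv = max(range(i, k), key=lambda row: abs(A[row][i]))
        A[i], A[piv] = A[piv], A[i]
        for row in range(k):
            if row != i:
                f = A[row][i] / A[i][i]
                A[row] = [ar - f * ai for ar, ai in zip(A[row], A[i])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (assumption): X[0] confounds d and y; the true effect is alpha.
random.seed(0)
n, p, alpha = 400, 30, 0.5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
d = [row[0] + random.gauss(0, 1) for row in X]
y = [alpha * di + 2 * row[0] + row[1] + random.gauss(0, 1) for di, row in zip(d, X)]

lam = 0.2                          # fixed penalty, chosen for illustration
sel_d = lasso_select(X, d, lam)    # step 1: covariates predicting d
sel_y = lasso_select(X, y, lam)    # step 2: covariates predicting y
union = sorted(sel_d | sel_y)      # step 3: OLS of y on d and the union
controls = [[row[j] for row in X] for j in union]
alpha_pds = ols([d] + controls, y)[0]
alpha_naive = ols([d], y)[0]       # omits the confounder entirely
print(f"PDS estimate: {alpha_pds:.3f}, naive estimate: {alpha_naive:.3f}")
```

In this simulation the naive regression absorbs the confounder's effect into the coefficient on `d`, whereas the post-double-selection regression recovers a value close to the true `alpha` because the confounder is kept through either selection step.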
Compared with the Lasso, the PDS Lasso has another appealing advantage for my purpose. It provides consistent standard errors for the parameters of interest, enabling us to draw a solid conclusion on the existence of imperfect immigrant-native substitution. Because of these two advantages, I exclusively rely on the PDS Lasso for all subsequent investigations.
Table 3 presents the PDS Lasso estimates of
PDS Lasso Estimates.
Notes: The first column uses Census/ACS 1960–2006, male subsamples, while the second column uses the Census 1960–2000, ACS 2010 and 2018, male subsamples. Robust standard errors, clustered by education-experience cells, are reported in parentheses. The PDS Lasso is implemented using the Stata package “pdslasso” (Ahrens, Hansen, and Schaffer 2018). * Statistically significant at 10% level; ** at 5% level; *** at 1% level.
The resulting
Since my substitution estimates in Table 3 lie between those of Ottaviano and Peri and those of Borjas, I conjecture that Borjas, Grogger, and Hanson (2012) could overestimate immigration's wage impacts on natives, whereas Ottaviano and Peri (2012) could underestimate them. Immigration's wage impacts on immigrants could also be biased, but in the opposite directions. My subsequent discussion will offer a quantitative assessment of these conjectures based on microdata from 1990 and 2006.
In addition, I include ACS 2010 and 2018 (but exclude ACS 2006) to reflect recent labor market dynamics, partly as a robustness check. For the specifications in the first two rows, the elasticities of substitution between 1960 and 2018 resemble those between 1960 and 2006. However, when the fully saturated specification in the third row is adopted, the elasticity falls from 34*** to 24***, suggesting that comparably skilled immigrants and natives have become less substitutable in the past decade. 16
Coping with the Endogenous Immigration
Some readers might have noticed that none of my previous estimations, whichever technique they adopted (OLS, Lasso, or PDS Lasso), addressed the possibility of endogenous immigration. Although this practice seems to prevail in studies structurally estimating immigration's wage impacts and is shared by both Ottaviano and Peri (2012) and Borjas, Grogger, and Hanson (2012), researchers may still be concerned about endogeneity biases from three sources: omitted variable bias, measurement errors, and simultaneous equation bias.
As the PDS Lasso enables us to search among all potential components of relative labor demand shocks
Table 4. Endogeneity Issues.
Notes: The first column uses Census/ACS 1960–2006, male subsample, while the second column uses the Census 1960–2000, ACS 2010 and 2018, male subsample. Robust standard errors are reported in parentheses and clustered by education-experience cells. The PDS Lasso estimates are obtained using the Stata package “pdslasso.” The IV-Lasso estimates are obtained using the Stata package “ivlasso” (Ahrens, Hansen, and Schaffer 2018). I employ the SSIV to mitigate the endogeneity biases caused by measurement errors. I also follow Borjas (2003) to use the log immigrant employment in each skill cell as the instrument. In all estimations reported in the table, the full set of fixed effects and their two- and three-way interactions are subject to preselection. * Statistically significant at 10% level; ** at 5% level; *** at 1% level.
However, controlling for covariates selected by the PDS procedure cannot mitigate the simultaneous equation bias caused by feedback from labor supply to labor demand. Under a commonly used, yet vulnerable, identification assumption that immigrant influx into particular skill groups was independent of the relative wages offered to the various skill categories (Borjas 2003, 1361), I also provide tentative IV Lasso estimates by instrumenting the relative labor supply with immigrants’ labor supply in that skill group. The resulting IV Lasso estimates (row 3) usually resemble the PDS Lasso estimates, suggesting small endogeneity biases.
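The logic of instrumenting the relative labor supply can be illustrated with a plain two-stage least squares sketch on simulated data. It is not the IV Lasso procedure itself (no covariate selection is performed), and the instrument `z`, the unobserved `shock`, and the true coefficient of -0.5 are hypothetical choices made only for illustration.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, d, z):
    """2SLS slope on the endogenous regressor d, using one instrument z."""
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    d_hat = Z @ ols(Z, d)                               # first stage: fitted d
    return ols(np.column_stack([const, d_hat]), y)[1]   # second stage slope

# Hypothetical data: an unobserved shock raises both d and y,
# so OLS of y on d is biased upwards relative to the true slope of -0.5.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                  # instrument, unrelated to the shock
shock = rng.normal(size=n)              # unobserved confounder
d = z + shock + rng.normal(size=n)
y = -0.5 * d + shock + rng.normal(size=n)

beta_ols = ols(np.column_stack([np.ones(n), d]), y)[1]
beta_iv = two_stage_least_squares(y, d, z)
print(round(float(beta_ols), 3), round(float(beta_iv), 3))
```

The sketch only shows the mechanics. Whether an instrument is valid depends on the identification assumption, which the data alone cannot confirm.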
It is worth emphasizing that without convincing exogenous supply-shifters (such as refugee influx), I cannot argue with certainty that my PDS Lasso estimates are free from endogeneity bias. Fortunately, I can prove under some reasonable assumptions that even if the PDS Lasso estimates were biased, they would tend to be biased upwards. 18 Thus, at a minimum, my PDS Lasso estimates offer upper bounds for the true
Female Immigrant-Native Substitution
By focusing solely on men, my previous empirical investigation has neglected women entirely, who accounted for roughly half of US residents in the early 21st century (Donato et al. 2011). The neglect of women is, thus, an apparent limitation of this article as well. This section attempts to partially overcome this limitation by applying the PDS Lasso estimator to women and the pooled sample, providing the main findings on immigrant-native substitution among women.
Women in the United States generally have weaker labor market attachments and more frequently experience career interruptions than men (Blau and Kahn 2017). This observation applies to all women, especially immigrant women. The widely used measure of potential experience is, thus, likely to be imprecise for women as a whole (Blau and Kahn 2013). Partly due to concerns about measurement errors and the resulting attenuation biases, some labor economists limit their analyses to men (e.g., Card and Lemieux 2001). However, for the central question addressed here, the consequence of measurement errors seems less severe. According to Table 5, the PDS Lasso estimates of β always have the right signs and are significant. Hence, similar to the estimates based on men, I find no evidence of perfect immigrant-native substitution among women and no clear signs of attenuation bias. The immigrant-native substitutability among women is often similar to that among men. 19
Table 5. PDS Lasso Estimates Across Gender.
Notes: The first three columns use Census/ACS 1960–2006, male, female, and pooled samples, while the last three columns use the Census 1960–2000, ACS 2010 and 2018, male, female, and pooled samples. Robust standard errors are reported in parentheses and clustered by education-experience cells. * Statistically significant at 10% level; ** at 5% level; *** at 1% level.
The absence of attenuation biases among women in Table 5 might be consistent with the observation that, for workers with similar observed skills, the difference in their actual experience occurs more along the lines of gender than nativity because both immigrant and native women have lower labor force participation rates than men (e.g., Blau, Kahn, and Papps 2011). Hence, when estimating female immigrant-native substitutability, the influences of measurement errors in immigrant and native women's experience may cancel one another out. The resulting attenuation bias tends to be small. Furthermore, Table 5 shows that the magnitudes of the β estimates based on pooled samples are usually smaller than those based on single-gender subsamples. Since successfully applying the structural estimation approach depends on pre-assigning workers into correct skill cells (Dustmann, Schönberg, and Stuhler 2016), pooling seemingly comparable men and women may introduce substantial measurement errors and, thus, attenuate the pooled estimates.
Sensitivity of Wage Impacts to Immigrant-Native Substitutability
Estimating the immigrant-native elasticity of substitution is interesting in its own right, especially when it reveals the social structure in which immigrant-native competition is embedded. In many other cases, however, researchers are interested in this parameter to evaluate immigration's wage impacts. Theoretically, comparably skilled immigrants and natives are expected to be affected differently by immigration, provided that the substitution is imperfect. By confining the competition caused by immigration to immigrants, the imperfect substitution (or, more precisely, the underlying structure that generates it) attenuates natives’ wage impacts.
This section builds on earlier discussions to evaluate immigration's wage impacts. I do not intend to provide a definitive evaluation of such impacts because it inevitably involves other conceptual and empirical issues subject to debate, as briefly mentioned. My primary goal is to examine the sensitivity of such impacts to the alternative substitution estimates mentioned before. The experiments also convey key ideas on the significance of introducing the Lasso-type estimators to understand immigration's labor market impacts and other related areas.
Simulating Immigration's Wage Impacts
Conditional on alternative immigrant-native elasticities of substitution, I adopt the structural estimation approach to simulate immigration's wage impacts from 1990 to 2006, on which Ottaviano and Peri (2012) focused. 20
I again restrict the sample to men. Given the nested CES technology (Equations (1)–(4)), the simulation requires two additional elasticities of substitution – namely, that between education groups (
The economic literature often distinguishes between immigration's short- and long-run impacts (Blau and Mackie 2017; Lewis and Peri 2015). The former is simulated when capital stock is fixed, while the latter is simulated when capital stock fully adjusts to demand changes. Since capital adjustment absorbs immigration shocks, immigration's short-run wage impacts are more substantial than their long-run counterparts. Although researchers often disagree about which temporal framework should be adopted (Borjas 2003; Ottaviano and Peri 2012), the simulations of short- and long-run wage impacts offer bounds for actual impacts (Blau and Mackie 2017, 229). I follow the distinction between the short- and long-run impacts below.
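The role of capital adjustment can be illustrated with a stylized calculation. The sketch assumes a Cobb-Douglas aggregate technology, Y = A·L^a·K^(1-a), which is an assumption made here for illustration; the function name, the labor share of 0.66, and the 10 percent labor supply increase are hypothetical and are not the values behind Table 6.

```python
import math

def avg_wage_change(labor_growth, labor_share, capital_adjusts):
    """Log change in the average wage under Y = A * L**a * K**(1 - a).

    The wage equals the marginal product of labor, a * A * (K / L)**(1 - a).
    """
    if capital_adjusts:
        # Long run: capital grows in proportion to labor, so K/L and hence
        # the average wage are unchanged.
        return 0.0
    # Short run: capital is fixed, so d ln(w) = -(1 - a) * d ln(L).
    return -(1.0 - labor_share) * math.log(1.0 + labor_growth)

# Hypothetical inputs: a 10 percent rise in labor supply, labor share 0.66.
short_run = avg_wage_change(0.10, 0.66, capital_adjusts=False)
long_run = avg_wage_change(0.10, 0.66, capital_adjusts=True)
print(round(short_run, 4), long_run)
```

Under these assumptions, the average wage effect vanishes in the long run, while the relative effects across skill and nativity groups, which depend on the nested elasticities, do not.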
Wage Impacts under Alternative Immigrant-Native Substitutability
This subsection presents the simulation results. Table 6, Panel A focuses on the short-run wage impacts on different skill-nativity groups. Immigration between 1990 and 2006 reduced the average wage of US natives by around 2 percent, which is arguably small, considering the massive immigrant influx. Those impacts, though, were unevenly distributed. High-school dropouts’ wages were more strongly affected than others, reduced by 6–7 percent. College graduates came second, and their wages were reduced by 2–3 percent. The bimodal skill distribution of recent immigration waves can explain these findings (Blau and Mackie 2017; Portes and Rumbaut 2014). Moreover, the short-run wage impacts on immigrants were consistently larger than those on natives. Immigrant high-school dropouts, in particular, experienced a substantial wage reduction.
Table 6. The Wage Impacts of Immigration in the Short and Long Run.
Notes: Based on Census/ACS data, male subsample. Panel A reports the simulated short-run wage impacts of immigration between 1990 and 2006, while panel B reports the long-run wage impacts. I assume the elasticity of substitution among all education groups is 2, while the elasticity of substitution among experience subgroups is 5. These parameters, the immigrant-native elasticities of 20 (Ottaviano and Peri 2012), 34 (my preferred estimate), and infinity (Borjas, Grogger, and Hanson 2012), together with the wage shares of each skill-nativity cell in 1990 are used to simulate the wage elasticity of natives and immigrants. These wage elasticities are combined with the labor supply shocks due to immigration to yield the wage impact estimates above.
Panel B offers the long-run results. Because capital adjusts to immigration shocks, the wage impacts decreased for all education-nativity groups, compared to the short run. Among native workers, only high-school dropouts suffered from wage reduction. Wage impacts for other groups often clustered around zero or became positive. Nevertheless, even in the long run, most immigrants were still negatively affected. Moreover, my Lasso-based simulated wage impacts always lay between the two polar cases preferred by Ottaviano and Peri (2012) and Borjas, Grogger, and Hanson (2012).
More importantly, Table 6 provides evidence of how sensitive immigration's wage impacts were to alternative immigrant-native elasticity of substitution, ranging from 20 (Ottaviano and Peri 2012), via 34 (PDS Lasso), to perfect substitution (Borjas, Grogger, and Hanson 2012). 22 In general, the simulated wage impacts on natives were not very sensitive to the elasticity of substitution. When examining the average short-run wage impacts, the Lasso-based simulated wage impact was 8.3 percent larger than that based on Ottaviano and Peri's preferred elasticity and 9.9 percent smaller than that based on Borjas’ preferred elasticity. In clear contrast to the native cases, the simulated wage impacts on immigrants were far more sensitive. On average, the Lasso-based simulated wage impact on immigrants was 23.3 percent smaller than that based on Ottaviano and Peri's preferred elasticity and 76.4 percent larger than that based on Borjas’ preferred elasticity.
The above simulation thus demonstrates considerable precision gains from applying the PDS Lasso to estimate the immigrant-native elasticity of substitution and then simulate immigration's wage impacts. The gain is likely to be substantial in areas where wage impacts on immigrants are of great interest to researchers, such as immigrant-native labor market competition, immigrant-native inequality, immigrant poverty, and immigrants’ and their children's socioeconomic assimilation.
Spatial Segregation and Imperfect Substitution
Unlike labor market segmentation, the role of spatial segregation in determining immigrant-native imperfect substitution has received little attention in the immigration economics literature (Smith and Edmonston 1997). This section intends to fill this gap. Note that up to this point, I have followed Ottaviano and Peri (2012) in assuming that within the United States, all workers with comparable skills and the same nativity are perfect substitutes. This assumption is shared by the “national approach” literature (Borjas 2003). By contrast, another literature, the “regional approach,” exploits variations in immigrant penetration and natives’ wages across regions to identify immigration's wage impacts (e.g., Card 1990, 2009).
The assumption held by the national approach offers a good approximation to an economy where labor market integration is sufficiently high and domestic trade further equalizes wages across regions (Lewis 2003). However, as suggested by the theoretical section, certain spatial segregation remains in the United States, and it could raise difficulties for comparable immigrants and natives living and working in different regions to be perfect substitutes. This problem is most relevant for low-skilled workers who produce non-tradable services and whose spatial mobility is low (Moretti 2013).
Keeping these concerns in mind, I employ the PDS Lasso estimator to investigate the relationship between spatial segregation and the national-level immigrant-native imperfect substitution established in previous sections. I am also interested in whether comparably skilled immigrants and natives could be better substitutes within narrowly defined regions like census divisions, states, and metropolitan areas (cities, hereafter).
To answer these questions, I adapt my national-level analysis to regional data. Specifically, I estimate the regional-level immigrant-native elasticity of substitution using wage and employment information from skill-nativity-region-year cells. 23 The adaptation raises a tremendous challenge for estimation and inference because the number of covariates subject to preselection explodes. Traditional regression techniques such as OLS cannot handle high-dimensional data with p close to or exceeding n. In such circumstances, however, the PDS Lasso offers a far superior solution to OLS.
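Why OLS breaks down when p exceeds n can be seen in a small numerical sketch. The dimensions and the random data below are hypothetical and unrelated to the cells used in this article.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 30                          # hypothetical: more covariates than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                 # pure noise, unrelated to X by construction

# The rank of X (and hence of X'X) is at most n, so the p x p matrix X'X is
# singular and the OLS normal equations have no unique solution.
rank = int(np.linalg.matrix_rank(X))

# The minimum-norm least-squares solution still reproduces y exactly,
# i.e., it fits pure noise perfectly in sample.
b_min_norm = np.linalg.lstsq(X, y, rcond=None)[0]
fits_noise_exactly = bool(np.allclose(X @ b_min_norm, y))
print(rank, p, fits_noise_exactly)
```

A penalized estimator such as the Lasso sidesteps this problem by shrinking most coefficients to exactly zero, so that only a subset of covariates enters the final regression.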
Table 7 shows that the immigrant-native elasticities of substitution obtained by applying the regional approach always exceed the national-level elasticity (34). Furthermore, the elasticity of substitution rises as a finer geographic area is employed. When I use the data from 1960 to 2006 and adopt the fully saturated specification, the census-division-level elasticity of substitution equals 48, the state-level elasticity equals 83, and the city-level elasticity equals 167. 24 The state- and city-level elasticities are statistically indistinguishable from infinity. When using more recent data extended to 2018, I obtain qualitatively similar results, except that the state-level elasticity now equals 122 and is statistically indistinguishable from infinity. 25
Table 7. Spatial Segregation as a Primary Source for Imperfect Substitution.
Notes: The first column uses Census/ACS 1960–2006, male subsample, while the second column uses the Census 1960–2000, ACS 2010 and 2018, male subsample. The national approach treats comparably skilled male workers in different census divisions/states/cities as perfect substitutes. However, when adopting the regional approach, comparably skilled workers are perfect substitutes only if they live in the same census division/state/city. Robust standard errors are reported in parentheses and clustered by education-experience cells. * Statistically significant at 10% level; ** at 5% level; *** at 1% level. NA denotes not available.
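To see why large elasticities are hard to distinguish from perfect substitution, the elasticities quoted in the text can be converted into the implied regression coefficients. The sketch assumes the standard relative wage equation of this literature, in which the coefficient β on log relative supply equals minus the inverse elasticity of substitution (β = -1/σ); this mapping and the function name are assumptions stated here for illustration.

```python
import math

def inverse_elasticity(sigma):
    """Implied coefficient beta = -1 / sigma; perfect substitution maps to 0."""
    return 0.0 if math.isinf(sigma) else -1.0 / sigma

# Elasticities quoted in the text (1960-2006, fully saturated specification).
elasticities = {
    "national": 34.0,
    "census division": 48.0,
    "state": 83.0,
    "city": 167.0,
    "perfect substitution": math.inf,
}
betas = {level: inverse_elasticity(s) for level, s in elasticities.items()}
for level, beta in betas.items():
    print(f"{level:>22}: beta = {beta:.4f}")
```

As the geographic unit shrinks, the implied β moves toward zero, so a given standard error makes it progressively harder to reject perfect substitution.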
These findings are intuitive and show that spatial segregation of comparably skilled immigrants and natives powerfully explains the national-level imperfect substitution. In general, the degree of imperfect substitution shrinks as I examine the same issue in increasingly finer regions. Notably, I find little evidence of imperfect substitution between comparably skilled immigrants and natives in medium- or large-sized US cities. The national-level imperfect substitution almost disappears in narrowly defined local labor markets.
As reviewed before, considerable within-city immigrant segregation remains. The disappearance of imperfect substitution in local economies, thus, deserves explanation. One explanation is that segregation occurs along many dimensions. Once the comparison is restricted to comparably skilled workers, the effective immigrant segregation decreases. Moreover, although the spatial segregation literature traditionally centers on the residential pattern (Waters and Pineau 2015), social interactions also occur in the workplace. Evidence suggests lower workplace immigrant-native segregation than residential segregation (Ellis, Wright, and Parks 2004; Strömgren et al. 2014). Furthermore, although immigrants and natives may locate in different local labor markets, the product market can integrate these segments. In narrowly defined geographic units such as cities, virtually all goods and services become tradable, and there is no barrier to factor price equalization.
Finally, the findings in Table 7 raise a puzzle that deserves further study. Since the immigrant-native substitutability within smaller geographic units is often larger than its national counterpart, applying the logic of Card (2009), we should expect the regional approach to yield more negative wage effects because nearby natives and immigrants compete more directly. Hence, it should impose larger downward pressure on natives’ wages. However, the regional approach literature usually concludes with smaller, if not negligible, negative impacts than the national approach literature (Lewis and Peri 2015). Sometimes, these studies yield positive effects (Ottaviano and Peri 2012). There is, thus, a potential inconsistency between the implication of my findings and the consensus within the regional approach literature, which demands future investigation. 26
Conclusions
To shed fresh light on the immigrant-native substitution debate among leading scholars, I employed the PDS Lasso estimator, relying on its double strengths as a variable selector and a causal inference tool. Introducing this ML technique added a novel method to social scientists’ toolkits for studying international migration worldwide and its socioeconomic impacts. 27 It also enabled this article to make solid progress in the three major areas below.
First, unlike conventional OLS regressions, the PDS Lasso estimation yields robust and precise substitution estimates, leading to a firm rejection of perfect substitution. Based on these improved substitution estimates, I confidently simulated immigration's wage impacts. Qualitatively, my simulation confirms the quotation from Card (2009). Even in the short run, the simulation shows small overall negative effects on natives’ wages, whereas immigrants, especially high-school dropouts, suffer from persistent wage reduction. Nevertheless, my Lasso-based substitution estimates suggest more direct immigrant-native competition than Ottaviano and Peri expected, which helps explain the more negative wage impacts on US natives in my simulation. Simulation results also indicate considerable precision gains in immigrants’ wage impacts.
Second, the PDS Lasso facilitates my exploration of the sources of imperfect immigrant-native substitution. Traditional explanations emphasize aspatial skill differences (e.g., language). This article, however, stressed the role of spatial segregation in determining nationwide imperfect substitution. To my knowledge, this article represents one of the earliest empirical attempts to rigorously examine the relationship between spatial segregation and immigrant-native substitution. Moreover, my findings, together with the segmented labor market theories and the ethnic enclave literature, force us to rethink spatial segregation and its socioeconomic functions.
Third, the Lasso-based simulation results have clear policy implications. There are growing concerns about the widespread anti-immigrant sentiment and its political consequences. The small labor market impacts, thus, urge us to move decisively from simple economic explanations to sociocultural explanations (Berman 2021; Tabellini 2020). Nevertheless, it is also important to explore the interactions between these two explanations. Some sociocultural factors might act as “amplifiers” that transform small economic impacts into larger political consequences. Moreover, given that the negative wage impacts are concentrated among immigrants, policies fighting against poverty and promoting assimilation should be prioritized.
Before concluding, it should be noted that this article has the following limitations. First, in all empirical sections except one, I neglect women. Although my PDS Lasso estimates produced close substitutability across gender and, thus, lessened concerns about mismeasurement of immigrant women's experiences in this specific setting, these findings did not significantly improve our understanding of how gender and immigrant-native substitution interact. Second, the Lasso-based estimation strategy cannot eliminate the endogeneity concern caused by simultaneous equation bias. To better cope with endogeneity issues, future studies may simultaneously exploit exogenous supply shocks and use the Lasso to control for demand shocks. Third, although spatial segregation powerfully explains the national-level imperfect substitution, immigrant segregation itself remains unexplained. A thorough investigation should examine whether it is more appropriate to view spatial segregation as a social structure or as an outcome of individuals’ choices, reinforced by social networks.
Supplemental Material
Supplemental material, sj-docx-1-mrx-10.1177_01979183221126467 for Detecting Imperfect Substitution between Comparably Skilled Immigrants and Natives: A Machine Learning Approach by Yunhe Lu in International Migration Review
Footnotes
Acknowledgments
This research project began in the summer of 2016. The author is grateful to Shihe Fu, Xiaobo He, Jingbei Hu, Shi Li, Chunbing Xing, Armin Bohnet, and Jürgen Meckl for their encouragement and helpful comments through the years. He also thanks IMR's Editors-in-Chief, the Associate Editor, and four anonymous referees for their valuable suggestions, comments, and constructive criticisms. Financial support for this article came entirely from the author's institutions. All errors are my own.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
