This paper investigates gender-based segregation across different fields of study at the senior secondary level of schooling in a large developing country. We use a nationally representative longitudinal data set from India to analyze the extent and determinants of the gender gap in higher secondary stream choice. Using fixed-effects regressions that control for unobserved heterogeneity at the regional and household levels, we find that girls are about 20 percentage points less likely than boys to study in science (STEM) and commerce streams as compared with humanities. This gender disparity is unlikely to be driven by gender-specific differences in cognitive ability, given that the gap remains large and significant even after we control for individuals' past test scores. We establish the robustness of these estimates through various sensitivity analyses: including sibling fixed effects, considering intrahousehold relationships among individuals, and addressing sample selection issues. Disaggregating the effect by stream, we find that girls are most underrepresented in the study of science. Our findings indicate that gender inequality in economic outcomes, such as occupational segregation and gender pay gaps, is determined by gendered trajectories set much earlier in the life course, especially at the school level.
Various forms of gender inequality are observed in different parts of the world. In South Asia, such inequalities have manifested throughout the life course of individuals: sex imbalance at birth due to sex-selective abortions, unequal survival rates, differential human capital investments, discrimination in the labor market, and so on. The last few decades, however, have seen progress toward gender equality, most notably in education. Although the gender gap in enrollment rates at all levels of education has diminished, it has not translated into a commensurate improvement in women's labor market outcomes. For instance, female labor force participation, which is viewed as one of the important indicators of inclusive development and female economic empowerment, has remained very low and stagnant and has sometimes declined in India, despite the nation's rapid economic growth, female educational expansion, and fertility decline in the last two decades (Klasen and Pieters 2015). Underparticipation of women will restrict the country from properly utilizing its demographic gift of having a high proportion of the population at working ages. Additionally, occupational and sectoral segregation of employment by gender is remarkably persistent and is a key issue behind perpetuating female disadvantage, such as the gender pay gap in the labor market (Borrowman and Klasen 2020).
Against this backdrop, we show here that the gender gap in economic participation in adulthood in India is shaped by gendered trajectories set earlier in life, especially at the school level. Specifically, we identify the gender gap in science, technology, engineering, and mathematics (STEM) and commerce-related fields at the senior secondary level of education, which is likely to have a significant and far-reaching effect on individuals' adult-life outcomes.
The literature has analyzed the determinants and consequences of stream choice at the postsecondary and tertiary levels of education (Arcidiacono 2004; Beffy et al. 2012; Fuller et al. 1982). Some studies have also recognized the link between educational segregation and occupational segregation and have reflected on the life course processes that determine women's career trajectories relative to men's (Schneeweis and Zweimüller 2012; Xie and Shauman 2003). However, the causes of the gender gap in stream choice are not fully understood, and they remain an area of active research (Kahn and Ginther 2017; Xie et al. 2015). In addition, barring a few exceptions, most of this research has focused on developed countries.1
By investigating this issue in the context of an emerging economy, we contribute to the literature in various ways. First, India offers an important case study given the importance of STEM education in its economy. Over the last few decades after economic liberalization, the growth of the Indian economy has been led by the services sector, where the information technology–related industry has been a prime contributor (Panagariya 2004). The nature of economic growth during this period has potentially had a spillover effect on education participation (Jensen 2012; Oster and Steinberg 2013; Shastry 2012). India, along with China, has accounted for the majority of the world's recent STEM graduates, constituting a much larger share than the European Union and the United States (UNCTAD 2018). Yet, the gender composition of these STEM students and their labor market prospects have remained unexplored in the literature.
Cross-country studies have shown variation in the overall levels and patterns of sex segregation in stream choice (e.g., Charles and Bradley 2009), indicating that the processes that generate this form of gender inequality in advanced and developing nations may be distinct. This set of findings justifies an empirical analysis in a developing country setting, such as India, where the prevalence of male-favoring gender norms affects individual decisions at various life stages. In a society where economic and cultural factors drive underinvestment in girls' education (Azam and Kingdon 2013; Kingdon 2005), the magnitude and determinants of the gender gap in STEM participation might be different from those in more gender-egalitarian countries. Moreover, from a demographic point of view, the Indian scenario differs from developed countries in its very low and stagnating female labor force participation. In this context, although a growing body of literature has examined the role of education in female labor force participation, it has focused on the level of education and not the type of education (Klasen and Pieters 2015; Sarkar et al. 2019). The current paper contributes to this discourse by highlighting the persistence of gender segregation of academic fields despite improvements in female education levels.
The Indian education system also has distinct structural features that imply greater importance of stream choice made at the school level. After secondary schooling, completed after 10 years of education, students entering at the higher secondary level (lasting another two years) must specialize in one of the following streams: humanities, science, commerce, engineering/vocational, and other.2 Unlike many developed countries, including the United States, stream choice at the tertiary level in India is made before admission to college, and because of eligibility requirements, this choice is largely determined by the stream studied at the higher secondary level.3 Therefore, school-level stream choice is a crucial juncture in an individual's career because it determines the subsequent course of study at the college level and the nature of jobs that the individual may obtain in the future. Thus, the life course approach proposed by Xie and Shauman (2003) is especially relevant in the context of India, where the prevailing education system implies that choices made in adolescence affect adult-life outcomes. In fact, Sahoo and Klasen (2018) showed that stream choice at the higher secondary level in India strongly influences later labor market outcomes, including participation, occupational choice, and earnings.
Another contribution of this study is its exploration of the gender gap in the commerce stream, which is equivalent to a business major. Although a vast literature has focused on the gender gap in STEM, the gender gap in business studies is less explored. Analyzing trends in college major choice in the United States, Gemici and Wiswall (2014) found that women are significantly less likely than men to choose a business major, despite the documented overall rise in women's participation in tertiary education (Goldin et al. 2006). We extend this literature by investigating gender disparity in the choice of the commerce stream at the higher secondary school level in India.
We use a nationally representative household-level panel data set that tracks the same households and individual members at two time points: 2005 and 2012. The novelty of this survey is that it asks all individuals about their performance in the secondary school leaving certificate (SSLC) examination and subsequently asks what stream they studied at the higher secondary level. Additionally, individuals aged 15–18 years (the ages corresponding to higher secondary schooling) in 2012 can be matched with information on their prior skills in mathematics, reading, and writing from an independent test conducted in the earlier round of the survey in 2005. Thus, we have a unique setting to investigate individuals' higher secondary stream choice after controlling for their past academic performance, which serves as a reasonable proxy for their cognitive ability.
Estimating fixed-effects regression models, we find a significant gender disparity of about 20 percentage points in the choice of nonhumanities streams (i.e., STEM and commerce) at the higher secondary level among youth aged 15–18 years. In addition to a rich set of covariates, we account for unobserved heterogeneity at the regional and household levels by including fixed effects in the regression. The gender gap remains unchanged even after we control for SSLC exam performance and lagged test scores from the previous survey. We establish the robustness of the estimates by considering the intrahousehold relationships of individuals, estimating sibling fixed-effects models, and addressing sample selection issues using an inverse probability weighting (IPW) framework.
We further investigate the determinants of gender difference in stream choice. Given the persistence of the gender gap even after we take into account the effect of cognitive ability as measured by past exam performance and test scores, we explore the roles of other relevant characteristics. We find that the gender gap does not vary with household income, suggesting that gender-based sorting into different streams is equally prevalent in richer and poorer households. Rather, the gender gap is significantly reduced when there is greater educational parity between parents, captured by the difference in education level between mother and father. We also show that better access to STEM-related education benefits girls more than boys, thus narrowing the gender gap. Additionally, investigating the choice of separate study tracks, we show that the pro-male gender bias is largest in science, followed by commerce and engineering/vocational streams.
Background and Related Literature
The last few decades have seen considerable progress in bridging the gender gap in educational attainment around the developing world. At the same time, trends in female labor force participation have been rather uneven, with South Asia actually experiencing declining female labor force participation rates (Klasen 2019). Moreover, women have continued to be employed predominantly in only a few sectors and occupations (Borrowman and Klasen 2020). This persistent occupational and sectoral segregation is a major reason for the persistence of the male-female earnings gap (Blau and Kahn 2017). This pattern of gender stratification has also been found in the Indian labor market (Duraisamy and Duraisamy 2014).
India has experienced a major expansion in education provision, resulting in a significant rise in school enrollment of both boys and girls. The Indian education system has a common structure throughout the country: students progress through primary, middle, and secondary education in their first 10 years of schooling, followed by another two years of higher secondary schooling and subsequently three to five years of tertiary education. Data from the National Sample Survey (NSS) show that in the mid-1990s, the average enrollment rate among children in the age group corresponding to elementary (i.e., primary and middle) schooling was about 70%, with a gender gap of 10 percentage points favoring boys. Over the next 20 years, this enrollment rate increased to 93%, with the gender gap declining to only 2 percentage points. The same pattern is visible at the secondary and higher secondary levels: over the last two decades, the enrollment rate increased from 50% to 77%, and the gender gap declined from 16 percentage points to 2 percentage points.
The first 10 years of education in India include a common, nonselective curriculum for all students. After that, each student enrolling at the higher secondary level must specialize in a particular stream; most choose the humanities, science, or commerce stream, and a minority opt for other tracks, such as engineering or vocational education. After completing the higher secondary level, students who continue to tertiary education enroll in colleges for bachelor's and master's degrees in a chosen stream. A crucial aspect of the Indian education system is that stream choice at the higher secondary level largely determines subsequent major choice at the college level. In particular, students who have studied in the humanities stream in higher secondary school are deemed ineligible for a STEM or commerce major in almost all colleges. Therefore, stream choice at the higher secondary level is an important decision in an individual's career because it drives the field choice at subsequent levels of education, which in turn affects labor market outcomes through occupational choice. National-level statistics from repeated cross-sectional surveys of the NSS show that the proportion of students enrolled at the higher secondary level who chose humanities declined from 56% in 2007–2008 to 42% in 2014. In contrast, science enrollment increased from 31% to 39% during this period, and commerce enrollment increased from 13% to 16%. These aggregate statistics also reveal that girls have a higher propensity to study humanities than science or commerce, and boys are more likely than girls to study science (Figure 1). This gender disparity in school-level stream choice also leads to subsequent gender gaps in undergraduate studies: the share of women in STEM is only 37%, and the share in commerce is 45% (Government of India 2016).
The literature on postsecondary stream choice, mostly based on developed countries, highlights that educational choices at this level are closely linked to labor market outcomes. First, stream choice is affected by the expected future earnings from different streams (Beffy et al. 2012; Boudarbat 2008). Second, such educational choices also cause much of the variation in earnings later in life (Dustmann 2004; Joensen and Nielsen 2009). Specifically, evidence suggests that a STEM or business major yields higher returns than studying humanities (Flabbi 2011). This pattern is corroborated in the Indian context when we compare the earnings distributions of individuals who studied STEM/commerce with those of individuals who studied humanities at the higher secondary level (Figure 2).
Focusing on gender, studies have shown gender disparities in stream choice: girls are especially underrepresented in STEM at the postsecondary level of education in most countries (Hill et al. 2010; World Bank 2012). The incidence of gender segregation in education and its relation to occupational segregation have also been explored using data from the United States and Europe (Bieri et al. 2016; Daymont and Andrisani 1984; Eide 1994; Flabbi 2011; Van Puyenbroeck et al. 2012). These studies have found that men's and women's college major choice largely explains occupational choices and accounts for a significant part of the gender wage gap. For the case of India, Sahoo and Klasen (2018) found that, even with controls for exam results and household fixed effects, women who choose a STEM or commerce stream in higher secondary education have substantially higher chances of participating in the labor force, securing salaried employment, choosing a male-dominated occupation, and having higher earnings. The choice of a STEM or commerce stream in turn leads to a reduction of the gender gap within households in terms of all these economic outcomes. Also, among different streams, science appears to have the most significant effect.
One potential reason that girls are less likely to choose STEM subjects is that boys may have a comparative advantage in mathematics. Evidence shows that a male advantage in mathematics achievement starts manifesting in middle school and increases with age (Bharadwaj et al. 2012; Kahn and Ginther 2017), but mathematical ability does not fully account for the gender gap in STEM choice (Dickson 2010; Friedman-Sokuler and Justman 2016; Riegle-Crumb et al. 2012; Turner and Bowen 1999). Rather than inherent gender differences in cognitive ability, other societal, psychosocial, and preference-related factors play a larger role in explaining the underrepresentation of women in math-intensive STEM subjects (Antecol and Cobb-Clark 2013; Buser et al. 2014; Zafar 2013). In fact, a large part of the observed gender gap can be attributed to the stereotypical beliefs about girls' mathematical ability and gendered preferences that are often shaped by cultural norms (Charles and Bradley 2009; Kahn and Ginther 2017).
The salience of societal factors implies that contextual analysis is essential for understanding the incidence and determinants of gendered educational choices. In addition, the theoretical perspectives on the relationship between economic development and gender stratification in education do not always converge (Hannum 2005). Modernization or neoclassical theory suggests that the expansion of market forces reduces discriminatory cultural practices that are linked to economic inefficiency, thereby reducing gender disparities in education (Forsythe et al. 2000). On the other hand, Boserup (1970) hypothesized that inequality would first increase and then decrease in the process of development. Initially, men with better access to market opportunities may reap greater benefits of economic prosperity, and progress toward gender equality would be achieved as the structural transformation proceeds (Lantican et al. 1996). Traditional institutions also mediate the effect of economic development on women's educational responses (Munshi and Rosenzweig 2006). Particularly for school-age children, decisions are influenced by parents, who are likely to consider factors beyond labor market returns to education. In South Asia, these factors include dowry payment for daughters' marriage, a higher likelihood of receiving old-age support from sons than from daughters due to patrilocality, and gender norms about women's participation in activities outside the household (Alderman and King 1998; Jayachandran 2015).
Our study contributes to the literature in two ways. First, we identify the pattern of gender segregation in stream choice in an emerging economy where such evidence has been lacking.
Second, we explore the plausible determinants of the gender disparity. Specifically, we analyze the role of cognitive ability, measured by past exam performance and test scores. In addition, we examine the influence of other pertinent factors in this context. Household income is likely to be a constraining factor for poorer students choosing a STEM education, which is more costly than humanities. Indeed, the NSS data reveal that the average expenditure incurred by students in the science and commerce streams is more than twice the expenditure of those studying humanities at the higher secondary level.4 Variation in household income may lead to gendered choices depending on whether resource constraints are binding and how son preference varies along with income (Alderman and King 1998; Garg and Morduch 1998). Therefore, we investigate whether household income determines the gender difference in stream choice.
Another potential determinant we consider is the parental education gap. A large literature has explored the intergenerational transmission of human capital, but this research has mostly analyzed the effect of parental education on children's years of schooling or grade progression rather than stream choice (Holmlund et al. 2011). In addition, we introduce the gender dimension by focusing on the gap in educational attainment between mothers and fathers. We postulate that greater parity in parental education would induce equality in stream choice between boys and girls.
Finally, we consider access to STEM-related education, which is especially important in a developing country where students are often constrained by the availability of specific streams in the local schools. Reviewing the literature on several developing countries, Glick (2008) noted that access to education, despite being a gender-neutral factor, may disproportionately affect girls' participation. This possibility is plausible in the context of a patriarchal society like India, where strong gender norms may discourage adolescent girls from traveling long distances to attend school (Muralidharan and Prakash 2017). Safety concerns may also dissuade girls from enrolling in their preferred stream if it involves traveling longer distances (Borker 2017). Using regional variation in the availability of STEM colleges as a proxy for access, we test whether better access reduces the gender gap in stream choice.
We use the India Human Development Survey (IHDS), a nationally representative, two-period longitudinal data set (Desai et al. 2010, 2015).5 The first round of data was collected in 2004–2005 on 41,554 households in 1,503 villages and 971 urban neighborhoods across India. In 2011–2012, the second round of the survey reinterviewed 83% of the same households; for households that could not be tracked, a replacement sample was used. Thus, the second round of the survey covered 42,152 households across India. For brevity, we refer to the first round as 2005 data and the second round as 2012 data. IHDS is a multitopic survey collecting detailed information at the individual, household, and community levels. Our analysis mainly uses the sample from the 2012 survey and uses the 2005 survey to account for past characteristics of the same individuals.
We explore whether the choice of study stream exhibits a gender bias at the higher secondary level. In India, the official school entry age is 6 years, and the (lower) secondary level ends after 10 years of schooling. In the IHDS sample, the enrollment rate of children of secondary school age (14–15 years) is 87%, and the gender gap in the enrollment rate is only 2 percentage points. Because the higher secondary (or senior secondary) level consists of two years of schooling succeeding the secondary level, we concentrate on the sample of individuals who are in the corresponding age group of 15–18 years.6 Information on stream choice at the higher secondary level is available only for individuals who have passed the secondary level and enrolled in the subsequent level of education. The secondary pass rate for our sample is 39.4% for males and 40.6% for females; a t test reveals that the gender difference in the secondary pass rate is not statistically significant. After we drop observations with missing values, the final analysis sample is 5,203 children.
The first step toward specialization begins at the higher secondary level of education, when students have to choose a stream mainly from the following options: arts/humanities, commerce, science, engineering/vocational, and others (e.g., home science, craft, and design).7 Estimates from the IHDS data show patterns of stream choice that are similar to the national-level statistics around this period. Summary statistics presented in Table 1 show that 50% of students in the sample chose the humanities stream. The next most popular stream is science, followed by commerce, engineering/vocational, and others, the latter of which are chosen by very few. In the sample, 58% of girls but only 41% of boys chose humanities, indicating that girls are underrepresented in science, commerce, and engineering/vocational streams. Because these average differences may be confounded by various observable and unobservable factors that are correlated with both gender and stream choice, we next lay out an econometric model to identify the gender gap.
We estimate a linear probability model in which the dependent variable is a binary indicator of whether an individual of higher secondary school age (15–18 years) has chosen to study a given stream. In the model's notation, the subscripts i, h, v, and d denote individual, household, village/town, and district, respectively. The main explanatory variable is an indicator variable denoting whether the individual is female. In addition, we control for individual-level covariates: age, birth order, number of siblings, mother's years of education, father's years of education, and dummy variables indicating relationship to the household head. Household-level covariates include household size, wealth, dummy variables for social group (caste and religion), and whether the household is in a rural area. To control for regional characteristics, we first include fixed effects at the district level and then at the village/town level. Inclusion of village/town fixed effects also helps us to control for access to education in the locality, which is important because some schools may not offer higher secondary education or may not offer all the streams at this level. Other regional characteristics, such as local labor market conditions and societal norms toward girls' education, are also subsumed by these fixed effects.
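The displayed form of Eq. (1) does not appear in this version of the text, so the specification described in this paragraph can be sketched as follows. Only the subscripts i, h, v, and d come from the text; every symbol name (Y, Female, X, Z, the Greek letters, and the stream superscript s) is an assumption chosen for illustration, so this is one plausible rendering rather than the paper's exact notation.

```latex
% Assumed notation (not from the source): Y = stream-choice indicator for stream s,
% Female = gender indicator, X = individual-level covariates,
% Z = household-level covariates, \delta_d = district fixed effect,
% \varepsilon = error term.
\begin{equation}
Y^{s}_{ihvd} = \alpha + \beta\,\mathit{Female}_{ihvd}
  + X_{ihvd}'\gamma + Z_{hvd}'\theta + \delta_{d} + \varepsilon_{ihvd}
\end{equation}
```

Under this sketch, the coefficient on the female indicator is the gender gap in the probability of choosing the stream, and the village/town specification would replace the district effect with a village/town effect.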
Because household-level factors, including unobserved tastes and preferences for different types of education, potentially affect the stream choice, we control for household-level heterogeneity by including household fixed effects in an additional set of regressions.8 This control is especially important in the context of India, where the household's unobserved preferences are correlated with gender inequality. For example, female children in India are often more likely to be found in larger families because fertility decisions are endogenously determined; parents keep having children until they have at least one boy (Basu and de Jong 2010; Clark 2000; Yamaguchi 1989). If STEM education requires higher investments, then comparisons across households may artificially show a gender gap because girls belong to larger families, which invest less in the human capital of each child. For these and related reasons, studies investigating gender discrimination in educational investments have advocated using household fixed effects (Jensen 2002; Kingdon 2005; Sahoo 2017). Although it includes household fixed effects, our model also takes into account the potential nonindependence of observations belonging to the same household by clustering the standard errors at the household level.9
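To make the household fixed-effects idea concrete, the following minimal Python sketch applies the within-household transformation to a toy data set with a single regressor (the female indicator) and computes a basic cluster-robust standard error at the household level. The function name, the toy rows, and the absence of other covariates and of any small-sample correction are all assumptions made for illustration; this is not the paper's estimation code.

```python
from collections import defaultdict
from math import sqrt


def within_household_gap(rows):
    """Within-household (fixed-effects) estimate of the gender gap.

    rows: iterable of (household_id, female, chose_stem_commerce) tuples.
    Returns (beta, se), where se is clustered by household (no
    small-sample correction). Hypothetical helper for illustration.
    """
    groups = defaultdict(list)
    for hh, female, y in rows:
        groups[hh].append((float(female), float(y)))

    # Within transformation: subtract household means from x and y,
    # which is equivalent to including a dummy for every household.
    demeaned = {}
    for hh, members in groups.items():
        mx = sum(f for f, _ in members) / len(members)
        my = sum(y for _, y in members) / len(members)
        demeaned[hh] = [(f - mx, y - my) for f, y in members]

    sxx = sum(x * x for m in demeaned.values() for x, _ in m)
    if sxx == 0:
        raise ValueError("no household contains both a girl and a boy")
    sxy = sum(x * y for m in demeaned.values() for x, y in m)
    beta = sxy / sxx

    # Cluster-robust variance: sum the score x * residual within each
    # household before squaring, so errors may correlate inside a household.
    meat = sum(
        sum(x * (y - beta * x) for x, y in m) ** 2 for m in demeaned.values()
    )
    return beta, sqrt(meat) / sxx


# Toy data (invented): (household, female, chose STEM/commerce)
rows = [
    (1, 1, 0), (1, 0, 1),
    (2, 1, 1), (2, 0, 1),
    (3, 1, 0), (3, 0, 0),
    (4, 0, 1), (4, 0, 0),              # two boys: no within-household variation
    (5, 1, 0), (5, 1, 1), (5, 0, 1),
]
beta, se = within_household_gap(rows)
print(f"within-household gender gap: {beta:.3f} (clustered SE {se:.3f})")
```

Household 4 has two boys, so it drops out of the estimate: only households in which gender varies help identify the coefficient.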
Gender differences in the choice of STEM education may be driven by girls' lower cognitive ability compared with boys, especially in mathematics. The literature on gender gaps in mathematics achievement suggests that most of the observed gap is explained by background factors (Benbow and Stanley 1980; Nollenberger et al. 2016). In India, because of systematic and continual underinvestment in girls' human capital from early childhood, girls' cognitive ability may lag behind that of boys at the higher secondary level. A novel feature of our data is that they allow us to account for an individual's cognitive ability using two distinct measures.
The first measure of cognitive ability is given by the individual's performance in the secondary level board examination, which is potentially an important predictor of stream choice at the higher secondary level. In India, a standardized examination is conducted by the education board (at the state or national level) to which each school belongs. Every student must pass this examination and obtain the SSLC to be able to continue at higher secondary levels of education. The results of this examination are typically categorized into divisions 1, 2, and 3, in the declining order of the quality of grade obtained. We use this SSLC performance indicator to control for the individual's cognitive ability.
Furthermore, in the 2005 IHDS round, children who were aged 8–11 years were given cognitive tests on mathematics, reading, and writing ability. In the 2012 survey, these children are in the age group corresponding to the higher secondary level and are considered in the regression. Therefore, we are able to control for their past cognitive ability by including their performance on these tests.10 Consequently, we control for achievement scores collected by two independent tests: one from the SSLC examination and the other conducted by IHDS enumerators in 2005. Hence, we believe that our regression adequately captures the differences in children's abilities and identifies the gender gap in stream choice.
A potential concern that remains is that stream choice is defined only for those individuals who have passed the secondary level and enrolled at the higher secondary level. In the age group considered, 40% of children passed the secondary level. These children are likely to be systematically different from those who have education below the secondary level. However, disaggregating this pass rate by boys and girls, we find that there is no gender gap in the secondary level pass rate. We also estimate a regression (see Table A2, online appendix) similar to Eq. (1) but with the dependent variable being a binary indicator of whether a child has passed the secondary level (and hence is eligible for higher secondary stream choice). The coefficient on gender in this regression is almost always insignificant, and the magnitude is almost zero, suggesting that the probability of selecting into the sample for our main regression (stream choice) does not vary by gender. Hence, this selection is unlikely to confound the effect of gender in the regression of STEM/commerce stream choice.
We begin by investigating the gender difference in the choice of STEM/commerce streams, combining science, engineering/vocational, and commerce into one category and comparing it with the humanities and other streams. The results, presented in Table 2, show a statistically significant female disadvantage of about 20 percentage points in the choice of STEM/commerce streams compared with the humanities. This estimate remains stable across different specifications. Although all regressions include observable control variables and SSLC results to control for cognitive ability, we sequentially add fixed effects at the level of the district, village/town, and household.11
Our final model further includes test scores from the 2005 survey.12 Among all boys and girls, 50% study STEM/commerce streams; thus, the estimated gender gap translates into a magnitude of 40% of the mean participation, which is substantial. As expected, we find that students who scored better on the SSLC examination are more likely to study STEM/commerce at the higher secondary level. Students who in 2005 scored at the highest level of difficulty in mathematics (i.e., division) also have a higher probability of choosing these quantitative streams.13 Because the estimate of the gender gap remains significant and stable even after we take into account the variation in cognitive abilities captured by two different measures, the gender gap in stream choice is unlikely to be driven by the intrinsic ability of students.
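As a quick check on the relative magnitude quoted above, the estimated gap can be divided by mean participation. The rounded figures are the ones given in the text; the symbol names are assumptions introduced only for this illustration.

```latex
% \hat{\beta}: estimated gender gap (about 20 percentage points);
% \bar{Y}: mean STEM/commerce participation (about 50%).
\[
\frac{|\hat{\beta}|}{\bar{Y}} \approx \frac{0.20}{0.50} = 0.40
\]
```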
Our main results reveal a gender gap in stream choice after we control for explanatory factors. We further investigate the intrahousehold differences in outcomes when we include household fixed effects in the analysis. In this section, we test whether our estimates of the intrahousehold gender gap remain robust after we take household structures into account.
First, we consider the relationship of individuals in the household more explicitly. In the sample of adolescents included in the analysis, 84% are children and 12% are grandchildren of the household heads.14 To ensure that intrahousehold relationships do not confound the effect of gender, all the regressions control for dummy variables denoting an individual's relation to the household head. Moreover, we conduct a sensitivity analysis by restricting the comparison between individuals who are in a similar position within the household; in particular, we compare direct siblings by using a sibling fixed-effects model.15 The observations pertaining to the siblings sharing the same parents may not be independent because the siblings are likely to have common unobservable characteristics. To address this issue, our model estimates cluster-robust standard errors, allowing the error terms to be correlated among siblings who share the same parents. Results presented in columns 1–2 of Table 3 reveal that the estimates remain almost unchanged in this analysis. In an additional exercise, we restrict the sample to sons and daughters of the household head and estimate the model. We again find a similar estimate of the gender gap, as shown in the last two columns of Table 3. These analyses establish that the magnitude and precision of the estimated gender gap are not affected by the household structure and relationships among individuals in the household.
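The logic of a sibling fixed-effects comparison with standard errors clustered on sibling groups can be illustrated with a minimal sketch. It is not the estimation code used for Table 3: it assumes a single regressor (the female dummy) with no further controls, uses the basic cluster-robust variance without small-sample correction, and runs on a tiny synthetic data set. The names `within_estimate`, `sibling_group`, `female`, and `stem` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def within_estimate(groups, x, y):
    """Within (fixed-effects) slope of y on x with a cluster-robust SE.

    Observations sharing a group label are demeaned together, which absorbs
    the group fixed effect, and each group also forms one cluster.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    # Demean x and y within each group.
    xd, yd = [0.0] * len(x), [0.0] * len(y)
    for idx in members.values():
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            xd[i], yd[i] = x[i] - mx, y[i] - my

    sxx = sum(v * v for v in xd)
    if sxx == 0:
        raise ValueError("no within-group variation in x")
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx

    # Sandwich variance: sum the scores x~ * e within each cluster, then square.
    meat = 0.0
    for idx in members.values():
        score = sum(xd[i] * (yd[i] - beta * xd[i]) for i in idx)
        meat += score * score
    return beta, sqrt(meat) / sxx


# Synthetic example: four sibling groups; group D has a single child.
sibling_group = ["A", "A", "B", "B", "C", "C", "C", "D"]
female = [1, 0, 1, 0, 1, 0, 0, 1]
stem = [0, 1, 1, 1, 0, 1, 1, 1]  # 1 = studies STEM/commerce

beta, se = within_estimate(sibling_group, female, stem)
print(f"sibling FE gender gap: {beta:.3f} (cluster-robust SE {se:.3f})")
```

In this sketch, the single-child group D drops out of the estimate because its demeaned gender dummy is zero, which is the same mechanism that limits identification to groups containing both a girl and a boy.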
Next, we investigate whether the estimates from the household fixed-effects models are generalizable. Because the coefficients in these models are estimated using variation within households, only observations belonging to households with multiple children contribute to this estimation. Moreover, for identification of the intrahousehold gender gap in stream choice, at least some of these households must have both multiple children and children of opposite gender. If the characteristics of these households systematically vary from those of the overall sample, then the estimates may not be generalizable.16 To address this issue, we adopt inverse probability weighting (IPW), which has been widely used in the literature in similar contexts (Fitzgerald et al. 1998; Jones et al. 2006; Wooldridge 2010). This estimation technique follows two steps. In the first step, using our main sample of 5,203 adolescents, we model the probability of belonging to a household with multiple children, conditional on a set of covariates. These covariates include the observable explanatory variables used in Eq. (1) and their interaction with the gender dummy variable. In the second step, we use the inverse of these predicted probabilities as weights for the observations while estimating a household fixed-effects model restricting the sample to those households with multiple children. In another instance, we apply the IPW model for households with multiple children of opposite gender for the second step.
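The two-step procedure can be sketched as follows. This is only an illustration under strong simplifying assumptions: the first-step model is a logit with a single hypothetical household-level covariate (`rural`) rather than the full covariate set of Eq. (1), the second step includes only the female dummy, and the data are synthetic. The helper names `fit_logit` and `weighted_within` are likewise hypothetical.

```python
from collections import Counter, defaultdict
from math import exp


def fit_logit(z, d, iters=25):
    """Newton-Raphson fit of P(d = 1 | z) = 1 / (1 + exp(-(b0 + b1 * z)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, di in zip(z, d):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += di - p
            g1 += (di - p) * zi
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def weighted_within(hh, x, y, w):
    """Weighted household fixed-effects slope of y on x."""
    members = defaultdict(list)
    for i, h in enumerate(hh):
        members[h].append(i)
    num = den = 0.0
    for idx in members.values():
        sw = sum(w[i] for i in idx)
        mx = sum(w[i] * x[i] for i in idx) / sw
        my = sum(w[i] * y[i] for i in idx) / sw
        num += sum(w[i] * (x[i] - mx) * (y[i] - my) for i in idx)
        den += sum(w[i] * (x[i] - mx) ** 2 for i in idx)
    return num / den


# Synthetic sample: one row per adolescent.
hh = [1, 1, 2, 3, 4, 4, 5, 5, 6]
rural = [0, 0, 0, 0, 1, 1, 1, 1, 1]   # assumed household-level covariate
female = [1, 0, 1, 0, 1, 0, 0, 0, 1]
stem = [0, 1, 0, 1, 1, 1, 1, 0, 0]    # 1 = studies STEM/commerce

# Step 1: probability of belonging to a household with multiple children.
size = Counter(hh)
multi = [1 if size[h] > 1 else 0 for h in hh]
b0, b1 = fit_logit(rural, multi)
p_multi = [1.0 / (1.0 + exp(-(b0 + b1 * r))) for r in rural]

# Step 2: household fixed effects on multi-child households, weighted by 1/p.
keep = [i for i, m in enumerate(multi) if m == 1]
hh_s = [hh[i] for i in keep]
female_s = [female[i] for i in keep]
stem_s = [stem[i] for i in keep]
weights = [1.0 / p_multi[i] for i in keep]

beta_ipw = weighted_within(hh_s, female_s, stem_s, weights)
beta_fe = weighted_within(hh_s, female_s, stem_s, [1.0] * len(keep))
print(f"unweighted FE: {beta_fe:.3f}, IPW FE: {beta_ipw:.3f}")
```

Households that are less likely to have multiple children, given their covariates, receive larger weights in the second step, so the weighted within-household estimate is pulled toward the composition of the full sample.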
The findings of this robustness analysis are summarized graphically in Figure 3, which juxtaposes the estimates that do not use IPW with those using IPW. We find that the point estimates and the confidence intervals are remarkably similar even after we use IPW to correct for any potential nonrandom selection of households when fixed effects are used. This analysis bolsters our main results and indicates that the estimated gender gap is robust to the issue of sample selection.
Heterogeneity Analysis: Exploring the Determinants of the Gender Gap
To explore what drives the gender gap in stream choice, we augment our main empirical model (i.e., Eq. (1)) by including interaction terms of gender with some key explanatory factors. Estimating how the effect of gender varies along with these factors sheds light on the underlying determinants of the gender gap. We investigate variations with respect to factors that have high contextual relevance: household affluence, parental educational parity, and access to STEM education. The first two factors are related to the demand for education, and the third factor reflects the supply of education, which is also policy-relevant.
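A stylized form of such an augmented specification is sketched below. Because Eq. (1) is not reproduced in this section, the notation is an assumption made for illustration: $Y_i$ is the STEM/commerce indicator, $Z_i$ is one of the three factors, $X_i$ collects the control variables, and $\mu$ stands for the relevant fixed effects.

```latex
\[
Y_i = \alpha + \beta\,\mathrm{Female}_i + \gamma\,Z_i
      + \delta\,(\mathrm{Female}_i \times Z_i)
      + X_i'\theta + \mu + \varepsilon_i
\]
\[
\text{Gender gap at } Z_i = z:\qquad \beta + \delta z
\]
```

Under this notation, $\delta$ measures how the gender gap changes with a one-unit increase in $Z_i$; a positive $\delta$ alongside a negative $\beta$ means that the gap narrows as $Z_i$ rises. When $Z_i$ does not vary within the unit defining the fixed effects, its main effect $\gamma$ is absorbed by $\mu$, but the interaction coefficient remains identified.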
Studying STEM or related streams likely involves a higher cost, which wealthier households are better able to pay (Chandrasekhar et al. 2019). Indian households are also likely to make greater educational investments in boys (Azam and Kingdon 2013; Kingdon 2005). Therefore, the higher cost of STEM-related education may discourage households from enrolling girls in such streams, especially when households have limited resources for children's education. To check whether resource constraints lead to gender disparity, we interact the gender dummy variable in our model with household income (per capita). We mitigate the potential endogeneity in household income by using baseline income from the earlier round rather than contemporaneous income. As revealed in Table 4, household income has no significant effect on the gender gap in stream choice, although households with higher income are more likely to enroll boys in the STEM/commerce streams.17 Thus, we find that the gender gap is quite pervasive, given that it is observed in both richer and poorer households. This result implies that either resource constraints are relatively less crucial than other determinants of the gender gap, or the gendered preference concerning stream choice does not change with respect to household income.
Because the decision of stream choice is made in adolescence, parents are likely to influence it (Alderman and King 1998; Dustmann 2004). In a patriarchal society like India, parental attitudes toward gender equality in education are likely to affect the study choice of girls vis-à-vis boys. To capture this aspect, we next consider parental educational parity, as defined by the difference in years of education between the mother and father. Because mothers usually have lower levels of education than fathers, a greater parity implied by relatively higher education of the mother may reduce the gender disparity in their children's education. By interacting the female dummy variable with parental educational parity in our model, we find support for this hypothesis. On average, a mother has 1.7 fewer years of education than a father; when educational attainment between the parents is equal, it reduces the gender gap in their children's STEM/commerce stream choice by 2.2 percentage points (column 4, Table 4).
Another pertinent question from the supply side of education is whether the gender disparity declines when STEM-related education is made more accessible. Although various government policies over the last few decades have universalized access to education at the elementary levels, access to higher secondary education still varies substantially. In addition, educational institutions that offer higher secondary level grades may not offer all the streams. In many places, students have to travel long distances to study their desired stream, especially science or commerce.18 Although any such variation in access to education is captured by village/town fixed effects in our model, access may have a different effect on girls than on boys. To estimate the differential effect of access by gender, we interact with gender a variable that measures the number of science and technical colleges in the district, normalized by district population, at the time the stream choice was made.19 The results show that districts with a higher number of colleges providing science or technical education have a smaller gender gap in stream choice (columns 5–6, Table 4). A 1 standard deviation increase in the number of science/technical colleges per 1 million population in the district is associated with a reduction of 7 percentage points in the gender gap in higher secondary stream choice.
Gender Gap in the Choice of Individual Streams
We also estimate a linear probability model given by Eq. (1) separately for each stream. Table 5 presents results that include village/town fixed effects (panel A) and household fixed effects (panel B). Girls are 20 percentage points more likely than boys to study humanities, as estimated from both models. Underrepresentation of girls is most prominent in science (8.5–10 percentage points), followed by commerce (6–8 percentage points) and engineering/vocational education (about 3.5 percentage points). Ability sorting is also significant across streams: the humanities stream seems to attract students with worse grades in SSLC exam results, whereas science attracts the best performing students. Nonetheless, the effect of gender remains significant even after we take the effect of ability into account.
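Because the streams are mutually exclusive and exhaustive, the gender coefficients from stream-by-stream linear probability models with identical regressors must sum to zero, which is why the female advantage in humanities mirrors the combined disadvantage in the other streams. The sketch below illustrates this adding-up property in the simplest case, assuming that the female dummy is the only regressor (so that each coefficient equals a difference in shares) and using synthetic data. The function name `gender_gap_by_stream` is hypothetical.

```python
STREAMS = ["humanities", "science", "commerce", "engineering/vocational"]


def gender_gap_by_stream(female, stream):
    """Female-minus-male share in each stream.

    With a female dummy as the only regressor, this difference in shares is
    the OLS slope of a linear probability model for that stream.
    """
    n_f = sum(female)
    n_m = len(female) - n_f
    gaps = {}
    for s in STREAMS:
        share_f = sum(1 for f, t in zip(female, stream) if f == 1 and t == s) / n_f
        share_m = sum(1 for f, t in zip(female, stream) if f == 0 and t == s) / n_m
        gaps[s] = share_f - share_m
    return gaps


# Synthetic data: 10 girls followed by 10 boys.
female = [1] * 10 + [0] * 10
stream = (
    ["humanities"] * 6 + ["science"] * 2 + ["commerce"] + ["engineering/vocational"]
    + ["humanities"] * 4 + ["science"] * 3 + ["commerce"] * 2 + ["engineering/vocational"]
)

gaps = gender_gap_by_stream(female, stream)
for s in STREAMS:
    print(f"{s}: {gaps[s]:+.2f}")
print(f"sum of gaps: {sum(gaps.values()):+.2f}")
```

The same property holds when controls and fixed effects are added, provided that every stream-specific regression uses the same regressors and the same sample.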
Our study provides quantitative evidence on the gender segregation in higher secondary stream choice in an emerging economy. In addition to showing that girls are substantially underrepresented in STEM and commerce streams as compared with the humanities, we shed light on the plausible determinants of the gender gap. Our findings are based on data from India, which accounts for a large share of the world's STEM graduates, thus expanding on the literature, which has focused mostly on developed countries. Also, by reflecting on the underlying processes that cause the gender gap, our findings have implications beyond the Indian setting.
A recent international comparison of test scores in mathematics and science found significant heterogeneity in the relative performance of girls versus boys across different countries (UNESCO 2017). Because of the importance of math skills in STEM fields, one may presume that the extent of the gender gap in math performance would predict an underrepresentation of women in STEM fields. However, we show that variations in cognitive skills, as measured by prior exam performance and math test scores, do not subsume the effect of gender on stream choice. In addition, we show that the gender gap in STEM/commerce participation is equally prevalent among richer and poorer households, in line with the pervasiveness of women's underrepresentation in STEM fields across societies with varying levels of economic prosperity. Our results imply that individual performance or household affluence need not be the main determining factor behind gendered educational choices, and it is necessary to consider other background and societal factors in this context.
Exploring the role of other factors, we show that parental educational parity helps to reduce the gender gap in STEM education. This result underscores the influence of parents, especially in settings where streams are chosen at an early age. Unlike the United States, many European countries require students to choose a field of study in secondary school (e.g., see Dustmann 2004 for Germany; see Dahl et al. for Sweden). That gender parity in parental education encourages girls to pursue a male-dominated field indicates an intergenerational transmission of gender attitude toward education. However, intergenerational mobility may be limited if parental background determines the education choice of the next generation, as shown by Dustmann (2004) in the context of Germany. Hence, there is scope for policies to play an instrumental role in bridging the gender gap by providing equal opportunities to boys and girls. As our results highlight, one avenue through which policies can be effective is by increasing the number of local educational institutions offering STEM and commerce streams, especially in underserved areas. Such an approach is particularly relevant for developing countries, where girls may be disproportionately affected by the lack of access to STEM education.
It is important to point out that there may be other potential determinants that we have not examined here because appropriate data are not available. These factors include individual preferences or behavioral traits; for instance, gender differences in competitiveness may explain the gender gap in STEM choice (Buser et al. 2014). Teachers may also influence stream choice, but without matched teacher-student data, it is not possible to analyze this aspect. The labor market opportunities for women studying different streams can be another relevant determinant. Investigation of these additional determinants constitutes an agenda for future research.
We are grateful to the editors and four anonymous reviewers of Demography for their constructive comments that helped us to improve the paper. We thank the participants of GrOW Workshop 2016 at Stellenbosch University, Contemporary Issues in Development Economics Conference 2016 at Jadavpur University, PEGNet Conference 2017 at ETH Zurich, GREThA International Conference on Economic Development 2018 at University of Bordeaux, Sustainability and Development Conference 2018 at University of Michigan, and CSAE Conference 2019 at University of Oxford for helpful comments. We also thank Rahul Lahoti, Abhiroop Mukhopadhyay, Nishith Prakash, Sudipa Sarkar, and Hema Swaminathan for helpful discussions. We gratefully acknowledge funding from the Growth and Economic Opportunities for Women (GrOW) initiative, a multifunder partnership between the United Kingdom's Department for International Development, the Hewlett Foundation, and the International Development Research Centre.
An exception is Sookram and Strobl (2009), who analyzed this topic for Trinidad and Tobago.
We categorize science and engineering/vocational as STEM. Subjects like accountancy and finance that involve mathematical tools are included in the commerce stream. Hence, some of our comparisons in this paper involve humanities versus nonhumanities, including STEM and commerce. We also report analysis for each stream separately.
Estimates from the data we use suggest that 93% and 85% of students who are currently studying, respectively, engineering and science in college studied a STEM stream at the higher secondary level. Among students studying humanities in college, 85% studied humanities in higher secondary school as well.
The difference in expenditure is mainly driven by higher school fees and private tutoring costs. Compared with humanities students, students in the science and commerce streams pay, respectively, 2.7 and 2.5 times more on school fees and 2.9 and 2.2 times more on private tutoring. Science and commerce students also incur marginally higher expenses on books, school supplies, and transportation, although these expenses are relatively smaller in proportion to the total expenditure.
The IHDS was carried out jointly by the University of Maryland and the National Council of Applied Economic Research, New Delhi. The data set is publicly available. More details can be found online at https://ihds.umd.edu/.
Strictly speaking, the ages corresponding to the higher secondary level should be 16–17 years. However, we include one year below and one year above this range to allow for the possibility that some children may finish the secondary level earlier or later. The enrollment rate among children aged 16–17 is 74%; however, many of them have not yet completed secondary-level schooling. In an alternative specification, we remove the age restriction and estimate the regression for all individuals enrolled in the higher secondary level; the results are unchanged.
Students choose from physics, chemistry, mathematics, and biology/computer science/economics in the science stream; business studies, accountancy, economics, and business mathematics in the commerce stream; and fields such as history, geography, political science, sociology, and philosophy in the humanities stream. In addition, all students study languages at the higher secondary level.
The inclusion of household fixed effects implies that only households with at least two individuals contribute to identification in this regression. To ensure that our estimates are not biased due to the selection of such households, we also present results from analyses excluding household fixed effects. To further address this issue, we check the sensitivity of our estimates using an inverse probability weighting technique in our robustness analysis.
We also check the robustness of the estimates by clustering the standard errors at the level of district and village/town in the earlier specifications. We thereby take into account any potential heteroskedasticity and correlation in the error terms within the clusters (Angrist and Pischke 2009: chapter 8).
The sample size is reduced substantially, by about 50%, when we control for past test scores from the previous round of the survey due to many missing values in the variables capturing past test scores (see Table 1). Some individuals (about 11% of the sample) could not be found in the 2012 survey, and others may have misreported their age in the previous survey, leading to missing values for test scores among this age group. Later, we show that our estimates are not driven by variations in sample size.
We provide additional estimates based on an intrahousehold comparison in the subsequent section on robustness analysis. The results are also robust to the inclusion of a control variable indicating whether there was a sibling who married and left the household (results not shown).
Although the sample size drops after the inclusion of past test scores, which are not available for the entire sample, a comparison yields no significant difference in key variables between the entire sample and the reduced sample. Also, if we estimate regressions from columns 1–3 (Table 2) on the reduced sample, the estimates are almost unchanged. See Table A1 in the online appendix.
We do not find any significant effect of other measures of cognitive ability (i.e., reading and writing scores) on stream choice. This result is consistent with Arcidiacono’s (2004) finding that math ability was more important than verbal ability in explaining sorting into particular majors in the context of the United States.
In addition, 2% are nephews/nieces of the household head, and the sample reflects very few other relationships (each less than 1%), such as daughter-in-law, brother/sister, or other relatives.
These direct siblings share the same parents. In India, sometimes multiple families coreside in a household, forming an extended or joint family; hence children from multiple parents may be coresiding in a household. A sibling fixed-effects model controls for heterogeneity across different parents within a household; few such cases are found in the sample, however, given that 84% of the sample is formed by children of the household head.
A comparison of the key characteristics between the sample of households with multiple children and the whole sample is provided in Table A3 in the online appendix. For the samples in the stream choice analysis (i.e., 15- to 18-year-old adolescents enrolled in higher secondary schooling), there is no significant difference in the mean of the outcome variable (i.e., stream choice) across these households. However, some of the household characteristics predictably differ across these samples, although the differences are not large.
We find similar results if instead of household income we use household wealth measured by durable assets.
Data from the 2014 NSS show that in rural areas, 43% and 41% of girls in, respectively, the science and commerce streams have to travel more than 5 kilometers to reach their schools; 30% of girls who study humanities travel more than 5 kilometers for school.
We use data from the All India Survey of Higher Education to construct this variable. The measure is lagged with respect to the individual’s stream choice decision and is normalized by the population of the district.