## Abstract

Using data from a well-executed randomized experiment, I examine the effects of gender composition and peer achievement on high school students’ outcomes in disadvantaged neighborhoods. Results show that having a higher proportion of female peers in the classroom improves girls’ math test scores only in less-advanced courses. For male students, the estimated gender peer effects are positive but less precisely estimated. I also find no effect of average classroom achievement on female math test scores. Males, on the other hand, seem to benefit from a higher-achieving classroom. I propose mechanisms relating to lower gender stereotype influences and gender-specific attitudes toward competition as potential explanations for peer effects findings. Finally, having a higher proportion of female students in the classroom decreases student absenteeism among male students but has no impact on female attendance.

## Introduction

Title IX of the Education Amendments of 1972 bans discrimination on the basis of gender, and it has been traditionally known to limit single-sex education (Schemo, October 25, 2006). Recent regulations to Title IX in 2006 as part of the No Child Left Behind Act, however, allowed school districts more flexibility in providing nonvocational single-sex education opportunities. The release of these new regulations has also rekindled the debate over the role of peers’ gender on student outcomes.^{1}

Perhaps surprisingly, little is still known about gender peer effects. Most research on peer effects in K–12 education has tended to focus on the impact of peer achievement on student outcomes (see, e.g., Antecol et al. 2016; Betts and Zau 2004; Burke and Sass 2013; Falk and Ichino 2006; Graham 2008; Hanushek et al. 2003; Imberman et al. 2012; Sojourner 2013; Vigdor and Nechyba 2007).^{2} I am aware of only a limited number of studies that explicitly examined *gender* peer effects on student achievement (see, e.g., Antecol et al. 2016; Hoxby 2000; Lu and Anderson 2015; Whitmore 2005; and see also Sacerdote 2011 for a review).^{3}

Recent STEM statistics show a significant underrepresentation of females among STEM majors, even though women make up nearly one-half of the college-educated labor force (Langdon et al. 2011). This finding, as well as federal administration efforts to broaden the participation of girls in STEM subjects (e.g., Race to the Top), points to the need for a better understanding of gender peer dynamics at the secondary education level, when the influence on career choice and many other future outcomes is arguably the most relevant. Knowledge of gender peer effects may be particularly relevant in disadvantaged neighborhoods for generating policies that may help break the poverty cycle, reduce teen pregnancy, and prevent risky behaviors (such as involvement in delinquent activities).

In this article, I extend the existing literature along four dimensions. First, I examine the gender peer effects at the high school level in disadvantaged neighborhoods using a well-executed randomized experiment. Specifically, students enrolling in the same math courses at the start of the school year were randomly assigned across classrooms within schools to evaluate teachers holding different certifications (i.e., traditional vs. alternative). Although the experiment was designed to investigate the effectiveness of Teach for America (TFA) and The New Teacher Project (TNTP) Teaching Fellows programs, random assignment within schools may also allow causal peer effects to be identified. The experiment was well executed; however, the existence of late enrollees (nonrandomly assigned students) and noncompliers in the end-of-year classrooms prevents me from obtaining straightforward gender peer effects estimates. To overcome the potentially confounding effects, I instrument the end-of-year classroom gender composition with the information from the beginning-of-year classrooms. Second, apart from gender peer effects, randomization into the math classrooms and the availability of the baseline achievement measures enable me to tease out peer achievement effects in the classrooms. Similar to gender peer effects, I instrument the end-of-year classroom achievement with the beginning-of-year information from randomization. The availability of the baseline achievement measures in the data also allows me to disentangle the gender composition effects from peer achievement effects (Sacerdote 2011). Third, unlike many prior studies, I can measure gender composition and peer achievement at the classroom level, which arguably provides better measures than the grade-level aggregation. As Burke and Sass (2013) noted, the importance of identifying the salient peer group in estimating peer effects cannot be overstated.
Finally, I examine the impact of gender composition and peer achievement on student attendance rates. In the last decade or so, there has been a growing research interest in the role of school attendance on juvenile crime. Many studies have shown that more time spent in school decreases crime and teen pregnancy by keeping juveniles occupied and leaving less time for risky behaviors (see, e.g., Anderson 2012; Berthelon and Kruger 2011; Jacob and Lefgren 2003; Luallen 2006). This incapacitation effect of schools highlights the importance of exploring how gender and achievement of peers affect school attendance among high school students in disadvantaged neighborhoods.

To the best of my knowledge, this study is the first to examine the effect of gender composition as well as peer achievement effects on (1) student math achievement at the high school level using a randomized experiment; (2) different levels of math courses (i.e., general high school math, geometry, and algebra II) using a randomized experiment; and (3) student absenteeism at the high school level. It is also the first attempt to disentangle the gender composition effects from peer achievement effects using a U.S.-based randomized experiment.

I find that having a higher proportion of female peers in math classrooms improves the math test scores of female students. The observed effects for females seem to be driven by interactions in less-advanced math courses (i.e., general high school math). For male students, the estimated gender peer effects are positive but statistically insignificant. Next, it appears that an improvement in average math achievement of the classroom has no statistically significant effect on female math test scores. Males in the bottom and middle one-third of the achievement distribution, however, perform better when the average math achievement of the classroom goes up. I also find gender peer effects to be insensitive to the inclusion of proper controls for predetermined classroom achievement. I propose mechanisms relating to lower gender stereotyping and gender-specific attitudes toward competition as potential explanations for peer effects findings at the secondary education level. Finally, results show that having a higher proportion of female peers in math classrooms decreases the probability of chronic absenteeism (and days absent from school) among male students without having any impact on female attendance.

## Identification Problems and Related Literature

One of the major issues in the education literature is the statistical identification of peer effects. The main difficulty is that students are not randomly assigned to classrooms and schools. Families generally self-select schools based on their educational and residential preferences. Similarly, within a given school, principals and teachers can generate a great deal of additional selection by assigning students to classrooms based on observable and unobservable traits (e.g., ability). Ignoring this identification challenge, often referred to as the *selection problem* (Sacerdote 2001), is likely to result in biased estimates of peer effects.^{4}

In addition to statistical challenges, the level of aggregation used in constructing peer measures is equally important. For example, the classroom setting conceivably facilitates learning spillovers in a way that nonclassroom interactions do not (Burke and Sass 2013).^{5} Under this scenario, aggregation at the grade level may not fully reflect the true peer effects. Relatedly, gender segregation at the classroom level may be more policy-relevant than that at the grade or school level.

As previously noted, the literature on peer achievement and its effects on student outcomes is vast. Studies focusing on gender peer effects, however, are rather limited, with a few notable exceptions. Hoxby (2000), using data on public primary schools in Texas, found that male and female students have higher test scores when grades have a higher proportion of female students. That study relied on variations in the composition of students at the grade level across adjacent cohorts within the same school. Absent randomized data, exploiting the grade-level variation is less problematic than exploiting the classroom variation because families and/or school administrators are also likely to sort students into classrooms within a grade. Under the assumption that the variation at the grade level in students’ gender composition is random, the estimates yielded causal gender peer effects. Moreover, Hoxby (2000) concluded that gender peer effects do not operate solely through peer achievement but also through other mechanisms, such as changes in the classroom/learning environment and/or changes in interstudent and teacher-student relations. Using the same estimation strategy and Israeli data, Lavy and Schlosser (2011) also showed that a higher proportion of female students benefits both male and female students at different levels of K–12 education.

To the best of my knowledge, Whitmore (2005), Lu and Anderson (2015), and Antecol et al. (2016) are the only studies to examine the gender peer effects on achievement by exploiting the random assignment of students to either classrooms or subgroups within classrooms. The findings from Whitmore (2005), using Tennessee’s Project STAR experiment, suggest that female students have a positive effect on both male and female students’ achievement in kindergarten through second grade. In the third grade, however, male students perform worse if they are in a class with a higher fraction of female students. Whitmore (2005) did not provide a detailed analysis of disentangling the effect of peer achievement from that of gender composition effects, and it is not clear whether the observed effects for the female proportion merely proxy the achievement differences between boys and girls.

Lu and Anderson (2015) used randomized data from a Chinese middle school. Specifically, seventh grade students were assigned to blocks of rows in the classrooms based on height and then were randomly assigned to seats within blocks. Relying on this within-block randomization and controlling for peer achievement, the authors showed that being surrounded by more female students increased the test scores of girls. Boys, however, were unaffected or even hurt by a greater presence of female students. Finally, exploiting the random assignment of primary school students within schools in disadvantaged neighborhoods, Antecol et al. (2016) did not find any effect of female proportion in the classroom on achievement, irrespective of student gender.

## Data and Falsification Tests

### Data

Teach for America (TFA, founded in 1989) and The New Teacher Project (TNTP) Teaching Fellows (founded in 1997 by a TFA alumna) were both established to reduce the educational inequities in highly disadvantaged neighborhoods throughout the United States. These nonprofit organizations recruit recent college graduates and mid-career professionals from several different academic majors to teach “hard-to-staff” subjects, such as mathematics and science, in schools in low-income communities.^{6} Unlike traditionally certified teachers, the TFA and Teaching Fellows (TF) teachers hold alternative certifications that enable them to teach prior to completing all certification-related course work and student teaching (see Clark et al. 2013 for a detailed description of each program).

The U.S. Department of Education funded the evaluation of TFA and TNTP Teaching Fellows programs to assess the effectiveness of different teacher certifications in mathematics (i.e., traditional vs. alternative) on student outcomes. Mathematica Policy Research (MPR) conducted the study based on random assignment of middle school and high school students to math classrooms (Clark et al. 2013). To be eligible for the study, schools must have (1) at least one TFA or TF teacher and one control teacher teaching the same math course; (2) classrooms that meet in the same period of the school day; and (3) classrooms that provide the same level of instruction (e.g., honors or regular) and that are similar in terms of class size, teacher aides, and math course length.^{7} These data are ideal for my purposes because within each participating school, students who were enrolled in the same math course at the start of the school year were randomly assigned to a math classroom taught by a TFA or TF teacher and to a comparable classroom taught by a control teacher (i.e., TFA vs. control or TF vs. control). Therefore, randomization is done at the block level such that each block represents a classroom match in any given school.

MPR compiled the data set from 80 schools in 20 school districts covering the school years 2010–2011 and 2011–2012.^{8} Separate samples of districts/schools/teachers were recruited for each of the two study years, although some schools/teachers were retained in the second year of the study to increase the sample size; teachers who were retained in the second year had new classrooms with new randomly assigned students. The study includes 230 classroom matches: 260 TFA/TF and 260 control classrooms. The total number of classrooms is more than twice the number of classroom matches because approximately 12 % of the classroom matches contain more than one TFA/TF classroom and one control classroom (e.g., three algebra I classrooms in the school).^{9} I limit the study to high schools only, and the math courses include general high school math, algebra I, geometry, and algebra II.^{10} There is nonnegligible evidence of nonrandom sorting of students to math classrooms in the middle school sample. To be conservative, I do not report/discuss the middle school results in the text but provide falsification tests and main results in Online Resource 2.

The data contain 5,320 student observations from high schools who were randomly assigned to a TFA/TF or a control classroom at the beginning of the school year. Regular roster checks throughout the year (in the fall, in the first week of spring semester, and toward the end of school year) were administered to monitor the integrity of the experiment. Information on student achievement comes from a variety of sources. Baseline math achievement test scores were available through the district administrative records and were measured for all students as students’ most recent prior scores on end-of-year state assessment exams.^{11} The state endline assessments were missing for the majority of the high school students. Federal law requires states to assess students only once in high schools, and there is no uniformity across the states on the timing (grade level) of assessment. Therefore, MPR administered a computer adaptive end-of-year math assessment exam at the high school level. The assessments were exclusively for the course taken and were developed by the Northwest Evaluation Association (NWEA). Unfortunately, the data set does not contain any information on endline reading achievement, and thus the achievement outcome of interest throughout this article is the endline math test scores. Scores were standardized with respect to either the state or the national reference population. Specifically, the reference population for state assessment (baseline) math test scores was the entire set of students who took the same assessment in the same state, year, and grade; and for the endline math test scores, it was the nationwide NWEA normative sample.

The first column of Table S1 in Online Resource 1 presents the descriptive statistics for the randomized sample. Columns 2 and 3 present the same statistics broken down by TFA/TF and control classrooms. Proper implementation of the randomization requires the baseline characteristics to be similar between TFA/TF and control classrooms. To test this formally, I run a regression of the treatment indicator variable (taking the value of 1 if the student is in a TFA/TF classroom) on several selected baseline characteristics, conditional on block (classroom match) fixed effects. The fourth column of Table S1 displays the coefficient estimates from this exercise where each cell represents a separate regression. None of the coefficient estimates are statistically significant at conventional levels. Given the nature of randomization in the data, I include the block fixed effects in all specifications throughout this study.

One other concern in experimental designs is the potential contamination from nonrandom attrition. From the initial randomization of 5,320 student observations, I lose roughly 30 % because of missing endline math test scores and/or students moving out of the school districts and another 10 % because of missing baseline math test scores. To analyze the presence of any potential nonrandom attrition, I run a regression of a nonresponse indicator (taking the value of 1 if the endline math test score is missing, and 0 otherwise) on a treatment dummy variable (TFA/TF or control classroom), baseline characteristics, the proportion of female students in the classroom, and block fixed effects. The coefficient estimates from this exercise are all statistically insignificant and are reported in the last column of Table S1. Overall, this evidence provides some assurance that attrition is not likely to bias results.^{12} Also important to note is that I construct peer measures (described later) using all nonmissing information from the relevant variable of interest (i.e., student’s gender for gender peer effect), regardless of the availability of math test score information.

Thus far, I have discussed the beginning-of-year randomized sample and provided satisfactory evidence of the integrity of the experiment. However, not all the students in the end-of-year study classrooms were randomly assigned. Recall that students were randomly assigned to classrooms as long as they were enrolled in the math courses at the start of the school year. Schools were free to nonrandomly place the new (late) enrollees afterward. In addition, some of the randomly assigned students did not comply with the experiment over the school year and either switched across the study classrooms (crossovers) or moved to nonstudy classrooms in the same school or in nonstudy schools. The availability of the year-end classroom rosters allows me to observe the students over the school year and to construct the study sample. Specifically, the study sample includes all high school students (1) who were randomly assigned to classrooms at the beginning of the school year and stayed in the assigned classroom throughout the year (compliers), and (2) who switched across the study classrooms during the school year (crossovers). The average proportion of the students in the end-of-year classrooms who were randomly assigned is approximately 90 %.^{13}

Column 1 of Table 1 presents the descriptive statistics for the full study sample; columns 2 and 3 report the same statistics for female and male students, respectively.^{14} As shown in columns 2 and 3, boys outperform girls in the state math assessments by 0.07 of a standard deviation, on average, and they are also more likely to be white and receive special education programs. Next, I break down the study sample by compliance status (columns 4 and 5) and include the summary statistics for the 500 nonrandomly assigned students in the last column of Table 1.^{15} Compliers score significantly higher in both baseline and endline math tests than all other students. The percentage of students receiving special education is lower among the set of compliers, and the late enrollees are more likely to be male students (column 6).

Finally, to circumvent any concerns on endogenous teacher sorting (i.e., teachers having a preference for a classroom with more female students), I run a regression of beginning-of-year proportion of female students in the classroom on selected teacher characteristics. The first column of Table S2 in Online Resource 1 presents the results. None of the coefficient estimates are statistically significant. Results from a similar exercise wherein I ran a regression of average classroom baseline math achievement on teacher characteristics are shown in the second column of Table S2; the coefficient estimates are not different from 0. (See also Clark et al. (2013) for detailed information on teachers in the experiment.)

### Falsification Tests

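To test whether peers are randomly assigned to classrooms within blocks, I estimate the following equation:

$$Y_{icb}^{base} = \pi_1 F_{-i,cb}^{t_0} + \lambda_b + \varepsilon_{icb}, \tag{1}$$

where $Y_{icb}^{base}$ is the baseline math test score of student *i* in classroom *c* and block *b*, $F_{-i,cb}^{t_0}$ is the proportion of female students in classroom *c* and block *b* at time $t_0$ (beginning of the year) excluding student *i*, $\lambda_b$ is the block fixed effects (i.e., same period math course in any given school), and $\varepsilon_{icb}$ is the error term. Random assignment of peers to math classrooms within blocks requires the estimate of $\pi_1$ to be equal to 0. I can extend this falsification test by simply replacing the dependent variable with any other student characteristic.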

The randomization test results using Eq. (1) are provided in Table 2. The first column gives the coefficient estimates for the full study sample, and columns 2 and 3 present the results for female and male students, respectively. The coefficient estimates on the proportion of female peers from a total of 21 regressions are all statistically indistinguishable from 0.
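The leave-one-out peer measure and the within-block test behind Eq. (1) can be sketched as follows. This is a minimal illustration on made-up data, not the estimation code used in the study; the function names `leave_one_out_share` and `within_block_slope` and all numbers are hypothetical.

```python
from collections import defaultdict


def leave_one_out_share(is_female):
    """Share of female classmates for each student, excluding the student.

    `is_female` holds 0/1 flags for the students of one classroom.
    """
    n = len(is_female)
    total = sum(is_female)
    return [(total - f) / (n - 1) for f in is_female]


def within_block_slope(y, x, block):
    """OLS slope of y on x after removing block means (block fixed effects)."""
    groups = defaultdict(list)
    for idx, b in enumerate(block):
        groups[b].append(idx)
    y_dm, x_dm = [0.0] * len(y), [0.0] * len(x)
    for members in groups.values():
        y_bar = sum(y[i] for i in members) / len(members)
        x_bar = sum(x[i] for i in members) / len(members)
        for i in members:
            y_dm[i] = y[i] - y_bar
            x_dm[i] = x[i] - x_bar
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    sxx = sum(a * a for a in x_dm)
    return sxy / sxx


# Hypothetical classroom with two girls (1) and two boys (0).
shares = leave_one_out_share([1, 1, 0, 0])

# Hypothetical check of the slope routine: y = block effect + 2 * x exactly.
x = [0.4, 0.6, 0.5, 0.7]
block = ["A", "A", "B", "B"]
y = [3 + 2 * 0.4, 3 + 2 * 0.6, 10 + 2 * 0.5, 10 + 2 * 0.7]
slope = within_block_slope(y, x, block)
```

In the falsification test itself, the outcome would be a predetermined characteristic such as the baseline score, and a slope close to 0 is what random assignment within blocks predicts.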

To further examine the integrity of the randomized experiment, I conduct two more tests. First, I run a regression of each student characteristic on average peer baseline achievement in classroom *c* and block *b* at time $t_0$, excluding student *i* ($\bar{Y}_{-i,cb}^{base,t_0}$), conditional on block fixed effects. The results from this exercise are reported in panel A of Table 3, which includes 19 regressions. Only one significant estimate and one marginally significant coefficient estimate are obtained: the effect of average peer achievement on the black student indicator is statistically significant at the 10 % level for the full sample and at the 5 % level for females. Still, however, the evidence is not strong.^{16}

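Second, I follow Guryan et al. (2009), who showed that this type of randomization test is subject to a mechanical bias: because a student cannot be his or her own peer, own and peer characteristics are negatively correlated within the group from which peers are drawn, which negatively biases the estimate of $\pi_1$. Random assignment therefore may not appear random, whereas positive sorting of students may appear random. The proposed solution by Guryan et al. (2009) is to control for the relevant group mean from which the peers are selected. The adjusted falsification test equation is given by

$$Y_{icb}^{base} = \pi_1 \bar{Y}_{-i,cb}^{base,t_0} + \pi_2 \bar{Y}_{-i,b}^{base,t_0} + \lambda_b + \varepsilon_{icb}, \tag{2}$$

where $\bar{Y}_{-i,b}^{base,t_0}$ is the average baseline math achievement of all students, excluding student *i*, in block *b* at time $t_0$, and all other variables are as previously defined. Panel B of Table 3 presents the test results from this exercise. The coefficient estimates in columns 1–3 are all indistinguishable from 0. Overall, these results provide some assurance that peers are indeed randomly assigned to math classrooms within blocks. In addition, because I use student’s gender and predetermined achievement in constructing the peer measures, the reflection problem is not likely to be a major concern.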
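The mechanical correlation that motivates the Guryan et al. (2009) adjustment can be seen in a small example: within a fixed pool, a student's leave-one-out pool mean falls as the student's own score rises. The sketch below uses a hypothetical four-student block; the helper names `leave_one_out_means` and `pearson` are illustrative only.

```python
def leave_one_out_means(values):
    """Mean of all other members of the pool, computed for each member."""
    n = len(values)
    total = sum(values)
    return [(total - v) / (n - 1) for v in values]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical baseline scores of the four students in one block.
own = [1.0, 2.0, 3.0, 4.0]
pool_mean_excluding_self = leave_one_out_means(own)
corr = pearson(own, pool_mean_excluding_self)
```

Because classroom peers are drawn from this pool, classroom-level leave-one-out means inherit a weaker version of the same negative relationship, which is why the block-level leave-one-out mean enters the adjusted test as a control.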

## Empirical Methodology

The baseline estimation equation is given by

$$Y_{icb} = \beta_1 F_{-i,cb}^{t_0} + \beta_2 Y_{icb}^{base} + SC_{icb}\,\gamma + TC_{cb}\,\delta + \lambda_b + u_{icb}, \tag{3}$$

where $Y_{icb}$ is the endline math test score of student *i* in classroom *c* and block *b*; $SC_{icb}$ is the set of individual student characteristics (i.e., gender, race/ethnicity, free lunch eligibility, LEP and special education status, grade fixed effects); $TC_{cb}$ is the set of teacher characteristics (i.e., TFA, TF, and control teacher status; gender; race/ethnicity; and years of experience); $u_{icb}$ is the error term; and $\lambda_b$, $F_{-i,cb}^{t_0}$, and $Y_{icb}^{base}$ are as previously defined.^{17}

The corresponding specification with the end-of-year peer measure is

$$Y_{icb} = \beta_1 F_{-i,cb}^{t_1} + \beta_2 Y_{icb}^{base} + SC_{icb}\,\gamma + TC_{cb}\,\delta + \lambda_b + u_{icb}, \tag{4}$$

where $F_{-i,cb}^{t_1}$ is the proportion of female students, excluding student *i*, in classroom *c* and block *b* at time $t_1$ (end of the school year). $F_{-i,cb}^{t_1}$ reflects the actual proportion of female students in the classroom, but the estimation of Eq. (4) via ordinary least squares (OLS) is likely to produce biased results because of the nonrandomly assigned students (crossovers and late enrollees) in the classrooms. To address this concern, I identify the gender peer effects by instrumenting the end-of-year female proportion ($F_{-i,cb}^{t_1}$) with the beginning-of-year female proportion ($F_{-i,cb}^{t_0}$). By doing so, I estimate the gender peer effects for students who were randomly assigned at the start of the school year (compliers and crossovers). In addition, unlike Hoxby (2000) and Whitmore (2005), I can control for own baseline achievement in the specifications, giving results a value-added interpretation.

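To separate gender composition effects from peer achievement effects, I add the average peer baseline achievement to Eq. (4):

$$Y_{icb} = \beta_1 F_{-i,cb}^{t_1} + \beta_2 Y_{icb}^{base} + \beta_3 \bar{Y}_{-i,cb}^{base,t_1} + SC_{icb}\,\gamma + TC_{cb}\,\delta + \lambda_b + u_{icb}, \tag{5}$$

where $\bar{Y}_{-i,cb}^{base,t_1}$ is the average baseline math achievement of peers, excluding student *i*, in classroom *c* and block *b* at time $t_1$, and all other variables are as previously defined. The estimated effect of $\bar{Y}_{-i,cb}^{base,t_1}$ on endline math test scores is also likely to be contaminated by nonrandomly assigned students; therefore, similar to gender composition, I instrument the end-of-year average peer baseline math achievement ($\bar{Y}_{-i,cb}^{base,t_1}$) with the beginning-of-year average peer baseline achievement ($\bar{Y}_{-i,cb}^{base,t_0}$).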

With peer achievement held constant, the nonachievement channel of gender peer effects is captured by $\beta_1$. As discussed in Hoxby (2000), failure to address the potential nonlinearity in peer achievement and conditioning on only average math test scores may lead to omitted variable bias given that these variables are likely to be correlated with the female proportion itself. Moreover, the literature provides substantial evidence on the nonlinearity of peer achievement (see, e.g., Antecol et al. 2016; Burke and Sass 2013; Imberman et al. 2012). As such, I construct three achievement indicators denoting the student’s own math baseline achievement tercile *k* (*k* = top third; middle third; bottom third). I then interact these tercile dummy variables with the average peer baseline achievement. The estimation equation is given by

$$Y_{icb} = \beta_1 F_{-i,cb}^{t_1} + \beta_2 Y_{icb}^{base} + \sum_{k} \theta_k \left( I_{icb}^{k} \times \bar{Y}_{-i,cb}^{base,t_1} \right) + SC_{icb}\,\gamma + TC_{cb}\,\delta + \lambda_b + u_{icb}, \tag{6}$$

where $I_{icb}^{k}$ is an indicator for whether student *i*’s baseline math test score is in tercile *k* (*k* = top third; middle third; bottom third) of the course-level baseline achievement distribution (e.g., $I_{icb}^{top}$ takes the value of 1 if student *i* is in the top third of the baseline math achievement distribution among all students taking the algebra I course).^{18}

One can also relax the linear-in-means assumption in Eqs. (4)–(6) to allow for nonlinear gender peer effects by replacing $F_{-i,cb}^{t_1}$ with $D_{-i,cb}^{g,t_1}$, where $D_{-i,cb}^{g,t_1}$ is an indicator of whether the end-of-year classroom is in tercile *g* (*g* = top third; middle third; bottom third) of the course-level distribution in terms of its share of female students.
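The tercile indicators and their interactions with average peer achievement can be built along the following lines. This is only a sketch on hypothetical scores: the rank-based cutoffs, the handling of ties (broken by input order), and the names `assign_terciles` and `interaction_terms` are assumptions, not the study's actual construction.

```python
TERCILES = ("bottom", "middle", "top")


def assign_terciles(scores):
    """Label each score as bottom, middle, or top third by rank within a course."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        # rank * 3 // n maps the lowest third of ranks to 0, the next to 1, etc.
        labels[i] = TERCILES[rank * 3 // n]
    return labels


def interaction_terms(tercile, peer_mean):
    """Tercile dummy times average peer baseline achievement, one term per tercile."""
    return {k: (peer_mean if tercile == k else 0.0) for k in TERCILES}


# Hypothetical baseline scores for six students taking the same course.
baseline = [-1.0, 0.5, 0.0, 1.2, -0.3, 0.8]
labels = assign_terciles(baseline)

# Hypothetical top-third student whose classmates average 0.25 SD at baseline.
terms = interaction_terms("top", 0.25)
```

Each student contributes a nonzero value to exactly one of the three interaction terms, so each tercile receives its own peer achievement coefficient.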

Of course, for coefficient estimates of peer effects to be interpreted as causal, I need to instrument the nonlinear peer measures with the corresponding beginning-of-year terms. Specifically, I instrument $I_{icb}^{k} \times \bar{Y}_{-i,cb}^{base,t_1}$ with $I_{icb}^{k} \times \bar{Y}_{-i,cb}^{base,t_0}$ and use $D_{-i,cb}^{g,t_0}$ as an instrument for $D_{-i,cb}^{g,t_1}$ when I replace the single treatment $F_{-i,cb}^{t_1}$ with the categorical variables in Eqs. (4)–(6). To serve as a benchmark, I also estimate the peer influences via OLS using the beginning-of-year peer information. Finally, I report the standard errors clustered at the block level for all my estimates. The results of this study remain intact if I instead cluster the standard errors at the classroom or school level.
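The logic of instrumenting the end-of-year peer measure with its beginning-of-year counterpart can be illustrated with a stylized, noise-free example. The data, the confounder `shift` (standing in for nonrandom roster changes), and the function names are all hypothetical; the sketch covers only the just-identified case with block fixed effects and omits covariates and clustered standard errors.

```python
from collections import defaultdict


def demean_by_block(values, block):
    """Subtract the block mean from each value (block fixed effects)."""
    groups = defaultdict(list)
    for v, b in zip(values, block):
        groups[b].append(v)
    means = {b: sum(vs) / len(vs) for b, vs in groups.items()}
    return [v - means[b] for v, b in zip(values, block)]


def ols_slope(y, x, block):
    """Within-block OLS slope of y on x."""
    y_d, x_d = demean_by_block(y, block), demean_by_block(x, block)
    return sum(a * b for a, b in zip(x_d, y_d)) / sum(a * a for a in x_d)


def iv_slope(y, x, z, block):
    """Within-block IV slope: cov(z, y) / cov(z, x) with a single instrument z."""
    y_d = demean_by_block(y, block)
    x_d = demean_by_block(x, block)
    z_d = demean_by_block(z, block)
    return sum(a * b for a, b in zip(z_d, y_d)) / sum(a * b for a, b in zip(z_d, x_d))


block = ["A"] * 4 + ["B"] * 4
# Beginning-of-year female share (randomly assigned, the instrument).
start_share = [0.4, 0.6, 0.4, 0.6, 0.5, 0.7, 0.5, 0.7]
# Nonrandom roster changes; constructed to be unrelated to start_share within blocks.
shift = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]
end_share = [s + v for s, v in zip(start_share, shift)]
block_effect = {"A": 0.0, "B": 0.5}
# Assumed data-generating process: true peer effect of 1.0, plus a direct
# effect of the nonrandom shift on the outcome.
score = [block_effect[b] + 1.0 * x + 1.0 * v
         for b, x, v in zip(block, end_share, shift)]

naive = ols_slope(score, end_share, block)
instrumented = iv_slope(score, end_share, start_share, block)
```

In this constructed example the OLS slope absorbs part of the direct effect of the nonrandom shift, whereas the instrumented slope recovers the assumed peer effect because the beginning-of-year share is unrelated to the shift within blocks.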

## Results

### Extent of Variation in Gender Peer Effects

Because my identification strategy hinges upon random variation within blocks, I have to (1) validate the existence of large enough variation in the data and (2) confirm that the results reported throughout the text are not driven by a few blocks with very few students of one gender. I first display the distribution of the proportion of female students in the beginning-of-year and end-of-year classrooms in panels A and B, respectively, of Fig. S1 in Online Resource 1. The histograms closely resemble a Gaussian distribution with an average female proportion of .53 in the classroom.^{19} Next, I examine the variance decomposition in the proportion of female students. As shown in Table 4, sufficient within-block variation exists in the data, and this random variation accounts for approximately 23 % of the total variation in the proportion of female students. The final piece of evidence regarding the extent of variation comes from Fig. S2 in Online Resource 1, which shows the distribution of the within-block standard deviation in the proportion of female students. Overall, large outliers are not seen, and thus it is unlikely for the results to be driven by a few blocks.^{20}
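A within-block share of variation of the kind described above can be computed by splitting the total sum of squares into within-block and between-block parts. The sketch below uses four hypothetical classrooms in two blocks; the function name `within_block_share` and the numbers are illustrative and unrelated to Table 4.

```python
from collections import defaultdict


def within_block_share(values, block):
    """Fraction of the total sum of squares that lies within blocks."""
    groups = defaultdict(list)
    for v, b in zip(values, block):
        groups[b].append(v)
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for members in groups.values():
        block_mean = sum(members) / len(members)
        within_ss += sum((v - block_mean) ** 2 for v in members)
    return within_ss / total_ss


# Hypothetical female shares for four classrooms in two blocks.
female_share = [0.4, 0.6, 0.5, 0.7]
block = ["A", "A", "B", "B"]
share = within_block_share(female_share, block)
```

The remaining fraction, one minus this share, is the between-block variation that the block fixed effects absorb.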

### Student Achievement and Peer Effects

#### Linear-in-Means Estimates of Gender Peer Effects

The linear-in-means estimates of gender peer effects using the beginning-of-year peer measures are shown in Table 5. Results are presented separately for female and male students (columns 1–3 and columns 4–6, respectively). The estimates in Table 5 are based on three specifications. Columns 1 and 4 provide estimates as in Eq. (3). To identify the nonachievement channel, I add the baseline peer achievement linearly (i.e., similar to Eq. (5) but with the beginning-of-year peer measures) in columns 2 and 5, and nonlinearly (i.e., similar to Eq. (6) but with the beginning-of-year peer measures) in columns 3 and 6. Table 6 shows the IV estimates using the same specifications. For brevity, unless a qualitative difference exists between OLS and IV estimates, I concentrate on the latter given that the IV regressions show the actual peer effects for students who were randomly assigned at the start of the school year.^{21} In all tables throughout the text, I also report the coefficient estimates from student’s own baseline math test score.

Focusing first on female students, it appears that having a higher proportion of female students in math classrooms significantly increases the math achievement of girls. The effect size is 1.10, implying that a 10 percentage point increase in the proportion of female peers increases the average math test scores by 0.11 of a standard deviation (column 1, Table 6).^{22} To put this finding in perspective, the estimated effect is approximately equal to half of the effect observed from a 1 standard deviation increase in teacher quality (Hanushek 2011). Controlling for peer achievement linearly or nonlinearly (columns 2 and 3, respectively) decreases the coefficient estimate on female proportion only very slightly, and the coefficient estimate continues to be statistically significant.
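The conversion from the coefficient to the quoted effect is a simple rescaling: because the female proportion is measured on a 0 to 1 scale, a 10 percentage point change corresponds to a change of 0.10 in the regressor, so that

$$1.10 \times 0.10 = 0.11 \text{ of a standard deviation.}$$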

Turning to the effects of peer baseline achievement, I do not observe any impact of average peer achievement on female students’ math scores. Considering column 3 of Table 6, where I add peer achievement measures nonlinearly, the impact is negative but insignificant for female students in the top third and middle third of the baseline achievement distribution. It appears that an improvement in peers’ average math achievement in the classroom has no effect on female students’ math test scores.

Columns 4–6 of Table 6 present the estimates of peer effects for male students. The estimated effects of female share on boys’ math achievement are also positive, but they are not statistically different from 0, irrespective of the specification. Unlike girls, however, male students seem to perform better when there is an improvement in the average math achievement of the classroom. The effect size is 0.61, implying that a 1 standard deviation increase in peer achievement increases boys’ math achievement by 0.42 of a standard deviation (the sample standard deviation for peer math achievement is equal to 0.70). This increase is a large improvement, and the magnitude is consistent with previous studies of peer effects. For example, Imberman et al. (2012) obtained an effect in the range of 0.15 to 0.33 of a standard deviation on math achievement for all students in secondary education. When I estimate the peer achievement effects by pooling male and female students, the point estimate is 0.226 (SE = 0.170). I also find the linearity assumption to be too restrictive for peer achievement and that allowing for nonlinearity reveals some important information on the peer effects dynamics in high school math classrooms. Specifically, male students in the middle third and bottom third of the achievement distribution seem to benefit the most from an improvement in average baseline math achievement.
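For reference, the standardized peer achievement effect for boys follows from rescaling the coefficient by the sample standard deviation of peer achievement. With the rounded figures reported in the text,

$$0.61 \times 0.70 \approx 0.43,$$

which is in line with the reported 0.42 of a standard deviation; the small gap presumably reflects rounding of the published coefficient and standard deviation.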

One could argue that end-of-year classroom peer measures do not fully reflect the student interactions observed over the entire school year, given that some students crossed over study classrooms while others moved out of school districts at some unknown time. To address this concern, I construct alternative gender composition and peer average achievement measures using information from the roster checks of fall, spring, and the end of the year. Specifically, I average the peer measures from these three rosters and instrument the average measures with the beginning-of-year information. The IV results reported in Table S3 in Online Resource 1 show that the coefficient estimates are virtually identical to the estimates presented in Table 6.^{23}
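The averaging-and-instrumenting step can be sketched on synthetic data. The code below is only an illustration under strong assumptions, not the estimation code behind Table S3: all data are simulated, the function `tsls` and every variable name are hypothetical, the coefficient of 1.10 is planted by construction, and block fixed effects, the full control set, and standard errors are omitted.

```python
import numpy as np


def tsls(y, endog, instrument, controls):
    """Just-identified 2SLS: one endogenous regressor, one excluded instrument."""
    const = np.ones(y.shape[0])
    Z = np.column_stack([const, instrument, controls])  # instrument set
    X = np.column_stack([const, endog, controls])       # regressors
    # First stage: fitted values of every regressor given the instrument set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


rng = np.random.default_rng(0)
n = 20_000

# Simulated beginning-of-year female share (the instrument).
begin_share = rng.uniform(0.2, 0.8, n)
# Hypothetical unobserved factor driving nonrandom crossover during the year.
unobserved = rng.normal(0.0, 0.05, n)
# Three simulated roster checks (fall, spring, end of year), then their average.
rosters = [begin_share + unobserved + rng.normal(0.0, 0.02, n) for _ in range(3)]
avg_share = np.mean(rosters, axis=0)

baseline = rng.normal(0.0, 1.0, n)  # simulated own baseline score (control)
y = 1.10 * avg_share + 0.5 * baseline + 6.0 * unobserved + rng.normal(0.0, 0.5, n)

iv_estimate = tsls(y, avg_share, begin_share, baseline)[1]
# Using the regressor as its own instrument reproduces OLS.
ols_estimate = tsls(y, avg_share, avg_share, baseline)[1]

print(f"IV estimate:  {iv_estimate:.3f}")
print(f"OLS estimate: {ols_estimate:.3f}")
```

In this simulation the OLS estimate is pushed away from the planted coefficient by the unobserved crossover factor, whereas instrumenting with the beginning-of-year share recovers it approximately.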

Finally, as an alternative to measuring achievement of all peers in the classroom, I include average gender-specific peer baseline math achievement in the specification described in Eq. (6). For brevity, I concentrate on OLS and IV results from the most extensive specification. Table 7 reports the results. As shown in the second column of the table, the estimated effect of the proportion of female students remains almost intact, and I do not find any evidence for an effect of average female peers’ achievement on the math achievement of girls. For boys, however, it appears that it is the average of the entire classroom—not just same-sex peers’ achievement—that matters for their respective endline test scores (column 4, Table 7).

#### Nonlinear Estimates of Gender Peer Effects

To gain further insights into the extent of peer interactions, I relax the linear-in-means model and explore the potential nonlinearity in gender peer effects by replacing the single treatment in the regressions with the categorical variables. Table 8 presents the results.^{24}
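As a rough illustration of replacing a continuous treatment with categorical variables, the sketch below builds middle-third and top-third indicators from a vector of classroom female shares, leaving the bottom third as the omitted category. The function name `tercile_dummies`, the example shares, and the choice of sample quantiles as cutoffs are assumptions made for the illustration; the cutoffs used for Table 8 may be defined differently.

```python
import numpy as np


def tercile_dummies(values):
    """Return (middle_third, top_third) 0/1 indicators; bottom third is omitted."""
    values = np.asarray(values, dtype=float)
    cut_low, cut_high = np.quantile(values, [1 / 3, 2 / 3])
    # 0 = bottom third, 1 = middle third, 2 = top third.
    tercile = np.digitize(values, [cut_low, cut_high], right=True)
    middle = (tercile == 1).astype(int)
    top = (tercile == 2).astype(int)
    return middle, top


# Hypothetical female shares for nine classrooms.
shares = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70]
middle_third, top_third = tercile_dummies(shares)
print(middle_third.tolist())
print(top_third.tolist())
```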

As shown in column 2 of Table 8, switching into the second and third terciles in the proportion of female students improves female students’ math test scores. The effect, however, is larger and more precisely estimated for classrooms in the middle third in terms of their share of female students. Although I cannot reject the null hypothesis that the gender composition coefficients are equal, the patterns tentatively suggest diminishing marginal benefits of having more girls in the classroom. For peer achievement effects, the coefficient estimates continue to be statistically insignificant and similar in magnitude to those in Table 6.

Turning to male students (columns 3 and 4, Table 8), the findings are similar to those presented in the previous section. Specifically, the estimated effects of female proportion are positive but insignificant, irrespective of class share; and male students in the middle third and bottom third of the achievement distribution continue to benefit the most from an improvement in the average peer achievement of the classroom.^{25}

#### Heterogeneous Effects by Math Course Type and Power Calculations

In this section, I extend the analysis to see whether peer effects differ by math course type. Table S4 in Online Resource 1 shows several gender composition statistics and variance decompositions disaggregated at the math course level. There is sufficient within-block variation in the proportion of female students across math courses: random variation accounts for at least 15 % and 18 % of the total beginning-of-year and end-of-year variation, respectively, in the proportion of female students across math courses (columns 4–9, Table S4).

Table 9 presents peer effects estimates. For the sake of brevity, I report only the IV results from the most extensive specification. Looking at the gender peer effects from less-rigorous math courses (such as general high school math or geometry), I continue to observe positive and significant effects for females (columns 1–2, Table 9). The coefficient estimate for the more-advanced math course (algebra II), however, reverses sign, and the estimate is statistically indistinguishable from 0 (column 3). It appears that having a higher proportion of female students in the classroom improves girls’ math scores only in less-advanced math courses. For males, estimates of gender peer effects are positive for algebra I and geometry, but they continue to be insignificant (columns 4–6). For peer achievement effects, the patterns across math courses are similar to those obtained from the full sample.

I also explore heterogeneity along the dimensions of student’s race, free lunch eligibility, and own baseline achievement, as well as several teacher characteristics, including teacher’s type (TFA/TF vs. control teacher) and teacher’s gender. I do not observe strong evidence for heterogeneity over student and teacher characteristics, with one notable exception: even though the coefficient estimates on gender peer effects interacted with own baseline achievement terciles are not statistically different from each other, gender peer effects appear to be more pronounced for female students in the top third and middle third of the baseline achievement distribution. For male students, I continue to observe no significant effects, irrespective of the location over the baseline achievement distribution. These results are available upon request.^{26}

Randomized experiments offer several advantages over survey-based data sets in terms of identifying peer effects, but they may also suffer from a lack of statistical power to detect economically interesting effects. Gender peer effect estimates for male students presented thus far have the same sign as those for female students, but they are consistently statistically insignificant (in addition to being smaller in magnitude). To examine the power issue further, I follow McCrary and Royer (2011) and discuss the sample sizes required to rule out null hypotheses of no effect (the so-called power calculations).^{27} Specifically, for point estimates presented in Tables 6 and 9, I compute the minimal percentage increase in sample sizes, relative to the original samples, required to reject the null hypothesis of no effect.

As shown in the last column of Table 6, a sample that is roughly five times larger is required to rule out no effect, assuming that the point estimate in the larger sample is the same as what I obtained (0.378). For detailed math courses, Table 9 shows that a sample twice as large is warranted to rule out no effect in general math and algebra I (column 4); a sample at least 20 times as large is required to rule out no effect in the geometry course, assuming that the point estimate in the larger sample is the same as that from column 5 of Table 9. The necessary samples for rejecting the null hypothesis are not implausibly large; therefore, evidence from power calculations is not sufficient to conclude firmly that a higher proportion of female students has no effect on boys’ math achievement in less-advanced courses.
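The logic of these power calculations can be sketched in a few lines. The sketch assumes that the standard error shrinks in proportion to the square root of the sample size and that the point estimate stays fixed, so a two-sided 5 % test rejects once the ratio of the larger sample to the original sample exceeds (1.96 / t)², where t is the original t-statistic. The function name `required_sample_multiple` is hypothetical. Because the standard error of the 0.378 estimate is not restated here, the example reuses the pooled peer achievement estimate quoted earlier (0.226, SE = 0.170) purely as a concrete input; the article does not report a power calculation for that estimate.

```python
def required_sample_multiple(point_estimate, standard_error,
                             null_value=0.0, critical_value=1.96):
    """Ratio N/n of the larger sample to the original sample needed to reject.

    Assumes the point estimate is unchanged in the larger sample and the
    standard error scales with 1 / sqrt(sample size).
    """
    t_stat = (point_estimate - null_value) / standard_error
    return (critical_value / t_stat) ** 2


multiple = required_sample_multiple(0.226, 0.170)
percent_increase = 100 * (multiple - 1)
print(f"required sample is about {multiple:.2f} times the original")
print(f"that is an increase of about {percent_increase:.0f} %")
```

A multiple below 1 would indicate that the estimate is already significant at the chosen level in the original sample.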

### Student Absence and Peer Effects

Thus far, I have focused on the effects of peer influences on student achievement. In this section, I take the analysis one step further and examine how peers affect student absenteeism. In the last decade or so, research interest in the role of school attendance on juvenile crime has grown. Jacob and Lefgren (2003) and Luallen (2006) showed that property crimes committed by juveniles decrease when school is in session, while violent crime rates increase on these same days. These researchers attributed the drop in property crime to an incapacitation effect in which schools keep juveniles occupied, leaving less time to commit crimes. The increase in violent crime, on the other hand, is a by-product of a concentration effect: keeping children in school increases the number of potential interactions that facilitate violent delinquency. Berthelon and Kruger (2011) and Anderson (2012) found further supporting evidence for the incapacitation effect by showing that more time spent in school decreases not only juvenile property crime but also violent crime and teen pregnancy. A common perception arising from all these studies is that the crime-reducing effects of schools are significantly more pronounced in disadvantaged neighborhoods. Given these mediating effects of school attendance on risky behavior, it would be interesting to see how gender and achievement of peers affect student absenteeism. Considering that the incapacitation effect is more pronounced among students from disadvantaged neighborhoods, it is even more appealing to examine the link between peers and school attendance using the data at hand.

Math teachers were asked to report the students’ absentee rates from the math classrooms at the end of the school year using the following scale: (1) 0 % to 25 %; (2) 26 % to 50 %; (3) 51 % to 75 %; and (4) 76 % to 100 %. The absentee rates are available for 1,250 female and 1,010 male students.^{28} I define absentee rates of more than one-fourth (26 % or more) as chronic (Clark et al. 2013). Average chronic absenteeism in the sample is slightly more than 10 % and, interestingly, is similar among female and male students (last row, Table 10). The OLS and IV estimates given in Table 10 are based on the most extensive specification, in which I control for peer achievement in a nonlinear fashion. For female students, neither the proportion of female peers in the classroom nor the peers’ average baseline math achievement seems to affect the probability of chronic absenteeism (columns 1 and 2, Table 10).

Turning to male students, the proportion of female peers significantly decreases the probability of chronic absenteeism in math classrooms. The effect size in column 4 of Table 10 is −0.84, which implies that a 10 percentage point increase in the proportion of female students in math classrooms decreases the probability of male chronic absenteeism by roughly 0.08. Considering the sample mean of chronic absenteeism among male students, the reduction is approximately 60 % of the sample mean. Unlike gender effects, however, I do not observe any impact of peer achievement on male student absenteeism (columns 3 and 4).

I also experimented with a similar analysis using total days absent from school as the outcome of interest. Unfortunately, administrative records are available for only a smaller subset of students (1,150 female and 930 male students). I ran the specifications as in Table 10 by (1) using only the nonmissing observations, and (2) imputing gender-specific sample means for missing observations. Overall, I continue to find no effect of gender composition for females, irrespective of how I treat missing observations. For males, however, a 10 percentage point increase in the proportion of female students in math classrooms decreases total days absent from school by approximately 2.5 days. Taking the sample average for males as the benchmark, the estimated effect implies a 20 % reduction in student absenteeism.

Finally, I construct another measure of chronic absenteeism based on administrative records on total days absent from school. Specifically, using the national average of school year length as the benchmark (180 days), I define an indicator variable that takes the value of 1 for total days absent from school of more than 45 days (one-fourth of the average school year). Using this alternative outcome variable, I ran my most extensive specification for both female and male students. I continue to find no effect for female students, whereas the point estimate on the proportion of female students is −0.300 (SE = 0.127) for males, implying that a 10 percentage point increase in the proportion of female students in math classrooms decreases the probability of chronic absenteeism by approximately 0.03. Considering the sample mean of this alternative measure among male students (slightly more than 5 %), the reduction is almost identical to that obtained from the last column of Table 10. All these additional estimations are available upon request.
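A minimal sketch of this alternative indicator and of the implied relative reduction follows. The function name `chronic_absence_indicator` and the example values are hypothetical, and the 5 % sample mean is the rounded figure from the text (the actual mean is slightly higher), so the computed ratio is only approximate.

```python
SCHOOL_YEAR_DAYS = 180  # national average school year length used as benchmark


def chronic_absence_indicator(days_absent, threshold_share=0.25):
    """1 if total days absent exceed one-fourth of the school year, else 0."""
    cutoff = threshold_share * SCHOOL_YEAR_DAYS  # 45 days
    return [1 if days > cutoff else 0 for days in days_absent]


# Hypothetical records of total days absent for four students.
indicator = chronic_absence_indicator([0, 45, 46, 90])
print(indicator)

# Implied change for males from a 10 percentage point increase in female share,
# and its size relative to a sample mean of roughly 5 %.
change_in_probability = -0.300 * 0.10
relative_reduction = abs(change_in_probability) / 0.05
print(f"change in probability: {change_in_probability:.2f}")
print(f"relative reduction: {relative_reduction:.0%}")
```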

Coupled with gender composition effects on achievement, it appears that either the attendance effects are not large enough to generate any significant achievement gains for male students, or male students with absenteeism problems do not gain from attending math classes more often. In an attempt to provide further insights, I add a chronic absenteeism indicator to the achievement equation as an additional control variable (keeping in mind that the variable is endogenous to the model). The coefficient estimates on the chronic absenteeism indicator are −0.125 (SE = 0.057) and −0.030 (SE = 0.073) for female and male subsamples, respectively. Considering the magnitude of the effect of chronic absenteeism on math achievement for males, this exercise may lend tentative support to the first argument.

### Robustness Checks and Additional Estimates

I undertake several sensitivity checks to examine the robustness of the results. First, I exclude student and teacher controls from the specifications. Consistent with the integrity of the experiment, the peer effects estimates are all similar in magnitude to those presented in this article (Online Resource 1, Table S5). Second, I choose different cutoffs to describe the nonlinearity in gender composition and peer achievement (i.e., top 20 % and bottom 20 %). Doing so does not alter conclusions (see, e.g., Online Resource 1, Table S6).^{29}

Third, I replace average peer baseline achievement with the fraction of peers in classroom *c* in the bottom 20 % and top 20 % of the course-level baseline achievement distribution (the omitted category is peers in the middle ability group). I then interact these proportional peer achievement measures with indicators for student’s own position in the math baseline achievement distribution: that is, for students in the bottom quintile, the effect of a 1 percentage point increase in the proportion of peers in the top quintile represents the impact of increasing top peers and reducing the proportion of peers in middle quintiles by 1 percentage point.
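One way to build such proportional peer measures is sketched below for a single classroom. The sketch assumes a leave-one-out definition (each student's own status is excluded from the share), which is an assumption of the illustration rather than a documented detail of the estimation; the function name `leave_one_out_share` and the example flags are hypothetical.

```python
import numpy as np


def leave_one_out_share(flag):
    """Share of classmates, excluding the student, whose flag equals 1."""
    flag = np.asarray(flag, dtype=float)
    n_students = flag.size
    # Subtract each student's own flag before dividing by the number of peers.
    return (flag.sum() - flag) / (n_students - 1)


# Hypothetical classroom of five students; 1 marks the bottom 20 % of the
# course-level baseline achievement distribution.
bottom_flag = [1, 0, 0, 1, 0]
share_bottom_peers = leave_one_out_share(bottom_flag)
print(share_bottom_peers.tolist())

# Interaction with the student's own position: the share of bottom-quintile
# peers as faced by students who are themselves in the bottom quintile.
own_bottom_x_share = np.asarray(bottom_flag) * share_bottom_peers
print(own_bottom_x_share.tolist())
```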

Two results emerge (Online Resource 1, Table S7). I find that (1) the estimated effects of gender composition in the classroom remain intact for both female and male students, and (2) both female and male students in the bottom quintiles are hurt from an increase in the proportion of similar peers relative to peers in the middle quintiles. In addition, the estimated effect for males is considerably more pronounced. Although the lack of precision over the achievement distribution does not allow me to firmly test different models of peer effects, I can still rule out the *boutique* model (student’s performance is highest when their peers are similar to themselves), the *invidious comparison* model (higher ability peers adversely influence the outcomes of students who are moved to a lower position in the local achievement distribution), and the *single crossing* model (a high-achieving peer has a greater effect on another high-achieving peer than on a low-achieving peer) (Hoxby and Weingarth 2005).^{30}

Fourth, I rerun the specifications with sample weights and doing so produces almost identical gender composition and peer achievement coefficient estimates (Online Resource 1, Table S8).^{31} Finally, I add the average block achievement ($\bar{Y}_{-i,b}^{base}$) into all OLS and IV specifications and rerun them. The results presented in this article remain intact (see, e.g., Online Resource 1, Table S9).^{32}

### Discussion of the Potential Mechanisms

There are at least two potential explanations for the findings of gender peer effects on student achievement. The first one pertains to gender stereotypes about math ability. Making gender salient adversely influences female math achievement (Niederle and Vesterlund 2010; Pope and Sydnor 2010; Spencer et al. 1999). For example, Spencer et al. (1999) showed that when they informed study participants that females do worse on the test they were about to take, female test scores dropped substantially (approximately 50 %) compared with a similar test in which participants were not informed of previous gender differences. The authors attributed the underperformance of females to lower assessment of their math ability. Having more female peers in the classroom is likely to make gender less salient, which may have then improved the self-concept of girls and may explain the positive impact of more female students in the classroom on female math achievement. This explanation is also consistent with my finding of gender peer effects in only less-rigorous math courses. The adverse gender stereotype effects are likely to dissipate as female students gain more confidence and progress toward more-advanced courses.^{33}

The second explanation pertains to gender-specific attitudes toward competition. In a laboratory experiment, Gneezy et al. (2003) gathered college students in groups of six (three male and three female). Each student was asked to solve mazes under one of two schemes: a piece rate scheme or a tournament scheme. Students were paid a fixed price for each maze under the first scheme, and the students in the group with the highest number of solved mazes received compensation under the tournament setting. Gneezy et al. (2003) found no gender differences in performance under the piece rate scheme. In the tournament design, however, male students significantly increased their performance, but female students did not; as such, male students ended up solving 40 % more mazes than female students. Importantly, female students did as well as male students (no gender differences) in the tournament setting when working in single-sex groups. The authors attributed the improvement among females to their relative failure to perform at a high level when competing against males. Put differently, it appears that females compete only against their own gender. Given that males are more responsive to tournament settings in general, gender-specific attitudes toward competition may explain my findings of peer achievement effects for boys only: a higher average peer achievement in the classroom may indicate a more competitive environment in which girls do not respond (Tables 6 and 7). Improvements in female performance in the single-sex tournament designs may also explain why females are more responsive to a higher proportion of female peers in the classroom. More female peers may trigger competition among female students. Once again, it may not be completely surprising to see the effects in only less-advanced courses in which gender in math classrooms is more salient.

Turning to gender composition effects on student absenteeism, anecdotal evidence has shown that boys attending secondary schools have more positive attitudes toward school (e.g., liking school) in coeducational environments (Sullivan et al. 2012). Male students may simply consider the classroom a place to socialize with opposite-sex peers and thus may choose to spend more time in school when the share of female students increases.

Comparing this study with recent studies on gender peer effects, I obtain somewhat different results: I find that a 10 percentage point increase in the proportion of female peers increases average math scores by approximately 0.1 of a standard deviation for female students, and I fail to find any significant effect for male students.

Whitmore (2005) found that increasing the classroom female share by 10 percentage points increases third grade female test scores (average of math and reading scores) by approximately 0.07 of a standard deviation but decreases male test scores by 0.08 of a standard deviation.^{34} Lu and Anderson (2015) obtained an impact of roughly 0.03 of a standard deviation for seventh grade female students’ test scores (weighted average of scores from seven subjects) from a 10 percentage point increase in the female share of neighboring students, whereas they found negative but statistically insignificant effects for males. Antecol et al. (2016) did not find any effect of share of female students on achievement in primary schools, irrespective of student gender. Finally, using the variations in the composition of students at the grade level across adjacent cohorts within the same school in Israel, Lavy and Schlosser (2011) found that a 10 percentage point increase in the female share increases eighth grade female test scores by 0.04 of a standard deviation but has no effect on male students. The same magnitude of increase at the high school level leads to an improvement in the test scores of male and female students by approximately 0.03 of a standard deviation. Differences in schooling levels (high school vs. primary/middle school) and/or country-specific factors may explain the discrepancy in my findings.^{35}

## Conclusion

Obtaining convincing estimates for peer effects is a daunting task. The major obstacle stems from the fact that students are not randomly assigned to teachers, classrooms, or schools. I overcome the challenges to identification in gender peer effects using a well-executed randomized experiment in which high school students enrolling in the same math course were randomly assigned across classrooms in given schools. Using this unique feature of the data, I examine the gender peer effects for students who were randomly assigned to classrooms at the start of the school year. To address the potential confounding effects resulting from the existence of late enrollees and noncompliers, I instrument the end-of-year peer measures with the information from the beginning-of-year classrooms.

This study yields five key findings. First, it appears that having a higher share of female peers in math classrooms improves the math test scores of female students. The observed effects for females seem to be driven by interactions in less-advanced math courses. For male students, I fail to find any statistically significant effects of more girls in the classroom, but the evidence from power calculations does not rule out positive effects of a higher female proportion on boys’ achievement in less-advanced math courses. Second, I find that girls are not affected by an improvement in the average peer baseline achievement of the classroom. Boys, however, perform better when the achievement level of the classroom increases, and it appears that it is the average of the entire classroom—not just same-sex peers’ achievement—that matters. Third, both gender composition and peer achievement effects exhibit some degree of nonlinearity. Fourth, gender peer effects do not seem to be very sensitive to the inclusion of controls for predetermined classroom achievement. This may indicate that gender peer effects are primarily driven by nonachievement channels. I propose mechanisms relating to lower gender stereotype influences and changes in gender-specific attitudes toward competition as potential explanations for my findings on peer effects. Finally, I observe that having a higher proportion of female peers in math classrooms decreases the probability of chronic absenteeism among male students but has no impact on girls.

Carrell et al. (2013) cautioned against large-scale policy interventions because these types of interventions may have unintended consequences. Relatedly, the extent of variation in this study’s data is not very informative for large interventions. Therefore, it is not possible to make any predictions regarding the effect of single-sex versus coeducation classrooms. Then, the question is whether small-scale interventions in classrooms in disadvantaged neighborhoods leading to some degree of gender segregation would yield optimum outcomes. My results indicate that some degree of gender segregation at the classroom level may improve female achievement, but that is likely to be at the expense of male students. Moreover, gender segregation may also exacerbate student absenteeism for boys and further trigger risky behaviors, such as involvement in delinquent activities.

## Notes

^{1}

The number of public schools providing same-sex classrooms increased to approximately 400 schools in 2011 from 12 schools in 2002. There were also more than 110 same-sex schools nationwide in 2011 (Chandler 2011).

^{2}

A large literature has also examined the effect of peers on student outcomes in college (see, e.g., Carrell et al. 2009; Foster 2006; Lyle 2007; Sacerdote 2001; Stinebrickner and Stinebrickner 2006; Zimmerman 2003). Moreover, recent studies have examined peer effects in labor markets (see, e.g., Arcidiacono and Nicholson 2005; Black et al. 2013; Bifulco et al. 2011, 2014; Mas and Moretti 2009).

^{3}

A few studies have also examined the impact of single-sex schooling on student outcomes (see, e.g., Doris et al. 2013; Jackson 2012; Park et al. 2013).

^{4}

The other major identification issue—referred to as the *endogeneity* or the *reflection* problem (Manski 1993; Moffitt 2001; Sacerdote 2001)—occurs because it is often difficult to separate the effect that the peer group has on the student from the effect the student has on the peer group. Suggesting a regression of own achievement on contemporaneous average peer achievement is problematic because these outcomes are jointly determined, and peer achievement is likely to be endogenous to the model. Apart from this threat to identification, peers usually share an environment and thus are potentially subject to similar shocks. Ignoring these correlated effects may also contaminate the estimated peer effects.

^{5}

Peer aggregation at the classroom level has been limited. See, for example, Antecol et al. (2016), Betts and Zau (2004), Burke and Sass (2013), and Hoxby and Weingarth (2005) for notable exceptions.

^{6}

The mission of the programs is not limited to secondary schools. Both TFA and TNTP recruit teachers to serve in primary schools in disadvantaged communities.

^{7}

The control teachers could have entered teaching through either a traditional route or a less-selective alternative certification program.

^{8}

Because the data are confidential, the sample sizes are rounded to the nearest multiple of 10.

^{9}

There are 320 teachers in the data (70 TFA, 80 TF, and 170 control group teachers), indicating that some teachers in the study taught in more than one block in different periods during the school day and in both survey years.

^{10}

The high school study sample includes 110 classroom matches: 120 TFA/TF and 130 control group classrooms from a total of 40 schools. There are also 160 teachers (20 TFA, 50 TF, and 90 control group teachers).

^{11}

Baseline test scores for all students in the same classroom match come from the same grade level. To ensure this, MPR administrators calculated the tenth percentile of the current grade level within the math classroom matches, identified the highest grade level that was less than the tenth percentile grade level and was a grade level in which end-of-year state assessments were administered, and then used that particular grade level to obtain the baseline test scores.

^{12}

I also examine whether the baseline characteristics of the attriters are correlated with beginning-of-year classroom measures (e.g., average classroom math achievement) but do not find evidence for any correlation. These results are available upon request.

^{13}

The average proportion of the compliers in the end-of-year classrooms is approximately 85 %.

^{14}

Information on student and teacher characteristics is not available for all student observations. I use dummy variables to control for the missing values in student and teacher characteristics.

^{15}

More than 300 (330) late enrollees had nonmissing baseline math test scores.

^{16}

An increase in the number of tests increases the likelihood of falsely rejecting the null hypothesis, the so-called multiplicity problem (Anderson 2008). Specifically, of 19 hypotheses, the probability of falsely rejecting at least one of the 19 null hypotheses at the 10 % level is 1 – 0.9^{19} = 0.865. Therefore, rejection of one or two hypotheses among many does not necessarily pose a threat to randomization.
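The probability quoted in this note can be checked with a short calculation. The sketch assumes that the 19 tests are independent, which is what the formula implies; the function name `prob_any_false_rejection` is hypothetical.

```python
def prob_any_false_rejection(num_tests: int, level: float) -> float:
    """Probability of at least one false rejection among independent tests."""
    return 1 - (1 - level) ** num_tests


p_any = prob_any_false_rejection(19, 0.10)
print(round(p_any, 3))  # 0.865
```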

^{17}

Only 47 noncomplier students switched to a different block. For ease of notation, I use *λ*_{b} in the equations throughout this article, but I control for the end-of-year block fixed effects in the estimations of peer effects.

^{18}

As an alternative, I use the grade-level baseline achievement distribution to specify the achievement terciles. The results from this exercise are virtually identical to those presented herein and are available upon request.

^{19}

The mean of within-block difference in the proportion of female students across classroom matches is .1, with a standard deviation of .09. This difference also ranges between 0 and .55.

^{20}

I also examine the variance decomposition in the average math achievement of the classrooms. The within-block variation accounts for approximately 4 % of the total variation in the average math achievement.

^{21}

The coefficient estimates on the instrument from the first stage are 0.718 (SE = 0.075) and 0.750 (SE = 0.073) for female and male subsamples, respectively, and the first-stage *F*-statistics are 89.92 and 102.96, respectively.

^{22}

I interpret the coefficient magnitudes using the NWEA nationwide normative sample.

^{23}

I also examine the effects of peer achievement by excluding classroom gender composition measures. The coefficient estimates on peer achievement effects remain intact. I also add interaction terms between average peer math achievement and gender composition in the classroom to look at any interactive impact. These additional terms are all statistically indistinguishable from zero.

^{24}

Using the Angrist-Pischke first-stage statistic, I reject the null hypothesis of weak identification for all five endogenous variables.

^{25}

Compared with IV regressions, OLS estimates reported in columns 1 and 3 of Table 8 are less significant and are smaller in magnitude.

^{26}

Apart from the main focus of this study, I examine the effects of student-teacher gender match on math achievement in high schools and fail to find any significant effect of gender match on either female or male students’ achievement.

^{27}

Following McCrary and Royer (2011), power calculations for a two-sided test are based on the following inequality:

$\frac{N-n}{n} > \frac{1.96^{2}}{\left[\left(\hat{\theta}_N-\theta_0\right)/\widehat{SE}\right]^{2}} - 1,$

where θ_{0} is the point hypothesis to be tested (e.g., 0 in this case), $\hat{\theta}_N$ is the point estimate that I expect to obtain in the large sample size *N* (e.g., the actual point estimates from the randomized sample *n*), and $\widehat{SE}$ is the estimated standard error using this sample.

^{28}

I also estimate gender composition and average effects of peer achievement for the subsample of students with nonmissing absentee rates (1,250 females and 1,010 males). The findings for peer effects are very similar, and these additional results are available upon request.

^{29}

I also estimate a model similar to Eq. (6) where I construct achievement indicators denoting student’s own math baseline achievement at each decile of the course-level baseline achievement distribution. Viewing the complete set of results, it appears that male students in the first five deciles of the baseline achievement distribution seem to benefit the most from an improvement in average math achievement.

^{30}

I also experiment with the same analysis using fraction of gender-specific peers in the classroom. The results reinforce findings from Table 7 that it is the average of the entire classroom—not just same-sex peers’ achievement—that matters for male students.

^{31}

The sample weights reflect the probability of assignment of a student to a TFA/TF classroom.

^{32}

In addition to these robustness checks, I first estimate the gender peer effects separately for TFA and TNTP subsamples. Next, I interact student and teacher controls with indicators for student’s position in the math baseline achievement distribution. The coefficient estimates are similar to those reported throughout this article.

^{33}

Comparing the mean, median, as well as the range of the proportion of female students in the classrooms, I do not observe any discernible patterns across math subjects.

^{34}

Whitmore (2005) also reported the pooled sample results from a specification in which the share of female peers and average achievement of the classroom were simultaneously controlled for. The estimated effect of the share of female students falls from 2 percentile points to 1.3 when achievement of the classroom is included in the gender peer effects specification.

^{35}

Psychology literature offers ample evidence regarding the relationship between peer influences and age. Conformity to peer pressure follows an inverted U-shaped age pattern, with peer effects peaking during mid-adolescence (see, e.g., Blakemore and Choudhury 2006; Brown et al. 1986).