Abstract
Girls who experience father absence in childhood also experience accelerated reproductive development in comparison with peers with present fathers. One hypothesis advanced to explain this empirical pattern is genetic confounding, wherein gene-environment correlation (rGE) causes a spurious relationship between father absence and reproductive timing. We test this hypothesis by constructing polygenic scores for age at menarche and first birth using recently available genome-wide association study results and molecular genetic data on a sample of non-Hispanic white females from the National Longitudinal Study of Adolescent to Adult Health. We find that young women’s accelerated menarche polygenic scores are unrelated to their exposure to father absence. In contrast, polygenic scores for earlier age at first birth tend to be higher in young women raised in homes with absent fathers. Nevertheless, father absence and the polygenic scores independently and additively predict reproductive timing. We find no evidence in support of the rGE hypothesis for accelerated menarche and only limited evidence in support of the rGE hypothesis for earlier age at first birth.
Introduction
Early pubertal timing is associated with risk for adverse behavioral and health outcomes across the life course (Karapanou and Papadimitriou 2010; Mendle et al. 2009; Tamakoshi et al. 2011). Pubertal timing is influenced by both genetic and environmental factors (Ellis 2004; Ellis et al. 2011; Polderman et al. 2015). One environmental factor consistently associated with earlier pubertal timing is exposure to childhood adversity (Belsky et al. 1991; Ellis 2004)—in particular, absence of the biological father from the home (Webster et al. 2014). Two hypotheses are commonly advanced to explain the association between absence of the biological father and early puberty. One hypothesis, derived from evolutionary life history theory (Ellis et al. 2009; Del Giudice et al. 2015; Stearns 1992; Trivers 1972), is that father absence triggers accelerated maturation in the developing organism (Belsky et al. 1991). An alternative hypothesis is that the association is an artifact of gene-environment correlation (rGE) (Barbaro et al. 2017; Rowe 2002).
We report findings from a molecular genetic test of this rGE hypothesis. Using results from a recent genome-wide association study (GWAS) of age at menarche (Day et al. 2017), we quantify genetic influences on pubertal timing in a sample of female adolescents from the National Longitudinal Study of Adolescent to Adult Health (Add Health). We test whether the genetics discovered in GWAS of menarcheal timing (1) predict timing of puberty in Add Health girls, (2) correlate with girls’ exposure to father absence, and (3) explain any association between father absence and early puberty. We find no evidence to support the rGE hypothesis with respect to the genetics associated with menarcheal timing. We also consider genetics discovered in GWAS of age at first birth (Barban et al. 2016)—a related reproductive phenotype—and find limited evidence in support of the rGE hypothesis.
Background
Early Puberty and Life Course Development
Pubertal timing is of central interest to demographers because it signals the start of reproductive capacity. For girls, age at menarche is associated with a cascade of behaviors and events in the transition to adulthood that relate to fertility and health outcomes. Earlier menarche is associated with earlier sexual debut, first pregnancy and birth, and marriage (Kiernan 1977; Sandler et al. 1984; Udry 2008). These associations replicate across cultural contexts (Udry and Cliquet 1982). Earlier pubertal timing is also associated with greater risk-taking behavior more generally (Igra and Irwin Jr. 1996; Patton et al. 2004). Age at menarche is negatively correlated with poor health outcomes, including type 2 diabetes (He et al. 2010), cardiovascular disease risk (Feng et al. 2008; Lakshman et al. 2009; Remsberg et al. 2005), breast cancer (Stoll et al. 1994), and mortality (Charalampopoulos et al. 2014; Tamakoshi et al. 2011).
Father Absence and Accelerated Reproductive Development
Rooted in life course and evolutionary frameworks, a robust literature has documented associations between early childhood environment and reproductive profiles across adolescence and into adulthood (see Ellis 2004 for a thorough review). In particular, children exposed to psychosocial stressors during sensitive periods of childhood exhibit accelerated reproductive trajectories, such as early puberty, early sexual debut, and early childbearing (Belsky et al. 1991; Browning et al. 2004; Foster et al. 2008; Graber et al. 1995; Wu and Martinson 1993).
One specific childhood stressor that has received considerable attention is father absence. Girls who grow up in households from which their biological father is absent are consistently observed to experience menarche at younger ages compared with peers with present fathers (Alvergne et al. 2008; Bogaert 2008; Culpin et al. 2014; Hoier 2003; Quinlan 2003). Father absence is also associated with subsequent reproductive timing: girls with absent fathers have younger age of first sex, pregnancy, and birth (Anderson 2015; Ellis et al. 2003; Kiernan and Hobcraft 1997; Mendle et al. 2009; Moffitt et al. 1992).
Research in anthropology and human development hypothesizes that associations between early-life stressors, including father absence, and accelerated reproductive development reflect a causal process (Belsky et al. 1991; Chisholm et al. 1993; Draper and Harpending 1982; Ellis 2004; Ellis and Garber 2000; Ellis et al. 1999). In this causal process, father absence—either specifically or as one of several features of early adversity—signals to the developing child that resources may be scarce or unpredictable. This signal, in turn, elicits a biological response in the form of increased allocation of current resources toward achieving reproductive maturity as soon as possible (Belsky et al. 1991; Chisholm et al. 1993; Ellis 2004). Consistent with this hypothesis, father absence and other measures of early adversity are associated with accelerated pubertal development (Foster et al. 2008; Mendle et al. 2009, 2015). We focus here on father absence because it is central to hypotheses of environmental causation and because it is among the best-replicated environmental correlates of accelerated pubertal development in nongenetically sensitive designs (Webster et al. 2014).
Gene-Environment Correlation as an Explanation for Association Between Father Absence and Accelerated Reproductive Development
In contrast to anthropological and human development hypotheses of environmental causation, behavioral genetics researchers have proposed a genetic hypothesis for why girls who grow up in father-absent households tend to experience accelerated reproductive development (Hardy et al. 1998; Rowe 2002). This genetic hypothesis is built on three sets of research findings. First, reproductive timing is influenced by genetic factors (Barban et al. 2016; Burt et al. 2006; Campbell and Udry 1995; Elks et al. 2010; He et al. 2009; Tither and Ellis 2008; Towne et al. 2005). A recent meta-analysis by Polderman et al. (2015) estimated heritability for reproductive traits at 0.32, and a study of age at menarche specifically estimated heritability at 0.49 (Towne et al. 2005). In other words, roughly one-half of the variation in age at menarche has been attributed to genetic variation. Second, girls who mature earlier also experience earlier sexual debut (Moore et al. 2014), union formation (Kiernan 1977), and childbearing (Rowe 2002; Sandler et al. 1984; Udry 2008). Third, unions formed at earlier ages and with early childbearing are more likely to be unstable (Booth and Edwards 1985; Bumpass and Sweet 1972).
The hypothesis derived from this evidence posits that the genetic factors that accelerate pubertal timing place women at risk for early childbearing within unstable unions, which in turn results in their daughters inheriting genetics associated with early puberty and all the consequences thereof, including having an absent father (Hardy et al. 1998; Mendle et al. 2006; Rowe 2000, 2002). Stated differently, the mother’s genotype is associated with accelerated puberty and the resulting potential for early partnership formation and instability, and the daughter may inherit the genetics of early maturation and be exposed to father absence. This hypothesis conceptualizes the association between father absence and daughters’ early puberty as spurious, confounded by a gene-environment correlation (rGE). If this is true, we would expect to observe an association between the genetics of accelerated reproductive development and father absence, and an attenuation in the association between father absence and reproductive timing when controlling for known genetic factors associated with reproductive development.
To date, empirical tests of genetic confounding are limited to family-based genetic studies (Mendle et al. 2006; Ryan 2015; Tither and Ellis 2008). These studies have compared phenotypic similarities between pairs of relatives with different degrees of genetic relatedness, such as monozygotic and dizygotic twins. They have used these comparisons to partition phenotypic variance into genetic and environmental components and to further isolate environmental variance shared between siblings in a family (Plomin et al. 2013). Behavioral genetic studies have found no evidence of shared environmental influence on pubertal timing, generally interpreting this as evidence against environmental causation hypotheses because father absence is an environmental exposure shared by siblings (Mendle et al. 2006; Rowe 2000; Ryan 2015; Tither and Ellis 2008); this evidence is consistent with genetic confounding. The advent of GWAS and the discovery of molecular genetic influences on reproductive development now afford an opportunity to put the genetic hypothesis to a molecular test. For a more complete discussion of the advantages and disadvantages of molecular genetic tests vis-à-vis behavioral genetics than is possible here, see Conley and Fletcher (2017).
Genome-Wide Association Studies of Human Reproductive Development
Advances in genome science are now yielding molecular detail about genetic influences on human traits and behaviors. GWAS are large-scale data-mining expeditions of the human genome. They correlate millions of genetic variants, called single-nucleotide polymorphisms (SNPs), with phenotypes in samples of tens or even hundreds of thousands of individuals. GWAS on age at menarche (Day et al. 2017; Day et al. 2015; Perry et al. 2014) and age at first birth (Barban et al. 2016) provide an opportunity to conduct a molecular test of the rGE hypothesis advanced to explain associations between father absence and timing of puberty. GWAS results can be used to parameterize predictive algorithms, called polygenic scores, which can be applied to genomic data from independent samples (Belsky and Israel 2014; Dudbridge 2013). Using this approach, we conduct the first molecular genetic test of the hypothesis of rGE between genetic influences on reproductive development and father absence. We evaluate genetic confounding of the associations between father absence and accelerated reproductive development, considering both the genetics associated with menarche and age at first birth.
Data and Methods
Data
The National Longitudinal Study of Adolescent to Adult Health (Add Health) is an ongoing, nationally representative longitudinal study of the social, behavioral, and biological linkages in health and developmental trajectories from early adolescence into adulthood. The cohort was drawn from a probability sample of 132 middle and high schools and is representative of U.S. adolescents in grades 7–12 in 1994–1995. Since the start of the project, participants have been interviewed in home at four data collection waves (numbered I–IV), most recently in 2008. Data include measures of family structure, reproductive development, and genome-wide molecular genetic data (Harris 2010; Harris et al. 2013). Given the complications inherent in working with genetic data in diverse samples (Martin et al. 2017; Wojcik et al. 2017), we restrict our primary analytic sample to 2,681 unrelated non-Hispanic white women, and supplementary analyses of sisters to 411 genetically confirmed non-Hispanic white sisters. Add Health genetic data are available on roughly 800 non-Hispanic black and 700 Hispanic females for future research.
Reproductive Timing
Age at menarche was recorded from interviews conducted at Waves I–III. We use data from the interview wave at which a woman first reported having had her first menstruation. Age at menarche is measured in completed years. Age at first sex was recorded from interviews conducted at Waves I–IV. We use data from the interview wave at which a woman first reported having had vaginal intercourse. Age at first sex was recorded in month units at Add Health Waves I and II and in year units at Waves III and IV. We exclude from analysis data from 13 women who reported an age at first sex less than age 10. Age at first birth was recorded from interviews conducted at Waves I–IV. We use data from the interview wave at which a woman first reported having had her first live birth in the pregnancy/birth history portion of the interview. Age at first birth is measured in months. We exclude from analysis data from two women who reported first birth before age 10.
Age at menarche, first sex, and first birth are positively correlated (see Table 1). Women who develop later in adolescence tend to have a later age at sexual debut, followed by a later age at first birth. Age at first sex and age at first birth are most strongly correlated (r = .37).
In Fig. 1, we present Kaplan-Meier estimates for the cumulative proportion of women experiencing each reproductive event by age. The mean age at which women experience a given event is calculated as the area under the Kaplan-Meier survival function. All non-Hispanic white women experience menarche by age 20, with a mean age of 12.2. Nearly all (96 %) experience sexual debut by age 32, with a mean age for those who have sex of 17.2. Finally, 64 % experience a first birth by age 32, and the mean age for those who have a first birth is 26.2.
These estimates are generally consistent with other nationally representative estimates of the timing of reproductive events. Our estimate for mean age at menarche is 0.3 years earlier than estimates for non-Hispanic white girls in the National Health and Nutrition Examination Survey (NHANES), which may be due to differences in methodology: cross-sectional design and proxy report (for girls aged 8–11) in NHANES, compared with longitudinal design and self-report in Add Health (Anderson and Must 2005). Our estimate for mean age at first sex is consistent with the 2002 national estimate of 17.4 for all reproductive-age women from the National Survey of Family Growth (Chandra et al. 2005). Finally, our estimate for mean age at first birth falls within the range of national estimates among non-Hispanic white women, from 25.9 in 2000 to 27.0 in 2014 (Mathews and Hamilton 2016).
Father Absence
We measure father absence as an indicator for whether the child’s biological father was ever nonresident from birth to age 7. We construct this measure drawing from four sources of data: (1) parent’s self-reported union history and current union status; (2) youth’s reports of their current family structure; (3) youth’s reports of the duration of life spent with current household members; and (4) whether and for how long a youth lived with a nonresident biological parent (Gaydosh and Harris forthcoming). Almost one-fifth (18 %) of girls were born into a family structure where the biological father was nonresident (not shown). By age 7, 31 % of girls experience biological father nonresidence for a period of at least one year (see Table 1).
Genotyping
At the Wave IV interview in 2008, saliva and capillary whole blood were collected from respondents. Of the 15,701 individuals interviewed, 15,159 consented to genotyping, and 12,254 agreed to genetic data archiving. After quality-control procedures (Highland et al. 2018; McQueen et al. 2015), genotype data were available for 9,974 individuals. Data were generated for 7,917 individuals from the Illumina HumanOmni1-Quad BeadChip, which includes 1,140,419 markers, and for 2,057 individuals from the Illumina HumanOmni2.5-Quad BeadChip, which includes 2,369,541 markers. We conduct analysis on the 609,130 SNPs genotyped on both chips. Additional information on the selection of the final SNP marker set can be found in the quality control report (Highland et al. 2018).
Polygenic Scores (PGS)
We calculate polygenic scores for Add Health participants based on published GWAS results for age at menarche from the ReproGen Consortium (Day et al. 2017) and for age at first birth from the Social Science Genetic Association Consortium and Sociogenome (Barban et al. 2016). This allows us to control for the known genetic factors associated with two reproductive phenotypes, both of which are associated with father absence. The GWAS were conducted on independent discovery samples to estimate associations between SNPs and the phenotypes of interest (age at menarche and first birth). SNPs are a common type of genetic variation where there is a difference in a single base pair in the DNA sequence; for each SNP, individuals may have no, one, or two copies of a reference allele (Bush et al. 2012).
Scores are calculated following the method described by Dudbridge (2013). Briefly, SNPs in the genotyped sample are matched to published GWAS results. For each of these SNPs (N = 577,997 menarche, N = 527,258 age at first birth), a loading is calculated as the number of phenotype-associated alleles (0, 1, or 2) multiplied by the effect size estimated in the original GWAS. Loadings are then summed across the SNP set to calculate the polygenic score.
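The scoring step described above can be illustrated with a minimal sketch. It assumes genotypes and GWAS effect sizes are already keyed by a shared SNP identifier; the function name, SNP identifiers, and toy values are hypothetical and are not drawn from Add Health or from the published GWAS.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Sum of (allele count x GWAS effect size) over matched SNPs.

    allele_counts: dict mapping SNP id -> 0, 1, or 2 copies of the
                   phenotype-associated allele for one person
    effect_sizes:  dict mapping SNP id -> per-allele effect from the GWAS
    Only SNPs present in both inputs (the matched set) contribute.
    """
    matched = allele_counts.keys() & effect_sizes.keys()
    return sum(allele_counts[snp] * effect_sizes[snp] for snp in matched)


# Hypothetical toy data: three genotyped SNPs, two of which appear in the GWAS.
genotype = {"rs1": 2, "rs2": 0, "rs3": 1}
gwas_beta = {"rs1": 0.25, "rs3": -0.125, "rs9": 0.5}

score = polygenic_score(genotype, gwas_beta)
print(score)  # 2 * 0.25 + 1 * (-0.125) = 0.375
```

In this toy case, rs2 and rs9 are dropped because each appears in only one of the two sources, mirroring the matching of genotyped SNPs to published results.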
The use of polygenic scores in population research is complicated by population stratification, or the nonrandom patterning of alleles across global populations (Cardon and Palmer 2003). Population stratification is a potential confounder in genetic association studies (Hamer and Sirota 2000) and, by extension, polygenic score analysis (Belsky and Israel 2014; Martin et al. 2017). GWAS and polygenic scoring rely on a subset of genetic markers to act as proxies for unmeasured genetic variation. This approach is effective because much of the genome is in linkage disequilibrium: that is, certain groups of genotypes tend to be inherited together and are thus correlated. Measuring one in a set of such genotypes can effectively provide information about the multiple unmeasured genotypes nearby. However, because patterns of linkage disequilibrium reflect genetic inheritance, they vary across populations of different ancestry. A result of ancestry-associated differences in patterns of linkage disequilibrium is that a given genotype may contain different information about the genome when measured in one population versus another (Carlson et al. 2013; Shifman et al. 2003; Wojcik et al. 2017). In the context of polygenic scoring, applying GWAS results derived in one population to compute a polygenic score in a different population introduces measurement error, which is one reason why polygenic scores derived from GWAS of European ancestry populations have reduced effect sizes in African ancestry populations (Belsky et al. 2013; Domingue et al. 2014, 2015).
Importantly, reduction in the predictive accuracy of polygenic scores may also reflect confounding by environmental factors that are differentially distributed across populations. Large differences in environmental factors, such as neighborhood social disorganization or exposure to discrimination, might suppress the influence of genetic factors in minority populations compared with non-Hispanic white populations (Boardman et al. 2012, 2017). We believe that these are critical questions; however, given differences in allele frequency and linkage disequilibrium patterns across groups, the purpose of this study (testing rGE) is best served by restricting the analytic sample to increase comparability with the population studied in the original GWAS. We therefore restrict our analysis to a genetically identified sample of unrelated women. Inclusion in this sample is based on the respondent’s first two genome-wide principal components falling within a given distance of the centroid of the first two principal components computed on the self-identified non-Hispanic white respondents (for greater detail, see the online supplement for Domingue et al. 2018).
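The inclusion rule based on principal components can be sketched as follows. This is an illustration only: the text does not state the distance metric or the cutoff, so Euclidean distance and the threshold `MAX_DIST` are assumptions, and all identifiers and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Mean of (PC1, PC2) pairs for the reference group."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def within_distance(point, center, max_dist):
    """True if the point lies within max_dist (Euclidean) of the center."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= max_dist


# Hypothetical (PC1, PC2) values for the self-identified reference group.
reference_pcs = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
center = centroid(reference_pcs)

# Hypothetical respondents to screen; MAX_DIST is an assumed cutoff.
MAX_DIST = 0.3
candidates = {"a": (0.1, 0.15), "b": (0.9, 0.9)}
kept = sorted(rid for rid, pcs in candidates.items()
              if within_distance(pcs, center, MAX_DIST))
print(kept)
```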
Sample restrictions based on self-reported race/ethnicity may not completely protect against population stratification–related confounding (Campbell et al. 2005). To address residual population stratification within self-reported non-Hispanic whites in Add Health, we follow established practice and adjust analyses for principal components estimated from the genome-wide SNP data (Price et al. 2010). Principal component analysis is a statistical technique used to capture variation in a large set of variables using a reduced set of factors. We estimate principal components among non-Hispanic white respondents using the genome-wide SNP data according to the method described by Price et al. (2006), using the PLINK command pca. We then residualize polygenic scores for the first 10 principal components: that is, we regress polygenic scores on the 10 principal-component scores and compute residual values from the predictions (Conley et al. 2016a). Residualized polygenic scores are standardized (mean = 0, standard deviation = 1) for analysis.
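The residualization and standardization steps can be sketched in a few lines. The example below assumes an ordinary least squares regression with an intercept, computed by Gram-Schmidt orthogonalization so that no external library is needed, and it uses the sample standard deviation; these choices, the two toy principal components (the analysis uses 10), and all values are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(y, covariates):
    """OLS residuals of y on an intercept plus the covariate columns."""
    n = len(y)
    basis = []  # mutually orthogonal vectors spanning the regressors
    for col in [[1.0] * n] + [list(c) for c in covariates]:
        for b in basis:
            coef = dot(col, b) / dot(b, b)
            col = [c - coef * bi for c, bi in zip(col, b)]
        if dot(col, col) > 1e-12:  # skip columns that add no new direction
            basis.append(col)
    resid = list(y)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [r - coef * bi for r, bi in zip(resid, b)]
    return resid


def standardize(values):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical raw scores and two principal components for six people.
raw_score = [1.0, 2.0, 4.0, 3.0, 5.0, 2.5]
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0, 0.0]
pc2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

pgs = standardize(residualize(raw_score, [pc1, pc2]))
```

By construction, the resulting score is uncorrelated with each principal component included in the regression.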
We scale the polygenic score for age at menarche so that higher values correspond to genetic prediction of earlier age at menarche. We hereafter refer to this measure as the accelerated menarche polygenic score. Similarly, we scale the polygenic score for age at first birth so that higher values correspond to genetic prediction of earlier age at first birth. We hereafter refer to this measure as the earlier first-birth polygenic score. Accelerated menarche and earlier first-birth polygenic scores are computed for n = 2,681 unrelated non-Hispanic white women.
Methods
We use survival methods to analyze timing of reproductive events. We employ the Kaplan-Meier method of estimating the survival function to describe the cumulative proportion of females experiencing an event by a given age. This method accommodates censoring, in which a woman has not yet experienced an event by the end of follow-up, and provides estimates of mean age at event as the area under the curve. We use the Mantel-Haenszel log-rank test to compare survival curves for different risk groups as a way to test for the association of covariates.
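For readers unfamiliar with the estimator, the following is a minimal sketch of the Kaplan-Meier product-limit calculation and of the mean age at event taken as the area under the survival curve. The function names, the follow-up horizon, and the toy ages are hypothetical; the analysis itself would normally rely on standard statistical software.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct age with an observed event.

    times:  age at event or at censoring
    events: True if the event was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_events + n_censored
    return curve


def mean_age(curve, horizon):
    """Area under the survival step function from age 0 to horizon."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t > horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Hypothetical ages; the third woman is censored at age 12.
ages = [11, 12, 12, 13, 14]
observed = [True, True, False, True, True]

curve = kaplan_meier(ages, observed)
mean = mean_age(curve, horizon=14)
print(round(mean, 2))  # 12.7
```

The cumulative proportion experiencing the event by a given age, as plotted in the figures, is one minus the survival value at that age.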
We use a nonparametric proportional hazard model approach to test associations of genetic and environmental risks with accelerated reproductive development. This approach yields coefficients with a relative hazard interpretation, similar to the more familiar Cox model. The nonparametric proportional hazard approach can accommodate ties in event times, which are common in discretely measured data, such as age at menarche or age at first birth.
Results
Father Absence and Reproductive Timing
Table 2 presents hazard ratios for the relationship between father absence and reproductive development. Childhood experience of father absence—from birth to age 7—is significantly associated with earlier timing of all three reproductive events. Net of age, the risk of menarche is 24 % higher, on average, for girls who experienced father absence in childhood compared with peers who lived continuously with their fathers from birth to age 7 (Model 1). Exposure to father absence by age 7 is also associated with earlier age at first sex and first birth, with larger effect sizes (hazard ratio (HR) = 1.49 and 1.52, respectively).
We illustrate the magnitude of the associations between father absence and reproductive timing in Fig. 2. Across all reproductive events, father absence is associated with a significant acceleration in timing. Girls with absent fathers experience menarche two months earlier than girls with present fathers (at ages 12.1 and 12.3, respectively). Similarly, girls with absent fathers experience sexual debut almost a full year earlier compared with those with present fathers (at ages 16.2 and 17.6, respectively). Finally, girls with absent fathers go on to have their first birth nearly two years earlier, on average, compared with those with present fathers (at ages 25.0 and 26.8, respectively).
Accelerated Menarche Polygenic Score
Girls with larger accelerated menarche polygenic scores experience menarche at younger ages, on average, compared with peers with lower polygenic scores (HR = 1.33; see Table 2, Model 2). We find no association between the accelerated menarche polygenic score and age at first sex or age at first birth.
Figure 3 presents the cumulative proportion of women experiencing each reproductive event by age separately for high and low polygenic score groups. High score (solid) is defined as greater than or equal to 1 standard deviation above the mean, and low score (hollow) is defined as less than or equal to 1 standard deviation below the mean. Girls in the high polygenic score group experience menarche almost a full year earlier, on average, compared with those in the low polygenic score group (at ages 11.7 and 12.8, respectively).
Testing Gene-Environment Correlation
We test for gene-environment correlation in the relationship between father absence and menarcheal timing by computing correlations between girls’ accelerated menarche polygenic scores and exposure to father absence. We find no correlation between the accelerated menarche polygenic score and father absence (r = .0001; see Table 1). In Table 3, we compare mean standardized polygenic scores by father absence and find that girls with absent fathers have accelerated menarche polygenic scores similar to those of girls with present fathers.
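The two descriptive checks in this paragraph, a correlation between the polygenic score and a binary exposure and a comparison of group means, can be sketched as follows. The data are hypothetical and constructed so that the score is unrelated to the exposure; they do not reproduce Table 1 or Table 3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_mean(values, indicator, level):
    """Mean of values among observations where indicator equals level."""
    selected = [v for v, g in zip(values, indicator) if g == level]
    return sum(selected) / len(selected)


# Hypothetical standardized scores and father-absence indicator (1 = absent).
pgs = [-1.0, 1.0, 0.0, 0.5, -0.5, 0.0]
father_absent = [1, 1, 1, 0, 0, 0]

r = pearson_r(pgs, father_absent)
mean_absent = group_mean(pgs, father_absent, 1)
mean_present = group_mean(pgs, father_absent, 0)
print(r, mean_absent, mean_present)
```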
We formally test for confounding by rGE in a hazard model regressing reproductive timing on father absence, controlling for the accelerated menarche polygenic score (see Table 2, Model 3). Father absence remains significantly associated with age at menarche, first sex, and first birth after we control for known genetic influences on menarcheal timing. For age at menarche in particular, father absence and genetic risk are independent and additive predictors. We find no evidence that the genetics of menarche confound the relationship between father absence and pubertal timing.
Earlier First-Birth Polygenic Score
We conduct the same analyses as outlined earlier using the polygenic score for age at first birth. On average, women with higher earlier first-birth polygenic scores experience accelerated timing of all three reproductive events (see Table 4, Model 1). Similar to results for the accelerated menarche polygenic score, the effect size for the standardized earlier first-birth polygenic score is largest for the phenotype matching the original GWAS (for age at menarche, HR = 1.07; for age at first sex, HR = 1.20; for age at first birth, HR = 1.28).
Figure 4 presents the cumulative proportion of women experiencing each reproductive event by age separately for high and low earlier first-birth polygenic score groups. High score (solid) is defined as greater than or equal to 1 standard deviation above the mean, and low score (hollow) is defined as less than or equal to 1 standard deviation below the mean. Girls in the high earlier first-birth polygenic score group experience menarche approximately two months earlier than their peers in the low polygenic score group (at ages 12.1 and 12.3, respectively). The gap in reproductive timing widens for subsequent events: girls in the high earlier first-birth polygenic score group experience first sex more than two years earlier (at age 16.1 vs. 18.3) and first birth almost three years earlier (at age 24.8 vs. 27.7) compared with girls in the low earlier first-birth polygenic score group.
Finally, we test the rGE hypothesis using the earlier first-birth polygenic score. The earlier first-birth polygenic score and father absence are correlated at r = .08 (Table 1); women who experience father absence have significantly higher earlier first-birth polygenic scores (Table 3). Some evidence suggests that the distribution of earlier first-birth polygenic scores varies by father absence, as shown in the right panel of Fig. 5. We therefore test whether the association between father absence and reproductive timing is confounded by the earlier first-birth polygenic score (Table 4, Model 3). When the earlier first-birth polygenic score is added to the hazard models, effect sizes for associations between father absence and reproductive timing phenotypes are slightly reduced (by 4 % for age at menarche and first sex, and by 8 % for age at first birth), but they remain statistically significant.
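A percentage reduction in an effect size can be computed in more than one way, and the text does not state which convention underlies the figures above. The sketch below shows two common conventions, one on the excess hazard (HR minus 1) and one on the log hazard ratio, using hypothetical hazard ratios that are not taken from Table 4.

```python
import math


def attenuation_excess(hr_before, hr_after):
    """Proportional reduction in the excess hazard, HR - 1."""
    return 1 - (hr_after - 1) / (hr_before - 1)


def attenuation_log(hr_before, hr_after):
    """Proportional reduction in the log hazard ratio (model coefficient)."""
    return 1 - math.log(hr_after) / math.log(hr_before)


# Hypothetical hazard ratios before and after adding the polygenic score.
hr_unadjusted = 1.50
hr_adjusted = 1.46

excess = attenuation_excess(hr_unadjusted, hr_adjusted)
logscale = attenuation_log(hr_unadjusted, hr_adjusted)
print(f"excess-hazard scale: {excess:.1%}, log scale: {logscale:.1%}")
```

The two conventions give similar but not identical answers, so a reported attenuation should be read alongside the underlying hazard ratios.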
Robustness
Ultimately, the only way to exclude confounding due to population stratification is to analyze genetic differences between relatives who share the same ancestry (Conley et al. 2015). We conduct our primary analysis using a subsample of unrelated individuals. Add Health includes a sibling sample, and we use this sample to repeat polygenic score analyses within a design that excludes potential confounding by population stratification. We test genetic associations using sister fixed-effects models, restricting again to individuals who identify as non-Hispanic white (n = 409). We report results in the appendix (Table 5). Among sisters, the carrier of the higher accelerated menarche polygenic score experienced marginally earlier menarche compared with her lower-scored sister. The effect size is approximately the same as found in the analysis of unrelated individuals, arguing strongly against confounding by population stratification. In contrast, the effect size for the earlier first-birth polygenic score is substantially reduced in the sibling fixed-effects analysis of age at first birth and is no longer statistically significant. This reduction in effect size could signal some confounding by population stratification. Alternatively, it could imply other processes causing sisters to have correlated reproductive development outcomes. However, the sample size for sisters in the analysis of first birth is greatly reduced given that many women have not experienced a first birth by the time of last interview. Low discordance on father absence prohibits an analysis of father absence in the sister subsample.
Results are robust to different measures of family environment. When we restrict the measure of father absence to nonresidence at the time of birth, results are almost identical. Stepfather coresidence is similarly predictive of earlier reproductive timing and is not confounded by genetic risk (see Table 6 in the appendix).
Limitations
This study has limitations. Age at menarche is measured in completed years, and age at first sex is measured in years at the later waves, which is imprecise, especially for timing of menarche. Nevertheless, the polygenic score for accelerated menarche and age at menarche are correlated, indicating that the differences observed are substantive and not the result of measurement error. Data on age at first birth are right-censored, with follow-up ending between ages 24 and 32 (Add Health Wave IV). Although the modeling strategy accommodates this type of data, future analyses should consider women with completed reproduction. Father coresidence is reported retrospectively. However, a strength of our measure is that it draws information from both primary caregiver and child reports, and it has been validated elsewhere (Gaydosh and Harris forthcoming). Polygenic scores based on current GWAS results are incomplete measures of genetic influences on reproductive development. The GWAS that we use to develop polygenic scores included hundreds of thousands of individuals. Nevertheless, the majority of genetic influence on reproductive development inferred based on family studies remains unexplained. As GWAS sample sizes increase, polygenic score performance is expected to improve (Okbay et al. 2016), which may change our ability to detect genetic confounding of associations between father absence and reproductive timing. Future GWAS of additional phenotypes, such as family structure, may also enable further testing of possible genetic correlation (Barbaro et al. 2017).
Residual population stratification may confound our analysis of rGE. Our sibling comparison analysis rules out confounding by population stratification in the case of genetic associations with reproductive phenotypes. However, sibling comparison analysis is not feasible for tests of genetic association with father absence. To address this limitation, we include statistical adjustment for principal components estimated from the genetic data, which is the standard approach in the field (Price et al. 2006). Nevertheless, studies with larger samples of siblings could strengthen confidence that our null findings for rGE were not confounded by residual population stratification.
Add Health represents a single cohort in the United States. Genetic influences on human traits and behaviors change over time and across space (Conley et al. 2016b; Demerath et al. 2013; Liu and Guo 2015; Tropf et al. 2017; Walter et al. 2016). Replication of our findings in samples from different birth cohorts and from different countries will clarify the extent to which findings from recent U.S. birth cohorts generalize.
Finally, our findings are restricted to non-Hispanic white women. As we discussed earlier, this restriction is motivated by the limitations of current GWAS, which have been conducted in European ancestry populations. This is a pressing limitation to current empirical investigations of genetic and social factors influencing biological and behavioral outcomes, and GWAS in non-European ancestry populations should be a priority. It is our hope that new data, methods, or integrative frameworks become available in the near future that better enable researchers to use genome-wide data from all individuals.
Discussion
We find no evidence of genetic confounding in the relationship between father absence and accelerated menarcheal timing. Women with higher accelerated menarche polygenic scores experience menarche and first sex earlier. Family structure risks for early puberty and sexual debut are independent of accelerated menarche polygenic scores. The rGE hypothesis explaining the link between father absence and earlier puberty is not supported in our sample. Father absence and GWAS-discovered genetic risk for pubertal timing are independent and additive predictors of adolescent development. If common genetic factors do link family environments and pubertal timing, they have not yet been uncovered in GWAS of menarche.
The findings for age at first birth are similar although less conclusive. Some evidence shows that girls with absent fathers have, on average, greater GWAS-discovered genetic risk for younger age at first birth compared with girls with present fathers. Although the magnitude of the associations between father absence and reproductive timing declines after we control for the earlier first-birth polygenic score, the father absence association remains significant. Despite possible genetic confounding, known genetic factors associated with age at first birth cannot fully account for the acceleration in reproductive timing associated with father absence. Examining individual reproductive timing and subsequent partnership formation and stability is a fruitful area for future research on this topic.
Our findings do not support the use of molecular genetic testing in order to make specific predictions about outcomes for individuals. However, molecular genetic data can provide useful information about population parameters, contributing to our understanding of variation in reproductive timing. Polygenic scores for reproductive timing capture otherwise unobserved heterogeneity, allowing for more precise estimation of environmental effects, as well as investigations of interactions between genes and environments. We provide here an example of an approach proposed by Manski (2011) and others (Belsky and Israel 2014; Benjamin et al. 2012) that applies molecular genetic discoveries to test effects of environmental variables. Future studies may use the polygenic scores that we studied as a measure of genetic liability to early puberty.
The lack of evidence for rGE between age at menarche and father absence suggests that something else may underlie the consistent observation that girls who live apart from their fathers mature earlier than those who live with them. Our findings are consistent with existing anthropological and developmental theories positing that early exposure to adversity accelerates physical development, perhaps through environmental signaling of resource scarcity, and behavioral or physiological responses to stress (Belsky et al. 2007; Kyweluk et al. 2018; McEwen 2012). Future research should investigate how father-absent family environments may influence the biological antecedents to puberty. Such research may identify targets for social interventions to modify the reproductive trajectories on which family environments set developing girls. The modest rGE between age at first birth and father absence suggests new directions for social science research. An ever-expanding array of genetically informed social science data resources affords new opportunities to investigate how genetics associated with early reproduction become correlated with family structure across generations and what social environmental factors may modify this relationship.
Acknowledgments
This research benefitted from GWAS results made publicly available by the ReproGen Consortium, Sociogenome, and the Social Science Genetic Association Consortium. This research uses Add Health GWAS data funded by Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) Grants R01 HD073342 to Kathleen Mullan Harris and R01 HD060726 to Kathleen Mullan Harris, Jason D. Boardman, and Matthew B. McQueen. This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at University of North Carolina at Chapel Hill, and funded by Grant P01-HD31921 from the NICHD, with cooperative funding from 23 other federal agencies and foundations. This research was supported in part by NICHD P2C-HD050924. Lauren Gaydosh was supported by NICHD F32 HD084117. Daniel W. Belsky is an Early Career Fellow of the Jacobs Foundation and is supported by NIA Grants R01 AG032282 and P30 AG028716.