Sibling Similarity in Family Formation

Sibling studies have been widely used to analyze the impact of family background on socioeconomic and, to a lesser extent, demographic outcomes. We contribute to this literature with a novel research design that combines sibling comparisons and sequence analysis to analyze longitudinal family-formation trajectories of siblings and unrelated persons. This allows us to scrutinize in a more rigorous way whether sibling similarity exists in family-formation trajectories and whether siblings’ shared background characteristics, such as parental education and early childhood family structure, can account for similarity in family formation. We use Finnish register data from 1987 through 2007 to construct longitudinal family-formation trajectories in young adulthood for siblings and unrelated dyads (N = 14,257 dyads). Findings show that family formation is moderately but significantly more similar for siblings than for unrelated dyads, also after controlling for crucial parental background characteristics. Shared parental background characteristics add surprisingly little to account for sibling similarity in family formation. Instead, gender and the respondents’ own education are more decisive forces in the stratification of family formation. Yet, family internal dynamics seem to reinforce this stratification such that siblings have a higher probability to experience similar family-formation patterns. In particular, patterns that correspond with economic disadvantage are concentrated within families. This is in line with a growing body of research highlighting the importance of family structure in the reproduction of social inequality.


Introduction
Family-of-origin effects on family behavior are at the center of a controversial debate. On the one hand, growing evidence on profound social change in family formation with the proliferation of cohabitation, lone parenthood, and diverse family forms raises the question as to whether the family of origin still matters for peoples' family behavior (Elzinga and Liefbroer 2007;Fussell and Furstenberg 2005;Popenoe 1993;Shanahan 2000). On the other hand, several studies have compellingly demonstrated continuing importance of the family of origin for peoples' family behavior (Liefbroer and Elzinga 2012;Murphy 2013). The broader relevance of the topic is highlighted by another line of research emphasizing the role of family structure in the reproduction of social inequality and the perpetuation of family disruption across generations (Carlson and England 2011;McLanahan and Percheski 2008).
Sibling studies have been the method of choice to study family-of-origin effects on socioeconomic outcomes, such as education and occupational status (e.g., Conley 2008). To a lesser extent, sibling studies have also examined family-of-origin effects on demographic outcomes, such as family behavior. To date, sibling studies on family formation largely focus on isolated fertility transitions (Kuziemko 2006;Lyngstad and Prskawetz 2010). Yet, these fertility transitions are embedded in family-formation trajectories that typically lead through cohabitation, marriage, having one or more children, and then possibly separation and repartnering in between.
This article combines the sibling approach with sequence analysis to scrutinize the impact of shared parental background characteristics on sibling similarity in holistic family formation trajectories. We address three research questions. First, are family formation trajectories more similar among siblings than among comparable unrelated persons? Second, can shared parental background characteristics account for sibling similarity in family formation? Third, in what way is siblings' family formation more similar: that is, are siblings more likely than unrelated persons to experience specific family-formation patterns?
Drawing on Finnish register data, our empirical analysis employs (1) conditional assignment to build an analysis sample of sibling dyads and comparable unrelated dyads, (2) sequence analysis to measure similarity in family formation within these dyads, (3) dyadic regression analysis to assess the impact of shared parental background characteristics, and (4) cluster analysis to examine the way in which siblings' family formation is similar. To our knowledge, this is the first study that combines a sibling design with sequence analysis to study family-of-origin effects on family formation. This analytical strategy aims to contribute to the literature in three regards.
First, this is the first study to establish the simple descriptive fact that siblings are more similar to one another also in their holistic family-formation trajectories and not only in isolated fertility transitions. This is possible through the combination of a sibling design with sequence analysis. We thereby recognize the importance of linked lives within families (Elder et al. 2003) as well as the interdependence of multiple family-formation events across the life course and the diversity of family-formation processes (Wu and Li 2005).
Second, we compare sibling dyads with a control group of unrelated dyads that are identical with regard to important parental background characteristics. Siblings naturally share parental background characteristics, such as parental education and early childhood family structure, that unrelated persons do not generally share. This compositional effect alone might generate sibling similarity in family formation. Our sibling design enables us to scrutinize to what extent these shared background characteristics account for sibling similarity in family formation.
Third, by employing sequence analysis and cluster analysis, we can examine qualitative patterns of family formation and the nature of siblings' similarity in family formation. Beyond directly testing the effect of parental background characteristics on sibling similarity, we thereby show in which way siblings' family formation is more similar: that is, which substantive family-formation patterns are more likely to "run in the family."

Background and Theory
Family-of-origin effects on family behavior fall into two broad categories: shared parental background effects and mutual sibling influence. Sibling similarities in family formation can result from either or both of them.
Research on parental background effects focuses on intergenerational transmission-the degree to which parents and their children are similar in the occurrence and timing of focal family events: fertility (Barber 2001;Murphy 2013;Murphy and Knudsen 2002), marriage (Feng et al. 1999;Jennings et al. 2012;van Poppel et al. 2008), or divorce (Amato 1996;Amato and DeBoer 2001;Diekmann and Engelhardt 1999;Wolfinger 2000). Most of these studies have found relatively modest but significant similarity between parents' and their children's family behavior. Two recent studies compared holistic family-formation trajectories of parents and their children directly in a dyadic approach (Fasang and Raab 2014;Liefbroer and Elzinga 2012). Liefbroer and Elzinga (2012) concluded that in spite of profound social change in family formation over the past decades, there is persistent continuity in parents' and their children's family formation. Fasang and Raab (2014) showed that in addition to strong intergenerational transmission (i.e., similarity), other systematic patterns exist between parental and filial family formation: a pattern of delayed transmission as well as a pattern of intergenerational contrast in which children show systematically different family behavior from their parents.
Research on sibling similarity in family behavior has concentrated on particular aspects of fertility. Findings include that older siblings, particularly brothers, affect younger siblings' sexual initiation (Haurin and Mott 1990;Widmer 1997); sisters influence one another in the transition to first birth but not notably in second-parity transitions (Lyngstad and Prskawetz 2010); and the number of siblings affects own family size preferences (Axinn et al. 1994) as well as own fertility (Murphy and Knudsen 2002). Further, girls' teenage pregnancy and premarital birth increase their sister's risk of experiencing these events as well (East 1998;East and Jacobson 2001;Powers 2001;Powers and Hsueh 1997). In these studies, it is generally unclear both whether the sibling effect extends to the entire family-formation process or only to specific transitions, and how important shared parental background is relative to mutual sibling influence.
This article aims to extend prior research by examining whether there is sibling similarity in holistic family-formation trajectories and what could account for it if it exists. Specifically, we ask whether sibling resemblance in family formation is mainly brought about by shared background characteristics. We thereby offer a detailed analysis of one of the two broad categories of family-of-origin effects-parental background and mutual sibling influence-that potentially generate sibling similarity in family formation. In particular, we examine parental background in terms of parental marital history as an indicator of children's early childhood family structure and parental education as an indicator for socioeconomic background. Although these two indicators certainly cannot capture the full "package" of shared parental background effects, they are two central factors that have been shown to correlate considerably with other observable und unobservable parental background characteristics, such as parenting styles (Chan and Koo 2010;Lareau 2003). Subsequently, we discuss empirical findings and theoretical considerations on these two parental background characteristics.

Parental Marital History and Early Childhood Family Structure
Research on the intergenerational transmission of family behavior persuasively shows that the structure of the family of origin is associated with family formation in adulthood. Children from disrupted families have a higher risk of divorce than children from intact families (Amato 1996;Diekmann and Engelhardt 1999). Moreover, children of divorce have lower overall rates of marriage (Erola et al. 2012;South 2001), although they tend to have an elevated risk of teenage marriage (Wolfinger 2003). In addition, children who experience several marital transitions or alternative living arrangements during childhood have a higher "risk of forming a first cohabitational union" (Teachman 2003:520). More generally, instable living arrangements during childhood promote instable family trajectories in adulthood (Carlson and England 2011).
Family structure is closely linked to economic inequalities. Children from disrupted families often grow up in single-parent households and are more likely exposed to economic hardship (Amato 1996;McLanahan and Percheski 2008), which negatively affects children's status attainment (McLanahan and Sandefur 1994;Sigle-Ruston and McLanahan 2004). Low socioeconomic status (SES) in turn increases marital instability (Amato 1996;Wolfinger 2005) because it is associated with a number of relationship stressors, including higher incidences of poverty, substance abuse, and domestic violence. Moreover, parents' own family experiences shape parental preferences for their children's family trajectories. Several studies have reported that divorced parents and their children are more tolerant toward nontraditional family forms (Axinn and Thornton 1996;Cunningham and Thornton 2006). However, even though value transmission in families of separation is weaker than in intact families, parental preferences still affect their children's attitudes (van der Valk et al. 2008).
Siblings might be more similar in their family formation than unrelated persons simply because they grow up in similar family structures and share their parents' marital history to a greater extent. Of course, there will be considerable variation in how two siblings experience events such as a parental divorce. Siblings will generally experience divorce at different developmental stages and will have different coping resources available to them. However, compared with random unrelated persons, siblings are likely to share more of their parental marital history and early childhood family structure. If these factors are influential in generating sibling similarity in family formation, unrelated persons that share similar parental marital history and early childhood family structure experiences should also be more similar in their family formation. Sibling similarity in family formation compared with unrelated persons should then be smaller after parental marital history and family structure of the family of origin are factored out.

Parental Education
Parental education can influence their children's family formation directly and indirectly through the transmission of parental education to children's education. Socialization and social control theories (Barber 2000;Bernardi 2003;Starrels and Holm 2000) have emphasized a direct effect of parental education. Following this rationale, highly educated parents will increase the similarity of their offspring's family trajectories in two regards. First, highly educated parents have higher occupational aspirations for their children and are more likely to favor delayed family formation (Barber 2001;Plotnick 2007;Trent and South 1992). Through socialization, parents shape their children's family plans in accordance with these preferences. Although the socialization effect is partly mediated by status transmission, several studies have shown an effect of parental education even after children's education has been taken into account. For instance, evidence from Norway showed a direct influence of parents' education on children's first union formation (Blom 1994;Wiik 2008). Second, parents with higher status are in a better position to exert social control. They likely have more educational and financial resources to influence their children to behave according to their preferences than do lower-educated parents (Axinn and Thornton 1992;Barber 2001).
Indirect effects of parental education through intergenerational status transmission are rather obvious given the tight link between educational trajectories and family formation in developed societies. Parental education will then matter to the extent that it predicts children's own education. The prolongation of education during the last decades is seen as a major driver of postponement in the transition to adulthood (Billari and Liefbroer 2010;Fussell and Furstenberg 2005). Transitions such as marrying and childbearing usually take place after a person leaves the educational system and reaches some degree of financial independence. Although higher levels of education are often linked to individualization, educational institutions also standardize early life courses to be more uniform (Brückner and Mayer 2005;Mayer and Schoepflin 1989;Shanahan 2000). The standardizing effect of education continues into later adulthood for highly educated more than for lower-educated persons, whose lives unfold outside the grip of educational institutions at much younger ages.
Beyond structural effects, education is also related to children's own attitudes and preferences toward family formation. High education is associated with postmaterial values of self-realization and with liberal attitudes toward family formation (Inglehart and Baker 2000;Lesthaeghe and Neidert 2006;Trent and South 1992). Adolescents in higher educational tracks expect and desire marriage and parenthood at older ages and see themselves at a lower risk of childbearing out of wedlock (Plotnick 2007).
Attitudinal differences by education are also important in structuring gender differences in family formation. To explain educational differences in teenage childbearing and lone motherhood, McLanahan and Percheski (2008) proposed that motherhood plays a different role as a source of identity for higher-and lower-educated women. Lower-educated women expect motherhood as psychologically more rewarding and view it as a more essential role in their life course. In contrast, highly educated women "have other possible sources of identity from which they can derive meaning and fulfillment" (McLanahan and Percheski 2008:262). In addition, the opportunity costs of parenthood are higher for highly educated women. Consequently, they more often postpone the transition to parenthood or remain childless than their lower-educated counterparts.
For men, on the other hand, both high and low educational attainment likely delays family formation compared with medium education, albeit for quite different reasons. Highly educated men delay family formation until they have completed extensive education and are situated in an economically secure job, similar to highly educated women. In contrast, in view of the average increase in women's education, lowereducated men with few resources appear as less-attractive marriage partners. Struggling to find a partner, they postpone marriage and are more likely to live with their parents until later ages (Carlson and England 2011;McLanahan and Percheski 2008).
Both indirect and direct effects of parental education suggest that siblings will be more similar in their family formation simply because they share parental education and are more likely to have similar educational trajectories themselves. If indirect and direct effects of parental education are important mechanisms generating sibling similarity in family formation, unrelated persons who share the same parental educational background will also be similar to one another in their family formation. Part of the sibling similarity in family formation should then vanish after parental education and own education of siblings and unrelated dyads are taken into account. Subsequently, we empirically test to what extent two crucial parental background characteristics-parental marital history and parental education-account for sibling similarity in family formation.

Finnish Census Panel
The empirical analysis uses the Finnish Census Panel, a register-based database that consists of a random sample of 1 % of the population in 1970 (Österbacka 2004;Statistics Finland 1996). All subsequent family members are included in the sample. We use yearly panel data that were collected between 1987 and 2007 for the cohorts born 1969 to 1977 for whom we can construct family formation sequences from age 18 to age 30. Data requirements for the analysis are high: we need to follow individuals over several years to reconstruct their family-formation process, and we need to establish intergenerational links to parents and between siblings. Only people who have at least one sibling are included, which allows us to separate the impact of having any siblings from the specific impact of siblings, thus ensuring greater rigor in the sibling design. People are identified as siblings if they have the same mother. Parents are identified through living in the same household as the newborn child. However, until 1987, data were collected only every five years. Accordingly, the children were aged 0-4 years when the information was available for the first time. The same applies to parental education, which we define as the educational level of the highest-educated parent in the household. In most cases (71.1 %), both parents have the same level of education. For 16.8 %, father's education was higher; for 10.6 %, mother's education was higher. For a remaining 1.5 %, data were available only on mother's level of education. We use the earliest available education information for two reasons. First, this strategy ensures consistency in the measurement of parental education because it usually refers to the biological parents. Second, compared with other developmental stages, socioeconomic conditions during early childhood have been shown to matter most for predicting offspring outcomes in adulthood (Duncan and Brooks-Gunn 2000;Heckman 2006).
For assessing the effect of parental marital history, we draw on children's family experiences up to age 18. Because siblings are defined by having the same mother, we refer to only the marriage biographies of mothers. We do not use age-specific effects of childhood family structure because they were not supported by recent studies examining the effects of childhood living arrangements on the transition to adulthood (Musick and Meier 2010;Teachman 2003). Moreover, age-specific effects would prohibit creating a robust sample of sibling dyads and unrelated dyads with fixed family background by introducing too much complexity to the matching procedure described later herein.
Our starting sample comprises 11,803 individuals. We exclude persons with missing data or inconsistencies in the family trajectories. After this selection, we retain only those families with complete data on all siblings. These restrictions remove 25 % of the starting sample and reduce the sample size to 9,266 individuals. From these data, we extract 4,994 unique sibling dyads and generate 9,263 unrelated dyads in a procedure detailed in the section on the construction of the analysis sample. This sums to a total of 14,257 dyads for which we can construct familyformation trajectories from age 18 to age 30.
The timing and sequencing of family formation in the "third decade of life" (Fussell and Furstenberg 2005) is of particular interest as a "demographically dense phase" with multiple important life course transitions (Rindfuss 1991). In countries like Finland that are considered forerunners of the second demographic transition (Surkyn and Lesthaeghe 2004;van de Kaa 1987), this is particularly true for two markers of the transition to adulthood: leaving the parental home and entry into a first union. Compared with other developed countries, children start independent living early in Finland (Billari and Liefbroer 2010). By age 20, more than one-half of the children have left the parental home. Similarly, Finnish young adults experience a relatively early transition to a first union. The median age of entering a first union is approximately 22 for women and 24 for men born between 1969 and 1981 (Jalovaara 2012). Less than 10 % of all first unions are formal marriages (Finnäs 1995). After 10 years, 88 % of cohabiting couples have separated or married (Jalovaara 2013). By the age of 30, almost one-half of the population has children, two-thirds are living in a coresidential union or are married, and about 20 % live as singles (Statistics Finland 2013).
Moreover, Finland is one of the forerunners in female labor force participation: 61 % in 1970 and 70 % as early as 1980 (OECD 2013). This participation rate lowers women's threshold to leave a partner because they are less bound by economic dependence (Jalovaara 2013). In sum, these stylized facts show that family formation sequences until age 30 in Finland provide plenty of variation to meaningfully examine differences between siblings and unrelated persons.
Moreover, the Finnish case can be considered a conservative test of the effect of shared parental background characteristics. Finland, along with other Scandinavian countries, represents a fairly egalitarian social democratic welfare state regime. In this context, family-of-origin effects on offspring's family formation can be assumed to be relatively small compared with liberal and conservative welfare regimes as far as they are a by-product of status transmission. Stratification research has shown that the relationship of family background and SES is comparatively low in Finland (Österbacka 2001;Solon 2002). In terms of educational and class mobility, the family-of-origin effect has even become somewhat weaker during the last decades (Erola 2009;Kivinen et al. 2007). However, the modest intergenerational mobility effect does not imply that social status is irrelevant in Scandinavian countries. Evidence from Norway, for example, has shown that parental education directly affects the entry into a first union (Wiik 2008). In addition, recent studies analyzing Finnish data demonstrated an educational gradient in family formation: persons with low education are more likely to experience early parenthood (Jalovaara and Miettinen 2013) and have a higher risk of partnership dissolution (Jalovaara 2013).

Analytical Strategy
To address our research questions, we constructed two analysis samples. We employed sequence analysis to quantify the difference between two family-formation sequences in a dyad (Research Question (RQ) 1), used dyadic regression analysis to identify which factors account for this difference (RQ 2), and applied cluster analysis to explore substantive family-formation patterns and sibling similarity within them (RQ 3). In the following section, we explain in detail how we created the analysis samples and introduce the methods that we used.

Analysis Sample
As illustrated in the upper panel of Fig. 1, siblings naturally share the same parental background characteristics, whereas this is generally not the case for unrelated dyads. As a result, there is heterogeneity in parental background characteristics for most unrelated dyads but not for sibling dyads. This has two implications. First, it obscures the comparison of siblings and unrelated dyads, putting us at risk of overestimating the difference between them. Second, it would lead to ambiguous reference categories in the dyadic regression for parental background characteristics: more combinations of parental background characteristics are possible for unrelated dyads than for sibling dyads. For example, siblings always share the same level of parental education, whereas in unrelated dyads, one person might have highly educated parents while the other person does not. We therefore construct two analysis samples. First, we randomly assign each sibling an unrelated person to generate dyads of unrelated persons (upper panel of Fig. 1). Second, to equalize the variation of parental background characteristics between sibling dyads and unrelated dyads, we perform a conditional assignment to generate unrelated dyads by assigning two individuals to one another conditional on sharing the same parental background characteristics (lower panel of Fig. 1).
The analysis samples are constructed as follows. To generate sibling dyads, each of the individual focal children in our sample is matched with a sibling. For two-child families, there naturally is only one possible sibling match. In families with more than two siblings (10.5 % of all families), we randomly choose a sibling. To ensure that the sample contains only unique sibling dyads, we exclude doublets (i.e., a match of the same two siblings, once treating sibling 1 as the first dyad member and once treating sibling 2 as the first dyad member). In the paragraph that follows, we describe the construction of analysis sample 1 and 2 using these sibling dyads. Note that the case numbers between analysis sample 1 and 2 marginally deviate because of the exclusion of a few more doublet sibling dyads that are generated during random assignment but not in conditional assignment. For the first analysis sample, we randomly assign unrelated persons to form 9,261 unrelated dyads and can compare them with 4,981 sibling dyads. To generate the second conditionally assigned analysis sample of unrelated dyads, each focal child is matched with an unrelated child conditional on sharing the same combination of three parental background characteristics: parental education (low, medium, and high); 1 mother's age at first marriage (lowest 25 %, middle 50 %, and top 25 % of age distribution); and a dichotomous variable of whether the parents divorced. This yields 18 (3 ×3 ×2) possible combinations. Based on these conditions, we are able to generate 9,263 unrelated dyads: that is, there is a suitable unique unrelated match for virtually each individual in the sample. They can be compared with 4,994 sibling dyads. This conditional assignment simply equalizes the possible variance in observed parental background characteristics for sibling dyads and unrelated dyads. We later also include these parental background characteristics in the dyadic regression model to directly estimate their effect on similarity in family formation.

Methods
Sequence analysis is used to measure similarity between family-formation sequences. We define 10 family-formation states combining residential situation, relationship status, and the number of children: "single, parental home, no child" (SPH); "single, own home, no child" (SNC); "single, own home, 1+ child" (S1C); "cohabiting, no child" (CNC); "cohabiting, 1 child" (C1C); "cohabiting, 2+ children" (C2C); "married, no child" (MNC); "married, 1 child" (M1C); "married, 2 children" (M2C); and "married, 3+ children" (M3C). Those who separate from a cohabiting relationship or divorce are considered "single" again in order to prioritize their residential situation over their legal marital status. Divorce before age 30 is rare in Finland, and cohabitation as a substitute for marriage is relatively common. "Single" generally refers to not being in a cohabiting or married relationship, but it might include other relationship forms that we are unable to identify, such as living apart together.
To measure similarity in family formation, optimal matching analysis-the most common form of sequence analysis in the social sciences-calculates pairwise distances between all sequences using two transformation operations-namely, substitution and insertion/deletion of a state-to turn one sequence into another. The transformation operations are associated with specific costs, and the distance between two sequences is given as the sum of these costs for aligning two sequences. For a comprehensive introduction, see MacIndoe and Abbott (2004). Because the sequences are censored at age 30 and because the timing and spacing of events is crucial in family formation, we use an algorithm that emphasizes the timing of events in determining sequence similarity but nonetheless accounts for the order of family-formation states as well. This can be achieved by using only substitution operations and no insertion/ deletion operations in optimal matching (Aisenbrey and Fasang 2010;Lesnard 2010).
The substitution costs are specified to reflect both the substantive closeness of family-formation states and the timing of transitions between them. Cleary, "married, 3+ children" (M3C) is more similar to "married, 2 children" (M2C) than to "parental home, single, no children" (SPH). Therefore, we specify a substitution cost matrix based on the substantive closeness of family-formation states ( Table 3 in the appendix) where, for instance, the first substitution of M3C with M2C comes at a cost of only 1, and the substitution of M3C with SPH costs 11. To appropriately account for similarity in terms of timing, we combine this substitution cost matrix with the time pointspecific transition rates between family-formation states (probability to transition from one state to another), such that the substitution of states between which transitions occur very frequently are less costly and generate less distance (Lesnard 2010). 2 This results in a separate substitution cost matrix for each time point (year) by summing the substantive substitution cost matrix and the time point-specific transition probabilities for each year. The approach can be considered a modified version of the dynamic Hamming measure (Lesnard 2010) that accounts for the timing and the order of family-formation events at the same time. As a robustness check, we conducted all analyses for three additional cost specifications in the sequence analysis: only dynamic Hamming distance, only substantive substitution costs in Table 3, and Elzinga's longest common subsequence metric (Elzinga and Liefbroer 2007). Although the regression results slightly varied across cost specifications, the substantive findings were robust. Because the cluster analysis based on our proposed combined cost specification produced the best cluster solutions according to several cluster cutoff criteria (available upon request) and the highest explanatory power in the regression models, it was best suited to capture the variability in our data.
The output of the sequence analysis is a pairwise distance matrix that contains sequence distances for each possible combination of siblings and unrelated persons. Each of the 9,266 individual family-formation sequences is compared with all other sequences in a pairwise comparison, which yields (N(N -1)) / 2 comparisons = 42,924,745, (i.e., cells in the distance matrix). This matrix is the basis for three sets of analyses that speak to our three research questions. First, to address the descriptive question-whether siblings' family formation is more similar than for unrelated dyads-we simply extract the distance values for the sibling dyads and the randomly assigned unrelated dyads (upper panel of Fig. 1) from the larger distance matrix. The distribution of the sequence distances among siblings is then compared with that among randomly assigned unrelated dyads.
Second, to test whether parental background characteristics account for sibling similarity in family formation, we use the dyadic distances for the siblings and the unrelated dyads as the dependent variable in a dyadic regression. The independent variables include sibling status of the dyad, gender constellation, age difference, education, parental education, mother's age of first marriage, and parental divorce. Table 4 in the appendix shows descriptive statistics on all independent variables and the dyadic outcome measure used in the regressions. Sibling and unrelated dyads do not differ in parental background characteristics because of the conditional assignment of dyads, but there are small differences for some of the remaining variables. Given the rarity of twins, individuals in unrelated dyads have a higher probability of being born in the same year than siblings (11 % vs. 3 %). Siblings in turn have a higher chance of obtaining the same level of education than unrelated persons (52 % vs. 44 %).
We calculate three models. Model 1 (M1) is a baseline model using the first analysis sample of randomly assigned unrelated dyads and including only sibling status. Model 2 (M2) is the same as M1 but uses the second analysis sample of conditionally assigned unrelated dyads that are equalized on parental background characteristics to resemble the sibling dyads. Model 3 (M3) is a fully interacted model by sibling status to identify sibling specific effects, including the full set of covariates. M3 uses the conditionally assigned unrelated dyads that equalize variation on parental background characteristics and thus allows specifying the same reference categories for siblings and unrelated dyads. The dyads in our analysis samples are unique; but by design, the same individual can occur more than once in a sibling and an unrelated dyad. We therefore calculate robust standard errors.
Third, to examine whether siblings are concentrated in specific substantive familyformation patterns, we apply Ward cluster analysis using the entire distance matrix. Several cluster cutoff criteria, including point biserial correlation, average silhouette width, and Hubert's C, support a five-cluster solution (Hennig and Liao 2013;Kaufman and Rousseeuw 1990). We then examine whether siblings have a higher probability to be in the same family-formation cluster than unrelated dyads and whether this is particularly the case for specific substantive family-formation patterns. The sequence analysis and the calculation of different cluster cutoff criteria were conducted using the TraMineR and the WeightedCluster packages in R (Gabadinho et al. 2011;Studer 2013).

Is Siblings' Family Formation More Similar (RQ 1), and What Role Do Shared Parental Background Characteristics Play for Sibling Similarity (RQ 2)?
We begin by establishing that sibling's family formation is significantly more similar than for unrelated dyads, particularly for same-sex siblings. Figure 2 shows the distribution of the dyadic sequence distances for siblings (light gray line) and unrelated dyads (dark gray line) separately for the complete sample, opposite-sex dyads, female dyads, and male dyads for analysis sample 1 using the randomly assigned unrelated dyads. The distances are normalized between 0 and 100, where 0 indicates two identical sequences, and 100 indicates the most dissimilar family-formation trajectories. The width of the curves reflects 90 % asymptotic confidence intervals. For all subgroups, the two curves for siblings and unrelated dyads largely do not overlap, thus indicating that in these locations of the distribution, the difference between them is statistically significant. This difference is particularly pronounced for same-sex siblings, where the gap between the two curves is largest (bottom graphs in Fig. 2).
Family-formation sequences are most different for unrelated women, followed by sisters, unrelated men, and brothers. Note that family formation is still more different for sisters (42) than for unrelated men (38.9) up to the age of 30. Nonetheless, the difference between siblings and unrelated persons is slightly larger for sisters at 47.8 -42.0 = 5.8, compared with 38.9 -33.9 = 5.0 for brothers. These findings underline the primacy of gender in structuring family formation in two regards. First, same-sex siblings are much more similar to one another than are opposite-sex siblings. Second, the higher average distances for women might result partly from more complexity within individual family-formation sequences. There might simply be more happening in women's trajectories with more family states and more frequent changes between them. A sequence complexity measure (Elzinga 2010;Elzinga and Liefbroer 2007) that captures the number and duration variation of different family-formation states within individual sequences confirms significantly higher sequence complexity for women than for men (available upon request). If most men's family-formation events are simply delayed, expanding the observation window to age 40 might somewhat equalize this difference between men and women. However, the significantly higher sequence similarity for brothers than for unrelated men also until the age of 30 suggests that the censoring does not majorly distort the analysis of sibling similarity in family formation. Our findings further highlight that women are much more actively engaged in family formation in the crucial years (ages 18-30) that coincide with the accumulation of educational credentials and labor market entry and thus set the stage for future lifetime labor market success.
Overall, the sibling effect, calculated as the difference in mean distance between siblings and randomly assigned unrelated dyads, is 44.1 -40.1 = 4.0. Sibling similarity after conditional assignment ( Table 4 in the appendix) is slightly reduced to 43.7 -40.2 = 3.5. The conditional assignment on parental background characteristics thus reduces sibling similarity by 12.5 %, but siblings remain significantly more similar in their family formation after conditional assignment.
The dyadic regression presented in Table 1 further supports this finding. Negative coefficients indicate less distance and thus more similar family formation in a dyad.   Model 1 (M1) shows that siblings are, on average, significantly more similar to one another than are randomly assigned unrelated dyads by -3.94 on the distance measure ranging from 0 to 100. Except for rounding errors, this corresponds almost exactly to the mean difference of 4.00 established in Fig. 2. Model 2 (M2) shows that this effect remains similar at a difference of -3.52 for siblings and conditionally assigned unrelated dyads, corresponding to the 12.5 % reduction in sibling similarity established earlier.
Even though many of the covariates in M3 significantly account for the variance in the dyadic distance measure and the adjusted R 2 increases from 0.01 in M1 to 0.12 in M3, they add little to explaining sibling similarity. The sibling main effect remains significant when we include the full set of covariates (M3) but is reduced to -2.69. Note that in M3, the coefficients for the main effects in the left column refer to the effects for unrelated dyads. The interaction effects in the right column capture the additional effects for siblings. We find only three significant sibling interactions: for sisters, twins, and high education. Overall, the results support that sibling similarity in family formation is only moderately generated by the compositional effects of shared parental background characteristics in terms of early childhood family structure and parental education (RQ 2). They affect similarity in family formation in the direction that we would expect but, as shown by the interaction effects in M3, mostly for siblings and unrelated dyads alike.
The results from M3 substantiate the descriptive gender findings (Fig. 2): women are overall most different in their family formation, but sisters are relatively more similar than brothers compared with conditionally assigned unrelated dyads of the respective sex. This is visible in the positive main effect for female dyads and the significant negative interaction for sisters. In contrast, unrelated men and brothers both are more similar to one another by 6.91 distance points than opposite-sex dyads.
An age difference of more than three years increases sequence distance slightly by 0.89 for siblings and unrelated dyads, possibly a "mini cohort effect" for successive birth cohorts. There is a strong twin effect of -7.2 measured as siblings who were born in the same year. A closer examination of the sequences revealed that this effect is mainly driven by the joint experience of home leaving. The probability to leave the parental home at the same age is 27 % for twins, compared with only 13 % for siblings, 13 % for unrelated young adults who were born in the same year, and 11 % for unrelated young adults who were not born in the same year.
The results support the paramount importance of education in structuring early life courses and family formation. The higher the dyad's combined education, the more similar is their family formation for both siblings and unrelated dyads. This underlines the greater heterogeneity of family formation among the lower-educated, whose early life courses unfold outside of educational institutions from a much younger age. The only sibling-specific effect is that two highly educated siblings are significantly more similar in their family formation than two highly educated unrelated persons, net of controlling for parental education. Additional analyses (available upon request) on the interaction of gender and education show that education has a much stronger effect for female than for male dyads, substantiating that women's family formation is more tightly interrelated with their educational and employment trajectories during the third decade of life.
High parental education generates more similar family formation for siblings and unrelated dyads alike, net of their own education. Apparently, parental education has a direct impact beyond indirect effects through intergenerational status transmission as posited in socialization and social control theories. Possibly, highly educated parents make use of greater resources to guide their children's family formation, including financial support for a longer time into early adulthood. In contrast, children of lowereducated parents might leave the radar of parental influence and establish independence at younger ages.
In line with our expectations, parental marital history as an indicator of early childhood family structure equally affects similarity in family formation. Like education, the effects do not vary for siblings and unrelated dyads. Two members of a dyad are more similar to one another if their mother married late, likely by also delaying their own family formation. They are less similar to one another if their mother married young. This dissimilarity might be driven by intergenerational behavioral transmission: if they start family formation early themselves, there is more going on early on in the sequences, increasing the risk of future union disruption and instable family formation. Additional analyses substantiated that sequence complexity is indeed significantly higher for children whose mothers married young than for those whose mothers married late.
In line with previous research (McLanahan and Bumpass 1988;Wolfinger 2000), we find a similar pattern for divorce: calculating the complexity measure showed that children whose parents' divorced have more instable family formation themselves and are less similar to one another in their family formation, again within sibling and unrelated dyads alike. Parental divorce is a very different experience, even for siblings in the same family, depending on the age of the child, custodial arrangements, and the level of parental conflict before, during, and after the divorce (Amato and Anthony 2014). Our results underline this heterogeneity in divorce experiences and refute a uniform impact of parental divorce on siblings' family formation.
We conclude that parental background characteristics affect similarity in family behavior much in line with our expectations yet contribute surprisingly little to explaining sibling similarity in family formation. To gain a deeper insight into the nature of this sibling similarity, we next directly examine whether siblings have a higher probability to experience the same substantive family-formation patterns than unrelated dyads.
In Which Way Is Siblings' Family Formation Similar (RQ 3)? Figure 3 shows relative frequency sequence and dissimilarity-to-medoid plots (Fasang and Liao 2014) for the five family-formation patterns derived with cluster analysis. The clusters are ordered ascending according to average education in each group, starting with the most-educated group in the top panel of Fig. 3. Within each cluster, the sequences are sorted by the age of leaving the parental home for the first time. Each horizontal line in the left panel of Fig. 3 represents one individual sequence, with different colors indicating different family-formation states. It is impossible to plot all 9,266 sequences because it would lead to overplotting. The left panels of Fig. 3 therefore show a selection of representative sequences for each cluster that comprise 2 % of the sample (186 representative sequences). They are chosen as the medoid sequence (Aassve et al. 2007)-that is, the sequence with the minimum distance to all other sequences within equal-sized frequency groups across the clusters (for details, see Fasang and Liao 2014). The dissimilarity-to-medoid plots in the right panels of Fig. 3 indicate the degree of sequence variation within each cluster and thereby are informative about how homogeneous each family-formation pattern is. They are box plots of the mean and interquartile range of the pairwise sequence distances between the chosen  representative medoid sequences displayed in the left panels of Fig. 3 and the other sequences that this medoid represents. The size of the clusters is proportional to their relative frequency in the population. Table 2 shows descriptive statistics for the clusters. We subsequently describe the five family-formation patterns referring to Fig. 3 and Table 2. The highest-educated cluster (66 % medium or high education, Table 2) at the top of Fig. 3 follows a "traditional" pathway of leaving the parental home, cohabiting, marrying in their late 20s, and having one or two children by age 30. Only 26 % of young adults in this group do not experience the transition to parenthood until age 30.
In the middle of the educational distribution, we find two groups that are characterized by childlessness (roughly 85 % childless). The first group is called "home stayers" because they leave the parental home at the latest age, with a median home-leaving age of 25 compared with age 22 in the total sample ( Table 2). The left panel of Fig. 3 shows that this is the most homogenous group with low dissimilarities to the representative sequences displayed in the right panel. Extended living in the parental home is primarily a male experience (66 % men). The second group with medium education, called "living alone/transitional cohabitation," shows a pattern of establishing an independent household in the early 20s (median age 21) and moving in and out of transitional periods of cohabitation throughout their 20s.
Finally, we find two smaller but highly distinct family-formation patterns at the bottom of the educational distribution: "early marriage, high fertility" and "parenthood out of wedlock." In these groups, roughly 70 % have the lowest level of education. The same is true for their parents. Compared with the population average, their family formation is characterized by early parenthood (median ages 23 and 24) and high parities (2.46 and 1.72) either within marriage or out of wedlock in cohabiting unions and as single parents. Moreover, Fig. 3 illustrates that virtually all persons in these two patterns move directly into a coresidential union when leaving the parental home. The high distances to the medoid in the right panels of Fig. 3 shows that these are particularly heterogeneous groups. The corresponding family-formation experiences are primarily made by women (70 and 69 %).
The clusters underline the stratification of family formation by education and gender visible in the dyadic regression (Table 1): low education goes along with more eventful and variable family-formation trajectories of high parities and parenthood out of wedlock. High education is associated with a "traditional" timing and sequencing of family formation and establishing a one-or two-child family secured in marriage by age 30. Another pattern that is characteristic for high and medium education is a relatively uniform delay of family formation ( Table 2). The two postponement clusters are the most prevalent patterns and represent more than one-half of the study population.
After establishing these main patterns of family formation among our cohorts of young adults in Finland, we can address our third research question: whether siblings are concentrated in any of these patterns. Figure 4 shows the probability of the focal person's dyad partners-siblings and conditionally assigned unrelated persons-to be in the same cluster as the focal persons. If two members of a sibling dyad have a higher probability to be in the same cluster than two members of an unrelated dyad, this indicates that specific family-formation patterns are concentrated within families and offers insights into the nature of sibling similarity in family formation. The line in the middle of Fig. 4 and the corresponding percentages show the overall probability to be in the same cluster as the focal person, conditional on the focal person's cluster membership. The graph sets this line to 0 to illustrate deviations from the mean for siblings and unrelated persons. First, across all clusters, siblings have a higher probability to be in the same group as the focal person (solid lines on the right of Fig. 4) than unrelated persons (dashed lines on the left). For instance, the mean probability to be in the "extended cohabitation, parenthood out of wedlock" group-given that the focal person is in this cluster-is 22 % for siblings but only 14 % for unrelated persons. This corresponds to a 29 % deviation from the mean: that is, a 29 % higher probability for a sibling to be in the same cluster as the focal person than for the population average. The clusters in Fig. 4 are ordered descending according to the degree of sibling concentration within each group. Siblings particularly have a higher probability of being in the same cluster in the family-formation patterns that are associated with educational disadvantage (see Table 2): "extended cohabitation, parenthood out of wedlock" and "early marriage, high fertility."

Discussion
In this article, we propose a novel research design using sibling comparisons and sequence analysis to study family-of-origin effects on holistic family-formation trajectories. Findings show that siblings are moderately but significantly more similar to one another in holistic family-formation trajectories than unrelated persons. This is particularly the case for same-sex siblings. Equalizing siblings' and unrelated dyads' parental background characteristics decreases the sibling similarity moderately, by about 13 %. Moreover, sibling similarity is particularly pronounced for sisters in family-formation patterns of "extended cohabitation, parenthood out of wedlock" and "early marriage, high fertility." These family-formation patterns are at the same time associated with economic disadvantage. In addition to the moderate decrease of sibling similarity through parental background, the dyadic regression showed that parental background largely affects similarity in family formation for siblings and unrelated dyads in the same way. However, we want to highlight two notable sibling-specific interactions, which reduce sibling's dissimilarity in family formation. First, twinsmeasured as siblings born in the same year-are much more similar in their family formation than other siblings. On the one hand, shared genetic background may account for this twin effect (Kohler et al. 1999(Kohler et al. , 2002. On the other hand, twins also share more of their environment and socialization

Relative deviation from the mean (%)
Unrelated dyads Sibling dyads Fig. 4 Focal person and alter in the same cluster: Conditional probabilities and relative deviations from overall mean experiences than siblings. There might be a culture or norm for twins to jointly step through "rites of passage" in the transition to adulthood, such as leaving the parental home together or orchestrating a double marriage. As noted earlier, our results indeed support a clustering of joined home-leaving for twins. Small case numbers (133 twins) and the relatively crude twin indicator unfortunately prohibited further investigation of the twin effect in this study. Second, family formation is significantly more similar for highly educated siblings than for two highly educated unrelated persons, net of parental education. This corresponds with Conley and Glauber's (2008) finding that sibling correlations in education and earnings are strongest among siblings from advantaged families. Our study suggests that this might also be the case for similarities in family formation. Arguably, there is particularly strong mutual reinforcement of specific familyformation patterns among highly educated siblings.
Overall, we conclude that the observed parental background characteristics and sibling interactions play only a relatively marginal role in generating sibling similarity in family formation. At the same time, the strong effect of respondents' own educational level and gender supports continuing importance of the family of origin for shaping people's family behavior. These effects might be even larger in countries with closed stratification systems, in which intergenerational status transmission is stronger than in Finland. The same is true for the size of the sibling effect. Cross-national comparisons show relatively low sibling correlations in earnings for Nordic countries (e.g., Björklund et al. 2002). Systematic comparisons with countries representing other social mobility and welfare state regimes would be useful to test whether this also holds for sibling correlations in family formation. Nonetheless, our study supports that even in Finland, siblings are notably and significantly more similar in family formation than unrelated persons.
At first glance, the explanatory power of the sibling effect might seem small, explaining only 1 % of the variance as indicated by the R 2 in Table 1. Generally, sequence distances are a highly complex outcome where small effect sizes and relatively low R 2 are common (see Liefbroer and Elzinga 2012). Further, a comparison with the explanatory power of education puts the explanatory power of the sibling effect into perspective. An additional regression model including only education on the conditionally matched sample accounts for 5 % of the variance indicated by R 2 (available upon request). This means that the sibling indicator accounts for as much as 20 % of the variance in the sequence distances as education accounts for. Given the uncontested crucial relevance of education for demographic behavior, a sibling effect of this size-even if comparatively moderate-warrants future research. Family internal dynamics, such as parent-child relationship quality and mutual sibling influence, might generate the remaining unexplained sibling similarity. The moderate size of the sibling effect can also be interpreted in the context of these family internal dynamics. Although siblings share the same family of origin, they also have many experiences that they do not share, such as differential parental treatment or sibling position (East and Jacobson 2000;Whiteman et al. 2011). Our findings support that overall, the mechanisms that generate sibling similarity in family formation outweigh such differentiating factors.
To gain a deeper insight into the nature of sibling similarity in family formation, we subsequently showed that substantive family-formation patterns associated with economic disadvantage are concentrated within families. Sisters are concentrated in family-formation patterns of "extended cohabitation, parenthood out of wedlock" and "early marriage, high fertility." That is, specific gendered family-formation patterns and economic disadvantage are encapsulated within families and thus contribute to the reproduction of inequality (McLanahan and Percheski 2008). These results suggest that siblings reinforce one another in following family-formation patterns stratified by education and gender. Further, unobserved parental effects could generate particularly strong sibling clustering in certain family-formation patterns. East and Jacobson (2003:395), for instance, pointed out that "the disproportionally high teenage birth rates among the sisters of childbearing teens" could be partly explained by the strained parenting of their mothers, which in turn was caused by the increased financial and emotional burdens following the first teenage motherhood in the family. In response to the negative parental treatment, the younger sisters of teenage mothers are more likely to engage in problem behavior and are at a high risk of early pregnancy as well.
To further improve our understanding of sibling similarity in family formation, future research should address several issues. First, lacking data on the relationship quality between parents and their children as well as among siblings, we were unable to account for possible moderating effects of these psychological characteristics. Close emotional relationships can function as "transmission belts" (Schönpflug 2001) and could be one explanation for the remaining unexplained sibling similarity that we observe. Second, more detailed indicators of parents' and siblings' social and occupational status would be desirable to assess family background effects in greater detail. In this study, we were able to rigorously scrutinize the effect of two crucial background factors that have been shown to correlate with many other potentially relevant parental background characteristics, such as parenting styles (Chan and Koo 2010;Lareau 2003). Third, we had no information on interactions between siblings and could not directly measure mutual sibling influence in family behavior, which is an important alternative explanation for the sibling similarity in family formation. Finally, the analysis of longer family-formation trajectories until the age of 40 and beyond would allow for studying potential variation of the degree of sibling similarity across the life course. Previous research suggests that family-of-origin effects are more pronounced for early transitions but diminish as people age. For instance, Lyngstad and Prskawetz (2010) showed that siblings' birth events influence the transition to first parenthood but lose relevance for higher-order parities. The literature on older sisters' out-ofwedlock or teenage childbearing and their younger sister's risk of experiencing these events as well points in a similar direction (e.g., East and Jacobson 2001;Powers and Hsueh 1997). Because men experience family formation events later than women, an extension of the observation period might weaken the sibling-gender interaction that we found for women while strengthening it for men.
In sum, our findings show that there is significant sibling similarity in family formation and that shared parental education and parental marital history add little to explaining this similarity. Instead, education and gender stand out as major stratifying forces of family formation for both siblings and unrelated dyads. Yet, sibling status seems to reinforce the impact of education and gender. As a result, compared with unrelated dyads, siblings are particularly overrepresented in family-formation patterns that correspond with specific constellations of educational levels and gender. As our cluster results show, scrutinizing the nature and driving forces of sibling similarity in family formation can potentially improve our insight into the link between family behavior and the reproduction of social inequality. Combining a sibling design and sequence analysis offers insights both into the amount of similarity in the familyformation trajectories of siblings and unrelated dyads and into the substantive content of this similarity in the family-formation clusters. This approach is, in principle, easily transferable and might yield promising results when applied to other research questions, such as sibling similarity in educational and employment trajectories, or health trajectories.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.