## Abstract

Because of incomplete registration of deaths in most countries in sub-Saharan Africa, data on the survival of close relatives constitute the cornerstone of estimates of adult mortality. Since 1990, sibling histories have been widely collected in Demographic and Health Surveys and are increasingly being relied upon to estimate both general and maternal mortality. Until recently, the use of sibling histories was thought to lead to underestimates of mortality, but a more optimistic view in the literature emerged with the development by Gakidou and King (Demography 43:569–585, 2006) of corrections for selection biases. Based on microsimulations, this article shows that Gakidou and King’s weighting scheme has been incorrectly applied to survey data, leading to overestimates of mortality, especially for males. The evidence for an association between mortality and sibship size in adulthood is reviewed. Female mortality appears to decline slightly with the number of surviving sisters, although this could be an artifact of severe recall errors in larger sibships or familial clustering of deaths. Under most circumstances, corrections for selection biases should have only a modest effect on sibling estimates.

## Introduction

Vital registration systems cover only a small fraction of deaths in most parts of sub-Saharan Africa. With the notable exceptions of South Africa and Zimbabwe, often less than 25 % of deaths are recorded. As a result, adult death rates are routinely estimated based on extrapolations from child mortality rates. For instance, for approximately four of five African countries, the UN Population Division (UNPD) uses a two-step procedure. First, the background mortality is inferred from a combination of child survival and model schedules of mortality. Then, AIDS-related deaths are factored in to include the demographic impact of the HIV epidemic. The World Health Organization (WHO) uses a similar approach to produce its own life tables but with a different mortality age pattern (Murray et al. 2003). Important discrepancies remain between UNPD and WHO estimates, even though both organizations use the same levels of childhood mortality as starting points. For example, in Niger, the probability of a female dying between ages 15 and 60 (_{45}*q*_{15}) was estimated by the United Nations (2009) at .31 in 2006, whereas the WHO’s corresponding probability, estimated at .44, is roughly 40 % higher. In a dozen countries, the two sets of estimates for the same probability _{45}*q*_{15} differ by more than 20 %. Such discrepancies arise because the estimates are overly sensitive to the choice of a model mortality schedule. In countries severely affected by the HIV epidemic, model-based estimates are also critically dependent on the accuracy of estimates of AIDS-related deaths. In addition, it is problematic to infer the level of background mortality from data on child survival because trends in child mortality may not reflect the prevailing path of mortality among the uninfected adult population.

The shortcomings of model-based estimates provide an important impetus for seeking empirical counterpoints. Regrettably, although mothers interviewed in fertility surveys provide first-hand information on the survival of their children, no equivalent source of data has been found to be fully satisfactory for adult mortality. Three types of data are currently used to elicit information on adult deaths: (1) orphanhood status, which is collected in both censuses and surveys (Timæus 1992); (2) recent household deaths reported by census respondents (Dorrington et al. 2004); and (3) survival of maternal siblings as collected in large-scale surveys (Timæus and Jasseh 2004). Although such retrospective reports are deemed to result in underestimates of adult mortality, sibling survival data have received increasing acceptance in recent years. This more optimistic view in the literature is partly due to the work of Gakidou and King (2006). They found a positive correlation between the number of siblings and the mortality in Demographic and Health Surveys (DHS), and suggested a weighting scheme to correct for selection biases arising from this association. Applied to survey data, this scheme yielded much higher estimates than previous calculations based on the same data (Obermeyer et al. 2008; 2010).

The purpose of this article is to provide a methodological critique of these latest developments. First, it is shown that recent attempts to correct for selection biases in sibling histories have led to overestimates of adult mortality, especially among males, because they were not adjusted to the sampling frame. Second, a microsimulation model is used to demonstrate that past declines in fertility and mortality can create spurious correlations between sibship size and mortality, but this phenomenon does not necessarily translate into selection biases. Third, an analysis of DHS data restricted to a single cohort of adult sisters indicates a possible negative correlation between number of sisters and mortality, although this could be an artifact of familial clustering of deaths or more severe recall errors as the number of siblings increases.

## Sibling Histories as a Source of Direct Estimates of Adult Mortality

Sibling data have been collected since 1990 as part of a maternal mortality module in more than 60 African DHS surveys, covering more than 30 countries. A standardized set of questions is used to elicit an exhaustive list of siblings born to the same mother. Information is collected by birth order about their gender and survival status. Current age is recorded for surviving siblings, and age at death and years since death are collected for the deceased. Additional questions are aimed at identifying pregnancy-related deaths. Questions about sibling survival are also incorporated into a men’s questionnaire administered for a subsample of some DHS surveys. Sibling histories are collected in other international survey programs, such as World Health Surveys, UNICEF’s Multiple Indicator Cluster Surveys, the Pan-Arab Project for Family Health Surveys, and U.S. Centers for Disease Control and Prevention reproductive health surveys.

A major advantage of sibling histories is that they provide direct estimates: the observed number of deaths can be divided by the corresponding person-years of exposure. Indirect techniques are also available, whereby proportions of surviving siblings are converted into survival probabilities (Hill and Trussell 1977; Timæus et al. 2001). However, indirect estimates need to be time-located, and the methods used to identify time references become flawed when mortality reversals occur. Direct calculation should therefore be used whenever possible. One limitation of the direct approach is that sample sizes are too small to allow for the computation of national age-specific death rates without introducing some smoothing. This can be overcome by pooling surveys in a regression model to borrow strength from neighboring countries.
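As a minimal sketch of the direct calculation, the snippet below divides deaths by person-years in each five-year age group and chains the resulting rates into the probability of dying between ages 15 and 60. The conversion from rates to probabilities assumes that deaths are spread evenly within each age group, which is a common life-table convention rather than something specified here; the function names and the toy inputs are hypothetical.

```python
def death_rates(deaths, person_years):
    """Age-specific death rates: deaths divided by person-years of exposure."""
    return [d / py for d, py in zip(deaths, person_years)]


def q45_15(rates, width=5.0):
    """Probability of dying between ages 15 and 60 from nine 5-year rates.

    Assumes deaths are evenly spread within each age group, so that
    q = width * m / (1 + width / 2 * m). This convention is an assumption.
    """
    if len(rates) != 9:
        raise ValueError("expected nine rates for ages 15-19 to 55-59")
    survival = 1.0
    for m in rates:
        q = width * m / (1.0 + width / 2.0 * m)
        survival *= 1.0 - q
    return 1.0 - survival


# Toy inputs (not survey data): 10 deaths per 1,000 person-years at every age.
rates = death_rates([10] * 9, [1000.0] * 9)
print(round(q45_15(rates), 3))
```

In practice the age-specific rates would come from the sibling reports themselves, and small samples are the reason some smoothing is needed.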

In this model, *x* is an index for the age, for the age group, *g* for the sex, *i* for the country, and *t* for the year. The overall mortality level and sex differences vary by country. The background mortality follows a log-linear trend with a country-specific rate of increase. A model age pattern is introduced to smooth the non-AIDS component of mortality.^{1} The shape of mortality is specific to each country, but it is not assumed to be time-dependent except for countries whose HIV prevalence ever exceeded 1 %. Four years after the prevalence reaches this threshold (*T*_{i}), the mortality is allowed to change along with the duration of the epidemic. Above age 20, the HIV epidemic can also modify the shape of the mortality; here, the age pattern of mortality increase attributable to AIDS is assumed to vary between geographic subregions. Finally, to accommodate possible declines in mortality, a quadratic term is added for a subset of countries (*S*) with a stalling or decreasing HIV prevalence. All surveys are pooled and reshaped into a person-period data set, wherein each observation corresponds to the combination of a same age, sex, calendar year, and country. For a discussion of the mortality rates obtained from this model, see Timæus and Jasseh (2004) and Reniers et al. (2011).

Although they are widely collected and similar in nature to DHS birth histories, sibling histories are not as extensively used. Part of the skepticism about sibling data may be attributed to the underreporting of deaths. Respondents may not be aware of siblings who died before they were born or while they were young. They may also underreport the deaths of siblings whose whereabouts are unknown to them. However, such omissions do not affect direct estimates if the omitted deaths occurred in childhood or many years prior to the survey. In addition, the extent of underreporting can be assessed and partly corrected for. A standard approach consists in comparing mortality rates from successive surveys whose reference periods overlap (Obermeyer et al. 2010; Timæus and Jasseh 2004). In 17 countries of sub-Saharan Africa, at least two DHS have been conducted less than 10 years apart with a maternal mortality module. This provides a basis to evaluate how the completeness of death reporting decays as the reference period extends farther back in time. Compared with the 3 years immediately prior to the survey, the completeness of reporting of female deaths is 83 % 6 to 9 years prior to the survey (Reniers et al. 2011). Male deaths are significantly underreported as early as 3 to 6 years prior to the survey (91 %), and past that point, completeness of death reporting drops to less than 75 %. Estimates should thus be adjusted and mortality rates in the distant past should be treated with caution.

Another reason for the relatively scant use of sibling histories is the suspicion that they are plagued with selection biases. However, despite increasing attention devoted to this issue, the extent of these biases remains poorly understood, and there is some disagreement about how they should be corrected.

## Previous Research on Selection Biases in Sibling Data

Sibling histories present three structural limitations. First, groups of siblings (also referred to as sibships) with high mortality are underrepresented because no information is available for sibships without a surviving member. Second, low mortality sibships are overrepresented because a single sibship’s situation may be counted multiple times when more than one sibling is interviewed (as is the case in DHS surveys). Third, in most of the research based on sibling data, the respondents themselves are not counted in the denominator, which produces upward bias in the mortality estimates.

Trussell and Rodriguez (1990) demonstrated mathematically that these three structural limitations cancel one another out, provided that (1) the siblings interviewed constitute a probability sample of the population, (2) the experience of the respondents is excluded from the calculation, and (3) no correlation exists between mortality and sibship size (henceforth referred to as sibsize). To place recent developments into context, the work of Trussell and Rodriguez (1990) is summarized here.

### The Argument of Trussell and Rodriguez (1990)

Let *p* denote the probability of dying, which is assumed to be independent of *n*, the sibsize. The observed number of deaths in each sibship follows a binomial distribution (with parameters *n* and *p*). If every surviving member is interviewed and own reports are not counted, the number of respondents in each sibship equals (*n* − *x*), the number of deceased siblings equals *x*, and the number of siblings considered for the denominator is (*n* − 1). I assume for now that the data come from a complete census. Under these conditions, if *f*(*x*) stands for the probability mass function of the binomial distribution, the expected proportion of deceased siblings for a given sibsize *n* can be expressed as

$$PD(n) = \frac{\sum_{x=0}^{n} f(x)\,(n - x)\,x}{\sum_{x=0}^{n} f(x)\,(n - x)\,(n - 1)} \qquad (2)$$

Intuitively, one can think of the denominator as representing *nq* living respondents, each reporting on (*n* − 1) siblings.

With *q* = 1 − *p*, evaluating the binomial expectations in Eq. (2) gives

$$PD(n) = \frac{n(n - 1)\,pq}{n(n - 1)\,q} = p \qquad (3)$$

which does not depend on *n*. To complete the argument, the distribution of sibsizes *g*(*n*) must also be taken into account. If the numerator of Eq. (2) is denoted *N*(*n*) and the denominator is denoted *D*(*n*), the proportion of observed deaths averaged over all sibsizes can be expressed as

$$PD = \frac{\sum_{n} g(n)\,N(n)}{\sum_{n} g(n)\,D(n)} \qquad (4)$$

From Eq. (3), one sees that *N*(*n*) = *pD*(*n*). Because *p* is identical in all sibships, it can be factored out. Equation (4) simplifies to *PD* = *p*, which demonstrates mathematically the cancelling out of the three selection biases.

This argument can be illustrated with a numerical example. In Table 1, 100 sibships are randomly drawn from a Poisson distribution (with a mean sibsize of 5). If the probability of death is .25 regardless of the sibsize, interviewing every survivor will provide an unbiased estimate. By contrast, retaining only one respondent per sibship will translate into an overestimation of mortality. Here, the proportion of observed deaths would be 24 % higher than the real proportion (.309 / .25). This bias will be directly proportional to sibsize and to the level of mortality. Therefore, with only a few exceptions (Hirschman et al. 1995; de Walque and Verwimp 2010), studies based on sibling data combine the reports of all siblings included in the sample, without identifying duplicated sibships.
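The same comparison can be checked in expectation rather than on a single draw of 100 sibships. The sketch below is an illustration under the assumptions stated in the text (Poisson sibsizes with a mean of 5, a probability of death of .25 in every sibship, own reports excluded); the function names and the truncation of sibsizes at 40 are choices made here for illustration.

```python
from math import comb, exp, factorial


def binom_pmf(x, n, p):
    """Probability of x deaths among n siblings, each dying with probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(n, mean):
    """Probability of a sibship of size n under a Poisson sibsize distribution."""
    return exp(-mean) * mean**n / factorial(n)


def expected_proportion_dead(p, mean_sibsize=5.0, max_n=40, one_respondent=False):
    """Expected proportion of dead siblings reported, own reports excluded."""
    deaths = siblings = 0.0
    for n in range(2, max_n + 1):      # a sibship of size 1 reports on no sibling
        g = poisson_pmf(n, mean_sibsize)
        for x in range(n):             # x = n leaves nobody to interview
            respondents = 1 if one_respondent else n - x
            weight = g * binom_pmf(x, n, p) * respondents
            deaths += weight * x
            siblings += weight * (n - 1)
    return deaths / siblings


all_survivors = expected_proportion_dead(0.25)
one_per_sibship = expected_proportion_dead(0.25, one_respondent=True)
print(round(all_survivors, 3), round(one_per_sibship, 3))
```

Interviewing every survivor returns the true probability, whereas keeping one respondent per sibship gives an expected proportion of about .31, close to the value obtained in Table 1.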

Methodological choices in previous research vary more in terms of weights used and in the inclusion (or not) of person-years lived by respondents. Most studies were based on the critical assumption that mortality is unrelated to sibsize. They did not correct for selection biases and simply excluded the person-years lived by respondents from the total of exposure (Bicego 1997; Gakidou et al. 2004; Reniers et al. 2011; Timæus and Jasseh 2004). This is also the approach taken in DHS reports. Conversely, on the presumption that mortality is related to sibsize, recent papers adopted a different approach whereby the experience of the respondent is included in the calculation but weights are used to correct for selection biases (Obermeyer et al. 2008, 2010; Rajaratnam et al. 2010). This approach was first taken in the work of Gakidou and King (2006).

### The Weighting Scheme Used by Gakidou and King (2006)

If death risks vary with sibsize, the standard calculation will be biased. A positive correlation between sibsize and mortality translates into overestimates because larger sibships facing less favorable mortality rates are oversampled. This is illustrated with a numerical example in Table 2. The probability of dying ranges from .1 to .5, and mortality is overestimated by more than 9 % (.338 / .309).

The procedure proposed by Gakidou and King (2006) (henceforth, GK) is twofold. The first step is to weight the data in order to recover death rates for sibships with at least one surviving respondent. The second step relies on extrapolation to correct for the fact that some sibships are not observed at all.

The key idea of the weighting scheme is to give less importance to sibships with high sibling survival by computing family-level weights of the form *B*_{i} / *S*_{i}, where *B*_{i} is the number of siblings of individual *i* at the start of the observation period, and *S*_{i} is the number of surviving siblings at the time of the survey. A crucial point is that respondents are included in the calculation, to compute both *B*_{i} and *S*_{i} as well as exposure. Also salient is that the *B*_{i} / *S*_{i} weights are designed to be applied to proportions of dead siblings reported by each survivor *i*.

If *m* individuals survived through the survey period (*i* = 1, . . ., *m*), this average can be expressed in two equivalent ways:

The weight attached to each survivor is *B*_{i} / *S*_{i} or, in the notation used earlier, *n* / (*n* − *x*). If *D*_{i} stands for the number of dead siblings, the following is obtained:

Table 2 shows that the observed proportion is now very close to the real probability of dying (.307 / .309), the difference being entirely attributable to the omission of sibships without survivors.
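The behavior of the three calculations can be sketched in expectation when the risk of dying rises with sibsize. The schedule `p_dying` below is hypothetical (a linear rise from .1, capped at .5) and is not the one used in Table 2; sibsizes are again assumed to follow a Poisson distribution with a mean of 5, and all function names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(n, mean):
    """Probability of a sibship of size n under a Poisson sibsize distribution."""
    return exp(-mean) * mean**n / factorial(n)


def p_dying(n):
    """Hypothetical schedule: risk rises with sibsize from .1, capped at .5."""
    return min(0.5, 0.1 + 0.04 * (n - 1))


def compare_estimators(mean_sibsize=5.0, max_n=40):
    """Return the true, standard, and B/S-weighted expected proportions dead."""
    true_num = true_den = std_num = std_den = gk_num = gk_den = 0.0
    for n in range(1, max_n + 1):
        g, p = poisson_pmf(n, mean_sibsize), p_dying(n)
        q = 1.0 - p
        # True proportion dead among all siblings.
        true_num += g * n * p
        true_den += g * n
        # Standard calculation: every survivor reports on n - 1 siblings.
        std_num += g * n * (n - 1) * p * q
        std_den += g * n * (n - 1) * q
        # B/S weights on proportions: each sibship with a survivor counts once.
        gk_num += g * n * (p - p**n)
        gk_den += g * n * (1.0 - p**n)
    return true_num / true_den, std_num / std_den, gk_num / gk_den


true_pd, standard_pd, weighted_pd = compare_estimators()
print(round(true_pd, 3), round(standard_pd, 3), round(weighted_pd, 3))
```

Under this schedule the standard calculation overshoots the true proportion, while the weighted proportion falls slightly below it because sibships without survivors are never observed.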

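In the second step, a model is fitted to the deaths observed in sibships with a given number of survivors (*s* = 1, 2, 3, . . .). This model is used to extrapolate back to *s* = 0. GK discussed various approaches and eventually retained a quadratic fit of the log of the absolute number of observed deaths versus the number of survivors, as shown in Eq. (8). A transformation of β_{0} gives the predicted number of deaths in sibships with no survivors.^{2}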

Using DHS data, GK presented model fits that look strikingly good. Two examples are provided in Fig. 1, based on DHS surveys conducted in Ethiopia in 2000 and Benin in 1996.

Unfortunately, as noted by the authors, nothing can guarantee the accuracy of extrapolating to the point at which *s* = 0. A binomial model can be used to demonstrate that accuracy is indeed compromised, despite the fact that the quadratic regression fits the data very well. Retaining the first example (Table 1), we know that 121 deaths will occur, of which 3 will remain unobserved. The logs of the corresponding proportions of deaths are displayed in Fig. 1. An additional example is presented with larger sibsizes and more favorable mortality rates. The curves resemble those observed in DHS data, and the fit is very good for sibships with one to seven survivors (*R*^{2} ≥ .98). In these sibships, one could withhold one point and be able to predict its value with the remaining points with the same accuracy as with DHS data. However, reliance on goodness of fit is insufficient. Invariably, model predictions overestimate by a large margin the number of deaths in families with zero survivors. Predictions are even less accurate where the level of mortality is allowed to vary by sibsize. This is because unobserved deaths are almost exclusively from small sibships (*n* ≤ 3), which, by definition, play only a small part in shaping the overall distribution of deaths by number of survivors.
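The extrapolation problem can be reproduced with expected counts instead of a random draw. The sketch below assumes the setup of the first example (100 sibships, Poisson sibsizes with a mean of 5, a probability of death of .25), fits a quadratic to the log number of deaths for one to seven survivors, and compares the value extrapolated to *s* = 0 with the expected number of unobserved deaths. The helper names are hypothetical, and the least-squares fit is written out by hand only to keep the example self-contained.

```python
from math import comb, exp, factorial, log


def expected_deaths(s, p=0.25, mean_sibsize=5.0, sibships=100, max_n=60):
    """Expected deaths in sibships that end up with exactly s survivors."""
    total = 0.0
    for n in range(max(s, 1), max_n + 1):
        g = exp(-mean_sibsize) * mean_sibsize**n / factorial(n)
        x = n - s                                   # deaths in the sibship
        total += g * comb(n, s) * (1 - p) ** s * p**x * x
    return sibships * total


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    s = [sum(x**k for x in xs) for k in range(5)]
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]

    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    coefs = []
    for j in range(3):                              # Cramer's rule, one column at a time
        aj = [row[:] for row in a]
        for i in range(3):
            aj[i][j] = t[i]
        coefs.append(det(aj) / det(a))
    return coefs


survivors = list(range(1, 8))
b0, b1, b2 = fit_quadratic(survivors, [log(expected_deaths(s)) for s in survivors])
predicted_zero = exp(b0)            # extrapolated deaths in sibships with s = 0
true_zero = expected_deaths(0)      # deaths that actually go unobserved
print(round(true_zero, 2), round(predicted_zero, 2))
```

In this setup about 3 deaths per 100 sibships are expected to go unobserved, and the fitted curve predicts roughly twice that number at *s* = 0 despite fitting the observed points closely.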

## Insights From Microsimulated Populations

For two reasons, it is necessary to go beyond the simple numerical examples presented and work with conditions that are more illustrative of DHS data.

First, the cancelling out of selection biases is mathematically correct when mortality rates do not vary by sibsize, but one might wonder whether it holds for survey data. In DHS surveys, only some individuals are eligible to respond to the sibling module (mostly females aged 15 to 49). Respondents also report on the survival status of siblings of the opposite sex. Risks of dying vary over time as well as by age and sex. Hence, even if it were correct to assume that mortality rates are unrelated to sibsize, the fact that death risks are drawn from different distributions adds some complexity.

Second, if mortality does vary by sibsize, the weighting scheme developed by Gakidou and King (2006) needs to be adapted. Selection biases arise because sibships are sampled with probability proportional to the number of survivors (*S*_{i}) rather than the number of siblings at the start of the observation period (*B*_{i}). But when using DHS data, *B*_{i} should not include every sibling; siblings who died outside the observation window or who are not members of the cohort of the respondents (i.e., those born 15 to 49 years ago) should not be counted. In addition, the second step of the correction is more complicated than merely estimating the deaths in families with zero survivors. What is needed is the number of adult deaths that occurred in sibships with no eligible respondent, as well as the corresponding person-years. These estimates need to be distributed by age, sex, and time period.

Microsimulations can provide sibling histories similar to those collected in surveys. They are stochastic models in which the units of analysis are individuals. For each time step, predefined vital rates are converted into waiting times preceding particular events (e.g., deaths, births, marriages). These events are then assigned to fictitious individuals. Some models are closed such that marriage partners are found in the population, rather than created on an ad hoc basis for each individual in search of a spouse. This allows for keeping track of kinship links as they are generated during the simulations. This is the approach taken by SOCSIM, a model developed in the 1970s at the University of California (Murphy 2011; Wachter et al. 1997). For the present purposes, underlying mortality rates can serve as the gold standard against which to evaluate mortality rates obtained from sibling histories.
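The conversion of a rate into a waiting time can be sketched as follows. This is a generic illustration that assumes a constant rate within each time step and an exponential waiting time; it is not a description of SOCSIM's internal algorithm, and the rate and function names are hypothetical.

```python
from math import exp, log
import random


def waiting_time(rate, u):
    """Inverse-transform draw of an exponential waiting time for a constant rate."""
    return -log(u) / rate


def event_occurs(rate, step_length, rng):
    """True if the event falls within the current time step."""
    u = 1.0 - rng.random()           # in (0, 1], avoids log(0)
    return waiting_time(rate, u) < step_length


rng = random.Random(2010)
monthly_death_rate = 0.002           # hypothetical rate per person-month
deaths = sum(event_occurs(monthly_death_rate, 1.0, rng) for _ in range(100_000))
print(deaths / 100_000, 1 - exp(-monthly_death_rate))
```

The simulated share of individuals dying within the step should be close to 1 − exp(−rate), which is the probability implied by the rate.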

The distribution of sibsizes observed in DHS surveys and the age and sex composition of sibships are shaped by past trends in mortality and fertility. To approximate DHS sibling histories as closely as possible, simulated populations mimic the demographic trajectories of 41 countries in sub-Saharan Africa. The main features of this set of microsimulations are presented herein; the model is described in greater detail elsewhere (Masquelier 2010).

Simulations are calibrated with estimates taken from the 2008 Revision of the World Population Prospects (United Nations 2009). Age-specific fertility rates and non-AIDS life tables are derived from UNPD estimates, and HIV infection rates are computed from UNAIDS incidence rates.^{3} For each country, the simulations start in 1900 and run under conditions of stability until 1951, when death and birth rates start exhibiting annual variation. To reduce random variability, each run is repeated 10 times, and the final populations are merged to reach 300,000 survivors in 2010. SOCSIM creates a file with one record per individual who lived in the population. This file contains identifiers for parents, which permits the reconstruction of sibships born to the same mother. It also contains birth and death dates, which are used to compute underlying mortality rates, as well as other aggregate indicators, with event-history analysis. Figure 2 compares the relative age composition of person-years in the simulated populations with the age structure estimated by the United Nations (2009) for Mozambique and Botswana in 1950, 1975, and 2009. It illustrates the good agreement between simulation outputs and UNPD estimates, even in the countries hardest hit by the HIV epidemic.

Simulated sibling histories are obtained by duplicating each sibship once for each surviving member eligible for an interview. Unlike DHS data, every eligible individual is interviewed here, as though a census were being conducted. This is because the simulated individuals are not organized into households, and it would require considerable ingenuity to replicate a DHS sampling procedure based on the SOCSIM outputs. As noted by Trussell and Rodriguez (1990), the fact that sibling histories are collected in surveys rather than censuses does not distort the estimates as long as all sisters could be interviewed, which is ensured by proper sampling.
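The duplication step can be sketched with a few lines of code. The record layout below (fields `id`, `mother_id`, `sex`, `age`, `alive`) is hypothetical and much simpler than an actual SOCSIM output file; the point is only that each sibship is emitted once per eligible surviving member.

```python
from collections import defaultdict


def build_sibling_histories(people, is_eligible):
    """Return one report per eligible survivor, listing the maternal siblings."""
    sibships = defaultdict(list)
    for person in people:
        sibships[person["mother_id"]].append(person)

    reports = []
    for members in sibships.values():
        for respondent in members:
            if respondent["alive"] and is_eligible(respondent):
                siblings = [m for m in members if m is not respondent]
                reports.append({"respondent": respondent["id"], "siblings": siblings})
    return reports


def women_15_49(person):
    """Eligibility rule resembling scenario D (women of reproductive age)."""
    return person["sex"] == "F" and 15 <= person["age"] <= 49


# Toy records: 'age' is current age for survivors and age at death otherwise.
people = [
    {"id": 1, "mother_id": "A", "sex": "F", "age": 30, "alive": True},
    {"id": 2, "mother_id": "A", "sex": "F", "age": 27, "alive": True},
    {"id": 3, "mother_id": "A", "sex": "M", "age": 25, "alive": False},
    {"id": 4, "mother_id": "B", "sex": "M", "age": 40, "alive": True},
]

reports = build_sibling_histories(people, women_15_49)
print([(r["respondent"], len(r["siblings"])) for r in reports])
```

In this toy example the death of the brother in sibship A is reported twice, once by each eligible sister, while sibship B yields no report at all.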

### DHS Analog Calculation

**Fig. 3**

Scenario A: All surviving individuals at the end of the simulation provide information about their maternal siblings.

Scenario B: Individuals of both sexes aged 15–49 are eligible.

Scenario C: Only females provide information on their sibships (regardless of their age).

Scenario D: Women of reproductive age are the only eligible respondents, as is the case in most DHS surveys.

The quantity of interest is the probability of dying between ages 15 and 60 (_{45}*q*_{15}). Neither age pattern nor trend in mortality is modeled here.

For the decade preceding the fictitious census, sibling estimates approach underlying mortality rates in each of the four scenarios. The mortality rates for males can be properly estimated from female respondents:

- Counting or not counting own reports in the exposure has no effect (males are not eligible).
- Sibships with low female mortality are overrepresented in the data relative to brothers, but this does not introduce a bias because death risks are assumed to be uncorrelated with sibsize.
- Mortality rates for males in sibships with no sisters are assumed to be identical to mortality rates in sibships that include sisters.

When sibling histories are collected from adults only (scenarios B and D), the scatter around the estimates is higher in the more distant past. This is because the number of siblings aged 15 to 59 of respondents aged 15 to 49 diminishes rapidly as the reference period extends further back in time. The accuracy of sibling estimates can be measured by drawing samples of respondents from among the population of eligible survivors. With approximately 8,000 adult female respondents, the random variation would be increased such that the percentage root mean square error (RMSE/mean) would exceed 25 % 10 years prior to the survey (results not shown here). This, along with significant underreporting of sibling deaths, confirms that it would be unwise to use DHS estimates to reconstruct adult mortality trends for more than 10 years prior to the survey.

For the more recent periods, these simulations show that the compensating effect of selection biases prevails even though mortality rates vary over time and differ by sex, and despite the fact that females aged 15 to 49 are the only ones to be interviewed. Although the estimates are unbiased, keep in mind that a significant fraction of sibships are duplicated in the data sets. This introduces some unobserved clustering, leading to a downward bias in the standard errors of coefficients when regression models are used. In the simulated example, 27 % of female deaths between ages 15 and 60 that occurred in the last decade are mentioned only once, 20 % are mentioned twice, 19 % are mentioned at least three times, and 34 % remain unobserved. In sample surveys, the percentage of duplicated sibships will be lower and will vary with the size of the clusters and the extent of the coresidence among siblings.

### Use of the GK Weights

The mechanism developed by Gakidou and King (2006) has been applied to different types of surveys, including DHS surveys and World Health Surveys (de Walque and Verwimp 2010; Hogan et al. 2010; Iraq Family Health Survey Study Group 2008; Obermeyer et al. 2008, 2010; Rajaratnam et al. 2010). Obermeyer et al. (2010) referred to this new approach as the “corrected sibling survival” method. They argued that “levels of adult mortality prevailing in many developing countries are substantially higher than previously suggested by other analyses of sibling history data” (p. 1). Nonetheless, most practical applications of the GK weights are flawed in several ways.

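First, when the weights are applied to deaths and person-years rather than to proportions of dead siblings, they should take the form 1 / *S*_{i}, and not *B*_{i} / *S*_{i}. This can be seen from Eq. (9), which reduces to Eq. (6):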
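A sketch of the algebra behind this point, in the notation introduced above; the symbol $\widehat{PD}$ for the weighted estimate is introduced here for illustration only. Weighting each survivor's reported proportion *D*_{i} / *B*_{i} by *B*_{i} / *S*_{i} amounts to weighting deaths and siblings separately by 1 / *S*_{i}:

```latex
\widehat{PD}
  = \frac{\sum_{i=1}^{m} \frac{B_i}{S_i}\,\frac{D_i}{B_i}}{\sum_{i=1}^{m} \frac{B_i}{S_i}}
  = \frac{\sum_{i=1}^{m} \frac{D_i}{S_i}}{\sum_{i=1}^{m} \frac{B_i}{S_i}}
```

Applying *B*_{i} / *S*_{i} directly to counts of deaths and person-years would therefore give extra weight to large sibships.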

Second, most attempts to apply this weighting scheme to survey data did not discriminate between adult siblings and siblings who died in childhood and did not compute sex-specific weights. GK warned that their procedure requires asking respondents about relatives in the same group (e.g., males aged 40–44 about male siblings aged 40–44) unless the weighting mechanism is adapted. However, this has been overlooked; *B*_{i} has been computed as the original sibsize (i.e., all children born to the same mother), and *S*_{i} has been calculated as the total number of surviving siblings at the time of the survey. This is clearly inappropriate because sibling data are collected from adults only, typically from women of reproductive age.

These two errors translate into biases larger than those arising from selection. This is illustrated in Fig. 4, in which the simulated population resembles Mozambique. In this figure, the only correction applied is the weighting, and no attempt is made to estimate the number of deaths in sibships without survivors.

Weights of the form *B*_{i} / *S*_{i} work well in the first scenario, in which everyone is interviewed (Fig. 4, panel A). However, when women of reproductive age are the only eligible respondents, mortality is overestimated (Fig. 4, panel B). Sibling estimates for females are in line with the true risks of dying, but an additional upward adjustment would be made to account for the lack of data on sibships without any surviving female respondent. By contrast, the mortality rates for males are overestimated by about 20 % since 2000. These results suggest that the higher mortality rates recently obtained from sibling histories partly stem from a failure to adapt the weights to survey data. The incorrect weights also lead to an exaggeration of mortality sex ratios.

Adapting the correction to DHS data requires the weighting of deaths and person-years by 1 / *S*_{i}, where *S*_{i} stands for the number of surviving sisters aged 15 to 49 at the time of the survey. In the absence of recall biases or sampling errors, sibling estimates correspond to the true risks of dying in sibships where there remains at least one sister eligible for an interview. Although the estimates for males are unbiased (Fig. 4, panel C), the mortality rates for females in the simulation should be increased by 14 % on average for data since 2000 to account for the remaining “zero-female-respondent” bias.
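A minimal sketch of the adapted weighting is given below. Each report is assumed to carry three hypothetical fields: `eligible_sisters` (*S*_{i}, the number of surviving sisters aged 15 to 49, respondent included), `deaths`, and `person_years` for the age group and period of interest, with the respondent's own exposure included. The data are toy values, not survey results.

```python
def weighted_death_rate(reports):
    """Death rate with deaths and person-years both weighted by 1 / S_i."""
    deaths = sum(r["deaths"] / r["eligible_sisters"] for r in reports)
    exposure = sum(r["person_years"] / r["eligible_sisters"] for r in reports)
    return deaths / exposure


# Toy data: one sibship reported by two eligible sisters, another reported once.
reports = [
    {"eligible_sisters": 2, "deaths": 1, "person_years": 30.0},
    {"eligible_sisters": 2, "deaths": 1, "person_years": 30.0},
    {"eligible_sisters": 1, "deaths": 0, "person_years": 20.0},
]
print(weighted_death_rate(reports))
```

With these weights the duplicated sibship counts once, so the rate is 1 death per 50 person-years instead of the 2 per 80 that an unweighted tally of the three reports would give.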

### Comparisons Between Alternative DHS Estimates

A comparison of estimates from 60 DHS surveys conducted in sub-Saharan Africa corroborates the results of microsimulations.^{4} Three different series of the probabilities _{45}*q*_{15} are derived from the Poisson regression model detailed earlier. *Standard* estimates are obtained by excluding own reports and by using only DHS sample weights. *Adjusted* estimates are computed by adding own reports in the calculation of person-years and by weighting the data by the inverse of the number of surviving sisters aged 15 to 49. Finally, *inflated* estimates are obtained with weights of the form *B*_{i} / *S*_{i}, where *B*_{i} and *S*_{i} refer to the total number of siblings ever born and surviving.

If the standard calculation is unbiased, it should yield mortality rates equivalent to those of the adjusted estimates for males and higher estimates for females, the difference being attributable to the zero-female-respondent bias. This is observed in the left graph of Fig. 5. The median of the ratios of adjusted-to-standard risks of dying is .97 for males and .87 for females (both with an interquartile range of .06). The upward adjustment for the resulting zero-female-respondent bias should have only a modest effect on the male mortality rate, and simulations suggest that this should be around 10 % to 20 % for females. This means that corrections for selection biases in DHS data should not result in differences of more than a few percentage points.

The analysis of DHS data confirms that incorrect use of the GK weighting scheme can result in overestimates. The right-hand graph of Fig. 5 shows good agreement between inflated and standard estimates for females, but again, it does not mean they are equivalent to one another (because of the zero-female-respondent bias). Male mortality is substantially overestimated: the median of the ratios between both series of estimates is as high as 1.28 (with an interquartile range of 11 %). It should be stressed that comparisons presented here refer only to the weighting. They would differ if the final estimates published by Rajaratnam et al. (2010) or Obermeyer et al. (2010) were used instead. They opted for a logistic discrete-time hazard model, along with another modeling strategy to smooth trends and age patterns. They also made additional corrections for the underreporting of deaths and for the zero-female-survivor bias.

If the number of deaths in a sibship of size *n* is distributed binomially, the observed probability of dying *p*(*n*)^{obs} can be expressed as a function of the true probability *p*(*n*):

The value of *p*(*n*) can be algebraically computed from *p*(*n*)^{obs} for sibships of size 2 or 3. In larger sibships, there is no need to adjust the observed probabilities upward because the chance that no one survived is close to zero. The idea is thus to compute *p*(*n*) for sibships of size 2 and 3 and to regress the percentage dead on the sibsize. By using the regression coefficients to predict the percentage dead in sibships of size 1, an estimate of the percentage of deaths that remain unobserved can be obtained.
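The inversion for small sibships can be sketched under a simple binomial assumption: if sibships of size *n* in which everyone died are never observed, the proportion dead among the remaining sibships is (*p* − *p*^{n}) / (1 − *p*^{n}). This expression is derived here from that assumption and is not taken from Obermeyer et al. (2010); the function names are hypothetical.

```python
from math import sqrt


def observed_probability(p, n):
    """Proportion dead in sibships of size n that keep at least one survivor."""
    return (p - p**n) / (1.0 - p**n)


def true_probability(p_obs, n):
    """Invert observed_probability in closed form for sibships of size 2 or 3."""
    odds = p_obs / (1.0 - p_obs)
    if n == 2:
        return odds                                   # p_obs = p / (1 + p)
    if n == 3:
        return (-1.0 + sqrt(1.0 + 4.0 * odds)) / 2.0  # solves p**2 + p = odds
    raise ValueError("closed-form inversion shown only for n = 2 or 3")


for n in (2, 3):
    p_obs = observed_probability(0.3, n)
    print(n, round(p_obs, 4), round(true_probability(p_obs, n), 4))
```

For larger sibships the term *p*^{n} becomes negligible, which is consistent with the remark that no upward adjustment is needed there.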

Using DHS data, Obermeyer et al. (2010) estimated that the probabilities _{45}*q*_{15} need to be corrected upward by approximately 0.2 % to 4.0 % for females, which is lower than observed in the simulations. However, their correction factors were again based on all sibling deaths, even those occurring in childhood. Obviously, the correction method is unbiased only if the risks of dying are homogeneous across all characteristics except sibsize. This is why calculations should be made by age group and time period, which will dramatically reduce the number of deaths from which the correction factors are derived. The correction factors will also be larger when computed from recent adult female deaths, potentially introducing great uncertainty into the estimates.

Because the zero-female-respondent bias is canceled out in the standard calculation, it is worth considering whether there is a real need to correct for selection biases, which amounts to reexamining the presumed association between sibsize and mortality.

## The Association Between Sibsize and Mortality Revisited

The first claim that sibsize is correlated with mortality in DHS data was made by Gakidou and King (2006). They estimated a weighted proportion of dead siblings for each sibsize and computed correlation coefficients between this index and sibsize. Using the same sample of DHS surveys, I replicate their calculations and present the correlations in the third column of Table 3.^{5}

When considering the sign of the correlations, one would conclude that the standard calculation undoubtedly overestimates mortality: sibships facing higher mortality rates are oversampled because the risks of dying apparently rise with sibsize. Yet, according to authors applying the GK correction, the reverse is true because this correction yields higher estimates. This apparent contradiction has two sources. First, as shown earlier, an improper application of the GK weights results in overestimates. Second, correlations between proportions of dead siblings and sibsize are blurred by fertility and mortality changes. If fertility has declined over recent decades, family size will be positively related to the age of the respondent and thus to the survival status of her siblings. Hence, a positive correlation between sibsize and mortality will be observed. The same will be true if mortality has declined over time. However, it does not follow from this that the standard calculation is biased.

To demonstrate that correlations identified by GK could be spurious, the same calculations are applied to the simulated populations. Although there is no association whatsoever between sibsize and mortality, very high correlations are observed in most countries, as in DHS data (see fourth column of Table 3).

The only association that could introduce selection biases is one between adult mortality and the number of siblings eligible for an interview. Sibsize thus needs to be considered a dynamic variable. In what follows, I retain only those respondents and sisters who were still alive 10 years prior to the survey and were 15 to 39 years old at that time (reports on brothers are discarded). This ensures that they all belong to the same cohort. It also reduces the sibsize, which is now defined as the number of adult sisters surviving 10 years prior to the survey (including own reports). From the viewpoint of respondents, the mean sibsize observed in 60 DHS surveys in sub-Saharan Africa ranges from 2.1 to 3.1 (compared with 4.9 to 7.6 when all siblings are counted).

A Pearson’s chi-square test is first used to assess whether the survival of sisters is associated with the sibsize. The contingency tables are built on respondents merely reporting on their sisters, with no weights other than the DHS sample weights. Of 60 DHS surveys, the null hypothesis is rejected in 18 surveys (at the .10 level).^{6} In six of these 18 surveys, a negative and significant correlation is found between sibsize and mortality (Chad 1997 and 2004, Mozambique 1997, Niger 2006, Tanzania 2004, and Ethiopia 2000).
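As an illustration of the kind of test involved, the sketch below computes a Pearson chi-square statistic and its p-value for a sibsize-by-survival table. It is not the article's code: the table layout (one row per sibsize category, columns for surviving and dead sisters), the function names, and the input counts are all hypothetical. Sample-weighted counts may be non-integer, and a plain Pearson test does not otherwise account for the survey design.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom.

    Uses Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)) as a starting point and
    the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += y**a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def pearson_chi2(table):
    """Return (statistic, degrees of freedom, p-value) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical weighted counts: rows are sibsizes 1, 2, 3+; columns are (alive, dead)
stat, df, p_value = pearson_chi2([[950.0, 50.0], [1420.0, 80.0], [760.0, 40.0]])
reject_at_10_percent = p_value < 0.10
```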

Trends and age patterns of mortality could confound this association, even in the 10 years prior to the survey. To control for these factors, I use a simplified version of the Poisson regression model of Timæus and Jasseh (2004) and restrict the analysis to females. The fit is significantly improved when sibsize is included in the model as a continuous variable (at the .10 level of significance). The rate ratio for a one-unit increase in sibship size is estimated at 0.96. When this effect is allowed to vary by country, it appears significant in 10 of the 30 countries considered here (Chad, Congo, Ethiopia, Kenya, Madagascar, Mozambique, Namibia, Niger, Rwanda, and Tanzania).
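To make the rate ratio concrete, here is a deliberately stripped-down sketch of a Poisson rate model with sibsize as the only covariate, fitted by Newton-Raphson. It is an assumption-laden stand-in: the model used in the article also controls for age and period, and the function name and the input layout (deaths and person-years tabulated by sibsize) are hypothetical.

```python
import math

def fit_poisson_rate(deaths, exposure, sibsize, iterations=50):
    """Fit log(rate) = a + b * sibsize with person-years as the exposure."""
    a = math.log(sum(deaths) / sum(exposure))  # start from the crude death rate
    b = 0.0
    for _ in range(iterations):
        # Accumulate the score vector and the information matrix
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for d, e, s in zip(deaths, exposure, sibsize):
            mu = e * math.exp(a + b * s)  # expected deaths in this cell
            g_a += d - mu
            g_b += (d - mu) * s
            h_aa += mu
            h_ab += mu * s
            h_bb += mu * s * s
        # Solve the 2x2 Newton system explicitly
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < 1e-12:
            break
    return a, b

# Hypothetical tabulation: deaths and person-years by number of adult sisters
a_hat, b_hat = fit_poisson_rate(
    deaths=[52.0, 95.0, 88.0, 61.0],
    exposure=[5000.0, 9800.0, 9500.0, 7000.0],
    sibsize=[2, 3, 4, 5],
)
rate_ratio = math.exp(b_hat)  # multiplicative change in the death rate per extra sister
```

A rate ratio below 1 corresponds to mortality declining with sibsize, which is the direction reported in the text.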

A reference period of 10 years is arbitrary. If a shorter period, such as 5 years, were retained, sisters dying between 5 and 10 years prior to the survey would be discarded. Therefore, in an alternative specification, sibsize is computed as a time-varying covariate. This leads to a steeper decline in mortality by sibsize; when one death follows another in a family, the second one is classified as part of a smaller sibship. Sibsize now appears to be significantly associated with lower risks of dying in 19 countries, with estimated rate ratios ranging from 0.81 in Chad and Mozambique to 1.03 in Togo.

This negative association contradicts my original intuition, informed by the extensive literature showing a strong link between poverty and higher fertility (Schoumaker 2004). Child mortality is also known to increase with parity because of sibling competition, overcrowding, physical depletion of the mother, and transmission of infections (Zaba and David 1996). In addition, some studies conducted in industrialized countries found an association between the number of siblings and adverse socioeconomic and health indicators in adulthood (Hart and Davey Smith 2003; Lundberg 1993).

Although a real correlation between mortality and sibsize cannot be ruled out, at least two alternative explanations can be advanced. First, to a large extent, siblings share similar socioeconomic conditions, adopt similar behaviors, and live close to one another. To monitor inequalities in health, Graham et al. (2004) even advocated adopting a “familial technique” linking the poverty status of DHS respondents with their sisters’ maternal deaths. A familial clustering of mortality will lead to lower mortality rates in sibships where many sisters survived, even if the risks of dying are independent of the original sibsizes.

Another explanation for this decline is that it is an artifact of more severe recall errors in larger sibships. An analysis of missing data on the timing of deaths points in this direction. In 55 of 60 DHS surveys, a chi-square test indicates that proportions of missing reports on age at death or years since death are not independent of sibsize. In 28 surveys, there is a positive and significant correlation between the extent of missing data on the timing of deaths and sibship size. Older respondents, whose sibships are larger on average, could disproportionately underreport siblings’ deaths (Stanton et al. 1997). The contacts between siblings could also be less frequent in larger families, resulting in more age displacements and omissions of deaths.

Pinning down the reasons for this negative association between mortality and sibsize is an interesting direction for future research, but it is beyond the scope of this article. The objective here is to ascertain whether the key assumption that mortality does not vary with sibsize is upheld in DHS data. These results suggest that it is not, and that unadjusted estimates will be slightly conservative.

To account for this, one option would be to develop age-specific corrections for the zero-female-respondent bias in order to apply the GK weighting scheme. Here, I suggest an alternate correction method. This method draws on the fact that the argument of Trussell and Rodriguez (1990) remains useful even when mortality is associated with sibsize. It proves that a standard calculation will provide unbiased estimates for each sibsize taken separately. The problem is not the estimation of death rates per se, but rather the weights used to average them. If the original distribution of sibsizes can be approximated, an overall estimate can be recovered. Some extrapolation is still required because age-specific mortality rates in sibships of size 1 are unknown, but they can be obtained as out-of-sample predictions from a regression model.

As a preliminary exploration of this method, the mortality rates observed in the cohort of women born 25 to 49 years prior to the survey are modeled using the Poisson regression model, with sibsize included as a continuous variable. Death rates for sibships of size 1 are estimated by assuming a linear relationship between mortality and number of adult sisters. The corrected estimates are obtained as a weighted average of the predicted rates, with weights being the observed distribution of adult sibsizes at the time of the survey.^{7} On average, the upward adjustment on the probability of dying between ages 15 and 50 equals 7 %. It is higher than 15 % in only two countries: Chad and Mozambique. This confirms that corrections for selection biases should have only a modest effect.
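The reweighting step can be sketched as follows. This is an illustration under simplifying assumptions, not the article's implementation: the size-1 rate is extrapolated with an ordinary least-squares line on log rates instead of the Poisson regression described above, a single age group is considered, and the function names and inputs are hypothetical.

```python
import math

def predict_size1_rate(observed_rates):
    """Least-squares line through log death rates at sibsizes >= 2,
    evaluated at sibsize 1 (where no deaths can be reported)."""
    sizes = sorted(observed_rates)
    logs = [math.log(observed_rates[s]) for s in sizes]
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_l = sum(logs) / n
    slope = (sum((s - mean_s) * (l - mean_l) for s, l in zip(sizes, logs))
             / sum((s - mean_s) ** 2 for s in sizes))
    return math.exp(mean_l + slope * (1 - mean_s))

def corrected_rate(observed_rates, sibsize_shares):
    """Weighted average of sibsize-specific rates, with size 1 filled in
    by prediction and weights given by the estimated sibsize distribution."""
    rates = dict(observed_rates)
    rates[1] = predict_size1_rate(observed_rates)
    total = sum(sibsize_shares.values())
    return sum(rates[s] * share for s, share in sibsize_shares.items()) / total

# Hypothetical death rates by number of adult sisters and estimated shares
rates = {2: 0.0100, 3: 0.0095, 4: 0.0091}
shares = {1: 0.30, 2: 0.35, 3: 0.25, 4: 0.10}
overall = corrected_rate(rates, shares)
```

Because sibships of size 1 receive a positive weight and a higher predicted rate when mortality declines with sibsize, the corrected estimate lies above the rate obtained from the observed sibships alone.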

## Conclusion

In sub-Saharan Africa, more than three decades after the onset of the HIV epidemic, demographic data describing chances of survival in adulthood remain scant and defective. Trends and age patterns of mortality above age 15 continue to be surrounded by considerable uncertainty. Because they are derived from levels of all-causes mortality, accurate estimates of maternal deaths and the prevalence of orphanhood also remain elusive.

In this context, it is crucial to properly identify the strengths and limitations of each type of data. Collected in more than 60 nationally representative surveys since the early 1990s, sibling histories are emerging as a very important source of estimates in the region (Bradshaw and Timæus 2006; Obermeyer et al. 2010; Reniers et al. 2011). They circumvent several complications of orphanhood techniques, such as the time-location of estimates when mortality has not evolved linearly, or the methodological problems posed by HIV (Timæus and Nunn 1997). Unlike death distribution methods, they allow the reconstruction of mortality trends from a single inquiry. Pooling all surveys in a regression model makes it possible to smooth noisy trends and bring out the salient patterns of mortality increases attributable to AIDS. Corrections can also be made to account for the fact that the reporting of deaths decays as the recall period lengthens.

From the earliest developments of sibling methods, concerns have been raised about the possibility that sibling data could be plagued by selection biases (Hill and Trussell 1977). This has generated considerable skepticism, which has been recently challenged by the work of Gakidou and King (2006). They observed a very high and positive correlation between mortality and sibsize in DHS data and developed a weighting mechanism to account for this association.

In this article, I demonstrate that a positive correlation between sibsize and mortality can arise from cohort trends in mortality and fertility, even in the absence of any association between risks of dying and sibship sizes. To review the evidence on this association in DHS data, I restrict the analysis to an age-homogeneous cohort of females and find that contrary to previous work, this correlation may be negative. Whether this negative association is real or an artifact of mortality clustering or recall errors remains a matter of conjecture. Further analysis is needed to evaluate the magnitude of the clustering effect in DHS sibling histories and how it might bias the mortality estimates. Ideally, sisters interviewed in the same household should be linked in the questionnaires to control for the duplication of some sibships in the data sets.

The slight decline of mortality with sibsize means that standard estimates will be on the low side. Under most circumstances, however, corrections for selection biases should amount to 5 % to 10 %.

Previous work on selection biases in sibling histories has left the false impression that standard estimates were more heavily biased downward (Obermeyer et al. 2010). This partly stems from a failure to adapt the weighting scheme to the characteristics of siblings and respondents, resulting in overestimates and distortions of mortality sex ratios.

Given the current state of knowledge, the standard calculation should be retained, with the understanding that it provides slightly conservative estimates. There is also room for improvement in the correction methods. First, weights applied at the individual level should differ from the original weights introduced by Gakidou and King (2006) because the latter were developed for the sibship level. Second, when using weights of the form 1 / *S*, the value of *S* should not refer to the number of surviving siblings but instead to the number of potential respondents. Third, no adjustment for the zero-female-respondent bias is currently satisfactory. The quadratic regression model developed by Gakidou and King (2006) results in overestimates of the number of unobserved deaths. Alternatively, adjustment factors can be computed by making distributional assumptions, but this should be carried out within each age group and time period. Hierarchical modeling would be an appropriate tool to shrink the group-specific adjustments toward an overall mean when adult deaths remain sparse.

Rather than the GK correction, one can apply a slightly different approach, which is more closely related to the standard calculation. It consists of reweighting sibsize-specific mortality rates according to an estimated distribution of sibship sizes. Mortality rates for sibships of size 1 can be obtained as out-of-sample predictions from a regression model. Undoubtedly, the best option is to derive estimates from various calculations to examine the sensitivity of the results to the underlying assumptions.

Because progress in the completeness of civil registration lags far behind gains made in other regions (Mathers et al. 2005), our understanding of adult mortality in sub-Saharan Africa will remain dependent on modeled estimates based on child mortality or unconventional data sources, such as kin survivorship. To reduce our reliance on models, it is essential to make more extensive use of sibling histories because they offer the most comprehensive view of premature adult deaths occurring in the region. Improving the accuracy of sibling estimates requires further work on selection biases, but more attention should also be devoted to the pervasive effects of underreporting of deaths, age misstatements, and inaccurate timing of deaths. Appropriate adjustments for such errors and refinements of the data collection to enhance recall should have larger effects than corrections for selection biases.

## Acknowledgments

This article was written with the support of fellowships from the BAEF and the FNRS. Thanks are due to D. Feehan, P. Gerland, C. Mason, J. Rajaratnam, G. Reniers, D. Tabutin, and two anonymous reviewers.

## Notes

^{1} I use the general model of the United Nations (1982).

^{2} They suggested computing , where σ is the standard error of the regression.

^{3} In its core version, SOCSIM makes no allowance for AIDS mortality, but different transition rates can be set up for different groups with specific demographic rates. I model the HIV disease progression as a staged process from HIV infection to full-blown AIDS, allowing for reduced fertility of HIV-positive mothers and vertical transmission.

^{4} Four DHS surveys are discarded: Sudan 1990, because the data set is not standardized; Nigeria 1999 and Liberia 2007, because the data are not of very good quality (NMCP Liberia 2009; Pullum 2008); and Sierra Leone 2008, because the survival status is unknown for as much as 9 % of all siblings. Only women’s surveys are used.

^{5} These correlations are slightly higher than those published by GK, which are obtained from death rates corrected for the zero-survivor bias.

^{6} Sibships of more than six members are merged.

^{7} Own and proxy reports of person-years are divided by *S*_{i} and tabulated by sibsize. The weights are computed for each age group because the first and last age groups have smaller sibships on average, given that they can be reported only by either their older siblings (in the case of 15- to 19-year-olds) or their younger siblings (in the case of 45- to 49-year-olds). Linear interpolation is used between successive surveys.