## Abstract

Hundreds of millions of people live in countries that do not have complete death registration systems, meaning that most deaths are not recorded and that critical quantities, such as life expectancy, cannot be directly measured. The sibling survival method is a leading approach to estimating adult mortality in the absence of death registration. The idea is to ask survey respondents to enumerate their siblings and to report about their survival status. In many countries and periods, sibling survival data are the only nationally representative source of information about adult mortality. Although a vast amount of sibling survival data has been collected, important methodological questions about the method remain unresolved. To help make progress on this issue, we propose reframing the sibling survival method as a network sampling problem. This approach enables a formal derivation of statistical estimators for sibling survival data. Our derivation clarifies the precise conditions that sibling history estimates rely on, leads to internal consistency checks that can help assess data and reporting quality, and reveals important quantities that could potentially be measured to relax assumptions in the future. We introduce the R package *siblingsurvival*, which implements the methods we describe.

## Introduction

Death rates at adult ages are a core component of population health and a central topic of study for demography. Unfortunately, most of the world's poorest countries do not have complete death registration systems, meaning that most people die without ever having their existence officially recorded (AbouZahr et al. 2015; Setel et al. 2007). This lack of complete death registration means that critical quantities, such as life expectancy, cannot be directly measured. Improving death registration systems is the long-term solution to this scandal of invisibility, but progress has been very slow (Mikkelsen et al. 2015). Until complete death registration systems are available everywhere, sample-based approaches to adult mortality estimation will continue to play a critical role in understanding population health and well-being.

The leading approach to collecting information about adult mortality in the absence of death registration is the sibling survival method (Brass 1975; Rutenberg and Sullivan 1991). The idea is to ask survey respondents to report the number of siblings they have and to then ask for each sibling's gender, date of birth, and (where appropriate) date of death. This data collection strategy produces *sibling histories* that contain information about the survival status of all of the members of the respondent's sibship.

Because high-quality household surveys are routinely conducted in most countries—including countries that lack death registration systems—the sibling survival method offers the opportunity to try to estimate adult death rates in many places that have no other nationally representative adult mortality data. Over the past two decades, a vast amount of sibling history data has been collected; for example, as a part of the Demographic and Health Surveys (DHS) program alone, sibling histories have been collected in more than 150 surveys from dozens of countries around the world (Corsi et al. 2012; Fabic et al. 2012).

However, understanding how to analyze sibling histories has proven to be very challenging. Researchers have long been aware that the method suffers from many possible sources of bias (Gakidou and King 2006; Graham et al. 1989; Masquelier 2013; Reniers et al. 2011; Trussell and Rodriguez 1990). Previous studies have concluded that sibling history estimates can be problematic if (1) there are sibships with no surviving members who could potentially be sampled and interviewed in the survey; (2) more generally, there is a relationship between sibship size and mortality (e.g., larger sibships face higher death rates); and (3) respondents' reports about their siblings are inaccurate (e.g., respondents omit siblings or misreport a sibling's survival status). There has also been confusion about whether the survival status of the respondent should be included in the calculations, given that respondents are always alive (Masquelier 2013; Reniers et al. 2011).

Researchers have worked on addressing these concerns about the sibling survival method in three main ways: they have collected empirical information about possible sources of bias in sibling reports (e.g., Helleringer, Pison, Kanté et al. 2014), they have used microsimulation to illustrate how large certain sources of bias can be under different scenarios (Masquelier 2013), and they have used regression models to pool information from different countries and periods (Gakidou and King 2006; Obermeyer et al. 2010; Timaeus and Jasseh 2004). Together, these studies have produced many important insights about the sibling survival method. However, these insights have not yet brought about a consensus on how sibling histories should be analyzed. Currently, there is partial evidence about many individual sources of possible bias, but there is no way to integrate all of this evidence. Thus, even if we knew the exact size and direction of all the different sources of possible error, we still would not understand how the errors would combine to affect estimated death rates. More generally, little has been proven about the precise conditions under which sibling survival estimates can be expected to have attractive statistical properties, such as consistency or unbiasedness.

In this study, our goal is to help resolve some of the methodological uncertainty about sibling survival. Our analysis is based on the insight that the sibling relation induces a particular type of social network among the members of a population. In this network, two people are connected if they are siblings; thus, estimating death rates from sibling histories can be understood as a problem in network sampling. Starting from the principles of network reporting, we describe how a sibling survival estimator can be mathematically derived. Deriving an estimator from first principles in this way enables us to (1) clarify the precise assumptions that the estimator requires in order to be consistent, unbiased, and efficient; (2) describe how violations of any and all assumptions can combine to affect estimated death rates; (3) identify quantities that could potentially be measured in the future to relax assumptions; and (4) develop internal consistency checks that can be used to assess data and reporting quality in a given sample.

## Setup

Figure 1 illustrates how we understand sibling histories as a network reporting problem. Panel a shows a small population whose members are connected if they are siblings; that is, two nodes are connected if they have the same mother.^{1} Clear nodes are alive, and gray nodes are dead at the time of the survey. Because the sibling relation is transitive, the network is entirely composed of fully connected components, or cliques; each of these cliques is one sibship. Panel b shows one specific sibship, and panel c illustrates the bipartite reporting network that is generated when all the surviving members of that sibship are asked to report about their siblings.^{2} In the bipartite reporting network, each directed edge represents one sibling reporting about another. For example, the edge 13 → 12 indicates that node 13 reports about node 12. See Feehan and Salganik (2016a) for a more detailed description of bipartite reporting networks.

**Fig. 1**

Our goal is to estimate *M*_{α}, the death rate for a specific group α (for example, α might be all women aged 30–34 in 2018). *M*_{α} is defined as

$$M_\alpha = \frac{D_\alpha}{N_\alpha}, \qquad (1)$$

where *D*_{α} is the number of deaths in group α, and *N*_{α} is the person-years of exposure among members of group α. We can develop an estimator for *M*_{α} by separately estimating the numerator and the denominator of Eq. (1); thus, the challenge is to derive sibling history-based estimators for *D*_{α} and *N*_{α}.

Panel c of Figure 1 illustrates that each sibling can potentially be reported as many times as she has living sibship members who are eligible to respond to the survey (Gakidou and King 2006; Masquelier 2013; Sirken 1970). Inferences from sibling reports must somehow account for this fact. Our approach is to distinguish between two groups of people. The first group consists of people who have no siblings eligible to respond to the survey; these people will never appear in the sibling history data and are therefore *invisible* to the sibling histories. The second group consists of *visible* people, who do have siblings eligible to respond to the survey.

More formally, let *U* be the population being studied, and let *F* ⊂ *U* be the *frame population*, which is the set of all people who are eligible to respond to the survey. We define person *j* ∈ *U*'s *visibility*, *v*(*j*, *F*), to be the number of living siblings who would report person *j* in a census of *F*.^{3} Everyone in the population is either visible (*v*(*j*, *F*) > 0) or invisible (*v*(*j*, *F*) = 0). Thus, we can write the number of deaths in group α as follows:

$$D_\alpha = D_\alpha^V + D_\alpha^I,$$

where $D_\alpha^V$ is the number of *visible deaths* that could be learned about using sibling reports, and $D_\alpha^I$ is the number of *invisible deaths* that cannot be learned about using sibling reports. We can define analogous quantities for the denominator, $N_\alpha = N_\alpha^V + N_\alpha^I$, where $N_\alpha^V$ is the *visible exposure* and $N_\alpha^I$ is the *invisible exposure*. Finally, we define $M_\alpha^I = D_\alpha^I / N_\alpha^I$ to be the *invisible death rate*, $M_\alpha^V = D_\alpha^V / N_\alpha^V$ to be the *visible death rate*, and $M_\alpha = (D_\alpha^I + D_\alpha^V) / (N_\alpha^I + N_\alpha^V)$ to be the *total death rate*.
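To make the visibility definitions concrete, here is a small sketch (our own toy construction, not data or code from the paper or the *siblingsurvival* package) that computes $v(j, F)$ for a hand-built population and distinguishes visible from invisible deaths:

```python
# Toy population (invented): each person is (id, sibship, alive, on_frame),
# where on_frame means eligible to respond to the survey.
people = [
    (1, "A", True,  True),
    (2, "A", True,  False),
    (3, "A", False, False),  # dead, with a living frame-population sibling
    (4, "B", False, False),  # dead, in a sibship with no frame members
    (5, "B", True,  False),
    (6, "C", True,  True),
]

def visibility(person, population):
    """v(j, F): number of living frame-population siblings who would report j."""
    pid, sibship, _, _ = person
    return sum(
        1
        for (oid, osib, alive, on_frame) in population
        if oid != pid and osib == sibship and alive and on_frame
    )

vis = {p[0]: visibility(p, people) for p in people}
print(vis)  # person 3 is a visible death (v > 0); person 4 is invisible (v = 0)
```

Note that person 4's entire sibship lies outside the frame population, so that death could never be learned about from sibling reports, exactly the invisible case described above.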

### Adjusting for Visibility in Sibling Reports

Our strategy will be to estimate the visible death rate,

$$\hat{M}_\alpha^V = \frac{\hat{D}_\alpha^V}{\hat{N}_\alpha^V},$$

where $\hat{D}_\alpha^V$ is an estimator for the number of visible deaths in group α, and $\hat{N}_\alpha^V$ is an estimator for the amount of visible exposure in group α. In estimating these two quantities, we face the challenge that even people who are visible to the sibling histories may still differ in the extent to which they are visible; for example, visible people from larger sibships may tend to have different death rates than visible people from smaller sibships. We address this challenge by introducing statistical estimators that adjust for how visible reported siblings are.

We consider two different approaches to adjusting for differential visibility: *aggregate visibility* estimation and *individual visibility* estimation. These two approaches lead to two different estimators for the visible death rate. In both cases, we start by describing how to derive population-level relationships—that is, relationships that would be observed in a census. Of course, sibling histories are not typically collected in censuses, but starting from this perspective will enable us to derive relationships that can then form the basis for sample-based estimators.

### The Aggregate Visibility Approach

The *aggregate visibility* approach is based on the idea that reports about siblings can first be aggregated, and then the aggregated reports can be adjusted to account for visibility (Bernard et al. 1989; Feehan and Salganik 2016a; Rutstein and Guillermo Rojas 2006). To illustrate this approach, we first focus on reports about visible deaths among siblings, $D_\alpha^V$. Throughout the paper, we assume that there are no *false positive* reports; that is, we assume that respondents' reports may omit siblings but never mistakenly include someone who is not truly a sibling. Of course, false positives could in fact occur, but this assumption makes the exposition much cleaner, and the results derived in the online appendix consider reporting with false positives.^{4}

Section B.4 of the online appendix shows that if there are no false positive reports, then the total number of reports about sibling deaths, divided by the average visibility of visible deaths, is equal to the number of visible deaths:

$$D_\alpha^V = \frac{y(F, D_\alpha)}{\bar{v}(D_\alpha^V, F)},$$

where $y(F, D_\alpha)$ is the total number of reports about deaths in group α made in a census of the frame population, and $\bar{v}(D_\alpha^V, F)$ is the average visibility of the visible deaths.^{5} An analogous relationship holds for the visible exposure, $N_\alpha^V = y(F, N_\alpha) / \bar{v}(N_\alpha^V, F)$. Taking the ratio of these two expressions yields the population-level *aggregate visibility estimand*:

$$M_{\alpha, \mathrm{agg}}^V = \frac{y(F, D_\alpha) \,/\, \bar{v}(D_\alpha^V, F)}{y(F, N_\alpha) \,/\, \bar{v}(N_\alpha^V, F)}.$$

The corresponding sample-based estimator is

$$\hat{M}_{\alpha, \mathrm{agg}}^V = \frac{\sum_{i \in s} w_i \, y(i, D_\alpha)}{\sum_{i \in s} w_i \, y(i, N_\alpha)}, \qquad (7)$$

where the sums are taken over respondents *i* in a probability sample *s* with sampling weights $w_i$. Because this approach is based on combining reports about all the siblings and then adjusting for the visibility of these aggregate reports, we call $\hat{M}_{\alpha, \mathrm{agg}}^V$ an *aggregate visibility* estimator (Bernard et al. 1989; Feehan 2015; Feehan et al. 2017). In the online appendix, section E (Result E.1) formally derives the estimator; the derivation reveals that this approach can be expected to produce essentially unbiased estimates as long as reports about siblings are accurate and as long as there is no relationship between sibship visibility and mortality (i.e., as long as $\bar{v}(D_\alpha^V, F) = \bar{v}(N_\alpha^V, F)$).
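As a numerical illustration of the aggregate logic (all totals invented, not taken from any survey), the estimand divides total reported deaths and total reported exposure by the corresponding average visibilities:

```python
# Invented census-level totals for one age-sex group alpha.
y_F_D = 180.0      # y(F, D_alpha): total reports about sibling deaths
vbar_D = 1.8       # average visibility of visible deaths
y_F_N = 90_000.0   # y(F, N_alpha): total reported person-years of exposure
vbar_N = 2.0       # average visibility of visible exposure

D_visible = y_F_D / vbar_D     # implied number of visible deaths
N_visible = y_F_N / vbar_N     # implied visible person-years of exposure
M_agg = D_visible / N_visible  # aggregate visibility estimand

# When the two average visibilities are equal, they cancel, and the estimand
# reduces to the simple ratio of reported deaths to reported exposure.
print(M_agg)
```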

### The Individual Visibility Approach

The *individual visibility* approach is based on the idea that reports about siblings can be first adjusted for visibility and then aggregated (Feehan 2015; Gakidou and King 2006; Lavallee 2007; Sirken 1970). To illustrate this approach, consider reports about a specific deceased sibling *j* ∈ *D*_{α} that are made in a census of the frame population. Let *y*(*F*, *j*) be the number of times that people in the frame population *F* report the deceased sibling *j*. This quantity, *y*(*F*, *j*), will be equal to the visibility of *j* to *F*, *v*(*j*, *F*), as long as there are no false positive reports. Thus, for every visible sibling *j*, we have $y(F, j) / v(j, F) = 1$, and so the number of visible deaths can be written

$$D_\alpha^V = \sum_{j \in D_\alpha^V} \frac{y(F, j)}{v(j, F)}, \qquad (9)$$

where the visibility $v(j, F)$ can be reported by each sibling *i* who is in the same sibship as *j*. Using this relationship, Eq. (9) can be rewritten as

$$D_\alpha^V = \sum_{i \in F} \frac{y(i, D_\alpha)}{y(i, F)}, \qquad (10)$$

where $y(i, F)$ is *i*'s reported number of siblings on the sampling frame. Equation (10) relates the population-level number of visible deaths $D_\alpha^V$ to survey respondents' reports about deaths in their sibships, $y(i, D_\alpha)$, and survey respondents' reports about the number of frame population members in their sibships, $y(i, F)$.

An analogous argument expresses the visible exposure as

$$N_\alpha^V = \sum_{i \in F} \frac{y(i, N_\alpha \cap F) + y(i, N_\alpha - F)}{y(i, F)}, \qquad (11)$$

where $y(i, N_\alpha \cap F)$ is *i*'s reported number of siblings on the sampling frame who contributed exposure, and $y(i, N_\alpha - F)$ is *i*'s reported number of siblings not on the sampling frame who contributed exposure. Combining Eqs. (10) and (11), we have the population-level *individual visibility estimand*:

$$M_{\alpha, \mathrm{ind}}^V = \frac{\sum_{i \in F} y(i, D_\alpha) \,/\, y(i, F)}{\sum_{i \in F} \left[\, y(i, N_\alpha \cap F) + y(i, N_\alpha - F) \,\right] / \, y(i, F)}. \qquad (12)$$

The corresponding sample-based estimator is

$$\hat{M}_{\alpha, \mathrm{ind}}^V = \frac{\sum_{i \in s} w_i \, y(i, D_\alpha) \,/\, y(i, F)}{\sum_{i \in s} w_i \left[\, y(i, N_\alpha \cap F) + y(i, N_\alpha - F) \,\right] / \, y(i, F)}, \qquad (13)$$

where *i* indexes survey respondents in the probability sample *s*, and *w*_{i} is *i*'s sampling weight. Because this approach is based on adjusting for the visibility of each individual reported sibling, we call it *individual visibility estimation*. Section F of the online appendix (Result F.1) formally derives the estimator in Eq. (13), including the precise conditions required for it to provide consistent and essentially unbiased estimates of the visible death rate.
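A minimal sketch of the individual visibility computation of Eq. (13) (toy respondent data; the tuple layout and variable names are our own, not the *siblingsurvival* interface; whether a respondent counts herself in her report of frame-population siblings is the inclusion question discussed earlier, and here we simply use the reported value as given):

```python
# Each respondent i reports: sampling weight w_i, sibship deaths y(i, D_alpha),
# exposure among frame-population siblings, exposure among non-frame siblings,
# and the number of frame-population siblings y(i, F).
respondents = [
    # (w, y_D, y_N_frame, y_N_nonframe, y_F)
    (1.0, 1, 20.0, 10.0, 2),
    (2.0, 0, 15.0,  5.0, 1),
    (1.5, 2, 30.0, 12.0, 3),
]

# Eq. (13): each respondent's reports are down-weighted by 1 / y(i, F)
# before being aggregated.
numerator = sum(w * y_d / y_f for (w, y_d, _, _, y_f) in respondents)
denominator = sum(
    w * (y_nf + y_nn) / y_f for (w, _, y_nf, y_nn, y_f) in respondents
)

M_ind = numerator / denominator  # estimated visible death rate
print(M_ind)
```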

### Relationship to Previous Work

To the best of our knowledge, our study is the first to derive the aggregate visibility estimator from first principles. However, the estimator itself is not new: the aggregate visibility estimator is probably the most common approach to estimating death rates from sibling history data. For example, Eq. (7) is the estimator used to produce age-specific adult death rate estimates in all DHS reports (Rutstein and Guillermo Rojas 2006). The estimator appears to have been first proposed in Rutenberg and Sullivan (1991), and it has since been the subject of several methodological analyses, including Masquelier (2013), Gakidou and King (2006), Hill et al. (2006), Timaeus and Jasseh (2004), Stanton et al. (2000), and Garenne et al. (1997). By focusing on the networked structure of sibling relations, our derivation reveals that the aggregate visibility estimator is related to other network estimation approaches, including the network scale-up method (Bernard et al. 1989) and the network survival estimator (Feehan et al. 2017).

The individual visibility estimator has its origins in multiplicity sampling (Feehan 2015; Lavallee 2007; Sirken 1970). In the context of sibling survival, an estimator similar to the one derived here was introduced by Gakidou and King (2006) and then further discussed by Obermeyer et al. (2010) and Masquelier (2013). The actual individual estimator in Eq. (13) is somewhat different from the one Gakidou and King (2006) proposed, but both are motivated by the idea that observed information can be used to adjust for visibility at the level of individual reports. We prove that the individual estimator introduced here is correct, and we provide code that other researchers can use to implement it.

### Framework for Sensitivity Analysis

The individual and aggregate visibility estimators rely on several conditions to guarantee that they will produce consistent and essentially unbiased estimates of the death rate. These conditions make precise long-standing concerns researchers have had about sibling survival estimates. For example, researchers have often worried that inaccurate reports about siblings may lead to biased death rate estimates; our results reveal exactly how reports about siblings must be accurate in order to produce consistent and essentially unbiased estimates of death rates. They also reveal precise quantities that could potentially be measured to adjust sibling reports to account for reporting errors.

Sections E and F of the online appendix contain detailed derivations of the sensitivity frameworks for both the individual and aggregate visibility estimators; here, we present and discuss the results of that analysis. The simulation study in section J of the online appendix empirically illustrates the sensitivity frameworks and confirms their correctness.

### Sensitivity of the Aggregate Visibility Estimator

Equation (14) decomposes the total death rate into the product of the aggregate visibility estimand and several adjustment factors:

$$M_\alpha = M_{\alpha, \mathrm{agg}}^V \times \frac{\bar{v}(N_\alpha^V, F)}{\bar{v}(D_\alpha^V, F)} \times \frac{\gamma(F, N_\alpha)}{\gamma(F, D_\alpha)} \times \left[\, 1 + p_{N_\alpha}^I \,(K - 1) \,\right]. \qquad (14)$$

When all these adjustment factors are equal to 1, the aggregate visibility estimand is equal to the total death rate.^{6}

The first group of adjustment factors—called the *visibility ratio*—describes how a relationship between visibility and mortality would affect estimated death rates. It is the ratio of the average visibility of all siblings who contribute to exposure, $\bar{v}(N_\alpha^V, F)$, to the average visibility of siblings who die, $\bar{v}(D_\alpha^V, F)$.^{7} When these two average visibilities are equal, the visibility ratio will be 1. When, say, siblings who have died tend to be in less visible sibships than siblings overall, this factor will tend to be greater than 1 (meaning that the death rate will be underestimated).

The next group of adjustment factors captures the extent to which reports about siblings are accurate. It is the ratio of a quantity that captures the net accuracy of reports about exposure, γ(*F*, *N*_{α}), to a quantity that captures the net accuracy of reports about deaths, γ(*F*, *D*_{α}). Two particularly salient findings emerge from the derivation of this group of adjustment factors (online appendix, section E). First, the estimator requires that reporting be accurate *in aggregate* but not necessarily at the individual level; as long as reporting errors across individuals cancel out, estimates will not be affected. Second, this group of adjustment factors shows that aggregate visibility estimates will not be affected if reporting errors about deaths and reporting errors about exposure balance out. In other words, if respondents tend to, say, omit older siblings at a constant rate, independent of the survival status of those siblings (γ(*F*, *D*_{α}) = γ(*F*, *N*_{α}) < 1), then Eq. (14) reveals that death rates can still be accurate because reporting errors about deaths and about exposure will cancel out. Thus, Eq. (14) shows that the death rate estimator can be robust to situations in which respondents' reports are imperfect, but imperfect in similar ways for siblings who die and siblings who survive.

Finally, the last group captures the conditions needed to be able to use only information about visible deaths to estimate the total death rate. This group depends on two quantities: $p_{N_\alpha}^I$, the proportion of exposure that is invisible; and $K = M_\alpha^I / M_\alpha^V$, an index for how different the invisible and visible death rates are. Later in the paper, we use an empirical example to illustrate this group of adjustment factors in more depth.
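A small worked example of the adjustment (all parameter values invented; the multiplicative form follows the decomposition described above, and the last factor, 1 + *p*(*K* − 1), follows directly from the definitions of the invisible-exposure proportion and *K*):

```python
# Invented values for the three groups of adjustment factors.
M_agg = 0.0022                 # unadjusted aggregate visibility estimand
vbar_N, vbar_D = 2.0, 1.9      # average visibilities: exposure, deaths
gamma_N, gamma_D = 0.96, 0.91  # net reporting accuracy: exposure, deaths
p_inv = 0.20                   # proportion of exposure that is invisible
K = 1.10                       # invisible death rate / visible death rate

visibility_ratio = vbar_N / vbar_D   # > 1: deaths are less visible
reporting_ratio = gamma_N / gamma_D  # > 1: deaths underreported more
invisibility_factor = 1.0 + p_inv * (K - 1.0)

# Adjusted estimate of the total death rate.
M_total = M_agg * visibility_ratio * reporting_ratio * invisibility_factor
print(M_total)
```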

### Sensitivity of the Individual Visibility Estimator

Equation (15) again decomposes the population death rate into the product of the individual visibility estimand and several groups of adjustment factors. The main insights from Eq. (14) also apply to the individual visibility expression in Eq. (15). However, a few differences between the two frameworks are noteworthy. First, Eq. (15) does not include any adjustment factors related to a visibility ratio. This is an advantage of the individual visibility estimator: it does not need to assume the absence of a relationship between visibility and mortality (within the visible population). Second, the individual visibility expression in Eq. (15) has a more complex set of adjustment factors that capture reporting accuracy. This more complex expression describes the extent to which reporting errors are correlated with reports about deaths and reports about exposure; for example, if reporting tends to omit deaths in sibships that have more deaths, then *K*_{D} > 1. More generally, even if deaths and exposure are underreported at the same average rate, so that the average reporting adjustment factor is equal to 1, problems can still arise if the reporting errors differ in their correlation with the number of sibship deaths and exposure. In general, reporting errors appear to be more complex under the individual visibility estimator; we discuss the implications of this later.

### Simulation Study

To confirm the correctness of the sensitivity results, we conduct a simulation study. We first simulate a large population whose members are linked in sibships. We base the structure of these sibships and the age and sex composition of the population on sibling reports from the 2000 Malawi DHS. Next, we simulate deaths by age and sex in the population, using model-based estimates for adult death rates in Malawi in 2000–2005 as the "true" underlying mortality (United Nations Population Division 2020). We simulate sibling history interviews with each surviving member of the population under four sets of assumptions about reporting errors: perfect reporting; imperfect reporting about deaths (only 80% of deaths are reported, τ_{D} = .8); imperfect reporting about exposure (only 80% of exposure is reported, τ_{N} = .8); and imperfect reporting about both deaths and exposure (τ_{D} = .8 and τ_{N} = .8). These four sets of assumptions lead to four "censuses," one for each set of reporting assumptions. Because our population is simulated, the exact death rate and visible death rate can be calculated by age and sex. For each set of reporting assumptions, we can also calculate the exact estimand for each of the four estimators we consider (aggregate and individual visibility, each including and not including the respondent). Finally, we can calculate all the adjustment factors that appear in our sensitivity framework. If our sensitivity framework is correct, then the estimands that have been adjusted using the sensitivity framework should equal the underlying death rate.

For brevity, we focus here on the results for the aggregate visibility estimator not including the respondent; results for all four estimators can be found in the online appendix, section J. Figure 2 compares the true visible death rates (*x*-axis) with the adjusted and unadjusted aggregate death rate estimands (*y*-axis). Two important features emerge from Figure 2. First, all the adjusted estimands lie on the diagonal *y* = *x* line, confirming the correctness of the sensitivity framework for the aggregate estimator (Eq. (14)). Second, comparing the unadjusted estimands across the four reporting scenarios shows that the unadjusted estimands for the scenario in which τ_{D} = .8 and τ_{N} = .8 (top-left panel) are nearly as accurate as those for the scenario in which τ_{D} = 1 and τ_{N} = 1 (bottom-right panel). Intuitively, this suggests that in some cases, imperfect reporting may not be very problematic for the aggregate sibling survival estimator, as long as the imperfect reporting is similar for deaths and for exposure; in that case, Figure 2 shows that the reporting errors can cancel out (confirming the intuition from Eq. (14)).

**Fig. 2**

The online appendix (section J) describes our simulation setup in greater detail and provides results confirming that the sensitivity framework is correct for all four estimators.
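The cancellation result can also be checked with a toy calculation that is far simpler than our simulation study (this sketch is our own and is not based on the Malawi data): when deaths and exposure are each reported at the same rate τ, the ratio of reported deaths to reported exposure still recovers the true rate.

```python
import random

random.seed(20)

# Toy check: thin death reports binomially at rate tau, and scale exposure
# reports deterministically at the same rate tau (a simplification).
true_deaths = 1_000
true_exposure = 400_000.0
tau = 0.8

observed_deaths = sum(1 for _ in range(true_deaths) if random.random() < tau)
observed_exposure = tau * true_exposure

true_rate = true_deaths / true_exposure
observed_rate = observed_deaths / observed_exposure
print(true_rate, observed_rate)  # close: the tau factors cancel
```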

## Empirical Illustration

We motivate our technical results with an empirical example: the sibling history data from the 2000 Malawi DHS (Malawi National Statistical Office and ORC Macro 2001). We use this example for two reasons. First, we wanted the empirical example to be a DHS because these surveys are the largest available source of sibling history data; as of this writing, more than 150 DHS surveys in 60 countries have collected sibling histories over a period of about 30 years. Second, among DHSs, the 2000 Malawi DHS is preferable because it has low missingness in sibling reports and because its sample size of 13,220 women is close to average.^{8} Our analysis of the 2000 Malawi DHS sibling histories uses the *siblingsurvival* R package, which we created as a companion to this paper.^{9}

Figure 3 shows estimated death rates and confidence intervals for male and female death rates in Malawi over the seven-year period before interviews were conducted. For males, the individual and aggregate visibility estimates are qualitatively quite similar. For females, however, aggregate visibility estimates are systematically higher than individual visibility estimates, and these differences are larger than the estimated sampling variation. We discuss a possible explanation for this difference in more detail in the online appendix, section G.3.

**Fig. 3**

### Variance Estimation

Researchers need to be able to estimate the sampling uncertainty of the death rate estimates produced by Eqs. (7) and (13). We recommend that researchers use a resampling approach called the *rescaled bootstrap* to do so (Rao and Wu 1988; Rao et al. 1992). The rescaled bootstrap is appealing because (1) it accounts for the complex sampling design that is typical of surveys such as the DHS, (2) it enables researchers to use a single approach to estimating the sampling uncertainty for death rates and for other quantities (such as the internal consistency checks we introduce later), and (3) it has been successfully applied to other network reporting studies (e.g., Feehan et al. 2017). The *siblingsurvival* package uses the rescaled bootstrap to provide estimated sampling uncertainty for death rate estimates.
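To sketch the idea behind one rescaled-bootstrap replicate (a simplified version under the common choice of resampling *n*_{h} − 1 primary sampling units per stratum; the *siblingsurvival* implementation may differ in details such as finite-population corrections):

```python
import random
from collections import Counter

def rescaled_bootstrap_multipliers(psus_by_stratum, rng):
    """Weight multipliers for one rescaled-bootstrap replicate.

    Within each stratum with n_h PSUs, draw n_h - 1 PSUs with replacement;
    every observation in PSU k gets its sampling weight multiplied by
    (n_h / (n_h - 1)) * (number of times k was drawn).
    """
    multipliers = {}
    for psus in psus_by_stratum.values():
        n_h = len(psus)
        draws = Counter(rng.choice(psus) for _ in range(n_h - 1))
        for psu in psus:
            multipliers[psu] = (n_h / (n_h - 1)) * draws.get(psu, 0)
    return multipliers

rng = random.Random(0)
mult = rescaled_bootstrap_multipliers({"s1": ["a", "b", "c"], "s2": ["d", "e"]}, rng)
# Within each stratum the multipliers sum to n_h, so weighted totals are
# preserved on average. Repeating this for many replicates and recomputing
# the death rate each time yields a bootstrap distribution for the estimate.
print(mult)
```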

An alternate approach to estimating sampling uncertainty is to derive a mathematical expression that relates sampling variation to a function of study design parameters and population characteristics that are known or that can be approximated; this approach is discussed in section I of the online appendix, where we illustrate how linearization can be used to derive an approximate variance estimator for estimated death rates. There are also alternate resampling approaches to estimating sampling variance from survey data. For example, the Malawi DHS report uses a jackknife method to estimate sampling uncertainty for some quantities (but not adult death rates)^{10} (Malawi National Statistical Office and ORC Macro 2001). In practice, we expect the bootstrap approach discussed here to be most useful in empirical analyses, but future work could explore this topic in greater depth.

Figure 3 shows confidence intervals for estimated male and female death rates in Malawi over the seven-year period before interviews were conducted. To compare the amount of estimated sampling uncertainty for the aggregate and individual visibility estimates in Figure 3, we define the relative standard error of an estimate of $M^\alpha V$ to be $SE^(M^\alpha V)/M^\alpha V$, where $SE^(M^\alpha V)$ is the rescaled bootstrap-estimated standard error. We then calculate the average of these relative standard errors across all age and sex groups for each estimator. The results suggest that the individual visibility estimator has slightly larger sampling variance than the aggregate visibility estimator (Table 1). This empirical finding is consistent with simulation results discussed in the online appendix, section J.

**Table 1** Estimated average relative standard error, by estimator

| Estimator | Females | Males |
| --- | --- | --- |
| Aggregate Visibility | 0.09 | 0.093 |
| Individual Visibility | 0.11 | 0.108 |


### Applying the Sensitivity Framework

#### Sensitivity to Invisible Deaths

Equation (14) expresses the difference between the visible and total death rate in terms of two parameters: *K*, an index for how different the visible and invisible death rates are; and $p_{N_\alpha}^I$, the proportion of exposure that is invisible. Sibling history data cannot be used to determine the fraction of exposure that is invisible, $p_{N_\alpha}^I$. However, we can try to approximate this quantity by taking advantage of the fact that we have a random sample of the frame population that is currently alive. We can use the sample to estimate what fraction of respondents would be invisible to sibling histories at the time of the survey—that is, to estimate the fraction of survey respondents who have no siblings in the sampling frame.

Panel a of Figure 4 shows the estimated fraction of respondents to the 2000 Malawi DHS who would not be visible to sibling histories. As our derivations reveal, it is this visibility—and not sibship size *per se*—that matters for estimating death rates. Panel a gives an approximate sense of the values we might expect for $p_{N_\alpha}^I$ by age. The share of respondents who are invisible has a U-shaped relationship with age, reaching its highest levels among the youngest and oldest survey respondents. This relationship is likely due to the definition of the frame population, which includes women aged 15–49; age groups close to the boundaries of the frame population will tend to have siblings who are too old or too young to be included as respondents, reducing the visibility of these ages. At worst, about 40% of women aged 45–49 would be invisible to sibling histories; at best, only about 15% of women aged 30–34 would be invisible to sibling histories.

**Fig. 4**

What difference would this range of invisibility make to death rate estimates? The sensitivity relationship in Eq. (14) reveals that the answer depends on how different death rates are in the invisible and visible populations. Panel b of Figure 4 illustrates this by showing the relative error that would result from (1) a range of differences between invisible and visible death rates, from 20% higher death rates in the invisible population to 20% lower death rates in the invisible population (the *K* parameter, shown on the *y*-axis); and (2) a range of different proportions of exposure that is invisible, from 15% to 30% ($p_{N_\alpha}^I$, shown on the *x*-axis). Even relatively large values for the two parameters appear to result in modest relative errors.
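The calculation behind this kind of grid can be sketched directly (our own arithmetic, using the parameter ranges discussed above): if the other adjustment factors equal 1, using only the visible death rate produces a relative error of 1 / (1 + *p*(*K* − 1)) − 1.

```python
# Relative error of the visible death rate as an estimate of the total rate,
# over a grid of invisible-exposure shares p and rate ratios K = M^I / M^V.
for p in (0.15, 0.30):         # proportion of exposure that is invisible
    for K in (0.8, 1.0, 1.2):  # invisible death rate relative to visible
        rel_error = 1.0 / (1.0 + p * (K - 1.0)) - 1.0
        print(f"p={p:.2f}  K={K:.1f}  relative error={rel_error:+.3f}")
# On this grid the errors stay under about 7% in absolute value.
```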

#### Reporting Errors

Researchers have long been aware that reporting errors may affect the quality of sibling survival estimates. We will illustrate two ways to assess the sensitivity of aggregate visibility estimates to reporting errors. First, we will use Eq. (14) to assess how much different levels of reporting error affect death rate estimates. Second, we will illustrate how our network reporting framework leads to data quality checks that can be performed on sibling history data.

#### Analyzing the Impact of Reporting Errors Using the Sensitivity Framework

We will illustrate the sensitivity framework by focusing on aggregate visibility estimates for brevity. The sensitivity framework in Eq. (14) shows that reporting errors will affect the accuracy of aggregate visibility estimates through the ratio of two parameters: $\gamma(F, N_\alpha)$ and $\gamma(F, D_\alpha)$. Importantly, in principle, it is possible to design a study that could measure $\gamma(F, N_\alpha)$ and $\gamma(F, D_\alpha)$. In fact, some promising research on the sibling method to date has compared sibling reports to ground-truth information at a demographic surveillance site in southeastern Senegal (e.g., Helleringer, Pison, Kanté et al. 2014). Studies such as that by Helleringer, Pison, Kanté et al. (2014) were designed to estimate somewhat different reporting parameters from $\gamma(F, D_\alpha)$ and $\gamma(F, N_\alpha)$; thus, currently no direct evidence about these parameters is available. However, to illustrate our sensitivity framework, we can base some back-of-the-envelope calculations on the data reported by Helleringer and colleagues. Suppose that respondents never mistakenly count nonsiblings as siblings but that, on average, about 4% of living siblings in group $\alpha$ are omitted from respondents' reports and about 9% of dead siblings in group $\alpha$ are omitted from respondents' reports. Then $\gamma(F, N_\alpha) \approx 0.96$ and $\gamma(F, D_\alpha) \approx 0.91$. If these approximations were correct, Eq. (14) shows that the estimated death rates should be multiplied by about 0.96 / 0.91 ≈ 1.05 to adjust for reporting errors. In other words, under this scenario, the unadjusted aggregate visibility estimator produces estimates that have a relative error of about 5% because of imperfect reporting.^{11}
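The adjustment in this back-of-the-envelope scenario is simple enough to verify directly. The sketch below (Python, illustrative only; the parameter values are the hypothetical ones from the text, not measured quantities) computes the multiplier implied by Eq. (14):

```python
# gamma_N and gamma_D are hypothetical reporting-accuracy parameters for
# exposure and deaths; Eq. (14) implies multiplying the unadjusted aggregate
# visibility estimate by gamma_N / gamma_D.

def reporting_adjustment(gamma_N, gamma_D):
    """Multiplier correcting an aggregate visibility estimate for reporting error."""
    return gamma_N / gamma_D

# 4% of living siblings omitted -> gamma_N ~ 0.96;
# 9% of dead siblings omitted   -> gamma_D ~ 0.91.
adj = reporting_adjustment(0.96, 0.91)
print(round(adj, 3))  # 1.055: unadjusted estimates are about 5% too low
```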

#### Internal Consistency Checks

A second strategy for assessing sensitivity to reporting errors is to perform data quality checks on sibling history data (e.g., Garenne and Friedberg 1997; Helleringer, Pison, Masquelier et al. 2014; Masquelier and Dutreuilh 2014; Rutenberg and Sullivan 1991; Stanton et al. 2000). We now show how our reporting framework enables us to introduce several new internal consistency checks that can be used to assess the accuracy of sibling reports. The idea is to use the network reporting framework to identify quantities that can be estimated in two different ways using independent subsets of the sibling history data (e.g., Feehan and Cobb 2019). If reporting is highly accurate, then we expect these independent estimates to agree; large differences between them suggest considerable reporting error.

For a given age $\alpha$, we write $y(F_\alpha, F_{-\alpha})$ for the total reported connections from respondents aged $\alpha$ to siblings who are in $F$ but not aged $\alpha$; similarly, we write $y(F_{-\alpha}, F_\alpha)$ for the total reported connections from respondents not aged $\alpha$ to siblings who are in $F$ and who are aged $\alpha$. In theory, these are the same quantity; this equality is Eq. (16). For example, taking $\alpha = 30$, the quantity on the left side of Eq. (16) can be estimated from the survey respondents aged 30, and the quantity on the right side can be estimated from all of the survey respondents who are not aged 30. These two estimates can then be compared; if reporting is accurate, then we expect the estimates to agree. More generally, from sibling history data, we can independently estimate $y(F_\alpha, F_{-\alpha})$ and $y(F_{-\alpha}, F_\alpha)$ and then determine the similarity of these two independent estimates of the same quantity by calculating the difference $\Delta_\alpha = \hat{y}(F_\alpha, F_{-\alpha}) - \hat{y}(F_{-\alpha}, F_\alpha)$, where hats denote sample-based estimates.

When the two estimates agree, $\Delta_\alpha$ is close to 0. If there is considerable reporting error, then $\Delta_\alpha$ can be very different from 0.
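The consistency check is easy to express on toy data. The sketch below (Python; hypothetical, unweighted data, whereas real checks would use survey weights, as implemented in the *siblingsurvival* R package) counts the reported connections in both directions for a given age $\alpha$ and treats $\Delta_\alpha$ as their difference:

```python
# Internal consistency check on toy sibling history data. Each report is
# (respondent's age, ages of her siblings who are in the frame population F).
# This unweighted sketch ignores survey weights and sampling variation.

def consistency_check(reports, alpha):
    """Return (y(F_alpha, F_-alpha), y(F_-alpha, F_alpha), Delta_alpha)."""
    y_from_alpha = 0  # connections reported by respondents aged alpha
    y_to_alpha = 0    # connections to age-alpha siblings reported by everyone else
    for resp_age, sib_ages in reports:
        if resp_age == alpha:
            y_from_alpha += sum(1 for a in sib_ages if a != alpha)
        else:
            y_to_alpha += sum(1 for a in sib_ages if a == alpha)
    return y_from_alpha, y_to_alpha, y_from_alpha - y_to_alpha

# One sibship of three sisters aged 28, 30, and 35, each reporting the other two.
reports = [(30, [28, 35]), (28, [30, 35]), (35, [30, 28])]
print(consistency_check(reports, 30))  # (2, 2, 0): consistent at age 30
```

With fully accurate, reciprocal reports like these, the two totals match exactly and $\Delta_\alpha = 0$; omissions in either direction would push $\Delta_\alpha$ away from 0.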

Figure 5 illustrates this idea by showing internal consistency checks for each age from 15 to 49 from the 2000 Malawi DHS sibling histories (Malawi National Statistical Office and ORC Macro 2001). Each point shows the difference between two independent estimates for the same quantity. If these independent estimates agreed perfectly, they would all lie on the horizontal *y* = 0 line. The confidence intervals capture estimated sampling variation. Most of the confidence intervals include 0, suggesting that reports are internally consistent. However, the figure also suggests some misreporting, particularly between ages 20 and 25.

**Fig. 5**

In general, we expect that such plots or quantitative summaries of internal consistency checks will be a useful way for researchers to assess the face validity of sibling history data. As we describe later, these internal consistency relationships could also form the basis for developing model-based approaches to analyzing sibling histories.

## Recommendations for Practice

### Should the Respondent Be Included in the Denominator?

The sibling survival literature has debated whether respondents should be included in the sibling reports. The concern is that respondents are, by definition, alive; thus, it seems possible that including respondents will bias estimated death rates downward (Masquelier 2013; Trussell and Rodriguez 1990). The estimates published in DHS reports, which use the aggregate visibility estimator, do not include the respondent. Gakidou and King (2006) argued that respondents should be included in sibling reports, but Masquelier (2013) disputed this notion.

In our framework, deciding whether to include respondents in the denominator of the estimator amounts to *defining* the visible and invisible populations. Because sibling history methods estimate a visible death rate, the appropriate question is, Which visible population's death rate is more likely to be a good estimate for the total population death rate, $M_\alpha$?

where *C* is the number of people who contribute exposure and who have visibility of exactly 1 when respondents are included in the denominator; that is, *C* is the number of people in the population who are in group $\alpha$, are eligible to respond to the survey, and have no siblings who would be eligible to respond to the survey. *C* will tend to be bigger, inducing a bigger difference between $M'^V_\alpha$ and $M^V_\alpha$, when (1) there is more overlap between group $\alpha$ and the frame population; and (2) visibilities tend to be smaller, meaning that more people have a visibility of exactly 1 when respondents are included in the denominator.

Equation (17) reveals that the decision to include respondents in reports will move some people into the denominator, but it can never move anyone into the numerator.

A model developed in section G.2 of the online appendix shows that, in a simple situation in which everyone has the same probability of dying, it is most natural to exclude respondents from sibling reports. Under the model, when respondents are excluded, the invisible and visible populations have the same death rate, and that death rate can be estimated from sibling reports. Including respondents, on the other hand, induces a difference between the death rate in the visible and invisible populations (even though every individual faces the same probability of death). Thus, our model suggests that excluding respondents from reports is preferable, at least in the simple world it describes. This conclusion agrees with the earlier modeling work of Trussell and Rodriguez (1990), which also argued that respondents should be excluded from reports.

However, these results are only suggestive: without additional information, there is no way to be certain whether $M^V_\alpha$ or $M'^V_\alpha$ will produce a better approximation to the population death rate, $M_\alpha$, in a given population. The estimates presented here exclude respondents from reports. The full derivations of all of our estimators in sections E and F of the online appendix cover both including and not including the respondent in the denominator. Researchers who wish to include respondents in the denominator of the death rate can find the appropriate estimators there.

### Aggregate Versus Individual Visibility Estimator

Our analysis suggests that both the individual and aggregate visibility estimators have strengths and limitations. Aggregate visibility estimates are typically based on the assumption that the visibility of deaths and the visibility of exposure are equal. The individual visibility estimator avoids this assumption altogether; thus, when no information about adjustment factors is available, we recommend using the individual visibility estimator. In section G.3 of the online appendix, we analyze the difference in aggregate and individual visibility estimates for Malawi, and we show that it is likely that this visibility assumption explains most of the difference between death rate estimates for females in Figure 3.

Table 1 and the simulation study in section J of the online appendix, however, suggest that the individual visibility estimator has slightly higher sampling variance than the aggregate visibility estimator. We hope that future research will continue to systematically compare the aggregate and individual visibility estimators; in the meantime, our view is that the individual visibility estimator's relatively small loss in precision is a price worth paying to avoid having to make assumptions about the visibility of deaths and exposure.

Another disadvantage of the individual visibility estimator comes from comparing the aggregate sensitivity framework (Eq. (14)) with the individual sensitivity framework (Eq. (15)). This comparison reveals that the quantities that would be needed to adjust the individual visibility estimator for reporting error are much more complex than the analogous quantities needed to adjust the aggregate visibility estimator. Thus, if data that can be used to estimate adjustment factors become more widely available, then we expect the relative appeal of the aggregate visibility estimator to increase.

To recap, we recommend excluding respondents from reports. We also recommend using the individual visibility estimator in the absence of any empirical estimates for adjustment factors. However, we expect empirical information about adjustment factors to be much easier to collect for the aggregate visibility estimator. Thus, as empirical information about adjustment factors becomes available, we expect the aggregate visibility estimator to be more attractive. In all cases, we recommend that researchers who produce sibling history-based estimates use the sensitivity frameworks to assess how estimated death rates are affected by the assumptions used to produce them.

## Discussion and Conclusion

We showed how sibling history data can be understood as a type of network reporting. We explained how to derive network-based estimators for adult death rates, how to devise internal consistency checks, and how to understand how sensitive death rate estimates can be to the different conditions on which the estimators rely. We illustrated with an empirical example, based on our freely available R package *siblingsurvival*, and we outlined several recommendations for practitioners who wish to estimate death rates from sibling histories.

We see several important avenues for future research. Methodologically, a deeper comparison between the individual and aggregate visibility estimators would be useful. In particular, analytic results could help better explain our empirical finding that the individual visibility estimator has somewhat higher sampling variance than the aggregate visibility estimator. This analysis could also produce insights that might be useful for designing future data collection.

Our results also suggest next steps for developing models for death rate estimates based on sibling histories. This paper focused on design-based estimators for death rates using sibling histories. Future research can use these design-based estimators as the starting point for developing model-based estimators. For example, the internal consistency checks that we discussed could form the basis for model-based adjustments of sibling reports (see McCormick et al. (2010) for a similar approach that has been developed in the context of aggregate relational data). Our framework also offers a natural way to think about how to pool information across countries and periods.

Collecting more information on sibling reports from settings where gold-standard adult death rates are available is crucial; the Helleringer, Pison, Kanté et al. (2014) study offers a useful template for the type of study design that could help produce more information. This type of study investigates the properties of sibling history reports in small areas where a gold-standard underlying truth about adult death rates is available. Data collected in this way can produce the information needed to estimate the adjustment factors in the individual and aggregate visibility sensitivity relationships. Combined with the framework introduced here, estimates for these adjustment factors could provide a principled way to adjust national-level sibling survival estimates from surveys such as the DHS, relaxing the conditions required for the estimates to be accurate.

Our analysis focused on death rate estimates for the period immediately preceding the survey. In principle, the estimators discussed here could also be used for more distant periods. However, the assumptions—although mathematically the same—presumably get stronger further into the past. Future work could investigate this topic in more depth.

Our framework can also be applied to other demographic estimation techniques related to sibling survival, such as methods in which people report about their parents, spouses, or children (Moultrie et al. 2014; United Nations Population Division 1983). For example, in settings with complex kinship structures that arise from social phenomena, such as polygamy or remarriage, our framework would enable exploring definitions of sibship that go beyond the set of people born to the same mother as the survey respondent. More generally, ideas from the sibling survival literature can be used to develop new methods for collecting data that have the potential to overcome some of the limitations of sibling histories. Feehan et al. (2017) explored how reports about two social network relationships could form the basis for death rate estimates at adult ages (see also Feehan et al. 2016). Future research could continue to explore how to collect reports about more general types of relations, such as broader kin relationships or other types of social networks, with the goal of producing information that is timely and accurate enough to estimate adult death rates. Future research could also explore how sibling reports might be used to estimate quantities other than adult mortality.

## Acknowledgments

For helpful feedback on earlier versions of the manuscript, the authors thank the participants in the 2018 Formal Demography Workshop at UC Berkeley, participants in the 2018 annual meeting of the Population Association of America session “Social capital and older adults in developing countries,” and Stephane Helleringer. The authors also thank the Berkeley Population Center, which is funded by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P2C HD 073964), and the Berkeley Center for the Economics and Demography of Aging, which is funded by the National Institute on Aging (5P30AG012839).

## Notes

^{1}

Respondents are typically asked to consider siblings to be all children born to their mother.

^{2}

The dead person (in gray) cannot be interviewed and thus is not shown on the left side of the bipartite reporting network.

^{3}

The idea behind the notation *v*(⋅, ⋅) is that the first argument is whoever is being reported about, and the second argument is the set of people who make reports; thus, *v*(*j*, *F*) is the number of times person *j* is reported about in a census of members of the frame population, *F*. When we add a bar, we mean the average taken with respect to the first argument. That is, $\bar{v}(A, F)$ is $v(A, F)/|A|$, the average number of times a member of *A* is reported about by *F*.

^{4}

The idea behind the notation *y*(⋅, ⋅) is that the first argument is the set of people reporting, and the second argument is the set of people who are being reported about; thus, $y(F, D_\alpha^V)$ is the total number of deaths in $D_\alpha^V$ reported by people in a census of the frame population *F*.

^{5}

To avoid overcomplicating notation, we use $D_\alpha^V$ to indicate both the number of visible deaths in group α and the set of visible deaths in group α.

^{6}

More generally, if the product of these adjustment factors is 1, the aggregate visibility estimand will be the total death rate. This means that the conditions on which the estimator relies are sufficient but not necessary.

^{7}

In the visibility ratio, $\bar{d}(A, B)$ refers to the true average number of sibship connections between the average member of group *A* and group *B*; $\bar{d}(\cdot, \cdot)$ can differ from $\bar{v}(\cdot, \cdot)$ because $\bar{v}(\cdot, \cdot)$ could be affected by reporting errors. These reporting errors are accounted for in the reporting accuracy factor of the visibility framework. See the online appendix, section E, for details.

^{8}

The average DHS survey that included the sibling history module interviewed 14,224 women. (In DHS surveys, the sibling history module is typically asked only of women.)

^{9}

The *siblingsurvival* package is open source and freely available: https://www.github.com/dfeehan/siblingsurvival

^{10}

The Malawi DHS report does not present estimated sampling errors for adult death rate estimates.

^{11}

We do not recommend that this value, derived from a back-of-the-envelope calculation, be used in practice. We offer it as an illustration of how our framework might be used when measurements of $\gamma(F, D_\alpha)$ and $\gamma(F, N_\alpha)$ are available.

## References

*Proceedings of the Demographic and Health Surveys World Conference* (Vol.