Estimates of adult mortality in countries with limited vital registration (e.g., sub-Saharan Africa) are often derived from information about the survival of a respondent’s siblings. We evaluated the completeness and accuracy of such data through a record linkage study conducted in Bandafassi, located in southeastern Senegal. We linked at the individual level retrospective siblings’ survival histories (SSH) reported by female respondents (n = 268) to prospective mortality data and genealogies collected through a health and demographic surveillance system (HDSS). Respondents often reported inaccurate lists of siblings. Additions to these lists were uncommon, but omissions were frequent: respondents omitted 3.8 % of their live sisters, 9.1 % of their deceased sisters, and 16.6 % of their sisters who had migrated out of the HDSS area. Respondents underestimated the age at death of the siblings they reported during the interview, particularly among siblings who had died at older ages (≥45 years). Restricting SSH data to person-years and events having occurred during a recent reference period reduced list errors but not age and date errors. Overall, SSH data led to a 20 % underestimate of 45q15 relative to HDSS data. Our study suggests new quality improvement strategies for SSH data and demonstrates the potential use of HDSS data for the validation of “unconventional” demographic techniques.
With child mortality declining (Hill et al. 2012; Lozano et al. 2011), an increasing proportion of deaths in low- and middle-income countries (LMICs) will occur at adult ages in the next decades (Mathers and Loncar 2006). In LMICs, adult mortality rates may be high because of HIV/AIDS (Feeney 2001; Stover 2004), maternal mortality (Hogan et al. 2010), injuries (Garrib et al. 2011; Lagarde 2007), noncommunicable diseases (Duthé and Pison 2008; Mathers et al. 2009), or war (Obermeyer et al. 2008). Adult deaths threaten the livelihood of entire families (Yamano and Jayne 2004), prevent the schooling and development of children (Case and Ardington 2006; Evans and Miguel 2007), limit support to the elderly (Ainsworth and Dayton 2003; Kautz et al. 2010), and precipitate the closure of businesses (Chao et al. 2007). Each year, a significant proportion of all development assistance for health is thus allocated to interventions aimed at preventing premature deaths in adulthood (e.g., Goosby 2012). The ultimate impact of these interventions remains largely unknown, however, because few of the target countries have vital registration (VR) systems that permit measuring adult mortality accurately (Setel et al. 2007).
In countries with limited VR, estimates of adult mortality produced by international organizations are often extrapolated from child mortality rates using model life tables (e.g., Murray et al. 2003; United Nations 2010). Other approaches to obtaining national estimates of adult mortality consist of “unconventional” techniques (Hill et al. 2005), which use information on a respondent’s close relatives collected during censuses and surveys (Graham et al. 1989; Timaeus 1991b). Such data include questions on the number of deaths in a respondent’s household over the previous 12 months (Hill et al. 2006) and the survival of a respondent’s spouse, parents, or siblings (Timaeus 1991a, b). Collecting siblings’ survival histories (SSH), in particular, is becoming increasingly popular. Demographic and Health Surveys (DHS) conducted in sub-Saharan and Asian countries now frequently include SSH modules, as did the WHO World Health Survey (Reniers et al. 2011).
In SSH, respondents are asked to list all siblings born to their biological mother by birth order and then to provide information about each sibling’s sex, survival status, and current age or age at death, as well as time elapsed since death if the sibling is deceased. These questions produce a “siblings’ data set,” from which adult mortality rates can be estimated directly because dates of birth and (possibly) death can be calculated for each reported sibling.
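The derivation of dates from SSH answers can be sketched as follows. This is a minimal illustration, not the procedure used by DHS or by this study: the `SiblingReport` class, its field names, and the whole-year arithmetic are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SiblingReport:
    """One row of a hypothetical siblings' data set (field names are illustrative)."""
    sex: str
    alive: bool
    current_age: Optional[int] = None        # reported only if the sibling is alive
    age_at_death: Optional[int] = None       # reported only if the sibling is deceased
    years_since_death: Optional[int] = None  # reported only if the sibling is deceased


def derive_years(report: SiblingReport, survey_year: int) -> Tuple[int, Optional[int]]:
    """Return (birth_year, death_year); death_year is None for a live sibling."""
    if report.alive:
        return survey_year - report.current_age, None
    death_year = survey_year - report.years_since_death
    birth_year = death_year - report.age_at_death
    return birth_year, death_year


living = SiblingReport(sex="F", alive=True, current_age=30)
deceased = SiblingReport(sex="F", alive=False, age_at_death=40, years_since_death=5)
print(derive_years(living, 2010))
print(derive_years(deceased, 2010))
```

Because the birth year of a deceased sibling is derived from two reported quantities, an error in either the age at death or the time since death propagates to the calculated dates.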
Sample Selection Bias and Reporting Errors in Siblings’ Survival Histories
SSH are relatively inexpensive to collect, but they suffer from several potential drawbacks. First, because adult deaths are rare events compared with births or child deaths, SSH may still yield sample sizes that are too small to produce reliable estimates (Hill et al. 2006; Timaeus and Jasseh 2004).
Second, SSH are prone to sample selection bias (Trussell and Rodriguez 1990) because SSH surveys are based on random samples of individuals who have survived until the survey date rather than on random samples of individuals who were at risk of dying during the n years prior to the survey (Gakidou and King 2006). If adult survival is positively correlated with the size of one’s sibship, then members of high-mortality sibships (e.g., those with zero survivors) will be underrepresented in SSH and adult mortality rates will be underestimated (Trussell and Rodriguez 1990). Analytical corrections for sample selection bias have been proposed (Gakidou and King 2006), but they may artificially inflate estimates of 45q15 (Masquelier 2013).
Third, SSH estimates may be affected by reporting errors. These occur when SSH reported by survey respondents differ from the “true” survival experience of their maternal siblings. Demographers have generally equated reporting errors in SSH with omissions of deceased siblings by survey respondents (e.g., Obermeyer et al. 2010), but reporting errors may be more complex. We distinguish four main types of reporting errors.
List errors occur when the reported list of maternal siblings differs from the true sibship of a respondent. Respondents may not list all their true maternal siblings (omission) or may include others who are not their true maternal siblings (addition). Omissions may happen, for example, when a respondent has not known one of her siblings, when she has not seen one of her siblings for an extended period of time (possibly because of migration), or when a sibling died long ago. Omissions may also happen because of interviewer behavior. SSH are highly repetitive questionnaires, with the same set of 5–10 questions being asked about each of sometimes more than 10 siblings. Thus, to save time, interviewers may be inclined to omit a subset of the respondent’s siblings. Additions to siblings’ lists may occur in areas where extended families coreside or where child fostering and orphanhood are widespread. In such settings, respondents are likely to have paternal half-siblings or cousins, whom they may erroneously include in SSH.
Vital status errors occur when a respondent reports that one of her live siblings is dead at the time of the SSH survey, or when she reports that a deceased sibling is alive at the time of the survey. This may happen, for example, if a respondent has not been in touch with one of her siblings for some time or is not aware of health conditions or accidents having affected that sibling.
Age errors refer to respondents’ inaccurate reporting of the current age or age at death of one of their true maternal siblings.
Date errors refer to respondents’ inaccurate reporting of the year in which their sibling(s) died. They are relevant in SSH analysis because analysts often restrict the siblings’ data set to reports of deaths and person-years occurring during a reference period of a few years before the survey (e.g., 6–8 years). This practice has been adopted because SSH data are believed to be more complete in that reference period (Timaeus and Jasseh 2004). Date errors are important because data on dates of siblings’ death have been used to investigate time trends in adult mortality (de Walque and Filmer 2013), estimate the effect of short-term mortality crises (e.g., de Walque and Verwimp 2010), and evaluate the benefits of public programs (Bendavid et al. 2012).
Potential Effects of Reporting Errors on Adult Mortality Estimates
Reporting errors have potentially complex effects on SSH estimates of adult mortality. List errors affect estimates when the likelihood of omission/addition depends on the vital status of the sibling (Timaeus and Jasseh 2004). For example, if respondents are more likely to omit their siblings who died at adult ages than their live siblings, adult mortality rates will be underestimated. The direction of bias resulting from vital status errors depends on whether respondents are more likely to report deceased siblings as alive, or vice versa. Compared with list errors, however, vital status errors likely have a disproportionate impact on mortality estimates because they affect only the numerator of mortality rates.
Errors about the current age of live siblings may affect the denominator of mortality rates. For example, if respondents underestimate the current ages of their siblings, then fewer person-years lived among these siblings will be counted, and (all else being equal) adult mortality rates will be overestimated.
The effects of date and age errors about a sibling’s death depend on (1) the magnitude of these errors, and (2) the width of the age interval and the length of the reference period used in calculating adult mortality rates. We illustrate this point for calculations of 45M15 over the eight years prior to the survey with the Lexis diagram in Fig. 1. Sibling deaths can be misplaced along the vertical axis (age error; e.g., D1 and D2), along the horizontal axis (date error; e.g., D3 and D4), or along both axes (e.g., D5) of the Lexis diagram. When misreported deaths remain within the reference area of the diagram (e.g., D1SSH,1 and D3SSH,1), age and date errors affect the count of person-years lived by deceased siblings (i.e., the denominator of 45M15) but not the number of events (i.e., the numerator of 45M15). In Fig. 1, for example, SSH data about D1 and D3 yield fewer person-years than expected but not fewer deaths. On the other hand, when misreported deaths are shifted outside the reference area (e.g., D1SSH,2 and D3SSH,2), age and date errors also affect the numerator of adult mortality rates. Deaths, which are shifted outside the reference area by age and/or date errors, can be “replaced” by misreporting of more-distant deaths (e.g., D4, D5) or deaths having occurred outside of the adult age range (e.g., D2, D5).
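The reference-area logic described above can be sketched in code. The function below is an illustrative assumption rather than the authors' procedure: it treats dates as decimal years, takes the 45M15 age range as exact ages 15 to 60, and uses hypothetical birth and death dates rather than the cases plotted in Fig. 1.

```python
from typing import Optional, Tuple


def exposure_and_death(birth: float, death: Optional[float], survey: float,
                       ref_years: float = 8.0, age_lo: float = 15.0,
                       age_hi: float = 60.0) -> Tuple[float, int]:
    """Person-years and deaths contributed to the reference area of a Lexis diagram.

    The reference area is the rectangle bounded by the reference period
    [survey - ref_years, survey) and the age range [age_lo, age_hi).
    """
    period_start = survey - ref_years
    # Exposure starts at the later of the period start and the age_lo birthday.
    start = max(period_start, birth + age_lo)
    # Exposure ends at death, at the survey, or at the age_hi birthday.
    end_of_follow_up = survey if death is None else min(death, survey)
    end = min(end_of_follow_up, birth + age_hi)
    person_years = max(0.0, end - start)
    death_counted = (
        death is not None
        and period_start <= death < survey
        and age_lo <= death - birth < age_hi
    )
    return person_years, int(death_counted)


# Hypothetical sister born in 1960 who died in 2006, with a survey in 2010.
print(exposure_and_death(1960.0, 2006.0, 2010.0))  # accurately reported death
print(exposure_and_death(1960.0, 2003.0, 2010.0))  # date error, still inside the area
print(exposure_and_death(1960.0, 2001.0, 2010.0))  # date error, shifted outside the area
```

In this sketch, the second call changes only the denominator (fewer person-years, same death count), whereas the third call also removes the death from the numerator, mirroring the two situations distinguished in the text.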
Current Diagnostics of Reporting Errors in Siblings’ Survival Histories
To detect reporting errors, demographers perform internal consistency checks on siblings’ data sets (Stanton et al. 2000). These verify that reported SSH data replicate trends in fertility and mortality known from other sources. For example, because the number of siblings of a respondent corresponds to her mother’s lifetime fertility, older respondents should report larger sibships than younger respondents in settings where fertility has been declining. Checks also include testing for imbalances in the sex ratio at birth (SRB) in reported sibships. In the 2010 Senegal DHS, the SRB was 108, thus suggesting a possible underreporting of sisters (ANSD and ICF 2012).
Other diagnostics have consisted of external comparisons between a test SSH data set—that is, whose quality is to be evaluated—and an independent reference data set. SSH data have been compared with model life tables, VR, census data on household deaths in the past 12 months, and orphanhood data (e.g., Feeney 2001). Most recently, several studies have compared the reporting of deaths in consecutive SSH surveys conducted several years apart in the same country (e.g., Obermeyer et al. 2010; Reniers et al. 2011). If SSH are collected in, say, 2005 and 2010, then adult deaths having occurred in 2004 should be reported both by respondents in the 2005 and the 2010 surveys, but recall should be more complete in the 2005 data set. Using this strategy, Reniers et al. (2011) estimated that the recall of deaths having occurred 6–9 years before a survey was only 83 % complete. Obermeyer et al. (2010) incorporated this approach into a regression-based method aimed at correcting estimates of 45q15 (the corrected sibling survival method, or CSS).
These diagnostics have important limitations, however. First, they frequently assume that respondents completely recall the recent (i.e., less than three years before a survey) events that have affected their siblings. This is the case, for example, of the CSS method, which assumes that a respondent’s recall of recent deaths is complete but subsequently declines linearly with time between the death and the survey. Studies in cognitive psychology suggest that this is a particularly strong assumption: even the most recent events can be forgotten, whereas more-distant events may be more easily recalled (Wixted 2004). Furthermore, some omissions of deaths may not be due to forgetting. Respondents may deliberately neglect to report a death, for example, if it is due to a stigmatized cause, such as AIDS. Other omissions may be due to interviewers’ behaviors (as described earlier). SSH may thus also be incomplete for the most recent reporting periods, and current diagnostics may underestimate the extent of reporting errors in SSH.
Second, current diagnostics do not control for differences in sample universe between reference and test data sets. As a result, they can identify only the joint effects of sample selection bias and reporting errors on SSH estimates. For example, a recent adult death may not be reported on a census question about household deaths in the past 12 months if the death precipitated household dissolution. It may, however, be reported in a SSH survey if the deceased has at least one surviving adult sibling residing elsewhere. In this context, differences in estimates of 45q15 between the reference (census) data set and the test SSH data set can be attributed to (1) the incomplete coverage of both the reference and test data sets, and/or (2) reporting errors in both the reference and test data sets.
Finally, current diagnostics include only aggregate comparisons of adult mortality indicators, which do not permit identifying which type of reporting errors is most common in SSH. This is the case because different types of reporting errors have potentially similar and/or offsetting effects on adult mortality rates. This is an important limitation because a detailed understanding of the different types of reporting errors that affect SSH is required to guide improvements in data collection and/or develop analytical adjustment strategies.
An Alternative Approach to Diagnosing Reporting Errors
A more precise approach to diagnosing reporting errors requires linking the SSH report of the survival of a particular sibling to a record of that same sibling’s survival in a high-quality reference data set (Fig. 2). Such record linkage studies have been used, for example, to estimate the quality of old-age mortality data collected in the United States (e.g., Hill et al. 2000; Preston et al. 1996).
For list errors in SSH to be ascertained in a record linkage study, individuals must be grouped into sibships in the reference data set. Unfortunately, most available data sets on adult mortality do not permit constituting reasonably complete lists of the maternal siblings of a SSH respondent. VR and census data, for example, are rarely available at the individual level because of confidentiality issues. When available, VR data include only the name of the mother on the birth certificate, which is not sufficient to establish a sibship list: a fairly large number of individuals who are not siblings may have a mother with the same name. Census/survey data on recent household deaths do not collect information on the biological mother of recently deceased household members. They thus permit identifying siblings only if they reside in the same household as their live mother at the time of the census/survey. Even then, reports of household relationships may be affected by “adoption bias”: that is, the caretaker of a household member may be mistakenly classified as his/her “mother” by the household informant.
Using Health and Demographic Surveillance Data as Reference
Health and demographic surveillance systems (HDSS) constitute another potential source of information about adult mortality that could be used to evaluate SSH. HDSS consist of monitoring over time an entire population located in a small geographic area (Pison 2005). They include a baseline census, followed by continuous registration of demographic events (i.e., births, deaths, marriages, migrations) affecting this population. Event registration happens yearly or more frequently. HDSS interviewers visit every household and ask household informants to provide information on recent demographic events among household members (e.g., Oduro et al. 2012). In some HDSS, data collection relies on key informants at the community level (e.g., Jahn et al. 2007): a small number of village residents provide information on the demographic events that have occurred among households in their vicinity.
HDSS were initiated to remedy the lack of VR data in a number of sub-Saharan and South Asian countries (Sankoh et al. 2006). HDSS estimates of adult mortality may be affected by many factors, including a predominantly rural setting (Sankoh and Byass 2012), small sample sizes, and selective in- and out-migrations (Clark et al. 2007). HDSS nonetheless can help address limitations of other reference data sets, for two reasons. First, the biological mother of each member of the population is often identified in HDSS data sets: that is, she is attributed a unique ID number. This is accomplished at the time of the initial census, at the time of birth, or when an individual migrates into the HDSS population. Sibship lists are then formed by looking up all the individual members of the HDSS population who have the same mother ID number. Second, HDSS data provide potentially more accurate information on vital status, ages, and dates of death of siblings than other data sources: they are collected prospectively, whereas other data sets are primarily retrospective.
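The lookup by mother ID described above can be sketched as a simple grouping operation. The record layout, the ID values, and the function names below are hypothetical; actual HDSS databases differ in structure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Each record is (person_id, mother_id); mother_id is None when the mother is unknown.
Record = Tuple[str, Optional[str]]


def build_sibships(records: Sequence[Record]) -> Dict[str, List[str]]:
    """Group population members by the ID of their biological mother."""
    sibships: Dict[str, List[str]] = defaultdict(list)
    for person_id, mother_id in records:
        if mother_id is not None:  # members with an unknown mother cannot be grouped
            sibships[mother_id].append(person_id)
    return dict(sibships)


def maternal_siblings(person_id: str, records: Sequence[Record]) -> List[str]:
    """Return the maternal siblings of one person, excluding the person herself."""
    mother_id = dict(records).get(person_id)
    if mother_id is None:
        return []
    return [p for p in build_sibships(records)[mother_id] if p != person_id]


# Hypothetical population register.
population = [("P1", "M1"), ("P2", "M1"), ("P3", "M2"), ("P4", None), ("P5", "M1")]
print(maternal_siblings("P1", population))
```

Grouping on a unique mother ID, rather than on the mother's name, avoids merging unrelated individuals whose mothers happen to share a name.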
In this study, we used data from the Bandafassi HDSS (Senegal) as a reference data set. We identified sibships included in that reference data set that could also be included in a SSH survey (hereafter, “reference sibships”). We then conducted a SSH survey (our “test data set”) among the adult female members of these reference sibships. After SSH data collection, we linked the reference and test data sets at the sibling level (Fig. 2). Finally, we measured the prevalence and correlates of each type of reporting error in the SSH data set and then compared measures of adult survival obtained from SSH and HDSS data.
The Bandafassi Health and Demographic Surveillance System
The Bandafassi HDSS is located in southeastern Senegal (Online Resource 1, section 1). With 12,770 residents (all rural) as of August 1, 2010, the area is home to three ethnic groups (the Bedik, the Mandinka, and the Fulani) who largely live in separate villages. In the period 2000–2006, the total fertility rate was high (at about seven children per woman). Life expectancy at birth was low (<55 years) despite declines in child mortality (Desgrées du Lou et al. 1995; Kanté and Pison 2010). Out-migrations from the study area are frequent: adults often migrate to urban areas of Senegal or abroad for work. In-migrations are less common and happen primarily at the time of marriage (e.g., when a woman from a neighboring village marries a male resident of the HDSS area).
The Bandafassi HDSS started in 1970 with a census of the local population. This census was followed by detailed oral genealogies of each population member (Pison 1987), which permit identifying with precision the maternal siblings of individuals who were already born at the time of the census. During the census, innovative procedures were used to determine the age of population members whose exact year of birth was unknown (Pison 1980; Pison and Langaney 1985). Since the initial census, key informants in each village have been visited every year. During every visit, the lists of residents are updated; and births, marriages, migrations, and deaths that occurred since the last visit are registered. For births and in-migrations, information on the identity of the child’s or the immigrant’s biological parents is collected from the key informants. This information is used to update the list of maternal siblings of every population member.
The Bandafassi HDSS provides reference information on (1) lists of maternal siblings of a respondent, (2) age and vital status of each sibling at the time of the SSH survey, and (3) age and date of death of deceased siblings. We classified each sibling as either a child (i.e., individuals who have never reached age 15) or an adult (i.e., individuals who have ever reached age 15). The age at death of adult siblings was classified in three groups (<25 years old, 25–44 years old, and ≥45 years old). Deaths were classified as having occurred during an eight-year reference period (2002–2009) or before that reference period. HDSS data also provide information on characteristics of siblings and sibships used in multivariate analyses (see the Results section), including a categorical variable indicating whether a sibling was still alive, dead, or lost to follow-up at the last HDSS visit; a dummy variable indicating whether a sibling has the same father as the SSH respondent; and a categorical variable identifying the ethnic group of the sibship.
The 2010 SSH Survey
In 2010, we conducted a SSH survey among the population of the Bandafassi HDSS. Our key objective was to estimate the extent of reporting errors in SSH. A secondary objective entailed testing whether reported adult deaths were correctly classified by SSH respondents as pregnancy-related (Helleringer et al. 2013). Because SSH are most commonly collected among women during DHS, we included only female respondents. Unlike most DHS, which include respondents aged 15–49, we also included women aged 50–59. The sample of the SSH survey was not designed to be representative of the HDSS population but rather to control for differences in sampling universe between reference and test data sets. We thus reviewed HDSS data and identified sibships in which at least one known sister was alive at the time of the SSH survey and met eligibility criteria (reference sibships). To ensure adequate numbers of adult deaths in reported SSH, we focused on reference sibships in which at least one sister had died at age 15–59 in the Bandafassi HDSS since 1980.
We then drew a list of all the female members of these reference sibships who were born between 1950 and 1995 and were potentially still alive at the time of the SSH survey. Among those, some were still residing in the Bandafassi HDSS area, and others had emigrated. All eligible HDSS residents were contacted for participation. Because recall of SSH may be affected by separation from one’s siblings after migration, we also traced and interviewed emigrants who had moved to Tambacounda or Kédougou, the two cities that are the largest and closest to Bandafassi. Migrants who had moved to other regions of Senegal were not contacted.
We found 573 deaths among women aged 15–59 in Bandafassi since 1980. Among those, we had no information on the mother of 40 deceased women. For 256 other deaths, there was no known eligible sister, according to the HDSS. Finally, the eligible sisters of 80 women who died between 1980 and 2010 could not be interviewed because they resided in regions not covered by the survey, they had temporarily migrated to small agricultural hamlets around Bandafassi (which were inaccessible by car because of floods), or they refused to participate in the SSH survey (n = 2). We interviewed at least one of the sisters of 197 women who had died at adult ages between 1980 and 2010. In some instances, multiple members of the same sibship were interviewed. In total, there were 268 respondents in the SSH survey. Further details on the constitution of the SSH sample and survey response rates are available in sections 2 and 3, respectively, of Online Resource 1.
We used a SSH questionnaire similar to the 2005 Senegal DHS instrument. It provides test data on lists of maternal siblings as well as the vital status, ages, and dates of death of these siblings. It also provides information on respondent characteristics used in multivariate analyses of reporting errors. These include a dummy variable indicating whether the respondent was interviewed in a migratory situation (i.e., in Tambacounda or Kédougou); a categorical variable indicating the age group of the respondent (<25 years old, 25–34 years old, 35–44 years old, and ≥45 years old); and a dummy variable indicating whether the respondent had ever attended school.
To link SSH reports of a sibling’s survival to HDSS records of the survival of that same sibling, the basic SSH instrument was first augmented by questions about the last known residence of the sibling (including village and name of the head of compound), the name of the sibling’s spouse(s), and other names/nicknames the sibling may be known by. To maximize comparability with the standard SSH instrument, these questions were asked for each sibling only after completion of the entire standard SSH module. Second, each SSH report was independently linked to the HDSS by two members of the study team. We used HDSS information only about the expected sibship of a given respondent, including the names, sex, and residence of each sibling to establish linkages. We did not use HDSS information on vital status, age, and date of death because they constituted study outcomes. Finally, we assessed the concordance of the two independently assigned linkages. For discordant linkages, or linkages that were made by only one of the two team members, the linkage was reviewed by both team members and another investigator. Discrepancies were resolved. Full linkage results are presented in Online Resource 1, section 4. For simplicity, we call “matches” the SSH reports of siblings that were linked to a HDSS record for that sibling; we call “additions” the SSH reports of siblings that could not be linked to a HDSS record; and we call “omissions” the siblings included in the HDSS who were not reported during the SSH survey (Fig. 2).
The characteristics of SSH survey respondents are presented in Table 1. Forty-five respondents (16.8 %) were migrants. Most respondents had never been to school (81.0 %). Two-thirds of respondents belonged to sibships in which more than one sister was interviewed. Basic indicators of data quality and internal consistency checks for SSH are available in section 5 of Online Resource 1.
How Complete Are the Reference and Test Data Sets?
Table 2 presents the extent of overlap between the sibship lists obtained in SSH and HDSS data. A total of 993 unique sisters of all ages are listed in either the HDSS or SSH data sets, with 80.1 % appearing in both data sets, 10.8 % appearing only in HDSS, and 9.1 % appearing only in SSH. Only 62.9 % of children appeared in both data sets versus 88.2 % among adult sisters.
Using capture-recapture techniques, we estimated the completeness of each data source for the reference sibships (Amstrup et al. 2005; Sutherland 2006). We considered that each sibling was first marked through HDSS and was then recaptured by SSH. We assumed that being marked in the HDSS data set did not affect the probability that a sibling would be recaptured during the SSH survey. This yielded estimates of completeness of 89.8 % for HDSS and 88.1 % for SSH. However, we estimated that HDSS was 96.3 % complete at adult ages versus 91.3 % for SSH. Among sisters deceased at adult ages, the HDSS was virtually complete (99.1 %).
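The all-ages completeness figures can be reproduced from the overlap shares reported in Table 2 with a standard two-source (Lincoln-Petersen) estimator. The sketch below assumes that this is the estimator applied; the text does not name it, and the function is illustrative only. Shares are used instead of counts because the underlying counts are not given in the text.

```python
from typing import Tuple


def two_source_completeness(both: float, only_first: float,
                            only_second: float) -> Tuple[float, float]:
    """Lincoln-Petersen completeness of two independent lists.

    both        -- share (or count) of individuals found in both sources
    only_first  -- share (or count) found only in the first source
    only_second -- share (or count) found only in the second source
    """
    n_first = both + only_first
    n_second = both + only_second
    # Estimated total population, including individuals missed by both sources.
    n_hat = n_first * n_second / both
    return n_first / n_hat, n_second / n_hat


# Shares of the 993 unique sisters: both sources, HDSS only, SSH only.
hdss, ssh = two_source_completeness(both=80.1, only_first=10.8, only_second=9.1)
print(f"HDSS completeness: {100 * hdss:.1f} %")
print(f"SSH completeness: {100 * ssh:.1f} %")
```

Under the independence assumption stated above, the completeness of one source reduces to the share of the other source's records that it also captured, which is why the HDSS figure equals 80.1 / (80.1 + 9.1).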
List Errors in SSH Data
SSH respondents reported 626 adult sisters, of whom 26 (4.2 %) could not be linked to the HDSS record of a woman from the respondent’s maternal sibship (i.e., additions); 5.2 % of adult sisters reported as alive were additions versus approximately 1 % of adult sisters reported as deceased (p < .01).
Results of the multivariate analysis of omissions are presented in the rightmost column of Table 3. Even after we controlled for characteristics of sibships and respondents, sisters who were deceased at the time of the survey remained more likely to be omitted than live sisters (OR = 2.95), as were sisters lost to HDSS follow-up (OR = 11.6). Omissions were also more common among respondents interviewed in migratory situations (OR = 5.18), among younger respondents (<25 years old), and among sibships of the Bedik ethnic group (OR = 7.28). Finally, respondents were less likely to omit sisters who had the same father as them (OR = 0.19).
Omissions were less common among deaths having occurred during the reference period (2002–2009) than among deaths having occurred before the reference period (Fig. 3a; p < .01). Omissions of deaths were also associated with age at death (Fig. 3b; p = .05): they were most common among sisters who had died before age 25 (10.8 %) or at ages 45 and older (18.8 %).
Vital Status Errors in SSH Data
Among matches between the HDSS and SSH data sets, only one adult sister was reported as alive in SSH but classified as deceased in HDSS, yielding a sensitivity of 99.6 % for SSH reports of vital status. There were no SSH reports of death among adult sisters who were alive at the time of the last HDSS visit (specificity = 100 %).
Age Errors in SSH Data
SSH respondents underestimated the ages of their adult sisters. Among the 252 live matches, the average difference between the current age recorded in the HDSS and the SSH-reported value was 1.74 years (standard deviation (SD) = 4.6). Among the 264 deceased matches, the average difference was 3.79 years (SD = 7.93). The distributions of age errors were overdispersed because of a small number of large errors, possibly resulting from linkage or interviewer errors.
We analyzed the determinants of age errors using a multilevel model comparable to Eq. (1). The dependent variable was a continuous variable representing the difference between the HDSS-observed and SSH-reported values of the current age/age at death. In the model for errors in age at death, we also included a dummy variable taking a value of 1 if the death had occurred during the reference period (2002–2009). We considered only two levels in these models—siblings and sibships—because only a very small number of respondents had multiple adult female deaths in their sibships. The results from these analyses are reported in Table 4.
The largest age errors concerned the oldest adult sisters of respondents: those older than 45 (β = 6.28 for deceased sisters, β = 4.22 for living sisters). Migrants underestimated their live sisters’ age more frequently than residents of the HDSS (β = 2.04). Older respondents overestimated the age of their living sisters (e.g., β = –2.37 for respondents ≥45) but not the age at death of their deceased sisters. Finally, errors in reported age at death also increased among deaths having occurred during the reference period (β = 1.86).
Estimates of Adult Survival in Reference Sibships
We compared life-table estimates of adult survival among the reference sibships according to both HDSS and SSH data (see Fig. 4). We found that differences in estimates of adult survival between HDSS and SSH emerged primarily after age 50. For example, 25q15 was 0.50 in HDSS versus 0.47 in SSH; 35q15 was 0.61 in HDSS versus 0.59 in SSH; but 45q15 was 0.78 in HDSS versus 0.63 according to SSH. We further detail these patterns using Cox regressions in section 6 of Online Resource 1. The estimates presented in Fig. 4 do not consider the potential impact of date errors because there were too few adult deaths in recent years to restrict estimates of 45q15 to a reference period.
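The size of these gaps is easier to compare on a relative scale. The short calculation below is an illustrative sketch using the probabilities quoted above; the function name is introduced here and is not part of the original analysis.

```python
def relative_underestimate(reference: float, test: float) -> float:
    """Shortfall of the test estimate, expressed as a fraction of the reference estimate."""
    return (reference - test) / reference


# (HDSS estimate, SSH estimate) for each probability of dying, as quoted in the text.
estimates = {"25q15": (0.50, 0.47), "35q15": (0.61, 0.59), "45q15": (0.78, 0.63)}

gaps = {
    label: round(100 * relative_underestimate(hdss, ssh), 1)
    for label, (hdss, ssh) in estimates.items()
}
for label, gap in gaps.items():
    print(f"{label}: SSH is {gap} % below HDSS")
```

The shortfall for 45q15 comes out at about 19 %, in line with the roughly 20 % underestimate reported in the abstract, whereas the shortfalls for 25q15 and 35q15 are far smaller.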
Date Errors in SSH Data
To explore the potential effects of date errors, we first measured whether respondents correctly classified their sister’s death as having occurred before/during an eight-year reference period. This was not the case: only 65 of 85 deaths recorded during the reference period by the HDSS were reported as having occurred during the reference period by SSH, whereas 20 of the 124 deaths recorded before the reference period by the HDSS were reported as having occurred during the reference period by SSH. This yielded estimates of 76.5 % and 83.9 % for the sensitivity and specificity of SSH date reports, respectively (see notes 20 and 21).
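The two percentages follow from the counts given above. A minimal sketch of the arithmetic (variable names are ours; the HDSS classification is treated as the reference):

```python
# Counts reported in the text.
hdss_during = 85             # deaths the HDSS dates to the reference period
ssh_during_given_during = 65  # of those, dated to the reference period by SSH
hdss_before = 124            # deaths the HDSS dates before the reference period
ssh_during_given_before = 20  # of those, wrongly dated to the reference period by SSH

# Sensitivity: share of reference-period deaths that SSH also places in the period.
sensitivity = ssh_during_given_during / hdss_during

# Specificity: share of earlier deaths that SSH also places before the period.
specificity = (hdss_before - ssh_during_given_before) / hdss_before

print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")
```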
We then investigated the correlates of date errors, using a multilevel model similar to the model used to investigate age errors. The dependent variable was a continuous variable measuring the difference between HDSS-observed and SSH-reported duration between survey and death (Table 5). Respondents made larger date errors for deaths observed during the reference period than for deaths having occurred before the reference period ( = –3.36). On average, they reported that deaths during the reference period occurred earlier than their HDSS-observed date. To account for missing data on dates of deaths (21.3 % of female adult deaths had missing date of death; Online Resource 1, section 5), we conducted a sensitivity analysis using Heckman sample selection models (Heckman 1979). In these models, we used a set of dummy variables identifying the interviewer having conducted the SSH interview as instruments (for a similar approach to accounting for missing data in DHS data, see Barnighausen et al. 2011). Results were similar to those obtained in a model based solely on the complete case data. Further details about the Heckman models are available in section 7 of Online Resource 1.
In this article, we improved on previous assessments of reporting errors in SSH through an innovative record linkage study with HDSS data. This permitted isolating reporting errors from sample selection biases; it also fostered a better understanding of the diversity and complexity of reporting errors in SSH and their implications for estimates of adult survival.
Respondents frequently omitted some of their adult sisters in SSH. They also added others who were not their maternal siblings, but this was less common. Such list errors were associated with the vital status of siblings: omitted sisters were more likely to be deceased or to have migrated away from the HDSS area, whereas added sisters were more likely to be alive at the time of the survey. Age and date errors were common: respondents underreported the current age of their living sisters and the age at death of their deceased sisters; they also misreported the date of death of their sisters by several years. On the other hand, virtually no respondent reported their living sisters as deceased and vice versa.
In our reference sibships in Bandafassi, SSH yielded an estimate of 45q15 that was 20 % lower than the HDSS estimate (Fig. 4). The differences between SSH and HDSS estimates emerged largely above age 50. Prior to that age, SSH and HDSS yielded remarkably similar estimates of adult survival (e.g., 25q15 ≈ 0.5 according to both SSH and HDSS data). This congruence between SSH and HDSS below age 50, however, does not imply that respondents accurately report the survival of their younger siblings: list errors and age errors were common even for siblings younger than 50 (e.g., see Fig. 3b). Rather, it is the result of a complex interaction between different types of reporting errors. For siblings below age 50, list errors are offset by age errors: the omitted deaths are replaced with deaths that have in fact occurred at older ages but are shifted downward (in age) by the respondent. Above age 50, however, this substitution does not operate because (1) list errors are more common (Fig. 3b), and (2) there are too few deaths above age 60 to replace the omitted deaths in the age group 50–59.
There were several unexpected findings. First, the quality of SSH data was not uniformly better during a recent reference period. Only list errors were reduced (Fig. 3a), whereas age and especially date errors were actually more common (Tables 4 and 5). Because of date errors, a number of deaths were shifted in and out of the reference period. This has potentially troublesome consequences for estimates of levels of, and trends in, adult mortality. In contexts of adult mortality declines (where most adult deaths in respondents’ sibships will have occurred prior to the reference period), date errors as observed in our study will lead to overestimates of 45q15 in the reference period (Fig. 5). In contexts of increasing adult mortality rates (e.g., HIV epidemics, wars), the opposite is true. This may limit the validity of program evaluations based on SSH data and a difference-in-differences econometric framework (Bendavid et al. 2012). In such evaluations, deaths must be precisely dated as having occurred before or after the implementation of a public health program.
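The direction of this bias can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, that the misclassification rates observed in Bandafassi (65 of 85 and 20 of 124) would apply unchanged in other settings, and it tracks only the number of deaths dated to the reference period, not person-years of exposure. The scenario counts labeled "hypothetical" are ours and do not come from any data set.

```python
# Misclassification rates observed in this study.
SENSITIVITY = 65 / 85     # recent deaths correctly dated as recent
SPECIFICITY = 104 / 124   # earlier deaths correctly dated as earlier


def reported_in_period(true_during, true_before,
                       sensitivity=SENSITIVITY, specificity=SPECIFICITY):
    """Expected number of deaths that SSH would date to the reference period."""
    kept = sensitivity * true_during               # recent deaths dated as recent
    shifted_in = (1 - specificity) * true_before   # earlier deaths dated as recent
    return kept + shifted_in


# Counts observed in Bandafassi: shifts in and out cancel (65 + 20 = 85).
observed = reported_in_period(true_during=85, true_before=124)

# Hypothetical declining-mortality setting: many deaths before the period.
declining = reported_in_period(true_during=100, true_before=300)

# Hypothetical rising-mortality setting: few deaths before the period.
increasing = reported_in_period(true_during=100, true_before=50)

print(f"observed: {observed:.1f} reported vs. 85 true")
print(f"declining mortality: {declining:.1f} reported vs. 100 true")
print(f"increasing mortality: {increasing:.1f} reported vs. 100 true")
```

Under these assumptions, the number of deaths attributed to the reference period is inflated when earlier deaths are numerous and deflated when they are scarce, which is the asymmetry described above.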
Second, contrary to expectations (e.g., Stanton et al. 2000), older respondents provided more complete sibship lists than younger respondents (Table 3) but made only slightly larger age errors about their living sisters than other respondents (Table 4). Our findings thus support the call by Obermeyer et al. (2010) to extend the collection of SSH data to older age groups during the DHS and other surveys. This would not deteriorate the quality of data on SSH, but it would increase available sample size and possibly allow using SSH to estimate mortality at older ages (e.g., >60 years old).
There are several important limitations to our study. First, some of the discrepancies between HDSS and SSH data observed in our study may be attributable to limitations of the HDSS data. Key informants may not be aware of specific events at the time of the annual visit and may thus report them with delays and imprecisions. Loss to follow-up in the HDSS may be associated with mortality if, for example, healthier individuals leave the HDSS area to look for work in cities. The association between list errors and vital status may thus not be as strong as we estimated in Table 3. Finally, the procedures used to estimate the age of population members born prior to the initial census may also be imprecise. This may explain why age errors are more prevalent among siblings aged 45 and older (i.e., born prior to the initial census).
Second, the generalizability of our analyses may be limited because they concern only one small western African population. Patterns of reporting errors may be different in other populations where SSH are used to estimate adult mortality rates—for example, in eastern African or south Asian countries. SSH may be more accurate in settings with uxorilocal marriage, where sisters coreside for much longer periods of time than in Bandafassi (where marriage is virilocal). The reporting of SSH may also be affected by local attitudes toward death or specific causes of death. Deaths resulting from HIV/AIDS, in particular, may be underreported because of stigma.
The impact of a given pattern of reporting errors on estimates of adult mortality may also vary across epidemiological contexts. For example, because respondents are more likely to omit deaths occurring at young (i.e., <25 years) and older (i.e., >45 years) adult ages (see Fig. 3a), biases in SSH estimates may be larger in settings where causes of death that follow a U-shaped age pattern (e.g., maternal mortality) are prevalent. Comparative studies are thus needed to provide a better understanding of reporting under a variety of sociocultural and epidemiological contexts.
Third, our SSH survey may not have been representative of typical SSH surveys. It deviated from the DHS study protocol in a number of respects. We included respondents older than 50, whereas most DHS focus on respondents aged 15–49. Robustness tests (Online Resource 1, section 8) indicate that this inclusion does not alter our main findings. We also included only a few (n = 10) sociodemographic questions prior to the SSH module, whereas the DHS include numerous other modules (e.g., fertility history and sexual behaviors). Respondent and interviewer fatigue may thus be significantly more common in the DHS than in our SSH survey. Finally, after being studied for such an extended period of time, the HDSS population may have gained a better knowledge of ages and dates of events than similar neighboring populations not undergoing demographic surveillance. The extent of reporting errors identified in our study could thus be an underestimate of the likely extent of underreporting in other non-HDSS populations.
Fourth, our SSH survey had significantly more missing data on dates of death than typical DHS, possibly because of differences in the questionnaire editing process. Our analyses of the correlates of date errors, however, were not affected by these missing data patterns (Table 5; see also section 8 of Online Resource 1).
Finally, our analyses are based on small sample sizes that do not permit investigating more complex patterns of reporting errors. For example, we could not calculate 45q15 separately for the person-years that occurred during a recent reference period, nor could we investigate interactions between different types of reporting errors. Future investigations of reporting errors in SSH data should be based on larger samples.
Our results also have important implications. They suggest that recently proposed methods to correct for reporting errors (i.e., the CSS method of Obermeyer et al. 2010) are unlikely to produce unbiased estimates of adult mortality. These regression-based adjustments assume that reporting errors are limited to the forgetting of deaths and that such forgetting is a simple linear function of time between a death and the survey. Our results show that reporting errors are much more diverse. They also have complex effects that may offset each other under specific conditions. Such effects are not well captured by a simple correction factor. Our results also suggest new strategies for improving data quality for SSH. Omissions from sibship lists could be prevented by adding probes (Brewer and Garrett 2001; Brewer et al. 2005) in SSH interviews focused on siblings that are frequently omitted (e.g., migrants, siblings deceased at young or old adult ages). Age and date errors could be reduced by adapting calendar tools previously used by demographers for the collection of data on contraception, migration, or sexual behaviors (Freedman et al. 1988; Luke et al. 2011; White et al. 2008). Finally, our study highlights the potential use of data from HDSS to help improve demographic measurement in countries with limited VR. More than 40 HDSS are now in the INDEPTH network (Sankoh and Byass 2012), an international organization that fosters collaboration between research centers operating HDSS in low- and middle-income countries. We hope our study will help establish HDSS as “laboratories” where demographers can refine their data collection tools and analytical techniques.
We thank Patrick Gerland, Bruno Masquelier, and Samuel Preston for comments on previous drafts of this article. The project described was supported by Award Numbers R24HD058486 to the Columbia Population Research Center and R03HD071117 to S. Helleringer, both from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, and by grant ANR-11-BSH1-0007 from the Agence Nationale de la Recherche to G. Pison/Institut National d’Études Démographiques. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Eunice Kennedy Shriver National Institute of Child Health and Human Development or the National Institutes of Health.
In such analyses, “adulthood” is most commonly defined as the age range 15–49 years or 15–59 years. The most commonly used indicators of adult mortality are thus 35q15 or 45q15.
The primary objectives of a Demographic and Health Survey often focus on estimating the total fertility rate and infant/child mortality rates in a given country. The sample size calculations for each DHS thus aim to achieve a given level of precision for these indicators. Births and infant/child deaths are significantly more common than adult deaths in sub-Saharan populations, however. As a result, even if a respondent reports about several adult siblings during SSH, SSH may still yield an insufficient number of reports of deaths to enable specific analyses.
For example, in a cohort of 20 women in which 10 deaths occur, all at age 35, the true death rate at adult ages 45M15 is 10 / [(10 × 20) + (10 × 45)] = 15.4 per 1,000. A list error in which a respondent omits one deceased sister would yield 45M15 = 9 / [(9 × 20) + (10 × 45)] = 14.3 per 1,000, whereas a vital status error in which a respondent misclassifies one deceased sister as alive would yield 45M15 = 9 / [(9 × 20) + (11 × 45)] = 13.3 per 1,000.
For example, in a cohort of 20 women in which 10 deaths occur, all at age 35, the “true” death rate at adult ages 45M15, obtained from a complete and accurate reference data set, is 10 / [(10 × 20) + (10 × 45)] = 15.4 per 1,000. If one of these women is omitted by respondents during SSH (i.e., list error), but respondents make no age errors, then the estimated death rate 45M15 would be 9 / [(9 × 20) + (10 × 45)] = 14.3 per 1,000. On the other hand, the estimated death rate 45M15 would also be 14.3 per 1,000 if respondents reported all their deceased sisters (i.e., no list errors) but overestimated the age at death of their sisters by an average of five years (age errors).
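The arithmetic in these two examples can be reproduced with a short calculation. This is a sketch under the same stylized assumptions (exposure counted from age 15 to age 60, every death at a single age); the function name and its arguments are ours.

```python
def death_rate_45M15(ages_at_death, n_survivors, start_age=15, end_age=60):
    """Deaths divided by person-years lived between start_age and end_age."""
    exposure_deceased = sum(age - start_age for age in ages_at_death)
    exposure_survivors = n_survivors * (end_age - start_age)
    return len(ages_at_death) / (exposure_deceased + exposure_survivors)


# Complete and accurate data: 10 deaths at age 35, 10 survivors.
true_rate = death_rate_45M15([35] * 10, n_survivors=10)

# List error: one deceased sister is omitted altogether.
list_error = death_rate_45M15([35] * 9, n_survivors=10)

# Vital status error: one deceased sister is reported as alive.
vital_status_error = death_rate_45M15([35] * 9, n_survivors=11)

# Age error: all deaths reported, but age at death overstated by five years.
age_error = death_rate_45M15([40] * 10, n_survivors=10)

for label, rate in [("true", true_rate), ("list error", list_error),
                    ("vital status error", vital_status_error),
                    ("age error", age_error)]:
    print(f"{label}: {rate * 1000:.1f} per 1,000")
```

The list-error and age-error scenarios give the same rate, which is why the two types of error cannot be told apart from the estimated rate alone.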
VR data are also collected prospectively in theory, but events may be registered late (e.g., children are registered when they reach school age) so that VR data are affected by significant age/date errors.
Several comparisons of SSH and HDSS data have already been conducted, but these have compared HDSS data for small areas with SSH estimates from national data sets (Obermeyer et al. 2010). In these comparisons, differences in estimates of adult mortality between SSH and HDSS may have been due to idiosyncrasies of the local HDSS populations rather than issues with SSH data quality. Some comparisons were conducted within the same geographic areas but did not control for differences in sampling universe between HDSS and SSH data sets (e.g., Ngom et al. 1999). To the best of our knowledge, only one study comparing SSH and HDSS was based on a linkage design (Shahidullah 1995). This study, however, was limited to assessing the reporting of maternal deaths.
In oral genealogies, multiple informants are interviewed to confirm the existence and nature of kinship ties between population members. See, for example, Pison (1987), Quinlan and Hagen (2008), and Silagan (1986).
The Bandafassi HDSS differs from other HDSS in that respect given that oral genealogies are not necessarily conducted at the time of the census in each HDSS. In other HDSS, the data on maternal sibships of individuals who were already present in the population at the time of the initial census may be missing or may be obtained from one single informant. Lists of maternal siblings may thus be less complete/accurate in other HDSS.
The SSH survey took place before the 2010 HDSS data became available; as a result, the reference period and all results herein do not include deaths and person-years from 2010.
To trace migrants, we first conducted a short migration survey with a member of the last known HDSS residence of the migrant. This survey included information on destination, new address (if available), phone number, and other information (open-ended) that may be useful for migrant tracing.
The full questionnaire, in French, is available online (http://www.columbia.edu/~sh2813/BandafassiQuest.pdf).
Pairs of reviewers were randomly selected among the study’s coauthors. Each team member was asked to select an ID number from the HDSS files for each SSH-reported sibling based on the available information.
The concordance was assessed by comparing the ID number attributed by each member.
Some siblings were reported as adults by the SSH and recorded as children by the HDSS, and vice versa.
In Bandafassi, different persons report demographic events during demographic surveillance and SSH: for example, key informants may be health workers posted in the rural villages or heads of compounds (predominantly older males), whereas SSH respondents were women aged 15–59. However, the HDSS and SSH data may not be entirely independent because, for example, key informants may have heard about the death of an individual from relatives of the deceased who may be interviewed during SSH surveys. Similarly, some deaths may be less likely to be reported in both data sources because of the cause of death or the location where the death took place. For example, if an HDSS resident seeks health care in a city and dies while away, his/her death may be less likely to be reported both in the HDSS and in the SSH data sets. Capture-recapture estimates thus likely overestimate the completeness of each data source.
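Why positive dependence between sources inflates estimated completeness can be seen with the simplest two-source capture-recapture estimator (Lincoln-Petersen). This estimator is used here only as an illustration and is not necessarily the one applied in our analyses; all counts are hypothetical.

```python
def lincoln_petersen(n_a, n_b, n_both):
    """Two-source estimate of the total number of events."""
    return n_a * n_b / n_both


# Hypothetical setting: 100 true deaths, each source records 80 of them.
true_total = 100
n_hdss, n_ssh = 80, 80

# Independent sources: expected overlap is 0.8 * 0.8 * 100 = 64 deaths.
total_independent = lincoln_petersen(n_hdss, n_ssh, n_both=64)

# Positively dependent sources: the same deaths tend to be recorded in both,
# so the overlap is larger (72 here) and the estimated total is too small.
total_dependent = lincoln_petersen(n_hdss, n_ssh, n_both=72)

completeness_true = n_hdss / true_total
completeness_estimated = n_hdss / total_dependent

print(f"estimated total, independent sources: {total_independent:.1f}")
print(f"estimated total, dependent sources: {total_dependent:.1f}")
print(f"completeness: true {completeness_true:.0%}, "
      f"estimated {completeness_estimated:.0%}")
```

In this hypothetical case, the dependent sources yield an estimated total below 100, so each source appears more complete than it really is.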
This is the case because multiple sisters were interviewed by sibship (see Table 1).
Sensitivity refers to the proportion of deaths among adult sisters recorded in the HDSS who are reported as deceased in the SSH survey.
Specificity refers to the proportion of alive sisters recorded in the HDSS who are reported as alive in the SSH survey.
It appears that in a few cases, interviewers mistakenly recorded the date of death in the space where the age at death should have been recorded, and vice versa. For example, if a respondent reported that her sibling died three years ago at age 37, the interviewer could have mistakenly recorded that the death occurred 37 years ago at age 3. This generates very large age and date errors that cannot be detected by data editors because they leave the reported birth intervals between siblings unaffected.
Sensitivity refers to the proportion of deaths observed during the reference period (2002–2009) that were reported by SSH respondents as having occurred during that time frame.
Specificity refers to the proportion of deaths observed before the reference period (before 2002) that were reported by SSH respondents as having occurred before the reference period.