## Abstract

In some surveys, women and men are interviewed separately in selected households, allowing matching of partner information and analyses of couples. Although individual sampling weights exist for men and women, sampling weights specific to couples are rarely derived. We present a method of estimating appropriate weights for couples that extends methods currently used in the Demographic and Health Surveys (DHS) for individual weights. To see how results vary, we analyze 1,912 estimates (means; proportions; linear regression; and simple and multinomial logistic regression coefficients, and their standard errors) with couple data in each of 11 DHS surveys in which the couple weight could be derived. We use two measures of bias: absolute percentage difference from the value estimated with the couple weight and ratio of the absolute difference to the standard error using the couple weight. The latter shows greater bias for means and proportions, whereas the former and a combination of both measures show greater bias for regression coefficients. Comparing results using couple weights with published results using women’s weights for a logistic regression of couple contraceptive use in Turkey, we found that 6 of 27 coefficients had a bias above 5 %. On the other hand, a simulation of varying response rates (27 simulations) showed that median percentage bias in a logistic regression was less than 3 % for 17 of 18 coefficients. Two proxy couple weights that can be calculated in all DHS surveys perform considerably better than either male or female weights. We recommend that a couple weight be calculated and made available with couple data from such surveys.

## Introduction

The married or in-union couple is the unit of interest in many studies of reproductive health and family sociology more generally. For example, studies with a longitudinal design have shown the importance of measuring fertility intentions and desires of both partners in order to best predict the couple’s future fertility (Bankole 1995; Gipson and Hindin 2009; Schoen et al. 1999; Thomson 1997; Thomson and Hoem 1998). Data collection is obviously more complicated for couples than for individual members of a couple. Specifically, to avoid potential contamination of responses between spouses, interviews are ideally conducted with the partners separately and simultaneously (Hertz 1995). When surveys interview each partner separately and couples are formed by the matching of the individual interviews, couple nonresponse is a function of the joint nonresponse of the individuals.

A number of demographic surveys include both men and women in households. Examples are the British Household Panel Survey (Taylor et al. 2010), the U.S. Current Population Survey (CPS), and the Demographic and Health Surveys (DHS); the latter are the focus of this analysis. Since the mid-1980s, the DHS project has conducted nationally representative household surveys in more than 90 countries (ICF International 2017). In 1987, a separate questionnaire and sample design for males in households was added, and men are now interviewed in nearly all DHS surveys. In all occupied households, reproductive-aged women are interviewed, and men aged 15–59 are typically interviewed in a subsample of households. Although previous studies have analyzed men, women, and husbands and wives separately (e.g., Ezeh et al. 1996), couples have also been analyzed (e.g., Allendorf 2007; Bankole and Singh 1997; Becker and Costenbader 2001; Thomson et al. 2015). Analysis of sero-discordant couples is particularly important in studying the AIDS epidemic (Chemaitelly and Abu-Raddad 2016; Chemaitelly et al. 2012; Eyawo et al. 2010). Couple analyses are more complex but allow a richer array of hypotheses to be examined than is possible with individual data (Thomson 1990).

Results from national demographic, health, economic and other surveys are representative but usually only after weighting. Such sampling weights generally have two overall components: (1) the design weight, which reflects the probability of selection; and (2) the nonresponse adjustment, which adjusts for potential bias introduced by differential nonresponse. Nonresponse adjustment in surveys can proceed in several ways (Valliant et al. 2013). If population data are available (e.g., a census), then poststratification weights that are usually based on standard demographic variables (e.g., age, sex, race, urban/rural residence) can be calculated and used to adjust the survey responses to the total population of interest (Little 1993). When recent population data are not available or are of dubious quality—and one of these two is often the case in developing countries—two options are widely used. A first option is to collect observations into categories (called *weighting classes*) and adjust the design weight to account for nonresponse estimated for each class. To successfully reduce nonresponse bias, the weighting classes should have three attributes: (1) response rates vary over the classes; (2) the values of survey estimates vary over the classes; and (3) the values of survey estimates are similar for both respondents and nonrespondents within a class (Little and Rubin 1987). The last criterion is obviously difficult to show in practice, but the first two criteria are easily checked using the household data in surveys.

A second option when multiple covariates are available for both respondents and nonrespondents is to model the probability of response for individuals, usually with logistic regression. Then the inverse of these response propensities is used to adjust the base sampling weight, again after forming relatively homogeneous groups for these propensities (Kreuter et al. 2010).

In the analyses described here, we use the weighting classes option. This is the approach currently utilized for adjusting design weights for households, women, and men in many such surveys, in particular DHS surveys and Multiple Indicator Cluster Surveys (ICF International 2012; UNICEF 2014). When men and women in unions within households are matched, the question becomes, What is the appropriate sample weight for the couple? Debate exists about whether a man’s or woman’s weight is more appropriate for couple analyses. We searched articles that utilized survey data with sampling weights and that analyzed couple data in the sample to see which weight was used. Of 75 such articles (52 utilizing DHS data), only 9 specified the weight that was used. Among those 9 articles, 5 used women’s weights, 3 used men’s weights, and 1 used poststratification weights (Becker 1999; Becker and Costenbader 2001; Becker et al. 2006; Chiao et al. 2011; Lasee and Becker 1997; Ngom 1997; Story and Burgard 2012; Upadhyay and Karasek 2012; Wilcox and Dew 2016).

For analyses of couple data, however, neither female nor male weights are optimal in general. Consider the probability of nonresponse. The nonresponse rate for couples is obviously different (and greater) than that of either the women or the men in partnerships, but it is not a simple function of either or both. Furthermore, selectivity in the type of couples who are nonresponders is likely. Finally, response rates for men and women in couples are very likely to differ from response rates for all men and women, which includes single men and women, respectively.

For several national surveys that have included couples and provide online documentation, the derivation of a couple weight has been ad hoc and simplistic. Some examples will illustrate this point. In the U.S. Survey of Income and Program Participation (SIPP), the couple weight is “the average of husband’s weight and wife’s weight” (U.S. Census Bureau 2008:C-6). In the U.S. Current Population Survey (CPS), “when the reference person is a married man, for purposes of family weights, he is given the same weight as his wife” (U.S. Census Bureau 2006:10–13). For the U.S. Panel Study of Income Dynamics (PSID), family weights were derived as “the average of the individual weights of all the family members” (Gouskova et al. 2008). A separate couple weight has not been created for the Household, Income and Labour Dynamics in Australia (HILDA) survey, so researchers using couple data from that survey “have had to work out their own way” (Nicole Watson, 2016, personal communication) in this regard.^{1} For the analyses of DHS data, senior staff at DHS say, “It is standard practice at DHS to use the husband’s weight (MV005) when analyzing the couple (CR) files” (DHS Program User Forum 2015). As noted earlier, many analysts of couple data from surveys actually either ignore weights or do not state the weight that was used (e.g., Chemaitelly and Abu-Raddad 2016; McClintock 2017).

The women’s, men’s, and couple’s response and nonresponse combinations are illustrated in the Venn diagram shown in Fig. 1. The shaded areas C, D, E, and F represent couples. Only area C represents completed couples. Completed men’s interviews are represented by the sum of areas A, C, and D; completed women’s interviews are represented by the sum of C, E, and H. Among couples in a household in which both partners are eligible (e.g., in the appropriate age range), the woman, the man, or both may have incomplete interviews (areas D, E, and F, respectively).

In this research, we have three objectives. First we aim to provide one derivation of couple weights. We then apply this to data from DHS surveys. As mentioned earlier, multiple methods exist for constructing weights; we use the same methodology as used for all the other weights in the DHS to make it easy to implement for all such surveys.

Second, we compare results of analyses using the couple weights with results using the female or male weights and two proxy estimates of couple weights that can be calculated for all DHS surveys that interview women and men in the same household (but in which partner line number information is not collected for matching of all couples in the household). In addition, we replicate results from published couple analyses that used women’s weights to examine how the estimates differ when couple weights are used instead.

Third, we go beyond the constraints of analyses of existing data. We simulate a range of response rates for men and women and examine the effects of these variations on estimates, comparing the performance of the two proxy weights and the male and female weights with results when the couple weight is used.

## Methods

We first define the probabilities to consider for the derivation of the couples’ weight and then estimate them with data. We apply the derivation for DHS couples’ data, but the simple theory extends to calculation of couple weights in other household surveys. The definition of a couple in the DHS is a heterosexual pair of married or in-union partners, with both partners being usual residents of the household and/or sleeping there the previous night. The probability that interviews with both partners in a couple are completed in a household can be decomposed into a series of conditional probabilities as follows (Table 1): the probability of selection of the cluster (*p*1); the probability of selection of the household within the cluster (*p*2); the probability that the selected household is also selected for male interviews (*p*3); the probability that the selected household is completed (*R*^{h}); and the probability that both partners in a couple in the household have completed interviews (*R*^{c}).^{2} In many surveys, *p*3 is 1.0, but it usually is not 1.0 in the DHS because men typically are interviewed in only a random subset of households.

Within each weighting class where sample size is sufficiently large to give fairly stable estimates, *r*^{h} and *r*^{c} (the household and couple response rates that estimate *R*^{h} and *R*^{c}, respectively) are found by processing household and couple data. Note that *r*^{c} is estimated at the level of couples, not households, because multiple couples may live in a given household; that is, the estimate is the proportion of eligible couples with completed interviews from both partners. The product of the estimated conditional probabilities is then calculated (i.e., *p*1 × *p*2 × *p*3 × *r*^{h} × *r*^{c}).^{3}
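The calculation above can be sketched in a few lines. The sketch assumes, as is standard for sampling weights, that the couple weight is the inverse of the product of the five estimated probabilities; the function name and the input values are hypothetical and are not taken from any DHS file.

```python
def couple_weight(p1, p2, p3, r_h, r_c):
    """Inverse of the estimated probability that a couple is selected
    and that both partners complete their interviews.

    p1: probability of selecting the cluster
    p2: probability of selecting the household within the cluster
    p3: probability that the household is also selected for male interviews
    r_h: household response rate in the weighting class
    r_c: couple response rate in the weighting class
    """
    components = {"p1": p1, "p2": p2, "p3": p3, "r_h": r_h, "r_c": r_c}
    for name, value in components.items():
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {value}")
    probability = p1 * p2 * p3 * r_h * r_c
    return 1.0 / probability


# Hypothetical inputs for one couple in one weighting class.
w = couple_weight(p1=0.02, p2=0.10, p3=0.50, r_h=0.98, r_c=0.85)
print(round(w, 1))
```

A lower couple response rate in a weighting class raises the weight of every completed couple in that class, which is how the adjustment compensates for the couples that were lost.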

## Data

To calculate weights for couples, one must be able to identify all eligible couples in households. Unfortunately, the information needed to link these partners (line number of spouse/partner in the household form) is not routinely collected in the DHS. However, 11 surveys did include that information, all of which were completed between 1991 and 1998: Bangladesh, 1993/1994; Bangladesh, 1996/1997; Burkina Faso, 1992/1993; Cameroon, 1991; Dominican Republic, 1996; Kenya, 1993; Nicaragua, 1997/1998; Niger, 1992; Tanzania, 1996; Turkey, 1998; and Uganda, 1995. Note that six of the surveys are in sub-Saharan Africa.

Table 2 displays summary information for these surveys that is relevant to this study. The sample sizes for couples vary from 848 in the Dominican Republic survey to 3,284 in the Bangladesh survey of 1993/1994. The number of unique geographical areas (or first variable for weighting cells) for which sampling weights are derived varies from 3 (national capital, rural, and urban outside the capital) in both Burkina Faso and Niger to 64 in Uganda. Men’s response rates are lowest overall in Turkey and highest overall in the Bangladesh survey of 1996/1997.

Consistent with the derivation of weights for men and women by ICF International for these surveys, we estimate the couple response rate (*r*^{c}) at the level of the sampling domain/region.^{4} The existing nonresponse adjustments of weights for households, females, and males account only for the region. This approach ignores the fact that nonresponse typically varies considerably by demographic characteristics, such as age and marital status, which are relevant when considering couples as distinct from individuals.

Indeed, the most basic distinction that is ignored with the individual weights in relation to their use with the couple data is marital status. Only two of the six surveys (Nicaragua and Burkina Faso) have marital status information available in the household form, so response rates by marital status could be calculated. As shown in Table 3, in 10 of the 16 comparisons, response rates are significantly higher for individuals in union than for those not in union. Thus, marital status clearly meets the first criterion for nonresponse adjustment: the response rates vary based on marital status regardless of age or sex. Therefore, for couples, the weight should be for married individuals; unfortunately, the present female and male weights used in DHS couple analyses are for all persons of the given sex, regardless of marital status.

We next test whether two survey outcomes vary by marital status. Table S1 (Online Resource 1) shows means of age and years of schooling by sex and marital status in each of the six surveys that included both married and unmarried men and women and in the remaining five surveys that included females alone. These data are from the household questionnaire so all eligible women and men are included. All but 1 of the 34 comparisons in the table show statistically significant differences by marital status. For women who are age eligible, the average number of years of schooling for married individuals is consistently below that for unmarried individuals. Mean ages are significantly higher among married and in-union individuals than among individuals not in union in all 17 comparisons. The table also shows mean number of contraceptive methods known spontaneously for each of the surveys but now only for completed women’s and men’s interviews because these results come from the individual questionnaires. Again, for all but 4 of the 17 comparisons (both sexes in Burkina Faso, and women in Uganda and Niger), differences between the married and unmarried are statistically significant. At least on these indicators, the survey estimates clearly vary by marital status; thus, ignoring marital status in calculating response rates is likely to yield estimates for married individuals (and therefore couples) that are more biased than estimates using weights specific to married individuals or to couples.

## Derivation of Couple Weights and Analyses

We derive the couple weight from data on all matched couples, both complete and incomplete, in the 11 surveys following the aforementioned procedure and then append it to the couple records. (See the appendix for details.) In addition to the couple weight, we calculated two proxy weights that can be calculated for any DHS survey because line number of the spouse, although not typically recorded in the household form, is virtually always recorded in completed wife/husband DHS questionnaires.

Thus, with these line identifiers, we can find and match the number of completed couples, the number of wives complete but husbands incomplete, and the number of husbands complete but wives incomplete. However, the universe of total eligible couples must be estimated because couples with both partners’ interviews incomplete are not included. For the first proxy, we estimate the denominator of *r*^{c} (number of all eligible couples) using couples with complete data, plus the number of married women interviewed whose husbands reside in the household but have incomplete interviews, plus the number of married men interviewed whose wives reside in the household but have incomplete interviews. (This proxy weight is labeled ALT in the text, tables, and figures that follow.) What is missing from this estimate is the number of couples in which both the woman and the man have incomplete interviews. The denominator of the second proxy weight is the same as that for ALT except that we add an estimate of the number of couples in which neither partner has a completed interview using the Chandra-Sekar–Deming technique (Chandra Sekar and Deming 1949), which assumes independence of response probabilities; this proxy weight is labeled EST in the presentation that follows. By construction, therefore, couple response rates from the second proxy will be lower than those estimated with the first proxy because of the additional estimate in the denominator.
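The two proxy response rates can be illustrated with a short sketch. It assumes the usual two-source form of the Chandra-Sekar–Deming estimate, in which the number of couples with neither interview completed is the product of the two one-partner-only counts divided by the count with both completed. The function name and the counts are hypothetical.

```python
def proxy_couple_response_rates(both, wife_only, husband_only):
    """Return (ALT, EST) proxy couple response rates for one weighting class.

    both:         couples with both interviews completed
    wife_only:    wife completed, resident husband incomplete
    husband_only: husband completed, resident wife incomplete
    """
    if both <= 0:
        raise ValueError("at least one completed couple is required")
    # ALT: eligible couples observed through at least one completed interview.
    alt_denominator = both + wife_only + husband_only
    # EST: add the estimated number of couples missed by both interviews,
    # assuming the partners' response probabilities are independent.
    neither_estimate = wife_only * husband_only / both
    est_denominator = alt_denominator + neither_estimate
    return both / alt_denominator, both / est_denominator


# Hypothetical counts for one region.
alt_rate, est_rate = proxy_couple_response_rates(
    both=800, wife_only=150, husband_only=50
)
print(round(alt_rate, 4), round(est_rate, 4))
```

Because the EST denominator can only be as large as or larger than the ALT denominator, the EST rate never exceeds the ALT rate, which matches the ordering described above.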

For the couple weight, these two proxy weights, and the women’s and men’s weights (provided by ICF International), we then normalized each to give the couple sample size. Because some argue that weights are not needed in regression analyses, for several comparisons, we also include unweighted analyses.^{5}
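As a minimal sketch of the normalization step, each set of weights can be rescaled so that it sums to the number of couples in the file. The function name and the example weights are hypothetical.

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to the number of records."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    scale = len(weights) / total
    return [w * scale for w in weights]


# Three hypothetical couples with raw weights 2, 4, and 6.
normalized = normalize_weights([2.0, 4.0, 6.0])
print(normalized, sum(normalized))
```

Normalization changes only the scale of the weights; the relative weight of any two couples is unchanged, so weighted means and regression coefficients are unaffected.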

We then conduct several sets of analyses using couple variables as outcomes (i.e., variables that are constructed from the responses of both partners) and various covariates, and we repeat them with each of the six sets of weights (couple’s, women’s, men’s, ALT, EST, and none (i.e., 1.0)) and each survey. Table 4 summarizes the statistical methods and variables in the analyses.

First, simple means and proportions are calculated with each set of weights. Next, we compute linear regressions for three outcome variables: number of children ever born, differences between spouses’ ages, and differences between their levels of schooling. Covariates used in these analyses are indicated in the table. The third set of analyses uses logistic regression for many of the same indicators that are calculated as descriptive proportions in the first set. In the fourth set, we estimate multinomial logistic regressions, again with many of the variables used in the descriptive analyses but now with multinomial responses, separating wife only, husband only, both partners, or neither partner for each variable. Across all analyses, 1,912 estimates are produced using each type of weight, giving a total of 11,472 estimates (i.e., 6 × 1,912). All the analyses adjust the variance estimates for stratification and intracluster correlations between observations because of the survey design; this is accomplished with the *svy* commands in either Version 13 or 14 of Stata (StataCorp 2013; 2015).

To evaluate the numerical estimates calculated with female, male, and the couple proxy weights compared with those using the actual couple weights, we use several criteria. As a preliminary step, we calculate the percentage difference of the estimate using the female, male, or proxy weight from the estimate using the actual couple weight (i.e., 100 × (estimated value with women’s, men’s, or proxy weights – estimated value with couple weights) / estimated value with couple weights). The first criterion is the mean absolute percentage difference across the estimates using a given weight compared with estimates using the actual couple weight.

A second criterion is the percentage of the estimates that differ by more than 5 % from the estimate using the couple weight.^{6} In most national DHS samples of couples and in nearly all the recent surveys, which have larger sample sizes than earlier surveys, differences of less than 5 % are within sampling error after design effects are taken into account. In addition, few would have programmatic implications. On the other hand, differences above 5 % could lead to inappropriate conclusions if the individual weights are used instead of the couple weights.
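The first two criteria can be sketched as follows. The function names and the example values are hypothetical, and the 5 % threshold is passed as a parameter so that other cutoffs can be tried.

```python
def pct_difference(estimate_alt, estimate_couple):
    """Signed percentage difference from the couple-weight estimate."""
    return 100.0 * (estimate_alt - estimate_couple) / estimate_couple


def summarize_pct_bias(alt_estimates, couple_estimates, threshold=5.0):
    """Return (mean absolute percentage difference,
    percentage of estimates differing by more than `threshold` percent)."""
    abs_diffs = [
        abs(pct_difference(a, c))
        for a, c in zip(alt_estimates, couple_estimates)
    ]
    mean_abs = sum(abs_diffs) / len(abs_diffs)
    share_above = 100.0 * sum(d > threshold for d in abs_diffs) / len(abs_diffs)
    return mean_abs, share_above


# Three hypothetical estimates under an alternative weight and the couple weight.
mean_abs, share_above = summarize_pct_bias(
    [95.0, 110.0, 102.0], [100.0, 100.0, 100.0]
)
print(round(mean_abs, 2), round(share_above, 1))
```

Note that the percentage difference is unstable when the couple-weight estimate is close to zero, which is one reason a second, SE-based measure is useful for regression coefficients.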

A third criterion is the ratio of the absolute difference between the two estimates to the SE of the estimate using the couple weight:

$$\frac{\lvert M_i - M_c \rvert}{\mathrm{SE}(M_c)},$$

where *M* could refer to a mean, a proportion, or a regression coefficient; *M*_{i} refers to the estimate with an alternative weight *i* (e.g., male or female weight); *M*_{c} refers to the estimate with the couple weight; and SE(*M*_{c}) is the estimated SE of the measure when couple weights are used. We label this ratio a deviation measure of bias. Although this ratio is a continuous measure, in conjunction with the percentage difference criterion, we establish a cutoff above which deviations are considered problematic (as discussed in the Results section).
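A minimal sketch of the deviation measure, taken here to be the absolute difference between the estimate under an alternative weight and the estimate under the couple weight, divided by the SE under the couple weight. The function name and the inputs are hypothetical.

```python
def deviation_measure(estimate_alt, estimate_couple, se_couple):
    """Absolute difference between estimates, in units of the
    couple-weight standard error."""
    if se_couple <= 0:
        raise ValueError("se_couple must be positive")
    return abs(estimate_alt - estimate_couple) / se_couple


# Hypothetical coefficient: 1.2 with an alternative weight,
# 1.0 with the couple weight, SE of 0.5 with the couple weight.
print(round(deviation_measure(1.2, 1.0, 0.5), 2))
```

Unlike the percentage difference, this measure does not depend on how close the couple-weight estimate is to zero, but it does shrink as the SE grows.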

The variances of estimates will likely be larger when couple weights are used than when male or female weights are used because the variability in couple weights is typically larger. To assess the trade-off between bias and potentially larger variances, we calculate SEs and an estimated mean square error (MSE) for each estimate (MSE = bias^{2} + variance). Because the scale of the MSE will vary with the indicator and the interest is its value using a given weight relative to the value using the actual couple weight, we calculate the ratio of the MSE for a given estimate and weight to the MSE for the same estimate using the couple weight. By definition, here the bias of the estimate using the actual couple weight is set to 0. Of course, the derived couple weight would be the “true” weight only if a whole set of assumptions about the sampling and characteristics of nonrespondents is met, and the true couple weight generally cannot be known. However, relative to use of the inappropriate women’s or men’s weights, results with couple weights are assumed to be the correct results.^{7} Thus, an MSE ratio of less than 1.0 implies that the added bias involved in using a given weight is more than offset by the smaller variance using that weight relative to using the couple weight. Conversely, values of the ratio above 1.0 imply that using the couple weight is superior overall. Estimates of bias and ratios of SEs and MSEs are displayed in boxplots and summarized in tables by type of analysis and weight used.
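The MSE ratio described above can be sketched as follows, with the bias of the couple-weight estimate set to 0 by definition and the variance taken as the squared SE. The function name and the inputs are hypothetical.

```python
def mse_ratio(estimate_alt, se_alt, estimate_couple, se_couple):
    """Ratio of the estimated MSE under an alternative weight to the
    estimated MSE under the couple weight."""
    if se_couple <= 0:
        raise ValueError("se_couple must be positive")
    bias = estimate_alt - estimate_couple  # couple-weight estimate is the reference
    mse_alt = bias ** 2 + se_alt ** 2
    mse_couple = se_couple ** 2            # bias is 0 by definition
    return mse_alt / mse_couple


# Hypothetical case: a smaller SE under the alternative weight
# exactly offsets its bias, so the ratio is about 1.
print(round(mse_ratio(1.3, 0.4, 1.0, 0.5), 3))
```

A ratio below 1.0 can arise only when the SE under the alternative weight is smaller than the SE under the couple weight by enough to offset the squared bias.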

Further, because researchers often focus on statistical significance of coefficients in regression analyses, we calculated *t* values of test statistics for the null hypothesis that coefficients were equal to 0.0. Then using the traditional *p* values of .05, .01 and .001, we compared whether a coefficient was significant using a given weight to significance of the same coefficient when the couple weight is used. This comparison helps to inform whether analysts may find statistical significance (or not) when weights other than couple weights are used. In effect we calculated sensitivity and specificity for male, female, ALT, EST, and a weight of 1.0 relative to the significant (or not) result with the couple weight.
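The comparison of significance can be sketched as a two-by-two classification in which the result under the couple weight is treated as the reference. For simplicity the sketch starts from *p* values rather than *t* values; the function name and the example *p* values are hypothetical.

```python
def sensitivity_specificity(p_alt, p_couple, alpha=0.05):
    """Classify significance under an alternative weight against
    significance under the couple weight (the reference)."""
    tp = fn = tn = fp = 0
    for pa, pc in zip(p_alt, p_couple):
        if pc < alpha:          # significant with the couple weight
            if pa < alpha:
                tp += 1
            else:
                fn += 1
        else:                   # not significant with the couple weight
            if pa < alpha:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Five hypothetical coefficients.
p_couple = [0.001, 0.03, 0.04, 0.20, 0.60]
p_alt = [0.002, 0.06, 0.01, 0.04, 0.70]
sens, spec = sensitivity_specificity(p_alt, p_couple)
print(round(sens, 3), round(spec, 3))
```

Repeating the call with `alpha` set to .01 or .001 gives the corresponding comparisons at the stricter levels.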

In addition, we were able to assess the estimated bias in one published analysis that used couple data from one of these surveys with women’s weights. Kulczycki (2008) used the 1998 Turkish DHS to study the effect of power relations in the couple on contraceptive use (concordant reports of the two partners). We replicate univariate, bivariate, and multivariate results from that study; we present only the multivariate comparisons in this article. Because we are considering variances, we show coefficients and their SEs rather than odds ratios, which Kulczycki (2008: table 5, final model) used.^{8} Further, we fit the same models with the other possible weights and compare these coefficient and SE estimates.

Finally, given the literature suggesting that unweighted regression analyses can be appropriate if stratification variables are included in the model, we fit models both with and without these variables.

### Simulation

The analyses outlined so far are specific to the 11 surveys with the observed response rates of single men, single women, and couples. But one wonders whether such results hold more generally. In particular, how would the results differ with other response rates? We conduct a simulation to address this. Rather than construct a simulated population and relationships between variables *de novo*, we use another approach. We use an existing survey and duplicate the sample as many times as needed to reach a sample of approximately 100,000 so that sampling variances are trivially small and differences in parameter estimates would solely represent bias. The simulation then entails varying the proportion of single women and men who were subject to nonresponse because these proportions together with the proportion single in the population determine the differences between the couple weight and the women’s and men’s weights. Couples are situated within households, so the household sampling probabilities for women, men, and couples are the same (within a multiplicative constant, given that men are usually interviewed in only a fraction of households selected for women’s interviews in DHS).^{9}

To vary the response rates, we start with the observed response rate and then take one-half of the difference between this rate and 100 %. For one simulation, we add this quantity to the observed rate for each stratum; for another, we subtract it; and for a third, we leave the quantity as observed for a check.^{10} Then we consider all possible combinations of these simulated response rates for women and men and across strata.

We use the Burkina Faso survey of 1993 as a base. One reason for that choice is that it is possible to reconstruct the household sampling probabilities for that survey.^{11} A second reason is that this survey has only three strata. Thus, 3 strata × 3 levels of response rates for single women × 3 levels of response rates for single men gives 27 simulations.
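The construction of the simulated response rates can be sketched as follows. The three levels follow the rule described above (observed rate, plus or minus one-half of the gap to 100 %). How the 27 runs are indexed is an assumption: the sketch crosses stratum with the level for single women and the level for single men, which reproduces the count but may not match the authors' implementation. The stratum labels, the function name, and the guard against negative rates are also illustrative.

```python
from itertools import product


def simulated_rates(observed):
    """Return the three simulated response rates for one observed rate
    (expressed as a proportion between 0 and 1)."""
    half_gap = (1.0 - observed) / 2.0
    return {
        "lower": max(0.0, observed - half_gap),  # guard is an added safeguard
        "observed": observed,
        "higher": observed + half_gap,
    }


strata = ["capital", "other urban", "rural"]
levels = ["lower", "observed", "higher"]

# One scenario per (stratum, women's level, men's level) combination.
scenarios = list(product(strata, levels, levels))
print(len(scenarios))

# Example: an observed response rate of 80 % for single men in one stratum.
print(simulated_rates(0.8))
```

Because the added quantity is one-half of the gap to 100 %, the "higher" rate can never exceed 100 %.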

Regarding parameter estimates to compare, we use the multinomial logistic regression of couple fertility preferences because this regression produced significant coefficients for most of the couple covariates (14 of 18) in the earlier analyses. In these analyses, the indicator of bias divided by the SE of the measure using the couple weight will not be comparable with that from the original analyses because the SE with 100,000 cases is very small. Thus, we revert to using the percentage bias criterion to compare the performance of each of the weights.

## Results

Estimates of the couple response rates (*r*^{c}) and then the couple weights and the two proxy weights are derived by region/domain for each survey (see Table S2, Online Resource 1). Because the weights are normalized to sum to the sample size, no set of weights is consistently higher or lower than any other set.

Boxplots of the absolute values of bias of the estimates in SE units according to the weight used and type of analysis are shown in Fig. 2, panel a. Bias measured in this way is highest for means and proportions and low for all the regression coefficients. On the other hand, if the percentage bias indicator is used, the reverse is found: regression coefficients show somewhat larger biases, but they are artificially inflated, as noted earlier (not shown). Median levels of biases using the ALT and EST weights are lower than the respective levels using women’s or men’s weights in each set of analyses.

For example, in the Tanzania survey, the mean difference in years of schooling is estimated as 1.85 years with the couple weight but 1.79 and 1.76, respectively, with the female and male weights, giving respective differences of 3.3 % and 5.3 % (calculated with more significant digits). The estimates with ALT and EST weights are 1.87 and 1.88 years, respectively. As a second example, in the Cameroon survey, the percentage of couples in which both said they want no more children was 9.0 % with the couple weight; 8.6 % and 8.7 % using the female and male weights, respectively; or 5.3 % and 4.7 % as estimates of bias using those respective weights. The deviation measures for these estimates with female and male weights are 0.34 and 0.26, respectively.^{12}

We need to calibrate the deviation measure of bias. Although the measure is on the same scale as a *t* statistic for tests of differences of means or regression coefficients, it is not in that category because there is no hypothesis that means or coefficients are different. Rather, we wish to know how close the estimate with any other weight is to the estimate using the couple weight. To calibrate, we consider the joint distribution of this measure and the percentage difference measure. Specifically, when the new variable is less than 0.08, 90 % of the percentage differences are less than 5 %. (For comparison, when the new variable is less than 0.038, 95 % of the percentage differences are less than 5 %.) We then use this information to choose 0.08 as a cutoff, with values above that providing an alert to a bias of potential consequence.^{13}
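The calibration can be illustrated with a small sketch that, for a candidate cutoff on the deviation measure, reports the percentage of estimates below the cutoff whose absolute percentage difference is also below 5 %. The function name and the example values are hypothetical and do not reproduce the data behind the 0.08 cutoff.

```python
def share_within_pct(deviations, pct_diffs, cutoff, pct_limit=5.0):
    """Among estimates with a deviation measure below `cutoff`, return the
    percentage whose absolute percentage difference is below `pct_limit`."""
    selected = [abs(p) for d, p in zip(deviations, pct_diffs) if d < cutoff]
    if not selected:
        return float("nan")
    return 100.0 * sum(p < pct_limit for p in selected) / len(selected)


# Four hypothetical estimates: deviation measures and percentage differences.
deviations = [0.01, 0.05, 0.07, 0.20]
pct_diffs = [1.0, -2.0, 6.0, 10.0]
print(round(share_within_pct(deviations, pct_diffs, cutoff=0.08), 1))
```

Scanning a grid of cutoffs with this function and choosing the value at which the returned percentage reaches a target such as 90 % is one way to arrive at a cutoff of the kind used here.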

The top panel of Table 5 shows results for bias using the combined criteria of estimates with a relative bias of greater than 5 % and absolute bias ratio to SE above 0.08. The percentages of estimates with considerable biases are trivial for means and proportions except when no weight is used. They are largest for binary and multinomial logistic regression coefficients, reaching about 20 % when either male or female weights are used. Over one-half of regression estimates using unweighted analyses have considerable biases; indeed, more than one-half had a percentage bias of above 5 % (not shown). The pattern seen with the deviation measure alone is reversed with the combined criteria, with estimates of means and proportions having higher bias with the deviation measure alone but lower bias than regression coefficients with the combined criteria (not shown).
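The combined criteria can be sketched as a single flag that is raised only when both conditions hold: a relative bias above 5 % and a ratio of absolute bias to SE above 0.08. The function name and the example values are hypothetical.

```python
def considerable_bias(estimate_alt, estimate_couple, se_couple,
                      pct_cutoff=5.0, dev_cutoff=0.08):
    """True when the estimate under an alternative weight exceeds both
    the percentage-difference cutoff and the deviation-measure cutoff."""
    difference = abs(estimate_alt - estimate_couple)
    pct = 100.0 * difference / abs(estimate_couple)
    deviation = difference / se_couple
    return pct > pct_cutoff and deviation > dev_cutoff


# Hypothetical cases.
print(considerable_bias(1.10, 1.00, 0.5))   # large on both measures
print(considerable_bias(1.10, 1.00, 2.0))   # large in percent, small relative to SE
print(considerable_bias(1.01, 1.00, 0.05))  # small in percent, large relative to SE
```

Requiring both conditions screens out coefficients near zero, whose percentage differences can be large even when the absolute difference is negligible relative to sampling error.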

Boxplots of percentage bias estimates for the SEs are shown in Fig. 2, panel b. The biases for SE estimates are much lower than the bias estimates for means, proportions, and regression coefficients. Across all four types of analyses, at least 75 % of SE estimates have an absolute bias of less than 2 %. Because couple weights can be expected to produce SE estimates that are larger than those derived with alternative weights, it is also informative to calculate ratios of SEs (i.e., SE(with given weight) / SE(with couple weight)). Boxplots of these estimates are shown in Fig. 3, panel a. Interestingly, the median values are all very close to 1.0: a tendency for SEs with the couple weight to be larger than SEs with alternative weights is not observed because in that case, ratios below 1.0 would predominate.

The bottom two panels of Table 5 and panel b of Fig. 3 (note the log scale for the figure) summarize the ratio of estimated MSEs to the estimated MSE with the couple weight for the four types of analyses (means and proportions, linear regressions, binary logistic regressions, and multinomial logistic regressions) using the five alternative weights (ALT, EST, male, female, and 1.0 (i.e., none)). Comparing the performance of male and female weights relative to the couple weight, the mean ratios are lower for estimates using female weights in three of the four analysis groups. On the other hand, the percentages of MSE ratios above 1.0 are similar with either female or male weights (third panel of Table 5). Overall, the MSE ratios using the proxy couple weights are the closest to those using the actual couple weights both in terms of mean MSE ratio and percentage of MSE ratios greater than 1.0. In unweighted analyses, the MSE becomes relatively much larger; this is almost entirely due to bias because, as one would expect, the SE estimates are lower with unweighted data than with couple data in 88 % of the cases (not shown).

Researchers often base conclusions from analyses only on results that show statistical significance. Therefore we considered cases where regression coefficient estimates were significantly different from 0 (at *p* < .05, *p* < .01, and *p* < .001 levels) with the couple weight but were not significant with one of the other weights, and vice versa. Of the 946 (of 1,733) coefficients that are significant with couple weights, the sensitivities (significant results) of female, male, ALT, EST, and weight = 1.0 are 0.987, 0.986, 0.989, 0.991, and 0.963, respectively. However, these include cases where, for example, the coefficient estimated with the couple weight is significant at *p* = .0007 and the coefficient with another weight is significant at *p* = .044. To narrow the comparison, we examine coefficients that are significant with the couple weight at *p* < .05 but not including those significant at *p* < .01 and consider correct classification if another weight gives an estimate that is significant at *p* < .05 or lower. With this criterion, the sensitivities drop to 0.937, 0.932, 0.948, 0.953, and 0.827 for the female, male, ALT, EST, and weight = 1.0, respectively.

Regarding specificity, for 787 coefficients with a *p* value greater than or equal to .05 with the couple weight, the specificities are 0.984, 0.977, 0.991, 0.989, and 0.881 for the female, male, ALT, EST, and weight = 1.0, respectively. In summary, both sensitivity and specificity of statistical significance of coefficients are above 97 % for both the male and female weights. Thus, if a researcher is interested only in statistical significance rather than the actual values of the estimates, the female and male weights provide good sensitivity and specificity for those traditional statistical tests. However, both the ALT and EST weights out-perform the female and male weights on these measures.
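The sensitivity and specificity calculations can be sketched as follows, treating significance with the couple weight as the reference classification. The function name and the example *p* values are hypothetical.

```python
def significance_agreement(p_couple, p_other, alpha=0.05):
    """Sensitivity and specificity of significance under another weight.

    p_couple : p values of coefficients estimated with the couple weight
    p_other  : p values of the same coefficients with another weight
    """
    tp = fn = tn = fp = 0
    for pc, po in zip(p_couple, p_other):
        if pc < alpha:            # significant with the couple weight
            if po < alpha:
                tp += 1
            else:
                fn += 1
        else:                     # not significant with the couple weight
            if po < alpha:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Illustrative p values only.
sens, spec = significance_agreement(
    [0.001, 0.03, 0.04, 0.20, 0.60],
    [0.002, 0.06, 0.01, 0.04, 0.70],
)
print(sens, spec)
```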

For comparisons with published analyses of couple data from Turkey, Table 6 displays the coefficient estimates from the multiple logistic regression predicting current contraceptive use employing the couple weights we derived and the parallel estimates using female weights as used in the published paper. Considering percentage bias, for 6 of the 27 coefficients, the relative bias of the published estimate is greater than 5 %. Regarding the statistical tests of coefficients, one coefficient is statistically significant when the women’s weights are used but loses significance when the couple weights are used, and one coefficient shows the opposite pattern. With regard to MSE, 8 of the ratios are below 100; 18 ratios are 100 or above, indicating that the couple weight out-performs the female weight.

From the same regressions of couple contraceptive use in Turkey but using other possible weights, the average percentage biases are least when the two proxies for the couple weight are used and are highest when no weight is used (Table 7). These results are true both when the stratification variables (urban/rural and region of the country) are included and when they are excluded from the regressions. It is noteworthy that bias estimates are very large if weights are not used, even when the stratification variables are included in the model.

### Simulations

Table 8 summarizes the 27 simulations with the created sample of about 100,000 derived from the Burkina couple data (1,146 couples × 87 replications = 99,702 cases). Most of the simulations produce estimates with the women’s or men’s weights that are very close to those when couple weights are used. Estimates for which the coefficient with the couple weight was not significant, and therefore close to 0, yield very large percentage biases and are not shown (these are indicated by –– in the table).

For 10 of the 18 estimates using women’s weights, the maximum percentage bias is below 1 %; the same was true for 11 of the estimates using male weights. One of the coefficients for the difference in years of schooling between spouses is not near 0, yet percentage differences using women’s or men’s weights are nontrivial (e.g., an average of 3 % using women’s weights). The median percentage bias for this coefficient across the 27 simulations is 3.1 % with female weights (with the maximum being 3.8 %, as shown in the table) but 1.1 % with male weights. Overall, bias is quite small with either female or male weights.

## Discussion

Many researchers in sociology, economics, demography, and other fields are interested in studies of couples in household surveys. Specifically, for studies of household decision-making, a couple’s sexual behavior, relationship quality, fertility preferences, infertility, domestic violence, interracial or interethnic marriages, work-family conflicts, and so on, analyses of couple data are often more appropriate than analyses of data from one partner (usually the woman) alone. When survey data are collected from both men and women in the same household in social, health, and demographic surveys, and the data from individual partners are matched into couples, researchers must determine the appropriate weight to use in couple analyses. Ideally, appropriate couple weights can be derived and assigned to each couple. In practice, however, this is rarely the case. Thus, researchers must determine how to construct an estimated couple weight.

This article outlines a way to obtain couple weights (where the necessary data are available) and two proxy couple weights (in cases when all eligible couples cannot be matched) that yield more accurate estimates in tabulations and regression analyses of couple data than is possible with the typical options of using weights derived for all males, weights for all females, or an average of these or some ad hoc combination of weights (as described in the introduction) when analyzing couple-level data.

Using data from 11 DHS surveys, we find that relative to the value when couple weights are used, the biases in estimates for couple data when female or male weights are used instead are usually small but can be substantial (over 5 %). Although one would expect that the MSE would be larger when couple weights are used because of larger variation in those weights, this proves to be false: about one-half or more of MSE ratios are above 1.0 depending on the alternative weight used, indicating that the MSE with couple weights in those cases is less than the MSE with any of the other weights. Also, in the regression analyses of contraceptive use with the Turkey data, most of the MSE ratios are above 1.0 with the women’s weight as Kulczycki (2008) used; that is, the couple weight generally has a lower MSE and thus performs better despite having larger variances when it is used. Furthermore, inappropriately using the female weight produces statistical significance for one coefficient where the couple weight did not, and vice versa, leading to potentially different conclusions from the same analyses. The same logistic regression analyses with Turkey data and the other weights show moderate levels of bias in coefficients.

Estimated weights ALT and EST yield regression estimates closer to those produced with couple weights. Because results could differ if response rates change, we simulate varying response rates and analyze how bias estimates change. As with the results using observed survey data, estimates of biases for the vast majority of coefficients in the multinomial logistic regression analyses are less than 3 % using either women’s weights or men’s weights.

Except for the 11 surveys used here, it is not possible to identify all couples (including those in which both partners respond, only one responds, and neither responds) in the DHS household questionnaires used in scores of surveys done by ICF International. Thus, the couple weights for many existing surveys cannot be calculated with the available data. However, the two proxy weights can be calculated for all DHS surveys, and either of these is a good alternative. For existing studies, the DHS recommendation to use male weights instead of female weights is supported by these analyses: male weights usually produce less biased results than female weights. In the absence of a couple weight, however, the proxy weights perform considerably better than the male weights. For other surveys that include couples (e.g., SIPP, PSID, and CPS), derivation of a couple weight following the logic given here would be appropriate and preferred to the current, rather ad hoc weights.

## Limitations

Limitations of these analyses need consideration. First, lacking any other information, we assume that the estimates using the couple weights are unbiased. However, this will not be the case whenever nonresponding couples have different outcomes than responding couples within weighting cells. The proportion of nonresponse in these DHS surveys is relatively low, so distortions when the assumption is false will typically be less than when nonresponse is higher. Also, use of a couple weight removes unnecessary additional bias due to the inclusion of response rates for unmarried individuals in the calculation of female and male weights and the use of these weights with couple data, given that unmarried individuals do have different characteristics and outcomes than married persons.

A limitation of the particular analyses of the 11 surveys is that response rates for couples are based on small numbers in a few domains. Whereas the median number of couples per domain in the 11 surveys (99 domains) was 151, 11 had fewer than 50, and 4 had fewer than 20 couples (all four in the Kenya survey). However, these few cases have only a minimal influence in the analyses of couples for the entire survey of Kenya. The results presented in the boxplots are trivially changed when the Kenya data are excluded (not shown). The overall patterns and conclusions remain the same: the results are quite robust to having a few domains with small numbers of couples.

Another limitation is that the simulation pertains to only one multivariate analysis in one survey. However, we examine 18 regression coefficients, the range of response rates that we simulate covers more than 95 % of the response rates observed in the 11 surveys, and sampling variability is not an issue given that there were about 100,000 cases. The pattern that emerges is that for most coefficients, different levels of response rates for single men and women do not distort estimates to any large extent, with only a few exceptions.

An unexpected finding is that using the ALT estimate to derive a couple weight (which assumes that no couples had nonresponse for both partners) produces estimates of means, coefficients, and standard errors that are not more biased than those estimated with a proxy couple weight derived using the Chandra-Sekar–Deming technique to estimate the number of couples with nonresponse for both partners. In most of the surveys, the number of couples in this category is relatively small. Specifically, of the 99 strata in the 11 surveys, 59 % have numbers of couples with both partners providing incomplete interviews that are less than 2 % of the number of couples with both interviews completed. Of course, it is the distribution of these couples across the strata within a country and their association with the outcome variable of interest that are important in determining the level of bias, but in these fairly extensive analyses, the addition of estimated counts of these cases does not improve estimation.

Where there is polygamy, the matter of multiple wives for one husband in the sample needs consideration. The same weight is assigned to each of the completed couples that includes the polygamous husband. Note that if any co-wife is not coresident in a household, then a couple cannot be matched. Therefore, the couples’ sample is correctly weighted for polygynous couples. (Handling the implicit hierarchical nature of the data when husbands in polygamous couples are included multiple times is a separate matter.)

## Conclusion

In summary, because response rates usually differ by marital status in demographic and health surveys and couple-level response is lowered whenever either of the partners is a nonrespondent, surveys that include couple-level data should include couple-level weights that can be derived as outlined in this article. As a particular application, we strongly recommend that DHS provide the relevant couple weight for future surveys so that researchers can use it in analyses of the couple data. Where it is not possible to calculate the actual couple weight because line numbers of partners are not included in the household form, results from these analyses show that creating proxy couple weights by estimating the universe of eligible couples in the sample is still better than using either the male or female weight when considering couple outcome variables.

## Acknowledgments

We thank Tom Pullum, Ren Ruilen, and Mahmoud Elkasabi of ICF International for comments on an earlier draft of the manuscript; Bryan Sayer for helping with the original formulation of this research; Qingfeng Li, Chuck Rohde, and Saifuddin Ahmed for giving advice on the equations; and Scott Zeger and Larry Moulton for advice on the simulations. Also thanks go to Abishek Singh and Visseho Adjiwanou for trying out these methods already. We are grateful that funding for this research was provided by Grant R03HD068716 from the National Institute for Child Health and Development.

### Appendix: Derivation of Couple Weight and Relationship to Household Weight Available in DHS

Here we derive a couple weight for the general sampling design of DHS of two-stage sampling within each stratum; that is, sampling of clusters within strata and then households within a selected cluster. Algebra to derive the couple weight from the normalized household weight provided with the DHS surveys is also given. Women’s, men’s, and couple weights differ from the household weight and each other only because of different response rates of these groups.

### Notation

*i* = stratum identifier

*j* = cluster identifier

*h* = household identifier

*l* = identifier of person or couple in the household

*n*^{h} = total number of completed households in the sample

*n*^{w} = total number of completed women in the sample

*n*^{m} = total number of completed men in the sample

*n*^{c} = total number of completed couples in the sample

*I* = number of strata

*J*_{i} = number of clusters in stratum *i* that are selected

*H*_{ij} = number of households selected in cluster *j* in stratum *i*

*z* = identifier for woman (*z* = *w*), man (*z* = *m*), or couple (*z* = *c*)

*L*_{ijh} = number of eligible *z* in household *ijh*

However, by design in DHS, $p_{ijh}^{3}$ = $p^{3}$, a fixed constant, given that the sampling design calls for a fixed proportion of households to be sampled for the male interview in addition to the household and female interviews.

Whereas $p_{ij}^{1}$, $p_{ijh}^{2}$, and $p^{3}$ are probabilities from the sampling design, $r_{i}^{h}$ and $r_{i}^{z}$ are proportions estimated ex post facto. These latter proportions are typically computed at the stratum level in DHS, but clearly the equations could be modified for other designs.

However, for many applications, it is desirable to have a normalized household weight (that sums to the sample size of households). So let the normalized weight be $w_{ijh}^{h}$. Let *S* denote the set of all households with completed questionnaires. Then define the indicator function:

*1*[*h* ∊ *S*] = 1 if selected household *h* (in stratum *i* and cluster *j*) had a completed household interview, and 0 otherwise.

*p*^{3} = 1.0.

Define *B* to be the set of all completed interviews of individuals/couples *z*, and let *1*[*l* ∊ *B*] = 1 if selected household *h* in stratum *i* and cluster *j* had a completed interview(s) of the *l*th individual/couple, and 0 otherwise; this is also 0 if there is no eligible woman/male/couple in the household.

#### Obtaining the Couple Weight From $w_{ijh}^{h}$ and $r_{i}^{c}$

*k* = (*p*^{3} × *n*^{h} / *T*^{h}); that is, *n*^{c}.
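The verbal procedure in note 3 (multiply the estimated couple response rate by the inverse of the household weight, invert the result, and normalize to the number of completed couples) can be sketched as below. This is a minimal illustration under the assumption that the couple response rate is estimated per stratum; the function name, data layout, and example values are hypothetical.

```python
def couple_weights(household_weights, strata, couple_response_rate):
    """Normalized couple weights for completed couples.

    household_weights    : household weight for each completed couple's household
    strata               : stratum label for each completed couple
    couple_response_rate : dict mapping stratum label to the estimated couple response rate
    """
    # Inverting (response rate x 1 / household weight) gives household weight / response rate.
    raw = [hw / couple_response_rate[s] for hw, s in zip(household_weights, strata)]
    n_c = len(raw)       # number of completed couples
    total = sum(raw)
    # Normalize so that the weights sum to the number of completed couples.
    return [n_c * w / total for w in raw]


# Illustrative values only: two strata with different couple response rates.
weights = couple_weights([1.0, 1.0, 2.0, 2.0], ["A", "A", "B", "B"], {"A": 0.8, "B": 0.5})
print(weights, sum(weights))
```

Couples in strata with lower couple response rates receive proportionally larger weights, which is the only way the couple weight departs from the household weight in this sketch.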

In our analyses, we utilize women’s, men’s, and the couple weights as well as two proxy couple weights. The two proxies estimate the proportion of couples completed from information available on individual in-union persons. Specifically, following the Venn diagram of Fig. 1, information on couples in which both partners were nonrespondents is not available in the DHS survey data (other than in these 11 surveys). However, in virtually all DHS surveys, it is possible to enumerate wives whose husbands are nonrespondents and husbands whose wives are nonrespondents because spouse’s line number is given in the completed individual questionnaires. Thus, one proxy (ALT) adds these two numbers to the completed couples to form the denominator of the couple response rate (that is, the estimate of $r_{i}^{c}$). For the other proxy (EST), the Chandra-Sekar–Deming method is used to estimate the number of couples in which both partners were nonrespondents, assuming independence of the probabilities of nonresponse.
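The two proxy response rates can be sketched for a single stratum as follows. The sketch assumes the standard two-list form of the Chandra-Sekar–Deming estimator, in which the unobserved cell (both partners nonrespondents) is estimated under independence as the product of the two one-partner-only counts divided by the both-completed count. Function and argument names are hypothetical.

```python
def proxy_couple_response_rates(both, wife_only, husband_only):
    """Return (ALT, EST) proxy couple response rates for one stratum.

    both         : couples with both interviews completed
    wife_only    : completed wives whose husbands are nonrespondents
    husband_only : completed husbands whose wives are nonrespondents
    """
    if both <= 0:
        raise ValueError("at least one completed couple is required")
    # ALT: assume no couples in which both partners are nonrespondents.
    alt_denominator = both + wife_only + husband_only
    # EST: estimate the both-nonrespondent count assuming independent nonresponse.
    neither_est = wife_only * husband_only / both
    est_denominator = alt_denominator + neither_est
    return both / alt_denominator, both / est_denominator


# Illustrative counts only.
alt, est = proxy_couple_response_rates(both=80, wife_only=10, husband_only=10)
print(alt, est)
```

Because EST adds a nonnegative estimated count to the denominator, its response rate is never higher than the ALT rate; the two coincide when either one-partner-only count is 0.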

## Notes

^{1}

The husband’s weight is preferred rather than the wife’s weight because response rates are usually lower and more variable for men than for women, so the inclusion of the couple in the sample depends more on completing an interview with the husband than with the wife.

^{2}

The probability that an eligible couple resides in the household is not needed.

^{3}

For DHS surveys, probabilities *p*^{1} to *p*^{3} and *r*^{h} are already incorporated in the household weight, which ICF International provides with the household survey data. Therefore, using DHS data, it is necessary only to estimate *r*^{c} and then multiply it by the inverse of the household weight. To form the couple weight, this result is inverted and normalized to sum to the sample size for couples with completed information. Details of the algebra are shown in the appendix.

^{4}

In the DHS, the sampling domains in a survey usually correspond to regions of the country.

^{5}

The use of sampling weights in regression is open to debate. DuMouchel and Duncan (1983) and Deaton (1998) showed that coefficients estimated from both sample-weighted and sample-unweighted analyses are not consistent estimators in the general case (e.g., when coefficients vary across sampling strata). However, Deaton (1998) argued that for regressions that are meant to be descriptive, the weighted estimates are preferred. Also, for the results to retain representativeness at the national level, weights are essential. Winship and Radbill (1994) showed that if the weights are *solely* a function of independent variables, then unweighted analysis is more efficient. This condition can be met by inclusion of indicator variables for sampling strata in the model. Of course, the assumption that coefficients for other covariates do not vary across strata is also implicit.

^{6}

The value of 5 % is only somewhat arbitrary. Specifically, of the over 100 DHS surveys available in which couples can be matched, the largest sample was in Nigeria (2008) with 8,731 couples. An exception is The India National Family Health Survey of 2005/2006, which actually had 39,000 couples. Using the average design effect for the survey of 3.3, this yields an effective sample size of 8,731 / 3.3 = 2,646. One-half of the width of the 95 % confidence interval for a proportion in a sample of this size is given by $1.96 \times \sqrt{p \times (1 - p) / 2{,}646}$. Choosing *p* = .5 maximizes the estimated variance and gives a value of 0.019, or about 4 % (0.019 / 0.50). For surveys with smaller effective sample sizes, standard errors would be greater than this. Thus biases of less than 4 % or 5 % would nearly always be within sampling error for the usual DHS surveys. Other surveys with couples usually have the same order of magnitude or smaller sample sizes so the same calculation probably applies.
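The arithmetic in this note can be reproduced with a short sketch; the function name is hypothetical, and the inputs are the values quoted above.

```python
import math


def relative_half_width(n_couples, design_effect, p=0.5, z=1.96):
    """Effective sample size, CI half-width, and half-width relative to p."""
    n_eff = n_couples / design_effect                 # effective sample size
    half_width = z * math.sqrt(p * (1 - p) / n_eff)   # half of the 95 % CI width
    return n_eff, half_width, half_width / p


n_eff, half_width, relative = relative_half_width(8731, 3.3)
print(round(n_eff), round(half_width, 3), round(100 * relative, 1))
```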

^{7}

The true bias of these survey estimates with any given weight is actually unknown because that would necessitate population-level data for comparison, which are not available in most developing countries. Even the couple weight will be correct only if all of the following are true:

1. The sampling was accurate: (a) the sampling frame was up-to-date; (b) the sampling probabilities were correctly calculated; and (c) the sample implementation was correctly done.

2. All eligible couples in selected households were correctly identified.

3. The characteristics of nonresponse households and nonresponse couples have the same distributions as those of sampled households and couples with completed questionnaires.

Assumptions 1a, 1c, and 2 can be checked by completing these steps again with independent interview teams, although it is still only reliability that is measured rather than validity. Assumption 3 is likely to be violated, although judicious formation of weighting cells can reduce the bias. However, given these assumptions, the couple weights are the correct weights for analyses of couple data rather than weights for all women or all men. For conciseness of language, in the exposition of results, we will treat the estimates using the couple weights as unbiased and often refer simply to bias in other estimates compared with the estimated value using the couple weight.

^{8}

Kulczycki (2008:131) noted that data were “weighted by the couple weight provided by DHS,” but this turned out to be wrong; the author had used female weights (personal communication, Andrzej Kulczycki, 2011). Also, because the DHS public-use couple data file was updated between the time of the published analyses and the present analyses (the number of couples was reduced from 1,971 to 1,906) and because there were several inconsistencies in coding in the article (e.g., his table 5 contains no variable to identify 399 couples with both partners younger than 30 years of age), we could not replicate exactly those results. However, our results with women’s weights are close.

^{9}

If response rates were to reach 100 %, then all the weights would be the product of some constant and the household weight. That is, separate weights for women, men, and couples would not be needed; only normalization to equal the sample size of the women, men, or couples would be needed.

^{10}

The simulation for the scenario with the original female and male response rates matched with the original analyses to three significant digits.

^{11}

For reasons of confidentiality, DHS does not retain the household sampling probabilities. However, in Burkina Faso the sampling was done such that the probabilities were identical for all households in a stratum, and with information from the survey report, they could be reconstructed, albeit tediously.

^{12}

To explore the extent to which our estimates of bias were dependent on the scale of covariates used in the regressions, we reestimated the regressions that have number of children ever born as a covariate but this time using an ordinal recoded covariate (parity 0–2 = 0; 3–4 = 1; 5+ = 2). The results were nearly identical for absolute percentage difference in standard errors, the ratio of standard errors, and the ratio of MSE. However, the absolute percentage differences in estimates (from the result with the couple weight) were different (not shown). With the ordinal variable, the median of these percentage differences was higher for the grouped variable for all weights except the female weight (an average of 16 % higher). Interestingly, the means of these percentage differences were consistently lower for the grouped variable. In summary, although the estimates of standard errors and MSEs were quite robust to the scale of covariates, the estimates of percentage bias alone varied somewhat according to the scale of the covariate. Because covariates in our analyses include both binary (urban/rural) and continuous (e.g., age) scales, the results summarized across surveys are probably close to what would be found with other variables and surveys. This also further strengthens the usefulness of the deviation measure.

^{13}

Of course, the values of the deviation measure depend on the sample size of couples. However, a regression of the proportion of the deviation measures above 0.08 by sample size across the 11 surveys does not produce a significant coefficient (*p* = .94).