## Abstract

Starting in 2006, respondents in the biennial U.S. Health and Retirement Study were asked to submit biomarkers every other wave and were notified of several results. Rates of undiagnosed high blood pressure and diabetes according to these biomarkers were 1.5 % and 0.7 %, respectively. An intent-to-treat analysis suggests that collection and notification had small effects on the average respondent and may have reduced health care utilization. Among respondents who received notification of potentially dangerous biomarker levels, subsequent rates of new diagnosis and associated pharmaceutical usage increased by 20 to 40 percentage points, an order of magnitude above baseline. High blood glucose A1C was associated with a 2.2 % drop in weight and an increase in exercise among respondents without a previous diagnosis of diabetes. Notifications appear also to have altered health behaviors by spouses, suggesting household responses to health maintenance. Biomarker collection seems to have altered circumstances for an interesting minority of HRS respondents.

## Introduction

The ability of biomarkers to enhance knowledge about the determinants of healthy aging, longevity, and disparities has spurred great interest (Weinstein et al. 2007). Biomarkers convey rich information not only to researchers but potentially also to survey respondents. Principles surrounding the protection of human subjects—and in particular, informed consent—may require a data collection team to inform the research subject about observed biomarkers when the withholding of such information could be harmful.1 The natural question is whether notifying individuals about their biomarker readings causes any changes in behavior or circumstances.2

A related topic concerns responses to learning HIV status, usually measured in a developing country (Boozer and Philipson 2000; Delavande and Kohler 2012; Godlonton and Thornton 2013; Gong 2015; Thornton 2008, 2012). Many of these studies have assessed this question by exploiting random variation in testing or in incentives to receive results. As one might expect from a behavioral economics perspective, the effect of knowledge depends on context and salience. In particular, behavioral change is typically concentrated among those for whom the knowledge was new or unexpected (Boozer and Philipson 2000; Gong 2015).

In this article, I examine behavioral responses to notifications of biomarker levels within the U.S. Health and Retirement Study (HRS), a panel survey of older Americans that began collecting samples midstream. Because sampling was restricted to a randomly selected half of the sample in the first year, I can assess the intent-to-treat (ITT) effects of collecting biomarkers on the average respondent with strong identification. These effects are interesting but small: much of the action occurs within subsamples of respondents who were surprised by the results, consistent with findings in the literature on HIV. Panel analysis techniques reveal that behavioral responses among these subgroups were substantial and striking, with reactions spread across spouses within households. Although there is no formal control group, observations during the panel of the treated before and after treatment provide a quasi-experimental study design that can be leveraged by a difference-in-differences approach.

Estimates of the ITT effects suggest that biomarker collection may have reduced average rates of doctor visits and prescription medication usage by approximately 1.5 percentage points. This change is small relative to baseline average rates of 80 % to 90 %, and the reduction does not extend to the average number of doctor visits. Still, it is an interesting result that could reflect perceptions that biomarker collection substitutes for a physical examination, which may not be surprising given the high rates of insurance coverage and health care utilization measured among all HRS respondents. And to be sure, altering health insurance coverage or care for all Americans at these ages has not traditionally been a priority for policy.3 Rather, the focus has been either on lack of insurance (more common among younger Americans) or on understanding health disparities. The natural experiment leveraged by this study is not ideal for investigating the latter because the most interesting treatments (unexpected screenings outside normal range) are rare.

In contrast with the broad ITT effects of collection, responses by subgroups with biomarkers outside normal range—especially those who did not report a previous diagnosis of the underlying condition and were presumably surprised—were larger and consistent with medical advice. For respondents unaware of having high blood pressure, receiving notification of that condition raised the prevalence of diagnosis and medication usage two years later by approximately 20 percentage points. Effects for respondents with blood hemoglobin A1C higher than 7.0 (a standard threshold measurement used to indicate diabetes) were approximately 40 percentage points in increased diagnosis and medication usage for the previously undiagnosed. Self-reported weight appears to have fallen among this group of undiagnosed diabetics by approximately 2 %, while frequency of physical activity increased. Spouses appeared to react to the abnormal biomarker readings of their partners in addition to their own, revealing household-based approaches to health production. In particular, an undiagnosed spouse’s high A1C reading also increased own exercise, while a diabetic spouse’s reading appears to have reduced own weight, suggesting responses of a shared input such as diet, which is unfortunately not measured by HRS. Spouses of respondents with uncontrolled high blood pressure reduced their drinking and binge drinking along with respondents.

Finding that healthy behaviors are mediated or determined in part by household structure, and in particular by a spouse, is not unexpected. A rich literature has examined the protective effect of marriage, and evidence suggests that part of it reflects causality running from marriage into good health (Goldman 1993; Guner et al. 2014; Hu and Goldman 1990; Rendall et al. 2011; Stevenson and Wolfers 2007). Goldman and Smith (2002) showed that better self-management among the educated appears to explain a significant part of the socioeconomic gradient in health, and that marriage is associated with better health management for men, especially the highly educated. Falba and Sindelar (2008) reported that improvements in a wide range of healthy behaviors tend to move in tandem between spouses in the HRS, with no clear asymmetries across sex. Using a different panel data set, Reczek and Umberson (2012) identified variation in household health production techniques between heterosexual and homosexual unions and between the sexes. Umberson et al. (2010) surveyed the literature on social ties and health more broadly. This article adds to the literature by examining the effects of a natural experiment—namely, the beginning of biomarker collection—on household behavior, whereas earlier investigations are more descriptive in nature.

The results of this study are interesting in view of their implications for behavior and policy, and for interpreting dynamics in panel studies with biomarker collection. Estimated effects of abnormal biomarker notification can be statistically significant and meaningful for the individual but are often relegated to the 1 % or 2 % of the eligible sample who screened positive without prior knowledge of the condition. Biomarker collection and notification—or by extension, visits by trained medical professionals—thus have real but extremely circumscribed average effects on outcomes among Americans older than 50.4 Analysis of panel data sets that include biomarker collection and notification is likely prone to small amounts of bias unless the panel information is used to control for the effects of notification, which appear real but not as large as the effects of mode of interview on self-reports. The tight correlation between mode of interview and biomarker collection in the HRS implies that an analysis of either should account for both.

## Biomarkers in the Health and Retirement Study

The Health and Retirement Study (HRS) is a biennial panel survey of Americans aged 50 and older in households that is sponsored by the National Institute on Aging (NIA) and conducted by the University of Michigan’s Institute for Social Research (Juster and Suzman 1995). Originally begun in 1992 with a sample of Americans born between 1931 and 1941, the HRS was merged with a sister data set and expanded in its fourth wave in 1998 to represent Americans aged 50 and older; it has also added new birth cohorts in order to maintain representative coverage. The eighth wave of data collection in 2006 contained 18,469 respondents.

Starting in 2006, the HRS expanded to include biomarkers and several other new measures in enhanced face-to-face (EFTF) interviews conducted on randomly selected rotating halves of the panel each wave. Consenting respondents would thus submit biomarkers once every four years. During EFTF interviews, interviewers measured and collected biomarkers, including blood pressure, pulse, and saliva and blood samples; conducted tests of grip strength, breath, balance, and walking; and left behind a questionnaire on psychosocial topics. Weir (2007) described biomarker collection as occurring roughly in the middle of the EFTF interview, after the self-reports of health, height and weight, and disability. Further details are discussed in Online Resource 1.

A key feature of biomarker collection in the HRS was the commitment to notifying respondents about four biomarker results: blood pressure, A1C, total cholesterol, and HDL cholesterol. Notification of very high blood pressure occurred immediately, and notification by mail of all four results followed within about a month for all respondents who submitted biomarkers. When the minimum of three measures of blood pressure exceeded 160 mmHg systolic or 110 mmHg diastolic, HRS interviewers left behind a “high blood pressure card” similar to that depicted in Fig. S1 (Online Resource 1). The card, which in the figure shows the average readings for the subsample that received it in 2006, recommends that the respondent see a physician immediately, with bold and underline emphasis on the word “immediately.” Later, all recipients received a mailed notification letter reporting all four biomarker levels, the recommended normal ranges, and suggestions to see a physician if the biomarkers were outside normal range. A representative mockup of the notification letter is shown in Fig. S2 (Online Resource 1), which lists biomarker levels for the sample average and the thresholds as they were specified in 2006. The technical appendix (Online Resource 1) describes the thresholds and mailout notification process in more detail.

As shown in Table 1, conditions indexed by biomarker thresholds ranged from rare to common in the sample, and prior knowledge of conditions also varied. Of the 7,127 individuals who submitted biomarkers in the eighth wave of the HRS in 2006 and were interviewed again in 2008, 412 (5.8 %) received the high blood pressure card, and 387 (5.4 %) had A1C levels of 7.0 % or higher. A majority of individuals in these two subgroups also reported a preexisting diagnosis of the underlying disease in 2006, but 25 % and 12 % of them, respectively, had not, suggesting rates of undiagnosed high blood pressure and diabetes of 1.5 % and 0.7 %. Individuals with HDL cholesterol below the recommended threshold of 40 mg/dL were only 8.5 % of the sample, approximately one-half of whom were already on cholesterol medication. In contrast, blood pressure above 120/80 mmHg and total cholesterol above 200 mg/dL (the thresholds listed in the notification letters) were much more prevalent, often approaching one-half the sample. Those most at risk for screening outside of normal range on these four biomarkers were disproportionately male, African American, or Hispanic, and were less likely to own a home or to report health insurance coverage.
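
The implied rates of undiagnosed conditions can be checked with simple arithmetic. The sketch below uses only the counts and rounded shares quoted in the paragraph above; the variable names are illustrative. Because the 25 % and 12 % shares are themselves rounded, the products land close to, but not exactly on, the reported 1.5 % and 0.7 %.

```python
# Counts reported in the text for the 2006 biomarker group.
n_sample = 7127      # submitted biomarkers in 2006 and reinterviewed in 2008
n_bp_card = 412      # received the high blood pressure card
n_high_a1c = 387     # A1C of 7.0 % or higher

# Rounded shares of each subgroup WITHOUT a preexisting diagnosis (from the text).
share_undiagnosed_bp = 0.25
share_undiagnosed_a1c = 0.12

# Subgroup sizes as a percentage of the sample.
pct_bp_card = 100 * n_bp_card / n_sample
pct_high_a1c = 100 * n_high_a1c / n_sample

# Implied sample-wide rates of undiagnosed conditions, in percent.
pct_undiagnosed_bp = pct_bp_card * share_undiagnosed_bp
pct_undiagnosed_a1c = pct_high_a1c * share_undiagnosed_a1c

print(f"High BP card: {pct_bp_card:.1f} %, high A1C: {pct_high_a1c:.1f} %")
print(f"Undiagnosed high BP: about {pct_undiagnosed_bp:.2f} %")
print(f"Undiagnosed diabetes: about {pct_undiagnosed_a1c:.2f} %")
```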

### Treatment and Control Groups

Figure 1 provides a visualization of treatment and control groups in the HRS panel. The group of respondents C2006 shown at left submitted biomarkers in 2006; within this group are the partially overlapping subgroups A2006 and B2006, those who screened outside normal range and those who had a preexisting diagnosis, respectively. Subgroups D2006, E2006, and F2006 were assigned to the 2006 biomarker group but did not or could not comply with the treatment. With an ITT perspective, the average treatment effects of biomarker collection are revealed by comparing the entire 2006 biomarker group shown at left with the entire 2008 biomarker group shown at right, in which there are analogous subgroups that are not all observable in 2006. A generalized approach that will prove more useful is a panel fixed-effects (FE) regression using pooled data up to 2008:
$$y_{it} = \alpha_i + \sum_t D_t + \beta_{ITT}\, b^{2006}_{it-1} + X_{it}B + \varepsilon_{it}, \tag{1}$$
where the $\alpha_i$ are individual fixed effects; the $D_t$ are wave or time dummy variables; the $X_{it}$ are additional time-varying controls, such as age and marital status; and $\varepsilon_{it}$ is white-noise error. The parameter of interest is $\beta_{ITT}$, the ITT estimate of the effect of being randomly assigned to the 2006 biomarker group in the prior wave, measured by the indicator variable $b^{2006}_{it-1}$.
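
As a rough illustration of how a specification like Eq. (1) can be estimated, the sketch below simulates a balanced four-wave panel and recovers the coefficient on lagged assignment with a two-way within transformation (demeaning by individual and by wave). Everything here is an assumption made for illustration: the data are synthetic, the time-varying controls are omitted, the effect size and noise level are arbitrary, and the function names are hypothetical. It is not the estimation code behind the reported results.

```python
import random


def two_way_within(panel):
    """Subtract unit and period means (adding back the grand mean) in a balanced panel."""
    n_units, n_periods = len(panel), len(panel[0])
    unit_means = [sum(row) / n_periods for row in panel]
    period_means = [sum(row[t] for row in panel) / n_units for t in range(n_periods)]
    grand_mean = sum(unit_means) / n_units
    return [[row[t] - unit_means[i] - period_means[t] + grand_mean
             for t in range(n_periods)]
            for i, row in enumerate(panel)]


def estimate_itt(n_units=2000, true_beta=-0.015, seed=0):
    """Simulate y = alpha_i + wave effect + beta * lagged assignment + noise; return beta-hat."""
    rng = random.Random(seed)
    waves = (2002, 2004, 2006, 2008)
    y, b = [], []
    for _ in range(n_units):
        alpha_i = rng.gauss(0.0, 1.0)          # individual fixed effect
        assigned_2006 = rng.random() < 0.5     # random half assigned to 2006 biomarkers
        y_row, b_row = [], []
        for t, wave in enumerate(waves):
            # Lagged assignment indicator: switches on one wave after 2006.
            lagged = float(assigned_2006 and wave == 2008)
            y_row.append(alpha_i + 0.02 * t + true_beta * lagged + rng.gauss(0.0, 0.05))
            b_row.append(lagged)
        y.append(y_row)
        b.append(b_row)
    y_w, b_w = two_way_within(y), two_way_within(b)
    s_xy = sum(x * v for rx, rv in zip(b_w, y_w) for x, v in zip(rx, rv))
    s_xx = sum(x * x for rx in b_w for x in rx)
    return s_xy / s_xx


beta_hat = estimate_itt()
print(f"Estimated ITT coefficient: {beta_hat:.4f} (simulated true value: -0.015)")
```

In a balanced panel, double demeaning is equivalent to including a full set of individual and wave dummies, which is why a single-regressor ratio suffices here. With unbalanced data or additional controls, a dedicated panel estimator would be the natural choice.
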
To assess the effects of screening out of normal range, one would ideally like to compare the treated in group A2006 in Fig. 1 with controls who screened out of normal range but were not told. Unfortunately, this control group is the unobservable group A2008.5 A feasible estimation strategy leverages the panel nature of the data. With a pooled data set of all respondents, a panel FE estimator of the effects of screening outside normal range draws identification from the behavior during the panel of all subgroups, including the unobservable control group A2008 and the treated group A2006. In particular, past observations of A2006 serve as their own controls. This equation takes a similar form:
$$y_{it} = \alpha_i + \sum_t D_t + \sum_{k=0}^{4} \beta_k z_{k,it-1} + X_{it}B + \varepsilon_{it}, \tag{2}$$
where the $z_{k,it-1}$ are indicators for the biomarker notifications sent out after the previous wave, diagrammed in Fig. S3 in Online Resource 1. The $\beta_k$ are not identified by the same randomization as $\beta_{ITT}$ but instead by the panel variation within groups.

## Mode of Interview

A complication is that biomarker collection and telephone interviewing are inversely correlated in the HRS, and both appear to affect self-reported outcomes. Biomarkers are collected in person via EFTF interviews, but the HRS has historically been a telephone-based survey, with in-person interviews limited to first-time respondents, nursing home residents, and others for whom a telephone interview was inadequate. If this were always the case, disentangling the effect of mode of interview on self-reports would be problematic. Luckily, though, the 2004 wave was entirely unique in that HRS fielded face-to-face interviews for roughly 80 % of the sample in order to improve consent rates for Social Security linkages. This is revealed by Fig. 2, which plots the share of the groups assigned to biomarkers in 2006 and 2008 interviewed by telephone in each wave starting with 1998. Mode of interview shifted dramatically first in 2004, falling from approximately 80 % on telephone to 80 % in person, and then again in 2006 with the initiation of regular biomarker collection. Since then, the group not asked for biomarkers is almost exclusively telephoned each wave, while the other group receives EFTF.

Patterns in the data suggest that mode of interview affects several important self-reports in the HRS. Figure 3 shows averages and confidence intervals in self-reported weight since 1998 separately for the two groups who submitted biomarkers in 2006 and in 2008. The two track each other closely prior to 2006, the year when they sharply diverge. Before they submitted biomarkers during the 2006 wave, respondents told their EFTF interviewers that their weight had risen by about 0.25 kg; the rest of the sample on telephone interview indicated they had not gained any weight, on average. A similar pattern appears in 2008, when the group on telephone interview indicated they lost an average of 1 kg, while those submitting biomarkers in person said they had gained approximately 0.5 kg. Similar fluctuations are apparent in self-reported current smoking behavior. The most plausible explanation for this is that face-to-face interviews elicit more accurate responses about characteristics with a visible or otherwise apparent component.

When mode of interview and biomarker collection both affect self-reported outcomes and are themselves tightly correlated, not only are simple mean comparisons between the rotating biomarker groups confounded, but so too are simple differences in differences, as discussed in Online Resource 1. One could avoid this problem by comparing outcomes within rather than between rotating biomarker groups, but the ITT estimate could not be obtained in this way, and rigorously estimating the effects of screening outside normal range requires better control groups than the members of the 2006 biomarker group who did not screen outside normal range. A better alternative is to alter the panel FE regression equations to control for effects of interview mode, which is identified by the exogenous variation in the 2004 wave. Inserting an indicator for interview mode $m_{it}$ into the regression equations produces
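$$y_{it} = \alpha_i + \sum_t D_t + \beta_{ITT}\, b^{2006}_{it-1} + \gamma m_{it} + X_{it}B + \varepsilon_{it}, \tag{3}$$
and
$$y_{it} = \alpha_i + \sum_t D_t + \sum_{k=0}^{4} \beta_k z_{k,it-1} + \gamma m_{it} + X_{it}B + \varepsilon_{it}. \tag{4}$$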
A final modification is to interact the biomarker notification indicators $z_k$ with indicators of preexisting diagnoses at $t-1$ of the health conditions associated with the biomarkers, such as high blood pressure or diabetes. With these interactions, Eq. (4) becomes
$$y_{it} = \alpha_i + \sum_t D_t + \sum_{c=0}^{1} \sum_{k=0}^{4} \beta_k^c z^c_{k,it-1} + \gamma m_{it} + X_{it}B + \varepsilon_{it}, \tag{5}$$
where the treatments $z_k^c$ are now defined as screening outside normal range either with (c = 1) or without (c = 0) a previous diagnosis of the condition.
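
To make the interacted treatments concrete, the following sketch builds the ten indicators (five biomarker categories crossed with diagnosis status) for one respondent. The category labels, their ordering, and the function name are hypothetical and are not taken from HRS variable names.

```python
from itertools import product

# Hypothetical labels for the five notification categories (k = 0..4).
CATEGORIES = ("bp_card", "high_bp_no_card", "high_a1c", "low_hdl", "high_total_chol")


def interaction_indicators(screens, diagnosed):
    """Return {(category, c): 0 or 1} for one respondent at t-1.

    screens:   category -> True if the respondent screened outside normal range
    diagnosed: category -> True if the respondent reported a preexisting diagnosis
               (or, as an assumed stand-in, medication use) for that condition
    """
    indicators = {}
    for category, c in product(CATEGORIES, (0, 1)):
        screened = bool(screens.get(category, False))
        has_diagnosis = int(bool(diagnosed.get(category, False)))
        indicators[(category, c)] = int(screened and has_diagnosis == c)
    return indicators


# Example: high A1C without a diabetes diagnosis, plus the BP card with a prior diagnosis.
z_example = interaction_indicators(
    screens={"high_a1c": True, "bp_card": True},
    diagnosed={"bp_card": True},
)
print({key: value for key, value in z_example.items() if value})
```

Each screen outside normal range switches on exactly one of its two interacted indicators, so the own-coefficients for the diagnosed and undiagnosed are estimated separately.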

## Results

### The Sample Submitting Biomarkers in 2006

The rows in Table 1 list an array of salient characteristics of the 2006 biomarker group stratified across the columns according to their biomarker results.6 Of the five subgroups shown, three small groups of approximately 400 respondents each screened positive for “rare and dangerous” conditions: blood pressure high enough to receive the high BP card (above 160/110 mmHg), A1C above 7.0 %, or HDL cholesterol below 40 mg/dL. Roughly 2,400 and 3,800 screened positive for high BP (above 120/80 mmHg but not over 160/110) or high total cholesterol (above 200 mg/dL), respectively—shares that approach one-half the sample. In 2010, the HRS revised upward its normal-range thresholds for these two biomarkers.
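
The 2006 thresholds described above can be summarized as a small classification rule. This is a sketch under stated assumptions rather than the HRS procedure itself: it applies the minimum-of-three-readings rule (which the text states only for the card) to the moderately high blood pressure category as well, treats A1C of exactly 7.0 % as high, and uses hypothetical function and key names.

```python
def classify_screens(bp_readings, a1c_pct, hdl_mg_dl, total_chol_mg_dl):
    """Flag the five screening categories using the 2006 thresholds quoted in the text.

    bp_readings: three (systolic, diastolic) pairs in mmHg.
    """
    # Assumption: the minimum systolic and minimum diastolic readings are used.
    min_systolic = min(systolic for systolic, _ in bp_readings)
    min_diastolic = min(diastolic for _, diastolic in bp_readings)

    bp_card = min_systolic > 160 or min_diastolic > 110
    high_bp_no_card = (min_systolic > 120 or min_diastolic > 80) and not bp_card

    return {
        "bp_card": bp_card,
        "high_bp_no_card": high_bp_no_card,
        "high_a1c": a1c_pct >= 7.0,            # assumed inclusive at 7.0 %
        "low_hdl": hdl_mg_dl < 40,
        "high_total_chol": total_chol_mg_dl > 200,
    }


# Hypothetical respondent: very high blood pressure, other biomarkers in range.
example = classify_screens([(165, 90), (170, 95), (162, 88)], 6.1, 55, 180)
print(example)
```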

The broad messages in Table 1 are that screening out of normal range on these four biomarkers is associated with many preexisting characteristics; undiagnosed health conditions apparently exist even though insurance coverage rates and usage tend to be high; and although health care utilization and behavior may not vary much in the sample, there is room for behavioral responses. The top of the table reveals familiar socioeconomic patterns in disease, with African American and Hispanic males overrepresented among those with very high blood pressure or high A1C. Screening out of normal range may also be associated with marital status, homeownership, and health insurance coverage. The middle of the table shows that self-reported rates of physicians’ diagnoses vary across these groups but are not universally reported by those who screened out of normal range on the relevant biomarker, and that average rates of new diagnosis for these conditions hover around 2 % to 6 % between biennial waves.7 Health care utilization is high across the board, with 90 % or more reporting at least one visit to a doctor, emergency room, or clinic in the past two years, and pharmaceutical usage rates higher than 75 %. The bottom of the table shows reduced self-reported health and increased disability among respondents with high A1C, which appear to correlate with weight and less exercise, but also less drinking and somewhat less smoking. Aside from that and some limited evidence that respondents with high blood pressure behave less healthily, there is much similarity across groups. Smoking is not very prevalent, at approximately 14 %, but exercise and drinking could probably be altered. Unfortunately, the HRS does not measure diet.

With five distinct types of morbidities represented by screens outside normal range for each of potentially two spouses in the household, the structure of comorbidity is challenging to summarize. Details are in the technical appendix (Online Resource 1).

### Mortality and Sample Attrition

The HRS tracks mortality well (Weir 2010), and attrition is low compared with other household surveys (Banks et al. 2010), but might respondents screening out of normal range on these biomarkers have experienced differential mortality or panel attrition? I test for this among the 2006 biomarker group by separately modeling death or nonresponse in 2008 as functions of indicator variables for screens out of normal range and an array of socioeconomic control variables using probit and logit specifications.8 After I control for self-reported health conditions, scores outside normal range are not significantly associated with either mortality or attrition after two years.9

I find evidence of a small amount of differential attrition across the two biomarker groups. When I pool both groups and examine presence in the 2006 and 2008 waves, I find that assignment to the 2006 biomarker group is not significantly associated with mortality, but it reduces nonresponse in 2008 by 0.75 percentage points relative to an average nonresponse rate of 4.2 %.10 Although this dynamic would tend to inflate any treatment effects associated with submitting biomarkers in a direct comparison of treatment to control, the effect will be very small. The panel analysis appears to draw most of its identification from comparisons of the treated to their previous observations, which are unaffected by this dynamic.

### Predictors of Screens Outside Normal Range

Table 2 displays selected marginal effects of respondents’ characteristics on the probability of screening outside normal range in 2006 on the five biomarker categories either with or without a previous diagnosis of the disease. HRS directly asks about high blood pressure and diabetes but not about diagnoses of high or low cholesterol per se. To proxy the latter, I experimented with using diagnoses of heart problems—the closest match among the questions—and results were comparable to what I show here based on analyses that instead differentiate by whether the respondent was taking cholesterol medication.11

Differences according to preexisting knowledge of the disease are evident in Table 2, as are familiar correlations between socioeconomic status (SES) and disease incidence, especially when uncontrolled. I find large and significant effects of many covariates—including sex, race, Hispanic origin, and weight—on the probability of rare and dangerous screens resulting in the high BP card or high A1C among those with preexisting diagnoses. These respondents could be termed “noncompliers” because they know they have the underlying condition but are unable or unwilling to control the biomarker. In contrast, most characteristics have small and insignificant effects on the incidence of rare and dangerous screens among those who might be called the “undiagnosed.” Although the latter are not unpredictable, the patterns in Table 2 imply that such screens are probably more unexpected. Finally, the marginal effects of not having health insurance on undiagnosed moderately high blood pressure and on the cholesterol screens are interesting but do not reflect a universal trend.

Results here also bolster the argument that there is room for behavioral change. Unfortunately, they also reveal that unhealthy behaviors are more significantly linked to the noncompliers than to the undiagnosed.

### ITT Estimates of the Effects of Notification

Findings thus far suggest that any new information conveyed by biomarker notification is likely to be limited in scope within the sample. Tables 1 and 2 show that approximately 1.5 % of participants in the biomarker wave received a high blood pressure card and did not know they had high blood pressure, and 0.7 % screened positive for high A1C and did not already have a diabetes diagnosis. If these notifications are most salient, one would expect the average treatment effects of biomarker collection and notification to be small.

Table 3 confirms this, reporting insignificant effects of assignment to the 2006 biomarker group on 18 of 20 outcomes in 2008 obtained by estimating Eq. (3) with a linear panel FE estimator on the pooled sample of all HRS respondents observed between 1998 and 2008.12 Two outcomes appear to respond significantly: having seen a doctor and regularly taking prescription drugs since the previous wave, which fall by 1.5 and 1.2 percentage points, respectively.13 Both results pass a falsification test of regressing contemporaneous variables on the assignment rather than its lag. Simple difference-in-differences estimation, valid here because mode of interview is insignificant in these two cases, roughly confirms these findings. Given later results that indicate elevated medication usage among some subgroups after biomarker notification and no effects on doctor visits, it is hard to know what to make of these two outliers. Respondents may have viewed biomarker collection as a substitute for a doctor visit or routine physical.

In 13 regressions, mode of interview is significantly associated with the self-reported outcome. On average, these effects are not large. Self-reported weight is 0.3 % lower for those on telephone interview, significant at the 1 % level but approximately one-sixth the size of the average percentage understatement of self-reported weight relative to objective weight as measured in the EFTF (Weir 2007). Without controlling for mode of interview, estimating Eq. (1) with log weight as the endogenous variable attaches that highly significant estimate of –0.003 to lagged biomarker assignment, a clear case of omitted variable bias.
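
The omitted variable problem can be reproduced in a stylized simulation. The sketch below assumes a true assignment effect of zero and a telephone effect of –0.003 on a log-weight-like outcome. It mimics the interview-mode pattern described earlier (mostly telephone before 2004, mostly in person in 2004, then alternating by biomarker group) and compares a specification without the mode indicator to one with it. All magnitudes other than –0.003, and all names, are illustrative assumptions.

```python
import random


def two_way_within(panel):
    """Subtract unit and period means (adding back the grand mean) in a balanced panel."""
    n_units, n_periods = len(panel), len(panel[0])
    unit_means = [sum(row) / n_periods for row in panel]
    period_means = [sum(row[t] for row in panel) / n_units for t in range(n_periods)]
    grand_mean = sum(unit_means) / n_units
    return [[row[t] - unit_means[i] - period_means[t] + grand_mean
             for t in range(n_periods)]
            for i, row in enumerate(panel)]


def cross(a, b):
    """Sum of element-wise products of two panels."""
    return sum(x * z for row_a, row_b in zip(a, b) for x, z in zip(row_a, row_b))


def simulate(n_units=2000, gamma=-0.003, seed=1):
    """Outcome depends on telephone mode only; lagged assignment has no true effect."""
    rng = random.Random(seed)
    random_phone_share = {2002: 0.8, 2004: 0.2}   # stylized shares interviewed by phone
    y, b, m = [], [], []
    for _ in range(n_units):
        alpha_i = rng.gauss(4.3, 0.2)
        in_2006_group = rng.random() < 0.5
        y_row, b_row, m_row = [], [], []
        for wave in (2002, 2004, 2006, 2008):
            if wave in random_phone_share:
                phone = float(rng.random() < random_phone_share[wave])
            else:
                # In-person biomarker waves alternate between the two groups.
                in_person = in_2006_group == (wave == 2006)
                phone = 0.0 if in_person else 1.0
            y_row.append(alpha_i + gamma * phone + rng.gauss(0.0, 0.005))
            b_row.append(float(in_2006_group and wave == 2008))
            m_row.append(phone)
        y.append(y_row)
        b.append(b_row)
        m.append(m_row)
    return y, b, m


y, b, m = simulate()
y_w, b_w, m_w = two_way_within(y), two_way_within(b), two_way_within(m)
s_bb, s_mm, s_bm = cross(b_w, b_w), cross(m_w, m_w), cross(b_w, m_w)
s_by, s_my = cross(b_w, y_w), cross(m_w, y_w)

# Mode omitted, as in Eq. (1): assignment absorbs part of the telephone effect.
beta_omitted = s_by / s_bb

# Mode included, as in Eq. (3): solve the 2x2 normal equations.
det = s_bb * s_mm - s_bm ** 2
beta_controlled = (s_by * s_mm - s_bm * s_my) / det
gamma_hat = (s_bb * s_my - s_bm * s_by) / det

print(f"beta without mode control: {beta_omitted:.4f}")
print(f"beta with mode control:    {beta_controlled:.4f}")
print(f"gamma (telephone):         {gamma_hat:.4f}")
```

In this setup the specification without the mode indicator assigns a spurious negative coefficient to lagged assignment, while adding the indicator moves it back toward zero. Identification of the mode coefficient relies on the waves in which mode varies independently of assignment, which parallels the role of the 2004 wave in the text.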

### Effects of Screening Outside Normal Range

Although the average treatment effects on the entire 2006 biomarker sample tend to be insignificant, a different picture emerges when I model effects on individuals who received notifications of screens outside normal range. I observe statistically significant and interesting responses among individuals who screened positive for either of two of the three rare and dangerous conditions—very high BP or high A1C—and especially among those without a preexisting diagnosis of the underlying condition.14 These results pass a falsification test, and they are accompanied by some interesting effects of spouses’ notifications.

Table 4 reports estimates of the marginal effects of biomarker notification interacted with preexisting conditions on 20 outcomes using panel fixed effects applied to Eq. (5). The largest effects of screens out of normal range are on condition diagnoses and related pharmaceutical use by the 2008 wave, particularly among the previously undiagnosed. The largest coefficient in the table is the 42.8 percentage point increase in diabetes diagnosis among those who screened positive for high A1C and who did not report having diabetes, followed by the 39.7 percentage point increase in the rate of diabetes medication usage among that group. Among those who received the high BP card without a previous diagnosis of high BP, diagnoses and drug usage increase by 17 and 16.1 percentage points, respectively. Respondents with high BP but below the card threshold were 5.9 percentage points more likely to have a high BP diagnosis and 3.9 percentage points more likely to be on BP medication by 2008, increases that roughly double the usual rates shown in Table 1. Patterns in other diagnoses and medication usage are less clear.

Notifications are correlated with some measures of health and disability, but many of these patterns seem to reflect preexisting characteristics rather than plausible effects of the notification. This is revealed by the falsification tests in Table 5 that model outcomes in 2006 rather than 2008 as functions of biomarker notifications from the 2006 wave. The marginal effects on contemporaneous diagnoses and medication usage are as they should be: positive for the group that had them, and negative for the group that did not. However, coefficients on the disability indexes for those with high BP but no card are similar in Tables 4 and 5, suggesting that those are not effects of notification. Positive and significant coefficients on 2008 disability measures among those with high A1C and a diabetes diagnosis in Table 4 are not present in 2006, however, nor is the protective effect against instrumental activities of daily living (IADL) disability for respondents with high A1C but without a diabetes diagnosis; these findings imply that those notification effects may be real.

Fewer effects for the groups with high BP but no card or for those with cholesterol screens out of normal range can pass the falsification tests shown in Table 5, but they support a 14 to 15 percentage point increase in cholesterol medication usage among those with such screens who had not already been taking it.15 Another robust effect is the additional 4.9 percentage point increase in diabetes diagnosis among those with low HDL cholesterol who were already taking cholesterol medication, roughly a doubling of the usual incidence reported in Table 1 and striking in light of concerns that using statins to treat cholesterol may raise the risk of diabetes onset (Sattar et al. 2010). Statins are thought to be more effective at lowering high cholesterol than at raising HDL cholesterol, which makes this result and the lack of an association with high screens on total cholesterol somewhat puzzling. However, as shown in Table 1, the subsample in 2006 with low HDL cholesterol was more likely to be taking cholesterol medication than those with high total cholesterol.

The most interesting results in Table 4 concern weight, drinking, and exercise among undiagnosed diabetics (i.e., those respondents with high A1C but without a diagnosis). The coefficient on log weight for this subgroup in 2008 is –0.022, significant at the 4.9 % level (t statistic of –1.97). Table 5 shows no sign of any contemporaneous association from the falsification test. In addition to large increases in diabetes diagnosis and medication usage, this group also reports 0.306 fewer drinking days per week; the other insignificant coefficients on their drinking behavior are all negative; and although they may report less-frequent vigorous physical activity, the frequencies of moderate and especially light activity appear to rise. These outcomes are consistent with practitioners’ recommendations and appear to be the clearest evidence of behavioral responses.
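
The link between the log-weight coefficient, the implied percentage change, and the quoted significance level can be sketched as follows. The p-value uses a standard normal approximation, which is an assumption made here for illustration; the reference distribution behind the reported significance levels is not stated in the text.

```python
import math


def pct_change_from_log_coef(beta):
    """Exact percentage change in the outcome implied by a coefficient on its logarithm."""
    return 100.0 * (math.exp(beta) - 1.0)


def two_sided_p_normal(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


weight_change_pct = pct_change_from_log_coef(-0.022)
p_value_pct = 100.0 * two_sided_p_normal(-1.97)

print(f"Implied change in weight: {weight_change_pct:.2f} %")
print(f"Two-sided p-value: {p_value_pct:.1f} %")
```

For coefficients this small, the exact transformation and the coefficient itself differ only in the second decimal place, which is why –0.022 is read as a drop of roughly 2.2 %.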

Other effects are concentrated among the noncompliers who received the high BP card and already knew they had high BP. Among this group, and also among those with high BP but no card who did not already have the diagnosis, the coefficient on smoking is negative and significant (−0.023; t statistic of –2.06). Coefficients on drinking are also negative among the noncompliers with the high BP card; the largest coefficient of any significance in the table (–0.523; t statistic of –1.91) appears here on the number of binge drinking days (4+ drinks on one occasion) in the past 90 days and amounts to fully one-third of the average response. Frequency of light exercise also falls among this group, which may reflect a redistribution of household chores following receipt of the high BP card.

Adding spouses’ notifications and their interactions with spouses’ preexisting conditions to Eq. (5) and reestimating produces a set of coefficients on own-notifications that are similar to those in Table 4 because own and spousal notifications are largely independent (see Online Resource 1). The new set of coefficients on spousal notifications appears in Table 6. Reduced sample size means fewer significant results, especially among diagnoses and other variables that measure own status. When effects on diagnoses and disability are significant, they are often also similarly signed and significant in the falsification regression (not shown) and thus probably not real.

One of the results that passes the falsification test is the marginal effect of a known diabetic spouse’s high A1C screen on the respondent’s own log weight of –0.015, significant at the 1.2 % level (t statistic of –2.52). This result is not mirrored by any reaction in own physical activity, suggesting that this weight loss may have been obtained through shifts in diet, which the HRS does not measure. For respondents whose spouses screened high on A1C but did not have a preexisting diabetes diagnosis, there was no response of own weight, but the frequency of light exercise increased significantly and by roughly as much as reported by the spouse in question: the coefficient here is –0.425 (t statistic of –3.13), compared with –0.372 (t statistic of –2.01) in Table 4.

Other behavioral responses by spouses in Table 6 concern drinking behavior. Evidence suggests a spouse’s high blood pressure causes reductions in own binge drinking. The coefficient of –0.962 (t statistic of –2.71) in the first column, for spouses of respondents who received the high BP card and had already had a diagnosis of high blood pressure, is greater in size and significance than the own-coefficient of –0.523 (t statistic of –1.91) in Table 4. If binge drinking is a shared activity, joint shifts might be mechanical.

#### Robustness and Persistence

I conducted a variety of robustness checks using alternative estimation strategies, which are discussed in Online Resource 1. The broad conformity of results suggests that much of the identification stems from changes observed during the panel among respondents who screened outside normal range.

## Discussion

Notifications of biomarker screens outside normal range in the 2006 wave of the HRS triggered significant changes two years later in outcomes and behavior among select subgroups of respondents. After biomarker collection, HRS interviewers immediately notified respondents of very high blood pressure by leaving a notification (a high BP card), and HRS later mailed respondents the full results of their blood pressure, hemoglobin A1C, total cholesterol, and HDL cholesterol readings, as well as indications of whether those readings were outside normal range. Three of the five possible types of notification were “rare and dangerous”: the high BP card received by 5.8 % of respondents, high A1C registered by 5.4 %, and low HDL cholesterol registered by 5.4 %. Two of these notifications, very high blood pressure (above 160/110 mmHg) and elevated A1C (above 7.0), produced extensive and interesting effects on outcomes and behavior.

The largest and most statistically significant effects were on rates of new disease diagnosis, which by definition affected the previously undiagnosed. Respondents who already knew they had the disease and were either unable or unwilling to control the biomarker (termed noncompliers for simplicity) also responded but in different ways. Rates of diagnosis among the previously undiagnosed jumped 17 percentage points for recipients of the high BP card and almost 40 percentage points for high A1C screens. Large increases in usage of associated medications also followed. Undiagnosed cases among the 2006 biomarker group were rare: only approximately 1.5 % in the case of the high BP card and 0.7 % with high A1C. Subsequent patterns in self-reported doctor’s diagnosis suggest either that significant shares were false positives or that respondents or their doctors did not find the information salient. Applying the observed increases in diagnosis rates by 2008 to the prevalence of abnormal screens among those without a diagnosis produces lower-bound estimates of undiagnosed disease prevalence in the overall 2006 sample of only 0.3 % for each condition.
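The lower-bound arithmetic in the preceding paragraph can be sketched as a simple product. The snippet assumes that the reported increases in diagnosis rates (17 and roughly 40) are percentage-point changes and uses 0.40 as a stand-in for "almost 40"; the function name is hypothetical.

```python
def lower_bound_prevalence(share_abnormal_undiagnosed: float,
                           rise_in_diagnosis: float) -> float:
    """Share of the whole sample that screened abnormally without a prior
    diagnosis AND went on to report a new diagnosis by the next wave."""
    return share_abnormal_undiagnosed * rise_in_diagnosis


# Shares and increases as reported in the text (0.40 is an approximation).
high_bp = lower_bound_prevalence(0.015, 0.17)
high_a1c = lower_bound_prevalence(0.007, 0.40)

print(f"high BP card: {100 * high_bp:.1f} % of the 2006 sample")
print(f"high A1C:     {100 * high_a1c:.1f} % of the 2006 sample")
```

Both products round to 0.3 %, matching the figure given for each condition.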

Behavioral adjustment was apparent and strongest among undiagnosed diabetics with high A1C, and it also appeared among spouses of these and other respondents. The 0.7 % of respondents with high A1C and without a diabetes diagnosis reported losing an average of 2.2 % of their body weight by the following wave, and they also reported less drinking and more frequent exercise.

Trends in objective measures of weight, collected in 2006 and again in 2010, lend support to this finding but are not definitive. Spouses of undiagnosed respondents with high A1C also reported increased frequency of light exercise, roughly in line with what their partners reported. These patterns were not found among the 4.7 % with high A1C who were known diabetics: they suffered deteriorations in self-reported disability rates and health status.

Spouses of these diabetics reported reductions in their own weight of 1.5 %, while spouses of the undiagnosed with high A1C reported no weight loss. Exercise and drinking among these spouses of noncompliers were unaffected, which suggests that weight loss arrived via a change in diet, which the HRS does not measure. Households may thus take a multistage approach to diabetes management, focusing at first on exercise and then later on diet. Signs of rising disability among those with uncontrolled diabetes imply that a diet-based strategy may become more feasible than exercise promotion as the disease progresses.

In contrast, changes in behavior were more visible among noncompliers who received the high BP card than among the undiagnosed. This might be expected if high blood pressure is viewed as less serious at onset than is Type 2 diabetes, which seems plausible. Respondents already diagnosed with high blood pressure who received the high BP card appeared to significantly reduce their smoking—by 2.3 percentage points, or approximately one-fifth of the baseline prevalence—and also reduced their drinking intensity. Their frequency of light exercise actually fell, possibly reflecting a reallocation of household chores away from the respondent. Although they did not report picking up the slack, spouses of these respondents reported less binge drinking, mirroring the effects of the screen on own behavior and suggesting that binge drinking tends to be a joint activity.

Patterns in spousal reactions to respondents’ screens are consistent with earlier results in the literature on social ties and household determinants of health (Falba and Sindelar 2008; Guner et al. 2014; Reczek and Umberson 2012; Rendall et al. 2011; Umberson et al. 2010). We know that healthy behavior is correlated within households, and this study suggests that exogenous changes in knowledge about one’s health prompt joint reactions in healthy behavior by both spouses. Small subsamples and power constraints prohibit exploring whether it is spouses with higher education or SES more broadly defined who are better able to leverage this dynamic, as Goldman and Smith (2002) found in an observational setting.

Why uncontrolled high blood pressure appears to trigger changes in own behavior while uncontrolled diabetes does not remains an open question. Diabetic HRS respondents and their doctors may have set A1C targets above 7.0, or rising disability among those with uncontrolled diabetes may impede exercise. Smoking is addictive, which may explain why it does not appear to respond even though it probably should be a part of diabetes control (Gunton et al. 2002). Drinking is already reduced among this subsample, per Table 1, and may not be easy to lower any further.

As Tables 1 and 2 reveal, the characteristics of respondents with high A1C suggest that uncontrolled diabetes is a condition associated with minority identity and lower SES. Whether these characteristics proxy for constrained choices or different preferences or knowledge is a perennial question in health economics, and the present study offers no new insights. With added detail from restricted HRS data, some headway might be made in understanding the determinants of response and nonresponse to notifications. More work could also be done to examine psychological covariates of these behaviors, given that the EFTF interviews during biomarker waves ended with the interviewers’ leaving behind new lifestyle questionnaires on psychosocial topics. The impacts of biomarker notification on economic outcomes, presumably via their intermediate effects on health, may also be interesting to explore.

These data provide insights about preventable hospitalizations among older Americans, some of which are attributable to uncontrolled hypertension or diabetes, especially in poor communities, and to access to care (Bindman et al. 1995; Jiang et al. 2009). Although it does not ask separately about emergency room visits, the HRS does ask about the frequency of overnight hospitalization between waves, which was highest in 2006 among subsets of noncompliers with very high BP or high A1C and a preexisting diagnosis. Known diabetics with high A1C in 2006 were significantly more likely to report overnight hospitalization (panel FE coefficient of 0.096; t statistic of 3.58) and more stays (0.170; t statistic of 2.11) in 2008. However, it is unclear what to conclude from this finding, other than that it motivates a fuller assessment of these respondents’ characteristics and of what drives them to hospitalization compared with other diabetics.

Are behavioral responses to biomarker notification empirically important on average and thus policy relevant? The ITT estimates of the average impacts of biomarker collection in the sample are effectively zero, except insofar as they may reduce doctor visits and medication usage by 1 % or 2 %. This must be because screening does not generate much salient information on average. Undiagnosed “rare and dangerous” conditions are uncommon in the sample, and although uncontrolled conditions are less rare, it appears that notification of their uncontrolled nature does not change behavior much. Averaged across all respondents, a policy of collecting biomarkers and informing respondents about results and normal ranges will have few tangible effects because those treated by salient information are a small subset, and their conditions do not change drastically. A more targeted intervention would have larger average effects, though. These findings suggest unmet surveillance and care needs among Americans aged 50 and older, which are concentrated among relatively small subgroups.
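A back-of-the-envelope calculation illustrates why a sizable effect in a small subgroup nearly vanishes in the sample average. The sketch uses the 0.7 % share of respondents with high A1C and no diabetes diagnosis and their reported 2.2 % weight loss, and it assumes, purely for illustration, that weight is unaffected for everyone else. It is not an estimate from the paper, and the function name is hypothetical.

```python
def sample_average_effect(treated_share: float,
                          effect_on_treated: float,
                          effect_on_others: float = 0.0) -> float:
    """Share-weighted average of subgroup effects across the whole sample."""
    return (treated_share * effect_on_treated
            + (1.0 - treated_share) * effect_on_others)


# 0.7 % of respondents, each losing 2.2 % of body weight (figures from the text);
# zero effect for all other respondents is an illustrative assumption.
avg_weight_change = sample_average_effect(0.007, -2.2)

print(f"average change in weight across the sample: {avg_weight_change:.3f} %")
```

Under this assumption the implied average change is on the order of a hundredth of a percent, which is consistent with ITT estimates that are effectively zero.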

A key methodological question is whether effects on those treated by the notifications are in fact attributable to notification and thus to the intervention as defined by collecting biomarkers and notifying respondents of abnormal screens, or whether they instead reflect the effects of the biomarker levels themselves and thus would have happened regardless. Because all respondents who screened abnormally were also notified, there is no way to answer this question to the standards of a randomized controlled trial. My reading of the evidence is that these effects are indeed associated with the arrival of information and not the biomarker level. Key results pass the falsification test of contemporaneous association, and the robustness of results across specifications and samples lends further credence to the view that the effects are causal. Further examination of this question with the 2008 results may shed more light.

An unfortunate shortcoming of this study is that it has relatively little to say about the large and by some accounts widening health disparities in the United States (Chetty et al. 2016). The natural experiment it leverages identifies the ITT effects of collecting biomarkers, which are small and seem relatively uninteresting vis-à-vis health disparities, at least at ages 50 and older. It also identifies the effects of unexpectedly screening outside normal range, which can be large but are confined to small subgroups, rendering further analysis by SES underpowered. This study reveals extant disparities in disease control but can say little about their causes.

Implications of these results range from practical insights to new understanding of behavior and unmet health services needs in the U.S. population around the age at retirement, and ultimately to informing policy. Methodologically, this research contributes the insight that the strong correlation between biomarker collection and mode of interview will probably affect self-reported measures. Collecting biomarkers in a cross-sectional survey like the core NHANES will affect self-reports relative to a telephone survey but not relative to itself, if all respondents are interviewed in person. Panel studies like HRS, Add Health, and others that include biomarkers in some but not all waves are more likely to be affected.

## Notes

1

The National Health and Nutrition Examination Survey (NHANES) notifies participants of a wide array of biomarker levels, including overweight, blood pressure, and oral health. Exploiting the random assignment of respondents to morning exams, when biomarkers that require fasting could be collected, Singleton (2013) found evidence that retirement behavior reacts to biomarker notification in the NHANES. The National Longitudinal Study of Adolescent Health (Add Health) collected biomarkers in its third wave and notified respondents of results of testing for HIV, chlamydia, and gonorrhea. By contrast, the Demographic and Health Surveys (DHS) measure HIV status in developing countries but explicitly do not inform respondents about results, ostensibly to preserve anonymity and respondents’ safety. Instead, DHS participants are offered referrals for free counseling and testing.

2

Participation in a panel study can change some types of behavior under certain conditions (Halpern-Manners and Warren 2012). Here, the possibility that observation becomes a treatment seems more likely than in standard cases of panel conditioning because the motivating presumption of the IRB is that information about risky levels of biomarkers should affect behavior and outcomes.

3

Case and Deaton (2015) identified significant and rising excess mortality among white non-Hispanic Americans aged 45–54. Chronic pain and misuse of prescribed opioids—or “too much care”—could be the culprit, suggesting that increased screening might even exacerbate the problem. However, Meara and Skinner (2015) pointed out that increasingly elevated mortality among this subgroup is not confined to diseases related to excessive drinking and drug use; it is also spread across causes such as cardiovascular disease and diabetes, precisely the targets of screenings considered here.

4

Compared with the likely costs involved with a blanket extension of such services, the benefits seem minimal without targeting.

5

Although respondents in group A2008 submitted biomarkers two years later in 2008, it is unclear whether their 2008 biomarkers would be good proxies for what their 2006 biomarkers would have been. The persistence of biomarkers over the life course, which likely depends both on biological and on socioeconomic conditions, is unknown. If high blood pressure, A1C, and cholesterol are chronic and stable among those undiagnosed, and if those undiagnosed in the HRS tend to remain undiagnosed, then the 2008 biomarker levels would be good proxies for the 2006 levels among the key subgroup A2008B2008. However, it is easy to speculate otherwise. Even if it were feasible, a direct comparison of A2006 with A2008 in 2006 is polluted by mode-of-interview effects because the latter is almost exclusively assessed via telephone interview. See the next section for an expanded discussion.

6

I restrict attention to respondents who appeared in both the 2006 and 2008 waves. This restriction effectively drops approximately 10 % of the sample submitting biomarkers across the board, of whom typically 6 % had died and the other 4 % were in nonresponse. As I discuss shortly, there did not appear to be substantial differences in attrition or mortality across these groups defined by biomarker results by 2008.

7

For the current level of condition diagnosis here and in the panel FE analysis later, I use the raw responses, in which respondents can dispute records from past waves. In Table 1, the change in diagnosis is calculated using the current statements about present and past diagnosis. Differences between these data definitions are minimal and do not appreciably affect results.

8

I use the same covariates shown in Table 2 plus nine indicators of diagnosed health conditions: high blood pressure, diabetes, cancer, lung disease, heart problems, stroke, psychological problems, memory-related disease, and arthritis. Results are available upon request.

9

Rosero-Bixby and Dow (2012) revealed significant effects of biomarkers on mortality in a Costa Rican panel. However, scoring above particular thresholds of biomarkers did not affect mortality and other forms of attrition over two years in the HRS, even though the levels of A1C and total cholesterol were significant predictors of death by 2008. Presumably, the information contained in the thresholds alone was picked up well by other covariates, such as health conditions.

10

The cause is unknown and seems worthy of future inquiry, but the size of the effect is small enough to complicate investigation. One might have expected to see the reverse—that assignment to the 2008 biomarker group would have reduced nonresponse because of efforts by interviewers to obtain the measures.

11

Previous knowledge of high blood pressure and diabetes can be defined by whether the respondent is taking the associated medicine. Results based on that definition are similar to those reported here.

12

All regressions in Tables 3–6 are linear, including those modeling dichotomous outcomes. As shown in Table 1, these outcomes tend to be quite common in the sample, removing a typical concern about applying the linear probability model. An additional motivation for using the linear model, aside from its more straightforward asymptotic characteristics in the presence of fixed effects, is that interaction terms are better defined. Results of logit models were consistent with those of the linear probability model.

13

The average number of doctor visits since the previous wave does not respond significantly (not shown), which could be consistent with small changes in the prevalence of a common event like this.

14

I do not discuss the results of estimating Eq. (4) without condition interactions, which are close to the weighted averages of the effects shown in Table 4 but are less clear or interesting. They are available upon request.

15

The large and significant negative coefficients here on abnormal cholesterol screens among those who were already taking those medications are real. These subgroups report less than 100 % usage in 2008 for unknown reasons. Given that the notification thresholds in question are not extreme, one interpretation is that the notification is not a good indicator of a persistent condition in these cases.

## References

Banks, J., Muriel, A., & Smith, J. P. (2010). Attrition and health in ageing studies: Evidence from ELSA and HRS. Longitudinal and Life Course Studies, 2, 101–126.

Bindman, A. B., Grumbach, K., Osmond, D., Komaromy, M., Vranizan, K., Lurie, N., & . . . Stewart, A. (1995). JAMA, 274, 305–311.

Boozer, M. A., & Philipson, T. J. (2000). The impact of public testing for human immunodeficiency virus. Journal of Human Resources, 35, 419–446. https://doi.org/10.2307/146387

Case, A., & Deaton, A. (2015). Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proceedings of the National Academy of Sciences, 112, 15078–15083. https://doi.org/10.1073/pnas.1518393112

Chetty, R., Stepner, M., Abraham, S., Lin, S., Scuderi, B., Turner, N., & . . . Cutler, D. (2016). The association between income and life expectancy in the United States, 2001–2014. JAMA, 315, 1750–1766.

Delavande, A., & Kohler, H.-P. (2012). The impact of HIV testing on subjective expectations and risky behavior in Malawi. Demography, 49, 1011–1036. https://doi.org/10.1007/s13524-012-0119-7

Falba, T. A., & Sindelar, J. L. (2008). Spousal concordance in health behavior change. Health Services Research, 43, 96–116. https://doi.org/10.1111/j.1475-6773.2007.00754.x

Godlonton, S., & Thornton, R. L. (2013). Learning from others’ HIV testing: Updating beliefs and responding to risk. American Economic Review: Papers & Proceedings, 103, 439–444. https://doi.org/10.1257/aer.103.3.439

Goldman, D. P., & Smith, J. P. (2002). Can patient self-management help explain the SES health gradient? Proceedings of the National Academy of Sciences, 99, 10929–10934. https://doi.org/10.1073/pnas.162086599

Goldman, N. (1993). Marriage selection and mortality patterns: Inferences and fallacies. Demography, 30, 189–208. https://doi.org/10.2307/2061837

Gong, E. (2015). HIV testing and risky sexual behavior. Economic Journal, 125, 32–60. https://doi.org/10.1111/ecoj.12125

Guner, N., Kulikova, Y., & Llull, J. (2014). Does marriage make you healthier? (IZA Discussion Paper No. 8633). Bonn, Germany: Institute for the Study of Labor.

Gunton, J. E., Davies, L., Wilmshurst, E., Fulcher, G., & McElduff, A. (2002). Cigarette smoking affects glycemic control in diabetes. Diabetes Care, 25, 796–797. https://doi.org/10.2337/diacare.25.4.796-a

Halpern-Manners, A., & Warren, J. R. (2012). Panel conditioning in longitudinal studies: Evidence from labor force items in the Current Population Survey. Demography, 49, 1499–1519. https://doi.org/10.1007/s13524-012-0124-x

Hu, Y., & Goldman, N. (1990). Mortality differentials by marital status: An international comparison. Demography, 27, 233–250. https://doi.org/10.2307/2061451

Jiang, H. J., Russo, C. A., & Barrett, M. L. (2009). Nationwide frequency and costs of potentially preventable hospitalizations, 2006 (Healthcare Cost and Utilization Project Statistical Brief No. 72). Rockville, MD: U.S. Agency for Healthcare Research and Quality.

Juster, F. T., & Suzman, R. (1995). An overview of the Health and Retirement Study. Journal of Human Resources, 30(Suppl.), S7–S56.

Meara, E., & Skinner, J. (2015). Losing ground at midlife in America. Proceedings of the National Academy of Sciences, 112, 15006–15007. https://doi.org/10.1073/pnas.1519763112

Reczek, C., & Umberson, D. (2012). Gender, health behavior, and intimate relationships: Lesbian, gay, and straight contexts. Social Science & Medicine, 74, 1783–1790. https://doi.org/10.1016/j.socscimed.2011.11.011

Rendall, M. S., Weden, M. M., Favreault, M. M., & Waldron, H. (2011). The protective effect of marriage for survival: A review and update. Demography, 48, 481–506. https://doi.org/10.1007/s13524-011-0032-5

Rosero-Bixby, L., & Dow, W. H. (2012). Predicting mortality with biomarkers: A population-based prospective cohort study for elderly Costa Ricans. Population Health Metrics, 10, 1–15. https://doi.org/10.1186/1478-7954-10-11

Sattar, N., Preiss, D., Murray, H. M., Welsh, P., Buckley, B. M., de Craen, A. J., & . . . Ford, I. (2010). Statins and risk of incident diabetes: A collaborative meta-analysis of randomized statin trials. Lancet, 375, 735–742.

Singleton, P. (2013). Health information and Social Security entitlements (Center for Policy Research Working Paper No. 164). Syracuse, NY: Syracuse University.

Stevenson, B., & Wolfers, J. (2007). Marriage and divorce: Changes and their driving forces. Journal of Economic Perspectives, 21(2), 27–52. https://doi.org/10.1257/jep.21.2.27

Thornton, R. L. (2008). The demand for, and impact of, learning HIV status. American Economic Review, 98, 1829–1863. https://doi.org/10.1257/aer.98.5.1829

Thornton, R. L. (2012). HIV testing, subjective beliefs and economic behavior. Journal of Development Economics, 99, 300–313. https://doi.org/10.1016/j.jdeveco.2012.03.001

Umberson, D., Crosnoe, R., & Reczek, C. (2010). Social relationships and health behavior across life course. Annual Review of Sociology, 36, 139–157. https://doi.org/10.1146/annurev-soc-070308-120011

Weinstein, M., Vaupel, J. W., & Wachter, K. W. (2007). Biosocial surveys. Washington, DC.

Weir, D. (2007). Elastic powers: The integration of biomarkers into the Health and Retirement Study. In Weinstein, M., Vaupel, J. W., & Wachter, K. W. (Eds.), Biosocial surveys (pp. 78–95). Washington, DC: