Abstract
The evaluation of innovative web-based data collection methods that are convenient for the general public and that yield high-quality scientific information for demographic researchers has become critical. Web-based methods are crucial for researchers with nationally representative research objectives but without the resources of larger organizations. The web mode is appealing because it is inexpensive relative to in-person and telephone modes, and it affords a high level of privacy. We evaluate a sequential mixed-mode web/mail data collection, conducted with a national probability sample of U.S. adults from 2020 to 2022. The survey topics focus on reproductive health and family formation. We compare estimates from this survey to those obtained from a face-to-face national survey of population reproductive health: the 2017–2019 National Survey of Family Growth (NSFG). This comparison allows for maximum design complexity, including a complex household screening operation (to identify households with persons aged 18–49). We evaluate the ability of this national web/mail data collection approach to (1) recruit a representative sample of U.S. persons aged 18–49; (2) replicate key survey estimates based on the NSFG, considering expected effects of the COVID-19 pandemic lockdowns and the alternative modes on the estimates; (3) reduce complex sample design effects relative to the NSFG; and (4) reduce the costs per completed survey.
Introduction
The vast majority of persons between the ages of 18 and 49 who live in the United States now use the internet (Vogels 2021). The percentage of this population with internet access, via PCs or mobile devices, is approaching 95% (Pew Research Center 2021), even across historically marginalized groups such as persons who are poor (Couper et al. 2018). This widespread internet use has motivated rapid advances in applications of survey methodology to web-based data collection, revealing the many advantages of the web mode for population science (Biemer et al. 2022; Tourangeau et al. 2013). Perhaps the most important advantage is the privacy with which respondents can complete a web survey. In face-to-face and telephone surveys, where interviewers administer the survey content, their presence may lead respondents to provide more socially desirable responses to sensitive questions (Chang and Krosnick 2009; Fricker et al. 2005; Kreuter et al. 2008; Sakshaug et al. 2010). The privacy afforded by web surveys is crucial for demographic research on many potentially sensitive topics, such as sex and reproductive health, romantic relationships, earnings, assets and remittances, or attitudes and beliefs. The web mode also facilitates the use of complex questionnaires relative to any form of paper questionnaire, including mailed surveys.
Respondents can also access web surveys from virtually anywhere, which is crucial for contexts where in-person interviewing is not possible (e.g., the height of the global COVID-19 pandemic) and for any topics linked to respondent mobility. Furthermore—and critical for the growth of population research—web surveys have a much lower cost per interview than any other data collection mode. Scientific efforts to demonstrate that measurements from face-to-face surveys can be replicated in self-administered modes could introduce new reliable options for extending population research on family and fertility topics at a time when face-to-face surveys are becoming increasingly unsustainable. Data collection approaches that combine web surveys with follow-up of nonrespondents in alternative self-administered modes (e.g., mail) can be successful even when a pandemic or other disruptive shock prevents face-to-face interviewing.
Of course, significant advances in the application of data collection science to population research have always depended on careful consideration of both the strengths and weaknesses of each specific advance. Despite the attractive features of the web mode for survey data collection, there are important drawbacks to the approach. First, no frame of internet users exists from which to draw a sample, so probability samples are generally selected from commercially available lists of addresses, and a different mode (e.g., mail) is used to recruit sampled households to participate in the study. Second, researchers have less control over within-household selection procedures if the objective is to randomly select one eligible respondent from a household. This generally requires a two-step process of initially requesting a sampled household to complete a screening questionnaire, and then randomly selecting an eligible person from all identified eligible persons in the household. Finally, web surveys are known to have lower response rates compared with nonweb modes (Braekman et al. 2022; Daikeler et al. 2020; Manfreda et al. 2008; Shih and Fan 2008; Tourangeau et al. 2013).
Given these drawbacks, two crucial questions remain for demographic researchers:
Can web surveys be used to produce statistically efficient population estimates based on national probability samples in a cost-efficient manner?
Are web surveys characterized by more selection bias than other modes?
We apply recent advances in knowledge of the strengths and weaknesses of using the internet for scientific data collection to address these fundamental issues in the use of web surveys for demographic research. The methodology for maximizing response to data collection efforts under fixed constraints, while controlling the risk of nonresponse bias across measures within the same survey, has also evolved at a rapid pace, providing the tools required for conducting web-based data collection in a way that minimizes selection bias. The problem of reduced response rates in web surveys can be remedied in practice with sequential mixed-mode designs, in which alternative nonweb modes such as mail and telephone are used to follow up with nonrespondents, thus increasing response rates while decreasing the nonresponse bias in estimates (Axinn et al. 2015; Millar and Dillman 2011; Olson et al. 2021; Tourangeau et al. 2014).
In the current study, we evaluate the general and relatively nascent sequential mixed-mode approach known as “push-to-web,” which first invites national samples of households to participate in a web survey, via mailed invitations that ask one or more individuals from a sampled household to respond on the web, and then follows up with mailed paper questionnaires and/or telephone reminders. Some national surveys based on probability samples have started to explore the feasibility of using these types of sequential mixed-mode approaches that allow for the self-administration of survey content (e.g., the National Household Education Surveys (Brick et al. 2012; Brick et al. 2011; Han et al. 2013; Montaquila et al. 2013), the Residential Energy Consumption Survey (Biemer et al. 2018; Biemer et al. 2016; Zimmer et al. 2015), the American National Election Studies (DeBell, Amsbary et al. 2018; DeBell, Maisel et al. 2018), the Panel Study of Income Dynamics Transition to Adulthood Supplement (Sastry and McGonagle 2023), and the European Values Study (Luijkx et al. 2021); for an overview, see Olson et al. 2021). In general, these initial studies have found support for the use of such approaches and subsequently implemented them to varying extents.
Web surveys are a key feature of the future of data collection for many good reasons. This article presents an evaluation of the quality of a web/mail survey approach for nationally representative demographic research. We present results from a web/mail approach to conducting national survey data collection and compare them to results from a face-to-face national survey of population health: the National Survey of Family Growth (NSFG). Using the NSFG as a comparison allows us to test maximum design complexity, including a complex household screening operation (necessary to identify the subset of households that contains eligible persons aged 18–49). We evaluate the ability of this national web/mail data collection approach using address-based sampling and sequential mixed-mode design to (1) recruit a representative sample of U.S. persons between the ages of 18 and 49; (2) replicate key survey estimates based on the NSFG; (3) reduce complex sample design effects relative to the NSFG; and (4) reduce the costs per completed survey.
Background and Guiding Framework: Advancing Data Collection for Population Science
Population research has a long history of using the census approach for collecting data (Baffour et al. 2013), but the field is characterized by decades of evolution in data collection science as the demand for new and more detailed population data grows. The advances we describe are motivated by this continuous increase in the demands for more population data. Within population science, the increased demand for data includes the broadest possible consideration of data sources, ranging from anthropological demography, to focus groups, to historical demography (Kertzer and Fricke 1997; Knodel 1997; Saito 1996). The field of demography has embraced new information from every source but has explicitly recognized that each specific data collection approach has its own strengths and limitations. The potential for complementarity across these strengths and limitations motivates continuous advances in each and explicit focus on the potential benefits of simultaneous use of more than one data collection method (Axinn and Pearce 2006). Here, our focus is on the strengths and limitations of tools for collecting general population survey data.
To accomplish this, we harness the full range of methodological advances in survey data collection from the twenty-first century. We begin by explicitly confronting the conflict between continuously increasing demands for more population-scale survey data versus the rising costs of the primary survey methods used at the turn of the century. Then, we carefully consider the evolution of web survey tools for general population research. Next, we present key advances in web survey methods that improve the ability to fully represent the general population: sequential mixed-mode survey methods. Finally, we close this section with an explicit presentation of our objectives for the current study.
The Demand for More Population Data Versus the Increasing Inefficiency of Interviewer-Administered Data Collection
National-scale U.S. population surveys began in the 1950s, with one of the first collaborations between demography and survey methodology led by Ronald Freedman and Leslie Kish to conduct the U.S. national study of the Growth of American Families, the original predecessor to the U.S. National Survey of Family Growth (Whelpton and Freedman 1956). Just as demographers' use of nationally representative surveys has grown tremendously since that time, so has the science of survey methodology (Groves et al. 2009). Here we review the key conflict between demographers' continuously increasing demand for more data and the rising costs of the survey methods used in the twentieth century: cutting costs always risks increasing errors.
A fundamental principle in the collection of data for social science research is that survey costs and errors are strongly and inversely connected (Groves 2004). Reduction of error usually involves increasing costs, and reduction of costs usually involves increasing error. As the field of survey methodology has evolved, dozens of breakthroughs have identified specific tools that can be used to minimize changes in error while reducing costs. Breakthroughs related to the use of web surveys for samples of the general population are particularly important because web surveys reduce costs substantially relative to face-to-face surveys (Tourangeau et al. 2013). For example, research on web surveys demonstrates that careful use of a second mode of contact, such as telephone or mail, for a subset of respondents who are unable to participate (e.g., because of lack of access to the internet) or who choose not to participate via the web can significantly lower the nonresponse error in web surveys (Axinn et al. 2015; Bandilla et al. 2014; Vannieuwenhuyze 2014). These design advances have now cumulated to make large-scale sequential mixed-mode designs a robust alternative to large-scale face-to-face surveys (Braekman et al. 2022; Couper 2013).
The potential of web-based, sequential mixed-mode designs to yield statistical results consistent with more expensive face-to-face surveys is especially important given the large financial resources dedicated by federal funding agencies to survey data collection each year (Klausch, Hox et al. 2015; Klausch, Schouten et al. 2015; Presser and McCulloch 2011). Survey data collection represents a significant expenditure for governments worldwide. For example, 10 federal statistical agencies spent more than $1.3 billion on surveys in the United States in 2004 (Presser and McCulloch 2011; see also https://www.whitehouse.gov/omb/information-regulatory-affairs/statistical-programs-standards/), and federal budgets for statistical agencies have essentially remained flat since this time (Pierson 2020). Although the U.S. government dedicates significant financial resources to fund these surveys each year, it is important to note that many other studies, such as randomized controlled trials, also collect data using survey methods (e.g., Formica et al. 2004; Ginde et al. 2013; Meyers et al. 2003; Wilson et al. 2009). In addition, many other survey data collections are funded by the National Science Foundation (NSF), the National Institutes of Health, or private organizations, meaning that even more financial resources are dedicated to collecting survey data every year. The search for robust scientific methods of data collection that cost less than in-person surveys is also important in other countries. Research on data collection alternatives that produce high-quality scientific information in a more cost-efficient manner is clearly needed, especially in light of the COVID-19 pandemic, which completely halted—at least temporarily—in-person interviewing activities worldwide.
The rising costs of face-to-face survey data collection are particularly significant for all U.S. federally funded research on the general population. In an era of growing survey nonresponse, the average effort required to complete a face-to-face survey interview—or the average "hours per interview" an interviewer must work to complete each interview—has grown dramatically in the last 25 years. Survey management data from six years of the NSFG (Wagner et al. 2021) illustrate this trend in Figure 1. Even within the same survey, following the same design over time, the increase in hours per interview was approximately 3.3% per year.
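To put this rate in perspective, compounding 3.3% annual growth over the six years shown in Figure 1 yields 1.033⁶ ≈ 1.2, a cumulative increase of more than 20% in the interviewer hours required per completed interview.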
Some of the largest components of this increase in hours per interview are lifestyle changes (e.g., multiple partners working, long commutes) that make it harder to find people at home, along with the general public's resistance to participating in survey interviews, which forces interviewers to spend more time finding respondents (often by driving significant distances), explaining the reasons for the interview request, and answering all questions raised by each respondent. Thus, the total costs of face-to-face survey interviewing for representative samples of the general population have grown severalfold. Telephone survey interviewing is facing a very similar problem, especially because response rates in telephone surveys have dropped precipitously (Olson et al. 2021). Rising costs mean that the vast majority of demographic researchers working on smaller research teams are unable to conduct nationally representative population studies using interviewer-administered data collection, as it would be impossible to continue absorbing these cost increases within the fixed budget caps associated with many federally funded projects. More cost-efficient alternatives that still yield high-quality data are needed.
The Evolution of Web Survey Tools
As internet access has spread throughout the U.S. population, conducting survey interviews via the web has become an attractive method for lowering data collection costs (Couper 2008; Tourangeau et al. 2014). Aside from the cost savings, the advantages of web surveys include portability, flexibility, and privacy—web surveys allow respondents to complete surveys at whatever time and location are convenient and private for them. These properties extend to multiple devices, including personal computers, laptops, tablets, and smartphones, further providing respondents with more options for convenience with little difference in measurement error between the devices (given appropriate instrument design; e.g., see Couper et al. 2017; Lugtig and Toepoel 2015; Mavletova and Couper 2013).
Unfortunately, the advantages of the web mode can be offset by the serious disadvantage of low response rates and the potential for both coverage bias and nonresponse bias to mislead investigators. Sampling frames of up-to-date email addresses are generally not available to researchers outside of specific contexts, and even if such a frame were available for a specific population, individuals without internet access would not be included in the frame, thus introducing a risk of coverage bias (Couper et al. 2018). This situation means that large-scale web surveys are often conducted via address-based sampling (to ensure high population coverage of the sampling frame) combined with mailed invitation letters inviting individuals to complete surveys online.
When carefully designed, web surveys have many positive characteristics. Some studies have found that web surveys have less measurement error than surveys administered using other modes and can yield higher rates of reporting potentially sensitive information relative to telephone interviewing (Kreuter et al. 2008; Sastry and McGonagle 2023) and more accurate reporting on sensitive items involving undesirable characteristics (Chang and Krosnick 2009; Fricker et al. 2005; Sakshaug et al. 2010). In general, the self-administered nature of web surveys consistently produces less response bias arising from social desirability than interviewer-administered modes (Holbrook and Krosnick 2010; Kennedy et al. 2016; Lindhjem and Navrud 2011). On the other hand, some studies have indicated that web surveys may produce higher rates of “don't know” responses than interviewer-administered surveys (Heerwegh and Loosveldt 2008). Measurement differences between face-to-face and web surveys call for careful consideration at initial design stages depending on the survey content (Heerwegh 2009; Nielsen 2011). Careful design of the web content to maximize data quality is therefore very important (Couper 2008).
The focus of the present study is on one-time cross-sectional national surveys, the feasibility of replacing a large and complex face-to-face survey that includes a household screening operation (to determine eligibility) entirely with a web/mail survey, and use of these data to estimate the health and fertility characteristics of the U.S. population aged 18–49.
Advances in Sequential Mixed-Mode Web Surveys
Recent research has demonstrated that a sequential mixed-mode approach can compensate for the lower response rates known to be generated by web surveys and strengthen the web survey approach for robust, representative measurement (Axinn et al. 2015; Tourangeau et al. 2014). By focusing data collection effort with a second alternative mode (e.g., mail) on those cases that do not initially respond to web surveys, data collection can efficiently compensate for the web-specific noncoverage and nonresponse and create more robust statistics that more closely represent the full population (Millar and Dillman 2011). In theory, the use of the second alternative mode will appeal to a complementary set of sampled cases for whom the first mode was not appealing, ultimately producing survey estimates with increased precision and reduced bias.
Some studies have suggested that when individuals are simply given a choice of responding by web or mail, the overall response rate is lower than when only a single response option is offered at a time (Medway and Fulton 2012). Other research has demonstrated that giving respondents a choice between web and mail may increase nonresponse in the absence of an extra incentive for completing the survey online (Biemer et al. 2016). For this reason, our proposed methodology features a sequential mixed-mode design, in which respondents are first asked to complete the survey over the web, and only when that invitation is not successful are they invited to complete the survey by mail. A prior analysis performed by the American National Election Studies found that additional effort and follow-up applied to the mail/web sample were important for improving representativeness (DeBell, Amsbary et al. 2018; DeBell, Maisel et al. 2018), and we have also found evidence of this in prior methodological evaluations (Wagner et al. 2023; West et al. 2023).
Four Objectives of the Current Study
Our study focuses on the efficient conversion of a complex, lengthy, cross-sectional, national face-to-face survey of family formation and reproductive health to a web/mail format for the collection of the same survey data from a national probability sample of persons between the ages of 18 and 49. No prior studies have attempted to convert a major national health survey like the NSFG into a web/mail format and then employ a sequential mixed-mode design for both the screening and main data collection stages to field the fully self-administered survey in a cross-sectional national sample of households. The only analogous examples with national scope (in the United States) of which we are aware can be found in educational (Brick et al. 2012; Brick et al. 2011; Han et al. 2013; Montaquila et al. 2013) or political (DeBell, Amsbary et al. 2018; DeBell, Maisel et al. 2018) contexts. Many investigators in the health sciences would benefit from this more cost-efficient approach to the large-scale collection of health survey data, but the approach has never been rigorously evaluated in this context.
For each key topic that we study, the overarching objective is to document the strengths and limitations of this approach. It is clear that every tool for collecting population data has strengths and limitations, but by documenting the specific strengths and limitations of this web/mail sequential mixed-mode approach, we can understand better how to combine it with either auxiliary forms of survey data or nonsurvey alternative data sources to provide the strongest possible population inferences (Axinn and Pearce 2006; Groves et al. 2009).
Objective 1: Evaluate the characteristics of the respondents recruited using this type of approach and the population that they would represent prior to any types of adjustments to base sampling weights.
By carefully documenting the key demographic characteristics of respondents recruited using the web/mail approach relative to population benchmarks, we can learn which parts of the general population are likely to be under- or overrepresented by this approach. This documentation provides the means to make statistical adjustments to increase the comparability across multiple sources of data or to increase the comparability to total population measures (in this case the U.S. Census). Such documentation also provides the evidence needed to adjust future data collections using this approach, such as oversampling of population subgroups to ensure sufficient numbers of cases for meaningful subgroup analyses and identification of potential subgroups for targeted nonresponse follow-up strategies.
Objective 2: Compare weighted survey estimates generated from this web/mail approach to those computed from a national face-to-face survey (the NSFG) measuring the same content in a similar time frame.
At the time that the current study was designed (2017–2019), no one could have anticipated the global COVID-19 pandemic. The most recent publicly available NSFG data set was collected using in-person interviewing prior to the pandemic (2017–2019), while the data collected using our web/mail approach were collected from a different sample, using different modes, at the height of the pandemic (April 2020–April 2022). Our approach, relying on web and mail exclusively, made a national data collection during the pandemic feasible (because in-person interviewing was not feasible) but also introduced a confounding factor in these comparisons, namely, the possibility that the health problems and social isolation introduced by pandemic-related lockdowns would systematically affect measures of reproductive health and family formation. Many measures of personal health were likely changed by the pandemic (Birimoglu Okuyan and Begen 2022; Penninx et al. 2022), and measures of reproductive health, including contraceptive use and sexual activity, may have also changed (Phelan et al. 2021).
In addition, the use of self-administered web and mail modes (including the use of audio computer-assisted self-interviewing, or ACASI, for especially sensitive items in the NSFG) is expected to engender more honest reporting of socially undesirable behaviors that are captured via sensitive survey questions (e.g., drug use and risky sexual behavior; see Heerwegh 2009; Hope et al. 2022), relative to in-person interviewing. Although we did not design the current study to test for specific hypothesized differences in estimates, as our original focus was on the extent to which we could replicate results from the NSFG, we later evaluate significant differences in estimates that we observed between the two approaches in the context of these broader expectations regarding COVID-19 and mode effects.
Finally, related to this second objective, the use of survey weights to better estimate descriptive demographic features of the general population is considered a best practice (Heeringa et al. 2017), especially when the weights, which are often adjusted for differential nonresponse across subgroups, are correlated with the variables of interest. This analysis allows us to assess the extent to which application of weights to the web/mail, sequential mixed-mode approach successfully adjusts for representation of the general population to achieve results comparable to national face-to-face surveys of the same topics. To the extent that weighting is successful, it is an important tool for creating comparability across data sources. To the extent that weighting fails, higher-effort adjustments will be required to achieve that comparability. Key examples of such adjustments include greater oversampling of some subgroups of the population or the addition of other modes targeting underrepresented subgroups of the population.
Objective 3: Evaluate design effects on survey estimates driven by the complex probability sampling designs employed in each case.
Just as all sources of data are characterized by error, all sample designs introduce the potential for sampling error or inefficiency. This analysis comparing different complex sample design effects allows us to document differences across surveys created by using different sample designs. Identifying such differences is particularly important for comparing survey methods for representing the general population, because the high costs of face-to-face interviewing motivate highly area-clustered sample designs, but the web/mail approach removes this limitation. Large reductions in design effects by using the web/mail approach increase efficiency, motivating greater use of this method. Small changes do not.
Objective 4: Compare the data collection costs per completed survey associated with these two approaches.
As mentioned earlier, it is well-known that all survey design choices require attention to trade-offs between survey errors and survey costs (Groves 2004). The same is true here, but to make choices between survey approaches (face-to-face vs. web/mail), we need information about both the error differences and the cost differences. Even if we demonstrate that a sequential web/mail approach reduces the difference in error, we know that it also increases the cost relative to a web-only survey. To make a choice between these methods, demographers need evidence about the relative costs of alternative approaches.
Methodology
Overview of the National Survey of Family Growth
Data collection for the 2017–2019 portion of the National Survey of Family Growth took place between September 2017 and September 2019. A national probability sample of households was first screened by an in-person interviewer for age eligibility (someone aged 15–49 present in the household). If a household contained an eligible person, one eligible person was randomly selected and invited to participate in the main survey, which featured in-person interviewing and self-administration of more sensitive questions (with the interviewer still present) using ACASI. Interviewers asked questions about reproductive health, fertility histories, and family formation, and interviews generally lasted 60–80 minutes. The final response rate during this data collection period was 63.4%, but despite this relatively high response rate, the NSFG may still be affected by nonsampling errors, including interviewer effects, nonresponse bias that is not fully corrected by the survey weighting, and measurement error owing to in-person interviewing.1 The NSFG could not continue its in-person data collection operations during the height of the COVID-19 pandemic.
Overview of the American Family Health Study
We initiated data collection in April 2020 with a national address-based probability sample of more than 41,000 U.S. addresses. We called this new study the American Family Health Study (AFHS; for additional details, see afhs.isr.umich.edu). The AFHS used a sequential mixed-mode web/mail protocol for push-to-web household (HH) screening to identify eligible persons aged 18–49. One eligible individual was then randomly selected from each HH with eligible persons present and invited (either by mailed letter or by email, if an email address was provided in the screening questionnaire) to complete a 60-minute web survey on the same reproductive health and family formation topics measured in the NSFG, using a second sequential mixed-mode web/mail protocol that encouraged them to respond to the "main" survey via the web. As part of this protocol, individuals who did not respond to the full 60-minute web survey were subsequently invited to complete a shorter paper questionnaire that was sent by mail. This reduced-length questionnaire included primarily items that were asked of all persons and did not require filter questions or complex skip logic. Conversion of the NSFG content to self-administered web and mail formats was not a trivial exercise; see the online Appendix II for more details on this process.
The national probability sample selected for the data collection was split into two replicate subsamples (each of which was itself a national probability sample). This methodological approach enables the refinement of data collection protocols based on lessons learned from earlier replicate subsamples. In the AFHS, we modified our data collection procedures in the second replicate, given the results of experiments and other experiences from the first replicate; for additional details, see Zhang et al. (2023). Data collection for the first replicate subsample continued until June 2021, and the second replicate subsample was fielded between September 2021 and April 2022.
We note that the onset of the AFHS data collection essentially coincided with the onset of the COVID-19 pandemic, but our general mail and web protocol enabled data collection to proceed despite the fact that interviewer-administered data collections needed to cease their operations at this time. We also worked with the U.S. Postal Service to carefully track delivery rates for our mailed research materials and found no evidence of unusual delays or high prevalence of failed deliveries in any areas of the country (Nishimura et al. 2024). Additional details regarding the AFHS sample design and data collection methodology can be found elsewhere (see https://afhs.isr.umich.edu/about-the-study/afhs-methodology/).
AFHS Screening Protocol
The AFHS screening questionnaire was designed to collect a list of persons aged 18 or older in the household. In the first phase of this protocol, we selected a stratified probability sample of addresses, oversampling addresses predicted to have an age-eligible (18–49 years old) person present (as determined from commercial data from Marketing Systems Group; see West et al. 2015) and located in high-density minority areas (as determined from the American Community Survey). All census block groups were divided into four mutually exclusive domains using estimated population density according to race and ethnicity: (1) <10% Black and <10% Hispanic; (2) >10% Black and <10% Hispanic; (3) <10% Black and >10% Hispanic; and (4) >10% Black and >10% Hispanic. Addresses in the latter three domains were oversampled at rates 2.3 to 2.6 times as high as for the first domain. Sampled households received a mailed invitation (including a $2 cash incentive) addressed to the resident of a particular state, inviting an adult member of the household to complete a screening questionnaire online (available in English or Spanish). In the second phase of this protocol, a follow-up reminder was sent one week after the mailed invitation in the form of a postcard.
In the third phase of screening, a follow-up mailing that included a paper version of the screening questionnaire was sent two weeks after the postcard. In the fourth phase of screening, 28 days after the initial invitation, a random subsample of 5,000 nonfinalized sampled addresses were sent a priority mailing with a final invitation to complete the screening questionnaire and an additional $5 incentive; this step served to significantly increase the response rate to the screening invitation (Wagner et al. 2023). Information obtained from completed screening questionnaires was used to identify eligible persons within the sampled households. If there was only one eligible person in the household, or the person completing the screener was randomly selected to take part in the main survey, then that person was immediately invited to complete the main AFHS survey online. If there was more than one eligible person, one person was randomly selected and then invited—by either email or mailed letter—to complete the main survey.
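To make the domain-based stratification used in the first phase of screening concrete, the following minimal Python sketch (not AFHS production code) assigns a census block group to one of the four domains and attaches a relative oversampling factor. The domain definitions and the 2.3–2.6 range come from the description above; the exact factor assumed for each oversampled domain, and all names, are hypothetical.

```python
def assign_domain(pct_black: float, pct_hispanic: float) -> int:
    """Return the sampling domain (1-4) for a census block group."""
    black_high = pct_black > 10.0
    hispanic_high = pct_hispanic > 10.0
    if not black_high and not hispanic_high:
        return 1  # domain 1: <10% Black and <10% Hispanic (base sampling rate)
    if black_high and not hispanic_high:
        return 2  # domain 2: >10% Black and <10% Hispanic
    if not black_high and hispanic_high:
        return 3  # domain 3: <10% Black and >10% Hispanic
    return 4      # domain 4: >10% Black and >10% Hispanic

# Domains 2-4 were oversampled at 2.3 to 2.6 times the domain 1 rate;
# the specific values below are illustrative assumptions within that range.
RELATIVE_SAMPLING_RATE = {1: 1.0, 2: 2.3, 3: 2.3, 4: 2.6}
```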
AFHS Main Data Collection Protocol
Once an eligible respondent was randomly selected from a sampled household completing the screening questionnaire, the main data collection protocol was initiated. The main protocol differed slightly depending on whether the screener respondent was also selected to be the main respondent and whether an email address or text-enabled phone number was provided for contact during the main stage. Figure 2 summarizes the two alternative sequences of contact attempts and how they are divided into four distinct phases:
Phase 1: An initial invitation to complete the main survey online was sent by email or mail to the selected respondent, and the invitation promised a $70 token of appreciation once the completed survey was received. Responses during this phase came from the initial push-to-web invitation alone, without any follow-up contact attempts. The main survey could be completed in English or Spanish.
Phase 2: Two weeks later, selected cases who had not yet responded were followed up with either a postcard or an email reminder (if the selected respondent had provided an email address). Those for whom we had an email address or text-enabled phone number received an additional email or text reminder in the third week. During this phase, follow-up contact attempts were made, but the costs of these attempts were relatively low.
Phase 3: In the fourth week of follow-up, we mailed a substantially shortened paper version of the questionnaire but still encouraged the respondent to complete the survey online. For eligible nonrespondents for whom we did not have an email or text-enabled phone number, we mailed an additional paper questionnaire in the sixth week in a U.S. Postal Service priority mailer. For those for whom we had an email or phone number, we sent an email or text reminder in the fifth week. Therefore, during this phase, active nonrespondents were provided the additional option of responding by mailing back a completed version of the reduced questionnaire, even though they were still encouraged to respond via the web.
Phase 4: After four or six weeks, our calling center staff made reminder telephone calls to nonrespondents with telephone numbers available from either commercial data sources linked to our sampling frame or the initial screening questionnaire (83% of these nonrespondents had telephone numbers available, although some (12%) of these were found to be invalid during the reminder calls). These staff did not administer the survey but rather encouraged the nonrespondents to self-administer the survey and provided any information that would assist them in doing so. During this phase, telephone reminders were the final attempts to convert nonrespondents, and these calling efforts served to significantly increase response rates (West et al. 2023).
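The generator below is a hedged, schematic rendering of this four-phase contact sequence (compare Figure 2). Week numbers follow the text, all names are hypothetical, and the actual AFHS case management logic was more detailed than this sketch.

```python
def contact_sequence(has_email_or_text: bool, has_phone: bool):
    """Yield (week, action) pairs for a selected respondent who has not yet responded."""
    # Phase 1: push-to-web invitation promising a $70 token of appreciation.
    yield (0, "web invitation by mail or email")
    # Phase 2: low-cost reminders.
    yield (2, "email reminder" if has_email_or_text else "postcard reminder")
    if has_email_or_text:
        yield (3, "additional email or text reminder")
    # Phase 3: shortened paper questionnaire, with web response still encouraged.
    yield (4, "shortened mail questionnaire")
    if has_email_or_text:
        yield (5, "email or text reminder")
    else:
        yield (6, "second mail questionnaire via priority mailer")
    # Phase 4: reminder calls (after four or six weeks in the actual protocol)
    # encouraging self-administration; staff did not administer the survey.
    if has_phone:
        yield (6, "reminder telephone call")
```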
The AFHS also embedded an experiment examining the effects of modular survey design (West et al. 2023) during the main data collection stage. Household residents selected from the completed screening questionnaires were randomly assigned to either a full survey condition or a modular design condition. In the former condition, sampled individuals were asked to complete the entire 60-minute questionnaire at once, although they could take breaks and return to the survey if desired. In the latter condition, the questionnaire was divided into three modules of roughly equal length, and sampled individuals were invited to complete the three modules at their leisure, with a two-week break between the invitations to complete the modules. The data collection protocol used for each of the three modules was the same as that depicted in Figure 2.
For this study, the full versus modular design is considered a feature of the AFHS data, but not an analytical focus. Overall, we found that the modular design was not effective at increasing response and completion rates (West et al. 2023) and that both the survey responses and sociodemographic measures collected from the modular design condition were quite similar to those collected in the full-survey condition (see online Appendix I). Thus, we combined the data from the two conditions and computed three sets of survey weights that combined the full-survey respondents with respondents to either module 1, modules 1 and 2, or modules 1, 2, and 3. These weights accounted for differential probabilities of selection, differential probabilities of completing both the screener and main surveys (either particular modules of the main survey or the full survey), and calibration adjustments to known population totals from the American Community Survey (ACS). We then used these weights (where the weight used in the analysis depended on the module in which a particular question was asked) to produce one set of AFHS estimates. The attrition due to the modular design resulted in variations in the sample sizes across AFHS items. Because respondents had to complete prior modules to proceed to the later modules, the sample sizes for items in module 1 were the largest, followed by module 2 and then module 3 (West et al. 2023).
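As a minimal sketch of how this module-dependent weighting works in practice, the snippet below maps each item to the weight constructed for its module. The item names and weight column names are hypothetical, invented only for illustration.

```python
# Hypothetical mapping from survey items to the AFHS module in which they appear.
ITEM_TO_MODULE = {"ever_cohabited": 1, "hiv_ever_tested": 2, "daily_drinking": 3}

# Weights combine full-survey respondents with respondents completing
# module 1, modules 1-2, or modules 1-3, respectively (column names assumed).
MODULE_WEIGHT_COLUMN = {1: "weight_mod1", 2: "weight_mod12", 3: "weight_mod123"}

def weight_column_for(item: str) -> str:
    """Return the name of the survey weight to use when estimating `item`."""
    return MODULE_WEIGHT_COLUMN[ITEM_TO_MODULE[item]]
```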
Response Rates
The first national sample replicate of the AFHS obtained an overall response rate in the screening stage of 15.0% and a conditional American Association for Public Opinion Research (AAPOR) RR4 cooperation rate among sampled eligible persons of 66.0% in the main stage. The second national sample replicate obtained a screening-stage response rate of 17.8% and a conditional main-stage cooperation rate of 62.4%. For individuals randomly assigned to the modular condition in replicate 1, completing at least two sections of the questionnaire in the first 20-minute module was counted as a partial response. These two rates resulted in net RR4 response rates of 9.9% and 11.1% for replicates 1 and 2, respectively. See online Appendix III of the supplemental materials for detailed descriptions of these response rate calculations.
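These net rates are simply the product of the stage-specific rates: 0.150 × 0.660 ≈ 0.099 for replicate 1 and 0.178 × 0.624 ≈ 0.111 for replicate 2.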
Analytic Approach
We sought to compare the 2020–2022 AFHS with the 2017–2019 NSFG (restricted to persons aged 18–49) regarding (1) respondent sample composition, including rates of item-missing data on selected sensitive measures; (2) key survey estimates; (3) complex sample design effects; and (4) costs per completed case.
Respondent Sample Composition (Objective 1)
First, in terms of respondent sample composition, the AFHS and the NSFG were compared, separately for males and females, in their estimated distributions on race and ethnicity, education, age, and marital status, using design-adjusted Rao–Scott chi-square tests. For the AFHS, these demographic distributions were weighted by the base sampling weights, whereas for the NSFG, the distributions were weighted by the final sampling weights (i.e., including nonresponse and calibration adjustments based on control totals provided by the U.S. Census Bureau). We did not consider nonresponse or calibration adjustments for the base AFHS sampling weights in this initial analysis so that we could evaluate the features of the sample respondents and the target population that they would represent prior to any additional weighting adjustments. We also computed weighted estimates describing the target population using the publicly available 2021 microdata from the ACS for comparison purposes, given that the NSFG estimates are based on the 2017–2019 time period. Second, we examined how estimated AFHS demographic distributions changed as the main data collection proceeded across the four phases depicted in Figure 2, and we once again considered only the base sampling weights when computing AFHS estimates to examine respondent sample composition.
Next, we compared the two surveys by rates of item-missing data on selected sensitive measures. Many of these more sensitive questions were self-administered by respondents using ACASI in the NSFG, so we did not expect large differences. In both surveys, we first determined the number of individuals eligible to provide a response on each sensitive measure (e.g., those who have never had sex are not asked questions about specific sexual behaviors) and then computed the proportion of respondents who either refused to answer or responded with “don't know” (if provided as an option). These rates of item-missing data for the two surveys were compared statistically using Rao–Scott chi-square tests.
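A minimal pandas sketch of this rate calculation follows, assuming a respondent-level data file with hypothetical column names and missing-data codes; the published comparisons additionally used design-adjusted Rao–Scott tests, which are not shown here.

```python
import pandas as pd

def item_missing_rate(df: pd.DataFrame, item: str, eligible_flag: str) -> float:
    """Proportion of item-eligible respondents who refused or answered "don't know"."""
    eligible = df.loc[df[eligible_flag] == 1]
    missing = eligible[item].isin(["refused", "dont_know"])
    return float(missing.mean())
```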
Key Survey Estimates (Objective 2)
Second, we identified 42 and 89 key measures in the male and female surveys, respectively (see the figures in the Results section and online appendix tables). These 131 measures, each capturing important data on critical domains of family, reproductive, and health behaviors, were selected using the following criteria:
The measures had acceptable variability in the response values, together with low rates of missing data;
The measures were also analyzed in recent descriptive reports for males and females prepared by the National Center for Health Statistics using NSFG data (see https://www.cdc.gov/nchs/nsfg/nsfg_products.htm);
The measures were also collected in the condensed mail questionnaire used for nonresponse follow-up; and
The measures have also been analyzed in previous studies using AFHS data (e.g., Axinn et al. 2021).
Both AFHS and NSFG estimates were weighted by the final survey weights (where again, the AFHS weight depended on the module in which a variable was located), and design-adjusted standard errors were computed for the weighted estimates to account for the complex sampling features inherent to each study. In the NSFG, these sampling features included stratification, cluster sampling, and weighting, and Taylor series linearization was used for variance estimation (per National Center for Health Statistics guidelines3). In the AFHS, bootstrap replicate weights were used to (1) fully account for uncertainty in the adjustment of the base sampling weights for nonresponse and (2) capture gains in efficiency of the estimates owing to calibration of the weights to known population features (Valliant 2004). These replicate weights fully captured the stratified sampling and weighting inherent to the AFHS design (Heeringa et al. 2017). In fully accounting for all of the sample design information available for each of the two surveys, we used independent-samples t tests to compare the AFHS and NSFG estimates and determined what fraction of weighted estimates had standardized differences that were significant—that is, were more than two pooled standard errors (i.e., pooled SE = √(SE²_AFHS + SE²_NSFG)) away from zero.
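To illustrate the AFHS side of these calculations, here is a hedged numpy sketch of a weighted estimate, a bootstrap-replicate standard error, and the standardized difference used in the comparisons. The (B − 1) divisor is one common convention for replicate-weight variance estimation and is an assumption here, as are all array names.

```python
import numpy as np

def weighted_mean(y: np.ndarray, w: np.ndarray) -> float:
    """Survey-weighted mean of y."""
    return float(np.sum(w * y) / np.sum(w))

def bootstrap_se(y: np.ndarray, w_full: np.ndarray, w_reps: np.ndarray) -> float:
    """SE from B columns of bootstrap replicate weights (w_reps has shape [n, B])."""
    theta = weighted_mean(y, w_full)
    theta_reps = np.array(
        [weighted_mean(y, w_reps[:, b]) for b in range(w_reps.shape[1])]
    )
    return float(np.sqrt(np.sum((theta_reps - theta) ** 2) / (w_reps.shape[1] - 1)))

def standardized_difference(est_a: float, se_a: float, est_b: float, se_b: float) -> float:
    """Difference in two independent estimates divided by the pooled SE."""
    return (est_a - est_b) / np.sqrt(se_a**2 + se_b**2)
```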
Complex Sample Design Effects (Objective 3)
Third, we computed the distributions of estimated complex sample design effects associated with each of the weighted estimates compared between NSFG and AFHS. These so-called design effects are specific to each survey estimate and capture the inflation in the variance of each of the weighted estimates relative to a simple random sample of the same size, as a result of the complex sampling features associated with each design and the use of the final survey weights in estimation. We hypothesized that the AFHS sample design, which involved only stratification and weighting adjustments, would reduce design effects relative to the NSFG, which relied on a clustered area probability sample to save on the costs of data collection.
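Formally, for each survey estimate θ̂, the design effect is the standard variance ratio deff(θ̂) = Var_complex(θ̂) / Var_SRS(θ̂), where Var_SRS(θ̂) is the variance expected under simple random sampling with the same number of respondents; for an estimated proportion p based on n respondents, Var_SRS ≈ p(1 − p) / n.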
Costs per Completed Survey (Objective 4)
Finally, we compared the data collection cost per completed survey in each of the two studies (conditional on the fixed infrastructure and development costs associated with each project), focusing on the final four quarters of data collection in the 2010–2020 NSFG so that the costs per completed interview would be as comparable as possible.
Results
Objective 1: Comparisons of Respondent Sample Composition
Table 1 presents the weighted estimates of sociodemographic distributions based on the full set of AFHS respondents (to the full survey and the initial module). We reiterate that we use the base sampling weights only (without adjustments for nonresponse or calibration) when computing the distributions based on the AFHS respondents.
The fully weighted distributions based on the NSFG were largely consistent with those based on the ACS in 2021, while the estimated distributions for selected variables based on the base sampling weights in the AFHS were less consistent with the ACS benchmarks. Indeed, we found evidence of significant associations between sample (AFHS vs. NSFG) and both race/ethnicity and education (p < .01), indicating that the distributions on these sociodemographic measures differed between the two surveys. Because the AFHS estimates reflect only the base sampling weights, these differences indicate that the AFHS approach recruited significantly more respondents who were non-Hispanic White and more highly educated (for both males and females). The youngest group of female respondents (aged 18–24) was also underrepresented in the AFHS.
These results are certainly not unique to the AFHS and are quite common in web surveys of large populations (Baker et al. 2010; Boas et al. 2020; Dillman et al. 2014; Sha et al. 2017; Simmons and Bobo 2015; Tourangeau et al. 2013; Wells et al. 2019). The results imply that nonresponse adjustments to the base AFHS sampling weights may be needed to correct for potential biases in AFHS estimates owing to the race/ethnicity and education response differentials. These results also imply that a weakness of the AFHS approach is less sample yield from more marginalized subgroups, which would inhibit analyses focused on comparing different subgroups in terms of relative advantage (given reduced sample sizes in the disadvantaged subgroups). Whether there would be bias in survey estimates based on the AFHS because of these differentials depends on the associations of race/ethnicity and education with the measures of substantive interest collected in the AFHS (Kennedy et al. 2016).4
The adjustment of sampling weights to account for (1) differential nonresponse across subgroups and (2) calibration of the adjusted weights to population control totals tends to introduce more variability in the final survey weights and thus has the potential to increase estimated design effects of survey estimates (i.e., inflate the variance of the weighted estimates). We later illustrate that the variance inflation introduced by these weighting adjustments does not consistently result in larger design effects than seen in the NSFG (where much of the variance inflation arises from the cluster sampling performed).
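A useful benchmark here is the well-known approximation, often attributed to Kish, for the variance inflation arising from weight variability alone: deff_w ≈ 1 + CV²(w), where CV(w) is the coefficient of variation of the final weights. This approximation assumes the weights are unrelated to the survey variable and is offered only as intuition for why weighting adjustments inflate variance, not as an AFHS-specific calculation.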
We also evaluated changes in the estimated sociodemographic distributions based on the AFHS respondents across the four phases of main data collection. The distributions were largely stable, but there was some evidence that they better matched those of the benchmark 2021 ACS as data collection proceeded. Specifically, the proportion of “high school or less” male respondents changed from 13% at phase 1 to 20% by the end of phase 4, getting closer to the ACS benchmark (although weighting adjustment would still be needed). Similarly, the proportions of 18–24, 25–34, and 35–49 male respondents changed from 19%, 37%, and 44%, respectively, in phase 1 to 22%, 33%, and 45%, respectively, by the end of phase 4, better matching the ACS benchmarks. These results suggest that the later phases of the AFHS data collection, which included the receipt of more mail responses (West et al. 2023), contributed to reducing the underrepresentation of selected sociodemographic subgroups.
Rates of item-missing data on the selected sensitive items measured for females and males were generally quite similar in the two surveys (Table 2), as expected, given that most of these items were self-administered using ACASI in the NSFG. We did not find any evidence of significant differences between the surveys in item-missing data rates on these selected variables for males or females.
Objective 2: Comparisons of Key Survey Estimates
The key survey estimates based on the AFHS and NSFG data, including 95% confidence intervals for the parameters being estimated, are compared using dumbbell plots in Figures 3–6. These figures illustrate the general consistency of the AFHS estimates with the NSFG estimates. The point estimates produced by the two studies were remarkably similar to each other in general, with Pearson correlations ranging from .981 (estimated proportions for females) to .999 (estimated means for females). However, the figures also present evidence of nonoverlapping confidence intervals for selected estimates; overall, we found that 80 of the 131 estimates analyzed (61.1%) were statistically similar in the two studies. We now focus on some of the larger differences observed, given that the large effective sample sizes in the two studies provided a great deal of statistical power for detecting even small differences.
Some of the larger differences observed were consistent with our broader expectations related to the effects of the COVID-19 lockdowns and the modes. For example, among both males and females, the mean number of months working for pay in the past 12 months was lower in the AFHS than in the NSFG (see Figures 5 and 6), which may have been a function of the so-called “Great Resignation” from jobs for pay during the pandemic. Among females, reports of receiving a birth control prescription, a checkup related to birth control, a sterilizing operation, counseling about birth control, and counseling about sterilization in the past year were all significantly lower in the AFHS (Figure 4), possibly reflecting fewer in-person medical visits because of the pandemic lockdowns (Birkmeyer et al. 2020; Jeffery et al. 2020; Nourazari et al. 2021; Rennert-May et al. 2021).
For both men and women, a significantly higher proportion of people indicated “difficulty with doing errands alone” in the AFHS (Figures 3 and 4), which would be considered a socially undesirable response (indicating that one is not entirely self-sufficient) and may have also indicated fears about being in public during the lockdowns. Furthermore, significantly lower proportions of individuals responded as being in “excellent health” or receiving various tests or treatments for STDs in the AFHS as compared to the NSFG; these differences are likely effects of the lockdowns. In addition, the AFHS produced significantly higher estimates related to daily drinking behaviors (consistent with work by Nordeck et al. (2022) and Rodriguez et al. (2020), suggesting that more drinking occurred during COVID).
Not all large differences in estimates observed were consistent with our broader expectations. Among males, the NSFG produced much higher estimates of the proportion ever cohabiting, a behavior that is not particularly sensitive and is unlikely to have been affected by the COVID-19 lockdowns, and of the proportion ever having been tested for HIV, which is possibly sensitive but is also a longer term measure that is unlikely to be affected by the lockdowns (see Figure 3). Among females, the NSFG produced a much higher estimate of the proportion using birth control at first intercourse, which is possibly sensitive and may be viewed as socially desirable behavior, but the reasons for this large difference are not clear (see Figure 4). We also once again observed a higher estimated proportion of ever having been tested for HIV in the NSFG, along with ever having had a clinical breast exam, neither of which is a particularly sensitive behavior (see Figure 4). Whether these selected large differences arose from an inability of the weighting adjustments to correct for selection bias, measurement error, or general societal trends is presently unclear and warrants future investigation.
Importantly, the two replicates of the AFHS data collection occurred at different stages of the COVID-19 pandemic. Final survey weights were created for users of the publicly available AFHS data for working with replicate 1 exclusively (which included the modular design experiment) or working with replicates 1 and 2 combined (the full AFHS sample); weights were not created for replicate 2 as a stand-alone dataset. To check the possibility that estimates during replicate 1 (at the height of the pandemic) may have varied relative to the full AFHS sample, we compared weighted AFHS estimates based on replicate 1 (2020–2021) to those based on the full sample across both replicates (2020–2022) to see whether estimates were stable over the course of two years. We found only minor shifts in these estimates and no significant differences. Future attempts to perform additional comparisons with survey data that were also collected from national samples using web/mail approaches during the pandemic would be helpful for understanding whether the differences in selected estimates reported here were arising as a result of the pandemic or simply from the change in data collection mode.
Objective 3: Comparisons of Design Effects Owing to Complex Sampling
The final AFHS estimates following the weighting adjustments had substantially lower complex sampling design effects than the NSFG estimates. This was expected owing to the lack of cluster sampling in the AFHS sample design. Figure 7 shows the distributions of the estimated design effects computed for all of the estimates generated from each of the two studies: it illustrates the significant advantages of address-based sampling that involves only stratification and weighting for unequal probabilities of selection and nonresponse adjustment/calibration when using the web/mail approach for national data collection. The general distribution of the estimated design effects is shifted rather dramatically toward higher values for the NSFG, and most of the AFHS design effects are tightly clustered around 1.5 to 2.5. From a perspective of statistical efficiency, the AFHS approach can reduce the expected inflation in the variance of estimates due to complex sampling.
Objective 4: Comparisons of Costs per Completed Survey
Our cost comparison conditions on the presence of an established data collection infrastructure for each project (e.g., programmed computer-assisted personal interviewing instruments for the NSFG, a converted and programmed web instrument for the AFHS, development of respondent materials, and general interviewer training); we do not account for these infrastructure costs and consider only the costs associated with actual data collection activities. These costs include study-specific interviewer/call center staff hiring and training, interviewer management and support, travel (NSFG only), telephone charges, postage, printing, and respondent payments. We note that the fixed cost of interviewers (in the NSFG) could be reduced as a share of the overall project budget if interviewers worked longer, but survey organizations currently find it very difficult (and costly) to retain high-quality interviewers.5
Analyzing all data collection costs from the four most recent quarters of the NSFG (5,731 completed surveys) and the first replicate of the AFHS (998 completed surveys; the replicate closest in time to those NSFG quarters), we find that the NSFG cost about $717 per completed survey, while the AFHS cost about $417 per completed survey: a savings of roughly $300 per completed survey. For the second replicate of the AFHS (1,371 completed surveys), the cost per completed survey was quite similar (about $406), yielding an overall cost of $410.63 per completed survey for the entire study. Put differently, using the results from the first replicate and the average design effects for the two approaches computed as a by-product of Objective 3, we would need to spend $717 × 3.0 = $2,151 per effective completed survey in the NSFG and $417 × 1.5 = $626 per effective completed survey in the AFHS to obtain samples yielding the effective sample sizes achieved by the two studies. That is, attaining the same precision (for similar point estimates) would cost more than three times as much in the NSFG as in the AFHS.
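The arithmetic behind these "per effective completed survey" figures follows directly from the definition of effective sample size. With $n$ completed surveys, average design effect $\overline{\text{deff}}$, and cost $C$ per completed survey,

$$
n_{\text{eff}} = \frac{n}{\overline{\text{deff}}},
\qquad
\frac{\text{total cost}}{n_{\text{eff}}}
= \frac{C\,n}{\,n/\overline{\text{deff}}\,}
= C \times \overline{\text{deff}} .
$$

Substituting the observed values gives $717 × 3.0 = $2,151 for the NSFG and $417 × 1.5 = $626 for the AFHS, a ratio of roughly 3.4.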
Thus, there are significant cost savings associated with the AFHS approach. We note that this comparison has not been adjusted for inflation; because the NSFG quarters analyzed preceded the AFHS data collection, the NSFG costs would be even higher in current dollars, so these differences should be considered a lower bound on the possible cost savings. In these four quarters of the NSFG, total data collection costs were dominated by field/interviewer management (61%), interviewer travel (25%), and respondent incentives (7%). The AFHS costs were largely dominated by respondent mailings (50%) and respondent incentives (29%). These cost savings could help offset the aforementioned limitation of the AFHS approach (lower recruitment rates for more marginalized subgroups), in that such subgroups could be oversampled at reduced cost when a study requires more respondents from them.
Discussion
Summary of Results
This study demonstrates that a mixed-mode data collection approach employing web and mail modes exclusively, applied to a national address-based sample and including a screening phase to identify households with age-eligible individuals, was able to produce estimates related to reproductive health and family formation similar to those from the in-person NSFG, at a significantly lower cost per completed case and with greater statistical efficiency. Revisiting our four main objectives, we summarize our findings as follows:
Objective 1: The web/mail approach tends to recruit more non-Hispanic White and more highly educated individuals to complete the survey, but this pattern is not unique to this study, and careful weighting approaches can compensate for this potential source of nonresponse bias. Rates of item-missing data on sensitive survey questions were small and very similar between the two studies.
Meaning: Although the web/mail approach does underrepresent some subgroups of the general population, statistical weighting adjustments can increase comparability to the total population. Future data collections using this approach could leverage its reduced costs (see Objective 4 below) to oversample selected population subgroups more heavily, increasing the number of cases available for analysis; continued use of alternative modes of data collection (mail, telephone reminders) to follow up with nonrespondents should further improve sample composition (West et al. 2023). Although greater oversampling may produce more variable survey weights, we demonstrated that the net design effects associated with this approach are substantially reduced relative to face-to-face interviewing (see Objective 3 below), largely owing to the absence of a need for cluster sampling. Future research should continue to evaluate the design effects associated with this approach, especially under significant oversampling, and could also consider more intensive follow-up procedures or increased incentives for subgroups whose response rates tend to be lower (e.g., Wagner et al. 2023).
Objective 2: A majority (61.1%) of the estimates produced by the web/mail approach were statistically similar to those produced by the NSFG, with some of the larger differences in estimates likely introduced by the COVID-19 pandemic.
Meaning: We found high consistency in the point estimates produced by the two studies (with Pearson correlations above .98), and some of the smaller differences that still emerged as significant may have been due to the increased statistical power afforded by the large effective sample sizes. An investigation of the larger differences suggested possible effects of the COVID-19 pandemic lockdowns (e.g., far fewer reports of being in "excellent health" in the web/mail approach), but not all large differences had a clear explanation. Further research is needed to understand the sources of the large differences that were not consistent with our broad expectations related to COVID-19 and the alternative data collection modes (e.g., fewer reports of cohabitation in the web/mail approach). Did these differences arise from nonignorable nonresponse bias in one of the two surveys that weighting did not correct? Were measurement errors introduced by the different approaches? Was the web/mail approach simply picking up general societal trends, or did these differences arise by random chance? Ongoing evaluations of the effects of using a web/mail approach need to apply a total survey error perspective and further investigate why such large differences arise.
Correct application of adjusted survey weights produced comparable estimates across the two data sources for most measures, meaning that web/mail sequential mixed-mode surveys have the potential to replicate the results of face-to-face surveys. However, not all measures were precisely replicated, and the substantive domains in which they were not overlap with areas of life known to have been altered by the COVID-19 pandemic. Across all measurement domains considered here, differences or similarities may be due to social or economic changes rather than to measurement changes. Best practice is to run simultaneous data collections using the alternative methods, which allows one to determine the extent to which differences are produced by the methods of measurement themselves (Axinn and Pearce 2006).
Objective 3: Design effects on the variances of estimates due to complex sampling were generally a fraction of those found in the NSFG, largely owing to the absence of area cluster sampling in the web/mail approach.
Meaning: The analysis comparing complex sample design effects revealed large reductions engendered by the web/mail approach, an important increase in statistical efficiency that motivates greater use of this method. Despite the significantly lower response rates for the web/mail approach (9.9% and 11.1% for the two AFHS replicates, respectively, vs. 63.4% for the 2017–2019 NSFG), which require the weighting adjustments to do more "work" to address potential selection bias, overall design effects were still significantly lower for the AFHS, primarily because of the absence of cluster sampling.
Objective 4: The cost per completed survey was about $300 less in the web/mail approach compared with the NSFG approach when considering all data collection activities.
Meaning: The data collection costs per completed survey interview were much lower when using the web/mail sequential mixed-mode approach, although the costs were not negligible. For many specific topics, demographers may be able to use this method to create robust population measures even when they cannot afford more expensive face-to-face surveys.
Collectively, these findings suggest that applying the web/mail sequential mixed-mode data collection approach to a national probability sample can yield significant efficiency benefits while producing data that are largely similar to those created by a face-to-face interviewing approach. Some of the larger observed differences reflected broad expectations related to mode effects and effects of the COVID-19 pandemic lockdowns, but not all large differences could be easily explained, motivating additional future work in this area.
We acknowledge that future work also needs to compare subgroup estimates between the NSFG and web/mail approaches like the AFHS. A related challenge is the absence of data from minors younger than 18 in the AFHS; the NSFG interviewed minors aged 15–17, as the face-to-face mode made it possible to work with parents or guardians to obtain consent. Techniques for obtaining consent from minors in strictly web/mail designs require future consideration and evaluation, given the importance of this subgroup to demographic research. In addition to comparisons of subgroup estimates, comparisons of estimates describing relationships between variables will be needed in future work (e.g., Axinn et al. 2021) to support large-scale adoption of this type of web/mail approach.
Demographic Research Potential of This New Approach
The application of state-of-the-art responsive survey design tools to web surveys, along with the correct application of weighting adjustments for differential nonresponse, can potentially produce nationally representative population estimates similar to those generated by face-to-face surveys at a fraction of the cost, in less time, and with more confidentiality. This potential creates many new opportunities for advancing population science. Alternatives to face-to-face surveys are clearly needed given rapidly rising costs, declining response rates, and the difficulty of hiring and retaining interviewers, and the COVID-19 pandemic has further accelerated these trends. The high privacy afforded by self-administration can engender more honest reporting, especially for potentially sensitive questions. The speed and efficiency of the web/mail approach can allow scientists to conduct family and fertility surveys that are harmonized to selected key measures from prior surveys (such as the NSFG) at far lower cost. The lower costs and greater statistical efficiency of this approach provide the means for more frequent measurement, larger sample sizes at similar cost, or more targeted studies, each of which has the potential to significantly enhance family and fertility research.
All of the same potential advantages apply to other areas of demographic research. Even as alternative forms of data on actual behaviors become more widely available, organic data on employment, school enrollment, purchases, health care use, or changes in address cannot be used to reflect an individual's plans, preferences, attitudes, or expectations that are likely to drive subsequent long-term decisions about work, education, consumption, health, or migration. Individual-level self-reports are a crucial component of the population-scale prediction of change and variation in the factors shaping longevity, health, migration, wealth, and well-being. A web/mail approach can potentially provide an accurate representation of the full population while taking advantage of the cost, speed, and relative privacy of self-administered surveys. The success of this approach is an important opportunity for many different areas of population research.
This approach can also be optimized for studies focusing on specific subgroups of the population. The combination of a first-step screening questionnaire fielded at a very large scale, followed by an in-depth study of a specific topic, affords many new opportunities for population science. In this article, we demonstrated age-based screening with oversampling of specific race/ethnicity subgroups to mimic the NSFG face-to-face design. A similar approach can be used to examine other age groups or other race/ethnicity subgroups, and it can also be used to screen for other key subgroups of the population: sexual or gender identity subgroups, physical ability subgroups (such as persons with disabilities), or personal experience subgroups (such as the divorced population). In addition, because this approach features an address-based sample selection procedure, it can be adapted to general population studies of smaller geographic areas, such as states, counties, or municipalities. Together, this flexibility in applying the design to subgroups of the population opens many new opportunities for general population research.
Ongoing methodological research will be required to continuously improve this approach and adapt it to evolving societal circumstances. We have shown that a nascent version of the approach can produce estimates similar to those of a rigorously designed and executed face-to-face survey of the U.S. population. Further research on the larger differences in estimates that we did observe is necessary, ideally freed of the confounding factors introduced by the COVID-19 pandemic lockdowns. Future improvements will make the approach even more valuable for population-scale research on a broad range of demographic topics.
Acknowledgments
This work was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) of the National Institutes of Health (grant R01HD095920) and by an NICHD center grant to the Population Studies Center at the University of Michigan (P2CHD041028). We also acknowledge the efforts of Andrew Hupp, Colette Keyser, Raphael Nishimura, Deji Suolang, Lingxi Li, and the Survey Research Operations team at the Institute for Social Research in making the American Family Health Study a reality.
Notes
For more details, see https://www.cdc.gov/nchs/data/nsfg/NSFG-2017-2019-UG-MainText-508.pdf.
The American Association for Public Opinion Research RR4 calculation used for the conditional main-stage cooperation rates in this study is RR4 = (I + P) / ((I + P) + (R + NC + O) + e(UH + UO)), where I = completed surveys, P = partially completed surveys, R = refusals and breakoffs, NC = noncontacts, O = others, UH = unknown if household/occupied housing unit, UO = unknown other, and e = estimated eligibility rate based on households in which eligibility could be determined.
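For readers applying this rate to their own case dispositions, a minimal Python transcription of the formula follows; the counts in the example are hypothetical and do not correspond to the AFHS or NSFG dispositions.

```python
def aapor_rr4(I, P, R, NC, O, UH, UO, e):
    """AAPOR Response Rate 4: completed (I) plus partial (P) surveys,
    divided by all known-eligible cases plus the estimated-eligible
    fraction (e) of the cases with unknown eligibility (UH, UO)."""
    return (I + P) / ((I + P) + (R + NC + O) + e * (UH + UO))

# Hypothetical dispositions for illustration only:
rr4 = aapor_rr4(I=900, P=50, R=400, NC=3000, O=150, UH=5000, UO=500, e=0.6)
print(f"RR4 = {rr4:.3f}")  # prints RR4 = 0.122 for these made-up counts
```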
Efforts to adjust for potential nonresponse bias via survey weighting, especially for web surveys that tend to have lower response rates, have the potential to inflate the variance of survey estimates. Adaptive survey design strategies can correct some of this bias during data collection or as part of the data collection strategy (Peytchev et al. 2022; Rosen et al. 2014; Zhang 2022) and, when combined with postsurvey weighting, may further increase the efficiency of estimates (Särndal and Lundquist 2019; Zhang and Wagner 2024). Novel adaptive design approaches for web/mail data collections using national probability samples merit additional research attention.
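As a point of intuition for this variance inflation, the design effect attributable to unequal weighting alone is commonly approximated with Kish's formula; this is a standard approximation, not necessarily the computation used in the studies compared here:

$$
\text{deff}_w \approx 1 + \text{cv}^2(w) = \frac{n \sum_{i=1}^{n} w_i^2}{\left(\sum_{i=1}^{n} w_i\right)^2},
$$

where the $w_i$ are the final survey weights and cv(w) is their coefficient of variation. Weighting schemes that must compensate for lower response rates typically produce more variable weights (larger cv) and thus greater variance inflation, which adaptive designs aim to limit.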
Personal communication from a September 2023 workshop sponsored by the Committee on National Statistics (CNSTAT) and titled "Examining the Effect of Interviewers on Longitudinal Survey Response Rates and Approaches to Improve the Hiring and Retention of High-Quality Interviewers" (a publicly available report can be obtained from Shanna Breil ([email protected])).