The Additional Effects of Adaptive Survey Design Beyond Post-Survey Adjustment: An Experimental Evaluation

Abstract

Adaptive survey design refers to using targeted procedures to recruit different sampled cases. This technique strives to reduce bias and variance of survey estimates by trying to recruit a larger and more balanced set of respondents. However, it is not well understood how adaptive design can improve data and survey estimates beyond the well-established post-survey adjustment. This paper reports the results of an experiment that evaluated the additional effect of adaptive design beyond post-survey adjustment. The experiment was conducted in the Detroit Metro Area Communities Study in 2021. We evaluated the adaptive design on five outcomes: 1) response rates, 2) demographic composition of respondents, 3) bias and variance of key survey estimates, 4) changes in significant results of regression models, and 5) costs. The most significant benefit of the adaptive design was its ability to generate more efficient survey estimates with smaller variances and smaller design effects.

Keywords

adaptive survey design; post-survey adjustment; experiment; bias and variance; representativeness

The general population is increasingly reluctant to respond to survey requests and costs of collecting survey data are continuously rising. An important challenge facing surveys is to develop methods that encourage participation under budget constraints. Responsive and/or adaptive survey design (RASD) is a recruitment method that operates with these goals (Chun, Heeringa, and Schouten 2018; Groves and Heeringa 2006; Tourangeau et al. 2017). This paper reports the results of an experiment that evaluated the effect of adaptive survey design on selected survey outcomes and estimates.

Generally speaking, RASD refers to changing and tailoring recruitment procedures and protocols for different sample cases (Tourangeau et al. 2017). One example strategy is differential incentives – higher incentives are assigned to sample cases that are more reluctant to respond (Brick and Tourangeau 2017; Jackson, McPhee, and Lavrakas 2020; Link and Burks 2013; Peytchev, Pratt, and Duprey 2020; Singer, Groves, and Corning 1999; Singer and Ye 2013). To identify reluctant cases, survey practitioners rely on auxiliary information known for the entire sample and the outcomes of previous surveys (Tourangeau et al. 2017). Some other strategies include offering different modes to different cases, following up with different contact efforts, assigning better interviewers to harder-to-recruit cases, and prioritizing the pursuit of some cases over others (Bergmann and Scherpenzeel 2020; Brick and Tourangeau 2017; Coffey, Reist, and Miller 2019; Luiten and Schouten 2013; Rosen et al. 2014; van Berkel, van der Doef, and Schouten 2020; Wagner et al. 2012; West, Chang, and Zmich 2021).

The practices of RASD are grounded in leverage-salience theory (Groves, Singer, and Corning 2000; Schouten, Peytchev, and Wagner 2017). This theory posits that people are heterogeneous in their interests in survey attributes. For any particular survey, some like the topic, some value the opportunity of sharing their opinions, and others are motivated by incentives. People are swayed to participate when the survey request is presented in a way that matches the particular interests of the person. Under this theoretical framework, a standardized and homogeneous recruitment protocol is not optimal. Instead, survey practitioners should take active steps to tailor and adapt recruitment procedures to cater to the interests of different sample cases. This is the crux of RASD.

RASD is also regarded as a budget-friendly technique to reduce survey biases during recruitment (Brick and Tourangeau 2017; Groves and Heeringa 2006). In an ideal world, nonresponse error may be minimized if expensive methods are used on the entire sample to maximize response rates¹ (Brick and Tourangeau 2017); but in practice this is often cost-prohibitive. Working with a budget, RASD aims to cost-effectively reduce biases by distributing more resources to where they are more needed. RASD recognizes that some cases are more valuable to survey estimates than others because they are more likely to be under-represented in the response data, and their under-representation risks introducing biases into survey estimates (Brick 2013; Luiten and Schouten 2013; Schouten et al. 2017).

Taken together, RASD is a recruitment method that aims to improve survey data and survey estimates. Successful implementations of RASD increase the response rate and balance response propensities across sample subgroups. The former may contribute to reducing variances and the latter to minimizing biases of survey estimates (Brick and Tourangeau 2017; Schouten et al. 2016).

These potential benefits of RASD on survey estimates, however, are not unique. They overlap with well-established post-survey adjustment methods (Kalton and Flores-Cervantes 2003). For example, adjustment methods like calibration can correct for biases by matching the distributions of adjustment variables of the response dataset to those of the target population (Kish 1995). Calibration also potentially reduces variances if the adjustment variables are highly correlated with the survey variables of interest (Little and Vartivarian 2005). The overlapping objectives of RASD and post-survey adjustment raise an important question (Beaumont, Bocci, and Haziza 2014; Brick and Tourangeau 2017; Särndal and Lundquist 2019; Tourangeau et al. 2017):

What is the added benefit of RASD above and beyond post-survey adjustment?

Since post-survey adjustment is almost always implemented as the final stage of data production, RASD would be worth the cost and effort only if it can improve the quality of survey estimates beyond the capacity of post-survey adjustment. Evaluating the additional benefits of RASD is the focus of this paper.

Additional Benefits of RASD Beyond Post-Survey Adjustment

In the current literature, evidence of the additional benefits of RASD mostly comes from theoretical and simulation studies (Särndal and Lundquist 2014, 2017, 2019; Schouten et al. 2016). These studies suggest that combining RASD and post-survey adjustment can lead to smaller biases and variances of survey estimates than post-survey adjustment alone (Särndal and Lundquist 2014, 2017, 2019).

In terms of variance, if RASD promotes sample representativeness during recruitment, it would diminish the reliance on weighting and reduce variation in the final survey weights, which in turn would lower estimated sampling variances. In terms of bias, RASD can supplement post-survey adjustment because the latter can only address biases that are related to observable auxiliary variables². In contrast, sample balancing during recruitment has the potential to correct biases in missing-not-at-random (MNAR) situations (i.e., nonignorable nonresponse) (Brick and Tourangeau 2017; Särndal and Lundquist 2014, 2017, 2019).

To illustrate, suppose that men are less likely to participate than women if recruited with a standard protocol. Post-survey adjustment fixes the gender imbalance by assigning large weights to male respondents, but a likely side-effect is inflated sampling variances. Additionally, the small number of male respondents means that the data are less likely to capture a full range of variation in other unobserved characteristics related to men. This corresponds to an MNAR situation and is out of the scope of typical applications of post-survey adjustment. Implementing RASD during recruitment may help compensate for both limitations. By proactively recruiting men with more intensive effort, RASD may increase the proportion of male respondents. With a more balanced dataset, the nonresponse adjustment weights are less extreme, reducing the risk of inflated variance. With a larger number of male respondents, the data may have a better chance of balancing the distributions of the unobserved features, reducing the risk of biases.

These theories and simulation studies have laid the foundation for the value of RASD. However, they mostly assume that RASD has been efficiently designed and implemented, which may not always be true in practice. The auxiliary information available for RASD may be incomplete, resulting in recruitment effort being used on the “wrong” cases (Burger, Perryck, and Schouten 2017; Zhang 2022). Also, the strategies designed to motivate respondents may be ineffective and therefore may not lead to any changes in the data composition (Lavrakas, Jackson, and McPhee 2018). Under these practical constraints, RASD may not be able to deliver its theoretical benefits. In fact, adding inadequate RASD into the data production may backfire and exacerbate biases, as shown by a simulation study (Zhang 2022).

The discussion of the additional benefits of RASD needs to be settled by empirical evidence based on real surveys (Chun et al. 2018). However, such evidence is scarce in the current literature because several key elements are needed to address this issue. First, the focus should be on whether RASD is useful in addition to post-survey adjustment, but to the best of our knowledge, research in real surveys has only investigated whether RASD is useful by itself, for good reasons. Systematic research on RASD has only started in recent decades (Groves and Heeringa 2006; Wagner 2008), so naturally an initial step was to establish whether tailoring recruitment procedures could produce the intended direct results. But as the technique has matured, it has become important to incorporate RASD as a link in the production process and evaluate its performance against well-established post-survey adjustment methods (Brick and Tourangeau 2017; Särndal and Lundquist 2019; Tourangeau et al. 2017).

Second, an experimental design is needed to compare the results of combining RASD and post-survey adjustment with the results of post-survey adjustment alone. Unfortunately, this experimental design can be in conflict with the practical goal of achieving the best survey outcomes. Survey organizations often opt for implementing RASD on the full sample to improve the recruitment, rather than holding out a control group to be recruited by the theoretically less efficient standard protocols (Axinn, Link, and Groves 2011; Bergmann and Scherpenzeel 2020; Peytchev et al. 2020; Rosen et al. 2014; van Berkel et al. 2020). Without a control group, even though researchers may still be able to analytically tease out the effect of RASD on biases (Peytchev et al. 2020; van Berkel et al. 2020), it is no longer possible to study the effect of RASD together with post-survey adjustment.

Third, to assess the results of combining RASD and post-survey adjustment, the evaluation metrics need to focus on errors in survey estimates. However, much research has used response rates and sample representativeness (e.g., the R-indicator; Schouten, Cobben, and Bethlehem 2009) to evaluate the performance of RASD (Bergmann and Scherpenzeel 2020; Coffey et al. 2019; Jackson et al. 2020; Luiten and Schouten 2013; Lynn 2016; Wagner et al. 2012). While these survey-level indicators succinctly communicate the general quality of the response data, they fall short of capturing the quality of survey estimates, especially after post-survey adjustment³. To examine the combined effect of RASD and post-survey adjustment, the evaluation metrics should be the bias and variance of survey estimates.

The Current Research

Motivated by these gaps in the literature, we conducted an experiment where all key elements needed to evaluate the added benefits of adaptive design were assembled. The experiment enabled comparisons of survey outcomes and weighted survey estimates between the data collected with and without the adaptive design. Notably, our study context also differs from previous applications of RASD in two important respects.

First, the current adaptive design was implemented in a low-budget survey. This is an important difference from many other applications in large-scale well-funded surveys, which typically yield high response rates and leave limited room for improvement by RASD (Peytchev et al. 2020). For example, a 2016 subsample of the National Household Education Survey experimented with an adaptive design on incentives; its total sample contained 206,000 addresses, and the final response rate was about 64% (Jackson et al. 2020). The 2015 National Survey of College Graduates tested a dynamic adaptive design on modes and contact effort; it has a rotating panel design with 124,000 cases for any given round, and the final response rate was between 65–71% (Coffey et al. 2019). In contrast, lower-budget surveys work with smaller sample sizes and commonly have less than 30% response rates (Link and Burks 2013; Mercer et al. 2015). The literature has called for broadening the audience of RASD and researching its application on surveys with limited budgets.

Second, our operationalizations of RASD differ from many other strategies that are viable mostly in well-funded surveys. For example, because we were working with a short fielding period and a small staff, we were not able to dynamically adjust the recruitment protocols in response to the data composition in real-time (e.g., Axinn et al. 2011; Coffey et al. 2019; Murphy, Biemer, and Berry 2018; Wagner et al. 2012; West et al. 2021). Instead, we experimented with a set of strategies that were designed before the recruitment started. We combined the strategy of differential incentives with the strategy of tailoring invitation materials in mailings. The former is quite common (e.g., Han, Montaquila, and Brick 2013; Jackson et al. 2020; Link and Burks 2013; Peytchev et al. 2020; Singer et al. 1999; Singer and Ye 2013), but the latter is a relatively novel technique (Lynn 2016).

The tailored mailings highlighted different aspects of the survey for sample cases who have different characteristics. This strategy mirrors a classic technique used by experienced interviewers. Just as interviewers adapt their behavior and language to the perceived features of sample units (Groves and McGonagle 2001), the mailings adapt the text and images. Of course, the adaptive mailings are less flexible than interviewers because the decisions have to be made for subgroups of the sample based on limited auxiliary information. Nonetheless, the strategy of tailoring invitation content was found to be effective in motivating reluctant respondents in a panel survey in the UK (Lynn 2016). Other evidence indirectly suggests that tailored mailings can be a viable option for attracting respondents (Fumagalli, Laurie, and Lynn 2013; Liu et al. 2016; Lynn 2017; Christensen, Lynn, and Tolstrup 2019), but the technique has not been widely tested.

To experimentally evaluate the added benefits of adaptive design beyond post-survey adjustment, we compared the survey outcomes and survey estimates in five ways:

Response rate. This is the most direct indicator of the effect of adaptive design. The adaptive design would generate a higher response rate if the adaptive protocols were, as intended, more attractive than the standard protocols.

Sample composition. The goal of adaptive design is to motivate respondents who otherwise are unlikely to respond to the survey request. If the goal was successfully achieved, the adaptive design would recruit a more balanced sample of respondents than the standard design.

Errors in univariate estimates. The ultimate goal of survey data collection is to produce unbiased estimates with small variances. The effects of adaptive design are the most meaningful if survey estimates based on the adaptive design data have smaller biases and variances.

Conclusions drawn from regression models. For multivariate analyses, we focus on comparing the conclusions drawn from the regression models based on the adaptive design and the standard design data. If adaptive design recruits a more balanced sample than the standard design, its data may better capture theoretical associations. However, theories about the effect and necessity of nonresponse adjustment on multivariate analysis are not conclusive (Peytchev 2013; Axinn et al. 2011). We do not have strong expectations about the added effects of adaptive design on the regression models, and this part of our analysis is exploratory.

Costs. Any benefits of adaptive design should be weighed against its costs. We discuss a few factors that made the adaptive design more expensive than the standard design.

Methods

Wave 12 of the Detroit Metro Area Communities Study (DMACS) was the vehicle for the current experiment. Launched in 2016, DMACS is a panel survey of representative samples of adult residents in Detroit. Wave 12 was fielded between January and March 2021. The questionnaire asked about experiences with the COVID-19 pandemic, perception of neighborhoods, assessment of city services, health and healthcare, and employment.

Overview of the DMACS Survey

DMACS used address-based samples. Wave 12 included a refreshment sample ( $n_{refresh} = 9329$ ) and an established panel sample consisting of panelists who responded at least once to the previous waves ( $n_{established} = 1730$ ). The refreshment sample was a stratified random sample. It was stratified by thirteen geographic areas to oversample the Hispanic population and to ensure stable sample sizes in Strategic Neighborhood Funds (SNF) neighborhoods. The panel sample was recruited with several different stratification designs in the past waves.

The established panelists were invited to participate in the current wave by emails, text messages, and mailings; the new cases were invited only by mailings. The timing of the contact attempts is summarized in Table 1. Sampled individuals could participate through a self-administered web survey or an interviewer-administered telephone interview. About 13% of responses were collected by the telephone mode; the mode choice did not differ between the experimental and control groups ( $χ^{2} (1)$ = 1.04, $p$ = 0.31). (Offering this additional telephone mode helped to recruit respondents who could not be reached by the web mode, as explained in detail in Appendix I.)

Table 1.

Timing of Contact Attempts for the Panel Sample and the Refreshment Sample.

	Panel sample		Refreshment sample
Day 1	Email/text: initial invite	Day 1	Mail: initial letter invite
Day 8	Email/text: reminder #1
Day 9	Mail: initial letter invite
Day 21	Email/text: reminder #2	Day 22	Mail: reminder postcard #1^a
Day 42	Email/text: reminder #3; Mail: reminder postcard #2	Day 42	Mail: reminder postcard #2
Day 59	Survey close	Day 59	Survey close

Note. ^a The erroneous implementation of the adaptive design occurred here, for reminder postcard #1.

Experiment

The panel and the refreshment sample were separately randomized into an experimental group and a control group with a 70%/30% split⁴. Cases in the experimental groups were recruited with an adaptive design, and cases in the control groups were recruited with a homogeneous design. Below, we first describe the experiment on the refreshment sample. The experiment on the panel sample was largely the same with minor modifications.

Refreshment Sample - Adaptive design in the experimental group. For the 70% of cases in the experimental group, the adaptive design includes three strategies:

1. Promising a higher incentive (an extra $5) to sample cases who have lower predicted response propensities. This strategy aims to differentially encourage cases that are less likely to respond. The goal is to reduce biases by smoothing response propensities across sample subgroups.

2. Highlighting different aspects of the DMACS survey to different groups of sample cases in the invitation materials.

3. Providing region-specific COVID information to motivate responses by putting the survey request in context.

Strategies 2 and 3 both try to frame the survey in a way that is likely to be relevant to the different sampled individuals. The goal is to increase the overall response rates by making the survey request attractive.

These three strategies were bundled in the adaptive design, meaning that we estimated their combined effect but not the individual effect of each strategy.

The key to implementing the three strategies is the ability to categorize sampled cases into subgroups based on distinguishable characteristics prior to the recruitment. We performed the categorization at the areal level and divided the city of Detroit into four regions (Figure 1).

- The Southwest (purple) has a high proportion of Hispanic population.

- Downtown (blue) is the commercial area including Wayne State University.

- Both the East (green) and the West (yellow) are general residential areas.

Figure 1.

Dividing the city of Detroit into four regions for adaptive design.

The areal division was based on two criteria. First, the regions needed to have distinguishable features. We used variables from the Census Planning Database⁵ (e.g., percentage of population with no high school diploma) and the National Neighborhood Data Archive⁶ (e.g., proportion of high-density developed areas) to cluster Detroit block groups into different types of neighborhoods. Second, the regions needed to be geographically homogeneous and meaningful because we used the division to develop region-specific COVID messages (strategy 3). We therefore reorganized the block group clusters to become geographically contiguous. Appendix II explains the analysis and steps we took in detail.

In the adaptive design, sample cases in the four regions received letters and postcards with different language and images. The key design elements and their adaptive objectives are listed in Table 2. For example, cases in the Southwest were offered an increased incentive ($30 instead of $25) because the region had a lower than average response rate in a previous DMACS wave (strategy 1). Then, because the Southwest region is distinguishable for its high concentration of ethnic minorities, we described DMACS as “an ongoing survey that asks residents how the city can best meet the needs of people of many races and ethnicities who live in Detroit” in the invitation letter mailed to cases in this region (strategy 2). We also designed the postcard fronts to reflect the multicultural feature of the region (strategy 2). Further, we analyzed data on health-related topics from a previous DMACS wave and discovered that residents in the Southwest had a significantly lower insurance coverage rate. We included this result together with a map highlighting the region as a COVID-relevant message at the bottom of the invitation letter (strategy 3). Appendix III explains the analysis we performed and the design of the recruitment protocols in each region in detail.

Table 2.

Design Elements of the Adaptive Design in the Experimental Group and the Homogeneous Design in the Control Group.

Adaptive design (experimental group)				Homogeneous design (control group)
Southwest	Downtown	East	West
Initial letter
Operationalization of strategy 1
$30 promised incentive		$25 promised incentive		$25 promised incentive
Operationalization of strategy 2: In the first paragraph, the DMACS was introduced slightly differently. (The control group language was taken from a previous wave.)
“DMACS is an ongoing survey that asks residents how the city can best meet the needs of people of many races and ethnicities who live in Detroit. The current survey is designed to learn about how Detroiters experience neighborhoods, safety and crime, COVID prevention and treatment, and related topics.”	“DMACS is an ongoing survey that asks residents what they feel are the important issues related to residential and commercial growth in Detroit. The current survey is designed to learn about COVID prevention and treatment, neighborhood satisfaction, safety and crime, and related topics.”	“DMACS is an ongoing survey that asks residents about neighborhoods, quality of life, and other topics important to Detroiters and their families. The current survey is designed to learn about community priorities, city services, safety and crime, and COVID prevention and treatment.”		“DMACS is an ongoing survey that asks Detroit residents about their priorities and feelings on important topics from crime and policing to job security to neighborhood development. The topic of this survey is COVID prevention and treatment, as well as Detroiters’ perceptions of changes taking place in their neighborhood and around the city.”
Operationalization of strategy 3: The image and fact report a region-specific finding about COVID-19, placed at the bottom of the one-page letter. (The bold markings were shown to the sample cases.)
According to our most recent survey from the end of October 2020, a substantially higher percentage of Detroiters in Southwest Detroit (31%) are not covered by any insurance or health care plan compared to the rest of Detroit (14%).	According to our most recent survey from the end of October 2020, a substantially higher percentage of Detroiters in Downtown and Midtown (80%) considered COVID-19 to be a very serious problem for their communities, compared to 67% in the rest of Detroit.	According to our most recent survey from the end of October 2020, 10% of Detroiters on the Eastside considered getting medication a major challenge, compared to only 5% in the rest of Detroit.	According to our most recent survey from the end of October 2020, 42% of Detroiters on the west side reported having friends or family members who died from COVID-19, compared to only 30% in the rest of Detroit.	None
Postcard #1 front
Operationalization of strategy 2: The word clouds in the background highlight different words; the maps in the bottom-right corner are different.

Postcard #2 front
Operationalization of strategy 2: The images are selected to be relevant to the lifestyle of the different regions. (The control group uses a meaning-neutral image.)

Note. ^a The image shows the original design, but this design was not used in practice because sample cases in the East group incorrectly received the Downtown postcard.

We note a mistake that happened during the fielding of the adaptive design for the refreshment sample. The printing company mistakenly used the Downtown template for the East group in reminder postcard #1. Appendix III provides more details on this mistake. The wrong postcard was mailed to 1,954 addresses.

Refreshment Sample - Homogeneous design in the control group. For the 30% of cases in the control group, a homogeneous design was used. All cases were recruited with the same materials listed in the rightmost column of Table 2.

Panel Sample. The subsections above describe how the adaptive design was operationalized for the refreshment sample. The adaptive design for the panel sample was similarly implemented with the same objectives. Panelists in the four regions were again promised different incentives and received invitation materials that depicted the survey differently. The minor modifications arose because the recruitment channels were different: panelists could receive text messages and emails. For example, in the text messages, the sentences describing DMACS were shortened, and the map image associated with the COVID-19 findings could not be included through the messaging service.

The sample size breakdown by experimental grouping and region is reported in Table 3.

Table 3.

The Sample Breakdown by Experimental Grouping and Regions.

	Adaptive design (experimental group^a)				Homogeneous design (control group^a)
	Southwest	Downtown	East	West	
Panel	81	113	419	594	523
Refreshment	928	900	1927	2788	2786
Combined	7750				3309

Note. ^a The randomization of the experimental and control groups was stratified by the four regions.

Post-Survey Adjustment

The goal of the current research is to evaluate the impact of adaptive design on survey data beyond the effect of post-survey adjustment. To this end, we performed post-survey adjustment separately on the data collected under the adaptive design and the homogeneous design. We then computed survey estimates based on the two weighted samples.

We adopted the post-survey adjustment procedure from the DMACS data production. The adjustment includes two stages. The first stage uses the technique of post-stratification to account for the over-/under-sampling and the uneven response rates across geographic areas (strata). Specifically, respondents of the thirteen geographic areas are calibrated to the corresponding population sizes. The second stage uses raking to weight the respondent samples to match with the 2019 ACS-based population distributions on gender and age, education, race and ethnicity, and household income.
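The two-stage adjustment can be illustrated with a minimal Python sketch. This is not the DMACS production code: the function names `poststratify` and `rake`, the integer coding of categories, and the convergence settings are all assumptions made for illustration. It also assumes that every stratum and every category contains at least one respondent.

```python
import numpy as np

def poststratify(base_weights, strata, pop_totals):
    """Stage 1 (illustrative): scale weights so that each stratum's
    weighted total equals its known population size."""
    w = np.asarray(base_weights, dtype=float).copy()
    strata = np.asarray(strata)
    for h, n_pop in pop_totals.items():
        mask = strata == h
        w[mask] *= n_pop / w[mask].sum()
    return w

def rake(weights, categories, targets, max_iter=100, tol=1e-8):
    """Stage 2 (illustrative): iterative proportional fitting of weights
    to marginal population proportions.

    categories: dict of variable name -> integer category codes (0..K-1)
    targets:    dict of variable name -> population proportions (sum to 1)
    """
    w = np.asarray(weights, dtype=float).copy()
    for _ in range(max_iter):
        max_change = 0.0
        for var, codes in categories.items():
            codes = np.asarray(codes)
            target = np.asarray(targets[var], dtype=float)
            # Current weighted share of each category of this variable.
            current = np.bincount(codes, weights=w, minlength=len(target)) / w.sum()
            factors = target / current
            w *= factors[codes]
            max_change = max(max_change, np.abs(factors - 1.0).max())
        if max_change < tol:  # all margins already match their targets
            break
    return w
```

In this sketch the output of `poststratify` would be passed to `rake` as the starting weights, mirroring the order of the two stages described above.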

Analytical Plan

The adaptive design experiment was evaluated in terms of five outcomes: 1) response rates, 2) demographic composition, 3) bias and variance of key survey estimates, 4) changes in significant results of regression models, and 5) costs.

First, we compared the unweighted response rates of the experimental and control group for the refreshment sample and the unweighted conditional response rates⁷ of the experimental and control group for the panel sample (AAPOR Response Rate 1; AAPOR 2016).
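For reference, AAPOR Response Rate 1 is, to our understanding of the AAPOR Standard Definitions, the number of complete interviews divided by all eligible and potentially eligible cases:

$$RR1 = \frac{I}{(I + P) + (R + NC + O) + (UH + UO)}$$

where $I$ denotes complete interviews, $P$ partial interviews, $R$ refusals and break-offs, $NC$ non-contacts, $O$ other nonrespondents, and $UH$ and $UO$ cases of unknown eligibility.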

Second, we compared the experimental and the control data in their representativeness in demographic features. The data collected from the panel and the refreshment sample are combined because the two samples were designed to be used together for representing the Detroit population. The analyses of sample representativeness are based upon the first stage weights; that is, weights that are post-stratified to geographic-area population totals.

Specifically, the representativeness was evaluated on five demographic variables: gender, age, education, race and ethnicity, and household income. The distributions of these variables based on the experimental and the control data were separately compared with the population distributions based on the 2019 American Community Survey (ACS). Unrepresentativeness is captured as the deviation of the sample distribution from the population distribution through an imbalance (IMB) score (Brick et al. 2021; Särndal and Lundquist 2019):

$$IMB = \sum_{j = 1}^{J} \frac{(P_{j} - W_{j})^{2}}{W_{j}}$$

where $j$ indicates a specific category of each demographic variable; $P_{j}$ indicates the estimated proportion in category $j$ based on the sample data; and $W_{j}$ indicates the estimated population proportion in category $j$ based on the 2019 ACS. A larger IMB score indicates a less representative sample distribution.

To compare the experimental group IMB with the control group IMB, we derived a variance estimator of IMB using a first-order Taylor (linear) approximation. The detailed derivation of the variance estimator is included in Appendix IV. The standard error of IMB is then computed as the square root of the estimated variance.

$$var(IMB) = \sum_{j = 1}^{J} \left(2 \frac{P_{j}}{W_{j}} - 2\right)^{2} \frac{P_{j} (1 - P_{j})}{n} + 2 \sum_{1 \le k < j \le J} \left(2 \frac{P_{k}}{W_{k}} - 2\right) \left(2 \frac{P_{j}}{W_{j}} - 2\right) \frac{- P_{k} P_{j}}{n}$$

$$SE(IMB) = \sqrt{var(IMB)}$$
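As a rough numerical illustration of the IMB score and its linearized standard error, the following Python sketch evaluates both formulas for one demographic variable. The function names and the example proportions are hypothetical. The sketch treats the sample proportions as multinomial with sample size `n`, which is what the variance and covariance terms in the formula imply, and it treats the population proportions as fixed.

```python
import numpy as np

def imb_score(p, w):
    """Imbalance score: sum over categories of (P_j - W_j)^2 / W_j."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum((p - w) ** 2 / w))

def imb_se(p, w, n):
    """Standard error of IMB from the first-order Taylor approximation."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    grad = 2.0 * p / w - 2.0                    # d IMB / d P_j
    cov = (np.diag(p) - np.outer(p, p)) / n     # multinomial covariance of P
    var = grad @ cov @ grad                     # equals the double-sum formula
    return float(np.sqrt(max(var, 0.0)))

# Hypothetical two-category variable: sample 60/40, population 50/50.
print(imb_score([0.6, 0.4], [0.5, 0.5]))
print(imb_se([0.6, 0.4], [0.5, 0.5], n=100))
```

The quadratic form `grad @ cov @ grad` expands to the sum of the squared-gradient variance terms plus twice the cross-category covariance terms, so it is the same quantity written in matrix notation.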

Third, we compared the mean and variance estimates of key survey variables based on the experimental data and the control data. Again, respondents from the panel and the refreshment sample are combined. The key survey variables include the percentage of the population that are homeowners, speak a language other than English, have access to a computer, have access to the internet, are divorced, and have no insurance.

To estimate the variance of the mean estimates and the variance of the sampling variance estimates, we drew 5,000 bootstrap samples from the experimental data and the control data separately. (For the analysis presented here, bootstrapping is more straightforward than an analytically derived variance estimator for accounting for the unequal sample sizes of the experimental and control data when estimating the variance of variance estimates and when comparing the results of multivariate analysis. Details are explained below.) The bootstrapping was stratified by the four Detroit regions. On each of the bootstrap samples, we conducted post-survey adjustment and computed weighted survey estimates. We then pooled the estimates across replicates to get distributions of mean estimates and sampling variance estimates.
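The resampling scheme can be sketched in Python as follows. This is a simplified illustration rather than the code used in the study: the function name `stratified_bootstrap` is hypothetical, and the `estimator` callable stands in for the full sequence of post-survey adjustment and weighted estimation applied to each replicate.

```python
import numpy as np

def stratified_bootstrap(strata, estimator, n_boot=5000, seed=0):
    """Draw bootstrap replicates within strata and apply an estimator.

    strata:    1-D array of stratum labels, one per respondent
    estimator: callable that takes an array of row indices and returns a
               float (here a placeholder for re-weighting plus estimation)
    """
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    groups = [np.flatnonzero(strata == s) for s in np.unique(strata)]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample with replacement inside each stratum, keeping the
        # number of respondents per stratum fixed.
        idx = np.concatenate(
            [rng.choice(g, size=len(g), replace=True) for g in groups]
        )
        estimates[b] = estimator(idx)
    return estimates

# Hypothetical data: a binary outcome for ten respondents in four regions.
y = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 1], dtype=float)
region = np.array([0, 0, 0, 1, 1, 1, 2, 2, 3, 3])
reps = stratified_bootstrap(region, lambda idx: y[idx].mean(), n_boot=200)
print(reps.mean(), reps.var(ddof=1))
```

The spread of `reps` across replicates plays the role of the pooled distribution of estimates described above.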

To evaluate the biases in DMACS mean estimates, we used 2015–2019 ACS estimates as benchmarks. We emphasize that the ACS estimates are only for reference. They are not a gold standard comparison because ACS differed from DMACS in time, item wording and, sometimes, target populations (see Appendix V for details).

In addition to biases, we compared variance estimates based on the experimental data and the control data. Recall that the experimental group received 70% of the total sample and the control group received 30%, so the experimental data naturally have smaller sampling variances. To account for the unequal sample sizes, we took the ratio of the variance estimated on the experimental data to the variance estimated on the control data. Because sampling variance is roughly inversely proportional to sample size, two equally efficient designs would yield a ratio of about $3 / 7$. If the ratios across the bootstrap replicate samples were consistently smaller than $3 / 7$, this suggested that the adaptive design tended to produce more efficient survey estimates.
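
The comparison against the $3 / 7$ benchmark can be illustrated with a short sketch. It assumes that replicate-level variance estimates from the two groups are paired by replicate index; the function names and the variance values are hypothetical.

```python
def variance_ratio_benchmark(n_experimental, n_control):
    """Expected var(exp) / var(ctrl) if variance is proportional to 1 / n."""
    return n_control / n_experimental

def share_below_benchmark(var_exp, var_ctrl, benchmark):
    """Proportion of replicate-level variance ratios that fall below the benchmark."""
    ratios = [ve / vc for ve, vc in zip(var_exp, var_ctrl)]
    return sum(r < benchmark for r in ratios) / len(ratios)

# A 70% / 30% split gives a benchmark of 3 / 7 (about 0.4286)
benchmark = variance_ratio_benchmark(7, 3)

# Hypothetical variance estimates from four bootstrap replicates
var_exp = [0.0010, 0.0012, 0.0009, 0.0011]
var_ctrl = [0.0030, 0.0025, 0.0024, 0.0022]
share = share_below_benchmark(var_exp, var_ctrl, benchmark)
print(benchmark, share)
```

A share close to 100% would indicate that the experimental variances are smaller than the sample size difference alone would explain.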

An alternative way to compare sampling variances is through design effects. For an estimator $\theta$, the design effect is the ratio of the estimated variance of $\theta$ accounting for the complex survey design to a hypothetical variance of $\theta$ as if the survey design were a simple random sample (SRS), i.e., $\frac{\mathrm{var}(\theta_{\mathrm{complex}})}{\mathrm{var}(\theta_{\mathrm{SRS}})}$. We used the design effect function in the survey package in R. The SRS variance in the denominator is based on a with-replacement formula. We compared the design effects of the estimates based on the experimental and the control data across bootstrap replications. Smaller design effects indicate more efficient estimates.
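
For an estimated proportion, the design effect ratio can be sketched in a few lines. This is not the implementation in the R survey package; it is a simplified illustration that assumes the with-replacement SRS variance of a proportion, $p (1 - p) / n$, and takes the complex-design variance as given. The function names and the input values are hypothetical.

```python
def srs_variance_proportion(p_hat, n):
    """With-replacement SRS variance of an estimated proportion."""
    return p_hat * (1 - p_hat) / n

def design_effect(var_complex, p_hat, n):
    """Ratio of the complex-design variance to the SRS variance."""
    return var_complex / srs_variance_proportion(p_hat, n)

# Hypothetical inputs: the complex-design variance is twice the SRS variance
deff = design_effect(var_complex=0.0005, p_hat=0.5, n=1000)
print(deff)
```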

Fourth, besides univariate estimates, we evaluated whether the adaptive design affected multivariate analysis. We focused on three regression models, predicting 1) the intention of getting the COVID-19 vaccine, 2) neighborhood satisfaction, and 3) the likelihood of being a personal owner of the current residence, respectively. These outcome variables were important topics in the DMACS survey. The goal of the model building is methodological. The predictors are chosen to capture relationships of different magnitudes, not to answer substantively significant and unexplored questions. In each model, two types of predictors were included—the ones that intuitively should be related to the outcome variables, and the ones whose associations with the outcome variables are uncertain. We investigated how conclusions change for the predictors that are related or unrelated to the outcome variables.

Because social scientists often use statistical significance to draw conclusions from regression analysis, we compared whether the data collected under the adaptive design and the homogeneous design led to different conclusions about the significance of the predictors. Because the adaptive design was implemented on the larger portion of the sample than the homogeneous design (70% vs. 30%), the former has more power to detect significant results. To account for the influence of the unequal sample sizes, we created pseudo standard errors (SE) for the homogeneous design data by dividing the estimated SE by the square root of the sample size ratio (i.e., pseudo-SE = $SE / \sqrt{\frac{7}{3}}$ ). This adjustment is similar to the above-described approach for comparing sampling variances. The pseudo-SEs were used to determine the significance of the predictors based on the homogeneous-design data.
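
The pseudo-SE adjustment can be sketched as below. The 1.96 critical value (a two-sided 5% test with a normal reference distribution) is an assumption made for illustration, as are the function names and the example coefficient.

```python
import math

def pseudo_se(se_control, size_ratio=7 / 3):
    """Scale a control-group SE down as if the sample were size_ratio times larger."""
    return se_control / math.sqrt(size_ratio)

def is_significant(coef, se, critical_value=1.96):
    """Two-sided test; the 1.96 cutoff is an assumed 5% normal critical value."""
    return abs(coef / se) > critical_value

# Hypothetical control-group coefficient and standard error
coef, se = 0.5, 0.3
print(is_significant(coef, se))             # test with the unadjusted SE
print(is_significant(coef, pseudo_se(se)))  # test with the pseudo-SE
```

In this made-up example, the coefficient does not reach significance with the unadjusted SE but does with the pseudo-SE, which shows how the adjustment compensates for the smaller control sample.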

We relied on the bootstrap samples to estimate variations in the conclusions of the regression models. The significances of the predictors were pooled across repetitions to indicate how stable the results were.

Finally, we compared the costs of the adaptive design and the homogeneous design. We report the monetary cost per respondent. We also discuss some factors that cannot be easily distinguished between the two designs but influenced the total survey costs (Olson, Wagner, and Anderson 2020).

Findings

Response Rates

Overall, DMACS collected responses from 2,237 Detroit residents. The panelists responded at a much higher rate than the new cases. Table 4 reports the (conditional) response rates and the corresponding confidence intervals. For the panel sample, the conditional response rate of the experimental group has a small but non-significant edge over the control group (0.739 vs. 0.706). For the refreshment sample, the difference between the experimental and the control group is negligible (0.105 vs. 0.103).

Table 4.

AAPOR Response Rate 1^a of the Panel and the Refreshment Sample by Experimental Randomization.

	n	RR	CI		n	RR	CI
Panel sample
Experimental	1207	0.739	[0.713, 0.763]	Control	523	0.706	[0.664, 0.735]
Southwest	81	0.741	[0.629, 0.829]	- Southwest	35	0.657	[0.477, 0.803]
Downtown	113	0.699	[0.605, 0.780]	- Downtown	51	0.784	[0.643, 0.882]
East	419	0.776	[0.732, 0.814]	- East	178	0.719	[0.646, 0.783]
West	594	0.721	[0.682, 0.756]	- West	259	0.687	[0.626, 0.742]
Refreshment sample
Experimental	6543	0.105	[0.098, 0.113]	Control	2786	0.103	[0.092, 0.115]
Exp. – no East	4616	0.102	[0.094, 0.111]	Control – no East	1961	0.101	[0.087, 0.116]
Southwest	928	0.109	[0.090, 0.131]	- Southwest	391	0.084	[0.060, 0.118]
Downtown	900	0.132	[0.111, 0.157]	- Downtown	378	0.132	[0.101, 0.172]
East ^b	1927	0.112	[0.099, 0.127]	- East	825	0.105	[0.086, 0.129]
West	2788	0.090	[0.080, 0.101]	- West	1192	0.097	[0.081, 0.116]

Note. n indicates the sample size; RR indicates the response rate; CI indicates the confidence interval.

^a We use AAPOR response rate 1 because we do not have good information to estimate the eligibility rate for the eligibility-unknown cases. The response rates are conditional response rates for the panel sample.

^b The adaptive design was implemented erroneously for the East group in the refreshment sample. The comparison between the control group and the experimental group thus includes a version in which the East group was excluded.

In addition to the overall response rates, Table 4 breaks down the survey outcomes by region. The adaptive design achieved slightly higher response rates than the homogeneous design, with three exceptions: the Downtown panel, the Downtown refreshment, and the West refreshment samples. However, no differences in response rates were statistically significant, as shown by the largely overlapping confidence intervals of the experimental and the control conditions. For the panel samples in the Southwest and the Downtown regions in particular, the point estimates of the response rate are not very informative because of the small sample sizes.

Demographic Composition

Another purpose of adaptive design is to improve the representativeness of the respondent sample. Table 5 reports the imbalance (IMB) scores of the demographic variables based on the experimental and the control data.

Table 5.

Imbalance (IMB) Scores of Demographic Variables Based on the Experimental Data and the Control Data.

	Adaptive design (experimental group)			Homogeneous design (control group)			p-value^a
	IMB	SE	CI	IMB	SE	CI	p-value^a
Gender * Age (8 categories)	0.215	0.022	[0.172, 0.259]	0.191	0.034	[0.125, 0.257]	0.553
Education (4 categories)	0.242	0.026	[0.190, 0.294]	0.334	0.054	[0.227, 0.440]	0.125
Race and ethnicity (5 categories)	0.050	0.013	[0.023, 0.076]	0.052	0.023	[0.007, 0.098]	0.940
Income (5 categories)	0.041	0.011	[0.020, 0.062]	0.023	0.013	[-0.003, 0.049]	0.291

Note. ^a The p-values are based on two-tailed t-tests using pooled SE calculated as $\sqrt{S E_{1}^{2} + S E_{2}^{2}}$ .
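
The test in the table note can be sketched as follows. The sketch assumes a normal reference distribution for the test statistic, which is an assumption on our part; with the rounded values printed in Table 5, it returns p-values close to those reported. The function name is hypothetical.

```python
import math

def two_tailed_p(est1, se1, est2, se2):
    """Two-tailed p-value for est1 - est2, using sqrt(se1^2 + se2^2) as the SE."""
    combined_se = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (est1 - est2) / combined_se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Education row of Table 5: IMB and SE for each design (rounded inputs)
p_education = two_tailed_p(0.242, 0.026, 0.334, 0.054)
# Gender * Age row of Table 5
p_gender_age = two_tailed_p(0.215, 0.022, 0.191, 0.034)
print(round(p_education, 3), round(p_gender_age, 3))
```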

One noticeable difference in IMB is in the education distribution. The data collected under the adaptive design had a smaller IMB than the data collected under the homogeneous design (0.242 vs. 0.334), though this difference is not statistically significant (p = 0.125). That is, in the currently realized samples, the adaptive design produced a more balanced distribution in education than the homogeneous design. This balance is likely to have a favorable impact on the later weighting step because less variable weights are needed for the adaptive design. However, because the difference in IMB is not statistically significant, we cannot rule out that it was due to chance.

Key Survey Estimates (Univariate Analysis)

Figure 2 illustrates the bootstrapping results of the weighted estimates based on the experimental data (collected with the adaptive design) and the control data (collected with the homogeneous design). The left column of Figure 2 shows the distributions of weighted estimates across the 5,000 repetitions. The vertical lines at the center of the distributions are the means of the distributions, and the horizontal bars are the 95% confidence intervals. The black vertical lines are the benchmarks based on the ACS data. As shown by the location of the center of the distribution, the control estimates are closer to the ACS benchmarks than the experimental estimates, except for the variables on homeownership and language. However, the confidence intervals of the experimental and control distributions largely overlap. The current results provide little evidence that the adaptive design reduced biases more than the homogeneous design.

Figure 2.

Left: weighted mean estimates of survey variables based on bootstrapping the experimental data and the control data. Middle: Ratio of estimated variances based on bootstrapping the experimental data and the control data. Right: Design effects.

The middle column of Figure 2 shows the ratios of the experimental variances to the control variances across repetitions. The vertical line is at $3 / 7$. The percentage values in the graphs indicate the proportion of the ratios smaller than $3 / 7$. For the variables on language, access to a computer, access to the internet, and insurance, the distributions lie clearly to the left of the $3 / 7$ line. These results suggest that the variances based on the adaptive design data tend to be smaller than the variances based on the homogeneous design data.

The right column of Figure 2 reports variable-level design effects. Smaller design effects indicate larger effective sample sizes, as well as less variance inflation due to the complex sample design. The design effects of the experimental data are generally smaller than those of the control data. Thus, the results echo the pattern drawn from the variance ratios in the middle column: the adaptive design tended to result in more efficient estimates than the homogeneous design. Alternatively, the differences in design effects can be interpreted in terms of effective sample sizes (i.e., $\text{effective sample size} = \frac{\text{sample size}}{\text{design effect}}$). Given the same sample size, for the variable “other language”, for example, the effective sample size of the adaptive design would be about $1.6 (= \frac{2.68}{1.68})$ times as large as that of the homogeneous design, again suggesting the efficiency of the adaptive design data.
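
The effective sample size arithmetic for the “other language” variable can be checked with a short sketch. The design effects 1.68 and 2.68 come from the text above; the nominal sample size of 1,000 and the function names are hypothetical.

```python
def effective_sample_size(n, deff):
    """Effective sample size: nominal sample size divided by the design effect."""
    return n / deff

def relative_effective_size(deff_a, deff_b):
    """Effective sample size of design a relative to design b at equal nominal n."""
    return deff_b / deff_a

# Design effects for "other language": adaptive 1.68, homogeneous 2.68
ratio = relative_effective_size(1.68, 2.68)
print(round(ratio, 2))

# With a hypothetical nominal n of 1,000 in both designs
n_eff_adaptive = effective_sample_size(1000, 1.68)
n_eff_homogeneous = effective_sample_size(1000, 2.68)
print(round(n_eff_adaptive), round(n_eff_homogeneous))
```

Because the nominal sample size cancels, the ratio of effective sample sizes depends only on the two design effects.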

Coefficients of Regression Models (Multivariate Analysis)

Three multiple/logistic regression models were fitted to the weighted experimental data and weighted control data, separately. In each model, two types of predictors were included—the ones that intuitively should be related to the outcome variables (i.e., intuitive predictors), and the ones whose associations with the outcome variables are uncertain (i.e., uncertain predictors).

For the outcome variable on COVID-19 vaccine intention (7-point Likert scale), the predictors include demographic variables and trust of information from different sources. Three variables are considered intuitive predictors: Older residents should express a higher intention of getting vaccinated because COVID-19 is more lethal to the older population. People who distrust doctors and the U.S. government should be less inclined to get vaccinated because the vaccine is a medical product and is authorized by the U.S. Food and Drug Administration. On the other hand, the uncertain predictors include other demographic features, trust in faith leaders, trust in co-workers, schoolmates, and other acquaintances, and trust in contacts on social media; their associations with vaccine likelihood are less clear.

For the outcome variable on neighborhood satisfaction (7-point Likert scale), the intuitive predictors are perceived neighborhood reputation and how safe one feels to walk in the neighborhood. The uncertain predictors are homeownership and whether one has access to a computer at home.

The last outcome variable on personal homeownership (binary variable) was constructed by matching panelists’ names with taxpayer names of the panelists’ addresses listed in property tax records. If a panelist's name is included as a taxpayer in the records, he/she is considered a personal owner of the residence. The intuitive predictors are income and age: Higher income and older people should be more likely to be a personal owner of their home than lower-income and younger people. The uncertain predictors are gender and race and ethnicity.

Table 6 reports the results of the regression models across repetitions. The Average Coefficient and the Average SE columns report the coefficients and standard errors averaged across bootstrap samples. The % of Significance columns report the percentage of the 5,000 bootstrap samples in which the predictors emerged as significant. The intuitive predictors are bolded.

Table 6.

Results of Multiple Regression Models Based on the Experimental Data and the Control Data.

	Adaptive design (experimental group)			Homogeneous design (control group)			Diff. % of sig.
	Average Coefficient	Average SE	% of sig.	Average Coefficient	Average SE^a	% of sig.^a	Diff. % of sig.
Model 1 (Linear Regression): Outcome = Likelihood of getting COVID-19 vaccine (1 = not at all likely, 5 = very likely)
Age	0.039	0.005	100%	0.038	0.005	100%
Distrust doctor	-1.105	0.293	93%	-0.775	0.359	55%	Δ
Distrust US govt	-1.320	0.223	100%	-1.405	0.208	100%
Distrust faith leader	0.264	0.201	27%	0.338	0.205	42%
Distrust acquaintance	-0.404	0.222	44%	-0.709	0.212	82%	Δ
Distrust social media	0.062	0.180	7%	-0.111	0.193	22%
Male	0.257	0.165	35%	0.755	0.183	91%	Δ
Race (reference = White)
Black	-1.182	0.250	100%	-0.745	0.249	75%
Two or more	-0.613	0.603	21%	-0.817	0.847	29%
Other	-0.296	0.344	13%	-0.863	0.517	42%
Hispanic	-0.181	0.320	9%	-0.975	0.369	70%	Δ
Education (reference = less than high school)
High school	0.347	0.269	24%	-0.179	0.292	26%
Some college	0.449	0.301	31%	0.018	0.292	20%
College +	0.710	0.323	59%	0.440	0.362	30%	Δ
Income (reference = <10k)
10–29k	0.280	0.227	24%	0.006	0.257	17%
30–49k	0.101	0.255	7%	0.167	0.269	19%
50–99k	0.813	0.265	87%	-0.069	0.306	17%	Δ
100k +	0.917	0.419	57%	0.020	0.382	19%	Δ
Model 2 (Linear Regression): Outcome = Neighborhood satisfaction (1 = very dissatisfied, 7 = very satisfied)
Reputation	0.998	0.048	100%	0.947	0.062	100%
Walk unsafe	-0.993	0.123	100%	-0.937	0.147	100%
Owner	0.242	0.093	74%	0.026	0.095	17%	Δ
Access to computer	-0.252	0.124	54%	0.185	0.119	39%
Model 3 (Logistic Regression): Outcome = Personal homeownership (1 = yes, 0 = no)
Income (reference = <10k)
10–29k	0.992	0.277	97%	0.811	0.380	57%	Δ
30–49k	1.408	0.281	100%	1.745	0.383	99%
50–99k	1.992	0.292	100%	1.610	0.399	97%
100k +	1.897	0.355	100%	1.061	0.488	58%	Δ
Age	0.050	0.005	100%	0.054	0.006	100%
Male	0.044	0.176	6%	0.677	0.203	83%	Δ
Race (reference = White)
Black	-0.216	0.247	14%	0.363	0.345	24%
Two or more	-1.198	1.062	13%	0.015	0.950	24%
Other	0.200	0.485	6%	1.573	0.585	75%	Δ
Hispanic	-0.062	0.423	5%	0.621	0.538	33%

Note. ^aThe standard errors of the control data are pseudo SEs that are scaled down to account for the smaller sample size of the control data. The significances are based on the pseudo SEs.

We focus on comparing the % of Significance ⁸ based on the experimental data and the control data. To summarize the results, for a somewhat arbitrary threshold, we consider the two percentages to be “notably different” if the ratio between them is larger than 1.5 and the difference between them is larger than 30 percentage points. We use Δ to mark the notable differences in the rightmost column of Table 6. The Δ shows that the conclusions can be unstable depending on whether they are based on the adaptive design data or homogeneous design data. The unstable results are mostly from uncertain predictors with two exceptions:

Two intuitive predictors led to different conclusions based on the experimental and control data. One is the association between distrust of doctors and vaccine intention, and the other is between 100k+ income and homeownership status. In both cases, the coefficients based on the control data are smaller in magnitude than those based on the experimental data (−0.775 and 1.061, as opposed to −1.105 and 1.897). Correspondingly, the control-data coefficients are less likely to reach significance than the experimental-data coefficients (55% and 58%, as opposed to 93% and 100%). Of course, we do not have benchmarks to know which set of coefficients is closer to the true values. But it is not unreasonable to expect these two associations to exist. Thus, we suggest that the adaptive design performs better than the homogeneous design on these two predictors because the adaptive data capture the associations in a more stable manner.
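
The “notably different” rule behind the Δ marks can be sketched as a small function. It encodes the two thresholds stated above (a ratio larger than 1.5 and a difference larger than 30 percentage points); the function name and the handling of a zero percentage are assumptions. Table 6 reports rounded percentages, so rows near the thresholds may not reproduce exactly from the printed values.

```python
def notably_different(pct_a, pct_b, min_ratio=1.5, min_gap=30.0):
    """True if two significance percentages differ by both the ratio and gap rules."""
    high, low = max(pct_a, pct_b), min(pct_a, pct_b)
    if high - low <= min_gap:
        return False
    if low == 0:
        # Assumed convention: the ratio is treated as unbounded when one value is 0
        return True
    return high / low > min_ratio

# Rows from Table 6 (experimental %, control %)
print(notably_different(93, 55))   # Distrust doctor, Model 1
print(notably_different(100, 75))  # Black, Model 1
print(notably_different(35, 91))   # Male, Model 1
```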

Costs

The most obvious difference between the adaptive design and the homogeneous design is the amount of incentives. The cost of incentives per respondent is $26.7⁹ and $25 for the adaptive and the homogeneous design, respectively. To put these numbers in context, incentives accounted for roughly half of the total fielding cost. (However, the total fielding cost of DMACS is not generalizable to cross-sectional surveys because more than half of the respondents were established panelists and collecting data from established panelists is much cheaper than recruiting new cases.)

Besides incentives, we cannot quantify other sources of costs in monetary or nonmonetary units (e.g., hours) because the operations of the adaptive and homogeneous designs were not separated. Nonetheless, we are aware that the adaptive design added to the total survey costs in a few ways. First, the adaptive design clearly increased the staff workload. The managers needed to carefully manage different versions of documentation and materials, one for each adaptive subgroup. Second, there is a cost in designing and translating multiple versions of invitation materials, though this is a relatively small part of the total costs. Third, printing several templates is more expensive than printing a single template because the smaller batches did not reach the threshold for discounted prices as the full sample would have. These factors are not unimportant for a small survey with a small staff.

Discussion

In the current literature, there is a debate about the added benefits of implementing responsive and adaptive recruitment (Brick and Tourangeau 2017; Särndal and Lundquist 2019; Schouten et al. 2016; Tourangeau et al. 2017). Responsive and adaptive design is worth the effort only if it improves survey estimates beyond the capacity of the standard practice of post-survey adjustment. Evidence in this debate has been limited to theoretical and simulation studies (Brick and Tourangeau 2017; Särndal and Lundquist 2014, 2017, 2019; Schouten et al. 2016). The literature has called for extending the analysis and gathering empirical evidence from real surveys.

In this paper, we report the results of an experiment implemented in a real survey that evaluated the benefits of an adaptive design additional to post-survey adjustment. The analysis focused on five outcomes: response rates, sample composition, univariate survey estimates, conclusions drawn from multivariate analysis, and survey costs.

The adaptive design yielded slightly higher response rates in the panel subsample, obtained a more balanced distribution in education, captured some multivariate associations more stably and, most importantly, generated more efficient survey estimates with smaller variances and smaller design effects. But the adaptive design did not further reduce survey biases. Overall, at the cost of a more troublesome fielding, the performance of the adaptive design had a small edge over the homogeneous design.

We make a few remarks based on the current results. First, in terms of response rates, the adaptive design had a small benefit (though non-significant) in the panel sample but not in the refreshment sample. The increased response rate among the established panelists is consistent with the effect of tailored letters in the Understanding Society Innovation Panel (Lynn 2016). We do not have a clear explanation for why the adaptive design was not successful in the refreshment sample. One possibility is that the established panelists paid closer attention to the invitation materials because they are more interested and have established trust in DMACS. Since tailored materials affect responses only if people read the content, the materials may not be effective on a potentially disinterested population in the refreshment sample. This is only speculation. More research is needed to understand how applicable the technique of tailoring invitation materials can be for a panel sample and a new sample.

Second, the adaptive design reduced the variances but not the biases of the univariate survey estimates. Interestingly, such results echo a recent simulation study, which demonstrated that the benefit of adaptive design on variances is more robust than on biases (Zhang 2022). The simulation was motivated by a concern that adaptive designs are not perfectly designed and implemented. In suboptimal scenarios where input information is inadequate, adaptive design may lose its ability to reduce biases beyond post-survey adjustment, but may still lower variances. This reasoning provides a potential explanation for the current results. The adaptive design in DMACS was far from ideal. The strategy development was at the regional level because auxiliary information on the entire sample was thin. The limited input information might have curbed the benefit of the adaptive design on bias reduction.

Third, conclusions drawn from regression models are not always the same based on the data from the adaptive design and the homogeneous design. The models were set to include intuitive and uncertain predictors. For the intuitive predictors, data from both designs were generally able to identify the associations as significant, though the adaptive design captured some associations more stably than the homogeneous design. For the uncertain predictors, the regression results based on the two designs are different. Though we do not know which results are more accurate, the differences have interesting practical implications. Social scientists typically do not study associations that obviously exist (like the intuitive predictors) but instead investigate relationships that have a certain degree of uncertainty (mimicked by the uncertain predictors). The current data show that conclusions about uncertain relationships can be influenced by the recruitment design. This observation is not uncommon. Axinn and colleagues (2011) have documented that the differences in coefficients based on the regular and the responsive design data can be so large that the signs of the coefficients were reversed. Understanding how research conclusions can be substantively influenced by RASD is an important topic for future research.

Fourth, this research contributes to the literature by extending the application of adaptive design to a small-budget survey. Since small surveys often cannot afford expensive data collection methods to improve survey outcomes, developing cost-effective RASD techniques could make important contributions to improving the data quality of small surveys.

Fifth, since the benefits of the adaptive design are small compared to the homogeneous design, one may opt for the adaptive design if it is mostly cost neutral. The adaptive design raised the cost of incentives per respondent from $25 to $26.7 (a 6.8% increase) and it increased the workload of the staff. On the bright side, the additional costs in human resources may decrease once the survey establishes an infrastructure to implement the adaptive strategies. In fact, after this experiment, DMACS has continued using adaptive strategies to retain underrepresented panelists.

Sixth, the adaptive design was implemented with an error. One of the subgroups received the wrong reminder postcard due to a mistake that happened during the mail merge. Fortunately, 69% of respondents in that subgroup responded before the mistake occurred and thus we do not expect it to have a major impact on the results. However, this incident shows how adaptive design can complicate survey implementation and increase the cost of testing and quality control.

We note that the current study design and outcomes are limited by a few additional factors. The adaptive design bundled three strategies because the sample size was not large enough to support a full factorial design on the strategies. This bundled design was sufficient for our objective. We intended to assemble the best adaptive design (of course, under practical constraints) to evaluate whether the design could benefit survey estimates beyond post-survey adjustment. However, as a result, we were not able to tease apart the effect of each adaptive strategy on the panel and the refreshment sample. Future research with a larger sample can test different strategies individually. The results would be constructive for developing the technique of tailoring invitation materials.

Next, since the current adaptive design was implemented in the city of Detroit, the target population was relatively homogeneous. There is limited room for strategy development in a homogeneous population because adaptive strategies depend on distinctive features of sample subgroups. RASD may achieve better results in a heterogeneous population. It would be interesting for future research to extend the application to a national context, where the sample, for example, might be distinguished by rural and urban features.

Finally, the current research question was addressed by comparing the results of combining adaptive design and post-survey adjustment with those of post-survey adjustment alone. Answers to this question are influenced by the choice of the post-survey adjustment method. We used calibration adjustments. For other surveys, if additional steps like propensity-score nonresponse adjustment are included in the weighting procedure, we caution that the efficacy of post-survey adjustment may change, which would in turn influence the added benefits of adaptive design.

Despite these caveats, our experiment is among the first to provide real-survey evidence on the utility of adaptive design, while accounting for the effects of post-survey adjustment. We investigated the effect of adaptive design along multiple dimensions. The adaptive design produced modest improvements in data quality and estimates, but also raised costs slightly. These findings are an important first step for understanding the cost and benefit tradeoffs of implementing adaptive survey design.

Supplemental Material

Supplemental material, sj-docx-1-smr-10.1177_00491241221099550 for The Additional Effects of Adaptive Survey Design Beyond Post-Survey Adjustment: An Experimental Evaluation by Shiyu Zhang and James Wagner in Sociological Methods & Research

Footnotes

Acknowledgments

We thank the Detroit Metro Area Communities Study (DMACS) for supporting the current experiment. We are grateful to the DMACS team, Dr. Elisabeth Gerber, Dr. Jeffrey Morenoff, Sharon Sand, Caroline Egan, and Lydia Wileden.

Authors’ Notes

The wave 12 data of the Detroit Metro Area Communities Study are available at the Inter-university Consortium for Political and Social Research (ICPSR): .

The R program that was used to analyze the data is available at: .

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Notes

Author Biographies

Shiyu Zhang is a PhD candidate in the Michigan Program in Survey and Data Science at the Survey Research Center at the University of Michigan. Her dissertation focuses on understanding the added effects of adaptive design beyond post-survey adjustment. Her recent publications include “Benefits of adaptive design under suboptimal scenarios: A simulation study” (Journal of Survey Statistics and Methodology, 2022) and “What parcel tax records tell us about homeownership measurement in surveys” (Survey Research Methods, in press).

James Wagner, PhD, is a Research Professor at the University of Michigan's Survey Research Center. His research is in the area of nonresponse and methods for addressing it during data collection. In particular, he has focused on the use of responsive and adaptive designs for controlling nonresponse. He has also worked on statistical decision rules for supporting these types of designs. He is co-author of a book (2017) entitled Adaptive Survey Design. Another recent publication is “Applying Responsive Survey Design to Small-Scale Surveys: Campus Surveys of Sexual Misconduct” (Sociological Methods and Research, 2021).

References

American Association for Public Opinion Research. 2016. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, (Ninth Edition). Retrieved from: https://www.aapor.org/Standards-Ethics/Standard-Definitions-(1).aspx.

Andridge

R. R.

Little

R. J.

. 2011. “Proxy Pattern-Mixture Analysis for Survey Nonresponse.” Journal of Official Statistics 27(2):153.

Axinn

W. G.

Link

C. F.

Groves

R. M.

. 2011. “Responsive Survey Design, Demographic Data Collection, and Models of Demographic Behavior.” Demography 48(3):1127-49.

Beaumont

J. F.

Bocci

Haziza

. 2014. “An Adaptive Data Collection Procedure for Call Prioritization.” Journal of Official Statistics 30(4):607-21.

Bergmann

Scherpenzeel

. 2020. “Using Field Monitoring Strategies to Improve Panel Sample Representativeness: Application During Data Collection in the Survey of Health, Ageing and Retirement in Europe (SHARE).” Survey Methods: Insights from the Field. 10.13094/SMIF-2020-00003.

Brick

J. M.

2013. “Unit Nonresponse and Weighting Adjustments: A Critical Review.” Journal of Official Statistics 29(3):329-53.

Brick

J. M.

Kennedy

Flores Cervantes

Mercer

. 2021. “An Adaptive Mode Adjustment for Multimode Household Surveys. Journal of Survey Statistics and Methodology 00:1-24.

Brick

J. M.

Tourangeau

. 2017. “Responsive Survey Designs for Reducing Nonresponse Bias.” Journal of Official Statistics 33(3):735-52.

Burger

Perryck

Schouten

. 2017. “Robustness of Adaptive Survey Designs to Inaccuracy of Design Parameters.” Journal of Official Statistics 33(3):687-708.

10.

Christensen

A. I.

Lynn

Tolstrup

J. S.

. 2019. “Can Targeted Cover Letters Improve Participation in Health Surveys? Results from a Randomized Controlled Trial.” BMC Medical Research Methodology 19(1):1-8.

11.

Chun

A. Y.

Heeringa

S. G.

Schouten

. 2018. “Responsive and Adaptive Design for Survey Optimization.” Journal of Official Statistics 34(3):581-97.

12.

Coffey

Reist

Miller

P. V.

. 2019. “Interventions on-Call: Dynamic Adaptive Design in the2015 National Survey of College Graduates.” Journal of Survey Statistics and Methodology 8(4):726-47.

Fumagalli, Laurie, and Lynn. 2013. “Experiments with Methods to Reduce Attrition in Longitudinal Surveys.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 176(2):499-519.

Groves, R. M., and S. G. Heeringa. 2006. “Responsive Design for Household Surveys: Tools for Actively Controlling Survey Errors and Costs.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 169(3):439-57.

Groves, R. M., and K. A. McGonagle. 2001. “A Theory-Guided Interviewer Training Protocol Regarding Survey Participation.” Journal of Official Statistics 17(2):249-65.

Groves, R. M., Singer, and Corning. 2000. “Leverage-Saliency Theory of Survey Participation: Description and an Illustration.” Public Opinion Quarterly 64(3):299-308.

Han, J. M. Montaquila, and J. M. Brick. 2013. “An Evaluation of Incentive Experiments in a Two-Phase Address-Based Sample Mail Survey.” Survey Research Methods 7(3):207-18.

Jackson, M. T., C. B. McPhee, and P. J. Lavrakas. 2020. “Using Response Propensity Modeling to Allocate Noncontingent Incentives in an Address-Based Sample: Evidence from a National Experiment.” Journal of Survey Statistics and Methodology 8(2):385-411.

Kalton and Flores-Cervantes. 2003. “Weighting Methods.” Journal of Official Statistics 19(2):81-97.

Kish, L. 1995. Survey Sampling. New York: Wiley.

Lavrakas, P. J., Jackson, and McPhee. 2018. “The Use of Response Propensity Modeling (RPM) for Allocating Differential Survey Recruitment Strategies: Purpose, Rationale, and Implementation.” Survey Practice 11(2). https://doi.org/10.29115/SP-2018-0023.

Link, M. W., and A. T. Burks. 2013. “Leveraging Auxiliary Data, Differential Incentives, and Survey Mode to Target Hard-to-Reach Groups in an Address-Based Sample Design.” Public Opinion Quarterly 77(3):696-713.

Little, R. J., and Vartivarian. 2005. “Does Weighting for Nonresponse Increase the Variance of Survey Means?” Survey Methodology 31(2):161-8.

Little, R. J., B. T. West, and P. S. Boonstra. 2020. “Measures of the Degree of Departure from Ignorable Sample Selection.” Journal of Survey Statistics and Methodology 8(5):932-64.

Liu, Kuriakose, Cohen, and Cho. 2016. “Impact of Web Survey Invitation Design on Survey Participation, Respondents, and Survey Responses.” Social Science Computer Review 34(5):631-44.

Luiten and Schouten. 2013. “Tailored Fieldwork Design to Increase Representative Household Survey Response: An Experiment in the Survey of Consumer Satisfaction.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 176(1):169-89.

Lynn. 2016. “Targeted Appeals for Participation in Letters to Panel Survey Members.” Public Opinion Quarterly 80(3):771-82.

Lynn. 2017. “From Standardised to Targeted Survey Procedures for Tackling Non-Response and Attrition.” Survey Research Methods 11(1):93-103.

Mercer, Caporaso, Cantor, and Townsend. 2015. “How Much Gets You How Much? Monetary Incentives and Response Rates in Household Surveys.” Public Opinion Quarterly 79(1):105-29.

Murphy, Biemer, and Berry. 2018. “Transitioning a Survey to Self-Administration Using Adaptive, Responsive, and Tailored (ART) Design Principles and Data Visualization.” Journal of Official Statistics 34(3):625-48.

Olson, K. M., Wagner, and Anderson. 2020. “Survey Costs: Where Are We and What Is the Way Forward?” Journal of Survey Statistics and Methodology 9(5):921-42.

Peytchev. 2013. “Consequences of Survey Nonresponse.” The ANNALS of the American Academy of Political and Social Science 645(1):88-111.

Peytchev, Pratt, and Duprey. 2020. “Responsive and Adaptive Survey Design: Use of Bias Propensity During Data Collection to Reduce Nonresponse Bias.” Journal of Survey Statistics and Methodology 10(1):131-48.

Rosen, J. A., Murphy, Peytchev, Holder, Dever, Herget, and Pratt. 2014. “Prioritizing Low Propensity Sample Members in a Survey: Implications for Nonresponse Bias.” Survey Practice 7(1):1-8.

Särndal, C.-E., and Lundquist. 2014. “Accuracy in Estimation with Nonresponse: A Function of Degree of Imbalance and Degree of Explanation.” Journal of Survey Statistics and Methodology 2(4):361-87.

Särndal, C.-E., and Lundquist. 2017. “Inconsistent Regression and Nonresponse Bias: Exploring Their Relationship as a Function of Response Imbalance.” Journal of Official Statistics 33(3):709-34.

Särndal, C.-E., and Lundquist. 2019. “An Assessment of Accuracy Improvement by Adaptive Survey Design.” Survey Methodology 45(2):317-37.

Sauerbrei and Schumacher. 1992. “A Bootstrap Resampling Procedure for Model Building: Application to the Cox Regression Model.” Statistics in Medicine 11(16):2093-109.

Schouten, Cobben, and Bethlehem. 2009. “Indicators for the Representativeness of Survey Response.” Survey Methodology 35(1):101-13.

Schouten, Cobben, Lundquist, and Wagner. 2016. “Does More Balanced Survey Response Imply Less Non-Response Bias?” Journal of the Royal Statistical Society: Series A (Statistics in Society) 179(3):727-48.

Schouten, Peytchev, and Wagner. 2017. Adaptive Survey Design. Boca Raton, FL: CRC Press.

Singer, R. M. Groves, and A. D. Corning. 1999. “Differential Incentives: Beliefs About Practices, Perceptions of Equity, and Effects on Survey Participation.” Public Opinion Quarterly 63(2):251-60.

Singer and Ye. 2013. “The Use and Effects of Incentives in Surveys.” The ANNALS of the American Academy of Political and Social Science 645(1):112-41.

Tourangeau, Michael Brick, and Lohr. 2017. “Adaptive and Responsive Survey Designs: A Review and Assessment.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 180(1):203-23.

van Berkel, van der Doef, and Schouten. 2020. “Implementing Adaptive Survey Design With an Application to the Dutch Health Survey.” Journal of Official Statistics 36(3):609-29.

Wagner. 2008. Adaptive Survey Design to Reduce Nonresponse Bias. PhD thesis, University of Michigan, Ann Arbor, USA.

Wagner, B. T. West, Kirgis, J. M. Lepkowski, W. G. Axinn, and S. K. Kruger-Ndiaye. 2012. “Use of Paradata in a Responsive Design Framework to Manage a Field Data Collection.” Journal of Official Statistics 28(4):477-99.

West, B. T., W. Chang, and A. Zmich. 2021. “An Experimental Evaluation of Alternative Methods for Case Prioritization in Responsive Survey Design.” Journal of Survey Statistics and Methodology 00:1-22.

Zhang, S. 2022. “Benefits of Adaptive Design Under Suboptimal Scenarios: A Simulation Study.” Journal of Survey Statistics and Methodology 00:1-31.
