Abstract
Responsive survey design is a technique aimed at improving the efficiency or quality of surveys by using incoming data from the field to make design changes. The technique was pioneered on large national surveys, but the tools can also be applied on the smaller-scale surveys most commonly used by sociologists. We demonstrate responsive survey design in a small-scale, list-based sample survey of students on the topic of sexual misconduct. We investigate the impact of individual incentive levels and a two-phase responsive design with changes to mode of contact as approaches for limiting the potential of nonresponse bias in data from such surveys. Our analyses demonstrate that a two-phase design introducing telephone and face-to-face reminders to complete the survey can produce stronger change in response rates and characteristics of those who respond than higher incentive levels. These findings offer tools for sociologists designing smaller-scale surveys of special populations or sensitive topics.
Responsive survey design is a methodological advance that optimizes survey data collection during the fieldwork in order to produce the best data possible. These tools were pioneered on large-scale national surveys—the high costs of such surveys motivated investments into scientific advances in the tools for maximizing data quality within fixed budgets. Particularly important, the growing reluctance of the general population to participate in surveys has led to steady declines in response rates, increasing the risk that survey results do not fully represent the population. Most sociologists, however, conduct surveys on a smaller scale, including regional or local surveys, surveys of special populations, surveys of organizations, and surveys of students. This article describes the application of responsive survey design principles to such small-scale studies, 1 provides a demonstration of the method, and shows how responsive survey design can be used to significantly change our understanding of important sociological topics.
The specific application we use is a small-scale, list-based sample survey of sexual misconduct among college students to illustrate the importance of this method for high-priority sociological topics. Many sensitive topics are the substantive focus of sociological research—research on sexual misconduct and assault typifies such research. Likewise, student surveys have been, and continue to be, an important element of sociological research on a wide range of topics, including sexual behaviors; drinking and substance use; mental health; and diversity, equity, and inclusion (Cohen 2018; Guo et al. 2015; McClintock 2010; Smith and Moore 2000; Uecker 2015). Moreover, many sociological studies targeted to important subpopulations use list-based sampling approaches similar to the student survey approach. Thus, application of the responsive design method on a campus survey of sexual assault serves to illustrate the wide-ranging applicability of responsive survey design in sociological research.
Conceptual Framework
We begin with a conceptual framework that provides background for the specific case we investigate and places this case in a broader context that highlights its wide-ranging relevance. This framework first explains the theoretical foundations of responsive survey design methods, then the context of campus surveys of sexual misconduct, and finally issues surrounding the conceptualization of controlling nonresponse bias for such surveys. The control of potential sources of bias is a focal issue facing all such surveys and the fundamental motivation for the methods we discuss.
Responsive Survey Design
As noted above, a key motivation for the creation and increasing application of responsive survey design is the increasing risk that survey results do not fully represent the population because of a growing reluctance to participate in surveys. Numerous studies and reports document steadily declining response rates in face-to-face surveys (Peytchev 2013; Williams and Brick 2018). Response rates to telephone surveys are much lower and have been declining at precipitous rates (Brick and Williams 2013; Curtin, Presser, and Singer 2005; Dutwin and Lavrakas 2016). The growing ability to use web surveys widely—because of expanding internet use in the general population (Couper et al. 2018), combined with the low cost of web surveys—has made this mode highly desirable. However, just as with telephone surveys, response rates for web surveys remain significantly lower than face-to-face surveys (Manfreda et al. 2008; Shih and Fan 2008). Low response rates increase the potential for nonresponse bias to characterize survey results, making control of that bias a high priority for surveys across modes.
The theoretical foundation of responsive survey design rests upon a social science driven understanding of population heterogeneity in the reasons for reluctance to participate in surveys. Early research noted the population heterogeneity in the motivations to participate in surveys (Groves, Cialdini, and Couper 1992). Groves, Singer, and Corning (2000) provided a theoretical underpinning for this approach, which they described as leverage-saliency theory. Under this theory, sampled persons have different “leverages,” that is, different aspects of the survey design that may appeal to them (or repel them) more strongly than other sampled persons. The survey, for its part, may make specific features of the design more “salient” to different persons. A key task of survey design is to match the salient design feature to the person for whom that feature has a stronger leverage. In a responsive survey design, the leverages of sampled persons are discovered through a series of interactions with them. This requires that surveys be conducted in a series of complementary phases. The idea is that this complementarity of design features should produce a balanced response as these phases appeal to different subgroups who have different leverages.
These multiple phases may involve experimentation in early phases in order to narrow the range of protocols supplied in later phases. Then, in later phases, the design follows a four-step procedure (Groves and Heeringa 2006). First, risks associated with survey errors, such as nonresponse bias, are identified. Second, measurable indicators are developed for each risk. For example, an indicator for the risk of nonresponse bias might be the stabilization of survey estimates, which indicates that the current design has reached its full capacity. Third, a change in design is planned for the point at which the indicator has reached a specified level. This change might involve a change in incentives, modes, or any other design feature. If the change involves a much more costly protocol (e.g., doubling the incentive or introducing face-to-face contact attempts to a telephone survey), then it may be cost-effective to select a subsample of nonrespondents to the current phase for follow up in the next phase. Known as “two-phase” sampling, this is a probability-based procedure for ensuring fully representative sampling (Hansen and Hurwitz 1946). Fourth, data from the different phases are combined to produce a single estimate.
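As a minimal illustration of the second step, the sketch below flags stabilization when the last few day-to-day changes in a cumulative estimate all fall under a tolerance. The function name, the three-day window, the 0.005 tolerance, and the daily values are hypothetical choices made for illustration; they are not taken from Groves and Heeringa (2006) or from this study.

```python
def estimate_has_stabilized(cumulative_estimates, window=3, tolerance=0.005):
    """Return True when the last `window` day-to-day changes in a
    cumulative survey estimate are all smaller than `tolerance`."""
    if len(cumulative_estimates) < window + 1:
        return False  # not enough days observed to judge stability
    recent = cumulative_estimates[-(window + 1):]
    changes = [abs(later - earlier) for earlier, later in zip(recent, recent[1:])]
    return all(change < tolerance for change in changes)


# Hypothetical cumulative estimates of a proportion, one value per field day.
daily = [0.180, 0.150, 0.162, 0.158, 0.160, 0.159, 0.160]
print(estimate_has_stabilized(daily))
```

In practice the window and tolerance would be set in advance for each key estimate, so that the decision to change protocol is made by rule rather than by inspection.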
Prior research on the associations between a two-phase responsive survey design and the estimates produced from a survey demonstrates that a change in protocol in the second phase can produce significantly different survey estimates (Axinn, Link, and Groves 2011; Peytchev, Baxter, and Carley-Baxter 2009). This advance provides an important tool to assess the potential consequences of nonresponse bias in surveys and provides robust statistical estimation of more population-representative information from surveys. However, thus far, applications of responsive survey design have been limited in important ways. Most involve large national studies. Many focus on expensive face-to-face surveys (Axinn et al. 2011; Groves and Heeringa 2006; Wagner et al. 2012), some on less expensive telephone surveys (Peytchev et al. 2009) or mixed-mode web-telephone surveys (Finamore, Reist, and Coffey 2013), and none on small-scale web surveys. Many of the alternative protocols do not translate easily to small-scale surveys. Changing the incentive is an option that has been used (DeBell et al. 2017; Wagner et al. 2012) but may be beyond the budget for many small-scale surveys. Other surveys have attempted to prioritize cases for additional effort (Finamore et al. 2013; Peytchev et al. 2009; Wagner et al. 2012). However, for a web survey, additional email or text invitations carry essentially no marginal cost, so prioritization does not seem to be helpful. Using multiple modes for interviewing is another approach that might be a useful feature to change in a responsive design (Biemer et al. 2018; Finamore et al. 2013). However, it is often expensive to do so since there are large fixed costs for setting up interviewing in another mode, and mode changes may introduce measurement error, especially for sensitive topics.
Changing the mode of contact through which an invitation to participate is delivered might be a helpful change but is usually not a very powerful design change in terms of promoting response (Millar and Dillman 2011) because this usually only involves combinations of email, text messaging, and postal mail. In inexpensive small-scale web-survey designs, adding a protocol change with a more expensive mode of contact is understudied. As a result, advances in responsive survey design for small-scale surveys with inexpensive modes of data collection, such as web surveys and phased protocol changes in modes of contact, are each a high scientific priority.
Campus Surveys of Sexual Misconduct
National attention to the high prevalence of sexual assault in the United States and the federal government’s investigation of sexual assaults on university campuses stimulated a recent round of widespread discussion about the collection of data on sexual assault and misconduct from students (White House Council on Women and Girls 2014). This discussion includes the possibility of mandating such surveys, a policy some states have already adopted. Some surveys of college students invite all students to participate (Cantor et al. 2015). Other surveys, including the one we report here, invite participation from individual students who have been randomly sampled to represent the full campus population. Some campus surveys limit participation to a single opportunity, while others issue an open invitation that allows the same students to participate in the survey multiple times. Such an “open invitation” approach prevents the possibility of assessing the representation of the target population because a single person can be represented many times in the resulting measures.
Because experience of sexual misconduct is believed to be widely underreported (National Research Council 2014), both the research community and administrators of higher education are concerned about representing the full breadth of sexual misconduct experiences on campus. Surveys on many topics already use some standard methods for improving participation in campus surveys: encouragement from authoritative sources, prenotification letters, lottery style incentives, and web surveys optimized for multiple devices. Individualized incentives can improve participation rates, but they are also expensive, creating a need for more information on the optimal level of incentive. Increasing incentives tends to increase participation, but the returns are diminishing (Mercer et al. 2015). Randomization of individual incentives provides the strongest possible assessment of the data quality consequences of different incentive amounts. A recent incentive experiment on a survey compared US$10, 2 $25, and $40 incentives (Krebs et al. 2016). As expected, the higher incentives had diminishing returns such that investigators suggested an incentive amount between $20 and $30. However, it is worth noting that the impact of these incentives varied across the schools (Krebs et al. 2016:182). They also found that the experimental arm with higher incentives ($25 vs. $10) did produce lower estimates of sexual assault among females.
However, consistent with the evolution of theories of respondent reluctance to participate in surveys, financial incentives are not the only tool with the potential to control nonresponse bias in campus surveys of sexual misconduct. Though these surveys are nearly all conducted as web surveys (which enhance privacy and control costs), the costs of more expensive modes can be controlled by integrating these options into a classic two-phase responsive survey design (Axinn et al. 2011; Groves and Heeringa 2006; Hansen and Hurwitz 1946). This is particularly important for controlling potential nonresponse biases in student surveys because failure of the survey to reach the selected respondent at all is a key threat to the representative qualities of these surveys. Web surveys are frequently launched with email invitations. Failures to accurately deliver the email to the respondent, respondent delays in reading their email, and respondent deletion of email without opening the link to the survey each add a source of nonresponse that is unlikely to be randomly distributed in the target population. This nonrandom response can be correlated with the subject matter of interest. For example, in general, students who are less engaged in campus life are also likely to be less engaged in reading or responding to email from their school. If victims of sexual misconduct on campus become more withdrawn from campus life, survey estimates dependent on email contact may underrepresent those experiences. Likewise, if the introduction of the topic seems irrelevant to the selected students, either because they do not consider themselves targets of the survey request or because they have no exposure to the topic of the survey, these students may also be systematically underrepresented.
By contrast, more expensive modes of invitation, such as phone call reminders to participate in a survey, create the opportunity to contact students who did not receive, open, or read their email. Such calls also allow a trained professional interviewer to interact with the sampled person, hear their concerns, and directly address them. In this way, while not conducting the interview itself, the interviewer performs their usual tasks related to recruitment. For example, this might include explaining why the specific individual is eligible to participate and invited to complete the survey. In the context of a web survey, this change to include telephone contact attempts has a high likelihood of motivating a significantly different portion of the student population to participate in a campus student survey.
Our study investigates the impact of two key design features on participation in a campus student survey of sexual misconduct. The design features are incentive amount and the sequential deployment of multiple modes of recruitment to a web-based survey. These features were deployed across phases of a survey in a responsive survey design. Students were randomized to receive either $15 or $30 as tokens of appreciation for completing the survey. We examine response rate differences across these two incentive levels. We assess differences in the composition of students responding to the two different incentive levels, as well as measurement differences in reported experiences of sexual misconduct. Together, these analyses provide a comprehensive picture of the representation and substantive consequences of using higher individualized incentives in campus student surveys related to sexual misconduct.
We also assess the use of sequential mixed-mode recruitment. Our survey design uses web-based data collection in the first phase following normal best practices, and then adds interviewer-assisted (telephone and face-to-face) recruitment to web-based interviewing in the second phase to increase participation in the survey, while keeping the mode of measurement constant. We also had the option of selecting a single incentive amount for the second phase based on the results of the experiment conducted in the first phase. We examine response rate differences across these two phases, assess differences in the composition of students responding to the two different phases, and assess measurement differences in reported experiences of sexual misconduct by phase. We also compare the costs of adding the second phase design to the costs of increasing the individual incentives. Together, these results provide sociologists designing small-scale surveys evidence of the consequences of investing in these additional tools to improve survey performance.
Limiting Potential Nonresponse Bias in Surveys of Sensitive Topics
Surveys also hold strong promise for the construction of measures that scientifically represent the full variability in the target population—such as the population of students at a specific campus. However, nonparticipation in surveys (survey nonresponse) could bias our understanding of the true extent of sexual assault. Confidential surveys may increase reporting relative to administrative records (reduction in measurement error), but if surveys systematically fail to reach nonrandom subpopulations, the nonresponse bias creates a different source of error. Moreover, the direction of this potential nonresponse bias is unclear. On the one hand, as hypothesized for official reports to authorities, it is possible that some victims will choose not to respond to a survey. On the other hand, it is also possible that those who are not sexually active, or who have never experienced assault or harassment, will be less motivated to participate in the survey. In order to understand the full range of student experience with sexual misconduct or similar sensitive topics, survey protocols that prioritize equally high response rates across students of all types have an important advantage.
Nevertheless, such surveys will not generate data for those who do not respond to the survey. Instead, it is possible to evaluate differences between responders and nonresponders with respect to sexual assault by comparing responders and nonresponders with respect to the characteristics associated with these experiences that are available on the full sample (e.g., gender, race, ethnicity). Additionally, as is common practice in evaluation of alternative survey protocols, investigators can examine changes in estimates due to different experimental treatments or phases of design.
Motivations to participate in a survey vary. Limiting the risk of nonresponse bias is achieved by offering protocols that are effective at recruiting students covering the full spectrum of motivations. For some students, the topic may be salient. For other students, this is not the case. For this latter group, an incentive may be necessary to motivate participation (Groves, Presser, and Dipko 2004; Roose, Lievens, and Wiege 2007). Some students may be willing to participate when invited by email. Other students may need the extra explanation or motivation that contact with an interviewer can provide. These interactions with interviewers allow the survey to address the issues of most concern to each sampled person.
Nonresponse bias results when those who do not participate in the study have opinions or experiences that are systematically different from those who do participate. If there is no difference, then nonresponse does not introduce bias (although it still reduces the effective sample size). For example, if students for whom the topic is salient also have different experiences relative to campus climate, then a survey that does not offer an incentive may produce different results. We aim to shape the response process to maximize the probability that all types of students will participate. Using all information available, our survey design attempts to generate balanced response rates by year in school, residence on/off campus, gender, minority status, and other characteristics of the target student population. Recent simulation studies using real survey data (Schouten et al. 2016) and empirical gold standard studies (Lundquist and Sarndal 2013) have found that this type of balancing leads to reduction in biases of estimates adjusted for differential nonresponse.
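The relationship between nonresponse and bias can be made concrete with the standard deterministic expression for the bias of an unadjusted respondent mean: the nonresponse rate multiplied by the difference between the respondent and nonrespondent means. The sketch below is illustrative only; the function name and all input values are hypothetical and are not estimates from this study.

```python
def nonresponse_bias(resp_mean, nonresp_mean, response_rate):
    """Bias of the unadjusted respondent mean relative to the full-sample mean:
    (1 - response rate) * (respondent mean - nonrespondent mean)."""
    return (1.0 - response_rate) * (resp_mean - nonresp_mean)


# No difference between respondents and nonrespondents: no bias at any response rate.
no_difference = nonresponse_bias(resp_mean=0.20, nonresp_mean=0.20, response_rate=0.50)

# Hypothetical case: respondents report a prevalence of 0.20, nonrespondents 0.10.
some_difference = nonresponse_bias(resp_mean=0.20, nonresp_mean=0.10, response_rate=0.67)

print(no_difference, round(some_difference, 3))
```

The expression shows why a higher response rate alone does not guarantee lower bias: the bias also depends on how different the nonrespondents are, which is what balanced response across subgroups is meant to limit.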
Surveys of individual college students have suffered from declining response rates (Jans and Roman 2007), with recent large national multi-institutional studies obtaining response rates around 30 percent (Dugan, Turman, and Torrez 2015; Krebs et al. 2016; Sarraf and Cole 2014). Krebs et al. (2016) conducted a similar survey at several schools and produced average response rates of 54 percent for females and 40 percent for males. These response rates, for surveys with the same design, did vary across the nine schools—ranging from 43 percent to 71 percent for females and from 30 percent to 60 percent for males.
The standard survey implementation for many web-based college student surveys involves communications distributed only through email, including several reminder emails, sometimes an overall sweepstakes incentive, and often some local publicity around the study. A driving factor in these designs is cost, with many multi-institutional college student studies costing participating schools well under $10,000 to conduct. Evidence is building to guide survey researchers in approaches to improve overall quality, and much of it is not a surprise. Better questionnaires (content and presentation) and improved delivery methods (contact modes, invitation design, use of prenotifications and reminders, and incentives) are emerging as areas of focus (Couper 2008; Fan and Yan 2010; Tourangeau, Conrad, and Couper 2013). Whether more costly methods will yield improved estimates is an open question for surveys of this topic.
Method
We describe an approach designed to study sensitive sexual misconduct–related attitudes and experiences among a representative sample of students at the University of Michigan. Given the large student population at the University of Michigan, this study used a sample survey approach rather than a census of all students. A randomly selected sample, generated using the University Registrar list of enrolled students (both graduate and undergraduate) as the sample frame, allowed us to make inferences to the population as a whole. The student population of the university was defined as the population enrolled on January 6, 2015, which was the first day of the winter term. The sample was selected from two strata, graduate and undergraduate students, in order to ensure the proportionate representation of both student types. The same sampling rate was used in each stratum, producing a sample of 3,000 students. This sample included 1,005 (33.5 percent) graduate and professional students and 1,995 (66.5 percent) undergraduate students.
The approach prioritized as broadly a representative group of students as possible. This began by simultaneously employing known best practices: limiting the total length of the survey to average less than 15 minutes to minimize the burden on each student; considering the academic calendar when creating the data collection schedule; mailing a hard-copy invitation letter (prenotification of the email invitation) that included survey log-in instructions, 3 followed by an invitation email and up to four reminder emails sent to nonrespondents (Bandilla, Couper, and Kaczmirek 2014); and offering a sweepstakes-style incentive (a 1 in 300 opportunity to win a $100 gift card) to all students who were invited to participate.
To this, we experimentally added a substantial individual cash token of appreciation ($15 or $30), conditional upon completing the survey. In general, individual incentives are motivated by the principle of maximizing the individual incentive-to-burden ratio in order to maximize individual participation (Groves and Couper 1998; Ryu, Couper, and Marans 2006; Singer and Couper 2008). However, for any specific topic or task, it is difficult to know the level of incentive that will maximize participation in the survey. Because of this unknown, we randomly assigned selected students to either a $15 individual incentive (two thirds of the sample) or a $30 individual incentive (one third of the sample). This experiment was conducted in phase 1 with the goal of determining which incentive would be more effective in phase 2 as well as in future similar studies.
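A random assignment of this kind can be sketched as follows. This is a minimal illustration, assuming a simple shuffle with a fixed two-thirds cutoff; the function name, the seed, and the use of integer identifiers are hypothetical, and the text does not state the exact assignment mechanism used in the study.

```python
import random


def assign_incentives(student_ids, seed=2015):
    """Randomly assign about two thirds of the sample to $15 and the rest to $30."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    cutoff = (2 * len(shuffled)) // 3
    return {sid: (15 if position < cutoff else 30)
            for position, sid in enumerate(shuffled)}


assignment = assign_incentives(range(3000))
counts = {amount: list(assignment.values()).count(amount) for amount in (15, 30)}
print(counts)  # {15: 2000, 30: 1000}
```

Fixing the group sizes rather than assigning each student independently keeps the planned two-to-one allocation exact, which simplifies budgeting for the incentives.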
Two-phase Design
The fieldwork for this survey was designed to follow a two-phase sampling design approach (Groves and Heeringa 2006; Hansen and Hurwitz 1946), explicitly designed so that the performance of the survey in phase 1 can be used to inform crucial design decisions in phase 2. For example, the individual incentive experiment described above was explicitly designed to inform the phase 2 design. If the experimental group offered $30 participates in the survey at much higher rates than the experimental group offered $15, the design supports raising the phase 2 incentive to $30 for all students selected to participate in the study. Many options for adjusting the protocol for phase 2 were considered in advance, including options for additional mailings, interviewer contact via phone or face-to-face visits to encourage web-based participation, and subgroup-specific targeting of protocol changes. Subgroup-specific targeting is used in large national surveys to improve sample balance across high-priority subgroups (Wagner et al. 2012).
Decisions about protocol changes in phase 2 require careful monitoring of survey performance in phase 1. This is best accomplished by creating monitoring tools in advance that identify the key priorities for survey performance and use survey outcomes in phase 1 to establish performance metrics (Groves and Heeringa 2006; Wagner et al. 2012). In this specific example, the highest priority was a high response rate, targeting over 50 percent of students responding across all subgroups. The web-based data collection in the first phase supported daily monitoring of the response rate performance during phase 1 of this survey. Our template used sample frame measures of student characteristics to assess response rates across each of the key subgroups established by gender, race/ethnicity, undergraduate/graduate student, and on-campus/off-campus residence. As explained below and shown in Table 1, the total phase 1 response rate for this survey was over 50 percent. However, important subgroup response rates were not over 50 percent by the end of phase 1.
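A monitoring tool of this kind can be very simple. The sketch below computes response rates by subgroup from sample frame labels and flags any subgroup that has not exceeded the 50 percent target. It is a minimal illustration: the function name, the data layout, and the six example cases are hypothetical, and the rates are unweighted counts rather than the AAPOR rates reported in this study.

```python
from collections import defaultdict


def subgroups_below_target(cases, target=0.50):
    """cases: iterable of (subgroup_label, responded) pairs.
    Returns {label: response_rate} for subgroups whose rate does not exceed target."""
    totals = defaultdict(int)
    completes = defaultdict(int)
    for label, responded in cases:
        totals[label] += 1
        completes[label] += int(responded)
    rates = {label: completes[label] / totals[label] for label in totals}
    return {label: rate for label, rate in rates.items() if rate <= target}


# Hypothetical snapshot of the sample on one field day.
cases = [
    ("on_campus", True), ("on_campus", True), ("on_campus", False),
    ("off_campus", True), ("off_campus", False), ("off_campus", False),
]
flagged = subgroups_below_target(cases)
print(flagged)
```

Running the same check daily against each frame variable (gender, race/ethnicity, student level, residence) gives the subgroup view that the overall response rate can hide.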
Table 1. Response Rate by Incentive Level and Phase.
Note a. Among those who did not respond in phase 1 and were selected as eligible for phase 2.
Note b. This response rate conforms to the American Association for Public Opinion Research (AAPOR) Response Rate 2 (weighted). This rate includes partial interviews and interviews in the numerator. See the “AAPOR Standard Definitions” for a complete definition of how this response rate is calculated (AAPOR 2016).
All options were considered to improve response rates using a different protocol in phase 2. As shown in Table 1, phase 1 demonstrated no significant improvement in response rates with the higher incentive. Based on this result, we chose to broaden the modes of contact in phase 2—contacting respondents by phone, face-to-face, or hand-delivered mail, all to encourage participation in data collection via the web mode (interviewer-assisted, web-based interviewing).
Due to the expected high costs of telephone and face-to-face contact attempts, phase 2 of the survey chose a random sample of the nonrespondents who remained at the close of phase 1. Trained and experienced survey interviewers attempted to contact nonrespondents and encouraged them to participate in the survey. To avoid potential measurement bias from interviewer administration (e.g., Chapter 5 in Groves et al. 2004; Kreuter, Presser, and Tourangeau 2008; Tourangeau and Yan 2007), interviewers did not administer the questionnaire. Instead, they encouraged sampled students to complete the web survey. Because the second phase involved telephone and face-to-face contact, sampling for this phase was stratified by on-campus versus off-campus residential address and whether or not a telephone number was available. Very few on-campus cases did not have a telephone number, producing three strata: (1) off campus, no telephone number available, (2) off campus, telephone number available, and (3) on campus. The sampling rates of 0.333, 0.6, and 0.6, respectively, were selected to minimize costs because the face-to-face contacts (the only option for the first stratum) were more expensive. The inverse of these selection rates was used as a selection weight.
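The stratified subsampling and the resulting selection weights can be sketched as follows. The three sampling rates come from the text; everything else is an assumption made for illustration, including the function name, the stratum labels, the seed, the choice of a fixed-size draw within each stratum, and the equal stratum sizes in the example.

```python
import random

# Sampling rates reported in the text, keyed by hypothetical stratum labels.
PHASE2_RATES = {
    "off_campus_no_phone": 0.333,
    "off_campus_phone": 0.6,
    "on_campus": 0.6,
}


def select_phase2(nonrespondents, rates=PHASE2_RATES, seed=1):
    """nonrespondents: list of (student_id, stratum) pairs.
    Returns {student_id: selection_weight} for the phase 2 subsample."""
    rng = random.Random(seed)
    by_stratum = {}
    for sid, stratum in nonrespondents:
        by_stratum.setdefault(stratum, []).append(sid)
    selected = {}
    for stratum, ids in by_stratum.items():
        n_select = round(len(ids) * rates[stratum])
        for sid in rng.sample(ids, n_select):
            # Selection weight is the inverse of the stratum sampling rate.
            selected[sid] = 1.0 / rates[stratum]
    return selected


# Hypothetical frame: 300 phase 1 nonrespondents in each stratum.
strata = list(PHASE2_RATES)
frame = [(i, strata[i // 300]) for i in range(900)]
phase2 = select_phase2(frame)
print(len(phase2), sorted({round(w, 3) for w in phase2.values()}))
```

With these rates, each selected case in the first stratum stands in for about three phase 1 nonrespondents, and each selected case in the other two strata stands in for about 1.67.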
Interviewers contacted nonrespondents who were selected into the second phase by phone to encourage their participation, and they sent a follow-up email with log-in instructions when potential respondents requested one. Interviewers also visited nonrespondents’ places of local residence with tablet computers that students could use to complete the web survey during the visit. University policy does not allow unescorted interviewers inside university housing buildings, so for students living in university housing the study team delivered reminder letters in sealed envelopes to housing staff, who placed these reminders in nonrespondents’ mailboxes. These letters provided interviewer contact information for any potential respondents who preferred interviewer assistance.
Finally, key characteristics of the student population were recorded as of the date of sample selection. These data are based on registrar records. These population totals were later used to create adjustment weights for the respondents after the data collection.
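One common way to build such adjustment weights is simple poststratification, in which respondent weights within each cell are scaled so that they sum to the known population total for that cell. The sketch below assumes that approach purely for illustration; the text does not specify the adjustment method used in this study, and the function name, cell labels, totals, and base weights are all hypothetical.

```python
def poststratification_factors(population_totals, respondents):
    """respondents: list of (cell, base_weight) pairs.
    Returns {cell: factor} so that base_weight * factor sums to the population total."""
    weighted_totals = {}
    for cell, base_weight in respondents:
        weighted_totals[cell] = weighted_totals.get(cell, 0.0) + base_weight
    return {cell: population_totals[cell] / weighted_totals[cell]
            for cell in weighted_totals}


# Hypothetical registrar totals and a handful of hypothetical respondents.
population_totals = {"undergraduate": 2000, "graduate": 1000}
respondents = [("undergraduate", 1.0)] * 4 + [("graduate", 3.0)] * 2
factors = poststratification_factors(population_totals, respondents)

# Final weight for each respondent is the base weight times the cell factor.
final_weights = [(cell, weight * factors[cell]) for cell, weight in respondents]
print(factors)
```

When several frame variables are used at once and the cross-classified cells become sparse, raking to the marginal totals is a frequently used alternative.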
Results
Response Rate Outcomes
By systematically applying the strategies described above, this survey achieved a relatively high overall response rate (67 percent 4 ), meaning that the majority of those invited answered the questions asked. The response rate did vary across subgroups within the population invited (see Table 2). For example, 75 percent of those living on campus responded, compared with 64 percent of those living off campus. This, combined with other differences shown in Table 2, demonstrates that response rates were not equal across all groups of the student population. This raises the concern that the respondents might also be a selective subset of the population with respect to the survey measures.
Table 2. Response Rate by Subgroup.
The high overall response rate is a product of combining each of the steps described above, but two of those elements are particularly expensive: the individual incentives and the work of trained interviewers. In Table 1, we document the response rate differences by individual incentive level and phase.
Although the final response rate is slightly higher in the group offered a $30 incentive than in the group offered $15, this difference was not statistically significant. By contrast, the overall response rate after phase 2 was significantly higher than the response rate after phase 1 (compare rates in column 1 to those in column 3 of Table 1). The first phase of data collection achieved a response rate of 54 percent overall, yielding 1,676 completed interviews. Among those who did not respond in phase 1 but were followed up in phase 2, 29 percent responded. This second phase added 215 students to the study and raised the weighted response rate by 13 percentage points. The investment in trained interviewer effort in a two-phase design appears to be substantially more effective at increasing response rates relative to the increased incentive.
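The way the two phases combine into the overall rate can be checked with simple arithmetic: because the phase 2 subsample is weighted to represent all phase 1 nonrespondents, the overall rate is approximately the phase 1 rate plus the phase 2 rate applied to the remaining share. The sketch below uses the rounded percentages reported above, so it only approximates the weighted AAPOR rate; the function name is hypothetical.

```python
def combined_response_rate(phase1_rate, phase2_rate_among_followed):
    """Approximate two-phase response rate when the phase 2 subsample is
    weighted to represent all phase 1 nonrespondents."""
    return phase1_rate + (1.0 - phase1_rate) * phase2_rate_among_followed


combined = combined_response_rate(0.54, 0.29)
gain_in_points = (combined - 0.54) * 100
print(round(combined, 3), round(gain_in_points, 1))  # 0.673 13.3
```

The result is consistent with the reported overall rate of 67 percent and the reported gain of 13 percentage points.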
Also important, the costs of these two options for increasing response rates are similar. Of the 3,000 students who were selected for the survey, 1,891 completed it. Had all 1,891 participants been paid a $30 incentive upon completion rather than $15, the survey would have cost about $30,000 more than offering students a $15 incentive to complete the survey. By contrast, it cost about $15,000 to provide the 350–375 hours of interviewer effort to conduct the phase 2 operation as implemented in this study. 5 Thus, a student population survey that uses a $15 incentive and professional interviewer effort in a two-phase responsive survey design is expected to cost less and yield a higher response rate than the same survey using a $30 incentive and no interviewer follow-up.
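The incentive side of this cost comparison is simple arithmetic and can be checked directly. The sketch below is illustrative only: the function name is hypothetical, and it assumes that incentives are paid only on completion and that the number of completed surveys would be the same under either incentive level, as the comparison in the text does.

```python
def extra_incentive_cost(completes: int, low: int, high: int) -> int:
    """Added cost (in dollars) of paying every completer `high` instead of `low`."""
    return completes * (high - low)


# Figures taken from the text: 1,891 completed surveys, $15 versus $30.
extra_cost = extra_incentive_cost(completes=1891, low=15, high=30)
print(f"Extra cost of a $30 incentive for all completers: ${extra_cost:,}")
```

The result, $28,365, is the basis for the "about $30,000" figure quoted above.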
Because the second phase design was not conducted as an experiment, it is impossible to know precisely what the response rate would have been had the design been more limited. We can say, however, that if we had stopped the study after approximately three weeks and had not employed interviewers for the follow-up effort, the study would have produced a response rate approximately 13 percentage points lower (54 percent rather than 67 percent).
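The relationship between the phase-specific rates and the overall weighted rate can be sketched as follows. This is a minimal illustration rather than the weighting procedure actually used in the study: it assumes that phase 2 followed up a random subsample of phase 1 nonrespondents and that those cases are weighted up to represent all phase 1 nonrespondents. The function name is hypothetical, and the inputs are the rounded rates reported in the text.

```python
def two_phase_response_rate(rr_phase1: float, rr_phase2_followup: float) -> float:
    """Overall weighted response rate for a two-phase design.

    rr_phase1: share of the full sample responding in phase 1.
    rr_phase2_followup: share responding among phase 1 nonrespondents
        who were followed up in phase 2.
    """
    return rr_phase1 + (1.0 - rr_phase1) * rr_phase2_followup


phase1_rate = 0.54   # reported phase 1 response rate
followup_rate = 0.29  # reported response among followed-up nonrespondents

overall_rate = two_phase_response_rate(phase1_rate, followup_rate)
phase2_gain = overall_rate - phase1_rate
print(f"Overall: {overall_rate:.1%}, gain from phase 2: {phase2_gain:.1%}")
```

With the rounded inputs, the overall rate is about 67 percent and the phase 2 gain is about 13 percentage points, matching the figures reported above.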
Note that optimizing incentive levels to specific surveys, topics, and subpopulations is complex. Generally, the impact of incentives on survey participation is not linear, so it is possible to find an optimal incentive amount (Mercer et al. 2015). Other design features may interact with the incentive amount. For example, a small incentive may be less effective with a longer survey than a shorter survey. In many situations, experimentation may be necessary, as the unique combination of survey design features makes it difficult to translate published research findings from one context to another. Further, populations are heterogeneous, so target populations respond to incentives in heterogeneous ways. Research on responses to incentives demonstrates that incentives have differential effects on some subgroups (Singer and Ye 2013). Responsive design builds on that premise. The leverage-saliency model is the underlying theory: Different people are impacted differently by various design features (Groves et al. 2000). Responsive design aims to reduce this uncertainty with experiments in early phases and then use that information to optimize later phases. In phase 1 of this study, we found that $15 worked as well as $30 for University of Michigan students. In phase 2, we did not increase the dollar amounts of incentives already offered to students.
Differences Between Phase 1 and Phase 2 Respondents
The addition of a second phase effort in this study not only increased response rates, it also changed the composition of students participating in the study. There are important reasons to expect this outcome.
The specific change between phase 1 and phase 2 of this survey (use of telephone calls and face-to-face visits by trained interviewers) is designed to improve contact with potential respondents, address their concerns about participating in the study, and encourage them to participate. Thus, we would expect the phase 2 protocol to disproportionately add respondents who either did not read the email invitations or who were otherwise reluctant to participate. The phase 2 protocol was not designed to increase the incentive to participate, but contact with an interviewer communicates the importance and legitimacy of the study and provides the opportunity to address respondent questions or concerns, reducing reluctance. Because the phase 2 protocol changed the form of contact to phone calls and personal visits from professional interviewers, this second phase of our responsive survey design is most likely to add respondents from subgroups who are generally reluctant to participate in online surveys or from groups in which the new form of contact increases awareness of the survey or addresses reluctance to participate (Axinn et al. 2011). Thus, phase 2 should yield higher response among subgroups least responsive to email contact.
Among those who are otherwise reluctant to respond, the higher incentive should increase response. In general, we know that those who find the topic of a survey of personal interest are more likely to participate in a survey (Groves and Couper 1998; Groves et al. 2006; Groves et al. 2004). This survey was introduced as a survey on the topic of campus climate regarding sexual misconduct. Those selected who are either not interested in the topic of sexual misconduct or not sexually active may be those most likely to respond positively to a higher incentive.
Here, we compare key characteristics of those who responded during phase 1 with those who responded during phase 2, and compare both with the target population. We also compare respondent characteristics across those who received the $15 incentive and those who received the $30 incentive. Our results are presented in Table 3.
Change in Representation of Specific Types of Students by Phase and Incentive.
Note: */+ Indicate a statistically significant change in representation. Bold identifies the comparison that is significantly different.
a Each respondent may choose all that apply.
+ p < .10. *p < .05. **p < .01. ***p < .001 (one-tailed tests).
The first two columns of Table 3 focus on distributions of selected respondent characteristics for the two incentive conditions (including those interviewed during phase 1 and phase 2 under each incentive treatment). For example, male students are 47 percent of respondents under the $15 treatment and 46 percent under the $30 treatment. The third and fourth columns show the distribution of these same characteristics for respondents to phase 1 and phase 2. Males constituted 44 percent of respondents in phase 1 and 57 percent in phase 2. The “Combined Response” column shows the distribution of the cumulative weighted responses for phase 1 and phase 2, as well as the combined incentive groups. The last column shows the population proportions (available only for characteristics recorded by the University Registrar). In the combined set of surveys, 47 percent of respondents were male while 52 percent of the student population were male. The results demonstrate that several subgroups of the student population responded to the phase 2 protocol at higher rates than others. Phase 2 increased the proportion of males and African Americans among the responders (statistically significant differences across phases). Both groups had responded at rates lower than the population values during phase 1 (compare population totals in the final column with phase 1 column). Phase 2 also increased the proportion of graduate and professional students among the responders, which increased their overrepresentation relative to undergraduate students. The “Combined” column of results is generally closer to the population total column than the “Phase 1” column. This is true for three of the four dimensions. In one dimension—enrollment as a graduate or professional student—the protocol increased imbalance by increasing participation of the overrepresented group. 
Together, this means that even though many in these groups did respond to the initial email protocol, the representation of each of these groups was significantly changed by switching to telephone and face-to-face contact from professional interviewers.
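The way phase-specific distributions combine into the cumulative weighted figure can be illustrated with the proportion of male respondents. The sketch below is an approximation under stated assumptions, not the study's actual weighting code: each phase is weighted by the share of the full sample it represents (phase 2 respondents standing in for all phase 1 nonrespondents), the function name is hypothetical, and the inputs are the rounded values reported in the text.

```python
def combined_proportion(p_phase1: float, p_phase2: float,
                        rr_phase1: float, rr_phase2_followup: float) -> float:
    """Weighted combination of a characteristic across two phases."""
    w1 = rr_phase1                                # share represented by phase 1
    w2 = (1.0 - rr_phase1) * rr_phase2_followup   # share represented by phase 2
    return (w1 * p_phase1 + w2 * p_phase2) / (w1 + w2)


# Reported values: 44 percent male in phase 1, 57 percent male in phase 2.
weighted_male = combined_proportion(0.44, 0.57, rr_phase1=0.54,
                                    rr_phase2_followup=0.29)

# For contrast, pooling the raw respondent counts without phase weights.
unweighted_male = (1676 * 0.44 + 215 * 0.57) / (1676 + 215)

print(f"Weighted: {weighted_male:.1%}, unweighted: {unweighted_male:.1%}")
```

Under these assumptions the weighted figure rounds to the 47 percent shown in the combined column, while simple pooling of the raw counts gives roughly 45 percent. The gap arises because each phase 2 respondent represents several phase 1 nonrespondents.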
We found few statistically significant differences by incentive level; however, one significant difference deserves attention. The higher incentive recruited relatively more persons who were not sexually active in the past 12 months (p < .10). Because this survey was introduced as related to the topic of sexual misconduct, the theory of survey response predicts that students who are not sexually active will be more reluctant to participate because the topic is less salient to them (Groves and Couper 1998; Groves et al. 2004). It is no surprise to find that the higher incentive level increases participation among those students who have not had sexual experience within the prior 12 months. Similar results were obtained by Berzofsky et al. (2019).
The final set of subgroups in Table 3 reflects memberships in various student organizations: fraternities and sororities, varsity and club sports teams, the marching band, and ROTC. The majority of University of Michigan students are not members of any of these special groups, and those students who were not members of any of these special groups responded at a significantly higher rate to the phase 2 protocol. Members of each of these groups have an enhanced engagement with the University, which may increase their attention to email requests from University officials. The telephone and personal contact significantly improved participation among those students who do not have these additional relationships to the University.
Measures of Sexual Misconduct
The 2015 University of Michigan survey was specifically designed to provide careful and comprehensive measurement of three types of sexual misconduct: sexual assault involving touching, kissing, fondling, and/or other acts but not penetration; sexual assault involving penetration (oral, vaginal, or anal penetration); and sexual harassment.
Specific words used to define each of these types of sexual misconduct in the survey are provided in the Online Appendix (which can be found at http://smr.sagepub.com/supplemental/). In our last set of analyses, we assess the impact of both the incentive level and the two-phase survey design on the key summary measures of each of these types of sexual misconduct. The differential incentive created no statistically significant difference in the overall summary measures of these three types of sexual misconduct. The two-phase design did produce significant differences across phases, but the small size of phase 2 in this study may have prevented those differences from generating significant changes in the final statistics. Also important, current best practices for survey nonresponse use postsurvey adjustments to attempt to correct for nonresponse bias due to underrepresentation of key subgroups that can be identified in the initial sample frame. In this case, gender, race/ethnicity, undergraduate/graduate status, and on-campus/off-campus residence are all available on the original frame used for this list-based sample (an important advantage of list-based samples) and can be used for postsurvey adjustments. Therefore, we also compare the combined phase 1 plus phase 2 statistics with the phase 1 results alone after postsurvey weighting for subgroup response rate differences. These differences and final estimates are presented in Table 4.
Differences in Key Statistics by Phase.
Note: */+ Indicate a statistically significant change in representation. Bold identifies the comparison that is significantly different.
*p < .05 (one-tailed tests).
The results in Table 4 show that those who responded in phase 2 of the survey have lower rates of reporting all three forms of sexual misconduct. However, only the difference in the estimated rate of sexual harassment between the two phases is statistically significant: those who responded in phase 2 had a rate of experiencing sexual harassment 6 percentage points (approximately 25 percent) lower than those who responded in phase 1. The additional interviewer effort to add these students to the study brought in students who had lower rates of experiencing all forms of sexual misconduct, and particularly lower rates of sexual harassment. However, the small number of cases targeted for phase 2 resulted in final statistics on assault that were not significantly different from those obtained in phase 1.
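The logic of a one-tailed comparison between phases can be sketched with a simple two-proportion z-test. This is not the analysis behind Table 4: it ignores the survey weights and sample design, the function name is hypothetical, and the input rates (24 and 18 percent) are illustrative values chosen only to be consistent with a gap of 6 percentage points, or roughly 25 percent; they are not the published estimates. The group sizes are the phase 1 and phase 2 completion counts reported earlier.

```python
import math


def one_tailed_two_proportion_z(p1: float, n1: int, p2: float, n2: int):
    """Return (z, p_value) for the alternative p1 > p2, pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution, P(Z > z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Illustrative (assumed) rates; NOT the Table 4 estimates.
rate_phase1, rate_phase2 = 0.24, 0.18
z_stat, p_value = one_tailed_two_proportion_z(rate_phase1, 1676, rate_phase2, 215)
relative_gap = (rate_phase1 - rate_phase2) / rate_phase1

print(f"z = {z_stat:.2f}, one-tailed p = {p_value:.3f}, "
      f"relative gap = {relative_gap:.0%}")
```

Because the phase 2 group is small, its standard error dominates the test, which is one reason modest differences on the other two measures can fail to reach significance.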
Note that other types of statistics measured in this survey also demonstrated large and statistically significant differences between phase 1 and phase 2 responses. For example, respondents in phase 2 were significantly less likely to report attendance at specific campus programs, including both new student orientation and face-to-face programs designed to reduce sexual misconduct. Respondents to phase 2 were also significantly different than respondents to phase 1 in their stated likelihood of reporting sexual assault to specific types of campus resources—with phase differences running both directions depending on the specific type of resource. As documented elsewhere, nonresponse bias can vary greatly across measures within a single survey (Groves and Peytcheva 2008). That feature of nonresponse is demonstrated in these phase differences.
Of course, it is also possible that different subgroups of the student population reacted differently to the second phase of this design. To investigate this possibility, we estimated the statistics presented in Table 4 separately for graduate students, undergraduates, males, females, and undergraduate females. Across all of these subgroups, we found almost no statistically significant differences between phase 1 and phase 2 on these three key statistics, although the small size of the phase 2 sample greatly reduces the statistical power to detect differences within these subgroups. That makes the one statistically significant difference more striking: graduate students who responded in phase 2 were significantly more likely to report experiencing unwanted penetration than graduate students who responded in phase 1 (5 percent in phase 2 compared to 1 percent in phase 1, p < .01, not shown in tables). Note that the graduate student response rate was significantly higher in phase 2 (Table 3).
Finally, the results in Table 4 also demonstrate that careful application of postsurvey nonresponse weights (poststratification) to the phase 1 responses only (column 3) produces estimates nearly identical to estimates that combine information from phase 1 and phase 2 (column 4). The same is true for the other statistics demonstrating significant phase differences (described above, but not shown in tables) and for the subpopulations demonstrating significant phase differences (graduate students as described above). Of course, high-quality sample frame measures, more feasible in a list-based sample, help to make this type of postsurvey adjustment successful. By conducting a successful two-phase design, we are able to document the success of postsurvey adjustment in achieving accurate measurement of these key statistics.
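The general idea of poststratification can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the adjustment used in the study: it uses a single frame variable, the function names are hypothetical, and the population counts and responses are toy values invented for the example (only the 52 percent male population share echoes the text).

```python
from collections import Counter


def poststratification_weights(respondent_cells, population_counts):
    """Weight each respondent by (population count) / (respondent count) in their cell."""
    n_by_cell = Counter(respondent_cells)
    return [population_counts[cell] / n_by_cell[cell] for cell in respondent_cells]


def weighted_proportion(outcomes, weights):
    """Weighted share of respondents with outcome == 1."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Toy data (assumed): men are underrepresented among the 10 respondents.
cells = ["male"] * 4 + ["female"] * 6
outcomes = [1, 0, 0, 0] + [1, 1, 0, 0, 0, 0]
population = {"male": 520, "female": 480}   # frame totals, e.g. registrar counts

weights = poststratification_weights(cells, population)
unweighted = sum(outcomes) / len(outcomes)
adjusted = weighted_proportion(outcomes, weights)
print(f"Unweighted: {unweighted:.2f}, poststratified: {adjusted:.2f}")
```

In this toy case the weights restore the population gender split, and the estimate moves from 0.30 to 0.29. In practice the cells would be formed from several frame variables at once, and the adjustment can only correct for differences that those variables capture.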
Conclusions
If all surveys achieve low response rates, then our ability to assess the quality of survey estimates will be severely limited. Here, we implement an empirical approach to the problem of nonresponse bias. We developed a two-phase responsive survey design plan to measure nonrespondents in the first phase and learn how they are different. As theories of survey participation predict, the consequences of a change in protocol not only vary across subgroups, but they also vary across the subject matter of specific measures within the same survey. As documented elsewhere, relatively small sample sizes in the second phase of a two-phase protocol always limit the effects of phase differences on final statistics (Axinn et al. 2011).
However, two specific issues are crucial for sociologists collecting survey data at all scales, including small-scale, list-based sample surveys. First, we cannot know the magnitude or direction of potential nonresponse bias without using some tool to measure these biases. Second, protocol differences across phases have the potential to illuminate subpopulation differences in important social processes—differences that may be important for either tests of hypotheses about these subgroups or policies/programs designed for these subgroups. Two-phase responsive survey designs are feasible at all scales and provide an important tool for detecting these differences in survey response.
Here, we show that on a small-scale campus student survey, a two-phase survey design with different recruitment protocols offered at each phase changed the representation of key subgroups in the survey as well as the reporting of a key topic (experiences of sexual harassment). These important differences point to the high value of using responsive survey design tools, even on small-scale surveys, where a main obstacle to their use may have been lack of knowledge and experience applying these methods outside of large national surveys. The changes in the composition of respondents produced by this two-phase design are likely to produce many other differences in the statistics based on this survey, and the magnitude of those differences is likely to vary across statistics. Sometimes, postsurvey adjustments may be adequate for correcting total population estimates for these differences, as shown here, but detection of this success depends upon having methods to measure differences between those who did and did not respond to the initial protocol. A key strength of the responsive design approach is that it creates direct measurement of those reluctant respondents.
Substantive Conclusions
These findings inform two key points in the practice of sociological research. First, responsive survey design methods can easily be implemented in small-scale surveys to maximize data quality within a fixed budget. The issues motivating responsive survey design are those of optimizing the allocation of fixed resources across design options to maximize data quality. In this example, variation in subgroup response rates is the quality dimension with lower variance expected to produce higher quality. Our results demonstrate that a change of the contact protocol in the second phase adds new respondents who are systematically different than respondents to the initial protocol. The two-phase difference in allocation of resources increased the response rate, improved representation of the target population, and informed key statistics with significant subpopulation differences. At any budget and scope, application of responsive survey design methods can help sociologists create higher quality data on the subjects they choose to study.
Second, key threats to the validity of web-survey results can be successfully addressed with careful application of responsive survey design methods. Web surveys have become popular in social research for good reasons. Not only can they be orders of magnitude less expensive to implement than other modes of survey data collection, but they also afford respondents a maximum level of privacy for reports on sensitive topics. Web surveys are self-reporting tools that do not require interaction with an interviewer, allow respondents to choose where and when they answer questions, and can be implemented so that answers to sensitive questions reside only on a secure server with protective encryption. These qualities of web-survey tools can make them an ideal option for the study of sensitive topics, and sociological research thrives on the investigation of such topics. However, a crucial limitation of web surveys is that they generally produce relatively low response rates, potentially increasing the threat of nonresponse bias (Dugan et al. 2015; Sarraf and Cole 2014). Improving response rates does not necessarily eliminate the possibility of nonresponse bias, but the systematic application of a different protocol in the second phase of a responsive survey design draws in respondents who themselves are systematically different than those in the first phase (Axinn et al. 2011; Peytchev et al. 2009). Here, we demonstrate that in the context of a web survey, adding a second phase change-in-contact protocol can be more effective at limiting nonresponse bias than higher levels of individual incentives. Changing the mode of contact, including phone calls, text messages, or face-to-face visits, is a valuable second-phase enhancement for web surveys. This is an important advance in the methods for conducting high-quality web surveys—an advance with wide utility in sociological research.
The specific results reported here also provide important tools for future web surveys. For example, the result that African American students respond at much higher rates to the phase 2 protocol indicates that similar student population surveys prioritizing participation of African American students may consider launching telephone contact with this subgroup immediately to maximize their participation. Evidence we report here indicates this subgroup is more likely to respond to personal contact from an interviewer than to higher incentives, but further research is needed.
Of course, this single-campus, student-focused web survey is not representative of all web surveys. The concentrated physical location and the list-based sample with both email addresses and phone numbers contributed to the success of this specific web survey. These are important limitations in any inference to other web surveys. However, the method of using alternative modes for contact with respondents—even while collecting all survey data via a web-survey tool—is an important advance with high promise in application to many different web surveys.
Methodological Conclusions
The results we present also advance general survey methodology in an important way. Research on two-phase differences in protocols overwhelmingly focuses on individual incentive levels (Axinn et al. 2011; Peytchev et al. 2009). Building on the concept of “responsive design” and its foundation in a more general sociological theory of motivation to participate in surveys (Groves and Couper 1998), we extend this science to protocol differences in the mode of contact. We found that switching mode of contact led to the recruitment of groups that were less likely to respond to a single mode of contact, even with two different (and relatively high) incentives. These phases are “complementary” in that they recruit differentially across subgroups. As response rates to all different modes of survey data collection continue to decline, increased attention to mode of contact deserves a high priority in survey methodology. The evolution of communication technologies has created many more alternatives for mode of contact: not just text messaging but multiple forms of social media contact that can be delivered on an individual basis rather than as generalized advertising. The results we report are consistent with the high potential for protocol differences in modes of contact to reach systematically different respondents and increase the value of responsive survey design as a tool to limit the potential for nonresponse bias. Our cost comparison indicates the change-in-contact protocol was more effective than an increased individual incentive for a similar cost. Moreover, attention to the full breadth of protocol options informed by a sociological perspective on motivations to participate in surveys, such as salience of the topic and expert/authoritative endorsement, is also likely to continue to improve efforts to limit the risk of nonresponse bias.
Of course, there are many other dimensions of quality in surveys. The total survey error perspective (Groves and Lyberg 2010) seeks to minimize the error across all of these dimensions: nonresponse, measurement error, sampling error, and postprocessing errors. Responsive survey design has the potential to address sources of error other than nonresponse bias. One study found that a change in design improved reporting of sensitive behaviors (Peytchev, Peytchev, and Groves 2010). Further extensions of the methodology would make this an explicit aim.
Organizational Surveys and Campus Climate Studies
In this study, we found that obtaining higher response rates in campus student surveys of sensitive topics is possible. The topic of this study was salient on campus, and the high response rate may have been produced, in part, by the context of general media publicity that existed at the time. Since 2015, the salience of the topic has changed, and future changes in topic-specific salience will shape variation in response rates. Nevertheless, comparison of a randomized experiment on incentive level and the two-phase design reveals that the expense of the second phase generated a bigger improvement in the response rate than the higher incentive. The second phase in this web survey used interviewer-assisted recruitment of a random subsample of nonresponding students. This design can keep the costs of interviewer effort affordable, but the substantive benefits of the second phase cannot be known until it has been done. The documentation we provide here of subgroup differences by phase and incentive levels allows those who must choose a survey design to understand the likely consequences of their choices.
In all surveys, researchers and organizational leaders are faced with difficult decisions regarding the allocation of scarce resources to the conduct of organization-specific data collections. The results presented here can be used to guide those difficult decisions. The results demonstrate the relative consequences of two of the most effective but expensive tools for managing nonresponse: individual incentives and interviewer-assisted recruitment. These results provide an important resource for the design and implementation of all types of surveys going forward.
Finally, surveys of sensitive topics such as sexual misconduct among institution-specific populations are not at all limited to universities. They are also needed and conducted among academic staff and faculty, government employees, military service members, and corporate staff. The relative comparison of key design features provided here can be used to assist in guiding design decisions for all such institution-specific climate survey data collections.
Supplemental Material
Supplemental Material, sj-docx-1-smr-10.1177_00491241211031270 for Applying Responsive Survey Design to Small-Scale Surveys: Campus Surveys of Sexual Misconduct by William G. Axinn, James Wagner, Mick Couper and Scott Crawford in Sociological Methods & Research
Footnotes
Authors’ Note
Because of the highly confidential nature of student reports of experiences of sexual misconduct, the data analyzed in this article are stored for restricted use in high security at the University of Michigan’s Institute for Social Research. To access these data for research purposes, individual investigators must obtain written permission from the University of Michigan’s Office of General Counsel and their home IRB’s review of specific analysis plans. Authors of this article will provide program code for the analyses published here upon request once these permissions have been obtained. All errors or omissions remain the exclusive responsibility of the authors.
Acknowledgments
We thank the many Michigan students who participated in this survey and the university administration for providing anonymous, de-identified data for research purposes. We also thank the design team for this Michigan survey, including Stephanie Chardoul, Heidi Guyer, Lisa Holland, Timothy Lynch, Malinda Matney, Zeina Mneimneh, Patty Petrowski, Holly Ryder-Malcovich, and Brady West. We are especially grateful to Heather Schroeder for her expert data analysis and both Armani Hawes and Jennifer Mamer for their assistance with the article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
The supplemental material for this article is available online.
Notes
References