Abstract
Many of the official statistics and leading indicators that inform policy decisions are created from aggregating data collected in scientific survey interviews. What happens in the back-and-forth of those interviews—whether a sampled member of the public agrees to participate or not, whether a respondent comprehends questions in the way they were intended or not, whether the interview is spoken or texted—can thus have far-reaching consequences. But the landscape for social measurement is rapidly changing: Participation rates are declining, and people’s daily communication patterns are evolving with new technologies (text messaging, video chatting, social media posting, etc.). New analyses of survey interactions are demonstrating aspects of interviewer speech that can substantially affect survey participation, which is vital if social measurement is to be trustworthy. Findings also suggest that, once a survey interview starts, the risks of misunderstanding and miscommunication are greater than one might expect, potentially jeopardizing the accuracy of survey results; different approaches to interviewing that allow clarification dialogue can improve respondents’ comprehension and thus survey data quality. Analyses of text messaging and voice interviews on smartphones demonstrate the importance of adapting scientific social measurement to new patterns of communication, adding ways for people to contribute their data at a time and in a mode that is convenient for them even when they are mobile or multitasking.
Tweet
Details of interaction in survey interviews can have far-reaching impact on important measures of people’s opinions and behaviors.
Key Points
The way that interviewers and respondents interact in survey interviews can substantially affect the resulting official statistics and leading indicators.
New analyses are uncovering details of interviewer speech in survey invitations that can increase participation.
The risks of misunderstanding and miscommunication in survey interviews are greater than one might expect.
Interviewing techniques that help respondents comprehend questions more uniformly—that attempt to standardize meaning rather than wording—can improve survey data quality.
New evidence shows that people interviewed via text messaging can provide more precise answers and disclose more sensitive information than in telephone interviews on their smartphones, even if they are mobile or multitasking.
Introduction
Accurate social measurement—quantifying public opinion and behavior—has become essential for informing policy and making decisions on major societal issues. Official government statistics that track changes over time in, for example, unemployment, health, agricultural productivity, crime, the prices of consumer goods, energy consumption, drug use, and mental health—both overall and broken down by subgroups of the population—are used by lawmakers to decide how, for example, public funds are allocated for food stamps, how congressional district boundaries are drawn, and at what level to fund local law enforcement. Polls of public opinion on issues from presidential performance to the state of the economy to marriage equality are used to justify proposed policy changes, and businesses use information from official statistics to inform their choices about where to locate and what to sell. The latest unemployment data or attitudes about race across the nation or community are recognized as important enough to be reported as major news.
Most of these measures are produced from responses to questions in scientific surveys, that is, in surveys of representative samples following theoretically based procedures that have proven to be trustworthy and predictive over many years (see Groves et al., 2009; Groves & Lyberg, 2010; Weisberg, 2009). In a large number of government surveys, the questions are asked and the responses recorded by interviewers either over the phone or in person. The number of such survey interactions involved in creating the data (aggregated to produce official statistics) is vast; just in the United States, hundreds of thousands of people are interviewed every month for federal surveys, and there are many other surveys beyond these. What happens during these survey interviews thus has impact far beyond just those moments of data collection. That is, while any one response to a survey question may not seem particularly important, the aggregation of what comes out of those moments can be consequential.
Here, we review some of what cognitive and discourse research on survey interactions tells us about people’s willingness to participate in surveys, their comprehension of survey questions, and the accuracy of their answers. (The genesis of at least some of this evidence can be traced to concerted efforts starting in the 1980s to bring the perspectives of psychology and conversation analysis to survey research, under the rubric “Cognitive Aspects of Survey Methodology”; see, for example, Sirken et al., 1999; Tanur, 1992.) The evidence suggests that details of how interviewers speak can substantially affect survey participation. It also suggests that, once a survey interview starts, the risks of misunderstanding and miscommunication are greater than one might expect, and that this has the potential to lead to social mismeasurement on a larger scale. Implementing survey interviews designed to reduce miscommunication can guard against this.
It is also clear that the landscape for social measurement is rapidly changing; response rates are declining (e.g., Brick & Williams, 2013; Keeter, Kennedy, Dimock, Best, & Craighill, 2007) as people become fatigued from being oversurveyed, and they may not distinguish between surveys for social measurement in which providing their anonymous responses contributes to the social good and those designed for other purposes. People are also communicating via mobile devices and switching between modes of communication more frequently (e.g., between talking, emailing, texting, and video chatting on more than one device), using whatever available mode of communication best fits their needs at the time. These transformations are challenging existing methods for social measurement (e.g., landline telephone surveys, web surveys that rely on a desktop computer) to adapt and leading to new potentials for mismeasurement (Link et al., 2014). The policy implication is that survey data collection must continuously evolve to fit the ways that members of the public are communicating, including expanding the possibilities for participating in surveys when and how it is most convenient for them.
Being Invited to Participate in Surveys
The accuracy of social measurement in scientific surveys can be jeopardized if sampled members of the public are not willing to participate in them at sufficient rates (increasing the risk of “nonresponse error”). This means that the invitation to participate in a survey is crucially important. For surveys that involve interviews, it falls on the interviewers to persuade sample members to participate, that is, to become respondents. Of course, survey organizations have long made efforts to hire and retain interviewers with good interpersonal skills who are successful in convincing people to participate (more experienced interviewers tend to be more successful; Couper & Groves, 1992), and they have developed practices to support their interviewers’ success in garnering cooperation.
Beyond this practical wisdom, systematic research has identified a number of important features relevant to convincing potential respondents to participate in face-to-face and telephone surveys (see Groves, 2004, for an overview). These include basic concerns such as whether a potential respondent is at home or available on the telephone at all, the effectiveness of precontact announcement of an upcoming survey invitation, sociological factors such as the demographic similarity between interviewers and respondents, the extent to which people invited to participate are socially alienated or disaffected, and the kinds of information that have proven most effective in convincing skeptical sample members of the value of participation.
Analyses of audio recordings of “doorstep” interactions for face-to-face interviews (Morton-Williams, 1993) demonstrate the importance of projecting an air of professional competence, giving a good first impression, maintaining the interaction, and tailoring the introduction to the individual respondent (see also Campanelli, Sturgis, & Purdon, 1997; Couper & Groves, 1996). Such analyses make clear the importance of examining actual interactions, rather than interviewers’ beliefs about what is effective; even experienced interviewers can disagree substantially on which tactics are most effective (Snijkers, Hox, & de Leeuw, 1999). They also demonstrate that different approaches are likely to be needed and effective in different surveys; surveys that are rarely administered, for example, give fewer opportunities for interviewers to learn survey-specific tactics for persuading sample members to participate (Sturgis & Campanelli, 1998).
A new and more fine-grained kind of evidence about the impact of interviewers’ conversational moves on participation has come from more detailed analyses of audio recordings and transcripts of real-world survey invitations at the level of the conversational turn. Two recent studies demonstrate the insights that come from combining quantitative analyses with rigorous qualitative coding, as well as the practical usefulness of this kind of empirical investigation.
Schaeffer, Garbarski, Freese, and Maynard (2013) examined recordings and transcripts from 358 telephone invitations to participate in the 2004 Wisconsin Longitudinal Study, which had interviewed a large group of Wisconsin high school graduates from the class of 1957 four previous times over the years. Because they knew each sample member’s participation history along with many other attributes, Schaeffer and colleagues were able to create a sample of pairs of recruits to the 2004 survey matched on their “propensity” to participate in that survey, but where in each case one person now agreed to participate, and one declined. (This represents a real advantage over studies in which researchers have no idea what recruits’ propensities to participate in their survey might be.)
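To make the matched-pairs idea concrete, the following minimal sketch pairs each recruit who declined with the available recruit who agreed and whose estimated participation propensity is closest, within a tolerance. It is an illustration only and not the procedure Schaeffer and colleagues used: the `Recruit` record, the greedy nearest-neighbor strategy, and the `caliper` value of 0.05 are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recruit:
    """Hypothetical record for one sample member (illustrative fields only)."""
    case_id: str
    propensity: float  # assumed estimate of the probability of participating, 0..1
    agreed: bool       # True if the person agreed to the current survey


def match_pairs(recruits, caliper=0.05):
    """Greedily pair each decliner with the closest-propensity acceptor.

    A pair is kept only if the two propensities differ by at most `caliper`
    (an assumed tolerance). Each acceptor is used at most once.
    """
    available = [r for r in recruits if r.agreed]
    decliners = sorted((r for r in recruits if not r.agreed),
                       key=lambda r: r.propensity)
    pairs = []
    for decliner in decliners:
        if not available:
            break
        best = min(available,
                   key=lambda a: abs(a.propensity - decliner.propensity))
        if abs(best.propensity - decliner.propensity) <= caliper:
            pairs.append((best, decliner))
            available.remove(best)
    return pairs


# Made-up example data: only "a" and "b" are close enough to be paired.
example = [
    Recruit("a", 0.80, True),
    Recruit("b", 0.82, False),
    Recruit("c", 0.30, True),
    Recruit("d", 0.60, False),
]
matched = match_pairs(example)
```

Within each resulting pair, the two people were about equally likely to participate beforehand, so differences in how their invitation calls unfolded are less likely to reflect prior inclination alone.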
Detailed examination of the interactions revealed that certain kinds of interviewer speech were associated with a sample member’s increased odds of agreeing to participate, for example, more formal greeting tokens (“hello” vs. “hi”), more politely worded requests (using “may I,” “please,” and the sample member’s first and last name or title plus last name), and the judicious use of mitigators (“just” or “might”) and continuers (“mm-hmm”). They also revealed that the order in which interviewers presented information matters: Interviewers’ identifying themselves or their institution before asking to speak with the sample member increased agreement to participate (see also Maynard & Hollander, 2014). But the larger story is how interactive the effects are: Actions of the sample member, and how the interviewer responds to those actions, lead to different pathways that invitation sequences might take. Wrong moves at inopportune moments (e.g., an interviewer’s asking “why not?” when a respondent says “not interested”) are associated with refusal.
In another recent study, Conrad, Broome, et al. (2013) analyzed a sample of 1,380 audio-recorded telephone invitations to five different surveys by 100 different interviewers from the University of Michigan Survey Research Center, made in most cases as “cold calls.” The results demonstrate that invitations in which interviewers were moderately disfluent (e.g., using “um” and “uh” at a moderate rate) were more successful than invitations that were “robotic” (overly fluent) or painfully disfluent. In invitation sequences that ended up leading to agreement to participate, respondents were more likely to use back channel utterances (e.g., “uh-huh” or “I see”) and less likely to speak over the interviewer, consistent with a view that conversational behaviors by both respondents and interviewers are informative about likely outcomes.
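As a rough illustration of how a disfluency rate of this kind could be quantified from a transcript, the sketch below counts the fillers “um” and “uh” per 100 words. The tokenization rule and the function name `filler_rate` are assumptions for this example, not the coding scheme used in the study, and no threshold for what counts as a “moderate” rate is implied.

```python
import re

# Fillers mentioned in the text; a real coding scheme would likely include more.
FILLERS = {"um", "uh"}


def filler_rate(transcript):
    """Return the number of fillers per 100 words in a transcript string."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    filler_count = sum(1 for w in words if w in FILLERS)
    return 100 * filler_count / len(words)


# Invented invitation opening: 10 words, 2 of them fillers.
rate = filler_rate("Hello, um, this is, uh, Pat calling from the university")
```

A rate computed this way could then be related to invitation outcomes, which is the kind of comparison the findings above rest on.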
Findings like these allow new kinds of evidence-based interviewer training for navigating the complexities of invitations and improving survey participation (e.g., Groves & McGonagle, 2001), although much remains to be learned about individual and cultural differences across different sample populations, as well as how trainable different aspects of interviewer speech can be. Much also remains to be learned about what leads to successful invitations in different modes (e.g., in contacting sample members through text messaging or email). The kinds of arguments that interviewers will need to have available to counter lack of knowledge and mistrust about data protections and uses will, we suspect, need to evolve over time as the landscape changes. Survey centers, of course, already select interpersonally skilled interviewers in their hiring process and work closely with them to help them succeed at inviting participants. But systematically training interviewers in conversational nuance based on insights from this kind of research evidence is a new frontier for improving survey participation.
Miscommunication During Survey Interviews
Another area of research on survey interaction with important policy implications has focused on question comprehension and interviewer–respondent interaction in interviews. It has long been known that seemingly minor changes in question wording and ordering can affect the responses people give, and can do so substantially enough to affect inferences about the population’s opinions and behaviors. For example, estimates of how much television people watch per day are higher when the question’s response scale ranges higher (from less than 2.5 hr to more than 4.5 hr) than when it ranges lower (from less than 0.5 hr to more than 2.5 hr; Schwarz, Hippler, Deutsch, & Strack, 1985). Questions can be interpreted differently depending on which other questions precede them, and even on who the respondent believes is asking the question; in one study, people judged the very same behavior as more likely to represent sexual harassment when it was part of an alleged “Sexual Harassment Survey” conducted for Women Against Sexual Harassment rather than as part of an alleged “Work Atmosphere Survey” conducted for a Work Environment Institute (Galesic & Tourangeau, 2007). The available response options, the order of response options, and even the numbers associated with a rating scale (−5 to +5 vs. 0 to 10) can affect response distributions (see Clark & Schober, 1991; Conrad, Schober, & Schwarz, 2014; and Schwarz & Hippler, 1991, for additional examples).
These sorts of findings suggest that there is no such thing as a perfectly neutral unbiased question that is guaranteed not to affect responses. Survey respondents do not turn off the ordinary reasoning and comprehension processes that they bring to conversation more generally (Clark & Schober, 1991), and they will interpret questions based on all available evidence—including who is asking the question, what questions came before, and what the response options might mean about the intentions behind the question (Schwarz, 1996, 1999). As we see it, the challenge for social measurement is to not ignore these kinds of “response effects” or to wish they would go away but rather to more deeply understand them and to design surveys that minimize response effects (and the resulting measurement bias) by making use of that understanding. This is just the kind of work that methodological research groups in organizations that produce official statistics (e.g., the U.S. Bureau of Labor Statistics, the U.S. Bureau of the Census) are committed to doing.
Just as we saw with survey invitations, focusing on the back-and-forth of interviewer–respondent interaction reveals an additional set of considerations relevant to respondent comprehension in survey interviews. A seminal study (Suchman & Jordan, 1990) examined transcripts of strictly standardized survey interviews, in which interviewers are required not only to read questions exactly as worded with no deviation from the script but also to avoid clarifying what questions mean if respondents ask for clarification (e.g., asking “What do you count as ‘work for pay’?”). As Suchman and Jordan’s examples demonstrate, this can lead to perverse interactions that can frustrate the respondent; when an interviewer responds to a direct request for clarification with “whatever it means to you” or says “let me repeat the question” and proceeds to do so, respondents can feel that the basic ground rules of conversational interaction have been violated, and they can end up providing answers that do not meet the needs of survey designers.
From the perspective of basic research on how people “ground” their understanding in dialogue, addressees tend to comprehend better—more accurately—when they can work together with speakers to make sure that they have understood the speaker’s references as intended (e.g., Clark & Wilkes-Gibbs, 1986; Schober & Clark, 1989); misunderstanding is a risk if a listener cannot engage in clarification dialogue. It turns out that the same principles apply in surveys: Respondents in telephone interviews in which they can ask for clarification about the meaning of words and expressions in questions end up giving answers that better fit the survey designers’ intentions. In addition, response accuracy improves yet more when interviewers are empowered to provide unsolicited clarification whenever they get the sense that respondents might need it on a particular question, based on their ordinary conversational intuitions. This has been demonstrated in laboratory experiments in which respondents answer questions from U.S. government surveys about housing, employment, and purchases on the basis of fictional scenarios (Schober & Conrad, 1997; Schober, Conrad, & Fricker, 2004) as well as in a national telephone sample using the same questions and a reinterview method to ascertain the accuracy of the original answers (Conrad & Schober, 2000).
At least for the kinds of misunderstanding addressed in these studies (that is, cases where a respondent may have trouble knowing how the key concepts in a question map onto his or her own circumstances), a more collaborative interviewing technique that uses ordinary conversational processes to ground meaning can lead to higher quality data. To use Suchman and Jordan’s terms, doing what it takes to standardize meaning rather than wording can pay off. But the evidence shows that the improvement in data quality comes at a cost: Interviews take substantially longer, and interviewers need specialized training in official definitions of survey concepts, which themselves may need to be developed.
Are the additional costs of this kind of interviewing justified, or when might they be? One important factor is the extent to which misunderstanding without clarification is likely to be a problem for a particular survey or particular question. This depends on the frequency within a population of what we have called “complicated mappings” between survey concepts and people’s circumstances, and on just how often such mappings might lead to mismeasurement. The few attempts at trying to quantify the variability of understanding (and thus the frequency of potential misunderstanding) across a population have not been reassuring. For example, Belson (1981, 1986) reinterviewed survey respondents to find out what they had been considering when they answered particular survey questions, and he found substantial variability in how they interpreted ordinary words and phrases like “you” (16% included other people too), “watch television” (33% included time that they had not been paying attention to it), and “weekday” (61% included more or fewer days than 5 weekdays). Suessbrick, Schober, and Conrad (2000) reinterviewed people responding to questions in a U.S. National Cancer Institute survey about tobacco use (the Tobacco Supplement to the Current Population Survey). Disturbingly, 10% of the respondents changed their answer to the question, “Have you smoked at least 100 cigarettes in your entire life?” from “yes” to “no” or “no” to “yes” when given a standard definition about what to count (any puffs on any tobacco cigarettes, whether you finished the cigarette or not, whether or not you bought the cigarettes yourself, and excluding marijuana cigarettes, pipes, and cigars). Given that this first “filter” question determines whether respondents are considered smokers or not, this means that 10% of the respondents may have been answering the “wrong” questions in the rest of the survey, potentially skewing results for a basic health measure.
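The reinterview comparison described here reduces to a simple proportion: the share of respondents whose answer to the filter question differs once a standard definition is supplied. The sketch below shows that arithmetic on invented data; the function name `switch_rate` and the ten example answers are hypothetical and are not drawn from the study.

```python
def switch_rate(before, after):
    """Share of respondents whose answer changed between two interviews.

    `before` and `after` are equal-length sequences of answers, aligned so
    that position i refers to the same respondent in both.
    """
    if len(before) != len(after):
        raise ValueError("before and after must cover the same respondents")
    if not before:
        raise ValueError("at least one respondent is required")
    changed = sum(1 for b, a in zip(before, after) if b != a)
    return changed / len(before)


# Invented data: ten respondents, one of whom switches from "yes" to "no"
# after hearing the definition.
original = ["yes"] * 6 + ["no"] * 4
with_definition = ["no"] + ["yes"] * 5 + ["no"] * 4
rate = switch_rate(original, with_definition)
```

Because the filter question routes respondents into different follow-up questions, every switched answer in such a comparison marks a respondent who would have followed a different path through the questionnaire.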
In most surveys, prior work that systematically uncovers the range of conceptions across a population and the potential for resulting mismeasurement simply has not been done; this kind of extra pretesting would be more time-consuming and expensive than is usually feasible. So how much needs to be invested in clarification to improve survey accuracy is not always clear. While one might guess that highly multicultural and multilingual populations may be particularly prone to this kind of measurement problem, even that is not guaranteed. Nonetheless, the larger scale evidence thus far suggests that interviews that provide clarification as needed can lead to significantly more accurate data, as measured by comparing survey responses in interviews in which interviewers did and did not provide clarification with administrative records that are likely to be accurate (Bruckmeier, Müller, & Riphahn, 2015).
We propose that, in the absence of certainty about how big the problem is for a particular survey, the researchers need to think through the trade-offs. If complicated mappings are known to be rare (which can only really be determined empirically), and if the precise values of estimates from the population are not essential, then the extra effort and expense may not be worth it. Or if one is most interested in change in an estimate in a longitudinal survey (say of consumer confidence or unemployment), and one is confident that error due to complicated mappings is similarly distributed at every point of measurement, then the extra costs may not be justified. However, for a high-stakes survey with one-time measurement where precise values (“point estimates”) are meaningful and the frequency of complicated mappings is unknown, investing in conversational interviewing to improve response accuracy may be worthwhile.
Survey Responding in New Communication Modes
Survey interviews have traditionally been administered in face-to-face or telephone interactions and, more recently, in web-based “self-administered” modes on computers. However, more and more people are communicating via many different modes on multiple devices: email, text messaging, video chatting, voice messages, blog postings, and various forms of social media. They are also replying to messages that were sent in one mode in different modes (e.g., responding to a voice mail with a text message), or even switching modes or devices midinteraction (e.g., when their connectivity or battery power dies, or when they switch from email to Skype). All of these changes are affecting how people can be contacted to participate in surveys and how survey data can be collected.
How is the quality of survey data—and of the estimates that result—being affected? The fact that so many respondents are now interacting while mobile (e.g., responding to a web survey on their phone rather than from their desktop) and potentially multitasking while answering survey questions suggests that there is potential for new kinds of mismeasurement (see Link et al., 2014, for discussion). And the fact that different subgroups of populations are adopting and using new communication technologies at different rates (Smith, 2015) raises the concern that new sources of measurement bias could be emerging. However, there is also potential for new ways to contact and engage respondents in ways that they find most convenient and comfortable. As we see it, there is substantial risk to social measurement from not adapting to potential respondents’ preferred ways of communicating.
New research examining interviewer–respondent interaction and survey data quality in mediated modes of communication gives some promising indications—while also raising new questions. In one study (Schober et al., 2015), 634 people who had agreed to participate in an interview on their iPhone were randomly assigned to answer 32 questions from U.S. social and government surveys, either by text messaging or by talking, in interviews administered either by a human interviewer or by an automated (text or voice) interviewing system. Analyses of the interactions demonstrated that the dynamics of texting interviews—with individual questions and responses interleaved—were quite different from the dynamics of voice interviews (see Figure 1), with far fewer turns spread out over much longer periods of time. Nonetheless, by several measures, texting led to higher quality data—more precise numerical answers (i.e., fewer that were rounded to end in a 0 or a 5), more differentiated answers to a battery of questions (less “straightlining”), and more disclosure of socially undesirable information—than voice interviews, both with human and automated interviewers.
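Two of the data quality indicators named above are straightforward to compute once answers are recorded. The sketch below is a minimal illustration under stated assumptions: numerical answers are whole numbers, a battery counts as straightlined only when every item receives the identical rating, and the function names and example values are invented rather than taken from Schober et al. (2015).

```python
def rounded_share(answers):
    """Share of whole-number answers that end in 0 or 5 (a sign of rounding)."""
    if not answers:
        raise ValueError("at least one answer is required")
    rounded = sum(1 for a in answers if a % 5 == 0)
    return rounded / len(answers)


def is_straightlined(battery):
    """True if every item in a battery of ratings received the same answer."""
    if not battery:
        raise ValueError("at least one rating is required")
    return len(set(battery)) == 1


# Invented answers to a numerical question in two modes.
text_answers = [3, 7, 10, 12]
voice_answers = [5, 10, 10, 20]
text_rounding = rounded_share(text_answers)
voice_rounding = rounded_share(voice_answers)

# Invented ratings for a battery of four items from two respondents.
undifferentiated = is_straightlined([4, 4, 4, 4])
differentiated = is_straightlined([4, 3, 4, 5])
```

On these made-up values, the text answers show less rounding than the voice answers, which is the direction of difference the study reports; the sketch does not reproduce the study's actual analysis.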

Figure 1. Interview duration and median number of turns per survey question in Schober et al. (2015). These timelines display the median duration of question-answer sequences with the median number of turns after each question in four different survey interview modes on iPhones.
What accounts for this pattern? There are a number of differences between texting and talking that may be relevant (see Schober et al., 2015); what seems most plausible is that the relative asynchrony of texting versus talking and the increased social distance with an interviewer in text reduce time pressure (thus allowing more precise responding) and the embarrassment of disclosing sensitive information to a listening interviewer. The fact that there was greater precision and disclosure in text even for respondents who reported having been mobile or multitasking while answering is consistent with a view that responding when and how it is more convenient may (counter to common wisdom about multitasking) lead to more trustworthy and accurate answers. Being given the option to respond in a way that is most convenient may well have additional benefits; a follow-up study (Conrad, Schober, et al., 2013) demonstrated that respondents whose invitation to participate required them to choose whether to continue in the mode of invitation or to switch to one of the other three modes produced higher quality data across all four modes. In both studies, a substantial proportion of participants reported that, for future interviews, they would prefer interacting with an automated system (rather than a person) and via text.
Text messaging is only one of many modes of interaction that people are now using every day that could be used for survey data collection. Each has distinct features that may well make a difference for survey data quality, for good or ill. Feuer and Schober (2015), for example, demonstrated that in video-mediated (Skype) survey interviews, respondents who could also see themselves (in addition to the interviewer) in a small self-view window revealed more sensitive information and reported the interview to be less sensitive than respondents without a self-view. People answering questions asked by an on-screen animated virtual interviewer smile more and produce more “uh-huh”s and nods when the virtual interviewer has greater facial motion (Conrad et al., under review), and they disclose less sensitive information to both a human and a virtual interviewer than to an audio-only interface (Lind, Schober, Conrad, & Reichert, 2013).
Although detailed analysis of interaction in new and emerging survey modes is only beginning, the existing evidence already raises the question of what should be considered the gold standard for survey data collection in a transforming world. Longstanding survey practices have been based on the assumption that face-to-face or telephone interactions lead to the best data quality, consistent with common wisdom about the dangerous distractions of multitasking. But new modes (particularly asynchronous ones like texting or mobile web surveys) may well allow respondents to be more willing to participate in surveys, as well as to consider their answers more carefully, giving them time, for example, to consult their own records to make sure they provide the most accurate possible answers and to be clearer about the opinions they want to express.
Policy Implications
Although policy makers are often unaware of the origins of the data that they use in their work, the interactive processes that generate the data can have enormous impact on the data’s accuracy. Data based on biased samples (which can result from ineffective invitations) and inaccurate data (which can result from respondents’ variable understanding and misunderstanding of survey questions, and from researchers’ insufficiently considering how answers can be affected by the mode of interaction) run the risk of misinforming policy makers and leading to poor decisions. The kind of research on survey interaction that we are calling attention to here suggests some concrete steps for safeguarding the quality of social measurement in a rapidly changing world.
First, the evidence suggests the potential benefits of systematically training interviewers in the kinds of speech and conversational moves that lead to successful survey participation. It also suggests that, at least for some surveys, it would be beneficial to train interviewers to provide tailored clarification of survey concepts when it is needed, as long as there is also careful attention to not introducing new measurement biases. Survey centers, of course, already select for and train interviewers with good conversational intuitions, but presumably, even the best interviewers could benefit from more detailed skills building and feedback, and not all interviewers are interpersonally skilled. Explicit training in the subtle cues that can be meaningful in survey interactions (e.g., speech disfluencies or gaze aversion as predictors of unreliable answers; see Schober, Conrad, Dijkstra, & Ongena, 2012), as well as in the impact of different ways of responding to those cues, could make interviewers more effective in getting accurate data.
Second, new forms of communication are leading to new kinds of interaction, with different norms, dynamics, and expectations surrounding them. Whether every new communication mode is appropriate for social measurement is an open question (see papers in Conrad & Schober, 2008), but it is clear that maintaining the trustworthiness and reliability of ongoing social measurement will require adapting to fit the ways that people currently communicate even as those ways change. We suspect that deciding, once and for all, which technologies will prove most effective for social measurement is likely to be impossible, as different subpopulations adopt different technologies at different rates, and people range in their preferences: Some people will probably always want a “human touch,” while others may find a less pressured “anytime-anywhere” interaction with an automated system to be attractively convenient. And these preferences may vary in different situations and change over time. The upshot, we expect, is that a continuing—perhaps growing—need will be to provide more choices of communication mode and more convenient ways for people to participate in social measurement.
All of this requires continuing commitment to investing in the infrastructures for social measurement, and to their continued upgrading and adaptation. In practical terms, it will (as always) require continuing scrutiny of the trade-offs in costs and benefits of the different elements of social measurement: not only the costs of hiring and training and monitoring interviewers, technology costs, questionnaire design costs, and data analysis costs but also the time and effort that members of the public can reasonably be expected to contribute. We believe that empirical findings on the details of survey interaction in different modes, and how members of the public feel about that interaction, will be important to consider in the mix.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
