Abstract
Contextual effects refer to the process by which responses given to survey questions can be affected by question order. Generally, contextual effects harm data measurement validity by introducing bias and increasing measurement error; the risk is that responses to a survey’s later questions are partly affected not only by the substance of the question but also by the preceding questions. Two opposite effects are possible: a carryover effect refers to the assimilation of later questions into those previously asked, and a backfire effect refers to the contrasting of earlier and later questions. In the case where a stereotype is activated in earlier questions of a survey, the previous literature suggests a carryover effect is more likely. The present study tests whether this is also the case in factorial vignette research by examining the influence of first presenting a vignette that corresponds more closely to a stereotypical view of sexual abuse. Results indicate a backfire effect, pointing to the distinctively different way in which vignette scenarios activate stereotypes compared to general survey questions. The results also highlight the need for researchers to control for contextual ordering effects when modeling factorial vignette data.
Social scientists like to think their survey questions are answered based on the substantive topic, but previous empirical research indicates that the format of questions can create various issues that influence respondents’ answers. Notably, the phrasing, ordering, and context of questions have all been demonstrated to significantly alter responses (Schwarz 1999). The present study adds to this body of literature and investigates contextual effects in vignette research. The issue of ordering has largely been unexamined in this context, yet it potentially carries great importance, as question ordering matters in survey design more broadly. To examine the influence of vignette ordering on responses to scenarios, we conducted a survey of residents of Washington State in the summer of 2016, presenting respondents with two hypothetical scenarios of sexual abuse to analyze the effect of first presenting scenarios containing more negative stereotypes.
Contextual Effects in Survey Research
Previous research has demonstrated that responses to survey questions can be significantly different depending on the order in which questions are asked. For example, in an oft-cited study, H. H. Hyman and P. B. Sheatsley (1950) found that U.S. respondents were more likely to approve admitting reporters from Communist countries after a question about Communist countries admitting U.S. journalists. Similarly, Howard Schuman, Stanley Presser, and Jacob Ludwig (1981) found 15 percent more public support for abortion after an unrelated question than after a question about child-defect pregnancies. Differences such as these are concerning for survey researchers who want to measure public opinion on topical areas and make appropriate recommendations, making the study of question ordering—also called contextual or priming effects—imperative (Schwarz and Sudman 1996).
Contextual effects seem to limit the time and effort required from respondents to form an opinion, notably when respondents use readily available information, such as a prior question or set of questions falling into a similar topical area (Tourangeau and Rasinski 1988; Yeric and Todd 1996). Two distinct types of contextual effects can occur: carryover and backfire (Tourangeau and Rasinski 1988). Carryover refers to the assimilation of a survey’s later questions into those previously asked; backfire refers to the contrasting of earlier and later questions. When they occur, these two types of contextual effects impact responses in opposite ways: carryover effects result in the alignment of responses between earlier and later questions, and backfire effects create divergences between the two. Both types of effects have been discussed in the context of ordering general and specific questions. Patterns of results show that a carryover effect is likely when general questions are asked first, the consistency in ratings being explained by a continuous line of thinking; backfire effects occur when specific questions are answered before general ones, to avoid redundancies in thought (Strack, Martin, and Schwarz 1988).
There is a scarcity of research about contextual effects on attitudinal and perception questions involving stereotypes, so it is more difficult to predict the type of contextual effect to expect. Generally, stereotypes are conceptualizations about a group of individuals that are widely held and often automatically activated in the presence of a member of the stereotyped group (Banaji and Hardin 1996; Kanahara 2006). They take the form of a learned association in which specific characteristics are instinctively ascribed to a member of the group. Previous psychological research demonstrates that, once activated, a stereotype is used to interpret subsequent ambiguous situations (Devine 1989); in a survey context, this would likely result in a carryover effect.
Two studies have indirectly looked at the influence of stereotypes on subsequent attitudes, and they both indicate a carryover effect. First, in a study that randomly assigned the order of questions about racial attitudes, Paul M. Sniderman and Thomas Leonard Piazza (1993) found that asking first about affirmative action provoked more self-reported negative attitudes toward black individuals in the next question, suggesting the first question generated a carryover effect that enhanced negative stereotyping. In another study, Gregory M. Herek and John P. Capitanio (1999) examined the impact of asking about either homosexual men or homosexual women first on self-reported opinions about both groups in a sample of heterosexual respondents. For male respondents, opinions of homosexual men were more favorable if asked about homosexual women first, offering further support of a possible carryover effect. Generally, these results are in line with those of research investigating the effect of stereotype activation on behaviors (as opposed to attitudes, which are the focus of the current study). S. Christian Wheeler and Richard E. Petty (2001) conducted an extensive review of the literature on this topic. In 15 (93.8 percent) out of 16 studies, they found assimilation of behavior to a negative stereotype, indicating research participants generally conformed their behavior to what would be expected based on an activated stereotype. Overall, research on stereotypes, mostly about behaviors but also regarding attitudes, confirms other findings that stereotypes act as heuristic devices to simplify the cognitive process (Bodenhausen 1990), with the core components of a negative stereotype having a lasting effect on subsequent questions or behaviors.
Research has established the importance of question ordering for surveys in general, but less work has examined question ordering for vignette studies. The paucity of research on this topic is problematic given that question ordering is particularly important when respondents are asked sets of similar questions (Smith 1992). This is a typical condition for vignette studies, which often ask respondents to make a number of judgments about similar scenarios. Moreover, research shows the location of vignettes within a larger survey matters, with evidence suggesting reading vignettes first might influence later self-assessment questions (Hopkins and King 2010; Lau, Seltzer, and Bianchi 2015), and the order of dimensions within vignettes also matters (Auspurg and Jäckle 2015). To our knowledge, however, only a single study has directly examined the ordering of vignettes (Sauer, Auspurg, and Hinz 2020). Randomization in vignette experiments ensured vignette dimensions and respondent characteristics were not aligned, but the study’s within-subject design opened the possibility of bias through order effects. Carsten Sauer and colleagues’ (2020) study, which used a sample of 408 university students, evaluated scenarios describing employees and asking about the fairness of their wages. To examine vignette order, some respondents received a randomly assigned first vignette (as is standard with factorial survey research) and others rated an extreme case first (i.e., high underpayment or high overpayment). Results indicated no significant difference in regression coefficients estimated based on order of vignette presentation, prompting the authors to recommend continuing the random assignment of vignette order, as it may eliminate the presence of bias through ordering effects. Yet, additional research on more representative samples and on other topics is needed to more fully understand the potential for bias through ordering effects in vignette research.
In general, there are two methods by which researchers present multiple vignettes to respondents, and ordering may matter for both. First, in some studies, respondents are asked to rate a number of distinct vignettes. Many studies use this design, including research on sex offenders (Gavin 2005; Hanson and Scott 1995), behavioral intentions (Mazerolle and Piquero 1998), and expert opinions and judgments (Hanzlick and Goodin 1997). In these studies, each respondent is typically presented with the same list of vignettes (in the same order) and asked to provide some evaluation or judgment related to each scenario. For example, Randy Hanzlick and Julia Goodin (1997) presented a set of scenarios describing subjects’ deaths to a sample of medical examiners. If vignette ordering affects responses, then providing respondents with the exact same list of vignettes may systematically bias responses to the list’s later vignettes. Although each respondent is exposed to the same potential ordering bias, these effects might still lead to over- or underestimation of factors related to later vignettes. In other words, presenting every respondent with the same vignettes in the same order holds ordering constant across respondents, but it does not remove the effect: responses to later vignettes may still be shifted, on average, by the vignettes that preceded them.
The second type of study in which multiple vignettes are presented to the same respondent is the factorial survey experiment (FSE). FSEs are a variant of the vignette methodology in which respondents are asked to evaluate multiple versions of the same scenario and to provide their evaluation, judgment, or response. FSEs implement an experimental design within a survey by presenting respondents with a number of vignettes that vary along a set of predefined dimensions (Rossi and Anderson 1982). The hallmark of this approach is that the random assignment of dimension levels ensures the vignette dimensions are orthogonal to respondent characteristics. Best practices recommend, especially for studies involving larger factorial dimensions, that a subset of vignettes be presented to respondents (Auspurg and Hinz 2015). Regardless of how the vignettes are assigned to each respondent, scholars commonly randomize the order in which the vignettes are presented (Auspurg and Hinz 2015; Hopkins and King 2010). Most studies indicate when they have used this method, although in some cases it is not clear how researchers presented the vignettes to respondents.
Randomization of vignette order is, on the surface, a powerful tool for combating bias arising from ordering effects. Under ideal conditions (when the sample size is large enough for the given factorial vignette universe), this should result in level balancing. As a result, in a model where vignette characteristics are used to predict respondent judgments, randomization can help ensure the coefficients represent the average effect of each factorial dimension and level on the outcome variable. In practice, however, this assumes ordering effects are of the same magnitude across respondents. Such an assumption is tenuous, given that prior research suggests ordering effects may, in fact, vary along with respondent characteristics (Auspurg and Jäckle 2015), and respondent characteristics may not be randomly distributed across a sample. An additional challenge to this assumption relates to the possible differential activation of stereotypes resulting from the random assignment of vignette dimensions. Considering the previous stereotype research presented, it is conceivable that the different levels in dimensions of a scenario activate core beliefs about the stereotyped group in some research participants but not in others who receive a different dimension level. This can be problematic for subsequent answers, considering previous work shows stereotypes have lasting effects once activated. This is the specific case we investigate in the present study.
Aim of the Study and Hypothesis
The current study aims to investigate the effect of ordering in vignette scenarios about sexual abuse by comparing respondents’ ratings on an adult-victim vignette based on whether or not the respondent was first exposed to a child-victim vignette. Our survey presented respondents with two vignette scenarios of sexual abuse, one perpetrated against a 9-year-old child and one against an adult victim. The order of the vignettes was randomly assigned, with half of respondents rating the child-abuse vignette first and the other half rating it second. We tested specifically for the effect of activating the “sex offender stereotype” in vignette respondents, in light of previous research documenting incorrect public beliefs about sexual offenders regarding offense types, re-offense risk, and amenability to treatment (Craun and Theriot 2008; Fortney et al. 2007; Levenson et al. 2007; Mancini and Budd 2015; Pickett, Mancini, and Mears 2013; Quinn, Forsyth, and Mullen-Quinn 2004) that result in “mythic narratives and beliefs that do not match reality” (Socia and Harris 2016:382). We draw on literature showing that offenses against children are seen more negatively because of the vulnerability of child victims (Mears et al. 2007; Rogers, Hirst, and Davies 2011) and that such offenders are considered very dangerous (Jahnke, Imhoff, and Hoyer 2015; Lynch 2006). These perceptions result in more public support for severe punishment of abusers of children compared to those who victimize adults (Kernsmith, Craun, and Foster 2009; Mancini and Mears 2010; Mears et al. 2007).
We hypothesize that we will observe differences in ratings that indicate a carryover effect. Simply stated, we believe respondents who rate the child sexual abuse vignette first will subsequently rate the adult-victim scenarios more severely, because the child sexual abuse scenario corresponds more closely to the stereotype of a “sex offender” and thus will evoke the related concepts of dangerousness and recidivism.
Method
Data Collection and Sample
We collected the vignette data from a sample comprising 1,000 adults from Washington State. The sample was constructed by YouGov, a company specializing in online research and sampling. YouGov was contracted to electronically survey a representative sample of Washington State’s population for the purpose of gaining insight into public perceptions of sexual offenders. YouGov’s sampling process involves two stages: first, an online survey is administered to a non-probabilistic sample drawn from its pool of members; second, some respondents from the initial sample are selected based on their match to an established sampling frame (Rivers 2006). The final sample comprised respondents who were matched to Washington State respondents of the American Community Survey 2012. They were determined to offer the closest fit on gender, age, race, education, political party identification, ideology, and political interest.
Overall, respondents represented 36 (out of 39) counties of Washington State; the majority of respondents were from King County, the location of the state’s most populous city. Most respondents were white (84.9 percent) and a little over half of the sample was female (56.3 percent). On average, respondents were 52.4 years old at the time of survey administration (S.D. = 15.6). Respondents were from various income levels; more than half of participants had a yearly household income higher than US$60,000. Eighty-three percent of the sample had completed some college education, and 61.6 percent reported being a parent. Sixty-one percent of respondents were married or living in a domestic partnership, and half of the participants were working. With regard to political ideology, about a third of the sample (33.6 percent) reported being liberal, a third (33.5 percent) reported being moderate, and 27.4 percent reported being conservative.
Survey and Factorial Vignettes
The survey was administered to measure public opinion about sexual offenders. As such, it contained a number of questions about respondents’ demographic characteristics, as well as their knowledge and attitudes about sex offenders in general and about applicable policies, such as registration, residency restrictions, and community notification. The final part of the survey, most relevant to the present study, presented each respondent with two factorial vignettes, one against a child victim and one against an adult victim. The order of presentation of these two vignettes was randomly assigned, with half the sample rating the child-abuse vignette first and the other half answering it second.
The adult-victim and child-victim vignettes were built on a similar framework that told the story of an athletic coach found guilty of sexual assault. Each vignette varied on three dimensions, as presented in Table 1. The text of the adult-victim vignette follows; it should be read in conjunction with the third column of Table 1 to determine the exact wording: [Offender Sex] was a coach of the local adult athletic league for several years. However he/she was fired from his/her position after being accused of sexually assaulting a [victim sex] participant on his/her team. [Offender priors]. [H]e/she was later found guilty and sentenced to serve time in state prison. His/her sentence is almost up and he/she will soon be released back into the community.
Table 1. Vignette Dimensions and Levels
Because responses to the child-victim vignette were not directly related to our research question and did not affect the results regarding vignette order and vignette dimensions, the text of the child-victim vignette is not presented here. It is available upon request.
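The random assignment described above (vignette order plus the three dimension levels of the adult-victim vignette) can be sketched in a few lines of Python. This is an illustration only, not the survey vendor's implementation. It assumes simple coin-flip randomization, whereas the text does not state whether order was assigned by simple or blocked randomization. The function names, dictionary keys, and seed are hypothetical; the 0/1 codes follow the coding described in the Measures section.

```python
import random

# Dimension levels, using the 0/1 codes from the Measures section.
# The exact wording shown to respondents (Table 1) is not reproduced here.
DIMENSIONS = {
    "offender_sex": (0, 1),     # 0 = female, 1 = male
    "victim_sex": (0, 1),       # 0 = female, 1 = male
    "offender_priors": (0, 1),  # 0 = no prior accusation, 1 = prior accusation
}


def assign_respondent(rng):
    """Draw one respondent's design: dimension levels and vignette order."""
    design = {name: rng.choice(levels) for name, levels in DIMENSIONS.items()}
    # child_first = 1 means the child-victim vignette is rated first (the IV).
    design["child_first"] = rng.choice((0, 1))
    return design


def assign_sample(n, seed=0):
    """Assign designs to n respondents; the seed is arbitrary (assumption)."""
    rng = random.Random(seed)
    return [assign_respondent(rng) for _ in range(n)]


sample = assign_sample(1000)
share_child_first = sum(r["child_first"] for r in sample) / len(sample)
print(f"child-victim vignette first: {share_child_first:.1%}")
```

Under simple randomization the two order groups are only approximately equal in size, which is consistent with the near-even splits reported in the descriptive statistics.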
Measures
Independent variable (IV)
We had only one predictor variable. It captured the order of the vignette scenarios rated by each respondent. We distinguished between two groups: respondents who received the vignette involving the child victim first (the more stereotypical scenario, coded as 1) versus those who received the vignette involving the adult victim first (the less stereotypical scenario, coded as 0).
Dependent variables (DVs)
The study included three dependent variables, which we hypothesized would vary based on the order of the vignettes: (1) social distancing, (2) appropriateness of sanctions, and (3) estimates of recidivism risk. Participants rated these three items in response to the vignette involving the sexual abuse of an adult victim. All DVs were adapted from B. G. Link and colleagues’ (1999) vignette experiment, which derived data from the 1996 General Social Survey measuring public opinion about mental illness. Specifically, we replaced the individual with mental illness in the original vignettes with the perpetrator described in our vignette. The three DVs were index variables, created by combining a number of survey questions all rated on a five-point Likert scale, after principal component analysis indicated they each loaded onto one factor. The specifics are presented next.
The first dependent variable is social distancing (α = .93). It was created by combining respondents’ ratings on five variables measuring their willingness to (1) move next door to, (2) socialize with, (3) be friends with, (4) work closely with, and (5) marry into the family of the abuser described in the vignette scenario. All variables were reverse coded for the purposes of the current analysis so that higher values indicate a higher level of distancing from the abuser and the behavior described in the vignette scenario.
The second DV, appropriateness of sanctions (α = .92), combined six variables asking respondents their level of agreement with various sanctions against the offender in the vignette: (1) that the behavior constituted a serious offense, (2) that serious punishment was warranted, (3) that the perpetrator should have to register locally, (4) that the perpetrator should have to register nationally, (5) that the public should be notified about the perpetrator, and (6) that the perpetrator should be restricted in establishing a residence. Higher scores on the combined variable indicate respondents believed the perpetrator should be submitted to more sanctions and punitive measures.
Finally, the DV estimate of recidivism risk (α = .77) combined five variables asking respondents to rate the likelihood the offender in the vignette would (1) commit a new sexual offense, (2) commit a new nonsexual offense, (3) commit a crime involving physical violence, (4) be successfully rehabilitated, and (5) never offend again. The latter two variables were reverse coded to reflect the same theoretical direction as the others. Higher scores on the resulting combined variable indicate respondents expected the perpetrator in the vignette to be at a high risk of recidivating.
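The index construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: items are coded 1 to 5, and an index is the simple sum of its items (consistent with the reported means, but not stated explicitly in the text). The function names and the toy ratings are hypothetical and are not study data. Cronbach's α is computed with the standard formula, k/(k − 1) × (1 − sum of item variances / variance of the summed score).

```python
from statistics import variance

LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed coding of the five-point scale


def reverse_code(score):
    """Flip a rating so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    return LIKERT_MIN + LIKERT_MAX - score


def recode_items(ratings, reversed_positions):
    """Reverse-code the items at the given zero-based positions."""
    return [
        reverse_code(r) if i in reversed_positions else r
        for i, r in enumerate(ratings)
    ]


def cronbach_alpha(rows):
    """rows: one list of (already recoded) item ratings per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on the five recidivism items for four respondents.
toy_rows = [
    [5, 4, 4, 1, 2],
    [3, 3, 2, 3, 3],
    [4, 4, 5, 2, 1],
    [2, 1, 2, 4, 5],
]
RECIDIVISM_REVERSED = (3, 4)  # items (4) and (5), zero-based positions

recoded = [recode_items(row, RECIDIVISM_REVERSED) for row in toy_rows]
index_scores = [sum(row) for row in recoded]  # assumed: index = sum of items
print("index scores:", index_scores)
print("alpha (toy data):", round(cronbach_alpha(recoded), 2))
```

The same functions apply to the social distancing index, where all five items are reverse coded, and to the sanctions index, where none are.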
Control variables
We used the three adult vignette dimensions presented in Table 1 as covariates. Offender sex refers to whether the vignette described a female or male abuser, respectively coded 0 and 1. Victim sex was coded the same way, representing the victim of the abuse related in the vignette. Finally, offender priors distinguished between a perpetrator who had never been accused of abuse in the past (coded 0) and one who had been (coded 1). The inclusion of these three variables as covariates allowed us to determine whether the order of the vignette still had an effect after controlling for other factors that have been demonstrated to affect ratings of perpetrators of sexual crimes (Waterman and Foss-Goodman 1984). The survey also collected information on respondent characteristics, but we omit those from our model, both to simplify the presentation of our results and because our focus is on the ordering of questions. Given that the vignette dimensions and factors are randomly generated, they are orthogonal to respondent characteristics; thus, the exclusion of such characteristics does not induce any omitted variable bias in our results.
Descriptive statistics
For analytic purposes, we used only the answers of respondents without missing data on any variable; this brings the sample size to 961, indicating a missing data rate of less than 4 percent. As expected, the random assignment of vignette factors produced even distributions on vignette characteristics (adult-victim vignette first: n = 475, 49.4 percent; male offender: n = 493, 51.3 percent; female victim: n = 490, 51.0 percent; offender with priors: n = 484, 50.4 percent). Finally, descriptive statistics on all three dependent variables generally indicate respondents’ condemnation of the individual perpetrating the sexual assault described in the adult vignette, as illustrated by high levels of social distancing (M = 19.38, S.D. = 4.88), rating of more severe punishment as appropriate (M = 24.41, S.D. = 5.46), and high estimates of recidivism risk (M = 16.20, S.D. = 3.45).
Analysis
We conducted multivariate analysis of covariance (MANCOVA), an appropriate statistical approach considering that the three dependent variables are correlated and the effect of our independent variable was measured while controlling simultaneously for the effects of covariates. In particular, we examined differences in the level of social distancing, severity of sanctions, and estimated recidivism risk on the adult vignette (the DVs) between respondents who received the child vignette first and those who received the adult vignette first (the IV), after controlling for the effect of vignette dimensions relative to the offender’s sex, victim’s sex, and offender’s prior record (covariates presented in Table 1). We computed MANCOVA using the general linear model function in SPSS version 24. We checked that the correlations between our DVs were significant but not overly high (all r < .6). An examination of Box’s test of equality of covariance matrices indicated our analysis did not violate the assumption of covariance homogeneity (Box M = 11.59, p = .07), and Levene’s tests indicated no violation of the assumption of equal error variances (social distancing: F = 0.66, p = .42; appropriateness of sanctions: F = 2.18, p = .14; estimate of recidivism risk: F = 0.03, p = .87).
Results
First, we tested our assumption that respondents would rate the vignette involving the sexual abuse of a child more severely, compared to the vignette involving the sexual abuse of an adult victim, by conducting dependent (paired-samples) t-tests. The results indicated significant differences for all three dependent variables, all in the expected direction. Specifically, respondents’ answers had higher levels of social distancing (M = 20.16 [S.D. = 4.70] vs. M = 19.39 [S.D. = 4.86]), viewed more severe sanctions as appropriate (M = 26.25 [S.D. = 4.53] vs. M = 24.44 [S.D. = 5.45]), and estimated the recidivism risk as higher (M = 16.74 [S.D. = 3.37] vs. M = 16.20 [S.D. = 3.46]) in the child vignette compared to the adult vignette, with t-values, respectively, of 6.61, 11.92, and 5.30, all significant at p < .001. This suggests we are correct to assume the child vignette is more likely to elicit the stereotype of a sexual offender in respondents. However, these results do not account for the influence of the main variable of interest (vignette order), and they do not correct for the increase in the likelihood of a Type I error resulting from conducting multiple t-test comparisons.
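The multiple-comparison concern raised above can be made concrete with a short calculation. The sketch below is illustrative and was not part of the reported analysis: it assumes a nominal α of .05 and treats the three tests as independent, which overstates the inflation somewhat because the three DVs are correlated. The Bonferroni correction is used here only as one common adjustment; the function names are hypothetical.

```python
ALPHA = 0.05   # assumed nominal significance level
N_TESTS = 3    # one dependent t-test per DV


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


print(f"familywise error rate: {familywise_error_rate(ALPHA, N_TESTS):.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha(ALPHA, N_TESTS):.4f}")
```

Under these assumptions the familywise error rate is about .14 and the corrected per-test threshold is about .0167, so the three t-tests reported above (all p < .001) would remain significant after this adjustment.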
Second, as part of our main analysis, we conducted MANCOVA, in which we compared the two groups of vignette order on their mean levels of social distancing, appropriateness of sanctions, and estimate of recidivism risk in response to the adult-victim vignette while also accounting for vignette dimensions. The MANCOVA found a significant multivariate effect for vignette order, Wilks’ λ = 0.99, F(3, 954) = 4.10, p = .007, partial η² = .013, and for two of the three vignette dimensions: offender sex, Wilks’ λ = 0.97, F(3, 954) = 10.25, p < .001, partial η² = .031, and offender priors, Wilks’ λ = 0.89, F(3, 954) = 40.91, p < .001, partial η² = .114. The effect of victim sex was not significant, Wilks’ λ = 0.99, F(3, 954) = 2.01, p = .11, partial η² = .006. Univariate tests are presented in Table 2.
Table 2. Tests of Dependent Variables across Vignette Order and Significant Covariates
Note: MANCOVA = multivariate analysis of covariance.
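The multivariate statistics reported for the main MANCOVA are linked by simple identities, which makes them easy to cross-check. For an effect with one hypothesis degree of freedom (every effect here is a two-level factor), partial η² equals 1 − Wilks' λ, and the exact F is ((1 − λ)/λ) × (df₂/df₁), where df₁ is the number of DVs. The sketch below assumes df₁ = 3 (three DVs) and df₂ = 954; the function names are hypothetical. Because it starts from the rounded partial η² values, the recomputed F values only approximate the reported ones.

```python
DF_NUM = 3    # number of dependent variables
DF_DEN = 954  # denominator degrees of freedom, as reported


def wilks_from_partial_eta2(eta2):
    """For a one-degree-of-freedom effect, Wilks' lambda = 1 - partial eta squared."""
    return 1.0 - eta2


def f_from_wilks(wilks, df_num, df_den):
    """Exact F for a one-degree-of-freedom multivariate effect."""
    return (1.0 - wilks) / wilks * df_den / df_num


# Partial eta squared values as reported for the main MANCOVA.
reported_eta2 = {
    "vignette order": 0.013,
    "offender sex": 0.031,
    "offender priors": 0.114,
    "victim sex": 0.006,
}

for effect, eta2 in reported_eta2.items():
    lam = wilks_from_partial_eta2(eta2)
    f_value = f_from_wilks(lam, DF_NUM, DF_DEN)
    print(f"{effect}: lambda = {lam:.2f}, F({DF_NUM}, {DF_DEN}) = {f_value:.2f}")
```

For offender priors, for example, the identity gives λ of about 0.89 and F of about 40.9, in line with the reported values.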
With regard to our independent variable, results show small, significant differences in each DV based on vignette order: for social distancing, F(1, 954) = 8.00, p = .005, partial η² = .01; for appropriateness of sanctions, F(1, 954) = 10.21, p = .001, partial η² = .01; and for estimates of recidivism risk, F(1, 954) = 7.16, p = .008, partial η² = .01. The means were consistently larger in the group who answered the adult-victim vignette without having previously rated the child-victim vignette, in contradiction to our hypothesis. We followed up with pairwise comparisons to examine the two groups of vignette order (i.e., child vignette first and adult vignette first) for each DV. Small effect-size mean differences were detected between the two groups across all three DVs (social distancing: mean difference = 0.85, S.E. = 0.30, p = .005, confidence interval [CI] = 0.26–1.44, Cohen’s d = 0.21; appropriateness of sanctions: mean difference = 1.09, S.E. = 0.34, p = .001, CI = 0.42–1.76, Cohen’s d = 0.25; estimates of recidivism risk: mean difference = 0.56, S.E. = 0.21, p = .008, CI = 0.15–0.97, Cohen’s d = 0.19).
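The confidence intervals reported above follow directly from each mean difference and its standard error. The sketch below reproduces them under two assumptions: the intervals are 95 percent intervals, and a normal critical value (about 1.96) is an adequate stand-in for the t critical value at this sample size. The function name is hypothetical.

```python
from statistics import NormalDist


def confidence_interval(mean_diff, std_error, level=0.95):
    """Normal-approximation confidence interval for a mean difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    return mean_diff - z * std_error, mean_diff + z * std_error


# Mean differences and standard errors as reported for the first MANCOVA.
comparisons = {
    "social distancing": (0.85, 0.30),
    "appropriateness of sanctions": (1.09, 0.34),
    "estimates of recidivism risk": (0.56, 0.21),
}

for dv, (diff, se) in comparisons.items():
    low, high = confidence_interval(diff, se)
    print(f"{dv}: {low:.2f} to {high:.2f}")
```

Rounded to two decimals, the three intervals match those reported (0.26 to 1.44, 0.42 to 1.76, and 0.15 to 0.97).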
With regard to covariates, the univariate tests demonstrated that vignettes involving a male abuser received significantly higher ratings on all DVs: social distancing, F(1, 954) = 27.45, p < .001, partial η² = .03; appropriateness of sanctions, F(1, 954) = 16.14, p < .001, partial η² = .02; and estimates of recidivism risk, F(1, 954) = 17.11, p < .001, partial η² = .02. The same held for vignettes involving a recidivist abuser: social distancing, F(1, 954) = 50.03, p < .001, partial η² = .05; appropriateness of sanctions, F(1, 954) = 32.02, p < .001, partial η² = .03; and estimates of recidivism risk, F(1, 954) = 118.38, p < .001, partial η² = .11.
Third, to further explore these results, we ran a second MANCOVA examining mean differences between scenarios with different characteristics, one of which corresponded even more closely to the stereotype of a sexual offender. We examined the DVs’ mean ratings on the adult vignette for respondents who first read a scenario involving a recidivist child abuser and compared them to the group of vignettes without the combination of these two characteristics. The patterns of results are similar to those obtained in the first MANCOVA. Specifically, we found a significant multivariate effect for vignette order, Wilks’ λ = 0.99, F(3, 954) = 3.54, p = .01, partial η² = .01, and for the same two vignette dimensions, but again not for victim sex: offender sex, Wilks’ λ = 0.97, F(3, 954) = 9.79, p < .001, partial η² = .03; victim sex, Wilks’ λ = 0.99, F(3, 954) = 2.22, p = .08, partial η² = .007; offender priors, Wilks’ λ = 0.89, F(3, 954) = 41.41, p < .001, partial η² = .12. An examination of univariate tests (presented in Table 2) indicates that respondents exposed to the most stereotypical child vignette first consistently rated the adult vignette significantly lower on all three DVs: social distancing, F(1, 954) = 4.63, p = .03, partial η² = .01; appropriateness of sanctions, F(1, 954) = 10.36, p = .001, partial η² = .01; and estimates of recidivism risk, F(1, 954) = 4.65, p = .03, partial η² = .01. We then used pairwise comparisons to examine the two groups of vignette order (i.e., stereotypical child vignette first and adult vignette first) for each DV. Differences in means between the two groups were significant across all three DVs but had small effect sizes (social distancing: mean difference = 0.75, S.E. = 0.35, p = .032, CI = 0.07–1.44, Cohen’s d = 0.19; appropriateness of sanctions: mean difference = 1.27, S.E. = 0.40, p = .001, CI = 0.50–2.05, Cohen’s d = 0.25; estimates of recidivism risk: mean difference = 0.52, S.E. = 0.24, p = .031, CI = 0.05–1.00, Cohen’s d = 0.20).
The previously observed patterns were also confirmed for covariates: DVs received higher ratings in vignettes describing male abusers (social distancing: F(1, 954) = 27.45, p < .001, partial η² = .03; appropriateness of sanctions: F(1, 954) = 16.14, p < .001, partial η² = .02; estimates of recidivism risk: F(1, 954) = 17.11, p < .001, partial η² = .02) and recidivist abusers (social distancing: F(1, 954) = 50.03, p < .001, partial η² = .05; appropriateness of sanctions: F(1, 954) = 32.02, p < .001, partial η² = .03; estimates of recidivism risk: F(1, 954) = 118.38, p < .001, partial η² = .11). These findings confirm that our initial hypothesis is not supported by these analyses, yet they also convincingly demonstrate that the order in which vignettes are presented matters.
Discussion
Summary of Findings
We investigated the effect of vignette order in a factorial vignette survey conducted to examine public opinion about sexual offenders. We wanted to examine the effect of activating a stronger “sex offender” stereotype in the first vignette scenario. Considering previous findings of an assimilation effect after a stereotype is activated, we hypothesized we would observe a carryover effect: we expected respondents first exposed to the sexual abuse of a child would rate the adult vignette more harshly, considering the possibly salient and enduring effect of a story describing child sexual abuse. Our results do not support our hypothesis; rather, they indicate the opposite effect. Specifically, our results show a backfire effect, in which respondents compare the latter scenario to the former. That is, respondents who first read about the sexual abuse of a child were then less severe in their ratings of perpetrators of sexual violence against adult victims. We found similar effects in ratings relative to appropriateness of sanctions and perceptions of risk: if respondents first read about abuse against children, then for perpetrators with adult victims, they rated punishment as needing to be less severe and the risk to be less prevalent. We found a similar pattern of results when examining ratings of a group that corresponded even more closely to the stereotype of a sex offender (i.e., a child abuser with a prior record).
Implications
Substantively, the current study points to important differences between the influence of stereotypes in factorial vignettes and in survey questions. Our results do not align with Sauer and colleagues’ (2020) study of bias arising from the ordering of vignettes, which found no such bias. We propose this might be because the extreme cases they experimentally manipulated (overpayment or underpayment) do not provoke the same type of emotional reaction and appraisal in a factorial survey research participant as does a sexual abuse scenario. In addition, our results contradict the literature on the assimilation effect of stereotypes in survey questions and subsequent behaviors. We found the opposite (a contrast effect), which suggests that vignette scenarios evoke and activate stereotypes in a way that is fundamentally different from the survey questions previously studied. We posit that the difference arises because factorial vignettes, by definition, present a specific story; in the present analysis, respondents had to react to a singular scenario of sexual abuse. The dimensions of each scenario combined to elicit the stereotype of a sex offender to different degrees, but they were nonetheless restricted to an individualized case describing the actions of a specific perpetrator. On the other hand, previous studies on the effect of stereotypes on survey answers have utilized general terminology and concepts to activate these stereotypes. For example, the general idea of “affirmative action” was evoked in Sniderman and Piazza’s (1993) study, and the concepts of “gays” and “lesbians” were conjured in Herek and Capitanio’s (1999) study. Here, it is not the idea of a general “sex offender” that was elicited, but the case of a specific sexual abuser. This is why our results align more closely with studies that found a backfire effect when specific questions are asked first, as opposed to a carryover effect when general questions are asked first (Strack et al. 1988).
Methodologically, these findings confirm the importance of recording and statistically modeling contextual effects in survey research, and specifically in factorial vignettes. They provide nuance to the current recommended practice of randomization in factorial vignettes, in which the first vignette a respondent sees is randomly assigned, without modeling potential effects of the characteristics of this first scenario on ratings of subsequent vignettes. The goal of randomization is to prevent ordering effects by balancing levels, but our results caution against relying on it alone in cases where some dimension levels can activate a stereotype more strongly than others. The results demonstrate that in such cases, the magnitude of the effects is not the same across respondents and the goal of randomization is not reached. Although the effect sizes of the mean differences were small, they were found consistently. These contextual effects complicate the interpretation of respondents’ ratings, as the results contain not only the effects of the vignette dimensions but also the ordering effects. Randomization might still be a good general practice for administering vignettes, especially if the order of randomization is recorded and its effect statistically modeled; moreover, we do not know the effects of potential ordering on nonrandomized vignette designs. Still, our results, although they ran counter to our substantive hypotheses, were robust and based on a representative sample of adults within Washington State. Given this, we recommend that researchers, at a minimum, randomize and keep track of vignette order so that they can adjust for order effects statistically when they model data based on vignettes.
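The recommendation to randomize and keep track of vignette order can be illustrated with a minimal sketch. It is not the procedure used in the present survey: the vignette labels, the function names `assign_orders` and `order_contrast`, and the record layout are hypothetical, and the difference in means is only the simplest stand-in for entering order as a term in a full model such as the MANCOVA reported above.

```python
import random
from statistics import mean

# Hypothetical labels for the two vignette types shown to each respondent.
VIGNETTES = ("child_victim", "adult_victim")


def assign_orders(respondent_ids, seed=0):
    """Randomize vignette order per respondent and return it so it can be stored."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    orders = {}
    for rid in respondent_ids:
        order = list(VIGNETTES)
        rng.shuffle(order)
        orders[rid] = tuple(order)  # recorded alongside the responses
    return orders


def order_contrast(records, target="adult_victim"):
    """Mean rating of `target` when shown second minus when shown first.

    Each record is a dict with keys 'order' (tuple of vignette labels)
    and 'ratings' (dict mapping vignette label to a numeric rating).
    """
    shown_first, shown_second = [], []
    for rec in records:
        rating = rec["ratings"][target]
        if rec["order"][0] == target:
            shown_first.append(rating)
        else:
            shown_second.append(rating)
    return mean(shown_second) - mean(shown_first)


# Toy, made-up records purely to show the call pattern (not study data).
toy_records = [
    {"order": ("child_victim", "adult_victim"), "ratings": {"adult_victim": 3}},
    {"order": ("adult_victim", "child_victim"), "ratings": {"adult_victim": 5}},
]
print(order_contrast(toy_records))
```

A contrast that differs reliably from zero would signal that ratings of the target vignette depend on what preceded it. Storing the assigned order for every respondent is what makes such a check, or a proper covariate adjustment, possible afterward.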
Limitations
The present study has several limitations. First, we used the term “sex offender” in the first part of the survey, before having respondents answer the factorial vignettes. This can be problematic, as a recent study demonstrates that the simple use of this terminology (compared to the expression “people who have committed crimes of a sexual nature”) is linked to harsher ratings and judgments by members of the public (Harris and Socia 2016). Second, both the child-victim and adult-victim vignettes described abuse of a league participant by an athletic coach. Sexual abuse in sports, specifically involving coaches (e.g., Sandusky at Penn State), has received considerable media attention and might therefore elicit a specific type of bias from respondents regarding these perpetrators. Future factorial vignette research on sexual abuse should include more diversity in the scenarios.
Third, we do not report respondents’ demographic characteristics as covariates in the MANCOVA, although previous studies have found they are significant predictors of punitive attitudes (Craun and Theriot 2008; Levenson et al. 2007; Socia and Harris 2016). We ran the analyses with demographic covariates and found some were significant, but we omitted them here to keep the model simple because they were not directly related to the research question and did not affect the results regarding vignette order and vignette dimensions. These results are available upon request. Given these results and that the randomization ensured respondent characteristics were orthogonal to vignette factors, we believe our results demonstrate that vignette ordering can have direct effects on responses to scenarios and that randomization and use of control variables for each factor level are not a sufficient mechanism to account for factor effects. Still, in a study about the effect of order in vignette dimensions, Katrin Auspurg and Annette Jäckle (2015) show that these effects may be greater for some individuals, suggesting there may also be potential moderation or indirect effects. Unfortunately, we did not have access to the type of variables Auspurg and Jäckle identified as being linked to the magnitude of ordering effects. Future research should examine the degree to which attitude certainty regarding sexual offenders affects ordering effects and, more broadly, identify other types of characteristics and traits that might make a respondent more or less susceptible to ordering effects.
Fourth, the placement of vignettes in relation to the full survey was static in our research design (they were presented at the end in all cases). Given that some research demonstrates the importance of vignette placement (Hopkins and King 2010; Lau et al. 2015) and dimension placement (Auspurg and Jäckle 2015), future research should examine ordering and placement effects simultaneously. Finally, we conducted the current analyses using public opinion about sexual offenders, a particularly loathed group of individuals. The generalizability of the current findings to public opinion on other topics, specifically some that are less emotionally charged, remains to be established in future research.
Conclusion
Researchers use factorial vignette surveys as a rigorous method to disentangle the effects of concomitantly varying factors on various topical issues, but it is important to investigate and account for contextual effects, such as vignette order, when designing these surveys. The present results indicate that the current practice of randomizing the order of vignettes is not a foolproof approach and might introduce unwanted bias by injecting variation into vignette ratings that is due not only to the factors of interest but also, in part, to the order in which the vignettes were presented. There is only limited research on the effects of vignette ordering, but our study and Auspurg and Jäckle (2015) both suggest the order in which vignette dimensions are presented might affect responses to vignettes. More research needs to be conducted, including experiments examining the magnitude of ordering effects for randomized and nonrandomized vignette presentations, but this topic clearly needs to be considered more carefully in future vignette studies. Our results suggest that randomizing vignette order and statistically modeling its effect might be the best approach. After all, it is only by concentrating on the influence of all important dimensions of an issue while eliminating irrelevant factors that researchers can develop a proper understanding and accurate estimates of public attitudes about the topics they study.
Footnotes
Acknowledgements
We thank the Washington State Office of Financial Management for generously granting access to the data analyzed in the present manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
