Abstract
Web surveys have expanded the set of options available to questionnaire designers. One new option is to make it possible to administer questions that respondents can answer by moving an on-screen slider to the position on a visual scale that best reflects their position on an underlying dimension. One attribute of sliders that is not well understood is how the position of the slider when the question is presented can affect responses—for better or worse. Yet the slider’s default position is under the control of the designer and can potentially be exploited to maximize the quality of the responses (e.g., positioning the slider by default at the midpoint on the assumption that this is unbiased). There are several studies in the methodology literature that compare data collected via sliders and other methods, but relatively little attention has been given to the issue of default slider values. The current article reports findings from four web survey experiments (n = 3,744, 490, 697, and 902) that examine whether and how the default values of the slider influence responses. For 101-point questions (e.g., feeling thermometers), when the slider default values are set to be 25, 50, 75, or 100, significantly more respondents choose those values as their answers which seems unlikely to accurately reflect respondents’ actual position on the underlying dimension. For 21- and 7-point scales, there is no significant or consistent impact of the default slider value on answers. The completion times are also similar across default values for questions with scales of this type. When sliders do not appear by default at any value, that is, the respondent must click or touch the scale to activate the slider, the missing data rate is low for 21- and 7-point scales but higher for the 101-point scales. Respondents’ evaluation of the survey difficulty and their satisfaction level with the survey do not differ by the default values. 
The implications and limitations of the findings are discussed.
Web surveys have opened up new opportunities for survey research that are not feasible in other, more traditional, data collection methods. One example is the extended set of question types that web surveys make available to questionnaire designers, compared to other modes. A case in point is the slider question. A slider question displays a line horizontally or vertically, and respondents register their answers by dragging a slider to the desired value with a pointing device. Slider questions are widely used in web surveys but of course cannot be implemented in paper questionnaires or interviewer-administered modes. A related question type is the visual analog scale (VAS), initially used in paper-and-pencil questionnaires (Hayes & Patterson, 1921) but more recently adapted for online administration. Instead of dragging a slider, web survey respondents answer a VAS question by directly clicking on a desired position, which generally makes a mark on the line (response scale). In the web survey context, sliders and VAS generally look very similar to each other, and the terms are sometimes used interchangeably. In fact, in some major commercial web survey platforms, slider questions can be answered as if they were VAS questions, that is, by pointing and clicking rather than dragging and dropping. However, the existence of the slider bar represents a major difference between these two types of question. When designing a slider question, researchers should consider several criteria, one of which is the default placement of the slider. This is an important consideration because the default value of the slider bar can potentially affect respondents’ answers. In fact, Funke (2015) observed that when responding to a slider question, respondents were less likely to choose the default value (in the middle) than when responding to a radio button question, which could change the response distribution.
However, in that study, the default value of the slider bar was not experimentally manipulated. To the extent that different default positions affect answers differently, designers should be aware of it. However, to the best of our knowledge, no study has systematically examined the impact of default slider positions on survey responses.
This study reports findings from four web survey experiments designed to test the influence of the default values of slider bars on survey responses. Considering the similarity between slider and VAS questions in practice, we review the literature about both question types. Then, we describe and present findings from the series of experiments we conducted to test the initial placement of the slider bar.
Several research projects have compared slider and/or VAS questions to other question types and found somewhat mixed evidence about how they affect responses. Several studies showed no measurement advantages for slider/VAS compared to other types of questions, such as radio buttons, and some studies found that sliders led to reduced data quality. One early study compared a radio button question with a slider question (Cook, Heath, Thompson, & Thompson, 2001). Specifically, the radio button item implemented a 9-point scale, and the slider questions had three variations: a 5-point scale, a 9-point scale, and a 100-point scale. The authors showed that the reliability (Cronbach’s α) of the radio button scales was higher than that of all three slider questions. Among slider questions, the 100-point slider had the highest reliability. The line length of VAS was also experimentally compared (4 cm vs. 10 cm) and no significant difference was detected between the longer and the shorter VAS (Kreindler, Levitt, Woolridge, & Lumsden, 2003). Response validity was also compared between radio button and VAS questions and no substantial difference was found (Bayer & Thomas, 2004). Also, this study found that the completion time was longer for the VAS than the radio button condition. Stanley and Jenkins (2007) found similar completion times for standard slider scales and slider-type questions with graphical designs. However, slider-type graphic questions tended to have higher mean ratings than standard slider scales.
In the study by Funke, Reips, and Thomas (2011), the question format (radio button vs. slider) was crossed with the visual display orientation (vertical vs. horizontal). They found that the break-off rate (percentage of respondents who quit the survey before finishing it) for slider scales was significantly higher than for radio button scales, for both orientations. Also, the discrepancy was larger for lower than higher educated respondents. In addition, the response time was longer for slider than radio button scales. As for the mean ratings, slider scales resulted in significantly higher means than radio button scales when displayed vertically for one of the two questions examined. A later study by Funke and Reips (2012) compared 5-point radio button versus continuous VAS with 250 gradations using semantic differential scales. They found similar item nonresponse, mean ratings, interitem correlations, and response times between the two question types. However, VAS respondents were more likely to edit their answers than radio button respondents, and the authors interpreted this to be an indication of higher answer precision with VAS than 5-point radio button scales. More recently, Funke (2015) compared three types of questions (VAS, radio button, and sliders) with three scale lengths (3-point, 5-point, and 7-point) in a 3 × 3 factorial design. The study showed that the break-off rates, time to complete the entire questionnaire, and item nonresponse were higher for the slider scales than the radio buttons. The slider scales performed especially poorly on mobile devices with touch screens. The VAS and radio buttons led to similar break-off rates. The mean ratings were similar across all three question types.
Several studies also tested different designs of the VAS. For example, Couper, Tourangeau, Conrad, and Singer (2006) tested two variations of VAS—with or without numeric feedback (the value was dynamically displayed as respondents moved the slider)—and compared them to radio buttons and open numeric input using a text box. Overall, the VAS was associated with higher break-off rates and item nonresponse rates than the radio button and text box conditions. Between the two VAS conditions, the one with numeric feedback was associated with more rounded answers (numeric answers that were divisible by 5) than the one without feedback (where rounded numerical answers generally reflect less precise thinking). Liu and Conrad (2016) expanded on the Couper et al. (2006) study by examining four types of VAS: feedback and numeric scale labels, feedback without numeric labels, no feedback and numeric labels, and no feedback and no numeric labels. They also tested a text box and a dropdown menu in two other conditions. The VAS with numeric labels produced higher levels of rounded answers than the other VAS conditions, although overall VAS questions reduced the frequency of rounded answers relative to the text box condition. Also, respondents answered the VAS question more quickly than the text box question.
Slider/VAS techniques have been used widely in clinical and public health research to measure attributes such as self-reported pain (Hjermstad et al., 2011). In general, these questions display a line with the end points labeled. The patients need to draw a tick mark on the line or drag a slider bar to indicate their level of pain (Bijur, Silver, & Gallagher, 2001; Gallagher, Liebman, & Bijur, 2001; Jensen, 2003, p. 200). Other types of pain scales, including verbal rating scales and numeric rating scales, were compared to the VAS/slider pain scale, and studies found similar reliability and validity across the different scale types for measuring pain (Ferreira-Valente, Pais-Ribeiro, & Jensen, 2011; Lara-Muñoz, de Leon, Feinstein, Puente, & Wells, 2004).
In the four experiments reported here, we compare different default slider values to each other and to a scale without a slider value as well as text input. If respondents do not click and move the slider bar for scales with a default value, their answers will be registered as the default value. Hence, we would observe a higher percentage of answers to be the same as the default values, which would change the distribution of answers. However, if the respondents interpret the initial value to be only a starting point rather than one of the many valid answers, as suggested by Funke (2015), we would expect fewer answers to match the initial value of the scale. In addition, this study examines how different ways of presenting the initial value in VAS questions affect the response mean. If the default value is also the final value at higher than chance frequencies, the overall mean estimates will likely be shifted toward the initial value. The time it takes to complete the questionnaire will also be calculated as an indicator of task difficulty. We interpret a longer time as an indication of greater difficulty completing the survey. It can also be argued that longer response times indicate a more careful and conscientious response process and therefore the time should not be solely taken as evidence of difficulty. Given that, the survey also asked respondents upon completing the questionnaire to rate their subjective difficulty and satisfaction with the survey task.
Slider question designers need to decide where to initially position the slider. Even providing no default value should be an active decision, based on empirical findings and theoretical considerations. Thus, the findings of this study should provide useful guidance for designers and should help reduce measurement bias for slider questions.
Experiment 1: 101-Point Scale
The experiment and measures
This survey experiment was conducted using FluidSurveys, an online survey platform owned by SurveyMonkey. The platform routes respondents to a Thank You page after completing a SurveyMonkey questionnaire; we recruited them from the Thank You page. 1 The survey included 18 questions in total, nine of which were used for the experiment. The survey was conducted between December 18, 2015, and January 4, 2016, in the United States. No incentive was provided. In total, 4,477 participants started this survey and 3,477 completed it. The analysis was conducted on just the completed cases. When assuming a 0.15 effect size (small), both analysis of variance (ANOVA) and χ2 tests have 0.99 power.
There were seven experimental conditions involving nine questions formatted as 101-point feeling thermometers, with 0 indicating very cold and unfavorable feelings and 100 indicating very warm and favorable feelings. The conditions consisted of six slider scales with different default values and one text box with no default value (see Appendix A for screenshots of sample questions). For the slider scale conditions, the default values were 0, 25, 50, 75, and 100; in one slider condition, there was no default value, requiring the respondent to click on the response scale to activate the slider. In the text box condition, respondents were instructed to type in their answers in a numeric format; the content of what respondents entered was not automatically checked. All slider bar conditions provided dynamic numeric feedback, that is, the position of the slider (0–100) was displayed and updated continuously as respondents dragged and dropped the slider.
Respondents in the six slider bar conditions received the following instruction: We would like to get your feelings toward different groups of people or organizations using something we call the feeling thermometer. For each group or organization, please give us a number from 0 to 100 where 0 represents very cold or unfavorable feeling and 100 represents very warm or favorable feeling. If you didn’t feel particularly warm or cold toward a group or organization, you would rate it at the 50-degree mark.
The analysis has five components. First, the mean responses were compared across the seven conditions. Second, the time it took to complete the survey was also compared across the seven conditions. Third, the percentage of answers that were the same as the initial values for the slider bars (in particular, the percentage of answers that were equal to 0, 25, 50, 75, and 100) was calculated for each condition. Fourth, the percentage of item missing data was calculated for each of the seven conditions. Lastly, the respondents’ subjective ratings of their experience completing the survey were compared across the different conditions. More specifically, at the end of the survey, respondents were asked to rate the difficulty of the survey (very easy, somewhat easy, somewhat difficult, and very difficult) and their satisfaction with the survey (very satisfied, somewhat satisfied, somewhat dissatisfied, and very dissatisfied), both using a 4-point scale. All the analyses were conducted using R by the first author.
Results
Table 1 presents the sample means for the substantive responses by experimental conditions for the nine variables. 2 As the results show, the means were similar across conditions, and the differences were not statistically significant. The effect size (η2) for the ANOVA test is less than .002 for all items. We also calculated the response means by treating default values as item nonresponse for all questions and conditions (Appendix F). As the table shows, the response means do not significantly differ across conditions.
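The recoding described above, in which answers equal to the default value are treated as item nonresponse before the mean is computed, can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analyses; the function name and the example answers are hypothetical and are not taken from the survey data.

```python
from statistics import mean

def mean_excluding_default(answers, default):
    """Mean of one item after recoding answers equal to the default as missing.

    `answers` holds one value per respondent; None marks item nonresponse.
    Returns None when no usable answer remains.
    """
    kept = [a for a in answers if a is not None and a != default]
    return mean(kept) if kept else None

# Hypothetical answers to one item in a slider condition with default 50.
answers = [50, 50, 30, 70, None, 90]

# Mean over all observed answers, counting the default as a valid answer.
raw_mean = mean(a for a in answers if a is not None)

# Mean after treating answers equal to the default as item nonresponse.
recoded_mean = mean_excluding_default(answers, default=50)
```

Comparing the two means for each item and condition shows how much the estimate depends on whether an answer equal to the default is counted as a substantive response.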
Response means and time to complete by experimental condition (101-point, Experiment 1).
The time to complete the questionnaire was also compared across the seven conditions (Table 1). Overall, the time to complete differed significantly by condition (χ2 = 41.85, p < .01). Pairwise Bonferroni comparisons showed that the time to complete was significantly longer in the text box condition than in the slider bar conditions. This suggests that it is probably more difficult for the respondents in the text box condition to generate an answer without the context provided by the numerical scale anchors and by the feedback displayed when the slider is moved. The difference among the slider bar conditions was not significant.
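As a rough illustration of what a Bonferroni correction implies with seven conditions, the sketch below counts the pairwise comparisons and derives the per-comparison significance threshold. The condition labels, the 0.05 family-wise level, and the helper name are assumptions made for the example; the article does not state which pairwise test or significance level was used.

```python
from itertools import combinations

# Condition labels are shorthand for the seven conditions of Experiment 1.
CONDITIONS = [
    "no default", "default 0", "default 25", "default 50",
    "default 75", "default 100", "text box",
]

# Every unordered pair of conditions is one comparison: 7 * 6 / 2 = 21.
pairs = list(combinations(CONDITIONS, 2))
n_pairs = len(pairs)

# Assumed family-wise alpha of 0.05, split evenly over all comparisons.
alpha_per_test = 0.05 / n_pairs

def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)
```

With 21 comparisons, an unadjusted p-value has to fall below roughly .0024 to be significant at the assumed family-wise level.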
The key research question this study was designed to answer was whether the default values of slider questions change the response distribution. Table 2 shows the mean number of responses that are the same as the default value for each condition. We use default 0 as an example to demonstrate how this number was calculated. We first dichotomized each answer as 0 or not 0. Then, we summed the number of answers to the nine questions that were 0 for each respondent. Last, we calculated the average number of 0 responses under each condition. As Table 2 shows, the mean numbers of 0 responses are similar across the seven experimental conditions, and the difference is not statistically significant (p = .5). For answers of 25, the mean numbers differed significantly between conditions (p < .001), and the pairwise Bonferroni comparison indicated that the difference was driven by the Default 25 condition, in which substantially more respondents answered 25 than in the other conditions. Similarly, when the default values for the slider questions were 50, 75, or 100, significantly more people “chose” those values as their responses than when the slider bars were defaulted to other values or had no default value, as the pairwise Bonferroni comparison showed. This is evident from the larger values in the main diagonal of the table, in which the default and selected values are the same, than in the other cells in each row. Interestingly, for the text box condition, significantly more respondents provided 50 as their answer. This is probably due to response rounding, as previously demonstrated by Liu and Conrad (2016). Missing data were only possible for the slider bar without a default value and the text box conditions. Overall, the missing data rates were very low and they were similar between the two conditions (p = .23). The effect size (η2) for the ANOVAs is less than .05 for all items.
Mean numbers of answers matching the default values by experimental condition (101-point, Experiment 1).
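The three-step calculation described above (dichotomize each answer, sum within respondent, average within condition) can be sketched as follows. This is an illustrative Python version, not the authors' R code; the function names and the four example respondents are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def count_matches(answers, value):
    """Steps 1 and 2: flag each answer as equal to `value` or not, then sum.

    Missing answers (None) never match, so they add nothing to the count.
    """
    return sum(1 for a in answers if a == value)

def mean_matches_by_condition(respondents, value):
    """Step 3: average the per-respondent counts within each condition."""
    by_condition = defaultdict(list)
    for condition, answers in respondents:
        by_condition[condition].append(count_matches(answers, value))
    return {cond: mean(counts) for cond, counts in by_condition.items()}

# Hypothetical respondents: (condition, answers to the nine items).
respondents = [
    ("default 25", [25, 25, 40, 25, 60, 25, 10, 25, 80]),   # five answers of 25
    ("default 25", [25, 30, 45, 70, 25, 90, 15, 55, 65]),   # two answers of 25
    ("default 50", [50, 25, 50, 75, 50, 100, 0, 50, 60]),   # one answer of 25
    ("default 50", [50, 50, 35, 20, 50, 85, 95, 5, 50]),    # no answers of 25
]

result = mean_matches_by_condition(respondents, value=25)
# result == {"default 25": 3.5, "default 50": 0.5}
```

Running the same function once per target value (0, 25, 50, 75, 100) yields one row of means per value, which is the layout in which a default effect would show up as larger entries where the target value equals the condition's default.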
Experiment 2: 7-Point Scales
The experiment and measures
In the second experiment, we asked the same questions as in Experiment 1 but with 7-point response scales. The survey was conducted using FluidSurveys, as in Experiment 1. The sample was recruited from workers on Amazon Mechanical Turk (MTurk), an online crowdsourcing labor market for tasks such as surveys. We posted the survey request on the MTurk website between July 18 and 21, 2016, in the United States, with a US$0.50 incentive upon survey completion. In total, 505 respondents started the survey and 490 completed it. Respondents were randomly assigned to one of the five experimental conditions: a slider with no default value, sliders with default values at 1, 4, and 7, and a text box question. In this case, the three default values for the slider bars were the end points and the midpoint of the scale (see screenshots in Appendix C).
The analysis was similar to Experiment 1. The response means were calculated first and compared across conditions to test whether there was an overall difference in response distribution. Second, the mean numbers of responses that were the same as the default values were calculated and compared across conditions.
Results
Table 3 presents the response means for each question by experimental condition. The response means were similar across conditions, and none of the differences was statistically significant. When the default values were coded as item nonresponse (Appendix G), the results were similar in that the means were not significantly different across conditions. This is similar to the findings in Experiment 1. The effect size (η2) for the ANOVA test is less than .006 for all items. The time to complete was also compared across the five conditions. Overall, it was a very short survey and it took the respondents on average less than 1 min to complete. The text box condition took the longest to complete, which was expected as respondents had to type in their answers. However, the difference between the text box and slider conditions was small and not significant (p = .13).
Response means and time to complete by experimental condition (7-point, Experiment 2)
As in Experiment 1, we also compared the mean numbers of answers that were the same as the default values by experimental condition (Table 4). Unlike with the 101-point scale in Experiment 1, the default value had little impact on response selection in this experiment. The likelihood of choosing 1, 4, or 7 was not higher for slider bars with default values set to 1, 4, or 7 than for the other conditions. The difference for the Select 4 condition was marginally significant but, interestingly, fewer respondents used 4 as their answer compared to the other conditions. This is the opposite of the results in Experiment 1. For the no default value slider bar and text box conditions, the item missing rates were very low and not significantly different from each other (p = .85). The effect size (η2) for the ANOVA test is less than .006 for all items.
Mean numbers of answers matching the default values by experimental condition (7-point, Experiment 2).
Experiment 3: 21-Point Scales
The experiment and measures
Experiments 1 and 2 showed somewhat different results for default slider values. In Experiment 1, substantially more answers were the same as the default values, especially when we set the default values greater than zero (i.e., 25, 50, 75, or 100). For the 7-point scale in Experiment 2, the impact of the default value on the responses disappeared. One factor that could explain the difference is the length of the scale. Because the 101-point lines potentially allowed finer discrimination and more nuanced interpretation than the 7-point scale used in Experiment 2, more people may have relied on the default values to make sense of the scale. The 7-point scale, on the other hand, is short and each response option is relatively easy to interpret. To test this idea, we conducted a third experiment where we asked the same set of questions but with a 21-point scale, that is, in between the 101- and 7-point scales in length. The experiment was conducted using FluidSurveys (see screenshots in Appendix D). The sample was recruited from MTurk between July 22 and 24, 2016, in the United States with a US$0.50 incentive paid upon completion. Respondents were again randomly assigned to the experimental conditions: slider bars with no default value or default values at 0, 5, 10, 15, and 20 and a text box condition. In total, 697 respondents completed the survey. When assuming a 0.15 effect size (small), both ANOVA and χ2 tests have 0.86 power.
Results
As in the previous two experiments, we started by examining the response means across the experimental conditions (Table 5). We found that none of the nine questions showed a significant difference between the means across the seven experimental conditions. The effect size (η2) for the ANOVA test is less than .009 for all items. When all the responses that were equal to the default values were treated as item nonresponse (Appendix H), the response means were likewise not significantly different across conditions. As for the time to complete, it took about 1 min to complete the survey and the difference across conditions was small and not significant (p = .36).
Response means and time to complete by experimental condition (21-point, Experiment 3).
The default values for this experiment were 0, 5, 10, 15, and 20. We calculated the mean number of questions that had an answer that was the same as the default value by experimental condition (Table 6). Interestingly, respondents in the text box condition were more likely to give 5 as their answer (p < .001) than respondents in all the other conditions. The Bonferroni pairwise comparison showed that 5 was significantly more common in the text box than in all the slider conditions, and among the slider conditions, there was no significant difference. Similarly, the text box condition elicited significantly more 10s as responses (p < .001) than the slider bar questions, except for the default zero condition. When the default value of the slider bar was 10, fewer respondents actually chose it as their answer. Responses of 0, 15, and 20 were not selected at different rates across the experimental conditions. The missing data rates for the no default value slider bar and text box conditions were also very low and similar (p = .94). The effect size (η2) for the ANOVAs was less than .01 for all items.
Mean numbers of answers matching the default values by experimental condition (21-point, Experiment 3).
Experiment 4: No Default Value With Three Starting Positions
The experiment and measures
The three experiments above showed that the default value could have an impact on the answers respondents provided. The results suggested that at least for the 101-point scale, the default value biased responses for all nonzero defaults. Thus, providing no default value might be a wiser and less biasing choice than providing one. Moreover, across the previous three experiments, the slider without a default value led to (1) overall response means that were similar to those in the slider conditions and (2) reasonably low rates of missing data. One limitation of the previous three experiments is that when respondents skipped a question, their answers would still be registered as the default value. In this experiment, we disentangled starting position from default value by eliminating feedback about the initial slider position, allowing us to position the slider on the scale without explicitly communicating this position to respondents.
This experiment was conducted using the SurveyMonkey platform and the sample was selected from SurveyMonkey Audience in the United States, a nonprobability online panel owned by SurveyMonkey. For each completed survey, US$0.50 was donated to a charity by the company. This experiment used a 3 × 3 design with the first factor being the scale length (7-, 21-, and 101-point scales) and the second factor being the starting position of the slider bar (left, middle, and right). The default position of the slider bar represented a valid answer. For example, for the 7-point scale, the middle was 4. However, unlike in the previous experiments, if respondents did not take any action on the slider, no answer would be registered, that is, the item would be recorded as missing. Also unlike in the previous experiments, no default value showed up on the slider before it was moved. Experiment 4 also expanded on the previous three experiments by testing all three scale lengths in one experimental setting. In total, 902 respondents completed the survey. The focus of this experiment was to test whether the item nonresponse rate is higher for some starting positions than others. When assuming a 0.2 effect size (small), the ANOVA test has a power of 0.88 and the χ2 test has a power of 0.89.
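The 3 × 3 design and the two recording rules can be made concrete with a short sketch. The scale ranges below are assumptions carried over from the earlier experiments (1 to 7, 0 to 20, and 0 to 100); the function names and the `keep_default_if_untouched` flag are hypothetical and do not describe the survey platform's actual behavior or API.

```python
import random
from itertools import product

# Assumed end points for each scale length (not stated for Experiment 4).
SCALES = {"7-point": (1, 7), "21-point": (0, 20), "101-point": (0, 100)}
POSITIONS = ("left", "middle", "right")

# Crossing the two factors gives the nine cells of the design.
CELLS = list(product(SCALES, POSITIONS))

def start_value(scale, position):
    """Scale value at which the slider handle is initially drawn."""
    low, high = SCALES[scale]
    return {"left": low, "middle": (low + high) // 2, "right": high}[position]

def recorded_answer(start, touched, final_value, keep_default_if_untouched):
    """Value stored for one item.

    Experiments 1 to 3: an untouched slider registers its default value.
    Experiment 4: an untouched slider registers nothing (None = missing).
    """
    if touched:
        return final_value
    return start if keep_default_if_untouched else None

def assign_cell(rng):
    """Randomly assign a respondent to one of the nine cells."""
    return rng.choice(CELLS)

cell = assign_cell(random.Random(0))  # fixed seed only for reproducibility
```

Under the first rule a skipped item is indistinguishable from a deliberate answer at the default; under the second rule the same behavior surfaces as item nonresponse, which is what allows missing data rates to be compared across starting positions.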
Results
We started by examining the mean responses for each question by experimental condition (Table 7). For the 101-point scale, the 21-point scale, and the 7-point scale, none of the nine questions showed a significant difference in mean. The effect size (η2) for the ANOVA test is less than .008 for all items of 101-point scale and less than 0.02 for all items of 21- and 7-point scales. As for the median time to complete, within each scale length, the difference across the starting position was small and not significant.
Mean numbers of missing data and time to complete by experimental conditions (Experiment 4).
The mean number of missing observations was calculated for each scale length and slider bar starting position separately (Table 8). Overall, missing data were quite rare and generally unaffected by the starting position of the slider. For the 101-point scale, the missing data rate was slightly higher when the slider bar started at the center, but for the 21-point scale, the missing data were higher when the initial slider position was on the left. For the 7-point scale, the rate of missing data was more evenly distributed across the three starting positions. The effect size (η2) for the ANOVA test is less than .03 for all items. Our general impression is that the starting position of the slider does not systematically affect the response means across scale lengths.
Mean number of missing observations by experimental condition (Experiment 4).
Respondent’s Evaluation
At the end of each survey, we asked the respondents to evaluate their experience participating in the survey by answering two questions. The first question asked how easy or difficult it was to complete the survey on a 4-point scale (very easy, somewhat easy, somewhat difficult, and very difficult). The second question asked the respondents how satisfied or dissatisfied they were overall with the survey, also using a 4-point scale (very satisfied, somewhat satisfied, somewhat dissatisfied, and very dissatisfied). The results are presented in Table 9. For Experiment 1, the two evaluation measures were similar across the seven conditions: easiness of the survey, χ2(18, N = 3,405) = 28.15, p = .06, and satisfaction with the survey, χ2(18, N = 3,366) = 21.11, p = .27. The majority of the respondents found the survey to be very easy or somewhat easy, and they were very satisfied or somewhat satisfied with the survey. Similarly, for Experiment 3, the respondents’ evaluations of the survey difficulty, χ2(18, N = 687) = 19.09, p = .39, and satisfaction level, χ2(18, N = 691) = 12.34, p = .83, were similar across conditions. For the 7-point scale in Experiment 2, although the satisfaction levels were similar, χ2(12, N = 486) = 11.72, p = .47, the percentages of respondents who found the survey to be very easy were higher for the sliders with default values of 1 and 4 than for the other default positions, χ2(8, N = 485) = 17.03, p = .03. However, the absolute difference was small and the majority of the respondents still found the survey to be easy in all conditions. For Experiment 4, the evaluations of difficulty and satisfaction were similar across the three starting positions for the 101-point scale (difficulty: χ2(8, N = 256) = 10.44, p = .11; satisfaction: χ2(6, N = 255) = 3.17, p = .79), the 21-point scale (difficulty: χ2(8, N = 309) = 6.92, p = .33; satisfaction: χ2(6, N = 307) = 9.23, p = .16), and the 7-point scale (difficulty: χ2(8, N = 332) = 2.39, p = .88; satisfaction: χ2(6, N = 330) = 4.99, p = .54).
Respondent’s evaluation of the survey by experimental conditions (Experiments 1, 2, 3, and 4).
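For readers less familiar with the reported test, the sketch below computes a Pearson χ2 statistic and its degrees of freedom from a condition-by-rating contingency table. It is a generic textbook calculation written in Python for illustration, not the authors' R code, and the example tables contain made-up counts. The degrees of freedom equal (rows − 1) × (columns − 1), so seven conditions crossed with four response categories give 6 × 3 = 18, which is consistent with the χ2(18, ...) values reported for Experiment 1.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table.

    `table` is a list of rows (conditions), each a list of counts per
    response category. Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up 7 x 4 table with identical rows: no association, 18 degrees of freedom.
uniform_stat, uniform_df = chi_square([[40, 30, 20, 10] for _ in range(7)])

# Made-up 2 x 2 table with a clear association.
small_stat, small_df = chi_square([[10, 20], [20, 10]])
```

Turning the statistic into a p-value requires the χ2 distribution function, which is omitted here to keep the sketch free of third-party dependencies.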
Discussion
Through four survey experiments, this study systematically and comprehensively examined the impact of the default value and starting position of the slider on responses. Overall, the findings showed that the default value of the slider question mattered under some conditions but not under others. In particular, for the slider questions with 101 response options, when the slider default values were set to 25, 50, 75, or 100, responses were significantly biased; that is, more respondents chose the default values as their answers than in the other conditions. Also, when the question was asked in a text box format, more respondents provided 50 as their answer. This is possibly because respondents replied with the most noncommittal answer possible, which is a kind of satisficing response. As Funke (2015) suggested, respondents could use the initial position of the slider as an anchoring point, which might lead to biased estimates. On the other hand, when respondents did not take any action but skipped the question altogether, the answer would be recorded as the default value. Either way, it would lead to a potentially biased survey result. The measurement impact of slider default values also depends on the analytical approach. Although the default values have limited impact on mean scores, if researchers analyze the data as categorical questions, the results are likely to be different depending on what the default values are. This means that although the default values altered the response selection, the effect was not strong enough to change the mean distribution.
For the 7-point scale, however, the effect of the default values disappeared. Respondents were neither more likely nor less likely to choose the default values. Also, the means were similar across different default values for the slider questions. Given the shortness of the scale and its widespread use in surveys, respondents might find it easy to interpret each response option and choose the one that best represents their opinion without relying on the position or default value of the slider. Unlike the 101-point scale, for which default values may simplify a potentially overwhelming scale, the 7-point scale offers no such incentive to rely on default values when formulating responses.
When examining the impact of the default values on the 21-point scale, yet a different pattern emerged. Although we did not see a clear pattern when the default values were 0, 15, and 20, when default values were 5 and 10, the respondents in the text box condition were more likely to provide those numbers as their answers. Respondents in the default 10 condition were actually less likely to pick 10 as their answer, compared to the other conditions. This finding is similar to what Funke (2015) found in his study. The explanation Funke proposed is also applicable here: Respondents are probably not treating the default value as a valid answer option and hence they are less likely to select it as their answer. However, what this explanation cannot address is why only the default 10 condition showed such a pattern while the other default values did not. Additional research is definitely necessary to solve this puzzle.
Missing data are another indicator investigated here. For the sliders without a default value, missing data are possible. In Experiments 1, 2, and 3, the rate of missing data for the slider without default values was low and comparable to the text box condition. It is also worth noting that the missing data rates for the 101-point scales were substantially higher than those for the 21- and 7-point scales (see Tables 2, 4, and 6). This may suggest that respondents find the longer slider scales more difficult than the shorter ones. In Experiment 4, the rate of missing data for the three slider starting positions (left, center, and right) was also low and reasonably similar across starting positions. Given this, using a slider question without a default value but with the slider positioned on the bar, similar to the slider questions in Appendix E, is a promising approach.
Time to complete is an additional measure analyzed in all of the experiments. The median time to complete was similar across conditions within each experiment. This is probably because the experiments were not very long. In all experiments, there were fewer than 20 questions and at most four webpages. In a longer survey, the completion time may differ across scales.
Lastly, respondents’ evaluation of the survey difficulty and their satisfaction level with the survey were collected and analyzed. The majority of the participants were satisfied with the survey and found it easy to complete. With one exception, the ratings did not differ significantly across conditions within each experiment: on the 7-point scale, the share of respondents who rated the survey very easy varied by default value, but the absolute difference was small.
Given the overall findings across the four experiments, we think the slider bar without a default value is the best choice because (1) it does not distort the answers, (2) the rate of missing data is low, (3) the means are similar to those for the other question formats, (4) the time to complete is no longer than for the other formats, and (5) respondents’ satisfaction is similar to that for the slider questions with default values. Although there are some differences in missing data depending on whether the slider was initially positioned on the left, right, or center, the overall rate of missing data is low, so we do not consider missing data to be a major concern. In addition, in the conditions where the slider bars have default values, we are not able to distinguish respondents who purposely selected the default values from those who skipped the question. Skipping a question, that is, item nonresponse, is often valuable information for both methodological and substantive research.
The two survey platforms used in this study allow designers to present respondents with slider questions that can be answered in the same manner as VAS, that is, by pointing and clicking. Dragging and dropping is not required for slider questions. Considering this and the extra layer of complexity introduced by the slider position, there is no obvious benefit of using slider questions over VAS. However, this study did not compare the different slider positions with the VAS. We believe this is an important topic for future research. There are a few other aspects that this study was not able to address. First, one other slider starting position worth testing is outside the slider bar. This is distinctly different from any of the conditions we explored in this study and may lead to different findings. Second, the way people interact with the variety of slider questions was not addressed in this study. For example, for those whose answers were the same as the default value, we cannot know whether those were the respondents’ actual answers or whether they skipped the questions. Paradata such as time to complete can be used as a proxy. Other research techniques such as eye tracking and mouse movement tracking can also be used to collect more direct and quantitative data. Third, numeric feedback can affect the way slider questions are answered (Liu & Conrad, 2016), so conditions without numeric feedback may show a different pattern. Fourth, other research has shown that slider questions perform differently on personal computers than on mobile devices (Buskirk & Andrus, 2014; Buskirk, Saunders, & Michaud, 2015). It would be worthwhile for future research to examine how the type of slider/VAS question might interact with the devices that respondents use to complete the survey. Fifth, the survey topics and population are limited. Future research should replicate these experiments with other questions and populations.
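As a rough illustration of the paradata proxy mentioned in the second point, the sketch below flags answers that equal the default value and were given very quickly as possible skips. This is only a heuristic under stated assumptions, not a procedure used in this study: item-level response times are assumed to be available, and the record fields (`id`, `answer`, `seconds`), the function name, and the 2-second threshold are all hypothetical.

```python
def flag_possible_skips(records, default, min_seconds):
    """Return the ids of answers that equal the default and were given quickly.

    `records` is a list of dicts with hypothetical keys:
      "id"      - respondent identifier
      "answer"  - recorded slider value
      "seconds" - time spent on the item (assumed item-level paradata)
    A flagged record is only a candidate skip; a fast, deliberate answer at
    the default value cannot be ruled out.
    """
    return [
        rec["id"]
        for rec in records
        if rec["answer"] == default and rec["seconds"] < min_seconds
    ]


# Invented records for a slider with a default value of 50.
records = [
    {"id": "r1", "answer": 50, "seconds": 1.2},  # default, very fast
    {"id": "r2", "answer": 50, "seconds": 9.8},  # default, but took time
    {"id": "r3", "answer": 72, "seconds": 0.9},  # fast, but moved the slider
]

flagged = flag_possible_skips(records, default=50, min_seconds=2.0)
print(flagged)
```

Any threshold of this kind would need to be justified empirically, for example against the response times observed in a condition without a default value.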
Survey practitioners and researchers have many choices when it comes to web survey design. It is important for them to choose question types, and to design them, in ways that have been shown to improve data quality, and to refrain from using features that do not. After all, the ultimate goal of a survey is to gather high-quality, less biased information about the population.
Footnotes
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Response Means by Experimental Condition (101-Point, Experiment 1).
| Means | No Default Value | Default 0 | Default 25 | Default 50 | Default 75 | Default 100 | Text Box | F | p |
|---|---|---|---|---|---|---|---|---|---|
| Small business | 41.0 | 46.5 | 38.2 | 39.2 | 43.5 | 43.8 | 43.9 | 1.68 | .20 |
| Labor unions | 43.0 | 44.8 | 41.2 | 39.4 | 40.7 | 41.7 | 46.8 | 1.70 | .19 |
| Military | 73.2 | 74.6 | 74.5 | 74.9 | 75.1 | 67.6 | 73.2 | 0.00 | .98 |
| Dem. Party | 45.7 | 49.7 | 41.8 | 41.7 | 42.7 | 37.2 | 44.1 | 1.53 | .22 |
| Rep. Party | 33.7 | 41.0 | 34.0 | 32.6 | 34.6 | 34.6 | 33.7 | 0.05 | .82 |
| Federal government | 32.3 | 34.4 | 28.5 | 26.1 | 31.3 | 30.6 | 30.7 | 0.00 | .99 |
| Supreme Court | 48.6 | 49.5 | 43.8 | 44.2 | 47.5 | 51.2 | 48.1 | 0.04 | .85 |
| Congress | 25.7 | 29.7 | 23.7 | 22.7 | 26.7 | 28.9 | 26.7 | 0.80 | .37 |
| Tea Party | 31.1 | 40.7 | 28.8 | 28.4 | 29.0 | 30.6 | 28.9 | 3.64 | .06 |
Note. Default values were coded as item nonresponse.
Appendix G
Response Means by Experimental Condition (7-Point, Experiment 2).
| Means | No Default Value | Default 1 | Default 4 | Default 7 | Text Box | F | p |
|---|---|---|---|---|---|---|---|
| Small business | 3.0 | 3.4 | 3.0 | 3.1 | 3.1 | 0.02 | .90 |
| Labor unions | 3.9 | 4.4 | 4.2 | 4.0 | 4.2 | 0.08 | .78 |
| Military | 4.8 | 4.6 | 4.7 | 4.2 | 4.6 | 1.87 | .17 |
| Dem. Party | 3.8 | 4.5 | 4.0 | 3.5 | 3.8 | 1.99 | .16 |
| Rep. Party | 3.0 | 3.6 | 2.8 | 2.7 | 2.8 | 3.72 | .05 |
| Federal government | 3.2 | 3.4 | 2.6 | 3.0 | 3.0 | 3.12 | .08 |
| Supreme Court | 4.2 | 4.5 | 4.0 | 4.1 | 4.2 | 1.12 | .29 |
| Congress | 2.9 | 3.3 | 2.5 | 2.8 | 2.8 | 2.10 | .15 |
| Tea Party | 2.9 | 2.1 | 2.4 | 2.7 | 2.5 | 0.18 | .67 |
Note. Default values were coded as item nonresponse.
Appendix H
Response Means by Experimental Condition (21-Point, Experiment 3).
| Means | No Default | Default 0 | Default 5 | Default 10 | Default 15 | Default 20 | Text Box | F | p |
|---|---|---|---|---|---|---|---|---|---|
| Small business | 5.9 | 6.8 | 5.8 | 6.2 | 6.5 | 6.6 | 7.2 | 3.49 | .06 |
| Labor unions | 10.5 | 10.9 | 10.5 | 10.8 | 10.4 | 9.4 | 10.0 | 2.83 | .09 |
| Military | 12.1 | 11.5 | 10.9 | 10.4 | 11.0 | 11.1 | 11.3 | 1.15 | .28 |
| Dem. Party | 10.7 | 10.4 | 9.0 | 11.0 | 9.4 | 10.2 | 10.0 | 0.49 | .48 |
| Rep. Party | 5.3 | 6.6 | 5.3 | 4.4 | 5.1 | 4.8 | 5.8 | 0.30 | .59 |
| Federal government | 7.4 | 7.4 | 6.1 | 7.4 | 6.4 | 7.3 | 7.5 | 0.07 | .79 |
| Supreme Court | 11.5 | 10.4 | 10.5 | 10.3 | 11.4 | 10.8 | 10.4 | 0.65 | .42 |
| Congress | 6.3 | 6.5 | 5.5 | 6.6 | 6.0 | 6.5 | 6.3 | 0.06 | .81 |
| Tea Party | 4.8 | 7.3 | 3.8 | 3.6 | 4.9 | 4.5 | 5.4 | 0.27 | .61 |
Note. Default values were coded as item nonresponse.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
