Abstract
Questions using agree/disagree (A/D) scales are ubiquitous in survey research because they save time and space on questionnaires through display in grids, but they have also been criticized for being prone to acquiescent reports. Alternatively, questions using self-description (SD) scales (asking respondents how well a statement describes them from Completely to Not at All) can also be presented in grids or with a common question stem, and by omitting the word agree, SD scales may reduce acquiescence. However, no research has examined how response patterns may differ across A/D and SD scales. In this article, we compare survey estimates, item nonresponse and nondifferentiation across these two types of scales in a mail survey. We find that SD scales outperform A/D scales for non-socially desirable questions that ask about positive topics. For questions that ask about negative topics, we find that estimates for SD items are significantly more negative than A/D items. This may occur because the SD scale is unipolar and has only one negative response option (Not at All), whereas the bipolar A/D scale has two negative response options (Disagree and Strongly Disagree). We recommend that researchers use SD scales for non-socially desirable positive valence questions.
Introduction
Survey and market researchers collecting data on attitudes and opinions have a variety of scales from which to choose. For example, to measure self- versus other-orientation, a researcher could ask respondents to report how much they agree or disagree with the statement, “Helping others is important to me.” Alternatively, they could ask respondents to report how well the statement “Helping others is important to me” describes them.
While these two types of questions—agree/disagree (A/D) and self-description (SD)—have some features in common, they also differ in important ways that may affect the quality of the data reported and the circumstances under which each should be used. That is, understanding how a scale’s design features affect response behavior and data quality can guide a surveyor’s choice of scale (DeCastellarnau, 2018). Although there is a long history of empirical evaluation of A/D questions (e.g., Alwin et al., 2018; Berg & Rapaport, 1954; Lenski & Leggett, 1960; Saris et al., 2010), SD scales are relatively uninvestigated. Moreover, we know of no research that directly compares the two scale types in self-administered surveys.
Our research takes an important first step toward evaluating SD scales. We compare A/D and SD scales to understand how responses differ across the two scales when measuring the same construct. Using six positive valence and five negative valence items in a mail survey, we compare survey estimates (mean and standard deviation as in Christian & Dillman, 2004) and two undesirable response behaviors—item nonresponse and nondifferentiation (the most extreme form of which is straightlining)—across A/D and SD questions. Through our investigation, we seek to answer the following research questions:
Research Question 1 (RQ1): Do survey estimates (i.e., mean and standard deviation) differ between A/D and SD scales?
Research Question 2 (RQ2): Does item nonresponse differ between A/D and SD scales?
Research Question 3 (RQ3): Does nondifferentiation differ between A/D and SD scales?
Background of the study
A/D scales
Questions using A/D scales ask respondents to indicate the degree to which they agree or disagree with a particular statement (e.g., “Helping others is important to me”) (see Table 1 for an example). A/D scales are popular because they can easily measure most constructs (Saris et al., 2010), allow multiple items to be placed together with a common question stem, and can be designed to have the same number and labels of response options (Dillman et al., 2014). In self-administered modes, agree/disagree items are often presented in a grid, saving questionnaire space. In addition, as A/D scales have been used widely for many years, researchers often include them in surveys to maintain trend data (Holbrook, 2013).
Table 1. Example of agree/disagree and self-description scales.
Despite their ubiquity, A/D scales have been criticized for having poor data quality. For example, A/D scales are prone to acquiescence bias because respondents are artificially more likely to agree with a statement than disagree (Berg & Rapaport, 1954; Holbrook et al., 2003; Krosnick & Presser, 2010; Lenski & Leggett, 1960; Narayan & Krosnick, 1996; Saris et al., 2010; Schuman & Presser, 1996), particularly with positive valence statements (i.e., when it would be socially acceptable to agree with an item) (Ross & Mirowsky, 1984). However, when it is not socially acceptable to agree with the premise of an item (i.e., negative valence items such as psychological distress), the acquiescence effect disappears; respondents instead endorse categories that present them in the best light.
A/D scales are also bipolar (Dillman et al., 2014), meaning that attitudes are measured across two dimensions (agreement and disagreement), with a zero point in the middle. This zero point can either be explicit (e.g., a Neither Agree nor Disagree option) or implicit (i.e., the point where response options change from agree to disagree). Bipolar scales have demonstrated questionable item reliability, potentially because respondents must concurrently evaluate the direction (e.g., Agree vs. Disagree), intensity (e.g., selecting Strongly or not), and neutrality (e.g., selecting the 0-point or not) of each assessment (Alwin et al., 2018).
SD scales
SD scales represent a potential alternative to A/D scales. Questions using SD scales ask respondents to indicate how much a particular statement describes them. For a 5-point SD scale, response options can include Completely, Mostly, Somewhat, A Little Bit, and Not at All (see Table 1). SD scales lack a “neutral” response option, but since they ask respondents to make a self-evaluation, a topic on which respondents are likely well-informed, respondents may be less likely to need a “neutral” category.
The design of SD scales makes them an attractive choice for collecting attitude and opinion data from respondents. As SD questions share a common stem and response options, like A/D questions, they can be presented efficiently as a set. However, unlike A/D scales, SD scales omit the word agree, reducing the likelihood of acquiescence, and they are unipolar, which may increase reliability (Alwin et al., 2018).
SD scales have been used in research on education (Pintrich et al., 1993), health and aging (Burgard et al., 2009), personality (Kruger & Gilovich, 2004), religion (Astin et al., 2011), and Need for Cognition/Need to Evaluate scales (Nir, 2011) and in major surveys including the General Social Survey (Smith et al., 2018), the Medical Expenditure Panel Survey (Agency for Healthcare Research and Quality, 2017), and the World Values Survey (Inglehart et al., 2014). Despite the possible benefits of SD scales and their ubiquity in research, to our knowledge no published research has compared SD and A/D questions in a mail survey. This article adds to the literature on A/D and SD scales by comparing mean values, standard deviations, item nonresponse, and nondifferentiation across the two scales in a mail survey.
Hypotheses
We start with the hypothesis that for positive valence items (top panel of Figure 1), SD questions will produce lower estimated means, indicating less positive answers, than A/D questions (H1a). We expect this for two reasons. First, approximately 10% of respondents to A/D scales engage in acquiescence that biases responses in a positive direction (Krosnick & Presser, 2010). By excluding the word “agree,” SD scales reduce the chance for such acquiescence. Second, the scales differ in polarity: the A/D items in our surveys have only two positive response options (Strongly Agree and Agree), whereas the SD items have four (Completely, Mostly, Somewhat, and A Little Bit). Positive responses to the positive valence SD items can therefore spread across the top four response options rather than concentrating in only the top two.

Figure 1. Positive and negative valence mail survey items with agree/disagree and self-description scales.
However, a competing hypothesis is that means (and by extension, rates of top-two-box reporting) for positive valence SD and A/D scales will not differ (H1b). The lower responses due to reduced acquiescence and the provision of four positive response options in the SD scale may be offset by increased positive responding due to social desirability, especially on more sensitive items. In particular, socially desirable reporting may increase if the SD scale feels more personal and self-evaluative than the A/D scale (i.e., the degree of closeness differs across the scales). For example, it may be more extreme for a respondent to say that the statement “Helping others is important to me” describes them Completely than it would be to Strongly Agree with the statement. That is, the wording of the response options themselves may catalyze a need to edit responses due to social desirability. In sum, we expect small or no differences between estimates from A/D and SD scales for positive valence items that are more socially desirable (H1b) and larger differences for items that are less socially desirable (H1a).
Acquiescence is less concerning for negative valence items (bottom panel of Figure 1) because it is not socially acceptable to agree with a negative construct (e.g., losing one’s temper) (Ross & Mirowsky, 1984). Thus, for these items, we hypothesize (H2a) no difference in means between SD and A/D scales because there will be no positive bias in the A/D items without acquiescent behavior. However, two additional features may affect responses. First, the increased personal closeness of SD items may invoke social desirability concerns, pushing responses to this scale toward the most negative response option (i.e., Not at All), especially for items where social desirability concerns are greatest. Second, because the SD scale only has one negative response option (Not at All) compared with the A/D scale’s two (Strongly Disagree and Disagree), all negative responses have to be concentrated in a single response option (i.e., a difference due to scale polarity), potentially exacerbating a negative shift in estimates. Thus, counter to H2a, we alternatively hypothesize (H2b) that the SD scale will have lower means than the A/D scale on negative valence items because responses will be more concentrated in the most negative response option (e.g., more bottom-box reporting).
Researchers strive to measure the variability that exists in a population for a particular construct, but question design can affect these estimates of variation. In a unipolar SD scale, four positive response options allow for capturing more variation for positive valence items compared with only two positive response options in the bipolar A/D scale. Thus, we hypothesize (H3) that responses to SD scales will have larger standard deviations than responses to A/D scales for positive valence items.
In contrast, respondents using an SD scale have only the zero point to distance themselves from a negative valence item compared with two response options for those using a bipolar A/D scale. Thus, we hypothesize (H4) that SD scales will have smaller standard deviations than A/D scales for negative valence items.
High item nonresponse commonly indicates low data quality (Groves, 1989). Question topic, structure (e.g., open vs. closed-ended), and complexity are associated with item nonresponse (Dillman et al., 2002). However, in this study, A/D and SD items are generally equivalent on each of these features. The same questions are asked to all respondents, and both scales use the same structure (i.e., they are both 5-point closed-ended scales). Thus, we hypothesize (H5) that there will be no differences in the rates of item nonresponse across SD and A/D scales for positive and negative valence items.
Several of the same features that we expect to influence means and standard deviations may also influence nondifferentiation, which is the tendency to select the same or nearly the same response option across a set of items when respondent burden is high and/or motivation is low (Krosnick, 1991). For example, in a set of positive valence items, the increased tendency for acquiescence, which should be constant over items, and the fact that there are only two positive response options in A/D scales will concentrate responses in the Strongly Agree and Agree response options, increasing nondifferentiation. Having more positive response options in SD scales will allow for more variation in positive responses across items, potentially reducing nondifferentiation. Moreover, while SD scales are less prone to acquiescence, they are perhaps more prone to social desirability, an effect that will vary over items of differing sensitivity. As a result, in a set of positive valence items of varying sensitivity, we expect less nondifferentiation (i.e., more differentiation) in SD than in A/D scales (H6). Scale polarity is expected to be the primary driver of nondifferentiation for negative valence questions because acquiescence is not expected. SD scale respondents have only one negative response option to reject the premise of a negative valence question, while A/D respondents have two. Therefore, we hypothesize (H7) that the SD scale will have more nondifferentiation than the A/D scale in a set of negative valence items. Our hypotheses are summarized in Table 2.
Table 2. Summary of hypotheses.
SD: self-description; A/D: agree/disagree.
Method
Data
The data for this study come from the National Health, Wellbeing and Perspectives Study (NHWPS) survey. NHWPS was a 12-page questionnaire (including a cover page) with 77 questions fielded by the University of Nebraska–Lincoln’s Bureau of Sociological Research between April 10 and August 12, 2015. A total of 6,000 addresses were randomly sampled from the USPS Delivery Sequence File, and 1,002 respondents completed and returned the survey (American Association for Public Opinion Research [AAPOR] RR1 = 16.7%). The “next birthday” within-household selection method was used to randomly select the survey respondent.
NHWPS asked questions related to health, mental health, well-being, victimization, current events, and demographics. There were two versions of the questionnaire. In Version 1, two grid questions were presented using an A/D scale (response options: Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, Strongly Disagree). In Version 2, the same two grid questions were presented with an SD scale (Completely, Mostly, Somewhat, A Little Bit, Not at All). The first grid question contained six items that asked about topics with a positive valence (e.g., “I consider myself a good person.”). The second contained five items that asked about topics with a negative valence (e.g., “I lose my temper pretty easily.”) (see Figure 1). A total of 3,000 addresses were randomly assigned to each questionnaire version (Version 1: n = 522, AAPOR RR1 = 17.4%; Version 2: n = 480, AAPOR RR1 = 16.0%).
Substantive responses are examined using the mean and standard deviation for each of the 11 survey items. Both scales are coded 1–5 with 1 assigned to Strongly Disagree/Not at All and 5 assigned to Strongly Agree/Completely. A top-two-box reporting variable is coded 1 if the respondent selected one of the highest two response options (i.e., answers of 4 or 5 on either scale), and a 0 otherwise. Similarly, a bottom-two-box reporting variable is coded 1 for answers using the lowest two response options (i.e., answers of 1 or 2 on either scale), and 0 otherwise. Finally, since the unipolar SD scale has only one negative response option (i.e., Not At All), we examine bottom-one-box reporting as a sensitivity analysis; this variable is coded 1 for answers using the lowest response option (i.e., answers of 1 on either scale), and 0 otherwise.
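The box-reporting variables described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name `box_indicators` is hypothetical, and returning `None` for a missing answer is an assumption, since the text does not say how item nonresponse is handled in these indicators.

```python
def box_indicators(answer):
    """Return (top_two_box, bottom_two_box, bottom_one_box) for one answer.

    `answer` is coded 1-5, where 1 = Strongly Disagree / Not at All and
    5 = Strongly Agree / Completely. A missing answer is passed as None and
    yields None (an assumption; the source does not describe this case).
    """
    if answer is None:
        return None
    if answer not in (1, 2, 3, 4, 5):
        raise ValueError(f"answer must be 1-5 or None, got {answer!r}")
    top_two = int(answer >= 4)      # answers of 4 or 5
    bottom_two = int(answer <= 2)   # answers of 1 or 2
    bottom_one = int(answer == 1)   # lowest option only
    return top_two, bottom_two, bottom_one


# Illustrative answers from one hypothetical respondent
for a in (5, 3, 1, None):
    print(a, box_indicators(a))
```

Because both scales share the 1–5 coding, the same function applies to A/D and SD answers.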
Nondifferentiation is operationalized in two ways: first, as what we call “set variance,” and second, as straightlining. Set variance is calculated by first obtaining the standard deviation of each respondent’s answers (missing items are excluded) to each set of items within a grid (calculated using the rsd [
The main independent variable is an indicator of the scale version respondents used to complete the items with A/D coded 0 and SD coded 1.
We control for respondents’ age (grand mean centered) and level of education (using three dichotomous indicators: high school graduate or less, some college, or college graduate or more) (see Supplemental Appendix A for question wording). Missing, don’t know, and refusal answers for age (12%) and education (6%) were imputed to the mean (age) or modal category (education). The overall average age of respondents was 57.35 years and did not differ significantly between the A/D respondents (x̅ = 57.46 years) and the SD respondents (x̅ = 57.24 years; t = 0.218, p = .828) (Table 3). The majority of respondents were at least college graduates (50.80%), and education also did not differ significantly across experimental versions of the survey (χ² = 2.565, p = .277).
Table 3. Summary of respondent characteristics.
Note: There were no significant differences in respondent age or education between the A/D and SD versions.
Finally, because some of our hypotheses hinge on social desirability of items, for interpretation purposes, we provide ratings of the social desirability of each item. Ten graduate students were briefed on the concept of social desirability and rated each of the 11 items using the scale Not At All Socially Desirable (=1), A Little Bit Socially Desirable (=2), Somewhat Socially Desirable (=3), Very Socially Desirable (=4), and Extremely Socially Desirable (=5). We report the mean scores for each item in Table 4 alongside our first set of results. The intraclass correlation (ICC) estimating the consistency of agreement among coders is 0.66 (p < .01), indicating good reliability (Cicchetti, 1994).
Table 4. Item social desirability ratings and means.
SE: standard error; SD: self-description; A/D: agree/disagree.
For items with the A/D scale, Strongly Agree = 5, Agree = 4, Neither Agree nor Disagree = 3, Disagree = 2, and Strongly Disagree = 1. For the SD scale, Completely = 5, Mostly = 4, Somewhat = 3, A Little Bit = 2, and Not At All = 1.
*p < .05; **p < .01; ***p < .001.
Analysis methods
We begin by examining unweighted response distributions across the scale versions. For each of the 11 items, we use t-tests for bivariate analyses and ordinary least squares (OLS) regression models predicting each item’s average response for multivariate analysis. In the multivariate OLS models, scale version, respondent age (centered), and education are entered as predictors. We then examine item variance across scale versions using the test for equality of standard deviations (sdtest in Stata). As a sensitivity analysis, we also examine rates of top-two-box, bottom-two-box, and bottom-one-box reporting across the scales using tests of proportions (prtest).
Next, we compare item nonresponse rates across scale versions using tests of proportions for bivariate analyses and binary logistic regression models for multivariate analysis.
Finally, we examine set variance and straightlining. We compare set variance for each of the two question sets using t-tests for bivariate analyses and OLS regression models for multivariate analysis. To examine straightlining across scales, we use tests of proportions for bivariate analyses and binary logistic regression models for multivariate analysis.
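To make the two nondifferentiation measures concrete, the following Python sketch computes a within-respondent set variance and a straightlining flag for one grid. It is a sketch under stated assumptions rather than the procedure used in the study: the description of set variance in the Data section is incomplete, so the use of the sample standard deviation (n − 1 denominator) squared, the rule that a respondent needs at least two answered items, and the function names are all assumptions introduced here for illustration.

```python
from statistics import variance


def set_variance(answers):
    """Variance of one respondent's answers across the items in a grid.

    Missing items (None) are excluded, as in the text. Using the sample
    variance (the squared sample standard deviation) is an assumption.
    Returns None when fewer than two items were answered.
    """
    observed = [a for a in answers if a is not None]
    if len(observed) < 2:
        return None
    return variance(observed)


def is_straightliner(answers):
    """True if every answered item in the grid has the same value.

    Treating respondents with fewer than two answered items as
    undetermined (None) is an assumption made for this sketch.
    """
    observed = [a for a in answers if a is not None]
    if len(observed) < 2:
        return None
    return len(set(observed)) == 1


# Two hypothetical respondents answering a six-item grid coded 1-5
grid_answers = [
    [4, 4, 4, 4, 4, 4],        # identical answers: zero set variance
    [5, 4, 3, 4, None, 2],     # one missing item, differentiated answers
]
for answers in grid_answers:
    print(set_variance(answers), is_straightliner(answers))
```

Straightlining is the limiting case of low set variance: a respondent who gives the same answer to every item has a set variance of exactly zero.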
Results
Survey estimates
Table 4 displays means overall and by version for each item, and bivariate tests of differences across scale versions. Higher means indicate responses leaning toward Strongly Agree/Completely. For the positive valence question (Q6), three of the six items had significantly lower means in the SD version, consistent with H1a. These results potentially indicate that the SD scale has (a) less acquiescence and/or (b) more spread of positive responses across positive response options. These differences were generally on the order of one-quarter of a standard deviation or less, corresponding to a Cohen’s d of 0.25 or less, a “small” effect size. For instance, the estimated mean response to “I am a community leader” was 3.764 using the A/D scale (between Agree and Neither Agree nor Disagree), but dropped slightly to 3.533 using the SD scale (between Somewhat and Mostly), a difference of 0.231 (the pooled standard deviation for this item is approximately 1). Notably, these three items had the lowest mean social desirability ratings of the items in the Q6 set, consistent with our hypothesis that the differences between A/D and SD scales would occur on items that were lower in social desirability (H1a).
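As a worked illustration of the effect size calculation for the community leader item, using only the figures reported above (the pooled standard deviation is given only as approximately 1, so the result is approximate):

```latex
d = \frac{\bar{x}_{\mathrm{A/D}} - \bar{x}_{\mathrm{SD}}}{s_{\mathrm{pooled}}}
  \approx \frac{3.764 - 3.533}{1}
  = 0.231
```

This falls just under the one-quarter standard deviation benchmark for a small effect.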
The remaining three items had the highest social desirability ratings of the Q6 set. One item (Q6A) had a statistically significantly higher mean with the SD scale (i.e., trending toward the socially desirable Completely response), although the difference in average responses was quite modest (only 0.07, or about 0.13 standard deviations, a small effect size). Means for the other two items (Q6B and Q6E) did not significantly differ across scale versions. These results are consistent with our hypothesis that respondents may have engaged in socially desirable reporting using the SD scale on items that were higher in social desirability (H1b). This response behavior may have led to no differences between A/D and SD means for Q6B and Q6E, or in the case of Q6A, a slightly higher SD mean.
Consistent with H2b, for all five of the negative valence items (Q33), SD items had significantly lower means than their A/D counterparts at the p < .001 level, with differences on the order of half of a standard deviation, or a “medium” effect size. For instance, the estimated mean for “When someone treats me badly, I think it is okay to treat them badly” was 2.246 for the A/D scale (between Disagree and Neither Agree nor Disagree), but 1.836 for the SD scale (between Not at All and A Little Bit). This may indicate that respondents using the unipolar SD scale to answer negative valence items anchor their answers on the lowest (Not at All) response option since it is the only option that rejects the item’s premise. These differences are larger for items with higher social desirability ratings. Multivariate results (see Supplemental Appendix B) and top-two-box, bottom-two-box, and bottom-one-box results (Supplemental Appendices C and D) for these items generally follow the same pattern.
Table 5 displays the standard deviation and variance for each item overall and by treatment. Consistent with H3, a majority of the mail survey positive valence items (four of the six items) were statistically significantly more variable with the SD scale (p < .001). This supports the hypothesis that by providing more positive response options, the SD scale fosters more variability across respondents for positive valence items. Only two of the five negative valence items had statistically significantly less within-item variance with the SD scale (p < .01). Thus, contrary to H4, having only a single response option to express a “no” answer with the SD scale does not reduce within-item variance for a majority of negative valence items.
Table 5. Item variance by scale.
SD: self-description; A/D: agree/disagree.
**p < .01; ***p < .001.
Item nonresponse
The item nonresponse rates for all 11 items were very low, ranging from 0.6% to 2.3% (Supplemental Appendix E), and, with the exception of one item, did not significantly differ across the A/D and SD scales (supporting H5). The exception is Q33D (ability to stay calm during a disagreement), in which the SD scale had a higher item nonresponse rate (SD = 2.292%, A/D = 0.575%, z = 2.31, p < .05), but this difference was small (less than 2 percentage points) and likely due to Type I error. These results are confirmed in multivariate models controlling for age and education (Supplemental Appendix F).
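The test of proportions for Q33D can be reproduced approximately with a pooled two-sample z-test. The sketch below is illustrative, not the authors' code. The counts of 11 and 3 nonrespondents are assumptions back-calculated from the reported rates and the version sample sizes (11/480 ≈ 2.292% and 3/522 ≈ 0.575%), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions.

    Uses the pooled proportion for the standard error, the usual choice
    when testing the null hypothesis of no difference.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed counts: 11 of 480 SD respondents and 3 of 522 A/D respondents
# skipped Q33D (inferred from the reported percentages, not stated in the text).
z, p = two_proportion_ztest(11, 480, 3, 522)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumed counts, the statistic rounds to the reported z = 2.31 with p < .05.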
Nondifferentiation
The top panel of Table 6 shows the bivariate results comparing set variance across scales (i.e., variance across items in a set within respondents). For the positive valence questions, the SD set of items had significantly more response variance (i.e., less nondifferentiation) than the A/D set (t = 6.15, p < .001) as hypothesized (H6). This indicates that the SD scale encouraged more variability in responses across items in these sets. Contrary to H7, variance across the items in the negative valence grid did not differ across treatments (t = 0.51, p = .611). Results for both the positive and negative valence items are confirmed in multivariate models controlling for respondent age and education (see Supplemental Appendix G).
Table 6. Nondifferentiation by scale.
SD: self-description; A/D: agree/disagree; SE: standard error.
*p < .05; ***p < .001.
The bottom panel of Table 6 shows the straightlining results. Overall, 16.63% of respondents straightlined on the positive valence mail grid. Consistent with H6, the SD scale had 4.9 percentage points fewer respondents straightlining for this question (z = 2.05, p < .05). For the negative valence grid, the overall straightlining rate was 13.02%, and the SD scale had 5.0 percentage points fewer respondents straightlining than the A/D scale (10.381% vs. 15.414%, z = 2.35, p < .05), contrary to H7. Once again, these results are confirmed by multivariate analyses controlling for age and education (see Supplemental Appendix G).
Discussion
Researchers’ choice of response options for scales can affect the data they obtain. In this study, we add to the literature describing the impact of that choice in a mail survey by comparing the effects of SD and A/D scales on response distributions, item nonresponse, and nondifferentiation (including straightlining) using items with both positive and negative valence. This is the first study to our knowledge to test the implications of using an SD scale versus an A/D scale.
In support of H1a, our results show that SD scales obtain different responses than A/D scales for the three positive valence mail items that were scored as the least socially desirable: “I consider myself a leader,” “Helping my community is important to me,” and “I can make a positive difference in the community around me.” When compared with A/D scales, SD scales for these three items had lower estimated means (consistent with H1a), larger standard deviations (supporting H3), and equivalent rates of item nonresponse (supporting H5). In addition, SD scales produced less nondifferentiation than A/D scales in the positive valence items (consistent with H6). These differences may have resulted from reduced acquiescence with the SD scale, the difference in polarity across the two scales (i.e., SD scales better capture variability that exists in the population on these items), or both. Since SD scales overcome many of the downsides of A/D scales while still retaining many advantages, we recommend that survey researchers collecting rating data for non-socially desirable positive valence items use SD scales instead of A/D scales.
For the three most socially desirable positive valence mail items (i.e., “I consider myself a good citizen,” “I consider myself a moral person,” “Helping others is important to me”), the self-evaluative nature of the SD scales seems to result in socially desirable responding. This results in statistically equal (two items) or slightly higher (one item) estimated means in the SD treatment compared with the A/D treatment (generally supporting H1b). However, presenting these as SD rather than A/D items also resulted in less nondifferentiation (H6). Although the potential for measurement bias in response distributions seems similar between these two scale types (i.e., due to acquiescence for A/D, and social desirability for SD), this reduction in nondifferentiation leads us to cautiously recommend using SD scales over A/D scales for socially desirable items with a positive valence. However, we note that further research is needed to replicate these findings and should also examine if there is a level of sensitivity at which the increased social desirability in SD items puts them at a disadvantage relative to A/D items.
We are more skeptical about using the SD scale for negative valence items. Consistent with H2b, the SD scale produced significantly lower estimated means for all negative valence items. This likely happened because the SD scales are unipolar, and negative responses were concentrated in a single negative response option at the low end of the scale. With the bipolar A/D scale, responses are spread among multiple negative response options. This concentration of negative responses in a single category also likely led to smaller standard deviations for negative valence items asked using the SD scale, though these differences were significant for only two of these five items (i.e., offering limited support for H4). In a welcome departure from H7, however, the SD scale was equal to the A/D scale in terms of nondifferentiation for negative valence items when it was operationalized as set variance, and it did result in a slight reduction in the percentage of respondents who straightlined.
Taken together, our results are generally positive regarding the utility of SD scales, but they also highlight the need to jointly consider the valence and social desirability of the items and the scale type (i.e., unipolar or bipolar, number of scale points, etc.) when making such decisions. Offering more response options that match an item’s valence tends to increase within-item variance and can decrease nondifferentiation and straightlining. Thus, while SD scales appear to be advantageous in positive valence items, the fact that they are unipolar may be to their detriment in negative valence items.
While our results suggest that SD scales are a better choice than A/D scales in some circumstances, A/D scales are ubiquitous in marketing and survey research and may still have advantages over SD scales in certain cases. For example, respondents may be more familiar with A/D scales and may answer them more quickly than SD scales. Clients may also be more accustomed to A/D scales and prefer them over SD scales in surveys they sponsor. In addition, many psychometric instruments have been validated using A/D scales (e.g., customer satisfaction; Nicholls et al., 1998), and would need to be revalidated if researchers wished to use SD scales for these instruments instead. Finally, historical trend data may rely on questions with A/D scales. Bridge studies would be required to explore how trend lines might change when replacing A/D scales with SD scales.
There are several important avenues for future research on the topic of SD scales. For example, we examined a limited number of questions. Furthermore, most questions in this study were rated as “A Little Bit” or “Somewhat” socially desirable by our coders. Replication of this study on additional positive and negative valence items and additional questions from the extreme (i.e., high and low) ends of the social desirability range would provide more insights into the joint role of valence and social desirability on answering. In addition, the social desirability of each question in this study was rated by graduate students. It is possible that our respondents, who are generally older than our graduate student raters, might view the social desirability of these questions differently. These results would be strengthened by asking the respondents themselves to rate the social desirability of these questions (as in Kreuter et al., 2008).
One concern in studies like this is that respondent motivation may wane in long questionnaires, affecting indicators of data quality. However, the battery items we analyze in this study were presented toward the beginning of the questionnaire (i.e., in the first half) where respondent motivation likely is highest (Krosnick, 1991). Future work should nonetheless replicate this study on a shorter questionnaire. Also, the presentation order of the battery questions was constant in this study (i.e., the positive valence battery was always displayed before the negative valence battery), as was item order within each battery (e.g., Q6A was always displayed before Q6B, Q6B before Q6C, and so on). Future studies should vary the order of the batteries within the questionnaire and the items within the batteries to disentangle any potential order effects from the differences across items that we observe here.
Future research should also move beyond this necessary first step of examining response patterns at the individual item level to assess correlational outcomes, including concurrent and predictive validity. Likewise, sets of A/D items are commonly scaled together to create composite measures of underlying characteristics. Future research should examine how SD scales affect scale reliability, factor loadings, and other indicators of measurement equivalence through structural equation models.
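As one illustration of the scale reliability comparison suggested above, the following minimal sketch computes Cronbach's alpha for a battery of items. It is not part of the analyses reported in this article. The function name `cronbach_alpha` and the toy responses are hypothetical, and the sketch assumes complete data (no item nonresponse) with responses coded numerically.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a battery of items.

    responses: list of rows, one per respondent; each row is a list of
    k numeric item scores (assumption: no missing values).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in items)
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: three respondents answering two items.
toy_battery = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_battery)
```

In a future comparison, the same function could be applied separately to respondents who received the A/D version and those who received the SD version of a battery, although assessing measurement equivalence would still require the structural equation models noted above.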
Future work should also explore whether these results extend to survey modes beyond mail. As mail and web surveys are both self-administered and use the visual communication channel, we would expect findings in web surveys to be similar to those we observe here in a mail survey. A/D and SD scales may perform differently, however, across self-administered and interviewer-administered modes. In visual modes like mail, respondents can look back to the question and scale labels as needed, but in aural modes like telephone, respondents have to hold this information in working memory while formulating their answers (Christian et al., 2007; de Leeuw, 2005; Dillman et al., 2014; Olson et al., 2018). This extra effort may increase the tendency to acquiesce on questions with A/D scales due to satisficing behaviors (Krosnick, 1991). Likewise, because the presence of an interviewer can exacerbate social desirability effects (de Leeuw, 2005), responses to SD items may be more affected by social desirability in interviewer-administered modes than in self-administered modes. As a starting point for future studies, we provide results from an experiment comparing A/D and SD scales on the same outcomes from this study in a telephone survey (Supplemental Appendix H). Although this experiment is not a direct mode comparison and is limited to only four positive valence items, we hope that these supplemental analyses will inform future research on the topic.
In sum, this initial examination of SD scales suggests that they are a beneficial alternative to A/D scales. Although survey estimates changed when using SD scales, they did so in a way that suggests lower acquiescence and/or a greater ability to capture variance in positive valence items, as well as less nondifferentiation.
Supplemental Material
Supplemental material, sj-docx-1-mre-10.1177_1470785320971592 for Are self-description scales better than agree/disagree scales? by Jerry Timbrook, Jolene D Smyth and Kristen Olson in International Journal of Market Research
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funding for the NHWPS data collection was provided by the Office of Research and Economic Development and the Department of Sociology at the University of Nebraska–Lincoln. The analysis was partially funded by Cooperative Agreement USDA-NASS 58-AEU-5-0023, supported by the National Science Foundation National Center for Science and Engineering Statistics. The WLT2 material is based on work supported by the National Science Foundation (Grant Number SES-1132015 to K.O.). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Supplemental material
Supplemental material for this article is available online.
