Abstract
In survey measurement, acquiescence bias is a response effect that occurs when respondents agree with a scale item or question regardless of its content. Mixing negatively worded items with positively worded ones is assumed to counter this bias by forcing participants to disagree with some items. This mixed approach, however, comes at a substantial cost to both the factor structure and the psychometric properties of the scale. This study therefore empirically evaluates the effects of changing negative items to their equivalent positively worded items on the reliability and the factor structure of psychological scales. It is hypothesized that this change improves the factor structure and reliability of the scales. Seven commonly used psychological scales that contain both negatively and positively worded items were selected. The scales were administered to seven different samples with a total of 4192 participants from a public university in the United Arab Emirates. The results confirmed that changing negative items to their equivalent positively directed items systematically and significantly increased the reliability values and improved the factor structure of the psychological scales.
Introduction
Response effects are serious sources of error in survey measurement that are caused by respondents’ behavior and have a negative impact on measurement validity. A response effect is the tendency to choose specific response alternatives based on extraneous information rather than the substance of the questions (Paulhus, 1991). Acquiescence bias is a typical form of response effect, along with extreme responding, a preference for the middle category, and others (Mayerl & Giehl, 2018). Acquiescence bias occurs when respondents agree with the scale item or question regardless of its substance (Lewis & Sauro, 2009). It may bias estimates in either direction depending on how the survey question is constructed (Hill & Roberts, 2021). Extreme Response Style (ERS), by comparison, is the tendency to choose the extreme response options of a Likert-type scale (for example, strongly agree and strongly disagree on a 5-point scale) regardless of the content (Weijters et al., 2020).
Controlling or reducing measurement errors such as acquiescence response style is an important priority for researchers, survey designers, and measurement professionals in general. One typical method is to use a mix of positively and negatively worded items in surveys and scales (Mayerl & Giehl, 2018). This mixed approach forces respondents to disagree with some items. According to Chyung et al., favorably phrased questions induce acquiescence bias, and combining negatively and positively worded items helps to lessen it.
The mixed approach, on the other hand, comes at a significant cost. Mixing item orientations may lead to major methodological problems such as a distorted factor structure, decreased internal reliability, and weaker correlations between positively and negatively phrased items (Mayerl & Giehl, 2018). Scales and surveys that measure unidimensional constructs are built so that all items load on one factor regardless of their orientation. However, in many cases, negative items form a separate factor (Lewis & Sauro, 2009; Dodeen, 2015). This signifies that the negative items are not functioning in the same way as the other items on the scale. Furthermore, negative items may confuse respondents because their meaning is harder to grasp, which may produce a method factor, so that the scale measures something other than what the researchers or developers intended (Zhang et al., 2016).
Many studies in the relevant literature have examined the impact of adding negative items to scales, or the influence of item direction, on the psychometric properties of psychological scales, particularly factor structure. Chan (1991) investigated the impact of various response options, for items with positive or negative orientations, on the factor structure of a scale administered to a sample of high school students. The findings showed that item orientation produced different estimates of the latent trait and that the positively oriented version of the scale had a better model fit than the negatively oriented version.
Qasem and Gul (2014) studied the effect of item direction on the number of factors of a scale that measures students’ attitudes toward undergraduate study. The scale, which consisted of 40 items, was administered in four formats that differed in item direction: all positive, all negative, half positive and half negative, and a random number of positive and negative items. The results revealed that the direction of the items changed the number of extracted factors. In a similar study, Zhang et al. (2016) examined the effect of including negatively worded items on the factor structure of the Need for Cognition (NFC) scale (9 positive items and 9 negative items) in a sample of undergraduate students. Three revised versions, each with a different number of negative items, were analyzed using confirmatory factor analysis. Here too, the results showed that the factor structure of the NFC scale was affected by the negative items. These examples make clear that negatively worded items affect the factor structure of psychological scales and that some of the extracted factors reflect the direction (positive vs. negative) of these items.
Additionally, negatively worded items can reduce scale homogeneity and reliability. Several studies have investigated this undesirable effect of negative items on the reliability of scales. For example, Jaensson and Nilsson (2016) investigated the effect of including negative items in the Swedish version of the Quality of Recovery Questionnaire (14 items: 7 positive and 7 negative). The findings indicated that some items showed higher mean values when they were worded in the negative direction, and that the reliability of the positive items was higher than that of the negative ones. Roszkowski and Soven (2010) studied the effect of having only two negatively worded items in a scale used in student evaluation. They concluded that replacing these two items with positive items improved the internal consistency of the scale. Other studies (Barnette, 2000; Józsa et al., 2014) reached the same result: scale reliability increases when the negative items are eliminated or changed.
One explanation for the undesirable effects of negative items on the factor structure and the reliability of scales is that negative items can lead to careless responding or a lack of attention to item content, since many participants have minimal interest in the research (Coleman, 2013). Another explanation is that negative items are more difficult to understand (Zhang et al., 2016). Respondents may misread the negative items, which affects their interpretation and understanding. Additionally, respondents tend to answer the negative items inconsistently with the average of the positive items in the same scale (Józsa & Morgan, 2017). This lowers the correlations between the items, which in turn decreases the homogeneity and the reliability of the scale and changes its factor structure.
Zeng et al. (2020) stated that a combination of positively and negatively worded items is usually added to a survey to decrease participants’ acquiescence bias, but that this combination might harm the validity of the survey. In line with this, the aim of this study is to empirically investigate the effect of changing negatively worded items to their equivalent positively worded items on the reliability and the factor structure of psychological scales. The current study differs from previous studies in that it selected seven common psychological scales and applied them to seven different samples, whereas most other studies in this domain used only one scale. In this way it contributes to the existing body of knowledge.
Methodology
Study Setting and Participants
Number and Percentage of the Participating Students for each Scale by College.
Note. HA = Helping Attitude Scale; LO = Life Orientation Scale; UL = UCLA Loneliness Scale; MS = Marital Satisfaction Scale; ML = Meaning in Life Questionnaire; SE = Self-Esteem Scale; SS = State Self-Esteem Scale.
Gender, Age Mean, and GPA Mean of the Participating Students for each Scale.
Note. HA = Helping Attitude Scale; LO = Life Orientation Scale; UL = UCLA Loneliness Scale; MS = Marital Satisfaction Scale; ML = Meaning in Life Questionnaire; SE = Self-Esteem Scale; SS = State Self-Esteem Scale.
Study Instrument
The following is a brief description of each of the psychological scales used in this study, all of which contain both positively and negatively worded items:
Helping Attitude Scale (HA) (Nickell, 1998): The scale measures people’s attitudes, beliefs, feelings, and behaviors associated with helping. The scale consists of 20 items and uses a 5-point Likert scale that ranges from 1 (strongly disagree) to 5 (strongly agree), so total scores range between 20 and 100. Out of the 20 items, five are negatively worded. Examples of these items are: Item 1 “Helping others is usually a waste of time” and Item 18 “Helping people does more harm than good because they come to rely on others and not themselves”.
Life Orientation Scale (LO) (Scheier et al., 1994): The scale consists of 6 items that measure optimism versus pessimism. Respondents rate each item on a 5-point Likert scale ranging from 0 (strongly disagree) to 4 (strongly agree). The scale has 3 negatively worded items. Examples of these items are: Item 7 “I hardly ever expect things to go my way” and Item 9 “I rarely count on good things happening to me”.
The UCLA Loneliness Scale (UL) (Russell, 1996): The scale measures participants’ experience of loneliness. The 20 items that make up the scale are rated on a 4-point scale that ranges from 1 (never) to 4 (always). However, only 19 items were used in this study, because item 17 has been identified as a culturally biased item and recommended for exclusion from the scale (Dodeen, 2015; Lasgaard, 2007). Out of the 19 items in the scale, 10 were negatively worded. Examples of these items are: Item 13 “How often do you feel that no one really knows you well?” and Item 14 “How often do you feel isolated from others?”
ENRICH Marital Satisfaction Scale (MS) (Fowers & Olson, 1993): This is a brief scale that measures marital quality and marriage satisfaction. The scale covers several aspects such as communication, conflict resolution, roles, financial concerns, and others. The scale consists of 10 items, 5 of which are negatively worded. Examples of these items are: Item 5 “I am not happy about our communication and feel my partner does not understand me” and Item 12 “I am not satisfied with the way we each handle our responsibilities as parents”.
Meaning in Life Questionnaire (ML)–Short Form (Steger et al., 2006): The scale assesses how respondents feel toward their lives and how motivated they are to find meaning in their lives. Items are rated from 1 (Not at all true) to 4 (Completely true). Out of the 10 items in this scale, 5 items are negatively worded. Examples of these items are: Item 2 “I am looking for something that makes my life feel meaningful” and Item 8 “I am seeking a purpose or mission for my life.”
Self-esteem Scale (SE) (Rosenberg, 1965): The Rosenberg Self-esteem Scale is a common self-report scale. The scale has 10 items that measure personal self-esteem on a 4-point Likert-type scale ranging from 1 (strongly disagree) to 4 (strongly agree). Out of the 10 items, five are negatively worded. Examples of these items are: Item 2 “At times I think I am no good at all” and Item 8 “I wish I could have more respect for myself.” Total scores range from 10 to 40, and higher scores indicate higher levels of self-esteem.
State Self-Esteem Scale (SS) (Heatherton & Polivy, 1991): The scale measures a participant’s self-esteem at a given point in time. Items use a 5-point Likert scale that ranges from 1 (not at all) to 5 (extremely). Out of the 20 items of the scale, 12 items are negatively worded. Examples of these items are: Item 7 “I am dissatisfied with my weight” and Item 13 “I am worried about what other people think of me.”
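The response ranges and total-score ranges given in these descriptions can be checked with a few lines of Python. The sketch below also shows the conventional way of reverse-keying a negatively worded item before summing. The text does not state whether the negative items were reverse-keyed in the original analysis, so that step is an assumption, and the function names are hypothetical.

```python
import numpy as np

def reverse_key(responses, low, high):
    """Reverse-key responses on a rating scale that runs from `low` to `high`."""
    return (low + high) - np.asarray(responses)

def total_score_range(n_items, low, high):
    """Smallest and largest possible total score for n_items rated low..high."""
    return n_items * low, n_items * high

# Hypothetical answers to a negatively worded item on a 4-point scale (1..4).
print(reverse_key([1, 2, 3, 4], low=1, high=4))  # [4 3 2 1]

print(total_score_range(20, 1, 5))  # (20, 100), the range stated for HA
print(total_score_range(10, 1, 4))  # (10, 40), the range stated for SE
```

Reverse-keying maps the lowest option to the highest and vice versa, so that agreement with a negative item counts in the same direction as disagreement with its positive counterpart.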
Study Procedure
The study selected and applied seven psychological scales that originally contain mixed item-wording directions. For each scale, all the negatively worded items were rewritten in the positive direction and added to the original version of the scale. The negative items were rewritten by the author and then reviewed by a panel of five experts with adequate experience and background in linguistics or translation. Electronic dictionaries and linguistic websites were also used to find appropriate antonyms. Negative items are worded either with a negation or as a polar opposite. Thus, the applied version of each scale included the original positive items, the original negative items, and the changed items.
Data Collection and Analysis
Google Forms was used to administer the scales online. The data were analyzed using the Statistical Package for the Social Sciences (SPSS), version 23.0. Descriptive statistics were computed, and reliability procedures and exploratory factor analysis (EFA) were carried out.
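The reliability coefficient reported in the Results is Cronbach’s alpha. The study computed it in SPSS; the following is only a minimal NumPy sketch of the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), applied to simulated data. The function name and the data are hypothetical and are not taken from the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_variances = scores.var(axis=0, ddof=1)  # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical data: 500 respondents, 6 items that all reflect one trait
# plus independent noise (every item is keyed in the same direction).
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
items = trait + rng.normal(size=(500, 6))

print(round(cronbach_alpha(items), 2))
```

Alpha rises as the items correlate more strongly with one another, which is why items that do not move together with the rest of the scale pull the coefficient down.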
Ethical Statement
Before responding to any questions, participants were fully informed of the study’s objectives and told that participation was entirely voluntary and would not have any negative consequences. Participating students were also assured that the information gathered would be kept completely private and used solely for research purposes.
Ethical Approval
ERS_2020_6132, Social Sciences, Ethics Committee-Research, United Arab Emirates University (UAEU), UAE.
Results
Reliability Analysis for each Scale with and without the Negative Items.
Note. HA = Helping Attitude Scale; LO = Life Orientation Scale; UL = UCLA Loneliness Scale; MS = Marital Satisfaction Scale; ML = Meaning in Life Questionnaire; SE = Self-Esteem Scale; SS = State Self-Esteem Scale.
The results of the reliability analysis were clear and consistent across all scales. Changing the negative items to the positive direction increased the reliability values as assessed by Cronbach’s alpha. For example, Cronbach’s alpha increased from 0.80 to 0.93 for the Helping Attitude Scale (HA), from 0.49 to 0.85 for the UCLA Loneliness Scale (UL), and from 0.64 to 0.92 for the State Self-Esteem Scale (SS). The reliability values of all other scales increased in the same way.
The second analysis (EFA) shows the effect of the negatively worded items on the factor structure of the psychological scales. This analysis was used to uncover the underlying structure of each scale and to identify the nature of the factors that can be extracted with and without the negative items. It also determines which items load on each factor and their loading values. EFA was conducted on each scale using Principal Axis Factoring (PAF) as the extraction method. The EFA results for the original form of each scale are presented and compared with those for the modified form, in which the negative items were replaced by their equivalent positively worded items. Two sets of results are presented from this analysis. The first is how the initial eigenvalues and the percentage of explained variance changed as a result of changing the negative items in each scale.
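As an illustration of the quantities named here, the sketch below computes initial eigenvalues and the percentage of variance explained. It assumes the initial eigenvalues are those of the item correlation matrix, which is how they are conventionally reported; each percentage is then the eigenvalue divided by the number of items. The function name and the simulated data are hypothetical and do not reproduce the study’s output.

```python
import numpy as np

def initial_eigenvalues(scores):
    """Eigenvalues of the item correlation matrix and % variance explained by each."""
    scores = np.asarray(scores, dtype=float)
    corr = np.corrcoef(scores, rowvar=False)      # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
    pct_variance = 100.0 * eigenvalues / corr.shape[0]
    return eigenvalues, pct_variance

# Hypothetical data: 1000 respondents, 6 items driven by one common trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=(1000, 1))
items = trait + rng.normal(size=(1000, 6))

eigenvalues, pct_variance = initial_eigenvalues(items)
for i in range(2):
    print(f"Factor {i + 1}: eigenvalue = {eigenvalues[i]:.2f}, "
          f"variance explained = {pct_variance[i]:.1f}%")
```

Because the eigenvalues of a correlation matrix sum to the number of items, a larger first eigenvalue means that a single factor accounts for a larger share of the total variance.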
The Eigenvalues and the Variance Explained by the First Two Factors in each Scale with and without Negative Items.
*: Only one factor was extracted.
Item Loadings on the First Two Factors of Helping Attitude (HA) Scale, Life Orientation (LO) Scale, and Marital Satisfaction (MS) Scale with and without the Negative Items.
*: The negatively worded items in the original scale.
**: Only one factor was extracted.
Items Loadings on the First Two Factors of UCLA Loneliness (UL) Scale and with and without the Negative Items.
*: Negatively worded items.
**: Only one factor was extracted.
Item Loadings on the First Two Factors of Meaning in Life (ML) Scale and Self-Esteem (SE) Scale with and without the Negative Items.
*: Negatively worded items.
**: Only one factor was extracted.
Consider the Helping Attitude (HA) scale in Table 5 as an example. The negative items loaded highly on the second factor (F2), while their loadings on the first factor (F1) were negative and low. This means that these items were not functioning in the same direction as the rest of the items in the scale. The situation changed completely when these items were replaced by their equivalent positive items. The replacement positively worded items loaded highly and positively on the first factor (F1), which means that they function in the same direction as the rest of the items in the scale. Without negative items in the scale, all items loaded highly and positively on the first factor (F1). This shows the effect of item direction on the nature of the extracted factors. Of course, the situation is not always like the HA scale example. Some positively worded items still load highly on the second factor (F2), as in the case of item 2 and item 4 from the Life Orientation (LO) scale.
In some other scales such as MS, changing the negative items to their equivalent positive direction improved the factor structure, since only one factor was extracted from the data, as shown in Table 6. All the items loaded highly and positively on one single factor that represents the scale.
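The loading patterns described above come from principal axis factoring. The following is a minimal NumPy sketch of unrotated PAF, offered as an illustration and not as SPSS’s implementation. It assumes squared multiple correlations as starting communalities and iterates until they stabilize. The text does not say whether a rotation was applied, so none is used here. The loading patterns fed into it are hypothetical and chosen only to contrast a form with a wording factor against a form without one.

```python
import numpy as np

def principal_axis_factoring(corr, n_factors, max_iter=200, tol=1e-6):
    """Unrotated principal axis factoring; returns items-by-factors loadings."""
    corr = np.asarray(corr, dtype=float)
    # Starting communalities: squared multiple correlation of each item.
    communalities = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)  # communalities replace the 1s
        values, vectors = np.linalg.eigh(reduced)
        top = np.argsort(values)[::-1][:n_factors]
        loadings = vectors[:, top] * np.sqrt(np.clip(values[top], 0.0, None))
        updated = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(updated - communalities)) < tol
        communalities = updated
        if converged:
            break
    # Eigenvector signs are arbitrary; orient each factor so its loadings sum to >= 0.
    signs = np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return loadings * signs

def implied_correlations(pattern):
    """Correlation matrix implied by a loading pattern with orthogonal factors."""
    corr = pattern @ pattern.T
    np.fill_diagonal(corr, 1.0)
    return corr

# Hypothetical pattern: six items load 0.7 on the construct; the last three
# (standing in for negatively worded items) also load 0.5 on a wording factor.
mixed = np.array([[0.7, 0.0]] * 3 + [[0.7, 0.5]] * 3)
# The same six items with no wording factor (standing in for an all-positive form).
positive_only = np.full((6, 1), 0.7)

print(np.round(principal_axis_factoring(implied_correlations(mixed), 2), 2))
print(np.round(principal_axis_factoring(implied_correlations(positive_only), 1), 2))
```

In the first, unrotated solution the second factor takes opposite signs for the first three and the last three items, which is the kind of split by wording direction discussed above. In the second, a single factor reproduces the correlations.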
Discussion
The direction of the items in psychological scales, whether positive or negative, affects participants’ responses to these items. Generally, measurement specialists, scale developers, and psychologists prefer to include some negatively worded items in scales to control for some types of response bias. Common response biases include acquiescence bias, which is selecting positive responses regardless of the content, and extreme response bias, which is selecting all high or all low ratings (Dodeen, 2015; Zeng et al., 2020). However, the use of negative items carries a large cost that adversely affects the psychometric properties of the scales, especially their reliability and factor structure (Chan, 1991; Lewis & Sauro, 2009; Qasem & Gul, 2014; Woods, 2006; Zhang et al., 2016). This negative effect of including negatively worded items in psychological scales is what this study set out to investigate.
Seven common psychological scales that have both positively and negatively worded items were selected and applied to the target samples of university students. The samples covered the main student demographic variables, such as college, gender, and age. The negative items in each scale were changed to their equivalent positive items and added to the scale. Thus, the applied version of each scale included the original positive items, the original negative items, and the changed items. The statistical analysis, which was conducted on each scale separately, included the reliability analysis and the exploratory factor analysis.
The results confirmed that changing negative items to their equivalent positively directed items systematically and significantly increased the reliability values. This was clearly observed in the increase in Cronbach’s alpha values. Generally, reliability is an important indicator of the accuracy of the measurement process. The measurement process in our case consists of responding to specific items that are included in a psychological scale. The wording of these items affects the participants’ responses to them and thus the measurement accuracy. The response to any item in any scale, and the selection of a specific choice, depends on the respondent’s understanding of the item and on how he or she interprets its meaning, which depends heavily on the wording of the item. In all scales used in the study, when negative items were changed to their equivalent positive direction, the reliability values increased. Increasing the reliability of the scales increases, in turn, accuracy, which means improving the measurement process.
The second main effect of changing the negative items that the study tried to establish was an improvement in the factor structure of psychological scales. Each scale used in the study was originally designed to assess only one construct (unidimensional). This means that only one factor should be extracted from all the items in each scale. The EFA results showed that negative items caused a method effect in these scales: more than one factor was extracted (multidimensional), with positive and negative items loading onto different factors. These results are similar to what has been observed in other studies (Barnette, 2000; Motl et al., 2000). On the other hand, when the negative items were changed to the positive direction, the factor structure of each of the psychological scales improved substantially. More specifically, the results showed that, for each scale, the eigenvalue of the first factor and the percentage of explained variance increased meaningfully as a result of changing the negative items to the positive direction.
This result was confirmed by the changes in the item loadings on the first two factors in each scale. The negative items loaded highly on the second factor (F2) in most of the scales. When these items were replaced with positive items, the replacements loaded highly and positively on the first factor (F1). This indicates that these items now function in the same direction as the rest of the items in the scale. All the results (eigenvalues, variance explained, and item loadings) indicated that changing the negative items to the positive direction improved the factor structure of the psychological scales. This result confirms what has been found in the literature (Dodeen, 2015; Qasem & Gul, 2014; Zhang et al., 2016).
Conclusion
When response effects such as acquiescence bias, extreme responding, a tendency toward the middle category, or other undesirable response patterns are expected and need to be handled, some negative items can be included in the scales, but as fillers rather than as scored items. Filler items here are negatively worded items that respondents answer like any other item but whose responses are excluded from all analyses of the collected data. The filler items can be used to control for response effects or bias and to achieve a desired balance between the positively and negatively worded items in the scale. In the literature, several scales have used this procedure to handle the issue of including negative items. For example, the Life Orientation Test, a 10-item measure of optimism versus pessimism, has four items that serve as fillers; the Adult Hope Scale, which contains 12 items, has four filler items; and the Arabic Scale of Happiness, a 20-item measure of happiness, has five filler items.
Study Implications
The implications of this study are clear and direct. Using negatively worded items decreases the reliability of psychological scales and negatively affects their factor structure. These effects reduce the quality and accuracy of psychological measurement with scales. For existing psychological scales that were developed with both negatively and positively worded items, the study suggests changing the negative items to their equivalent positive direction before applying them to new samples or populations. In the seven psychological scales used in this study, this procedure improved both the reliability and the factor structure.
Study Limitations
The limitations of the study are associated with the number and the nature of the psychological scales. No specific criterion was used to select these scales. In addition, only seven scales were used, which does not represent the very large number of existing psychological scales. Another limitation is related to the participants: the scales were applied only to college students, who do not represent the whole population for whom the psychological scales were originally developed. However, these limitations do not affect the clear and systematic results on the negative impact of including negatively worded items in psychological scales. Negatively phrased items, while less reliable, may be necessary for proper evaluation of some constructs, particularly in the personality domain. For example, if you want to assess the level of depression, you could ask something like, “I feel sad,” on a Likert scale. If you changed that to a less negative phrase, such as “I feel happy,” it could change what you’re measuring in terms of a mental health diagnosis. Similarly, when assessing cognitive distortions according to CBT theory, rephrasing to a positive wording may actually change the measurement of the distortion you are attempting to assess.
Future Recommendations
More studies can be conducted on psychological scales, or on scales from other social sciences, that have a similar mix of positively and negatively worded items. These studies can use representative samples from the populations that are targeted by the original scales. Additionally, studies that empirically investigate the effects of using filler items in handling the problem of negative items are recommended. Future research should also examine the effects of changing negatively worded items to positively worded items on validity.
Supplemental Material
Supplemental material for The Effects of Changing Negatively Worded Items to Positively Worded Items on the Reliability and the Factor Structure of Psychological Scales by Hamzeh Dodeen in Journal of Psychoeducational Assessment
Footnotes
Acknowledgments
The author is very thankful to everyone who contributed to this research.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Consent to Participate
A written consent form was signed by the respondents before the commencement of the study.
