Abstract
Three experiments were conducted to test the robustness and explanations of the Nonselective Superiority Bias (NSSB), whereby any randomly selected item from a positive category is rated more favorably when compared with a cohesive group of other exemplars from the same category. Having participants rank order all exemplars prior to making a direct comparative rating did not reduce the NSSB (Experiment 1). Whether participants considered similarities or differences between the randomly selected target and the other individual exemplars, the target was rated more positively than the rest (Experiment 2). Finally, even when a randomly selected exemplar was compared with exemplars from different categories (apples vs. oranges), the NSSB was still obtained (Experiment 3). The generalizability of the bias and the implications of the current results for the focalism, unique attributes, and LOGE explanations of the NSSB are discussed.
Everyone has probably heard someone say, “I know what I like” (also the title of a hit rock and roll song), but research in social judgment, social cognition, and decision making shows that preferences can change dramatically depending on how the options are presented and how preference is assessed (e.g., see Gilovich, Griffin, & Kahneman, 2002). A compelling example is the Nonselective Superiority Bias (NSSB; Giladi & Klar, 2002; Klar, 2002). In a representative demonstration, participants are instructed to generate five different exemplars from a positive category, such as pleasant acquaintances. Then the name of one of the acquaintances is randomly selected and rated for pleasantness compared with the rest, considered as a whole. Typically, the randomly selected acquaintance is judged to be more pleasant than the rest (see also Suls et al., 2010). A comparable bias is found for exemplars of negative categories, such as irritating acquaintances: any randomly selected person is rated more irritating than the others. Because all exemplars have the same probability of being drawn, it is logically impossible for the randomly selected target to always be better (or, in the case of a negative category, worse) than the rest; nonetheless, the target item is judged more extremely than the other items. Giladi and Klar (2002) refer to this effect as the Non-Selective Superiority/Inferiority Bias (NSS/IB) because the item randomly selected as the target is rated more extremely than the rest. Although the effect has been demonstrated with positive and negative categories, the research reported in this article focused on the positive or superiority version of the bias.
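The logical point above can be checked with a small sketch. Assuming hypothetical pleasantness scores for five acquaintances (the values are invented for illustration), the code computes each item's score minus the mean of the remaining four and averages that difference over all five equally likely draws. Algebraically, with n items summing to S, the sum over i of x_i − (S − x_i)/(n − 1) equals S − S = 0, so a randomly drawn target cannot be better than the rest on average.

```python
from statistics import mean


def target_minus_rest(scores: list[float], target: int) -> float:
    """Return the target's score minus the mean score of all other items."""
    rest = scores[:target] + scores[target + 1:]
    return scores[target] - mean(rest)


# Hypothetical pleasantness scores for five acquaintances (illustrative only).
scores = [8.0, 6.5, 7.0, 9.0, 5.5]

# Every item is equally likely to be drawn, so average the difference over all draws.
differences = [target_minus_rest(scores, i) for i in range(len(scores))]
expected_difference = mean(differences)

print(differences)          # some targets are above the rest, some below
print(expected_difference)  # no systematic advantage for a randomly drawn target
```

Individual draws can favor or disfavor the target, but the average over draws is zero, which is why a consistent superiority rating for whichever item is drawn counts as a bias.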
The NSSB seems counterintuitive, which raises questions about its replicability, generalizability, and explanation, although it has been obtained under seemingly stringent conditions. Klar and Giladi (1997) had students successively rate each member of their class compared with the rest on traits such as intelligence, friendliness, or generosity. Everyone in the group, in turn, was rated as “above average.” The same effect was obtained even when participants rated anonymous group members (Klar & Giladi, 1997). Obviously, not all of the group members can be “above average,” but the ratings showed just that.
Krizan and Suls (2008) also replicated the NSSB even when it was pitted against the tendency for people to rate themselves as superior to others on positive or high-skilled traits (i.e., Better Than Average [BTA] Effect; Alicke, 1985; Kruger, 1999). Participants were instructed to generate a list of pleasant persons and also to include their own name in the list. Then a name was selected “at random” as the comparison target. Either the participant’s own name or the name of another person from the list was the target. With the participant as target, the self was judged to be the most pleasant—consistent with the standard BTA effect. However, when the participant was included with the rest of the group, the target (someone else from the list) was rated more positively than the group, contrary to the self-enhancing BTA (Alicke, 1985).
One boundary condition for the NSSB was identified by Windschitl, Conybeare, and Krizan (2008), who changed the timing of the introduction of the target item. In the standard paradigm, participants are told which item is the target after being exposed to the entire group; Windschitl et al. (2008) showed that if the target item was indicated (by separating or highlighting it) before the entire set of exemplars was introduced, the target item was consistently rated as less attractive than the referent group. These authors did not provide a full explanation for the reversal but did show that a target revealed late (as in the conventional paradigm) receives an advantage in terms of heightened attention or weight (see Houston, Sherman, & Baker, 1989).
There are many occasions in the real world, however, in which the timing is “right” for the bias. For example, a manager making a hiring decision might arbitrarily pick one of the best resumes to peruse, even though the others had already been reviewed. The current research primarily was designed to test the robustness of the bias and search for boundary conditions when the timing of the introduction of the target item is conducive to finding the effect.
As a secondary aim, we hoped the results might be informative about the reasons for the bias. According to the Local Comparisons-General-Standards (LOGE) model (Giladi & Klar, 2002), the target receives more positive ratings because it is compared not to the restricted set of other pleasant people (the local group) but to a hybrid standard composed of the local group and the (lower) global standard associated with “acquaintances, in general.” Because this combined standard must logically be less positive than the local group, the target compares more favorably. In short, LOGE claims that a global standard encroaches on the judgment process and is responsible for the NSSB.
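A minimal numerical sketch of the LOGE argument follows. It assumes, purely for illustration, that the hybrid standard is a simple weighted average of the local group's mean and a lower global norm; the weight (0.5), the global norm (6.0), the helper names, and the scores are all hypothetical and are not parameters proposed by Giladi and Klar (2002).

```python
from statistics import mean


def hybrid_standard(local_group, global_norm, weight_local=0.5):
    """Hypothetical LOGE-style standard: weighted blend of the local mean and a global norm."""
    if not 0.0 <= weight_local <= 1.0:
        raise ValueError("weight_local must lie in [0, 1]")
    return weight_local * mean(local_group) + (1 - weight_local) * global_norm


def compare_all(scores, global_norm, weight_local=0.5):
    """For each possible target, return (target - local mean, target - hybrid standard)."""
    results = []
    for i, target in enumerate(scores):
        local_group = scores[:i] + scores[i + 1:]
        results.append((
            target - mean(local_group),
            target - hybrid_standard(local_group, global_norm, weight_local),
        ))
    return results


# Illustrative pleasantness scores for five pleasant acquaintances (assumed values).
acquaintances = [8.0, 7.5, 9.0, 8.5, 7.0]
global_norm = 6.0  # assumed pleasantness of "acquaintances, in general"

results = compare_all(acquaintances, global_norm)
mean_vs_local = mean(r[0] for r in results)   # averages to zero against the local group
mean_vs_hybrid = mean(r[1] for r in results)  # positive: targets look better on average
```

Under these assumptions the average advantage against the hybrid standard is (1 − w) × (overall mean − global norm) = 0.5 × (8.0 − 6.0) = 1.0, so most targets appear superior only because a lower global norm has been blended into the standard.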
Another explanation, referred to as focalism (Chambers & Windschitl, 2004; Moore & Kim, 2003), assumes that the target item is weighted more heavily in comparative ratings because a focal item is given more attention and judgmental weight than the referent group. Finally, the unique attributes hypothesis (UAH) suggests that the target item is not only weighted more heavily, but, because of the salience associated with being the target, its unique attributes define the standard with which the other items are compared (Chambers, 2010; Suls et al., 2010). For example, if Las Vegas were compared to other travel destinations, it would be rated better than the rest because of its superior entertainment (its unique attribute), but if Honolulu were the target it would be rated as superior because of its excellent climate. It is worth noting that both focalism and unique attributes should operate especially when the target of the comparative judgment is revealed late (after all of the exemplars are known). This is because Windschitl et al.’s (2008) findings, described earlier, suggest late identification of the target commands more weight and attention.
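The unique attributes hypothesis can be illustrated with a toy model, offered only as a sketch under stated assumptions: each destination has hypothetical attribute scores, and the target is judged on whichever attribute it most exceeds the others' average. The third destination (Yosemite), the attribute scores, and the selection rule are invented for illustration; the UAH itself does not specify this rule.

```python
from statistics import mean

# Hypothetical attribute scores (1-10); destinations and values are illustrative only.
destinations = {
    "Las Vegas": {"entertainment": 10, "climate": 6, "scenery": 5},
    "Honolulu": {"entertainment": 6, "climate": 10, "scenery": 8},
    "Yosemite": {"entertainment": 3, "climate": 7, "scenery": 10},
}


def unique_attribute_judgment(target: str, items: dict) -> tuple[str, float]:
    """Toy UAH rule: compare the target with the others on the attribute where it stands out most."""
    others = [attrs for name, attrs in items.items() if name != target]
    gaps = {
        attr: score - mean(other[attr] for other in others)
        for attr, score in items[target].items()
    }
    standard = max(gaps, key=gaps.get)  # the target's most distinctive strength sets the standard
    return standard, gaps[standard]


judgments = {name: unique_attribute_judgment(name, destinations) for name in destinations}
for name, (attribute, advantage) in judgments.items():
    print(f"{name}: judged on {attribute}, advantage {advantage:+.1f}")
```

Every destination ends up with a positive advantage because each is evaluated on its own strongest dimension, which is how a unique-attribute standard can produce superiority for whichever item happens to be the target.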
Three experiments are reported, which involved variations in instructions or manipulations to test the limits of the NSSB as well as provide information bearing on the explanations described above. The first experiment had participants generate a series of positive exemplars and rank them for favorability before randomly selecting a target item to compare to the referent group. We predicted that, after establishing a rank ordering, raters should be constrained from judging a randomly selected exemplar as better than the rest. The LOGE model supports this prediction, as a global comparison should not come into play because local comparisons (in the form of rank ordering) were overtly made. Focalism and UAH do not lend themselves to explicit hypotheses about the prior effects of rank ordering, although one might speculate that creating a preference hierarchy would discourage any bias favoring the exemplar that was later randomly selected as the target item.
In Experiment 2, participants generated several positive exemplars and then were instructed to adopt a similarity mind-set or a difference mind-set about the items. Our rationale was that thinking about how the exemplars are similar to each other should counteract the tendency to judge the randomly selected target more positively because shared features should highlight the similarities among items, thereby “short-circuiting” unique attributes and focalism effects. A difference mind-set should produce the converse tendency because the differences between the target and the others should be highly accessible. According to both the focalism account and the UAH, a difference mind-set might increase the bias by magnifying the “special” status of the target item.
The third experiment modified the standard procedure in which all exemplars are drawn from the same category (e.g., pleasant people). Instead, positive exemplars from several different categories were generated. This procedural modification allowed us to inquire whether a randomly selected “favorite” from one category (such as favorite musicians) would be judged “the best” compared to favorites from different categories (such as travel destinations). If the bias extends to different categories, this would require the LOGE model to posit an implausibly diffuse global standard of “things.”
Experiment 1: Does Ranking Limit the NSSB?
In some decision-making contexts, judges consider the features of each option and rank order them prior to making a final selection or evaluation. For judges who have already rank ordered all of the options, that order should still be evident when they are asked shortly afterward to rate a randomly selected item from the list. In fact, conducting a prior ranking seems like a strong strategy to “short-circuit” the nonselective superiority effect. If a person had earlier ranked an item fourth (out of six) on a list, that person should not subsequently rate that item more positively than the other items (including the item ranked first), a prediction tested in this experiment.
Method
Participants and Design
Eighty-seven undergraduate students enrolled in an elementary psychology course earned partial course credit for participating. Participants were randomly assigned to one of two conditions (listing vs. ranking) in a between-participants design.
Material and Procedures
Participants were recruited in groups of one to eight for a study of “judgments about food preferences.” When participants arrived in the lab, each was given a packet of materials with instructions to generate a list of six of their “favorite foods.” On a random basis, one half of the participants were instructed to list the six foods “in rank-order from most liked.” The remaining participants were simply asked to list six favorite food items. Then, the experimenter randomly selected a number and participants treated the item in the corresponding row as the target item in the rating instructions. For example, if the experimenter drew a “4,” the subject was asked to consider the food written in the fourth row of his or her form as the target item.
Participants were instructed, “Compared to the other foods that you listed, how would you rate the selected food?” (−5 = dislike it much more than the other foods; 0 = like it as much as the other foods; 5 = like it much more than the other foods). After the rating materials were collected, all participants in the ranking condition were given a sheet of paper and a “surprise recall test,” asking them to list from memory the rank order they had made earlier. Participants in the listing condition were simply asked to list the items a second time. This supplementary information was collected to assess participants’ recall accuracy (less than 5 min after generating the lists/ranks). Then participants were debriefed and thanked for their participation.
Results and Discussion
In the listing (standard instructions) condition, the randomly selected food was rated more favorably than the other foods in the favorites list (M = 1.2, SD = 1.76), t(45) = 4.6, p < .001, d = 1.37, replicating the NSSB. The bias also was exhibited by participants who had previously rank ordered their six favorite foods. That is, rank ordering the food items did not eliminate or even reduce the bias, M = 1.63, SD = 1.5, t(40) = 7.2, p < .001, d = 2.28. The mean ratings did not differ significantly across conditions.
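As a rough arithmetic check on the statistics above, the sketch below recomputes the one-sample t for the listing condition from its summary statistics, testing the mean rating against 0 (the scale point “like it as much as the other foods”) and assuming n = df + 1. The reported effect sizes appear consistent with the conversion d = 2t/√df, a formula commonly used for two-group comparisons; a one-sample d computed as M/SD would be smaller (about 0.68 here). This reading is inferred from the reported values rather than stated in the text, and the function names are illustrative.

```python
import math


def one_sample_t(sample_mean: float, sd: float, n: int, reference: float = 0.0) -> float:
    """t statistic for testing a sample mean against a fixed reference value."""
    return (sample_mean - reference) / (sd / math.sqrt(n))


def d_from_t(t: float, df: int) -> float:
    """Convert t to d with d = 2t / sqrt(df); the reported values appear to follow this."""
    return 2 * t / math.sqrt(df)


# Listing condition: M = 1.2, SD = 1.76, t(45), so n = 46 (assumes df = n - 1).
t_listing = one_sample_t(1.2, 1.76, 46)  # close to the reported t = 4.6
d_listing = d_from_t(4.6, 45)            # close to the reported d = 1.37
d_as_mean_over_sd = 1.2 / 1.76           # one-sample alternative, M / SD

# Ranking condition: reported t(40) = 7.2 and d = 2.28.
d_ranking = d_from_t(7.2, 40)

print(round(t_listing, 2), round(d_listing, 2), round(d_as_mean_over_sd, 2), round(d_ranking, 2))
```

The same conversion reproduces the d values reported for the other tests in this article to within rounding, which suggests the effect sizes are best read as transformed t values rather than standardized mean differences.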
One potential confounding factor might be that participants could not recall the rank orderings they made earlier; if they failed to remember their prior rank ordering, then it should have had less effect on the preference ratings. In supplementary analyses, we compared each participant’s initial rank ordering of foods with the rank ordering that was recalled at the conclusion of the lab session. In the ranking condition, 10 participants (out of 41) incorrectly recalled their rank ordering, although no participants misremembered that the target item was ranked first. In the listing condition, participants only generated the list of foods with no mention of rank ordering; only three participants forgot one or more items from their initial list.
As a further check, we excluded the ratings of participants who forgot their initial rankings; the NSSB was still evident, M = 1.5, SD = 1.4, t(30) = 6.52, p < .001, d = 2.38. The bias was also significant for participants in the listing condition who recalled all the items, M = 1.3, SD = 1.8, t(42) = 4.72, p < .001, d = 1.46. With some confidence, we can conclude that including participants who forgot their original rank ordering did not confound the basic results. Contrary to expectation, initially ranking the items did not eliminate the NSSB.
One potential complication is that participants in the listing condition, although not instructed to list the items in order of preference, may have done so anyway, perhaps because the most preferred exemplars came to mind first. However, the reader should keep in mind that the target item was randomly selected from one of six rows, so even if some participants initially (and implicitly) generated a rank-ordered list, every item, across participants, had an approximately equal probability of being selected as the target stimulus. In fact, the mean rank of the target items in the ranking condition (M = 3.43, SD = 1.75) was slightly, but nonsignificantly, below the midpoint of the 1–6 ranking (3.5), ruling out the possibility that the targets just happened to be preferred a priori.
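The random-selection argument can be made concrete. If each of the six rows is equally likely to be drawn, the expected rank of the target is 3.5; the sketch below compares the reported mean target rank with that value using a one-sample t, assuming all 41 ranking-condition participants are included (the text does not state the n for this analysis).

```python
import math

# With ranks 1 (most liked) to 6 and each row equally likely, the expected rank is the mean rank.
ranks = range(1, 7)
expected_rank = sum(ranks) / len(ranks)


def one_sample_t(sample_mean: float, sd: float, n: int, reference: float) -> float:
    """t statistic for a sample mean against a fixed reference value."""
    return (sample_mean - reference) / (sd / math.sqrt(n))


# Reported mean target rank M = 3.43, SD = 1.75; n = 41 is an assumption.
t_rank = one_sample_t(3.43, 1.75, 41, expected_rank)
print(expected_rank, round(t_rank, 2))
```

Under that assumption t is roughly −0.26, far from conventional significance thresholds and consistent with the statement that the targets were not preferred a priori.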
Experiment 2: Similarity Versus Dissimilarity Mind-Set
In the previous experiment, the a priori comparisons required to establish a preference ranking did not eliminate the NSSB. The second experiment took a priori comparison a step further by testing whether thinking about the similarities (vs. differences) among category exemplars eliminates the superiority bias. According to Mussweiler’s (2003) selective accessibility comparison model, when a similarity focus is adopted, the commonalities among items become more cognitively accessible, and assimilation (i.e., displacement of the target’s evaluation or meaning toward the others) should occur. If a dissimilarity mind-set is adopted, however, the target’s unique features should become more accessible, so the target should be more readily contrasted with the group and rated more extremely. The UAH and focalism accounts make the same prediction about the effects of the difference mind-set.
In this study, some participants were given instructions designed to instill a mind-set that should work in opposition to judging the randomly selected target more extremely (i.e., favorably) than the rest. Specifically, participants were asked to think about ways the exemplar was similar to the others. In another condition, participants were instructed to think about ways the exemplar was different from the others, which, if anything, should increase the bias.
Experiment 2 had participants generate a list of healthy foods. By keeping the domain (food) the same as in Experiment 1 but changing the judgment dimension to “healthiness,” we tested whether the bias generalizes beyond “liking.”
Method
Participants and Design
One hundred sixteen undergraduate students enrolled in an elementary psychology course at a large Midwestern university earned partial course credit for participating in small group sessions in the laboratory. Participants were randomly assigned to one of three experimental conditions (similarity, dissimilarity, control).
Material and Procedures
Participants were asked to list six healthy foods on a sheet of paper printed with six rows. Then the experimenter announced that he or she would draw a random number from one to six from an envelope in plain sight of the participants; the food in the correspondingly numbered row was then treated as the target item. One third of the participants were randomly assigned to spend 10 min writing, on a blank sheet of paper, about how the target food was similar to the rest of the foods they had listed, and another third about how it was different from them. The remaining third of participants (control condition) worked on an unrelated writing task (i.e., describing their daily routines).
Then all participants rated the target food “in terms of how healthy/unhealthy it is in comparison to the other food items [they] listed” (1 = much less healthy than the other items on the list; 4 = just as healthy as the other items on the list; 7 = much more healthy than the other items on the list). Then subjects were debriefed and dismissed.
Results and Discussion
Manipulation check
To ensure that participants followed the similarity/difference mind-set instructions, two independent coders rated all essays. For the control essays, the coders indicated whether participants followed instructions, and they were in perfect agreement. Two participants were removed from analysis for failing to follow instructions (i.e., because of an experimenter error, they began writing about the target foods instead of their daily routines). For the similarity and difference essays, coders separately counted the number of statements indicating that the target was similar to the other items and the number indicating that it was different from them. Coder agreement for these counts was very high (rs > .90). A composite similarity ratio was created by averaging the two coders’ counts and then dividing the number of similarity statements in each essay by the total number of statements (i.e., the sum of similarity and difference statements). A t-test comparing the ratios showed that participants in the similarity condition produced more similarity (and fewer difference) statements, M = 0.92, SD = 0.17, than participants in the difference condition, M = 0.01, SD = 0.05, t(68) = 31.41, p < .001. Thus, the mind-set manipulations appear to have been effective.
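The composite similarity ratio described above can be sketched as follows, assuming each coder records a (similarity, difference) count pair per essay. The function name, the tuple layout, and the decision to reject essays with no coded statements are illustrative assumptions, not details reported in the text.

```python
from statistics import mean


def similarity_ratio(coder_a: tuple[int, int], coder_b: tuple[int, int]) -> float:
    """Average two coders' (similarity, difference) counts, then return similarity / total."""
    similarity = mean([coder_a[0], coder_b[0]])
    difference = mean([coder_a[1], coder_b[1]])
    total = similarity + difference
    if total == 0:
        raise ValueError("essay contains no coded similarity or difference statements")
    return similarity / total


# Hypothetical essay: coder A counts 5 similarities and 1 difference, coder B counts 6 and 0.
ratio = similarity_ratio((5, 1), (6, 0))  # 5.5 / 6.0
```

Ratios near 1 indicate essays dominated by similarity statements and ratios near 0 indicate essays dominated by difference statements, which is the pattern the manipulation check reports for the two conditions.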
Contrary to expectations based on selective accessibility (Mussweiler, 2003) or on the UAH, in the similarity condition the target food was rated as healthier than the rest, M = 4.51, SD = 0.85, t(38) = 3.75, p = .001, d = 1.23. In the dissimilarity condition, the target was also rated significantly healthier than the other foods, M = 4.30, SD = 0.81, t(36) = 2.23, p < .05, d = .74. The control condition also demonstrated the bias, M = 4.50, SD = 0.92, t(37) = 3.34, p < .01, d = 1.10. t-Tests comparing the magnitude of ratings across the three conditions were nonsignificant.
In sum, even participants who wrote about a randomly selected exemplar’s similarities to other members of the same category exhibited the NSSB—the single exemplar tended to be rated as a healthier food than the other healthy foods. The same bias was obtained when participants wrote about how the single exemplar food was different from the other listed foods. The fact that focusing on similarities between the exemplar and the other stimuli seemed to have little effect on the superiority bias suggests that it is quite robust. Furthermore, thinking about the target’s differences with the remaining items did not inflate the bias. In short, what judges cognitively “did” with the items prior to making the comparison judgment did not affect the comparative rating.
Experiment 3: “Comparing Apples to Oranges”
In the third study, we modified the conventional procedure, whereby judges compare one randomly selected exemplar to a set of exemplars of the same positive category (Giladi & Klar, 2002). In light of the adage, “You can’t compare apples to oranges,” is the bias only obtained when all items represent the same category? Or do judges rate a randomly selected favorite movie more favorably than an array of several favorite things from diverse categories, such as favorite foods, vacation spots, and sports? Stated simply, does the NSSB occur even for the diffuse and global reference group of “likeable things”?
Method
Participants and Design
Eighty-one undergraduate students enrolled in an elementary psychology course earned partial course credit for participating. Each was randomly assigned to one of two between-participants conditions (self-selected “free categories” or experiment imposed “diverse categories” conditions).
Material and Procedures
Participants were recruited in groups of up to eight for a study of judgments about “people, places, and things.” Upon arrival at the lab, participants received a folder containing a manila envelope, six slips of paper clipped together, and an instruction sheet. One half of the participants were randomly assigned to the “diverse categories” condition in which they were to generate “one favorite” item from each of six different categories—books, musical groups, foods, different parts of the country, movies, and sports. The rest were assigned to the “free category” condition—to generate a list of six different “favorite things,” with no categories imposed. “They could generate the 6 items from the same category, a different category, or a mixture of things from same or different categories.” The only constraint was the items should be things they liked.
All participants were given 5–10 min to write the six items on individual slips of paper, place all the slips in the envelope, and shuffle them. Then they were asked to draw one slip (without looking) and place it on one side of the desk. The remaining items were placed on the other side of the desk. (Side of desk was counterbalanced across participants.) Then participants made a direct comparative judgment of the randomly drawn target versus the other items: “How much do you like/dislike what is listed on the target slip compared to all others?” (−4 = dislike much more than the rest; 0 = about the same as the others; 4 = like much more than the rest).
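The envelope procedure amounts to a uniform random draw. The simulation below is a sketch under assumptions: the helper names and item labels are hypothetical, and alternating the target's desk side by participant number is one possible counterbalancing scheme, not the one reported. It checks that, over many simulated participants, each of the six slips becomes the target about one sixth of the time.

```python
import random
from collections import Counter


def draw_target(items, rng):
    """Shuffle the slips in the 'envelope' and draw one blind; the rest form the referent group."""
    slips = list(items)
    rng.shuffle(slips)
    target = slips.pop()  # the drawn slip
    return target, slips


def desk_side(participant_id: int) -> str:
    """Hypothetical counterbalancing: alternate the target's side of the desk across participants."""
    return "left" if participant_id % 2 == 0 else "right"


favorites = ["book", "band", "food", "region", "movie", "sport"]  # hypothetical list
rng = random.Random(2024)
counts = Counter()
sides = Counter()
for participant_id in range(60_000):
    target, _referent_group = draw_target(favorites, rng)
    counts[target] += 1
    sides[desk_side(participant_id)] += 1

print(counts)  # each item drawn roughly 10,000 times
print(sides)   # target side split evenly between left and right
```

Because `random.shuffle` produces every ordering with equal probability, taking the last slip gives each item the same chance of becoming the target, mirroring the blind draw from a shuffled envelope.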
People are likely to differ in their liking for general categories, which potentially may influence judgments of the randomly selected target in the diverse category condition. To assess this possibility, participants were asked to rate how much they liked each of the six general categories (musical groups, food, different parts of the country, movies, books, and sports) on 7-point Likert-type scales (1 = not at all to 7 = very much). Finally, participants were debriefed and thanked for their participation.
Results and Discussion
Preliminary Analyses
To check whether participants assigned to the diverse category condition followed the instructions, two coders independently counted the distribution of categories represented by the exemplars generated by the participants. All of the participants assigned to the diverse category condition generated exemplars from each of the six different categories. For the free condition, two coders counted the number of items from different categories that were generated by each participant. The counts of the coders were highly correlated, r = .76, p < .01. Most participants (95.1%) generated items from at least three different categories.
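Agreement between the two coders' counts was summarized with a correlation. The text reports r without naming the coefficient, so Pearson's r is assumed in the minimal sketch below; the per-participant category counts are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of counts."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a coder's counts do not vary")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical numbers of distinct categories each coder counted for six free-category participants.
coder_1 = [3, 4, 2, 5, 3, 4]
coder_2 = [3, 4, 3, 5, 2, 4]
r = pearson_r(coder_1, coder_2)
```

A correlation captures whether the coders rank participants similarly; it does not show that their raw counts were identical, which is why some reports pair it with a percentage-agreement figure.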
In the diverse category condition, we tested whether variation in liking for the six imposed categories might introduce a confound. Mean preference ratings ranged from 4.9 to 5.9 (SDs ranging from 1.1 to 1.6); there was no differential favorability across categories.
Direct Comparisons
In the free categories condition, participants rated the target better than the referent group (M = 0.75, SD = 1.9, t(41) = 2.7, p < .01, d = .84). In the diverse categories condition, participants also rated the target to be above the average (M = 0.65, SD = 1.5, t(38) = 2.7, p < .01, d = .88). Thus, regardless of whether the target exemplar and the referent group were exemplars from different categories or a mix of exemplars from the same or different categories, the target was rated more favorably than the remainder. The participants’ randomly selected “favorite thing” was judged more positively than the rest of their favorite things—even if they were from different categories. As noted in the Introduction, the idea of a general evaluative standard for “things” seems implausible and raises concerns about the encroachment of a global standard that is integral to the LOGE model.
General Discussion
Regardless of whether all exemplars were initially ranked, similarities versus differences between the target and the other exemplars were considered, or the target item was compared with items from diverse categories, the NSSB was obtained. Although it is premature to dub the NSSB “invulnerable,” none of the intuitively plausible limiting factors proved to be its “kryptonite.” The NSSB appears to be both a prevalent and a robust phenomenon.
The precise mechanisms that underlie the NSSB and make it so robust still remain unclear; however, the results provide some insights. The LOGE model (Giladi & Klar, 2002) posits that the target is actually being compared to a more global group of items from the same category. Positing the existence of a global standard as an explanation for Experiment 3’s results, however, seems implausible. Participants compared a target to a group of items from different categories, so the global standard would have to be composed of “all manner of things.” This seems far too diffuse to be a credible mechanism.
As mentioned earlier, focalism refers to the tendency to give more weight to information in the foreground (or to a salient hypothesis) than to background information (or to alternative hypotheses) (Chambers & Windschitl, 2004; Moore & Kim, 2003). Because the target item is more salient, it receives more attention, and more information about its positive features is recruited than about those of the referent group; consequently, the target is favored in the direct comparison judgment. This explanation could account for the results of Experiments 1 and 3, but it seems odd that prior recruitment of similarities (differences) between the target and the other items in Experiment 2 did not reduce (increase) the advantage conferred by focalism. It would appear that participants treat the prior comparisons (ranking or finding similarities/differences) and the direct comparative rating as distinct judgment tasks. Because the target is the focal item in the main direct comparison, it receives the judgmental advantages.
The UAH (Chambers, 2010) also seems to predict that prior similarity finding would decrease, and difference finding would augment, the NSSB by altering cognitive recruitment of the target’s unique standard. We observed, however, that although participants found similarities and differences as instructed, this had no effect on the NSSB. Again, our favored explanation is that the prior cognitive comparisons do not carry over, so the target’s salience can still evoke a unique standard of comparison for the direct comparison judgment.
We initially thought that ranking or finding similarities/differences would make information accessible that would carry over to the comparative judgment, much as priming procedures carry over to the interpretation of subsequent stimuli (e.g., Higgins, 1996). However, priming effects are weaker or eliminated when the perceiver is aware of the interpretive bias and corrects for it (Bargh, 1989; Martin, 1986). The ranking and mind-set instructions were very explicit and reminiscent of manipulations that inhibit priming effects. Ironically, the explicit nature of the prior comparisons may have blocked any effect that the information they made accessible would otherwise have had on the final judgment.
Another potential contributing factor is that participants did not know that the target item would be directly compared with the rest of the items until the dependent measure was administered. In Experiment 1, the target was not randomly selected until after all exemplars had been ranked. In Experiment 2, the target was selected before the similarity/difference mind-set instructions, but participants did not know, before receiving the dependent measure, that the target would be the focal element of the direct comparison. In accord with Windschitl et al.’s (2008) results, the target should receive more attention and weight because it was most recently identified as the “target.” This differential salience should aid and abet focalism and UAH, which should facilitate recruitment of more information supporting the target’s superiority to the rest.
We lack a complete resolution about mechanisms, but the practical implications of the results are clear—prior cognitive comparisons do not block the NSSB—and we do not yet have a de-biasing strategy. Highlighting the target prior to the other exemplars, as Windschitl et al. (2008) did, might be effective, but in many everyday decision-making contexts, prior exposure is logistically impossible or unfeasible. The NSSB appears to be a very robust and resilient effect with consequences for decision making and preference in employment, political, economic, and social policy contexts and may have downstream behavioral implications. The fact that we were unable to create conditions to de-bias the NSSB represents both an intellectual and a practical challenge.
Footnotes
Acknowledgments
The authors wish to thank the members of the CHIP lab for their assistance in collecting data used in these studies.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
