Abstract
Zell and Krizan (2014, this issue) provide a comprehensive yet incomplete portrait of the factors influencing accurate self-assessment. This is no fault of their own. Much work on self-accuracy focuses on the correlation coefficient as the measure of accuracy, but it is not the only way self-accuracy can be measured. As such, its use can provide an incomplete and potentially misleading story. We urge researchers to explore measures of bias as well as correlation, because there are indirect hints that each responds to a different psychological dynamic. We further entreat researchers to develop other creative measures of accuracy and not to forget that self-accuracy may come not only from personal knowledge but also from insight about human nature more generally.
In 2012, we published an article in which we asked nearly 100 college students to predict the score they would receive in an upcoming exam (Helzer & Dunning, 2012, Study 2). Participants displayed significant levels of accuracy, and their predictions correlated with actual scores well within statistically acceptable levels, r(97) = .40, p < .0001, and solidly above the average relationship reported by Zell and Krizan (2014, this issue).
Here, however, is the rub. When we looked at the absolute deviation between each student’s prediction and the actual score he or she attained in our data, we saw that participants missed their actual score by an average of 8.3 points. How far off would their predictions have been if they had adopted a more indirect strategy: setting aside their own particular circumstances and instead predicting that they would just get the score they forecast the average student would obtain? This “self-as-merely-average” strategy would have produced predictions that missed actual performance on average by only 9.4 points—a deviation only 13% larger and not statistically different from the typical miss seen in direct self-predictions, paired t(97) = 1.43, ns. How can this be? How can the direct self-prediction display such a strong relationship with actual performance, yet show no significant reduction in error compared to a strategy dismissing self-knowledge and assuming the self is no different from the average student?
In their review, Zell and Krizan furnish an intelligent and scholarly contribution to the literature on self-assessment with findings of empirical heft and authority. That said, the purpose of this comment is to file one reservation about the conclusions reached in the review, although this reservation implies no fault on the part of the authors. The issue, instead, is the literature. Zell and Krizan have traveled to every empirical territory to harvest as much data as they could from a broad, rugged, and often unmapped literature, but that literature narrows itself in one way that greatly constrains the insights that Zell and Krizan can discover.
That constraint is that researchers commonly limit themselves to measuring self-accuracy with a single measure: the correlation coefficient. Researchers evaluate accuracy by gauging the strength of relationship between how positively people rate their ability and how well they subsequently perform. The correlation coefficient, however, is not the only measure of accuracy one can use, nor does it tell the full story about self-accuracy and error.
As such, one might be tempted, seeing how comprehensive Zell and Krizan have been, to conclude that one can close the book on self-assessment accuracy research. But we would consider that an incorrect conclusion. Instead, their review provides a welcome jumping-off point for much informative work yet to be done. There are many issues to be addressed by stepping beyond the correlation coefficient.
Not One but Two Components Matter for Accuracy (and Error)
Returning to our exam study (Helzer & Dunning, 2012), how can we obtain a significant correlation between self-prediction and performance yet find no significant enhancement in accuracy when we compare direct self-predictions to self-as-merely-average predictions? The reason is simple, but the correlation coefficient does not hint at it. Participants on average overestimated their exam score by 5.6 points, predicting an average score of 86.2 but achieving an average score of only 80.6. For most of our students, that systematic deviation contributed to the size of the individual “misses” in their predictions. In comparison, participants were more accurate in their predictions of the average student, underestimating the group mean by a mere 2.4 points (78.2 vs. the reality of 80.6). Thus, substituting one’s impression of an average student’s score for the self would have come close to the actual score that many typical students obtained.
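The comparison between the two prediction strategies can be sketched in a few lines of Python. The per-student scores below are invented purely for illustration and are not the Helzer and Dunning (2012) data; the names `mean_abs_miss`, `self_pred`, `avg_student_pred`, and `actual` are likewise hypothetical. Only the summary figures in the last three assignments are taken from the text.

```python
from statistics import fmean

def mean_abs_miss(predicted, actual):
    """Average absolute gap between predicted and attained scores."""
    return fmean([abs(p - a) for p, a in zip(predicted, actual)])

# Hypothetical scores for six students (NOT the original data).
self_pred = [92, 88, 90, 85, 80, 84]         # each student's prediction for the self
avg_student_pred = [80, 82, 78, 75, 80, 76]  # each student's forecast for the average student
actual = [90, 78, 85, 80, 70, 81]            # scores actually attained

# Strategy 1: direct self-prediction.
direct_miss = mean_abs_miss(self_pred, actual)
# Strategy 2: "self-as-merely-average", i.e., use the forecast for the average student.
as_average_miss = mean_abs_miss(avg_student_pred, actual)

# Summary figures reported in the text.
self_bias = 86.2 - 80.6             # mean overestimate of one's own score
avg_student_bias = 78.2 - 80.6      # mean underestimate of the group mean
relative_increase = 9.4 / 8.3 - 1   # extra miss under the self-as-merely-average strategy

print(f"direct miss: {direct_miss:.2f}, self-as-average miss: {as_average_miss:.2f}")
print(f"reported relative increase in miss: {relative_increase:.0%}")
```

In these made-up numbers, as in the study, the direct strategy misses by only slightly less than the self-as-merely-average strategy, because the systematic overestimate inflates every individual miss.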
The point here is that the degree of error in any self-prediction can be construed as a combination of two components: the correlation between predicted and actual performance (or, rather, the lack of correlation) and the overall tendency to over- or underestimate performance (Epley & Dunning, 2006). The first component can be labeled discrimination (i.e., how well do people’s predictions discriminate who will perform well and who will perform poorly); the second component can be labeled bias (i.e., the average overall mistake people make in estimating their performance). The names of these components change from literature to literature, but each contributes to predictive accuracy and error regardless of the name given.
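One way to make the two components concrete is sketched below, under a stated assumption: error is measured as mean squared error rather than the absolute miss used in our exam example, because squared error splits exactly into a bias term and a term that shrinks as the correlation grows. The data and the function names (`pearson_r`, `accuracy_components`) are hypothetical and serve only as illustration.

```python
from statistics import fmean, pstdev

def pearson_r(x, y):
    """Pearson correlation computed with population moments."""
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def accuracy_components(predicted, actual):
    """Return bias, discrimination (r), and mean squared error two ways."""
    bias = fmean(predicted) - fmean(actual)   # average over- or underestimate
    r = pearson_r(predicted, actual)          # discrimination
    sp, sa = pstdev(predicted), pstdev(actual)
    mse = fmean([(p - a) ** 2 for p, a in zip(predicted, actual)])
    # Identity: MSE = bias^2 + sp^2 + sa^2 - 2 * r * sp * sa
    mse_rebuilt = bias ** 2 + sp ** 2 + sa ** 2 - 2 * r * sp * sa
    return {"bias": bias, "r": r, "mse": mse, "mse_rebuilt": mse_rebuilt}

# Hypothetical scores (NOT data from any study cited here).
predicted = [92, 88, 90, 85, 80, 84]
actual = [90, 78, 85, 80, 70, 81]

parts = accuracy_components(predicted, actual)
print(parts)
```

The identity shows why a respectable correlation does not guarantee small errors: the squared bias adds to the error no matter how large r is.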
In their comprehensive review, Zell and Krizan are largely forced to neglect the bias component in their discussion of self-predictive accuracy. To be sure, they note the importance of bias, and there are studies of bias out there in the literature (see Dunning, Heath, & Suls, 2004, for a review), but to our knowledge the literature using the correlation coefficient (i.e., discrimination) is much larger and more amenable to meta-analysis.
There are many reasons for researchers to study bias as well as discrimination in studies of self-accuracy and to include both measures within the same studies. One key question, not yet addressed in a comprehensive way, is whether the waxing and waning of self-accuracy as indexed by bias plays by the same rules as accuracy measured by discrimination. Even in the face of strong correlation, hefty bias in self-prediction may still arise. In one study of voting, self-predictions strongly correlated with who would vote (φ = .51), but participants overestimated the likelihood they themselves would vote by 24 percentage points (90% predicted vs. 66% actual; Epley & Dunning, 2006, Study 2). In another study in which roommates rated the traits of each other, the relationship between discrimination and bias measures of interpersonal agreement was quite meager (r = .22; Hayes & Dunning, 1997, Study 1), again suggesting that the two measures are sensitive to different psychological dynamics.
Examples of the Independence of Discrimination and Bias
Other research areas suggest that discrimination and bias may be responsive to different psychological factors, further corroborating the idea that each contributes independently to self-assessment accuracy.
Overconfidence
One such suggestion comes from the judgment and decision-making literature on overconfidence. In overconfidence studies, participants provide a series of judgments or answers to factual questions about the world, indicating their confidence in each judgment by estimating the probability that it will prove correct. Errors in such estimates can be measured in two ways. First, one can examine the correlation between confidence and accuracy (i.e., discrimination). Second, one can examine whether confidence on average tends to over- or underestimate rates of accuracy (i.e., bias). As a general rule, these studies tend to find modest to strong correlations between confidence and accuracy, but they also find notable overconfidence (see Lichtenstein, Fischhoff, & Phillips, 1982, for a review of classic work).
Errors in confidence judgments appear to be influenced by different factors, depending on which component of error one focuses on. Consider attempts to rid people of overconfidence. One way to rid people of overconfidence is to give them explicit feedback about their true rates of accuracy. People soon learn to lower their confidence estimates—thus reducing bias—but feedback does nothing to enhance discrimination or the correlation between confidence and accuracy across individual judgments (Lichtenstein & Fischhoff, 1980; Stone & Opel, 2000). A second way to try to rid people of overconfidence is to enhance their expertise. This method appears to reduce error in terms of discrimination but not necessarily in terms of bias. That is, educating people about how to make higher quality judgments—such as identifying the historical period from which a work of art is taken—results in greater correlation between confidence and accuracy. However, this education does little to diminish overall bias toward overconfidence (Stone & Opel, 2000).
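The separability of the two components in confidence judgments can be illustrated with a small sketch. It rests on a simplifying assumption that is ours and not a claim from the studies cited: feedback is modeled as a uniform downward shift in stated confidence. Under that assumption the bias shrinks while the confidence-accuracy correlation cannot change, because a correlation is unaffected by adding a constant. All data and names (`overconfidence`, `discrimination`) are hypothetical.

```python
from statistics import fmean, pstdev

def overconfidence(confidence, correct):
    """Bias: mean stated confidence minus the proportion of correct answers."""
    return fmean(confidence) - fmean(correct)

def discrimination(confidence, correct):
    """Correlation between confidence and correctness across judgments."""
    mc, mk = fmean(confidence), fmean(correct)
    cov = fmean([(c - mc) * (k - mk) for c, k in zip(confidence, correct)])
    return cov / (pstdev(confidence) * pstdev(correct))

# Hypothetical judgments: stated probability of being right, and whether right (1) or wrong (0).
confidence = [0.90, 0.80, 0.95, 0.70, 0.85, 0.60, 0.90, 0.75]
correct = [1, 1, 1, 0, 0, 0, 1, 0]

# Assumption for illustration only: feedback lowers every estimate by the same amount.
after_feedback = [c - 0.30 for c in confidence]

bias_before = overconfidence(confidence, correct)
bias_after = overconfidence(after_feedback, correct)
r_before = discrimination(confidence, correct)
r_after = discrimination(after_feedback, correct)

print(f"bias: {bias_before:.3f} -> {bias_after:.3f}")
print(f"discrimination: {r_before:.3f} -> {r_after:.3f}")
```

Improving discrimination, by contrast, would require changing which judgments receive high versus low confidence, something no uniform adjustment can do.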
Self-prediction versus peer prediction
Another indication that discrimination and bias are influenced by different factors comes from research comparing the quality of self-predictions with the quality of peer predictions for anticipating future performance or behavior. On correlational measures, no overall advantage emerges for either type of prediction: Peer prediction is either as accurate as self-prediction (for reviews, see Dunning, 2005; Dunning et al., 2004) or each trades off having the upper hand depending on the type of trait under scrutiny (Vazire, 2010; Vazire & Carlson, 2011). However, when it comes to measures of bias, peer predictions tend to win out, in that they fail to contain the unrealistic levels of optimism so often seen in self-prediction (for examples, see Epley & Dunning, 2000, 2006; Helzer & Dunning, 2012; McDonald & Ross, 1999).
Other Measures Allow Additional Insights
Beyond examining whether discrimination and bias are differentially influenced by a variety of psychological factors, we urge future researchers to consider yet other measures of accuracy and error that might inspire or address questions not yet articulated. Sophisticated and interesting measures of accuracy and error already exist, such as the Brier (1950) score and the diverse components into which it can be decomposed (for a review, see Yates, 1982).
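For readers unfamiliar with the Brier score, a minimal sketch follows. The score is the mean squared difference between a probability forecast and the 0/1 outcome. The partition shown (reliability, resolution, uncertainty) is one widely used way of splitting the score and is offered as an illustration; it is not necessarily the decomposition emphasized in Yates's review. The forecasts, outcomes, and function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes."""
    return fmean([(f - o) ** 2 for f, o in zip(forecasts, outcomes)])

def brier_partition(forecasts, outcomes):
    """Split the Brier score into reliability, resolution, and uncertainty.

    Forecasts are grouped by their exact value, so that
    brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = fmean(outcomes)
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    # Reliability: gap between each stated probability and the hit rate it earned.
    reliability = sum(len(v) * (f - fmean(v)) ** 2 for f, v in groups.items()) / n
    # Resolution: how far the group hit rates spread around the overall base rate.
    resolution = sum(len(v) * (fmean(v) - base_rate) ** 2 for v in groups.values()) / n
    # Uncertainty: variance of the outcome itself, outside the forecaster's control.
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty

# Hypothetical forecasts and outcomes, for illustration only.
forecasts = [0.9, 0.9, 0.9, 0.6, 0.6, 0.3, 0.3, 0.3]
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]

bs = brier_score(forecasts, outcomes)
rel, res, unc = brier_partition(forecasts, outcomes)
print(f"Brier = {bs:.4f}; reliability - resolution + uncertainty = {rel - res + unc:.4f}")
```

Lower scores are better: a forecaster who is always right scores 0, and one who always says 50% scores .25.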
One can also explore accuracy and error using completely novel measures. As an example, let us return to the average “miss” measure (the absolute value of the difference between predicted and actual performance) discussed above in our exam study and ask a novel question of the Helzer and Dunning (2012, Study 2) dataset. With that measure, one question we can directly ask is whether subjective prediction or objective performance is most problematic for self-accuracy. Do errors in self-knowledge vary as a function of forecast? Or does the real problem in self-accuracy lie with the objective performance to come?
Figure 1 illustrates errors in prediction (the y axis) in two different ways: as a function of the prediction made (1A) and as a function of subsequent performance (1B). (A quadratic equation best fit the data for the latter—ps < .001 for both linear and quadratic components—and so that is the relationship displayed.) Figure 1A suggests that the problem of self-error may not really reside on the prediction side. Participants displayed the same degree of error no matter how pessimistic or optimistic they were about their upcoming exam. However, Figure 1B suggests that their actual ability at the task does matter. Participants were quite accurate when performing well. They just seemed not to be able to anticipate when their performance would go south.

Figure 1. Errors in prediction (the y axis) illustrated as a function of the prediction made (A) and as a function of subsequent performance (B).
Thought of in this way, self-prediction errors might not be a function of the prediction itself but rather the underlying event that people will subsequently encounter. That is, whether a person is predicting high or low will not tell the researcher whether this person will be more or less likely to be in error. The problem is not any psychological feature attached to the process of prediction. What is important, instead, is that some people are fated to encounter objective events (like poor performance) that they simply cannot foresee. In other words, the best way to improve self-accuracy is simply to make everybody better performers. Doing so helps them to avoid the type of outcome they seem unable to anticipate. Discerning readers will recognize this as an oblique restatement of the Dunning–Kruger effect (see Dunning, 2011; Kruger & Dunning, 1999), which suggests that poor performers are not in a position to recognize the shortcomings in their performance.
The literature on overconfidence echoes this observation, in that a prime determinant of whether people will prove overconfident is often not the level of subjective certainty they express but rather the objective accuracy they attain in their judgments (see Lichtenstein, Fischhoff, & Phillips, 1982). Those doing better show less overconfidence exactly at the rate that their accuracy rises to reach closer to their expressed levels of confidence.
Accuracy Is Not Always About the Individual
We have one last observation to make about going beyond the correlation coefficient. In his elegant analysis of human judgment, Cronbach (1955) noted that there are many variants of accuracy. In particular, he made a distinction between differential (or real) accuracy and stereotype accuracy. Differential accuracy refers to anticipating individual differences in reactions to the same situation or context (for example, when one correctly knows that Harry likes romantic comedies more than Sally does). Stereotype accuracy means knowing how reactions change in people in general depending on the situation or context (for example, when one knows that people in general, including both Harry and Sally, like romantic comedies more than workplace safety training videos).
The correlation coefficient usually calculated focuses on the differential component of accuracy, missing the fact that accuracy about even a specific person often flows from knowledge about human nature in general—that is, stereotype accuracy. Does Jim know, for example, when he should seek out a medical specialist to answer a question about his trick knee rather than rely on his neighbor (or himself)? The answer depends on how accurately he estimates the prevalence of medical knowledge across laypeople, not on his specific neighbor.
Recent work suggests an interesting story about self-knowledge and stereotype accuracy. Peers appear to predict each other better than they do themselves because they apply to peer prediction a roughly accurate sense of how people in general behave (i.e., stereotype accuracy) that they shun in self-prediction (Balcetis & Dunning, 2013). In one recent study, participants were asked the likelihood that they or someone else would help an experimenter who had just spilled a set of puzzle pieces on the floor. When predicting the likelihood of someone else helping, participants accurately recognized that a person would be less likely to help if there were others in the room rather than if that person were alone (average perceived reduction of 22% vs. a real observed reduction of 27%). That is, they displayed a degree of stereotype accuracy about human nature by possessing a roughly correct intuition about the bystander apathy effect in other people (Latané & Darley, 1970). For themselves, however, they dismissed any thoughts related to their stereotype accuracy and stated that the presence of others would have no effect on their own behavior (average perceived reduction of only 4%). In that, they were mistaken (Balcetis & Dunning, 2013).
Concluding Remarks
Zell and Krizan’s review is jam-packed with information, but we hope that readers will recognize that it is also stuffed with potential inspiration. The review is comprehensive, but its comprehensiveness reveals just how devoted the study of self-assessment accuracy has been to the correlation coefficient. Further studies using such a measure would, of course, be valuable, but it is our hope that future researchers will strive to be more creative in how they assess accuracy. To the extent that future researchers use different, creative, and multiple measures of accuracy, the work on self-judgment itself can only become more complete and can thus attain a greater accuracy in describing the human condition on its own.
Footnotes
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
The empirical study described was originally supported financially by National Science Foundation Grant 0745806 awarded to David Dunning.
