Abstract

Can the “wisdom of crowds” (Surowiecki, 2004) be exploited within a single mind? Yes, one can increase accuracy by averaging multiple estimates from the same person (Herzog & Hertwig, 2009; Hourihan & Benjamin, 2010; Müller-Trede, 2011; Rauhut & Lorenz, 2011; Stroop, 1932; Vul & Pashler, 2008; White & Antonakis, 2013; Winkler & Clemen, 2004). We proposed boosting this crowd-within effect with what we called dialectical bootstrapping (Herzog & Hertwig, 2009; hereafter, H&H). This means averaging a person’s first estimate with his or her second, “dialectical” estimate, which is derived from knowledge and assumptions different from those that motivated the first estimate. Ideally, the error of the dialectical estimate has the opposite sign from the error of the first estimate, which increases the chance that the two errors cancel. A dialectical estimate can be elicited in different ways. We tested one, the consider-the-opposite strategy (Lord, Lepper, & Preston, 1984). We found that averaging first and dialectical estimates improved accuracy more than simply asking people to make an estimate anew and averaging the two estimates (i.e., the reliability condition).
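A minimal numerical sketch may help make the error-cancellation logic concrete. It assumes that error is the absolute deviation from the true value. The function names and numbers are hypothetical illustrations, not data from H&H. When both estimates err in the same direction, the error of their average equals the mean error of the two estimates. When the errors have opposite signs, so that the estimates bracket the truth, the error of the average is smaller.

```python
def error_of_average(first: float, second: float, truth: float) -> float:
    """Absolute error of the simple average of two estimates."""
    return abs((first + second) / 2 - truth)


def mean_individual_error(first: float, second: float, truth: float) -> float:
    """Mean absolute error of the two estimates taken separately."""
    return (abs(first - truth) + abs(second - truth)) / 2


truth = 100.0
# Same direction: errors of +20 and +10, so nothing cancels.
print(error_of_average(120.0, 110.0, truth), mean_individual_error(120.0, 110.0, truth))
# Opposite directions (bracketing): errors of +20 and -10 partly cancel.
print(error_of_average(120.0, 90.0, truth), mean_individual_error(120.0, 90.0, truth))
```

The first line printed is 15.0 15.0, so averaging gains nothing over the typical estimate. The second is 5.0 15.0, where the bracketing estimates yield an average that is three times more accurate.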
White and Antonakis (2013; hereafter, W&A) reanalyzed our data using a different accuracy measure, concluding that “dialectical instructions are not needed to achieve the wisdom of many in one mind” (p. 116). Here, we delineate where we agree and disagree with W&A.
We concur with W&A that the crowd within works. W&A observed (as have we and other researchers) that averaging two estimates from the same person improves accuracy. Moreover, they obtained this result across different measures of accuracy. We also agree with W&A that “dialectical instructions are not needed to achieve the wisdom of many in one mind” (p. 116). In our previous article, we pointed out (H&H, p. 236) that the passage of time appears to be enough to boost the gains obtained by averaging (Vul & Pashler, 2008). Additionally, we highlighted that “accuracy in [our] reliability condition increased as a result of aggregation” (p. 234). Our disagreement with W&A concerns the following question: Can dialectical bootstrapping boost the crowd-within effect beyond the gains observed in the reliability condition (i.e., gains expected to occur when averaging any noisy estimates)?
Dialectical Bootstrapping: Does It Have Surplus Value?
We defined the gain obtained by averaging the responses of a given participant as the “median decrease in error of the average of the two estimates relative to the first estimate” (H&H, p. 234). W&A criticized this accuracy change measure on several grounds. First, they reported that participants’ first and second estimates were, on average, identical in 20% of cases in our reliability condition but in merely 1% of cases in our dialectical condition. Furthermore, W&A reported that our accuracy change measure was confounded with the proportion of identical first and second estimates. Second, W&A noted that they “prefer to measure accuracy change independently of the proportion of identical first and second responses” (p. 115). Third, they noted that our measure has “awkward statistical properties” (p. 115). Finally, they used a measure that decoupled accuracy change from the proportion of identical responses. With that measure, accuracy gains did not differ between our dialectical and reliability conditions. From this, W&A concluded that there is no evidence that “encouraging people to alter their responses more often than they would if not given special instructions yields more accurate average responses” (p. 116).
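The accuracy change measure quoted above can be sketched as follows. The exact computation is an assumption here. Error is taken as absolute deviation, the decrease is expressed as a proportion of the first estimate's error, and the median is taken over one participant's items. The function names and the four items are hypothetical. The sketch also computes the proportion of identical first and second estimates. An identical pair leaves the error unchanged and therefore contributes a gain of exactly zero. This is one way the measure can become linked to that proportion.

```python
from statistics import median


def relative_error_decrease(first, second, truth):
    """Proportional decrease in absolute error of the average vs. the first estimate.

    Assumed form: (e_first - e_avg) / e_first. Returns None when the first
    estimate is exactly correct, because the ratio is then undefined.
    """
    e_first = abs(first - truth)
    if e_first == 0:
        return None
    e_avg = abs((first + second) / 2 - truth)
    return (e_first - e_avg) / e_first


def accuracy_gain(pairs, truths):
    """Median item-level gain for one participant, skipping undefined items."""
    gains = [relative_error_decrease(f, s, t) for (f, s), t in zip(pairs, truths)]
    return median(g for g in gains if g is not None)


def proportion_identical(pairs):
    """Share of items on which the second estimate repeats the first."""
    return sum(f == s for f, s in pairs) / len(pairs)


# Hypothetical (first, second) estimates and true values for four items.
pairs = [(120, 90), (60, 60), (150, 250), (14, 12)]
truths = [100, 50, 200, 10]
print(accuracy_gain(pairs, truths), proportion_identical(pairs))
```

The item-level gains here are 0.75, 0.0 (the identical pair), 1.0, and 0.25. The sketch therefore prints a median gain of 0.5 and a proportion of identical pairs of 0.25.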
There are various accuracy (and accuracy change) measures, and opinions about their respective merits differ. The differences stem from statistical considerations (Armstrong & Collopy, 1992) or from the fact that different measures imply different loss functions (Winkler, 2003). Using a robust measure of accuracy change at the item level, we found that dialectical bootstrapping results in accuracy gains that go beyond reliability gains. Using a measure of accuracy change at the participant level, W&A found no such advantage. But how persuasive are W&A’s two key reasons for preferring their measure?
First, an alleged weakness of our measure is that it has awkward statistical properties (p. 115)—presumably because it includes a ratio of two variables (W&A cited p. 22 of Pohl, 2007, where Pohl discussed the “awkward statistical properties” of a “quotient of two variables”). W&A, however, also analyzed two variants of our measure that employed either a more reliable denominator or no denominator, and in both cases, the findings obtained with our original measure were confirmed (see their online Supplemental Material).
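One reason a quotient can behave awkwardly can be seen under an assumed form of the measure: gain = (e_first - e_avg) / e_first, where e_first is the first estimate's error and e_avg is the error of the average. The ratio is capped at 1, reached when the average hits the true value, but it has no lower bound. A single item with a small first error can therefore produce a large negative value. A variant with no denominator keeps the raw units instead. This is a hypothetical illustration of the general issue. It does not reconstruct the specific variants W&A analyzed, and the numbers are invented.

```python
def ratio_gain(e_first: float, e_avg: float) -> float:
    """Decrease in error relative to the first estimate's error (assumed form)."""
    return (e_first - e_avg) / e_first


def difference_gain(e_first: float, e_avg: float) -> float:
    """No-denominator variant: raw decrease in error, in the estimate's units."""
    return e_first - e_avg


# Hypothetical (first-estimate error, error of average) pairs for two items.
cases = [(40.0, 0.0), (2.0, 10.0)]
for e_first, e_avg in cases:
    print(ratio_gain(e_first, e_avg), difference_gain(e_first, e_avg))
```

The first item attains the largest possible ratio gain of 1.0, which corresponds to a raw gain of 40.0. The second item's first estimate was already close to the truth. Averaging worsens it by only 8 units, yet it scores -4.0 on the ratio scale.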
Second, W&A argued that it is better to measure accuracy change independently of the proportion of identical responses. But what is wrong with letting a genuine psychological fact, namely, that people hesitate to alter their opinion, enter the accuracy analysis? W&A motivated their independence requirement by reference to hindsight-bias research. In that research, cases of perfect recall must be separated from cases of reconstruction because hindsight bias can occur only in the latter (e.g., Hoffrage, Hertwig, & Gigerenzer, 2000). This analogy with hindsight bias, however, is misleading. The goal of dialectical bootstrapping is to increase accuracy by fostering independence between repeated estimates. The low proportion of identical estimates in the dialectical condition indicates that this goal was met. Therefore, we disagree with W&A’s stipulation that a measure that gauges accuracy change independently of the proportion of identical responses is preferable.
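The disagreement can be made concrete with a small hypothetical example. It assumes that the item-level gain is the proportional decrease in absolute error of the average relative to the first estimate. If all pairs are included, identical estimates, each with a gain of exactly zero, enter the median. Restricting the analysis to revised pairs is one way to make the measure independent of the proportion of identical responses. This restriction only illustrates the independence idea and is not W&A's participant-level measure. The item values and names are invented.

```python
from statistics import median


def item_gain(first, second, truth):
    """Proportional decrease in absolute error of the average (assumed form).

    Assumes the first estimate is not exactly correct (nonzero denominator).
    """
    e_first = abs(first - truth)
    e_avg = abs((first + second) / 2 - truth)
    return (e_first - e_avg) / e_first


# Hypothetical (first, second, truth) triples; two items were not revised.
items = [
    (120, 90, 100),
    (60, 60, 50),
    (80, 80, 100),
    (14, 12, 10),
    (150, 250, 200),
]

# Identical pairs kept: reluctance to revise is part of what is measured.
all_median = median(item_gain(f, s, t) for f, s, t in items)
# Identical pairs dropped: gain conditional on the person revising at all.
revised_median = median(item_gain(f, s, t) for f, s, t in items if f != s)
print(all_median, revised_median)
```

The two unrevised items pull the median down from 0.75 to 0.25. The two analyses therefore answer different questions. The first asks how much averaging helped overall. The second asks how much it helped when people actually changed their estimate.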
Conclusion
W&A and we agree that the crowd within works. The question is whether, when, and how dialectical bootstrapping can help realize more of its potential. We are grateful for W&A’s comments. Their reanalysis highlights the necessity of including different accuracy measures in future studies and analyzing whether the results obtained using them converge (and if not, why not). When there is no good reason to prefer one measure over others, aggregating them may be one solution to their plurality (Armstrong & Collopy, 1992, p. 75).
Does dialectical bootstrapping improve accuracy beyond mere reliability gains? Clearly, as W&A showed, when one employs the consider-the-opposite strategy (as in H&H), the advantage of dialectical bootstrapping depends on the accuracy measure. This, however, should not be taken as a general verdict on the dialectical-bootstrapping framework, which we explicitly did “not confine . . . to the consider-the-opposite strategy” (H&H, p. 236). There are many ways to leverage “people’s capacity to construct conflicting realities” (H&H, p. 236), and thus to achieve dialectical bootstrapping. For instance, we are currently exploring the extent to which averaging different non-Bayesian strategies makes people more Bayesian and the extent to which averaging holistic and analytical judgments improves accuracy. The dialectical-bootstrapping framework poses a wealth of questions, including questions concerning how to design successful and robust dialectical techniques, the ecological conditions under which dialectical bootstrapping pays, and whether people intuitively use this strategy. The work on how and when to poll the crowd in one’s head has just begun.
Acknowledgements
The authors thank Laura Wiles for editing the manuscript.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
A Swiss National Science Foundation Grant (100014_129572/1) to both authors supported their research discussed in this Commentary.
