Abstract
Human judgment often violates normative standards, and virtually no judgment error has received as much attention as the conjunction fallacy. Judgment errors have historically served as evidence for dual-process theories of reasoning, insofar as these errors are assumed to arise from reliance on a fast and intuitive mental process, and are corrected via effortful deliberative reasoning. In the present research, three experiments tested the notion that conjunction errors are reduced by effortful thought. Predictions based on three different dual-process theory perspectives were tested: lax monitoring, override failure, and the Tripartite Model. Results indicated that participants higher in numeracy were less likely to make conjunction errors, but this association only emerged when participants engaged in two-sided reasoning, as opposed to one-sided or no reasoning. Confidence was higher for incorrect as opposed to correct judgments, suggesting that participants were unaware of their errors.
Human judgment often violates normative standards, ignoring basic rules of logic and probability (Kahneman, Slovic, & Tversky, 1982). One of the most famous examples of fallacious thinking is the conjunction fallacy. In the well-known Linda problem (Tversky & Kahneman, 1983), a woman is described in a way that makes her seem like a stereotypical feminist: “Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.” Following that description, participants are asked to judge whether it is more likely that (A) Linda is a bank teller, or (B) Linda is a bank teller and active in the feminist movement. 1 Considerable research has shown that the vast majority of people (~80%) believe that the correct answer is B. However, this answer violates a basic rule of probability, because the latter option (Linda being both a bank teller and a feminist) is contained within the first (Linda is a bank teller) and therefore cannot be more probable than it. Tversky and Kahneman (1983) attributed this error to a heuristic, which they named the “representativeness heuristic,” because people appear to base their judgment not on probability concepts such as containment, but on how representative Linda is of a stereotypical feminist.
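The containment argument can be written out explicitly. As a brief sketch, let T stand for “Linda is a bank teller” and F for “Linda is active in the feminist movement” (these labels are introduced here only for illustration). The conjunction rule then follows from the definition of conditional probability:

```latex
P(T \cap F) \;=\; P(T)\,P(F \mid T) \;\le\; P(T),
\qquad \text{because } 0 \le P(F \mid T) \le 1 .
```

However representative the description makes F seem, option B can at most equal option A in probability, and it does so only if every bank teller matching the description is also a feminist.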
Observations of major thinking fallacies such as the conjunction fallacy have historically served as evidence for dual-process theories of reasoning (Evans & Stanovich, 2013; Kahneman & Frederick, 2002; Stanovich & West, 2000). One type of mental process (Type 1) is described as being fast, automatic, effortless, associative, and intuitive. The other type (Type 2) is described as being slow, controlled, effortful, and requiring working memory. According to default-intervention dual-process models, Type 1 processing is the usual state of mental operation (i.e., the “default”) because it requires little or no effort, whereas Type 2 processing intervenes only when the need for more elaborative and effortful thought is detected (e.g., detecting that Type 1 processing has resulted in an error). The notion that conjunction errors are the product of fast, effortless thinking is so ingrained in the psychological literature that the presence or absence of conjunction errors has been taken as evidence for the operation of Type 1 or Type 2 processing (e.g., Alter, Oppenheimer, Epley, & Eyre, 2007; Bodenhausen, 1990). However, surprisingly few experiments have been conducted that directly test this assumption. Hence, the present research was designed to test the extent to which the representativeness heuristic, and resultant conjunction errors, reflects the absence of effortful mental processes, and also to test predictions based on further specifications of default-interventionist dual-process models (e.g., Evans & Stanovich, 2013). That is, does effortful thinking reduce reliance on the representativeness heuristic and the frequency of conjunction errors, and if so, under what circumstances and for whom?
The Relationship Between Conjunction Errors and Effortful Versus Noneffortful Thought
Positive evidence for a dual-process view of conjunction errors may include four types of findings: (a) correlations between response time and accuracy, in which people make fewer errors after taking more time to respond; (b) observing more errors under conditions that limit cognitive resources, such as cognitive load; (c) observing fewer errors following manipulations that increase effortful thought; and (d) associations between conjunction errors and individual differences in thinking propensity and ability. There is evidence that people make fewer conjunction errors after taking more time (De Neys, 2006). Moreover, use of the representativeness heuristic and conjunction errors are positively correlated with faith in intuition (Alós-Ferrer & Hügelschäfer, 2012) and are negatively correlated with intelligence (Stanovich & West, 1998), numeracy (Liberali, Reyna, Furlan, Stein, & Pardo, 2012), and performance on the Cognitive Reflection Test (Toplak, West, & Stanovich, 2011), a measure that assesses individual differences in the propensity to engage in analytic thought (Frederick, 2005). However, these findings are correlational and cannot establish a causal relationship between effortful thought and conjunction errors.
There are relatively few examples of research that has directly tested the influence of manipulated effortful versus noneffortful mental processes on participants’ tendency to be influenced by representativeness and make conjunction errors. In fact, we are aware of only two experiments that have tested this hypothesis directly (De Neys, 2006; Villejoubert, 2009). One study showed that participants were more likely to make conjunction errors under high versus low cognitive load (De Neys, 2006). Although suggestive, this study offered somewhat weak evidence for the hypothesis because it involved a small sample (n = 22 per condition) and the critical effect was just barely significant (p < .045 using a one-tailed test). Another study placed participants under response-time pressure versus not, and found that if anything, participants were in fact less influenced by representativeness under time pressure (Villejoubert, 2009).
Other research has tested the hypothesis under conditions that may indirectly influence processing effort, finding, for example, that cognitive disfluency (e.g., resulting from difficult-to-read font sizes) reduces conjunction errors (Alter et al., 2007), and that conjunction errors are more likely for morning-types solving conjunction problems in the evening, and for evening-types solving the problems in the morning (Bodenhausen, 1990). However, it is unclear whether these manipulations influenced effortful thought processes per se, and recent research failed to replicate the former findings (Meyer et al., 2015; Thompson et al., 2013). Finally, other research suggests that people frequently make conjunction errors even after explicitly describing their reasoning (Yates & Carlson, 1986), indicating that these errors might be quite common even after considerable effortful thought. Hence, the evidence that conjunction errors reflect the absence of effortful thought is somewhat slim, and this seems like an important gap in the literature given the tendency to interpret errors versus correct responses on conjunction problems as providing evidence for the presence or absence of effortful thinking.
Predictions Derived From Default-Interventionist Dual-Process Models
Although default-interventionist dual-process models emphasize the distinction between effortful versus noneffortful thought, these models also recognize that it takes more than just effortful thought to override incorrect intuitive judgments. According to one perspective, people make errors not simply because of cognitive miserliness (i.e., lack of thought) but also because of a lax monitoring system. That is, errors are often not overridden by Type 2 processing because errors are not detected in the first place (Evans, 1984; Kahneman, 2002). This is the lax monitoring perspective.
By contrast, an alternative override failure perspective asserts that people are often implicitly or explicitly aware that their initial response is likely an error, but they experience conflict between what they know about probabilities versus what they feel about representativeness. Hence, people answer incorrectly because of a failure to inhibit their intuitive feelings (De Neys & Glumicic, 2008). In support of this perspective, research has shown that people express lower confidence on incongruent as opposed to congruent conjunction problems (De Neys, Cromheeke, & Osman, 2011). The Linda problem from our opening paragraph above is an example of an incongruent problem; by contrast, in a congruent problem participants are given the same description of Linda and are asked to judge the likelihood that “Linda is active in the feminist movement” versus “Linda is a bank teller and active in the feminist movement.” The relatively low confidence on incongruent problems (for both correct and incorrect answers) suggests that people experience more conflict when responding to incongruent problems; thus, people make conjunction errors not because they fail to notice the conflict between the responses, but rather because they fail to override their feelings about representativeness. A similar finding was reported by Villejoubert (2009).
A third and final perspective, which is not necessarily mutually exclusive with those described above, is the Tripartite Model perspective. Recently, Evans and Stanovich (2013) refined their default-interventionist model, acknowledging many weaknesses of older theorizing (cf. Keren & Schul, 2009), and offered a revised perspective in which Type 1 processing was defined as not requiring working memory and operating autonomously, whereas Type 2 processing was defined as requiring working memory and often involving “cognitive decoupling,” a mental action that depends critically on fluid intelligence and involves suspending current beliefs and imagining alternatives. This revised model (Evans & Stanovich, 2013, Figure 1) has elsewhere been called the Tripartite Model (Stanovich, 2009; Stanovich, West, & Toplak, 2014). It proposes that Type 2 processing will successfully override the responses produced by Type 1 processes via the workings of two critical components: the “algorithmic mind,” defined as individual differences in fluid intelligence (e.g., problem-solving ability, as opposed to acquired knowledge; Cattell, 1963), and the “reflective mind,” defined as a disposition to engage in effortful thought (Evans & Stanovich, 2013). Hence, according to the Tripartite Model, people are most likely to override the representativeness heuristic and avoid conjunction errors when there is a combined presence of mental effort and skill, with the critical skill being fluid intelligence.

Figure 1. Combined data from Studies 1a and 1b: Association between conjunction problem judgment and numeracy, by condition.
In the present research, we tested whether conjunction errors are reduced by manipulated effortful thought, and we additionally tested predictions borne out of these three perspectives. According to the lax monitoring perspective, undirected effortful thought is unlikely to correct conjunction errors because people do not realize that it is wrong to respond on the basis of representativeness. As a result, the effort that they use in thinking about the problem is likely to simply confirm the representativeness-based response. However, when people are forced to generate reasons in favor of the correct response, this will be more likely to draw their attention to their error, and as a result the overall rate of conjunction errors will be reduced. Moreover, according to lax monitoring, people who choose incorrectly do not realize that they have made an error, and therefore, these individuals should be just as confident in their responses as correct responders, or perhaps even more confident.
By contrast, according to the override failure perspective, people experience conflict between the correct and incorrect response on incongruent conjunction problems. The override failure perspective might be supported if many participants considered the veracity of the correct response in their reasoning without explicit instructions to do so. If this were to occur, it would suggest awareness of the competing reasons for choosing each response. However, it is also possible that the experienced conflict is more implicit and not easily articulated in conscious reasoning (De Neys & Glumicic, 2008). In either case, because this perspective predicts that participants who make errors will have some awareness that their answer is wrong, errors should presumably be held with less confidence than correct responses.
Finally, the Tripartite Model (Evans & Stanovich, 2013) predicts that a reduction of conjunction errors should result from the combination of deliberative thought and intelligence. This perspective does not appear to make predictions with regard to confidence. However, the model does appear to assume that highly intelligent individuals reason more effectively than relatively unintelligent individuals even in the absence of any specific prompting to reason in a particular way.
The Present Research
To our knowledge, no research to date has provided a strong test of the notion that experimentally induced effortful thought reduces the likelihood of conjunction errors, which is surprising considering that conjunction errors are often taken as evidence that a Type 1 process—and not an effortful Type 2 process—has occurred. Furthermore, no research has attempted a combined test of the three aforementioned theoretical perspectives. In this research, the predictions were tested using the “transparent” version of conjunction problems (described in our opening example), which are thought to be easier to solve correctly relative to versions that present a list of possible answers. Hence, the study was designed to maximize the likelihood of reduced errors following effortful reasoning.
In three studies, participants were randomly assigned to either solve a conjunction problem with no instructions (control) or with instructions to think carefully before making a judgment. Instructions to think carefully were experimentally varied such that participants were asked to simply think carefully and generate reasons for the response that they ultimately chose, to generate reasons why a particular answer was correct, or to generate reasons in favor of both the answers. A long history of psychological research has explored the benefits of different types of reasoning strategies, such as considering perspectives that are opposite to one’s own or considering arguments for more than one possible judgment (Anderson, 1982; Arkes, Faust, Guilmette, & Hart, 1988; Hirt & Markman, 1995; Koriat, Lichtenstein, & Fischhoff, 1980; Lord, Lepper, & Preston, 1984; Mussweiler, Strack, & Pfeiffer, 2000). The present research follows that tradition but also specifically tested hypotheses relevant to different dual-process perspectives.
To address predictions based on the Tripartite Model, in Studies 1a and 1b numeracy was assessed as a measure of participants’ cognitive skill. Numeracy is thought to be one component of intelligence (for a review, see Peters & Bjalkebring, 2014), although it has been shown to be predictive of certain judgments and behaviors even when controlling for other measures of intelligence (Brooks & Pui, 2010; Dieckmann et al., 2015). In our review of the literature, we noted that numerical proficiencies appeared to be related to conjunction errors (Agnoli & Krantz, 1989; Liberali et al., 2012), and the logic of conjunction problems involves reasoning about probabilities, which is considered to be a numerical skill (Peters & Bjalkebring, 2014). Hence, in Studies 1a and 1b, we tested the possibility that conjunction errors are related to individual differences in numeracy, and in Study 2, we included both a measure of numeracy as well as a nonnumerical measure of fluid intelligence.
Studies 1a and 1b
Study 1a was the first study conducted in this line of research, and Study 1b was a direct replication conducted 2 years later (and after Study 2). With minor exceptions noted below, both studies used the same survey protocol and outcome measures.
Method
Participants
Participants were students from an introductory psychology course who participated in this experiment in exchange for partial course credit. Our aim was to recruit 100 participants for each of the three between-subjects reasoning conditions, to obtain 80% power to detect a difference in binary responses (conjunction error vs. correct response) of at least 15% between these conditions, assuming a high baseline rate of errors (e.g., 85% to 90%). Although the choice was subjective, we targeted a 15% reduction in errors because we felt that this effect size would indicate a worthwhile decision intervention. There were 294 participants in Study 1a and 291 participants in Study 1b; 38 participants were later removed from 1a and 53 were removed from 1b as a result of having seen the conjunction problems before and/or having incomplete numeracy data. Final analytic samples were N = 256 for Study 1a and N = 238 for Study 1b, resulting in somewhat lower power (70%-75%).
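The sample-size reasoning above can be illustrated with a rough power calculation. The sketch below uses a normal-approximation z-test for two independent proportions; the authors' exact power method is not stated, so this choice, the function name, and the illustrative group size of 80 are assumptions, and the output need not reproduce the reported figures exactly.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent proportions.

    Assumption: normal approximation with an unpooled standard error.
    This is an illustrative method, not necessarily the one used in the study.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in proportions under the alternative.
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_effect = abs(p1 - p2) / se
    # Probability of rejecting in the expected direction; the tiny
    # contribution from the opposite tail is ignored.
    return norm.cdf(z_effect - z_crit)


# Baseline error rates of 85% and 90%, each reduced by 15 percentage points.
# n = 100 is the recruitment target; n = 80 is a hypothetical post-exclusion size.
for baseline in (0.85, 0.90):
    for n in (100, 80):
        power = power_two_proportions(baseline, baseline - 0.15, n)
        print(f"baseline={baseline:.2f}, n per group={n}: power ~ {power:.2f}")
```

Under these assumptions, power falls as exclusions shrink the groups, which is the pattern the paragraph describes.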
Design
Participants read and responded to a conjunction fallacy problem (Tversky & Kahneman, 1983). After reading the problem, but before making a judgment, participants were randomly assigned to one of the three judgment strategies: no reasoning instructions (control), writing a one-sided argument, or writing a two-sided argument. Study 1a randomized participants to either the “Linda” or “Bill” version of the scenario (see the “Procedure” section of “Studies 1a and 1b”); Study 1b used only the Linda problem.
Procedure
Participants were seated in a private testing room for the duration of the 30-min experiment. 2 The survey was computerized and used Qualtrics software. Students were told that they would read a scenario and that their job was to make the best possible judgment about that scenario. Next, participants read a conjunction fallacy problem (Study 1a randomized participants to one of the two following problems; Study 1b used only the Linda problem):
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in antinuclear demonstrations.
Which of the following is more likely?
Linda is a bank teller
Linda is a bank teller and active in the feminist movement
Bill is 34 years old. He is intelligent, but unimaginative, compulsive, and generally lifeless. In school, he was strong in mathematics but weak in social studies and humanities.
Which of the following is more likely?
Bill plays jazz for a hobby
Bill plays jazz for a hobby and is an accountant
After reading one of these problems, participants were randomly assigned to one of the three judgment strategies. In the control condition, participants simply read the scenario and made a judgment. In two other conditions, participants were instructed to engage in either one- or two-sided reasoning prior to making a judgment. In both Study 1a and 1b, the critical difference between the two reasoning conditions was that in the one-sided reasoning condition participants were prompted to write their arguments in a single textbox and they could write whatever argument(s) they liked, whereas in the two-sided condition there were two textboxes that appeared below the instructions, each with its own prompt, for example, Reasons why “Linda is a bank teller” is the more likely one, and Reasons why “Linda is a bank teller AND active in the feminist movement” is the more likely one. Participants were not allowed to go on until they had written something in each box. In Study 1a, participants were forced to write for at least 2 min before they could advance, whereas in Study 1b there was no minimum time imposed.
In Study 1a, the instructions were as follows:
One-sided reasoning instructions
Take some time to think carefully about your judgment. In the space below, we would like you to spend at least 2 minutes writing about why you might choose either option.
Two-sided reasoning instructions
Take some time to think carefully about your judgment. In the space below, we would like you to spend at least 2 minutes writing about why you might choose either option.
We tried to improve these instructions in Study 1b; in particular, we were concerned that the one-sided reasoning instructions may have inadvertently discouraged any natural tendency to engage in disconfirmatory reasoning. Therefore, in Study 1b, both groups received the same instructions: In your writing, you should try to make the best argument or arguments you can. Take a moment to think about which option is most likely, and write down your thoughts. Ask yourself: Why might either of these options be the smart option to choose? Hence, the only difference between the one- and two-sided reasoning conditions in Study 1b was the presence of one textbox in which participants could write whatever argument they liked versus two textboxes that prompted writing reasons for each response.
After making a judgment, participants indicated whether they felt confident in their judgment (1 = strongly disagree, 5 = strongly agree) 3 and then completed two measures of numeracy: the Berlin Numeracy Test (Cokely, Galesic, Schulz, Ghazal, & Garcia-Retamero, 2012) and an 11-item test developed by Lipkus et al. (Lipkus, Samsa, & Rimer, 2001). Two numeracy measures were used because pilot testing indicated that undergraduates tended to perform well on the Lipkus et al. (2001) test and relatively poorly on the Berlin Numeracy Test (Cokely et al., 2012), and so we expected the two scales in combination to potentially provide greater variability in scores. After reporting age, gender, and other demographics, participants were debriefed and dismissed.
Results
Manipulation checks: Deliberation time
Due to a programming error, only time taken on the deliberation pages was recorded in Study 1a (this error was rectified in subsequent studies). In that study, participants took M = 3.56 min (SD = 59 s) in the one-sided reasoning condition and M = 3.75 min (SD = 88 s) in the two-sided reasoning condition, and these were not significantly different, F(1, 169) = .95, p = .33. In Study 1b, the problem, the reasoning instructions, and the textbox(es) all appeared on one page, and so the timing data included time taken to read the problem and the instructions, write in the box(es), and make a judgment: Participants took an average of 17.98 s in the control condition, 5.74 min in the one-sided reasoning condition, and 5.02 min in the two-sided condition, F(1, 238) = 102.03, p < .001, ηp² = .465 (note that there was no minimum time restriction in Study 1b). In both studies, the association between numeracy and deliberation time across all conditions was positive (Study 1a: r = .13, p = .081; Study 1b: r = .15, p = .015). Also, participants in the deliberation conditions who made a conjunction error spent just as much time writing as participants who responded correctly, Study 1a: F(1, 169) = .61, p = .435, ηp² = .004; Study 1b: F(1, 159) = 2.36, p = .126, ηp² = .015. In Study 1b, where timing data were available for the control condition, there was not a significant difference in time taken to make a correct versus incorrect judgment, F(1, 77) = .06, p = .797, ηp² = .001.
Influence of judgment strategy and numeracy on judgments
The two numeracy measures were moderately correlated in both studies, r = .35, p < .001 for Study 1a and r = .44, p < .001 for Study 1b. In these analyses, we used a combined numeracy score calculated as the proportion correct out of all possible points. It is worth noting that across the three studies the effects involving numeracy were stronger and more consistent for the Lipkus et al. (2001) measure (see online supplementary analyses). Nonetheless, the combined numeracy score was used throughout for the sake of consistency.
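A combined score of this kind can be sketched as follows. The 11-point maximum for the Lipkus et al. (2001) test comes from the text; the 4-point maximum for the Berlin Numeracy Test is an assumption (the format administered is not stated here), and the function name is hypothetical.

```python
def combined_numeracy(berlin_correct, lipkus_correct, berlin_max=4, lipkus_max=11):
    """Proportion correct out of all possible points across both tests.

    berlin_max=4 is an assumed maximum; lipkus_max=11 follows the text.
    """
    if not (0 <= berlin_correct <= berlin_max and 0 <= lipkus_correct <= lipkus_max):
        raise ValueError("scores must lie between 0 and the test maximum")
    # Pool the points first, then divide, so each item carries equal weight.
    return (berlin_correct + lipkus_correct) / (berlin_max + lipkus_max)


# A hypothetical participant with 2 of 4 Berlin items and 10 of 11 Lipkus items.
print(combined_numeracy(2, 10))
```

Pooling points before dividing weights every item equally, so the longer test contributes more to the score than it would if the two per-test proportions were simply averaged.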
The effects described below were not significantly moderated by problem type in Study 1a (Linda vs. Bill) and so that variable will not be considered further. Participants in the control condition were most likely to choose the incorrect conjunction (Study 1a: 84.9%; Study 1b: 88.3%). The one-sided reasoning condition did not reduce errors relative to control in Study 1a (88.3%), χ²(1) = .45, p = .502, and although errors were somewhat lower following one-sided reasoning in Study 1b (80.5%), this was not significantly different from control, χ²(1) = 1.82, p = .177. In both studies, participants were least likely to choose the incorrect conjunction in the two-sided reasoning condition (Study 1a: 76.3%; Study 1b: 73.4%); this represents a significant reduction in errors relative to one-sided reasoning in Study 1a, χ²(1) = 4.23, p = .040, but not in Study 1b, χ²(1) = 1.12, p = .288.
A logistic regression on scenario judgment was conducted that included dummy-coded condition variables contrasting the control condition to each of the writing conditions. Incorrect judgments were coded as 0 and correct judgments were coded as 1. Predictors also included the combined numeracy score and two-way interactions between numeracy and condition. Results showed a marginal effect of control versus two-sided reasoning in Study 1a (B = −3.65, SE = 2.00, p = .069), whereas this effect in Study 1b was not significant (B = −2.64, SE = 1.86, p = .155). However, in both studies there was an interaction between control versus two-sided reasoning and numeracy (Study 1a: B = 5.85, SE = 2.69, p = .030; Study 1b: B = 5.49, SE = 2.71, p = .043). No other effects were significant (all p ≥ .14 in both studies). Combining the data from both studies, the same logistic regression analysis showed that errors were slightly reduced by two-sided reasoning relative to control (B = −2.66, SE = 1.29, p = .040), and there was also an interaction between control versus two-sided reasoning and numeracy (B = 4.96, SE = 1.80, p = .006). No other effects were significant (all p > .24).
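The model structure described above (control as the reference category, two condition dummies, numeracy, and the two Numeracy × Condition products) can be sketched as a design-matrix row. The column order, condition labels, and function names below are assumptions made for illustration; no coefficients from the study are built in, and the caller supplies any values.

```python
from math import exp

CONDITIONS = ("control", "one_sided", "two_sided")


def design_row(condition, numeracy):
    """Return [intercept, one_sided, two_sided, numeracy,
    one_sided*numeracy, two_sided*numeracy] for one participant.

    Control is the reference category, so both dummies are 0 for it.
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    one_sided = 1.0 if condition == "one_sided" else 0.0
    two_sided = 1.0 if condition == "two_sided" else 0.0
    return [1.0, one_sided, two_sided, numeracy,
            one_sided * numeracy, two_sided * numeracy]


def predicted_p_correct(coefs, condition, numeracy):
    """Model-implied probability of a correct judgment (coded 1)."""
    logit = sum(b * x for b, x in zip(coefs, design_row(condition, numeracy)))
    return 1.0 / (1.0 + exp(-logit))


# Rows for two hypothetical participants with a numeracy proportion of 0.5.
print(design_row("control", 0.5))
print(design_row("two_sided", 0.5))
```

With this coding, the two-sided dummy coefficient is the condition difference in log-odds at a numeracy score of zero, and the interaction coefficient is the change in that difference per unit of numeracy.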
Decomposing the observed interaction showed that, in the control condition, participants who were higher in numeracy were somewhat less accurate, although this was not significant in either study or in the combined data (Study 1a: rpb = −.15, p = .137; Study 1b: rpb = −.04, p = .695; combined rpb = −.09, p = .240). In the one-sided reasoning condition, the relationship between numeracy and accuracy was unstable and not significant in the combined data set (Study 1a: rpb = −.14, p = .190; Study 1b: rpb = .18, p = .102; combined rpb = .02, p = .782). By contrast, there was a positive relationship between numeracy and accuracy in the two-sided reasoning condition (Study 1a: rpb = .19, p = .106; Study 1b: rpb = .29, p = .008; combined rpb = .22, p = .004). Figure 1 displays these effects using the combined data.
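The point-biserial correlations (rpb) reported above are Pearson correlations between a binary variable (here, accuracy coded 0/1) and a continuous one (numeracy). A minimal sketch of the computation follows; the function name and the four data points are hypothetical and are not taken from the study.

```python
from math import sqrt


def point_biserial(binary, scores):
    """Point-biserial r: Pearson correlation between a 0/1 variable and scores."""
    n = len(binary)
    if n != len(scores) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_b = sum(binary) / n
    mean_s = sum(scores) / n
    # Sums of cross-products and squares around the means.
    cov = sum((b - mean_b) * (s - mean_s) for b, s in zip(binary, scores))
    ss_b = sum((b - mean_b) ** 2 for b in binary)
    ss_s = sum((s - mean_s) ** 2 for s in scores)
    if ss_b == 0 or ss_s == 0:
        raise ValueError("both variables must vary")
    return cov / sqrt(ss_b * ss_s)


# Hypothetical data: accuracy (1 = correct) and numeracy proportions.
accuracy = [0, 0, 1, 1]
numeracy = [0.2, 0.4, 0.6, 0.8]
print(round(point_biserial(accuracy, numeracy), 3))
```

A positive value means that correct responders tended to score higher on numeracy, which is the direction reported for the two-sided reasoning condition.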
Judgment confidence
A linear regression used dummy-coded conditions, numeracy, and judgment to predict judgment confidence. In Study 1a, participants who incorrectly chose the conjunction were more confident (M = 3.75, SD = .96) than those who chose the correct option (M = 3.30, SD = 1.03, B = −.39, SE = .16, t = −2.38, p = .018). In Study 1b, this effect was not significant (M = 3.55 for incorrect vs. M = 3.40 for correct judgments, B = −.15, SE = .18, t = −.86, p = .39). In Study 1a, there was also a marginal increase in confidence in the one-sided reasoning condition (M = 3.81, SD = .87) as compared with control (M = 3.55, SD = 1.09, B = .27, SE = .14, t = 1.87, p = .062); in Study 1b, this effect was significant (M = 3.78, SD = 1.06 vs. M = 3.32, SD = 1.04, B = .47, SE = .17, t = 2.80, p = .006). Interestingly, two-sided reasoning did not increase confidence relative to control, and no other main effects or interactions were significant in either study (all p > .20). Combining both studies, there was no effect of numeracy on confidence (p = .367), but participants who made errors were slightly more confident than those who did not (B = −.29, SE = .12, p = .017) and one-sided reasoning increased confidence relative to control (B = .35, SE = .11, p = .001; there was no effect of two-sided reasoning vs. control, p = .271).
Discussion
Studies 1a and 1b indicated that effortful reasoning without explicit instruction to argue in favor of both responses does not reliably reduce conjunction errors, even for individuals who are more numerate. By contrast, instructions to argue both sides improved judgments, but only for more numerate individuals. These results are consistent with the lax monitoring perspective insofar as judgment errors were reduced only when participants were asked to argue both sides, with the assumption being that this kind of reasoning is more likely to call attention to the possibility of an error. Participants who answered incorrectly did not appear to be aware of their errors; they expressed equal or higher confidence in their judgments than individuals who responded correctly. Hence, these data are least supportive of the override failure perspective, which predicted that participants would be explicitly or implicitly aware of their errors, and therefore, would be less confident in incorrect as compared with correct answers. Also notable was the fact that one-sided reasoning increased confidence and two-sided reasoning did not, even though the latter was more effective at improving judgments (cf. Koehler, 1991; Koriat et al., 1980; Scherer, de Vries, Zikmund-Fisher, Witteman, & Fagerlin, 2015).
Study 1 supported the Tripartite Model assertion that deliberation should be of particular benefit to the highly intelligent (in this case, the more numerate). However, the Tripartite Model does not appear to make predictions with regard to differing effects of skill across different kinds of reasoning; instead, the model appears to assume that intelligent individuals will bring their skills to bear on the problem regardless of whether they are asked to explicitly consider the correct response or not. We found that more numerate participants did not automatically reason effectively without being prompted to consider the correct response, a finding consistent with the notion that even highly intelligent and skilled individuals reason in ways that confirm their preexisting beliefs (Kahan et al., 2012; Stanovich, West, & Toplak, 2013).
Study 2
Study 2 was completed after Study 1a but before Study 1b. The purpose of Study 2 was to replicate Study 1a and also to address some lingering ambiguities. First, we have not yet tested an assertion of the Tripartite Model, which is that error correction depends on fluid intelligence specifically (Evans & Stanovich, 2013). It is possible that the effects that we observed in Study 1 were due to participants’ underlying intelligence, which is associated but not entirely overlapping with numeracy (Peters, 2012; Peters & Bjalkebring, 2014). To address this issue, in Study 2, we included the same measures of numeracy as in Study 1 as well as Raven’s Progressive Matrices, a nonnumerical test that is thought to assess fluid intelligence (Raven, Raven, & Court, 2003).
Second, in Study 1, participants who were asked to argue one side were not told which side to argue for, and most argued in favor of the conjunction. By contrast, participants who argued both sides were forced to provide at least one argument in favor of the correct option. As a result, it is not clear whether error reduction (for those high in numeracy) was the result of insights gleaned from comparing two different arguments, or if instead, it was the result of arguing in favor of the correct option in particular. We addressed this confound in Study 2 by including three writing conditions, in which participants were asked to (a) argue both sides, (b) argue one side, in favor of the incorrect option, or (c) argue one side, in favor of the correct option.
Method
Participants
Participants were students from an introductory psychology course who participated in exchange for partial course credit. This study was conducted in a spring semester during which the undergraduate subject pool was limited. We based our data collection stopping rule on the following considerations: In Study 1a, the two-way Numeracy × Condition interaction of interest was stronger for the subset of the sample assigned to the “Linda” problem, even though interactions involving problem type were not significant. Hence, we used only the Linda problem in Study 2 and aimed to collect a sample of n = 50 per study arm to replicate the sample size for the Linda problem in Study 1a, and to continue running the experiment beyond that number to improve power until the semester ended. In this way, we made a trade-off between power concerns and resource limitations, and obtained a total sample of 215 students across four study arms. Of those, 22 were later excluded because they reported having seen the conjunction problem before and four were excluded because they lacked numeracy data, leaving a total of 189 participants for the present analyses. As this final sample size was close to our original goal of N = 200, we chose to proceed with the analyses.
Design and procedure
All participants read the Linda version of the conjunction fallacy problem and then were randomly assigned to one of four judgment strategies. The control and two-sided reasoning conditions were identical to Study 1a. Two other groups were asked to write only reasons why “Linda is a bank teller” is the more likely option, or why “Linda is a bank teller and active in the feminist movement” is the more likely option. Participants in all writing conditions were not allowed to continue until 2 min had elapsed. All other measures were identical to those in Study 1, with the exception that 22 items from Raven’s Progressive Matrices were added (Items D3-D12 and E1-E12; D1 and D2 were used as practice items; Raven, Raven, & Court, 2003). This subset of Raven’s items was selected because it was known to show good variability in scores in this undergraduate population.
Results
Manipulation checks: Judgment time and word count
Time taken to make a judgment was recorded, and in this study, it was computed as the sum of the time taken to read the problem, complete the writing instructions, and make a judgment. Participants took significantly less total time in the control condition (M = 30.16 s, SD = 12.00) as compared with the deliberation conditions, F(3, 189) = 118.23, p < .001, ηp² = .657. The three deliberation conditions had means ranging from 3.28 to 3.63 min (SD = 49.27-70.89 s) and were not significantly different from each other, F(2, 140) = 1.23, p = .293. Across all conditions, the association between numeracy and time was marginally significant (r = .13, p = .064), as was the association between Raven’s score and time, r = .12, p = .084. Longer time was also marginally associated with greater judgment accuracy, rpb = .14, p = .052.
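As a rough illustration of how the effect size above relates to the omnibus test, partial eta squared in a one-way design can be recovered from the F statistic and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The sketch below is not the authors' analysis code; the function name is hypothetical, and the error degrees of freedom of 185 (189 participants minus four conditions) is an assumption on our part rather than a value stated in the text.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    # F * df_effect equals SS_effect / MS_error; df_error equals SS_error / MS_error.
    ss_ratio = f_value * df_effect
    return ss_ratio / (ss_ratio + df_error)


# Reported omnibus test on total judgment time: F = 118.23 with 3 effect df.
eta_printed_df = partial_eta_squared(118.23, 3, 189)  # error df as printed in the text
eta_assumed_df = partial_eta_squared(118.23, 3, 185)  # ASSUMPTION: N - groups = 189 - 4

print(f"error df = 189: {eta_printed_df:.3f}")
print(f"error df = 185: {eta_assumed_df:.3f}")
```

Under the assumed error degrees of freedom of 185, the result rounds to .657, which matches the reported effect size.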
Influence of judgment strategy and numeracy on judgments
Judgment Strategy had a significant effect on participants’ judgments, χ²(3) = 12.98, p = .005. Participants in the control condition and those who argued in favor of the incorrect response were the most likely to make an error (Ms = 91.3% and 92.1%, respectively), whereas participants who argued in favor of the correct response and those who argued both sides were the least likely to make an error (Ms = 70.0% and 73.8%, respectively).
To test our main hypotheses, we conducted three planned logistic regression analyses that examined the effects of Judgment Strategy, Numeracy, and Raven’s Progressive Matrices on judgments. Each regression compared the control condition with one of the writing conditions and included the two-way and the three-way interactions. We first compared judgments made by control participants with those who argued both sides. These conditions were identical to those in Study 1. Results showed a main effect of Numeracy (B = −15.74, SE = 6.49, p = .015) and the predicted interaction between Numeracy and Judgment Strategy (B = 5.82, SE = 2.86, p = .042). No other effects were significant, all ps > .12 (all ps > .64 for effects involving Raven’s score). In the control condition, participants with higher numeracy were less accurate, rpb = −.40, p = .005, a strong association that was driven by uniformly low numeracy among just four participants who chose the correct response in this condition (scores ranged from 40% to 53% correct; average numeracy for other participants = 71% correct). Nonetheless, in the two-sided reasoning condition, this relationship reversed (rpb = .25, p = .100); although not significant, this association was similar in size to the effects observed in Study 1a (rpb = .19) and 1b (rpb = .29). These results are displayed in Figure 2.

Figure 2. Study 2: Association between conjunction problem judgment and numeracy, by condition.
Next, we compared control participants with those who wrote about why the (incorrect) conjunction was correct. This analysis revealed only a main effect of Numeracy (B = −19.53, SE = 8.88, p = .028) in which participants higher in numeracy were less likely to answer correctly. Finally, we compared control participants with those who wrote about why the (correct) marginal event response was correct, and there was a main effect of Numeracy (B = −23.08, SE = 9.88, p = .019) and a trending Numeracy × Condition interaction (B = 11.91, SE = 6.46, p = .065). There were no effects involving Raven’s score in either regression, all p > .22. In these regressions, the effects involving numeracy appeared to be driven by the negative association between numeracy and judgments in the control condition: In both of the one-sided reasoning conditions, there was virtually no association between numeracy and judgment (rs = .069 and .065, ps > .63).
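The point-biserial correlations (rpb) reported in this section relate a binary outcome (correct vs. incorrect judgment) to a continuous score such as numeracy. A minimal sketch of that statistic follows, using the standard formula based on the difference between group means. The function name and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(correct: list[int], score: list[float]) -> float:
    """Point-biserial r between a 0/1 outcome and a continuous score."""
    group1 = [s for c, s in zip(correct, score) if c == 1]
    group0 = [s for c, s in zip(correct, score) if c == 0]
    p = len(group1) / len(score)   # proportion of correct judgments
    sd_all = pstdev(score)         # population SD of all scores
    return (mean(group1) - mean(group0)) / sd_all * sqrt(p * (1 - p))


# HYPOTHETICAL toy data (not from the study):
# 1 = correct judgment, 0 = conjunction error; numeracy as proportion correct.
correct = [0, 0, 0, 0, 1, 0, 1, 1]
numeracy = [0.40, 0.55, 0.60, 0.65, 0.70, 0.75, 0.85, 0.90]

r_pb = point_biserial(correct, numeracy)
print(f"r_pb = {r_pb:.2f}")
```

A positive value indicates that participants who answered correctly tended to have higher scores; a negative value, as in the control condition above, indicates the reverse. The statistic is numerically identical to a Pearson correlation computed with the outcome coded 0/1.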
Confidence
A linear regression with dummy-coded condition variables examined the effects of Condition, Numeracy, and Judgments on confidence. Results showed only that participants were more confident in errors (M = 3.63, SE = .96) than correct responses (M = 3.18, SE = 1.08), B = −.41, SE = .19, t = −2.12, p = .035. No other effects were significant.
Discussion
Compared to control, participants became less likely to rely on representativeness and make a conjunction error in the two-sided reasoning condition, but this effect depended on participant numeracy. Numeracy was negatively associated with correct responses in the control condition, but was positively associated with correct responses in the two-sided reasoning condition (although the latter association was not significant). By contrast, responses were not associated with numeracy in the one-sided reasoning conditions.
Also, Study 2 showed that a measure of fluid intelligence was unassociated with conjunction errors. Instead, only numeracy, controlling for fluid intelligence, was associated with judgments. This contrasts somewhat with the Tripartite Model’s focus on fluid intelligence as a critical skill. Nonetheless, an important limitation of this study was that it was underpowered, and perhaps fluid intelligence would have emerged as a significant predictor in a larger sample. Hence, these data support the general assertion that both deliberation and cognitive skill matter for conjunction responses, but may be somewhat inconsistent with specific predictions regarding fluid intelligence. Finally, participants were more confident in incorrect responses than in correct ones, a pattern consistent with the lax monitoring perspective, which asserts that people are unaware of their errors.
Meta-Analysis
Next, we meta-analytically combined the three studies. In light of concerns pertaining to selective reporting of positive results, especially when the reported p values cluster just below .05 as they did in these studies (Kühberger, Fritz, & Scherndl, 2014; Simonsohn, Nelson, & Simmons, 2014), we wish to emphasize that these three studies represent the extent of our data examining effects of numeracy and reasoning strategies on conjunction errors, and that we adhered to the aforementioned predetermined stopping rules for all studies. Data for all studies are available here: https://osf.io/tdhfy/. The purpose of these final analyses was to obtain a more highly powered estimate of the influence of numeracy on judgments across the reasoning conditions and also to provide insight with regard to participants’ written responses, which were coded for content (see Analysis of Written Responses section). In these analyses, we collapsed across the different one-sided reasoning conditions to determine whether numeracy is associated with conjunction errors when reasoning is two sided but is not associated with conjunction errors when reasoning is one sided (regardless of which type of one-sided reasoning was elicited).
In total, 12.5% of participants responded correctly in the control condition (27/216), 16.6% in the one-sided reasoning condition (46/277), and 25.4% in the two-sided condition (50/197). Logistic regression using dummy-coded condition variables contrasting the reasoning conditions to control showed a significant increase in the probability of correct responses following two-sided reasoning (B = −3.32, SE = 1.15, p = .004) but not one-sided reasoning (B = −1.78, SE = 1.11, p = .110). There was also an interaction between numeracy and two-sided reasoning versus control (B = 6.09, SE = 1.63, p < .001) and a somewhat weaker interaction between numeracy and one-sided reasoning versus control (B = 3.17, SE = 1.59, p = .046). A separate logistic regression also showed an interaction between numeracy and one-sided reasoning versus two-sided reasoning (B = 2.91, SE = 1.50, p = .053). These interactions were driven by a negative association between numeracy and judgments in the control condition (rpb = −.14, p = .029), no association across the one-sided reasoning conditions (rpb = .03, p = .569), and a positive association in the two-sided reasoning condition (rpb = .23, p = .001). Together, these results further indicate that numeracy was most strongly associated with fewer errors in the two-sided reasoning condition.
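The condition-level accuracy rates above can be reproduced directly from the reported counts. The sketch below does so and, purely as an illustration, adds a Pearson chi-square test of independence on the resulting 3 × 2 table. This test is a simpler stand-in that ignores numeracy; it is not the logistic regression reported in the text, and the variable names are hypothetical.

```python
from math import exp

# Correct responses / participants per condition, as reported in the text.
counts = {
    "control": (27, 216),
    "one-sided": (46, 277),
    "two-sided": (50, 197),
}

# Percentage of correct responses per condition.
percent_correct = {name: 100 * k / n for name, (k, n) in counts.items()}

# Pearson chi-square on the 3 x 2 table (correct vs. error by condition).
total_correct = sum(k for k, _ in counts.values())
total_n = sum(n for _, n in counts.values())
chi2 = 0.0
for k, n in counts.values():
    for observed, column_total in ((k, total_correct), (n - k, total_n - total_correct)):
        expected = n * column_total / total_n
        chi2 += (observed - expected) ** 2 / expected

# With 2 degrees of freedom, the chi-square survival function is exp(-x / 2).
p_value = exp(-chi2 / 2)

for name, pct in percent_correct.items():
    print(f"{name}: {pct:.1f}% correct")
print(f"chi2(2) = {chi2:.2f}, p = {p_value:.4f}")
```

Because this test pools over numeracy, it speaks only to overall differences in accuracy across conditions, not to the Numeracy × Condition interactions that are the focus of the analysis.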
Analysis of written responses
Participants’ written responses were coded and analyzed to provide insight into the effects involving reasoning and numeracy. The first author (L.S.) read the written arguments and developed five categories that could potentially classify most responses:
These categories were discussed and their definitions were refined with one of the coauthors (K.D.V.), who then trained two undergraduate research assistants to code each participant response independently. A single response could receive more than one code. Kappa reliability coefficients for each resultant category were all ≥ .71 (> .90 for 3 out of 5 codes). K.D.V. and the research assistants resolved the remaining inconsistencies through discussion.
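For readers unfamiliar with the reliability index, Cohen's kappa corrects the raw agreement between two coders for the agreement expected by chance. The sketch below computes it for a single category coded as present or absent, which fits a scheme in which one response can receive several codes. The function name and the example codes are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Cohen's kappa for two raters assigning one label per response."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n**2
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL codes (not the study's data): 1 = category present, 0 = absent.
coder_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
coder_2 = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders agree on 9 of 10 responses, but because chance agreement is .54, kappa is noticeably lower than the raw agreement rate of .90.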
Table 1 displays the percentage of participants who used each argument in the one-sided versus two-sided reasoning conditions for the combined Studies 1a and 1b as well as Study 2. Although few participants demonstrated use of the correct extension–containment logic, Table 1 shows that participants who were forced to argue in favor of the correct response were more likely to use the correct logic. Table 2 shows that participants who used the correct logic were highly likely to choose the correct response. Moreover, a significant association between numeracy and use of the correct logic emerged only in the two-sided reasoning condition, suggesting that it was the more numerate participants who were more likely to understand the problem following two-sided reasoning. The only other argument that was positively associated with choosing the correct response was arguing that the trait description was irrelevant (i.e., nondiagnostic); this argument was more weakly associated with correct responding and was not associated with numeracy in the two-sided reasoning condition (Table 2; although this type of argument was associated with numeracy in the one-sided reasoning condition).
Table 1. Participants’ Use of Each Kind of Argument Across Reasoning Conditions.
Table 2. Meta-Analytic Associations Between Use of Each Argument Type and Judgments, and Associations Between Use of Each Argument Type and Numeracy.
Also striking was that among participants who were forced to argue in favor of the correct response, approximately half chose to argue that Linda’s traits were consistent with being a bank teller, a role that was counterstereotypical to the trait description. For example, participants would explain how Linda, who once was clearly a feminist, might have become a bank teller as a way of making needed money or after having lost interest in youthful passions. This style of argument was negatively associated with correct responses (Table 2), suggesting that participants who used this reasoning not only did not understand the problem but also were not particularly convinced by their own reasoning.
General Discussion
The goal of the present research was to determine whether effortful deliberation reduces reliance on representativeness and resultant conjunction errors, and if so, under what circumstances and for whom. Three relevant theoretical perspectives were considered: lax monitoring, override failure, and the Tripartite Model. We consider the evidence in favor of each of these perspectives in turn.
Results were consistent with the lax monitoring view, which suggests that people make conjunction errors not only because of a lack of effortful thought, but also because they do not recognize that they have made an error. Deliberative reasoning generally resulted in fewer errors when participants were specifically asked to reason in favor of the correct response, but deliberative reasoning did not reliably result in fewer errors in the absence of these instructions. This indicates that people did not detect a need to override their initial response, and the majority engaged in confirmatory reasoning unless explicitly prompted to do otherwise. Moreover, participants were more confident in incorrect as compared to correct responses, indicating that they were largely unaware of their errors.
Results were mostly inconsistent with the override failure perspective, which argues that people are aware of the conflict present in these problems and that errors result from a failure to override their intuitive feelings. As noted above, people appeared to believe that their errors were correct, insofar as they were confident in incorrect responses and did not consider the correct response unless explicitly prompted to do so. As this conclusion is different from interpretations of prior findings, the apparent inconsistency warrants some explanation: To date, evidence for the override failure perspective for conjunction problems has come primarily from the comparison between congruent versus incongruent problems, finding that people are less confident on incongruent problems (response options: “Linda is a bank teller” and “Linda is a bank teller and active in the feminist movement”) relative to congruent problems (response options: “Linda is active in the feminist movement” and “Linda is a bank teller and active in the feminist movement”) (De Neys et al., 2011). One interpretation of this finding is that on incongruent problems people can sense the conflict between responding on the basis of probability versus representativeness, which reduces their confidence in their answer. Yet an alternative explanation could be that for congruent problems the desired representative answer (“Linda is a feminist”) is a possible response, whereas for incongruent problems both available responses contain a role (e.g., bank teller) that is not only nonrepresentative but also somewhat counterstereotypical to the trait description. Hence, people may be unconfident in their responses on incongruent problems not because they recognize the conflict between the two responses, but instead, because both answers contain information that is inconsistent with a representativeness-based response. 
Interestingly, evidence for override failure appears to be stronger for base-rate problems (De Neys, Vartanian & Goel, 2008; De Neys et al., 2011), suggesting that for these problems people may indeed choose the incorrect response in spite of a sense that it is wrong. Hence, the lack of evidence for the override failure perspective for conjunction problems should not be interpreted as evidence that the perspective is incorrect in a more general sense.
Finally, the present results support, but also appear to qualify, the Tripartite Model. The model suggests that fluid intelligence is the central skill necessary for overriding incorrect responses, whereas we found that conjunction errors are associated with numeracy, controlling for a nonnumeric intelligence measure (Raven’s matrices). Numeracy and nonnumeric intelligence are related but separable constructs (Peters & Bjalkebring, 2014), and the disparate results for numeracy versus Raven’s matrices suggest that perhaps numeracy is a particularly important skill for conjunction problems. Moreover, a positive association between numeracy and correct judgments was only observed when participants were asked to make two-sided arguments, indicating that even highly numerate and intelligent people receive no unique advantage from their skill set unless they are explicitly encouraged to engage in disconfirmatory reasoning. This corroborates other evidence that very knowledgeable or intelligent individuals may often use their skills in the service of confirmatory processing (Kahan et al., 2012; Kahan, Peters, Dawson & Slovic, in press; Nyhan, Reifler, & Ubel, 2013; Vallone, Ross, & Lepper, 1985), expending effort arguing in favor of their initially preferred judgment rather than considering the validity of alternative judgments (Mercier & Sperber, 2011). In this way, these data point to the power of confirmation bias or “myside bias” in reasoning, even among the highly skilled (Stanovich et al., 2013). Together, these results lend support for a model that combines the Tripartite Model’s focus on the role of individual differences in intelligence (but potentially narrows this focus to specific relevant skills, such as numeracy) and the lax monitoring perspective’s focus on individuals’ failure to notice errors and the need for certain types of reasoning to overcome those errors even among the highly intelligent and/or skilled.
In spite of these insights, there is also an important caveat, which is that only a handful of participants reasoned about the problems using the correct logic and in all conditions the majority of participants chose the wrong response. This means that many people who are intelligent, numerate, and who engage in disconfirmatory reasoning may still fall prey to the conjunction fallacy. In this sense, the original conceptualization of the conjunction fallacy as a “cognitive illusion” (Tversky & Kahneman, 1983) might be a more apt description of the error than one that primarily emphasizes the presence versus absence of effortful thought.
Limitations, Strengths, and Future Directions
As with any single article, these data should not be taken as the final verdict with regard to these hypotheses and theoretical issues; future research should seek to further corroborate (or refute) these findings. One unexpected finding was that in the control condition participants who answered correctly were somewhat less numerate than those who answered incorrectly. This conflicts with prior findings, some of which served as the impetus for examining numeracy in the first place (e.g., Liberali et al., 2012). One issue is that a very small proportion of participants, particularly in Study 2 where just four participants in the control condition responded correctly, drove the negative association. Notably, the observed meta-analytic correlation in the control condition for Studies 1a and 1b was negative but very small and not significant, r(168) = −.09, p = .240, and the correlation became larger and significant, r(214) = −.14, p = .029, when adding Study 2. Furthermore, although the direction of the association in the control condition was consistently negative, the effect size was quite divergent across studies (rs = −.06 to −.40). By contrast, the association between numeracy and judgments in the two-sided reasoning condition was more consistent in size across the two studies (rs = .19 to .25).
Another limitation of the present studies is that the fluid intelligence hypothesis was tested with only one measure and in an underpowered study. It is possible that some other measure of intelligence or executive function would have shown that the effects involving numeracy could be explained by a broader or different underlying cognitive ability, and future research should explore this possibility. Nevertheless, the fact that the Raven’s scores and numeracy scores were moderately correlated, showed good variability, and yet had completely different relationships to conjunction errors, indicates that the numeracy measure assessed skills that were more relevant to conjunction errors. This is the first research to our knowledge that compares different measures of cognitive ability and their association with conjunction errors, whereas much of the past research has measured individual differences in intelligence using less specific measures of intelligence, such as standardized test scores.
In the present research, we were interested in conjunction errors because they have been used so frequently as evidence in favor of dual-process models. It may or may not be the case that the present findings are relevant to other judgment problems. Even though our data indicate that the lax monitoring view explains conjunction errors better than the override failure perspective, the latter perspective might better explain other judgments, such as base-rate neglect. Indeed, it may ultimately prove to be a mistake to assume that the same cognitive processes underlie responding on different types of judgment problems, even when the problems seem very similar. Future research should explore not just the similarities between these problems but also the differences, which may shed further light on the cognitive processes that are necessary to solve them correctly.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
The supplemental material is available online.
