Abstract
To explore hypotheses based on Stanovich’s proposal that analytic processing comprises a reflective-level, an algorithmic level, and specific mindware, 342 participants completed measures of thinking dispositions, general ability (GA), numeracy, and probabilistic and nonprobabilistic reasoning. In a control condition, numeracy predicted probabilistic reasoning at high levels of both thinking dispositions and GA, and GA predicted nonprobabilistic reasoning at high levels of thinking dispositions. In a logic instruction condition, numeracy predicted probabilistic reasoning when GA was high, and GA affected nonprobabilistic reasoning directly. Thinking dispositions moderated neither relationship. Instead, instructions facilitated reasoning for low thinking disposition/high-ability participants, suggesting that logic instructions cued low thinking disposition individuals to engage in higher order reflective processing. The evidence is consistent with the proposals that reflective processes are essential to the allocation of algorithmic resources, and algorithmic resources are necessary for effective mindware implementation.
Keywords
Introduction
The evolution and refinement of dual-process theories (e.g., De Neys, 2012, 2015; Evans, 2012; Evans & Stanovich, 2013) have included increasing specification of the functions and operations commonly attributed to Type I (automatic, preconscious) processing. For instance, findings from experimental studies (e.g., Handley & Trippas, 2015; Thompson & Johnson, 2014) not only support speculations that Type I processing often leads to normative reasoning and decision-making (Evans, 2012; Stanovich, 2011) but also indicate that the biases often associated with nonnormative responses are sometimes produced through Type II (conscious and deliberative) processing. Indeed, it is now recognized that Type II processing often leads to errors due to failed decoupling, inability to sustain decoupled representations, post hoc response rationalizations, and/or faulty mindware (see Bago & De Neys, 2017; Frey, Johnson, & De Neys, 2017; Newman, Gibb, & Thompson, 2017; Pennycook, Fugelsang, & Koehler, 2015; Stanovich, 2018) and that individual differences in intelligence may reflect differences in Type I processing (Thompson, Pennycook, Trippas, & Evans, 2018). However, hypotheses derived from other revisions of dual-process theories, such as those proffered by Stanovich, Evans, and coworkers (e.g., Evans & Stanovich, 2013; Stanovich, 2011; see also Pennycook et al., 2015) have received less empirical attention.
Specifically, Stanovich (2009, 2011, 2018; Stanovich, West, & Toplak, 2014) has argued that analytic processing occurs in two partially independent levels, the reflective and algorithmic levels. The reflective level, comprising beliefs, goals, and penchants to evaluate task requirements and utilize intellectual faculties, involves the metacognitive processes that guide algorithmic operations (e.g., initiating override of Type I processing, governing the expenditure of cognitive resources, determining when to utilize “mindware”). The algorithmic level comprises two related sublevels: (a) domain-general abilities (e.g., fluid intelligence) and resources (e.g., working memory) and (b) mindware (Stanovich, 2009). Mindware includes acquired strategies, rules, and procedures that can be applied to specific problems.
It follows from Stanovich’s (2011, 2018) theoretical position that the efficiency with which algorithmic resources and abilities are used is partially determined by the adequacy of reflective-level operations. That is, if reflective-level thinking guides algorithmic processing, the effectiveness of the latter should depend on the quality of the former. It also follows from this position that, within the algorithmic level, the efficacy of mindware is partially determined by domain-general resources. Thus, people who have the requisite mindware (e.g., numeracy) for a task, but lack sufficient resources to inhibit interference and sustain decoupled representations, should have difficulties utilizing that mindware.
Consistent with the postulate that algorithmic resources and abilities partially determine response quality, general intellectual ability—which indexes algorithmic-level functionality—predicts responses on several decision-making and reasoning tasks (see, however, Thompson & Johnson, 2014; Thompson et al., 2018). Consistent with the thesis that reflective processes guide algorithmic operations, thinking dispositions (TDs) (i.e., epistemic beliefs, intellectual/motivational dispositions)—which partially index the quality of reflective-level operations—often account for variance in reasoning beyond that explained by general ability (GA) (for reviews, see Stanovich, 2009, 2011). Finally, extant evidence is consistent with a partial partition between general algorithmic resources and specialized mindware. Numeracy, for instance, is mindware involving one’s understanding of, and ability to assign meaning to, mathematical concepts (Peters, 2012; Reyna & Brainerd, 2008). Recent findings indicate that numeracy is not only related to GA but also explains variance in probabilistic reasoning beyond that attributable to GA and other aspects of algorithmic competence (Klaczynski, 2014; Liberali, Reyna, Furlan, Stein, & Pardo, 2011; see also Peters, 2012).
One of the foremost goals of this study was to test hypotheses derived from Stanovich’s (2011, 2018) theory by examining the relationships of TDs, GA, and numeracy to probabilistic and nonprobabilistic reasoning. A first general hypothesis based on the theory is that, if reflective-level operations initiate algorithmic-level processing, then high reflective-level functioning should be a necessary (but not sufficient) condition for consciously generating normative responses. A second general hypothesis is that, if domain-general algorithmic resources and abilities are required for sustaining decoupled representations and effectively utilizing specialized mindware, then high algorithmic capacity should be a necessary (but not sufficient) condition for generating normative responses. A third implication is that, to respond normatively on probabilistic—but not on nonprobabilistic—reasoning tasks, a third necessary (but not sufficient) condition is high numeric ability. This interpretation led to the general hypotheses that TDs and GA should moderate the effects of numeracy on probabilistic reasoning and, because numeracy should not affect nonprobabilistic reasoning, that TDs should moderate the effects of GA on nonprobabilistic reasoning. More specific hypotheses are presented below.
Hypothesis 1: Numeracy would affect probabilistic inferences primarily among individuals high in both TDs and GA. Hypothesis 2: GA would affect nonprobabilistic reasoning primarily among individuals high in TDs.
The effects of logic instructions
Somewhat different hypotheses were tested for a condition in which participants were instructed to think logically. Several investigations have shown that such instructions increase cognitive effort (Handley, Newstead, & Trippas, 2011) and, at least on some tasks, facilitate reasoning (e.g., Epstein, Lipson, Holstein, & Huh, 1992; Ferreira, Garcia-Marques, Sherman, & Sherman, 2006; Klaczynski, 2001b; Neilens, Handley, & Newstead, 2009). In addition, logic instructions impact the reasoning of high intellectual ability individuals more than that of low-ability individuals (Evans, Handley, Neilens, & Over, 2010; Morsanyi, Primi, Chiesi, & Handley, 2009). However, the reflective–analytic association described in the foregoing paragraphs suggests that logic instructions should facilitate the reasoning of high-ability individuals through their effects on operations at the reflective level. In a general sense, instructions may serve as “externally-imposed surrogates for well-calibrated thinking dispositions” (Klaczynski, 2014, p. 10) and consequently cue engagement in higher order reflective operations. High-ability individuals should be particularly likely to improve because they understand logic instructions, and can keep those instructions in mind across different tasks, while also retaining sufficient resources to maintain decoupled representations, select appropriate mindware, and conduct accurate computations. If logic instructions prompt thinking similar to that characteristic of high reflective-level functioning, TDs should moderate neither the numeracy–ability–probabilistic reasoning relationship nor the ability–nonprobabilistic reasoning relationship. That is, when logic instructions are given, neither effects related to GA nor effects related to numeracy should depend on TDs. Therefore, with the exception of the last hypothesis, the logic instruction hypotheses were limited to GA and numeracy.
Hypothesis 3: If logic instructions act as surrogates for TDs, then numeracy should directly affect probabilistic reasoning when GA is high. Thus, regardless of whether TDs were low or high, GA was expected to moderate the effect of numeracy on probabilistic reasoning. That is, if GA indexes algorithmic resources and if algorithmic resources are necessary for mindware to function adaptively, then a high level of GA is necessary for the implementation of numeric skills which, in turn, are important for the solution of probabilistic reasoning problems. Hypothesis 4: Again, if logic instructions act as surrogates for TDs, then GA should directly affect nonprobabilistic reasoning. That is, the relationship between ability and nonprobabilistic reasoning should not be moderated by TDs. Because numeracy is not a prerequisite for performance on nonprobabilistic problems, the effect of GA on nonprobabilistic performance should be found regardless of numeric ability. Hypothesis 5: Low TD participants in the instruction condition would perform better than low TD participants in the control condition. However, prior research indicating that logic instructions facilitate reasoning more for high-ability than for low-ability individuals suggested a more nuanced hypothesis: Low TD/high GA participants in the instruction condition would perform better than low TD/high GA participants in the control condition.
Methods
Participants
Participants were 342 (178 female; M age = 18.93 years, SD = 1.19 years; range = 18–23 years) undergraduates who earned course credit. Experimental sessions were conducted with groups of three to eight participants.
Design and procedure
Participants were randomly assigned to control (no logic instructions; N = 169) and logic instruction (N = 173) conditions. Control condition participants were told to act as though they were in the situations described in the problems and select the best answer to each problem. Participants in the instruction condition were told to “act as if you are a perfectly logical person,” use only the information in the problems, avoid making decisions quickly, consider response options carefully, and select the best answer (see Epstein et al., 1992; Klaczynski, 2001b; Macpherson & Stanovich, 2007).
After responding to demographic questions and providing self-reported Scholastic Aptitude Test (SAT) scores, participants completed a numeracy test, GA tests, a TD questionnaire, and eight reasoning problems. Because the ability measures were timed, they were administered before the other measures. For about half of the participants, the reasoning problems were presented next, followed by the TD questionnaire and, finally, the numeracy measure. For the remaining participants, presentation order was ability tests, TDs, numeracy test, and reasoning problems (order was not related to performance on any task, largest r = .11).
Materials
Thinking dispositions
The 48-item TDs questionnaire comprised five subscales. The impulsive decision-making (reverse scored) scale (12 items; see Patton, Stanford, & Barratt, 1995) tapped reliance on first impressions and beliefs that “spur of the moment” decisions are good decisions. The deliberation scale (10 items) measured valuation of complex thinking and deliberation. The reflectiveness versus intuition scale (10 items) assessed beliefs that logical decisions are superior to decisions based on intuition (see Epstein, Pacini, Denes-Raj, & Heier, 1995). The flexible thinking scale (10 items; see Macpherson & Stanovich, 2007) measured beliefs that complex decisions cannot be reduced to “either-or” choices and willingness to change decisions after reflection. The epistemic regulation scale (six items) assessed beliefs in the mutability of “truths,” the value of considering conflicting arguments, and the importance of inhibiting judgments that “come to mind” automatically. Composite scores were used because the subscales were correlated (smallest r = .27), internal consistency was higher for the total scale (α = .81) than for four subscales, and total scores (M = 196.61; SD = 23.22; average rating = 4.09) predicted responses better than subscale scores.
General ability
Verbal ability, indexed by vocabulary scores, was assessed because verbal ability is among the foremost indicators of general and crystallized intelligence. A measure of fluid intelligence, indexed by inductive reasoning scores, was given because fluid intelligence has been viewed as one of the best indicators of algorithmic functioning (Stanovich, 2011). Each test was modified from the original version by removing items that were almost always answered correctly or incorrectly in pilot testing (N = 78).
Verbal ability
The 3-minute 30-item vocabulary test (M = 15.21, SD = 2.25) was modified from the Shipley-2 vocabulary test (Shipley, Gruber, Martin, & Klein, 2010; pilot testing r = .89 between the original and modified tests). Shipley-2 vocabulary scores relate moderately/strongly to academic achievement, general intelligence, and crystallized intelligence (Kaya, Delen, & Bulut, 2011). On each item, a target word (e.g., jocose) was followed by four options (e.g., humorous, paltry, fervid, plain). Participants were instructed to select the word with the same meaning as the target.
Inductive ability
The 12-minute 20-item inductive reasoning test (M = 11.41, SD = 1.88) was modified from the Primary Mental Abilities (PMA) Letter Sets test (Thurstone, 1962; pilot testing r = .84 between the original and modified tests). PMA-Letter Sets (LS) scores correlate well with general intelligence and fluid intelligence (Hertzog & Bleckley, 2001). From five sets of four letters (e.g., ACDE, MOPQ, FGIJ, DFGH, TVWX), participants indicated the set that did not belong (e.g., FGIJ).
A composite score (M = 26.62; SD = 3.56) was used because inductive and vocabulary scores correlated moderately (r = .47) and because the composite–response correlations were higher than the verbal ability–response (r range = .22 to .26) correlations and the inductive ability–response correlations (r range = .19 to .27).
Numeracy
The numeracy test (M = 12.30, SD = 3.01; α = .79) assessed attentiveness to numerical information and basic probability skills (e.g., understanding outlier relevance, computing conjunction likelihood; see Klaczynski, 2014) and contained 20 problems from, or adapted from, other instruments (e.g., Garfield, 2003; Irwin & Irwin, 2005; Lipkus, Samsa, & Rimer, 2001). On each problem, participants selected the correct response from three to five options. Sample items are presented in Appendix A.
Heuristics and biases tasks
Eight reasoning problems were presented in one of four randomly determined orders. Of the four probabilistic problems, two were gambler’s fallacy (GF) problems and two were ratio bias problems. Of the four nonprobabilistic problems, two were “if-only” (IO) problems and two were experimental reasoning (EXP) problems. The nonprobabilistic problems were selected because performance on these problems has been of considerable interest to heuristics and biases researchers and because we could find no obvious reason that performance on the tasks we selected would relate to numeracy.
Scores could range from 0 to 2 on each task, but proportions of normative responses are presented in the “Results” section (sample problems are presented in Appendix B).
GF problems
When event occurrences are independent of previous occurrences, the GF is indicated by beliefs that “streaks” make future event probability lower (or, for losing streaks, higher) than the objective probability. For example, in evaluating winning streaks, people commit the fallacy when they believe the likelihood of winning on the next trial is lower than the actual likelihood. Consequently, despite knowing that occurrences are independent, people often neglect objective probabilities and reason as though probabilities are self-correcting (Terrell, 1994).
On each problem, participants were told (a) the single-trial event probability; (b) event probability was identical on each trial; and (c) in a recent series, the event occurred at a rate that was higher than the single-trial probability. For instance, the single-trial probability that a ball revolving around a spinning wheel would fall into a winning pocket was 8 in 40. However, in the six most recent trials, the wheel stopped on the winning color five times. From a range of probabilities (expressed as percentages from 0% to 100% in 5% increments), participants indicated the probability that the event would occur on the next trial. Responses were normative when the objective probability was selected.
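The normative standard for the wheel example can be sketched as follows. This is a minimal illustration, not part of the study materials: the function name and the list used to represent the recent streak are hypothetical. Under independence, the streak is ignored and the answer is simply the single-trial probability.

```python
from fractions import Fraction


def normative_next_trial_percent(wins, pockets, recent_outcomes):
    """Objective probability (in %) that the event occurs on the next trial.

    Trials are independent, so recent_outcomes is deliberately ignored.
    (Hypothetical helper; the name and signature are illustrative only.)
    """
    return float(Fraction(wins, pockets) * 100)


# Response scale described in the text: 0% to 100% in 5% increments.
response_scale = list(range(0, 101, 5))

# Example from the text: 8 winning pockets in 40, and the event
# occurred on 5 of the 6 most recent trials (1 = win, 0 = loss).
recent = [1, 1, 1, 0, 1, 1]
answer = normative_next_trial_percent(8, 40, recent)
print(answer, answer in response_scale)
```

Any response below this value after a winning streak would reflect the self-correcting belief that defines the fallacy.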
Ratio bias problems
On the ratio bias (RB) problems (Kirkpatrick & Epstein, 1994), participants judge whether targets (e.g., winning lottery tickets) are more likely to be selected from large numerator/large denominator samples (e.g., 10 winners in 100 lottery tickets) or small numerator/small denominator samples (e.g., one winner in 10 tickets). Although the actual probability of selecting a target is higher or identical in the small numerator/small denominator problem, people often believe that targets are more likely to be selected from the larger sample than from the smaller sample. From three options (i.e., the small sample, the large sample, and neither sample), participants indicated if one sample was more likely than the other to yield a target. Responses were normative when the “neither sample” option was selected.
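The comparison underlying the normative response can be sketched with exact fractions. The helper name and the tuple representation of a sample are assumptions made for illustration; only the lottery numbers come from the example above.

```python
from fractions import Fraction


def normative_rb_choice(small, large):
    """Return which sample is objectively more likely to yield a target.

    Each sample is a (targets, total) tuple. Hypothetical helper for
    illustration; it is not taken from the study materials.
    """
    p_small = Fraction(*small)
    p_large = Fraction(*large)
    if p_small == p_large:
        return "neither sample"
    return "small sample" if p_small > p_large else "large sample"


# Lottery example from the text: 1 winner in 10 vs. 10 winners in 100.
choice = normative_rb_choice(small=(1, 10), large=(10, 100))
print(choice)
```

Because the two ratios reduce to the same fraction, preferring the large sample reflects attention to the numerator alone.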
IO problems
The “IO” fallacy occurs when a behavior is judged more negatively when it appears that a negative consequence could have been more easily anticipated and avoided in one of two logically identical situations (Epstein et al., 1992). Each problem contained information about the decisions of two people and the identical negative outcomes associated with those decisions. Because the negative outcomes were equally unforeseeable, neither decision was riskier. The circumstances of one decision appeared more mutable than those of the other decision and therefore were more likely to provoke counterfactual thinking about “what would have been.”
For example, on a shopping trip, Tom parked his car in a half-empty parking lot and decided to take a spot closest to the stores at which he wanted to shop. Robert parked in the same parking lot when there was only one empty space. When later backing out, both Tom and Robert had accidents. Because Tom had more control over where to park, people often believe his accident could have been avoided and therefore that his parking decision was inferior to Robert’s decision. Participants judged which, if either, person made a better decision (e.g., 1 = Tom’s decision was much better; 2 = Tom’s decision was somewhat better; 3 = Tom and Robert made equally bad decisions; 4 = Robert’s decision was somewhat better; 5 = Robert’s decision was much better). Judgments that the decisions were “equally bad” were normative.
Experimental reasoning
To assess EXP, participants judged the efficacy of social policy interventions. Because the implementation of the interventions was flawed, definitive conclusions could not be drawn. For example, a governor began a program to reduce crime by increasing the frequency of community night watches and eventually coupled this initiative with a program to strictly enforce drug laws. Despite later finding that crime rates decreased, causal claims attributing the decline to either initiative were not possible (adapted from Lehman, Lempert, & Nisbett, 1988). Problems contained descriptions of interventions, apparent effects, and the conclusions of individuals who evaluated the interventions. Responses were normative when participants indicated that poor design prohibited definitive conclusions.
Results
As expected, the correlations among TD, GA, and numeracy were significant (r range = .19–.23, ps < .001). In support of the probabilistic/nonprobabilistic distinction, the probabilistic tasks (GF-RB: r = .35, p < .001) and the nonprobabilistic tasks (IO-EXP, r = .29, p < .001) related positively. Although three of the four correlations between probabilistic and nonprobabilistic reasoning (i.e., GF-EXP, RB-IO, and RB-EXP: rs = .13, .13, and .16, respectively, ps < .05) were significant, only the GF-RB (r = .26, p < .001) and IO-EXP (r = .23, p < .001) relationships were significant after controlling for variance associated with TD, GA, and numeracy. A factor analysis yielding a two-factor solution (probabilistic and nonprobabilistic reasoning; eigenvalues = 1.58, 1.08; loadings > .75) explained 66.53% of response variance, further justifying use of composite probabilistic and nonprobabilistic scores in subsequent analyses.
Effects of logic instructions
Mean composite and individual task scores, by condition and ability group (for these analyses, based on a median split), are presented in Table 1. Instructions facilitated reasoning on RB and IO problems, Fs (1, 340) = 8.15, 6.15, ps = .005, .014, respectively, ηp²s = .02, but not on GF and EXP problems. Nonetheless, logic instructions improved reasoning on both the probabilistic and nonprobabilistic reasoning tasks, Fs (1, 340) = 7.11, 5.09, ps = .008, .026, respectively, ηp²s = .02. Replicating prior work (e.g., Evans et al., 2010; Morsanyi et al., 2009), the benefits of instructions were limited to high GA participants (see Table 1).
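For readers who want to relate the F ratios to the effect sizes, partial eta squared can be recovered from F and its degrees of freedom with the standard identity ηp² = (F × df effect) / (F × df effect + df error). The sketch below applies it to the RB and IO values reported above; the function name is an illustrative assumption, and the source does not state how the effect sizes were computed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported in the text for the RB and IO problems, df = (1, 340).
rb = partial_eta_squared(8.15, 1, 340)
io = partial_eta_squared(6.15, 1, 340)
print(round(rb, 2), round(io, 2))
```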
Table 1. Mean proportions (and SDs) of normative probabilistic and nonprobabilistic responses by condition and ability group.
Note: GF: gambler’s fallacy.
*p < .05; **p < .01.
Relationships of TDs, cognitive ability, numeracy to reasoning
The correlations of the hypothesized predictors to individual task scores and composite probabilistic (PROB) and nonprobabilistic (NPROB) scores are presented in Table 2. Consistent with expectations, TD, GA, and numeracy associated positively with individual tasks and composite scores; further, the numeracy–PROB correlation was stronger than the numeracy–NPROB correlation (z = 1.93, one-tailed p < .05). The same general patterns were found when the predictor–reasoning correlations were examined separately in the control and instruction conditions. However, when GA and numeracy were controlled, the TD–PROB and TD–NPROB correlations were significant in the control condition (rs = .22, .25, ps < .01), but not in the logic condition (rs = .12 and .14, lowest p = .078), providing preliminary support for the hypothesis that logic instructions act as surrogates for TDs.
Table 2. Correlations of predictors to individual task scores and to composite probabilistic and nonprobabilistic scores.
Note: TD: thinking disposition; GA: general ability; GF: gambler’s fallacy; RB: ratio bias; IO: if-only; EXP: experimental reasoning.
*p < .01; **p < .001; ***p < .05.
Because hypotheses differed for control and instruction conditions, two analyses were conducted to ensure that separate analyses by condition were justifiable and determine whether the TD–GA–numeracy–PROB and the TD–GA–NPROB relationships interacted with condition. In two hierarchical multiple regression analyses, SAT and NPROB (or PROB) scores were entered first (as potential covariates). In the probabilistic analysis, TD, GA, numeracy, condition and the TD × GA × Numeracy interaction were entered next, followed by the TD × GA × Numeracy × Condition interaction. In the nonprobabilistic analysis, TD, GA, and condition and the TD × GA interaction were entered second, and the TD × GA × Condition interaction was entered last.
The predictors accounted for 23.8% of the variance in probabilistic reasoning, F (8, 333) = 13.03, p < .001, and 19.5% of the variance in nonprobabilistic reasoning, F (7, 334) = 11.57, p < .001. The key findings, however, were that the TD × GA × Numeracy × Condition interaction predicted unique variance (2.3%) in probabilistic reasoning, FΔ (1, 333) = 10.03, p = .002, and the TD × GA × Condition interaction predicted unique variance (1.4%) in nonprobabilistic reasoning, FΔ (1, 334) = 5.75, p = .017. These findings justified separate analyses for the two conditions.
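The link between the variance percentages and the F statistics in the hierarchical regressions can be illustrated with the usual R² formulas. This is a sketch under the assumption that the reported percentages are R² values for the full models; the function names are hypothetical. Because the inputs are rounded, the results agree with the reported statistics only approximately.

```python
def f_overall(r2, n_predictors, df_error):
    """Overall F test for a regression model, computed from R-squared."""
    return (r2 / n_predictors) / ((1 - r2) / df_error)


def f_change(delta_r2, r2_full, df_added, df_error):
    """F test for the R-squared increment of the final step."""
    return (delta_r2 / df_added) / ((1 - r2_full) / df_error)


# Probabilistic reasoning: R2 = .238 with 8 predictors, increment = .023.
prob_overall = f_overall(0.238, 8, 333)
prob_change = f_change(0.023, 0.238, 1, 333)

# Nonprobabilistic reasoning: R2 = .195 with 7 predictors, increment = .014.
nprob_overall = f_overall(0.195, 7, 334)
nprob_change = f_change(0.014, 0.195, 1, 334)

print(round(prob_overall, 2), round(prob_change, 2))
print(round(nprob_overall, 2), round(nprob_change, 2))
```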
The analyses that follow, performed to examine Hypotheses 1 to 4, relied on contemporary techniques for testing moderation hypotheses (Hayes, 2013a). Specifically, Hayes’ (2013b) SPSS “process” macro was used in the analyses. The macro uses ordinary least squares regression to estimate coefficients for each predictor and interaction and computes three levels of each hypothesized moderator (i.e., centered around the mean and ± 1 SD from the mean; subsequently referred to as low, moderate, and high levels).
Moderation analyses in the control condition
A first moderation analysis tested the hypothesis that numeracy would directly affect probabilistic inferences only at relatively high levels of TDs and GA (Hypothesis 1). A second moderation analysis tested the hypothesis that GA would directly affect nonprobabilistic reasoning only for participants at relatively high levels of TDs (Hypothesis 2).
Probabilistic reasoning
In “moderated moderation” (Hayes, 2013a), effects associated with an “independent” variable depend on two moderating variables (i.e., a three-way interaction). In Hayes’ (2013b) SPSS macro, “process model 3” tests moderated moderation hypotheses, such as Hypothesis 1. In this analysis, PROB was the dependent variable, numeracy was the independent variable, TD was a first potential moderator, and GA was a second potential moderator (SAT and NPROB were covariates).
A significant TD × GA × Numeracy interaction would indicate that the GA–numeracy–PROB relationship differed as a function of TD level. This analysis also revealed whether the GA × Numeracy interaction was significant at each TD level and whether, within TD levels, the effects of numeracy differed for low, moderate, and high GA participants. Thus, in a broad sense, this analysis paralleled a 3 (TD) × 3 (ability) × 3 (numeracy) analysis of variance (ANOVA); unlike analyses of variance, however, bootstrapping was used to obtain 95% confidence intervals. Confidence intervals were used to infer whether, at each GA level within TD levels, numeracy predicted PROB scores. In the results presented below, LLCI and ULCI refer to the lower and upper limits of the confidence interval, respectively. Effects were significant when confidence intervals did not include zero (Hayes, 2013a).
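The moderated moderation model described above can be written out as follows. The notation is an assumption introduced for illustration (X = numeracy, W = TD, Z = GA, C1 and C2 = the SAT and NPROB covariates, Y = PROB) and follows the general form of a three-way interaction model rather than any equation given in the source.

```latex
\begin{align}
Y &= i + b_1 X + b_2 W + b_3 Z + b_4 XW + b_5 XZ + b_6 WZ + b_7 XWZ
     + c_1 C_1 + c_2 C_2 + e \\
% Conditional GA x Numeracy interaction at a given TD level
\theta_{XZ \mid W} &= b_5 + b_7 W \\
% Conditional effect of numeracy at given TD and GA levels
\theta_{X \mid W, Z} &= b_1 + b_4 W + b_5 Z + b_7 WZ
\end{align}
```

In this notation, a significant b7 corresponds to the TD × GA × Numeracy interaction, and evaluating the last expression at low, moderate, and high values of W and Z gives the conditional effects of numeracy.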
TD, GA, numeracy, and the covariates accounted for 38.48% of the variance in PROB, F (9, 159) = 16.21, p < .0001. Neither this analysis nor subsequent analyses showed significant effects associated with the covariates. Of the individual predictors, GA had a significant effect (β = .0163, t = 2.750, p = .0067, LLCI/ULCI = .0045/.0280; for TD, β = .0016, t = 1.792, p = .0750; LLCI/ULCI = −.0002/.0033; for numeracy, β = .0179, t = 1.687, p = .0936; LLCI/ULCI = −.0031/.0389). A GA × Numeracy interaction (β = .0087, t = 3.249, p = .0014; LLCI/ULCI = .0034/.0139) was qualified by the anticipated TD × GA × Numeracy interaction (β = .0004, t = 3.229, p = .0015; LLCI/ULCI = .0002/.0007), which indicated that effects associated with numeracy depended on TDs and ability.
The analyses shown on the left-hand side of Table 3 indicate that the GA × Numeracy interaction was significant when TD was moderate and high. Consistent with Hypothesis 1, the table also shows that numeracy directly affected probabilistic reasoning only when both TDs and GA were at least moderate. These findings are depicted in Figure 1.

Figure 1. Control condition: Effects of ability and numeracy on probabilistic reasoning at low (upper graph), moderate (middle graph), and high (bottom graph) levels of thinking dispositions. Estimates based on covariates set to sample means.
Table 3. Probabilistic reasoning in the logic instruction and control conditions: Moderated moderation results with numeracy as the independent variable and with TD and GA as moderators.
Note: TD: thinking disposition; GA: general ability.
*p < .05; **p < .01; ***p < .001.
Nonprobabilistic reasoning
A basic moderation analysis (Hayes, 2013b; “process model 1”) was conducted to determine whether the effects of ability on nonprobabilistic reasoning were moderated by TDs and, specifically, whether GA impacted nonprobabilistic reasoning only at high levels of TD. GA was entered as the independent variable, and TD was entered as the moderator (covariates were PROB, SAT, and numeracy). A significant GA × TD interaction would indicate that TDs moderated the effects of GA.
The predictors and covariates accounted for 23.37% of the variance in NPROB, F (6, 162) = 10.31, p < .0001. TD (β = .0023, t = 2.317, p = .0218, LLCI/ULCI = .0003/.0043) and GA (β = .0256, t = 3.910, p = .0001; LLCI/ULCI = .0127/.0386) were significant independent predictors. A significant TD × GA interaction (β = .0007, t = 2.834, p = .0052; LLCI/ULCI = .0002/.0012) indicated that effects linked to GA varied by TD level. As anticipated by Hypothesis 2, follow-up analyses showed that GA affected NPROB significantly only at moderate and high TD levels (see Table 4 and Figure 2).

Figure 2. Control condition: Effects of ability on nonprobabilistic responses at low, moderate, and high levels of thinking dispositions. Estimates based on covariates set to sample means.
Table 4. Nonprobabilistic reasoning in the control and instruction conditions: Moderation results with GA as the independent variable and TD as the moderator.
Note: TD: thinking disposition.
*p < .001; **p < .01; ***p < .05.
Moderation analyses in the instruction condition
The general hypothesis underlying the analyses below was that logic instructions reduce the effects of TDs. The first analysis examined the conjecture that numeracy, regardless of TDs, would directly affect probabilistic reasoning only when ability was relatively high (Hypothesis 3). The second analysis tested the prediction that, regardless of TDs, ability would affect nonprobabilistic reasoning directly (Hypothesis 4).
Probabilistic reasoning
In a moderated moderation analysis, numeracy was the independent variable, TD was the first moderator, and GA was the second moderator (covariates were SAT and NPROB scores). The predictors, their interactions, and the covariates accounted for 18.43% of the response variance, F (9, 163) = 5.08, p < .0001. Effects related to GA (β = .0232, t = 3.045, p = .0027; LLCI/ULCI = .0082/.0382) and numeracy (β = .0309, t = 2.205, p = .0288; LLCI/ULCI = .0032/.0586) were significant (TD approached significance; β = .0020, t = 1.707, p = .0897; LLCI/ULCI = −.0003/.0043). Of the potential interactions, only the GA × Numeracy interaction was significant (β = .0079, t = 2.083, p = .0388; LLCI/ULCI = .0004/.0154). As shown in the right-hand side of Table 3, at each TD level, numeracy predicted PROB only when GA was moderate or high. These findings indicate that GA, but not TDs, moderated the numeracy–probabilistic reasoning association in the logic instruction condition.
To more precisely examine the moderating effects of GA on the numeracy–probabilistic reasoning relationship, a basic moderation analysis (“Process Model 1”; Hayes, 2013b) was conducted. Numeracy was the independent variable and GA was the potential moderator. Numeracy, GA, the GA × Numeracy interaction, and the covariates explained 18.29% of the variance in PROB, F (6, 166) = 6.63, p < .0001. Effects related to GA (β = .0225, t = 2.898, p = .0043; LLCI/ULCI = .0072/.0378) and numeracy (β = .0303, t = 2.687, p = .0080; LLCI/ULCI = .0080/.0525) were significant but subsumed by a GA × Numeracy interaction (β = .0080, t = 2.579, p = .0108; LLCI/ULCI = .0019/.0142). Additional analyses showed that effects associated with numeracy were not significant when GA was low (t < 1) but were significant when GA was moderate (β = .0303, t = 2.687, p = .0080; LLCI/ULCI = .0080/.0525) and high (β = .0576, t = 4.136, p = .0001; LLCI/ULCI = .0301/.0851). These effects are depicted in the upper graph in Figure 3.
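The conditional effects reported for this analysis follow the usual simple-slopes logic: with a mean-centered moderator, the effect of numeracy at a given GA level is the numeracy coefficient plus the interaction coefficient times the centered GA value. The sketch below applies that logic to the coefficients reported above. The SD of GA in the instruction condition is not reported, so the full-sample composite SD (3.56) is used as a stand-in assumption, and the function name is hypothetical; the high-GA estimate therefore only approximates the reported value.

```python
def conditional_slopes(b_x, b_xw, moderator_sd):
    """Effect of X at low (-1 SD), moderate (mean), and high (+1 SD)
    levels of a mean-centered moderator W."""
    return {
        "low": b_x + b_xw * (-moderator_sd),
        "moderate": b_x,
        "high": b_x + b_xw * moderator_sd,
    }


# Coefficients reported in the text for the instruction condition.
b_numeracy = 0.0303        # effect of numeracy at the mean of GA
b_interaction = 0.0080     # GA x Numeracy interaction
ga_sd_assumed = 3.56       # ASSUMPTION: full-sample SD of the GA composite

slopes = conditional_slopes(b_numeracy, b_interaction, ga_sd_assumed)
for level, slope in slopes.items():
    print(level, round(slope, 4))
```

Under this assumption the low-GA slope is close to zero and the high-GA slope is close to the reported .0576, matching the pattern described in the text.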

Figure 3. Logic instructions: Effects of ability and numeracy on probabilistic reasoning across levels of thinking dispositions (upper graph) and effects of ability on nonprobabilistic reasoning by thinking disposition level (lower graph). Estimates based on covariates set to sample means.
Both analyses indicated that GA alone moderated the numeracy–probabilistic reasoning relationship when logic instructions were given. Regardless of TD, only when GA was relatively high were the effects of numeracy significant.
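The conditional ("simple slope") effects reported above can be illustrated with a minimal sketch. It assumes the usual moderation form, in which the effect of numeracy at a given GA value equals the numeracy coefficient plus the interaction coefficient times GA, with GA mean-centered so that "moderate" corresponds to 0. The offset used for "low" and "high" GA is not reported in the text; here it is inferred from the reported high-GA slope and should be read as an assumption.

```python
def conditional_slope(b_focal, b_interaction, moderator_value):
    """Effect of the focal predictor at a given (centered) moderator value."""
    return b_focal + b_interaction * moderator_value


# Coefficients reported for the basic moderation analysis (logic instructions).
B_NUMERACY = 0.0303        # numeracy effect at moderate (mean) GA
B_GA_X_NUMERACY = 0.0080   # GA x Numeracy interaction

# Assumption: the GA offset for "low"/"high" is inferred from the reported
# high-GA slope (.0576); it is not stated in the text.
GA_OFFSET = (0.0576 - B_NUMERACY) / B_GA_X_NUMERACY

slopes = {
    label: conditional_slope(B_NUMERACY, B_GA_X_NUMERACY, ga)
    for label, ga in [("low", -GA_OFFSET), ("moderate", 0.0), ("high", GA_OFFSET)]
}

for label, slope in slopes.items():
    print(f"{label:>8} GA: numeracy slope = {slope:.4f}")
```

Under these assumptions the implied low-GA slope is close to zero, which is in line with the reported nonsignificant numeracy effect (t < 1) at low GA.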
Nonprobabilistic reasoning
To determine whether ability affected nonprobabilistic reasoning directly (i.e., across TD levels), a basic moderation analysis, with GA as the independent variable and TD as a potential moderator, was conducted (covariates were SAT, numeracy, and PROB). These variables accounted for 15.01% of variance in NPROB, F (6, 166) = 7.23, p < .0001. GA (β = .0230, t = 3.250, p = .0014; LLCI/ULCI = .0090/.0369) was a significant predictor, and TD approached significance (β = .0018, t = 1.801, p = .0724; LLCI/ULCI = −.0002/.0037). In contrast to the control condition, the TD × GA interaction was not significant (β = −.0001, t < 1; LLCI/ULCI = −.0006/.0004). Consistent with Hypothesis 4, the results in the lower portion of Table 4 show that GA affected NPROB across levels of TDs (see also lower graph, Figure 3).
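As a rough consistency check on the reported intervals, a confidence interval can be reconstructed from a coefficient and its t value. The sketch below assumes a conventional 95% interval of the form β ± t_crit × SE, with SE recovered as β / t; the critical value for 166 degrees of freedom is an approximation supplied here, not a figure from the text.

```python
def ci_from_beta_and_t(beta, t_value, t_crit):
    """Rebuild a confidence interval from a coefficient and its t statistic."""
    se = abs(beta / t_value)          # standard error implied by beta and t
    margin = t_crit * se
    return beta - margin, beta + margin


# Assumption: approximate two-tailed 95% critical t value for df = 166.
T_CRIT_DF166 = 1.9744

# GA effect on NPROB as reported: beta = .0230, t = 3.250.
lower, upper = ci_from_beta_and_t(0.0230, 3.250, T_CRIT_DF166)
print(f"Reconstructed LLCI/ULCI: {lower:.4f} / {upper:.4f}")
```

Because the inputs are rounded to three or four decimals, the reconstructed limits can differ from the reported LLCI/ULCI (.0090/.0369) in the last digit.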
Between-condition comparisons
The findings presented thus far are consistent with the hypothesis that logic instructions triggered thinking similar to that of high TD participants in the absence of such instructions. However, if this hypothesis is accurate, then low TD participants in the instruction condition should have outperformed their counterparts in the control condition and, more specifically, low TD/high-ability participants in the instruction condition should have outperformed low TD/high-ability participants in the control condition (Hypothesis 5).
To test this hypothesis, low and high TD groups were created in the control (N = 83) and logic instruction (N = 85) conditions. As anticipated, the reasoning of low TD instruction condition participants surpassed that of low TD control participants: For PROB, F (1, 166) = 7.79, p = .006, ηp2 = .05; for NPROB, F (1, 166) = 4.08, p = .044, ηp2 = .03.
It therefore appears that the primary beneficiaries of logic instructions were low TD participants. To further explore Hypothesis 5 and reconcile this finding with evidence that logic instructions facilitate reasoning more for high than for low-ability individuals, additional analyses were conducted to compare low TD/high GA participants in the control (N = 33) and instruction (N = 32) conditions. Again, findings were consistent with expectations: Low TD/high GA participants in the instruction condition had higher PROB and NPROB scores than low TD/high GA participants in the control condition, Fs (1, 63) = 15.13, 15.47, ps < .001, ηp2s = .19, .20, respectively. Thus, logic instructions facilitate the reasoning of high-ability individuals primarily when they are low in TDs.
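The effect sizes for the low TD/high GA comparison can be recovered from the F ratios and their degrees of freedom. The following sketch uses the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error); the function and variable names are illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Low TD/high GA comparison, Fs (1, 63) = 15.13 (PROB) and 15.47 (NPROB).
eta_prob = partial_eta_squared(15.13, 1, 63)
eta_nprob = partial_eta_squared(15.47, 1, 63)

print(f"PROB:  eta_p^2 = {eta_prob:.2f}")
print(f"NPROB: eta_p^2 = {eta_nprob:.2f}")
```

Rounded to two decimals, these values agree with the reported ηp2s of .19 and .20.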
The “cueing” hypothesis is further supported by comparisons of high TD participants in the control condition and low TD participants in the instruction condition. If instructions cued low TD individuals to engage in reflective processing like that of high TD participants, then the GA × Numeracy interaction on probabilistic reasoning and the direct effect of GA on nonprobabilistic reasoning should have been significant regardless of condition. To examine this conjecture, two moderation analyses were conducted on high TD/control condition participants and low TD/instruction condition participants. In the analysis of probabilistic reasoning, numeracy was the independent variable, ability was a first moderator, and condition was a second moderator. In the analysis of nonprobabilistic reasoning, ability was the independent variable and condition was entered as a moderator (covariates were identical to those in previous analyses).
In both analyses, the direct effect of condition was significant (β = .1370; t = 2.09, p = .0383; LLCI/ULCI = .0075/.2665 for PROB; β = .2595; t = 3.44, p = .0007; LLCI/ULCI = .1108/.4083 for NPROB). For probabilistic reasoning, the only significant interaction was between numeracy and GA (β = .0138, t = 4.58, p < .0001; LLCI/ULCI = .0079/.0198): In both conditions, numeracy affected probabilistic reasoning when GA was moderate (ps < .05) and high (ps < .001). For nonprobabilistic reasoning, GA had a significant effect (β = .0353, t = 4.65, p < .0001; LLCI/ULCI = .0203/.0503) indicating that, regardless of condition, increases in GA were linked to increases in nonprobabilistic reasoning. Table 5 provides estimated means for the high TD control and low TD instruction participants. These findings, with those from the analyses of variance, are congruent with the notion that logic instructions cue low TD individuals to process task information in the same way as individuals who are typically more reflective.
High TD, control condition versus low TD, instruction condition participants: Effects of ability on probabilistic reasoning at different levels of numeracy and effects of ability on nonprobabilistic reasoning.
Note: Estimates based on covariates set to sample means. TD: thinking disposition; GA: general ability.
Discussion
This study addressed the general hypotheses that reflective-level processing influences the expenditure of algorithmic resources and that both reflective and algorithmic processing affect the operation of mindware. Based on Stanovich’s (2009, 2018) theory, reflective-level processing was considered superordinate to both general algorithmic resources and specific mindware because adequate reflective-level functioning is necessary for the management of general cognitive resources. General algorithmic resources, in turn, were viewed as superordinate to mindware because adequate resources are necessary to implement rules and conduct mindware-relevant analyses.
Correlational analyses were consistent with this position. First, the indexes of reflective-level processing, algorithmic capacity, and mindware correlated positively. Second, numeracy was tied more closely to probabilistic reasoning than to nonprobabilistic reasoning. Third, the relationships of TDs to probabilistic and nonprobabilistic reasoning were significant in the control condition, but not in the instruction condition (when ability and numeracy were controlled). In the following paragraphs, the principal findings in each condition are discussed, as is their relevance to the “instructions as surrogate” hypothesis for reflective-level processing.
Reasoning in the control condition
The most important findings are those bearing on hypotheses derived from the analytic processing theory of Stanovich (2009, 2011). The results supported the proposed relationships among the reflective level, the algorithmic level, and mindware. Specifically, deficiencies at the reflective level appear to have limited the efficacy of algorithmic functions and mindware. Consistent with Hypothesis 1, when both TDs and ability levels were relatively high, numeracy was related to probabilistic reasoning. By contrast, as shown in Figure 1, when reflective tendencies were poor, or general cognitive resources were lacking, numeracy was unrelated to probabilistic reasoning. Further, GA was related to nonprobabilistic reasoning when TDs were high but did not predict nonprobabilistic reasoning in the absence of the reflective-level qualities tapped by TDs. Consequently, even the most intellectually able and the most numerate (for probabilistic reasoning) participants solved few problems correctly when TDs were ineffectively calibrated to problem content.
It is, however, critical to recognize that the reflective-level dispositions (e.g., to engage in additional processing, deliberate) are constrained by intellectual ability and resource availability: Without adequate resources, even reflective individuals cannot successfully analyze task requirements, maintain decoupled representations, and consciously generate normative responses. The present findings are compatible with this proposal: When ability was low, TDs did not predict reasoning. Indeed, when the requisite cognitive resources were lacking, performance at the highest level of reflective functioning was little better than performance at the lowest TD level (see Figures 1 and 2).
The algorithmic resources-specific mindware relationship is also likely to be bidirectional. Although general resources are necessary for solving probabilistic reasoning problems consciously, normative responses are unlikely when numeric skills are poor. When numeracy was low, even those high in TDs and ability solved few probabilistic reasoning problems (see Figure 1).
These findings are in accord with Stanovich’s (2011, 2018; Stanovich & West, 2008) proposals that (a) reflective-level processes regulate algorithmic operations/resources and are pivotal to the selection of appropriate mindware and (b) algorithmic resources limit mindware operativity. On probabilistic tasks, relatively high levels of both reflective functioning and general algorithmic functioning appear necessary for numeracy to influence responding positively. Thus, when participants lacked either higher order TDs or adequate intellectual resources, highly numerate individuals performed no better than less numerate individuals.
The numeracy data are congruent with arguments that, at least for certain tasks, mindware is as important to reasoning as algorithmic resources and general abilities. The findings also implicate numeracy as an important contributor to probabilistic reasoning, illustrate some conditions under which numeracy predicts responding (on probabilistic reasoning tasks when TDs and ability are fairly high), and thereby extend research on the associations among reasoning, numeracy, ability, and TDs (e.g., Liberali et al., 2011; Toplak, West, & Stanovich, 2014). In addition, this research addressed a question left open by prior research. Specifically, with different tasks than those used here, Klaczynski (2014) found that the numeracy–probabilistic reasoning relationship was moderated by TDs and ability. However, because Klaczynski examined only probabilistic reasoning, determining whether TDs and ability moderated the relationship of numeracy to nonprobabilistic reasoning was not possible. In the present research, numeracy was of little value in predicting nonprobabilistic reasoning and the moderating effects of TDs and ability on the numeracy–probabilistic reasoning relationship did not extend to nonprobabilistic reasoning.
Nonetheless, the nonprobabilistic reasoning findings provide additional credence to the proposal that the quality of reflective-level processes determines the efficiency with which algorithmic resources and abilities are utilized. As with probabilistic reasoning, only when TDs were relatively high did ability predict nonprobabilistic reasoning. The fact that mindware relevant to nonprobabilistic reasoning was not assessed may explain why more variance was not explained in nonprobabilistic reasoning. An important direction for future research will be to examine reflective–algorithmic–mindware associations for nonprobabilistic reasoning.
Reasoning in the instruction condition
Observations that instructions to think logically augment reasoning were replicated for probabilistic and nonprobabilistic reasoning. Also replicated were findings that higher ability individuals profit more from instructions than lower ability individuals (Evans et al., 2010; Morsanyi et al., 2009).
The interactive effects of logic instructions and GA may be better understood by returning to the proposition that low TD individuals are the primary beneficiaries of logic instructions. How can this speculation be reconciled with the finding that more intellectually able individuals profit more from logic instructions than those with fewer intellectual resources? The finding that low TD/high-ability participants in the instruction condition outperformed low TD/high GA participants in the control condition supports the contention that instructions cue low TD individuals to engage in deliberation similar to that characteristic of high TD individuals. However, available algorithmic resources limit the efficacy of logic instructions such that improvements are found primarily in high-ability participants with suboptimal reflective-level inclinations.
Presumably, the reasoning of low TD/high-ability individuals improved because (a) they were able to keep the instructions in mind as they processed problem information and simultaneously maintained accurate representations and inhibited interference from irrelevant contents and memories (see also Kahneman & Frederick, 2002; Klaczynski, 2001a; Reyna, Lloyd, & Brainerd, 2003) and (b) logic instructions cued them to engage in complex reflective operations. Also concordant with the hypothesis that logic instructions cued reflective thinking are the findings that GA moderated the effects of numeracy on probabilistic reasoning, that GA “directly” affected nonprobabilistic reasoning, and that TDs moderated neither relationship.
Conclusions
This study makes several unique contributions to the reasoning and decision-making literatures. First, the interactive effects of TDs, ability, and numeracy on both probabilistic and nonprobabilistic reasoning had not been investigated previously. Second, no prior research examined effects associated with these interactions under different instructional conditions. Finally, the findings extend the empirical basis for the proposed relationships among the reflective level, the algorithmic level, and specific mindware. In this view, reflective-level processes are involved in assessing task requirements, determining how to utilize algorithmic resources, selecting mindware, monitoring reasoning, and evaluating responses. TDs presumably index some, but not all, of these operations. However, more precise research is required to examine these processes and explore how they relate to, for instance, metacognitive “feelings of rightness” (Thompson, 2013; Thompson, Evans, & Campbell, 2013) and “metacognitive status” (Amsel et al., 2008).
Nevertheless, aligning well with Stanovich’s (2009, 2011) propositions are the control condition findings that (a) numeracy influenced probabilistic reasoning when TDs were well-developed and general intellectual ability was relatively high and (b) GA affected nonprobabilistic reasoning only at high levels of TDs. The instruction evidence is also consistent with the position that processing at the algorithmic level is subordinate to reflective-level processing: Logic instructions apparently cued low reflective-level participants to activate more complex reflective operations than they were accustomed to using. Consequently, the moderating effects of TDs on reasoning related to GA and numeracy disappeared. The data thus suggest that instructions facilitate reasoning through the reflective level and indicate which high-ability individuals benefit most from logic instructions (i.e., those low in TDs).
Several caveats should accompany these contentions, one of which has serious implications for interpreting the findings. First, the correlational nature of many findings leaves open doors for alternative interpretations. Second, the hypotheses concerned the processes that ensue after attempts to override autonomously triggered responses with analytically generated responses. Third, even if the results were indicative of causal relationships, an unanswered question is, why did the predictors explain no more than 38% of variance in any analysis? As mentioned earlier, more variance in nonprobabilistic reasoning would likely have been explained had appropriate mindware been assessed. However, even had such mindware been measured, it is unlikely that TDs, GA (or other indexes of algorithmic resources, such as inhibition), and mindware would have fully explained either probabilistic or nonprobabilistic reasoning.
Even individuals inclined to engage higher order reflective processes and who possess both adequate algorithmic resources and relevant mindware sometimes make poor judgments, inferences, and decisions because of decoupling mistakes, post hoc rationalization, implementation errors, evaluation errors, task misrepresentations (Pennycook et al., 2015; Stanovich, 2018), and failures to inhibit interference from memories triggered by nonessential task contents (De Neys, 2012, 2015; Evans, 2008, 2011; Stanovich, 2011; see also the Reyna et al., 2003 discussion of “levels of rationality”). The second caveat suggests another reason that individual differences at the reflective and algorithmic levels are unlikely to fully account for reasoning and biases. Specifically, responses may be normative even if analytic processing is only minimally involved or not at all involved. Consistent with a foundational assumption of dual-process theories, automatically activated responses are sometimes normative (Handley et al., 2011; Thompson, 2013); indeed, recent data suggest the possibility that none of the responses generated here were based primarily on Type II (analytic, deliberative) processing. Specifically, it is entirely possible that considerable variance was unexplained because automatically activated responses were frequently normative. When this occurs, there is little need to fully utilize general cognitive resources or specific mindware, particularly if responses elicit strong feelings of rightness (De Neys, 2012). Strong feelings of rightness appear to decrease the likelihood that individuals will attempt to override initial responses. An issue that should be the subject of future research is determining whether these purportedly “metacognitive” feelings enlist reflective operations or instead whether feelings of rightness are generated by more or less “fluent” initial responses and are therefore products of autonomous processing (Thompson et al., 2013).
In a recent series of studies, Thompson et al. (2018) found that normative logical and statistical reasoning responses were not only based on Type I processes but also were more common among more intelligent people than among less intelligent people. This finding suggests that the frequently observed relationship between GA and performance is at least partly attributable to Type I processes (Thompson et al., 2018). For instance, for more intelligent people, base rate problems—instead of automatically cueing stereotypical responses—activate intuitive probabilities. If the ability–performance relationship is due, at least in part, to automatic processes, then it could be that the TDs–performance and numeracy–performance relationships were also due to Type I processes. Working against this argument are the findings from the logic instruction condition. Because they required the conscious allocation of resources (which had to be sustained throughout the experimental session), it is unlikely that the logic instructions would have had an effect similar to that of TDs if all normative responses were generated via Type I processing. Although it remains possible that many normative responses were the products of Type I processing, it is unlikely that this explanation can account for all of the observed results.
Keeping in mind the cautionary note that normative responses could have been generated automatically, we return to the conscious processes on which this research focused. The theory assumes that TDs constrain the utilization of algorithmic resources and GA and that available algorithmic resources constrain the functionality of specific mindware (e.g., numeracy). These hypothesized causal relationships could not be examined here.5 However, even if the causal relationships operate as hypothesized, it is not clear why (a) on some tasks, only TDs predict performance, (b) on other tasks, only GA predicts performance, and (c) on still other tasks, neither TDs nor GA predict performance. To some extent, these mixed and sometimes null findings may be explained by the fact that TDs and GA are imperfect indexes of reflective and algorithmic functioning; to some extent, these mixed findings may have arisen because many normative responses are generated via Type I processing. Nonetheless, studies utilizing more extensive and precise measures of algorithmic (e.g., inhibition) and reflective (e.g., metacognitive monitoring) functioning, and examining how these relate to variables that may lie at the intersection of autonomous (Type I) and analytic processing (e.g., feelings of rightness) will likely provide valuable insights into the mechanisms responsible for biased judgments. To determine the generalizability of the present findings, research examining other nonprobabilistic reasoning tasks (e.g., logical reasoning and belief biases) would also be beneficial.
Appendix A
Examples of numeracy problems
1. A teacher wants to change the seating arrangement in her class in the hope that it will increase the number of comments her students make. She first decides to see how many comments students make with the current seating arrangement. A record of the number of comments made by the 11 students during one class period is shown below.
The teacher wants to summarize this data by computing the typical number of comments made that day. Of the following methods, which would you recommend she use?
a. Use the most common number, which is 2.
b. Add up the 11 numbers and divide by 11.
c. Throw out the 21, add up the other 10 numbers and divide by 10.
d. Throw out the 0, add up the other 10 numbers and divide by 10.
2. In a class of 29 students, 21 students have a dog and 7 students have a cat. Jane is a student selected randomly from the class. What are the chances that Jane has both a dog and a cat?
greater than 90%
greater than 60% and less than 90%
below 60% and greater than 30%
less than 30% and greater than 15%
less than 15%
3. You are about to flip an unbiased coin. What are the chances of the coin turning up “heads” four times in a row?
50%
Less than 50% and more than 25%
25%
Less than 25% and more than 10%
Less than 10%
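For the coin-flip item (item 3), the arithmetic can be sketched as follows. The sketch assumes four independent flips of a fair coin, as the item states; the answer key itself is not given in the appendix, so the result is offered only as the standard probability calculation.

```python
from fractions import Fraction

# Probability of heads on a single flip of an unbiased coin.
p_heads = Fraction(1, 2)

# Independent flips multiply: P(four heads in a row) = (1/2)^4.
p_four_heads = p_heads ** 4

print(f"P(four heads in a row) = {p_four_heads} = {float(p_four_heads):.2%}")
```

Under these assumptions the probability is 1/16, or 6.25%, which falls in the "Less than 10%" response category.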
Appendix B
Examples of probabilistic and nonprobabilistic problems:
Probabilistic problems
Gambler’s fallacy
On a TV show called Spin & Win, players spin a ball around a wheel with 40 pockets. Of the 40 pockets, 32 are blue and 8 are red. A player wins if the ball falls into a red pocket and loses if the ball falls into a blue pocket. Before it is played, the wheel is tested to make sure that chances of the ball falling into each pocket are the same every time the wheel is spun (1 in 40). In each round, players are given 10 tries to win. Steve has just spun the wheel five times and won four times. When Steve spins the wheel on his next try, what are his chances of winning?
Ratio bias
Imagine that you are playing a lottery and will receive $10,000 if you win. There are two jars (both are covered, so you cannot see inside either jar) from which you can select a winning ticket. In the first jar, there are only 10 tickets, and 1 of these is the winning ticket. In the second jar, there are 200 tickets, and 20 of these are winning tickets.
Which jar, if either, would you select from to have a better chance of winning the lottery?
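The probabilities behind the two example problems can be worked through in a short sketch. It assumes, as the problem texts state, that every pocket on the wheel is equally likely on every spin and that one ticket is drawn at random from the chosen jar; the variable names are illustrative.

```python
from fractions import Fraction

# Gambler's fallacy: 8 red (winning) pockets out of 40, and each spin is
# independent, so Steve's earlier wins do not change the next spin.
p_win_next_spin = Fraction(8, 40)

# Ratio bias: compare the chance of drawing a winning ticket from each jar.
p_small_jar = Fraction(1, 10)     # 1 winning ticket among 10
p_large_jar = Fraction(20, 200)   # 20 winning tickets among 200

print(f"Spin & Win, next spin: {p_win_next_spin} = {float(p_win_next_spin):.0%}")
print(f"Small jar: {p_small_jar}, large jar: {p_large_jar}")
print("Jars offer equal chances:", p_small_jar == p_large_jar)
```

On these assumptions the chance of winning the next spin is 1 in 5 regardless of the preceding outcomes, and the two jars offer the same 1 in 10 chance.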
Nonprobabilistic problems
If-only
Tom parked his new car in a parking lot that was half empty. His wife asked him to park in a spot closer to where she wanted to shop, but he parked, instead, in a spot closer to where he wanted to shop. As luck would have it, when he backed out after shopping, the car behind him backed out at the same time, and his car sustained about $1000 worth of damage.
Robert parked his car in the same parking lot when there was only one parking place, so he took it. As luck would have it, when he backed out after shopping, the car behind him backed out at the same time, and both cars sustained about $1000 worth of damage.
Tom parked his new car in a parking lot that was half empty. His wife asked him to park in a spot closer to where she wanted to shop, but he parked, instead, in a spot closer to where he wanted to shop. As luck would have it, when he backed out after shopping, the car behind him backed out at the same time, and both cars sustained about $1000 worth of damage.
Robert parked his car in the same parking lot when there was only one parking place, so he took it. As luck would have it, when he backed out after shopping, the car behind him backed out at the same time, and both cars sustained about $1000 worth of damage.
Experimental reasoning
The high crime rate of Northern Wistrick led the new governor to initiate a program for reducing crime rates. The program focused on improving relationships with local community members and increasing the community watches. Six months after beginning this program, not only had the numbers of community watches increased considerably, but Northern Wistrick's crime rate had decreased by 7%. The governor decided to continue the community involvement program and begin a program of “cracking down” on drugs and penalizing offenders more harshly. Six months later, the crime rate had dropped another 11%. In the governor's first year, the city's crime rate dropped a total of 18%.
The task of determining whether the governor's programs effectively reduced crime was given to a committee of seven government officials. After reading the governor's report, each official wrote a brief statement containing his or her main conclusion. Select the statement you believe is most accurate.
Conclusion A: The report did not show how much the city's population changed in the past year. It is more likely that crimes decreased because the city's population decreased than because of anything the governor did.
Conclusion B: The evidence in the report is strong: Crime was effectively decreased because the governor increased community watches and strictly enforced anti-drug laws.
Conclusion C: The governor is in a political position. Because the findings will increase his chances of reelection, it is likely that the governor did not report crime rates accurately.
Conclusion D: The interventions were conducted too poorly to allow us to decide whether either community involvement or drug crackdowns was effective. Indeed, it is possible that neither intervention worked. (Normative response)
Conclusion E: Increasing community involvement and community watches effectively reduce crime. These interventions are even more effective in combination with a policy for harsh penalties for violating anti-drug laws.
Conclusion F: No conclusions can be made because the governor is not qualified enough to design policies and has no training to help him determine whether or not they work.
Conclusion G: Both of the governor's initiatives were effective in decreasing crime. Moreover, of the two interventions, it would appear that cracking down on drugs is more effective than increasing community involvement.
