Abstract
Hypothesis generation is the process people use to generate explanations for patterns of data, which is an act vital to everyday problem solving. It is the basis for decision making in many professions, such as medicine, intelligence and reconnaissance analysis, auditing, and fault detection in nuclear power plants. Even laypeople’s impressions of acquaintances’ personalities based on behavioral patterns can be considered a case of hypothesis generation. This article provides an overview of research elucidating the cognitive processes that underlie hypothesis generation and decision making.
Imagine that you are walking to the mail room of your place of employment when you suddenly experience excruciating burning in your arms emanating from your shoulder and neck. What could be the cause of the pain? You remember tweaking your arms and elbows earlier that morning when catching a trash can awkwardly to keep it from hitting the ground. Could the pain be due to wrestling the trash can? Did you rupture a disk, pinch a nerve, or just exacerbate your carpal-tunnel disease? Alternatively, are your symptoms explainable by something potentially more serious? Is it a heart attack? Feeling a little off, you decide to drive yourself to the hospital just in case. Although you never felt heart pain or pressure in your chest, a few hours later it is confirmed: You are suffering from a heart blockage and in imminent danger of suffering a heart attack.
Diagnostic reasoning of the sort described above is a common occurrence. We observe data or information in our environment, and we generate likely explanations of those data (hypotheses). In many situations, the actions we take and the decisions we make depend on the outcome of the hypothesis-generation process. In some cases, our lives depend on these processes, and in other cases, such as medical diagnosis, the lives of others do (Elstein, Shulman, Sprafka, & Allal, 1978). Hypothesis-generation processes are pervasive and occur in tasks ranging from mundane (e.g., predicting what might happen next on your favorite TV show) to life-and-death (e.g., medical diagnosis). The question addressed in this article concerns the underlying cognitive processes: What impact does hypothesis generation have on the choices we make, the way we search for information, and how likely we judge different causal explanations to be?
Memory and Hypothesis Generation
In the opening example, the few explanations considered for the arm pain were dredged up from experience or memory. You remembered hitting your arm that morning, you reflected on your experiences with carpal tunnel, and you called up what you knew about heart attacks. Our work has focused on how memory constrains hypothesis generation and downstream decision making. Figure 1 depicts a general theoretical framework for hypothesis generation. This framework assumes that there are three primary processes involved: retrieval (retrieving memories from storage), maintenance (sustaining retrieved hypotheses in consciousness), and judgment (making decisions).

Fig. 1. A general theoretical framework for hypothesis generation. In this example, a diagnostician processes information contained in a radiographic slide and generates possible hypotheses to explain the presenting symptoms and visual patterns contained in the x-ray. The diagnostician can use the hypotheses maintained in working memory to render a probability judgment that the patient is suffering from a particular disease, via the comparison process. Alternatively, the hypotheses maintained in working memory can be fed back into the search process to help guide the diagnostician’s search for cues within the x-ray or to inform decisions concerning which medical tests it would be most informative to order.
Retrieval from memory operates when information in the environment (e.g., symptoms of a patient) prompts the recovery of associated hypotheses from experience and memory. Returning to the opening example, if you made it to the hospital after experiencing arm pain, we assume that your symptoms would prompt the physician to retrieve diagnostic hypotheses from long-term memory. One such hypothesis might be heartburn. Another might be some kind of blockage or heart attack. Once hypotheses are retrieved, they are maintained in an active state in working memory. In other words, you keep hypotheses mentally handy so that you can compare them to the available data and target the most likely explanation. The ability to maintain data (e.g., symptoms) and hypotheses (e.g., diseases) has been found to differ between individuals, such that people who have lower working memory capacity behave as though they maintain fewer hypotheses in working memory (Dougherty & Hunter, 2003a).
Importantly, only hypotheses actively maintained in working memory seem to influence the downstream processes involved in judgment (Dougherty, Thomas, & Lange, 2010; Thomas, Dougherty, Sprenger, & Harbison, 2008). For example, probability judgments (how likely you think each option is) appear to be calculated on the basis of a comparison of a “focal” hypothesis with the alternative hypotheses that are active in working memory. Your physician’s probability judgment that your symptoms are the result of appendicitis will therefore differ depending on the quality and the number of alternative diagnoses being actively considered. Comparing appendicitis to only a single low-quality alternative diagnosis (e.g., a brain tumor) will make appendicitis seem more likely to your physician than comparing appendicitis to several higher-quality alternative diagnoses (e.g., indigestion, gallbladder disease, hernia, cracked ribs). Similarly, the composition of the set of hypotheses being actively maintained is used to identify the relevant information to search (e.g., medical tests) to test the validity of particular hypotheses, a notion we refer to as hypothesis-guided search. These relations between hypothesis generation and decision processes imply a dependence of decision making on memory, a dependence that we have explored in some detail.
Because the various memory processes underlying hypothesis generation and decision making interact in complex ways, it is useful to instantiate these processes in cognitive models. Cognitive models allow researchers to study the implications of their theoretical assumptions and the complex interactions among processes by observing the behavior of the model. Thomas et al. (2008) developed a cognitive model (HyGene, short for “hypothesis generation”) of hypothesis generation that allows exploration of these complex interactions while also providing a mechanistic account of human decision-making behavior.
Memory Processes Constrain Hypothesis Generation
A wide variety of reasoning tasks involve hypothesis generation. For any given task, including diagnosing the cause of arm pain, there are a large number of possible causal diagnoses. Yet, with some regularity, participants and professionals tend to generate only a small subset of these hypotheses (Dougherty, Gettys, & Thomas, 1997; Dougherty & Hunter, 2003a; Gettys & Fisher, 1979; Mehle, 1982; Weber, Böckenholt, Hilton, & Wallace, 1993). The HyGene model accounts for this phenomenon by assuming that newly retrieved hypotheses are explicitly generated only if they are better matches to the data than the poorest-matching hypothesis in working memory. Failures in this process characterize profound examples of mental illness, such as entertaining the hypothesis that the kitchen faucet is communicating with you when the dripping is perfectly consistent with its just being leaky. At the same time, the HyGene model predicts that the most likely hypotheses will be generated first, another prediction consistent with the literature (Dougherty et al., 1997; Dougherty & Hunter, 2003a; Sprenger & Dougherty, 2012; Weber et al., 1993). The model produces this result because hypotheses that have occurred more often in a person’s experience are more prevalent in long-term memory, which results in greater activations of hypotheses with higher a priori likelihood. In turn, the more activated hypotheses are more likely to be generated from long-term memory into working memory.
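The generation rule described above can be illustrated with a small simulation. The following is a minimal sketch and not the published HyGene implementation: the function name, the activation values, the capacity parameter, and the stopping rule based on a count of unproductive retrieval attempts are all assumptions made for illustration. The sketch also assumes that any newly retrieved hypothesis is accepted while working memory still has free slots.

```python
import random


def generate_hypotheses(activations, capacity, max_failures, seed=0):
    """Retrieve hypotheses into a capacity-limited working memory.

    activations: hypothetical match of each stored hypothesis to the data.
    max_failures: hypothetical stopping rule (retrieval attempts that add nothing).
    """
    rng = random.Random(seed)
    names = list(activations)
    weights = [activations[name] for name in names]
    working_memory = []
    failures = 0
    while failures < max_failures:
        # More activated hypotheses are more likely to be retrieved.
        candidate = rng.choices(names, weights=weights, k=1)[0]
        if candidate in working_memory:
            failures += 1
        elif len(working_memory) < capacity:
            # Assumption of this sketch: free slots accept any new hypothesis.
            working_memory.append(candidate)
        else:
            weakest = min(working_memory, key=activations.get)
            if activations[candidate] > activations[weakest]:
                # Candidate beats the poorest match, so it takes that slot.
                working_memory[working_memory.index(weakest)] = candidate
            else:
                failures += 1
    return working_memory


# Illustrative activations for the arm-pain example (made-up numbers).
activations = {
    "heart attack": 5.0,
    "pinched nerve": 3.0,
    "ruptured disk": 1.5,
    "carpal tunnel": 0.5,
}
print(generate_hypotheses(activations, capacity=2, max_failures=200))
```

In this toy version, a smaller capacity yields fewer generated hypotheses, and a smaller failure limit leaves less opportunity for a weak early hypothesis to be displaced by a better-matching one.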
Hypothesis Generation Affects Probability Judgment
Your physician might tell you that there is a 42% chance (a .42 probability) of your dying from (any type of) cancer. But if pressed further, he or she might tell you that the probabilities of your dying from breast cancer, lung cancer, or colon cancer are .18, .15, and .17, respectively. These sum to .50, which is greater than the previously stated probability of dying from any type of cancer, an effect typically labeled subadditivity. Subadditivity obtains when the probability assigned to an implicit disjunction (e.g., dying from any type of cancer) is exceeded by the sum of probabilities assigned to the explicit disjunction (e.g., breast cancer, lung cancer, or colon cancer).
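The arithmetic behind the cancer example can be checked directly. The short sketch below uses the judged probabilities from the text; the function name is hypothetical, and the difference score is just one simple way to express the size of the effect.

```python
def subadditivity(implicit, explicit_parts):
    """Return how much the summed explicit judgments exceed the implicit one."""
    return sum(explicit_parts) - implicit


# Judged probabilities from the example in the text.
any_cancer = 0.42
specific_cancers = [0.18, 0.15, 0.17]  # breast, lung, colon

excess = subadditivity(any_cancer, specific_cancers)
print(round(sum(specific_cancers), 2))  # 0.5
print(round(excess, 2))                 # 0.08
print(excess > 0)                       # True: the judgments are subadditive
```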
The probability-judgment task given above is often characterized as involving a comparison process, wherein the strength of evidence for one hypothesis (breast cancer) is compared with the strength of evidence for the alternatives (lung cancer and colon cancer; Tversky & Koehler, 1994). Implementing this comparison process in HyGene (Fig. 1) enables the model to make clear predictions about subadditivity: All else being equal, the fewer hypotheses that one includes in the comparison process, the higher the probability assigned to each one. When summed across all judged hypotheses, this naturally leads to the prediction of subadditivity.
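A toy version of the comparison process shows why fewer alternatives produce higher judgments. This is a sketch under stated assumptions, not the HyGene code: it uses a simple ratio rule (the strength of the focal hypothesis divided by the summed strength of everything being compared), the strength values are invented, and the capacity parameter is a hypothetical stand-in for the number of hypotheses that can be held in working memory.

```python
def judged_probability(focal, strengths, capacity):
    """Judge the focal hypothesis against the alternatives held in working memory.

    Assumption: the focal hypothesis occupies one slot, and only the
    strongest (capacity - 1) alternatives are maintained alongside it.
    """
    alternatives = sorted(
        (s for name, s in strengths.items() if name != focal), reverse=True
    )
    considered = alternatives[: max(capacity - 1, 0)]
    return strengths[focal] / (strengths[focal] + sum(considered))


# Hypothetical evidence strengths for four diagnoses.
strengths = {
    "appendicitis": 4.0,
    "indigestion": 3.0,
    "gallbladder disease": 2.0,
    "hernia": 1.0,
}

for capacity in (2, 4):
    focal = judged_probability("appendicitis", strengths, capacity)
    total = sum(judged_probability(h, strengths, capacity) for h in strengths)
    print(capacity, round(focal, 2), round(total, 2))
# capacity 2 -> 0.57 for appendicitis, judgments sum to 1.53
# capacity 4 -> 0.4 for appendicitis, judgments sum to 1.0
```

When every alternative fits into the comparison, the judgments sum to 1. When only one alternative is considered, each judgment rises and the sum exceeds 1, which is the subadditive pattern.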
Individual Differences in Working Memory Capacity and Divided Attention Affect Hypothesis Generation and Probability Judgment
Subadditivity has been found in many studies investigating probability judgments (e.g., Dougherty & Hunter, 2003a, 2003b; Dougherty & Sprenger, 2006; Tversky & Koehler, 1994). However, the level of subadditivity is dependent on memory variables. For example, Dougherty and Hunter (2003a, 2003b) showed that the magnitude of subadditivity was negatively correlated with working memory capacity. This finding is consistent with the notion of a capacity-limited comparison process (as demonstrated by the HyGene simulations illustrated in Fig. 2). In other words, individuals with low working memory capacity were unable to maintain as many alternatives for inclusion in the comparison process as individuals with high working memory capacity, leading them to make excessive probability judgments.

Fig. 2. HyGene simulation demonstrating the effects of working memory capacity and cognitive load on probability judgment (left) and hypothesis generation (right). Note that the HyGene parameter governing working memory capacity accounts for low working memory capacity due to individual differences or the presence of a secondary task (i.e., high cognitive load). Adapted from “Diagnostic Hypothesis Generation and Human Judgment,” by R. P. Thomas, M. R. Dougherty, A. M. Sprenger, and J. Harbison, 2008, Psychological Review, 115, p. 174. Copyright 2008 by the American Psychological Association. Adapted with permission.
The HyGene model assumes that variables that constrain the generation process or reduce working memory capacity will lead to increases in judged probability because fewer hypotheses will be available to the comparison process. For instance, if you are making probability judgments while also trying to remember a list of letters, chances are good that you are going to assign higher probabilities to each outcome because you will not be considering as many alternative hypotheses to begin with. Consistent with these intuitions and with the HyGene model’s predictions (Fig. 2), Sprenger et al. (2011) found that participants’ probability judgments were higher when they were experiencing higher levels of cognitive load (resulting in greater subadditivity), providing additional evidence that the comparison process for probability judgments is capacity limited.
Hypothesis Generation Influences Hypothesis-Testing Behavior
Hypothesis testing involves searching for or selecting information in order to test the truth of a hypothesis. A common finding in the literature is that of confirmatory search (for a review, see Sanbonmatsu, Posavac, Kardes, & Mantel, 1998), whereby people select information relevant for evaluating only a single hypothesis (Klayman & Ha, 1987). For instance, a physician who thinks a patient might be suffering from appendicitis might check for tenderness in the abdomen. Although there might be a high likelihood of yielding a positive result of abdominal tenderness if the patient has appendicitis, this confirming test could still be nondiagnostic or uninformative for revising beliefs. For instance, if competing hypotheses (e.g., gallbladder disease, cracked ribs) have likelihoods of yielding a positive result of abdominal tenderness that are similar to that of appendicitis, then the test is uninformative. A diagnostic test would be one in which the likelihood of a positive result differs between competing hypotheses (e.g., an elevated white-blood-cell count might be highly likely for appendicitis, but relatively unlikely for cracked ribs), providing the potential for revising beliefs about relevant hypotheses. Although some recent work has argued that confirmation testing can be a reasonable strategy once the complexity of real-world domains is accounted for (Dougherty et al., 2010; Navarro & Perfors, 2011), diagnostic search has been observed under conditions that facilitate the consideration of alternatives (Mynatt, Doherty, & Dragan, 1993).
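The difference between a confirming but uninformative test and a diagnostic test can be made concrete with Bayes' rule. The sketch below considers only two competing hypotheses and a positive test result; the likelihood values and function names are hypothetical and chosen only to mirror the appendicitis example.

```python
def likelihood_ratio(p_pos_h1, p_pos_h2):
    """Ratio of the chance of a positive result under each hypothesis."""
    return p_pos_h1 / p_pos_h2


def posterior(prior_h1, p_pos_h1, p_pos_h2):
    """Belief in H1 after a positive result, with H2 as the only alternative."""
    prior_h2 = 1 - prior_h1
    return prior_h1 * p_pos_h1 / (prior_h1 * p_pos_h1 + prior_h2 * p_pos_h2)


prior = 0.5  # appendicitis vs. cracked ribs, equally likely beforehand

# Abdominal tenderness: assumed equally likely under both hypotheses.
print(round(likelihood_ratio(0.9, 0.9), 1), round(posterior(prior, 0.9, 0.9), 2))
# 1.0 0.5 -> a positive result leaves beliefs unchanged

# Elevated white-blood-cell count: assumed likely only under appendicitis.
print(round(likelihood_ratio(0.9, 0.1), 1), round(posterior(prior, 0.9, 0.1), 2))
# 9.0 0.9 -> a positive result shifts belief toward appendicitis
```

A likelihood ratio of 1 marks a test that cannot revise beliefs, no matter how likely a positive result is under the favored hypothesis.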
The HyGene model assumes that decision makers search for information contingent on their currently held hypotheses (i.e., they engage in hypothesis-guided search; Dougherty et al., 2010; Thomas et al., 2008). If only one hypothesis is maintained in working memory, then hypothesis-guided search necessarily follows a confirmation-search strategy, which can lead to the selection of uninformative tests. However, if more than one hypothesis is in working memory, then the decision maker can search diagnostically. We argue that retrieving alternative hypotheses into working memory enables decision makers to prefer diagnostic tests (Dougherty et al., 2010; Thomas et al., 2008).
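Hypothesis-guided search can be sketched as a test-selection rule that depends on the contents of working memory. This is an illustrative toy rule rather than the HyGene mechanism: the likelihood table is invented, and the choice of "largest spread in likelihoods" as the measure of diagnosticity is an assumption of the example.

```python
def choose_test(tests, working_memory):
    """Pick a test given the hypotheses currently held in working memory."""
    if not working_memory:
        raise ValueError("at least one hypothesis is required")
    if len(working_memory) == 1:
        # With a single hypothesis, the only available guide is how likely
        # a positive result is under it: a confirmatory strategy.
        (hypothesis,) = working_memory
        return max(tests, key=lambda t: tests[t][hypothesis])

    def spread(test):
        # With several hypotheses, prefer the test whose outcome
        # discriminates most between them.
        values = [tests[test][h] for h in working_memory]
        return max(values) - min(values)

    return max(tests, key=spread)


# Hypothetical chance of a positive result under each hypothesis.
tests = {
    "abdominal tenderness": {"appendicitis": 0.95, "cracked ribs": 0.90},
    "white-blood-cell count": {"appendicitis": 0.85, "cracked ribs": 0.10},
}

print(choose_test(tests, ["appendicitis"]))
# abdominal tenderness
print(choose_test(tests, ["appendicitis", "cracked ribs"]))
# white-blood-cell count
```

With one hypothesis in mind, the rule selects the test most likely to come back positive, even though that test barely separates the two diseases. Adding the alternative to working memory flips the choice to the diagnostic test.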
Thomas, Lange, and Dougherty (as cited in Lange et al., in press) tested the effects of self-generated hypotheses on information search. After a learning phase in which participants gained experience with particular symptom-disease associations, they were presented with a patient exhibiting a symptom and asked to select the medical test that they believed would be most informative to diagnose the patient, given the presenting symptom. The presenting symptom was strongly associated with either one or two hypotheses. Consistent with the principle of hypothesis-guided search, participants showed a preference for confirmatory tests when only one hypothesis was highly associated with the presenting symptom. In contrast, participants showed a preference for diagnostic tests when two hypotheses were associated with the presenting symptom.
These findings are consistent with the idea that diagnostic-test selection depends on the retrieval of alternative hypotheses into working memory. As reviewed above, the HyGene model identifies a number of variables that influence when more than one hypothesis will be generated by decision makers (e.g., amount of time pressure, individual differences in working memory capacity, and the level of cognitive load). Ongoing work in our laboratories is testing the effects of these variables on hypothesis-guided search.
Time-Based Processes Influence Hypothesis Generation
Another task characteristic assumed by the HyGene model to constrain hypothesis generation is the amount of time afforded to a decision maker to generate hypotheses. Figure 3 demonstrates the influence of time pressure on the behavior of the HyGene model, whereby greater time pressure results in the generation of fewer hypotheses, causing excessive probability judgments and increased subadditivity. In other words, if you have to make decisions more quickly, the model predicts that you will not consider as many hypotheses and that, as a result, you will think each possible explanation you have generated is more likely than it actually is.

Fig. 3. HyGene simulation demonstrating the effects of time pressure on probability judgment (left) and hypothesis generation (right). Adapted from “Diagnostic Hypothesis Generation and Human Judgment,” by R. P. Thomas, M. R. Dougherty, A. M. Sprenger, and J. Harbison, 2008, Psychological Review, 115, p. 174. Copyright 2008 by the American Psychological Association. Adapted with permission.
These simulation results are consistent with the empirical findings of Dougherty and Hunter (2003b), who manipulated the presence or absence of time pressure while participants made probability judgments. Participants’ judgments were more excessive when made under time pressure, which suggests that time pressure led to the generation of fewer hypotheses.
In hypothesis-generation tasks, data are often acquired serially, one after another. As a result, each datum is experienced in a position relative to the rest of the data. Although we should be unaffected by data order, on an intuitive level the order in which we encounter data almost certainly affects the hypotheses we generate. Both primacy bias (early data have a larger influence than later data) and recency bias (later data have a larger influence than early data) have been demonstrated in decision making (Hastie & Park, 1986; Hogarth & Einhorn, 1992; Peterson & Ducharme, 1967). For instance, early data often have a larger influence on probability judgments than later data, demonstrating a primacy bias in judgment (Peterson & Ducharme, 1967).
Our recent work has demonstrated similar order effects in people’s hypothesis-generation (i.e., diagnostic) behavior. Lange, Thomas, and Davelaar (2012b) presented participants with one informative piece of data among three uninformative pieces of data sequentially. The position of the informative piece of data was manipulated so that it appeared in each of four possible serial positions, allowing the influence of the data at each serial position to be measured. Later data contributed more to participants’ choice of diagnosis than early data, demonstrating a recency effect in hypothesis generation. As can be seen in Figure 4, the HyGene model captures the recency trend evidenced in the data.

Fig. 4. Empirical and model data from Lange, Thomas, and Davelaar (2012b) demonstrating recency bias in the proportion of diagnoses consistent with the diagnostic cue. Adapted from “Temporal Dynamics of Hypothesis Generation: The Influences of Data Serial Order, Data Consistency, and Elicitation Timing,” by N. D. Lange, R. P. Thomas, and E. J. Davelaar, 2012, Frontiers in Psychology, 3, Article 215, Figure 6. Copyright 2012 by Frontiers Media. Adapted with permission.
The speed of data acquisition also influences diagnosis. Lange, Thomas, Buttaccio, Illingworth, and Davelaar (2013) presented a sequence of five symptoms to participants and asked them to select the more likely of two disease hypotheses. The sequence of symptoms was such that the first two symptoms suggested one hypothesis (Disease A) and the last two symptoms suggested the other hypothesis (Disease B). Under the slow rate of symptom presentation, a recency bias obtained in diagnosis, whereas under the fast rate of presentation, the recency bias attenuated and a primacy bias emerged.
Recent findings have suggested that a primacy bias is increasingly likely to obtain in hypothesis generation (i.e., diagnosis) as the complexity of the task increases (Lange, Thomas, & Davelaar, 2012a, 2012b) or as a function of increased working memory capacity (Lange, Davelaar, & Thomas, 2013). In sum, the same datum can have different impacts on which hypotheses are generated depending on where it is presented in the sequence.
Conclusions
Experimental laboratory procedures often provide clear options or goals for the participant—providing task structure that is often lacking in real-world domains. In the lab, you might be explicitly instructed to press one of two keys, on the basis of available data. In the real world, when you start to feel chest pain, there are no instructions. We suggest that the primary function of hypothesis generation is to impose structure on the complex and ill-defined problem spaces that often characterize the circumstances in which people must judge, choose, or act. Because physicians can generate a few good disease hypotheses to explain your symptoms, they can judge the likelihood of a diagnosis, order diagnostic medical tests, and select appropriate courses of treatment.
Thus, hypothesis generation serves as the critical bridge between people’s task environment and the decision processes enabling complex and intelligent behaviors. Although people generate good hypotheses to explain patterns of data, their hypothesis generation is impoverished as a result of underlying memory constraints, which leads to systematic biases in beliefs and information search. The patterns of behavioral findings reviewed here are consistent with our computational model, HyGene. However, we believe that one of the most fruitful uses of the HyGene model will be borne out through optimizing the model by lifting the human constraints. In this context, the HyGene model may serve as a support tool to aid the diagnostic decision making of professionals by inoculating them against the biases discussed in this article and to improve the robustness of existing applications of artificially intelligent classification systems (Thomas et al., 2010).
Footnotes
Acknowledgements
We thank Dr. Jennifer Barnes for providing critical comments on a previous version of this manuscript.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This article was supported by National Science Foundation Grant SES-1024650 to Rick Thomas.
