Abstract
In defense of their “explanatory” theory of the proof process, Professors Ronald Allen and Michael Pardo maintain that a successful theory of this kind should correspond to the way that jurors actually reason, to the structure of American trials, and to typical jury instructions. They also demand that such a theory should be normatively defensible. This response suggests that using a single theory to cover such disparate ground obscures more than it clarifies, given the important gaps between psychological, doctrinal, and normative aspects of the fact-finding process.
Introduction
Professors Ronald Allen and Michael Pardo have spent many years pointing out potential problems with probabilistic models of juridical reasoning. Their latest entry in this line is wide-ranging, both defending their own ‘explanatory’ theory of the proof process and critiquing three other papers that employ quantified conceptions of uncertainty (see generally Allen and Pardo, 2019: 13–14, 17–18). The authors maintain that a successful theory of this kind should correspond to the way that jurors actually reason, to the structure of American trials and to typical jury instructions (Allen and Pardo, 2019: 11, 17–18). They also demand that such a theory should be normatively defensible (Allen and Pardo, 2019: 11–12). Unfortunately, any model that can bridge the gap between these divergent grounds must be a vague approximation of any one of them. Even worse, blurring these lines will impede our ability to identify and evaluate potential reforms to our trial process.
The various functions that evidential models may serve
Let us begin by considering the kinds of questions that evidential models might help us answer. Many theorists describe their models as ‘descriptive’, but that formulation contains an important ambiguity. We might be interested in giving a psychological account of the ways that jurors actually reason. Alternatively, we might be interested in giving a doctrinal account of the rules, instructions and practices that courts use to guide jury decision-making. These are different topics, given that juries will not always behave exactly as the law expects them to do.
The umbrella of ‘normative’ theory likewise contains two sub-categories. First, we might seek to describe an idealized inference method, which would map patterns of evidence to results in a way that maximises our shared values. Properly conceived, such a normative model would give us a yardstick against which to measure institutional performance, even if it were not practicable to implement in ordinary cases. Call this an epistemic account. Alternatively, we might wish to construct a prescriptive model of the rules, instructions or practices that would guide real-world decision-makers as closely as is feasible to the results prescribed by the epistemic model, subject to real-world cognitive and resource constraints.
There can also be hybrid models, which combine the work of different basic types, at the cost of some fidelity (see e.g. Nance, 2016: 11). All one needs is a reason to find such a blending helpful, and an awareness that in describing potentially conflicting things simultaneously, we may blur important distinctions.
Finally, we must be cognizant that practices can vary from country to country, from state to state or even from one local courtroom to another (Allen and Pardo, 2019: 7). As the aphorism goes, all models are wrong, but some are useful. All we can achieve are imperfect pictures that help us to understand how courts function and to identify ways in which their performance could be improved.
Allen and Pardo’s hybrid account sacrifices depth for breadth
Allen and Pardo state that their ‘primary aim’ is to ‘understand the general nature of juridical proof’, which is a descriptive inquiry (Allen and Pardo, 2019: 7). But they also explore whether ‘what is empirically true is normatively appropriate’ (Allen and Pardo, 2019: 7). Consistent with those goals, they refer to issues of psychological fit, doctrinal fit, prescriptive fit and normative fit when defending their own views and attacking the views of their opponents. This makes their model a rather broad type of hybrid, in that it seems designed to cover all possible aspects of proof simultaneously.
Despite their many valuable observations, Allen and Pardo’s desire to provide a normatively attractive account leads them to neglect many messy but important details of real-world fact-finding. Consider first their own theory’s application to the psychological aspects of fact-finding. They take it as a point of success that the relative plausibility account has some similarities with the Story Model of juror decision-making previously advanced by Pennington and Hastie (Allen and Pardo, 2019: 17–18; cf Pennington and Hastie, 1991: 519–520). But looked at more closely, there are sharp differences between their theory (which, for instance, allows a party to prevail using disjunctive explanations) and the Story Model (which presumes that juries pick one coherent narrative account and then choose the verdict that best corresponds with it). Nor do they spend much time canvassing the broader array of scholarship on the psychology of jury decision-making, which might reveal that there is more to choosing a verdict than constructing or choosing a story in response to the evidence. 1 What is worse, the relative plausibility account neglects significant bodies of experimental evidence suggesting that jurors’ decisions may be biased by factors unrelated to the plausibility of competing explanations. Without belabouring the point, we have good reason to think that judgments can be improperly influenced by gruesome but irrelevant photographs of a murder victim (see e.g. Grady et al., 2018), halo effects (see e.g. Mobius and Rosenblat, 2006: 228–234; Reinhard and Sporer, 2010: 95–97), the order in which evidence is presented to the fact-finder (see generally Spottswood, 2015: 307–328) or even simple innumeracy (see e.g. Koehler et al., 1995). If psychological verisimilitude is required for a ‘theory of proof’ to succeed, the relative plausibility account stands on weak footing. 2
The authors might seem to be on firmer ground when it comes to doctrinal description, but even here there are significant gaps between their explanatory account and actual practices. They make a persuasive case that a comparative evaluation of explanatory strength could be used to decide cases in a manner that is consistent with varying burdens of persuasion, without any need to precisely quantify the probability of either explanation’s truth (Allen and Pardo, 2019: 26–29). But as they themselves point out when critiquing another theory, showing that something is logically consistent with an underlying practice is a far cry from showing that any of the participants understand their tasks in those terms (Allen and Pardo, 2019: 36–37).
If one scans typical jury instructions, one will not see any suggestion that the jury should proceed abductively, that its task is primarily comparative, or that its job is to assess the extent to which the parties’ theories of the case explain the evidence. Instead, typical instructions primarily focus a jury’s attention on whether the party who bears the burden of persuasion has made a convincing enough case to succeed. 3 And at least in the typical civil case, this framing is often probabilistic. Commonly formulated jury instructions describe the preponderance standard as an inquiry into whether the plaintiff’s claim is ‘more likely true than not true’, 4 which focuses attention on the likelihood that certain facts are true, not on the competing strength of explanations of the evidence. This sort of instruction is hard to justify if the jury’s job is merely to choose the stronger story, with no regard to its cardinal likelihood.
It is true, as the authors point out, that judges sometimes discuss the jury’s task in terms of comparing rival explanations. 5 But it is far from clear that these stray comments embody a commitment to the deeper implications of an explanatory approach. Consider, for instance, one of the hypotheticals they explore, in which a jury believes that a plaintiff’s explanation is 0.4 likely and the defendant’s explanation is 0.2 likely. They acknowledge that, under the probabilistic account, the plaintiff should lose, and then criticize this result on normative grounds, because they believe it fails to promote the goals of accuracy and equalising the risk of error (Allen and Pardo, 2019: 18). But in the only case they cite during their discussion, Reeves v Sanderson Plumbing Products, the Supreme Court makes it clear that, even if it has been shown that a defendant’s explanation is quite unlikely, the plaintiff still has the burden to show that their own account is probably true. In the court’s words, ‘it is not enough to disbelieve the employer; the fact-finder must believe the plaintiff’s explanation of intentional discrimination’. 6 And in fact it is more generally understood that a plaintiff’s burden of persuasion requires convincing the jury that their own account is probably true, not merely convincing them that the defendant’s account is unlikely. Thus, although it may be normatively attractive to award victory to whichever party has the stronger of two weak explanations, there is little evidence that such a standard actually exists in American legal doctrine, let alone that it is dominant.
Thus, there may be a tension inherent in the project of trying to make a theory at once descriptively accurate and normatively attractive. Allen and Pardo’s approach seems to be an excellent way for lay jurors to reach verdict decisions. But it falls short as a psychological model because of its failure to explain sources of error and bias, and it bears only a fuzzy resemblance to existing doctrine. Of course, this need not be fatal if we understand their project to be either a purely epistemic model or an ‘interpretive’ model that seeks to find a middle ground between description and prescription. But even if we view their project as a success on those levels, that should leave plenty of room for models that focus more narrowly on the psychological, doctrinal or normative aspects on their own, without making compromises in order to bridge the gaps between them.
Overbroad critiques of models that employ subjective probability
Their insistence that a theory do so much work at once also leads Allen and Pardo to overstep in their criticisms of competing approaches. For instance, they attack the use of subjective probability estimations in models of legal decision-making (Allen and Pardo, 2019: 11–12, 33–35). First, they quite correctly note that one cannot simply rely on relative frequency data to decide the appropriate probabilities to attach to legal issues in a case. For one thing, there may be multiple competing reference classes available. Even worse, there will often be issues that do not seem typical of the available reference classes or for which no generalized frequency data can be found. These limitations are why most sophisticated probabilistic models rely on subjective probability assessments (although in a normative model, such probabilities would incorporate frequency data to the extent possible and helpful) (cf Cheng, 2009: 2096–2097; Gelman et al., 2013: 114–115).
Allen and Pardo, however, remain highly skeptical that subjective probability assessments should play a role in describing any aspect of the proof process. They worry that such probabilities are ‘truly subjective’ with ‘no necessary relationship to advancing accurate outcomes’ (Allen and Pardo, 2019: 12). Relying on subjective probabilities, they caution, ‘does not advance the fundamental goals of the proof process regarding accuracy and the risk of errors’ (Allen and Pardo, 2019: 12). Finally, they contend that relying on subjective probability assessment to generate a decision is somehow inconsistent with the notion of assessing the reasonableness of verdict results (Allen and Pardo, 2019: 12). These differing criticisms blend the ways that subjective probability might be used at the psychological, the doctrinal and the normative levels. But when we untangle the different levels and consider how they should be modelled in isolation, each of the critiques falls flat.
Consider first the use of subjective probability for modelling how juries actually reason. 7 In such a context, considerations about error allocation or judicial review are simply beside the point. Nor would a sensible model maintain that probabilities are ‘literally just made up by the decision maker’ (Allen and Pardo, 2019: 33), as if they bear no relationship to the evidence in the case or the jurors’ background beliefs. Rather, they will be produced causally in response to the evidence, those background beliefs, and the arguments made by counsel. Now, this does not guarantee that jurors’ probability judgments will be well-calibrated, 8 but there is no reason that a probabilistic model of actual fact-finding should assume perfect reasoning on the part of everyday jurors. Rather, it could either take their probability estimations as a given and focus on how those judgments are combined and turned into verdicts, or it could attempt to unpack the sources of miscalibration so that they can be better understood (see e.g. Spottswood, 2013: 197–199). And of course, we can talk about the notion that jurors have varying levels of credence, and even quantify such levels mathematically in our theorizing, without making the silly assumption that jurors typically attach explicit numbers to their own levels of confidence throughout the process.
If we focus on the doctrinal level, the only argument from Allen and Pardo that seems relevant is their claim that a subjective account is inconsistent with rules requiring courts to review verdicts for reasonableness. As they elaborate, ‘[u]nder the subjective account, every decision is reasonable (at least so long as it is otherwise internally consistent)’ (Allen and Pardo, 2019: 12–13, fn. 50). This concern is unconvincing. One can conceptualize probabilities as subjective states of credence, and still think that some such judgments are more reasonable to hold than others. For instance, suppose that Alfred tells me he heard a meteorologist say there is a 60% chance that it will rain tomorrow. Suppose further that, on average, it rains 20% of the days this time of year, and that Alfred is not always the most reliable source, due to his careless memory and his fondness for practical jokes. On this evidence, I might reasonably believe that there is a 50% chance of rain tomorrow (discounting his suggestion only modestly towards the base rate). Or I might take a more skeptical position, and believe the chance is 30%. But if I were to subjectively believe, based only on the above information, that it was either 99% or 1% likely to rain, any sensible person would question my judgment, because I have no warrant for thinking the likelihood is either so high or so low. Probability assessments are, in this respect, no different than any other sort of subjective belief. Similarly, if a person reports a belief that the moon landings were faked, I might think that he is being sincere while also finding his conclusion to be unreasonable. And if we can do this with belief versus non-belief, we can surely do the same for quantified levels of credence.
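The rain example can be made concrete with a small sketch. It assumes, purely for illustration, that my credence is a weighted blend of the reported forecast and the base rate, with the weight standing for how far I trust Alfred. The function name and the linear blending rule are hypothetical; they are not drawn from Allen and Pardo, and other ways of discounting an unreliable report are possible.

```python
def blended_credence(reported: float, base_rate: float, trust: float) -> float:
    """Mix a reported probability with a base rate (hypothetical linear rule).

    trust = 1.0 means Alfred's report is accepted in full;
    trust = 0.0 means it is ignored and only the base rate counts.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie between 0 and 1")
    return trust * reported + (1.0 - trust) * base_rate


REPORTED = 0.60   # the forecast Alfred claims to have heard
BASE_RATE = 0.20  # how often it rains at this time of year

# Sweep every level of trust from 0% to 100% and record the resulting credence.
credences = [blended_credence(REPORTED, BASE_RATE, t / 100) for t in range(101)]
lowest, highest = min(credences), max(credences)

print(round(blended_credence(REPORTED, BASE_RATE, 0.75), 2))  # modest discount
print(round(blended_credence(REPORTED, BASE_RATE, 0.25), 2))  # skeptical reading
print(round(lowest, 2), round(highest, 2))                    # range of credences
```

On this toy rule, a trust level of 0.75 yields a credence of about 50% and a trust level of 0.25 yields about 30%, while no level of trust produces 1% or 99%. Both extremes fall outside the range that the stated evidence can support, which is one way of cashing out the claim that such credences would be unreasonable.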
Lastly, there is the question of whether it makes sense to use subjective probability updating in a normative model of jury decision-making. As discussed above, we might wish to identify an epistemic model that represents ideal decision-making, or we might wish to devise prescriptive rules that get us the best results possible under real-world constraints. If we focus on Bayesianism or explanationism as theories of idealized inference, I suspect there will be little daylight between the two approaches (cf Lipton, 2004: 107–117). Whether you are trying to tally probabilities or examine the plausibility of explanations, a careful thinker will consider most of the same things if they wish to accurately track reality. Thus, either approach, if used normatively, should appropriately discount for the likelihood of conjunctions, account for varying base rates in behaviour, and discount the weight given to remote hearsay evidence compared to live, confronted testimony. The only potential gap I am aware of is one I discussed above. If Allen and Pardo really do insist (cf Lipton, 1993) that a juror should decide in favour of a plaintiff with a 0.4 likely explanation, just because the defendant’s account is only 0.2 likely, then the probabilistic framework has a superior recommendation. Faced with two unlikely stories, an ideal reasoner should go on to consider what other factual theories might account for the evidence, and persist until there is some set of alternative accounts that makes one party’s case or the other more probable than not.
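The gap between the two recommendations in the 0.4 versus 0.2 hypothetical can be laid out in a few lines. What follows is a deliberately crude sketch: the function names are hypothetical, the comparison rule is a simplification of the relative plausibility account as the hypothetical frames it, and the idea of assigning a share of the unexplained probability to the plaintiff is an assumption introduced only to illustrate the point about further inquiry.

```python
def relative_plausibility_winner(p_plaintiff: float, p_defendant: float) -> str:
    """Award the verdict to whichever named explanation is more probable."""
    return "plaintiff" if p_plaintiff > p_defendant else "defendant"


def preponderance_winner(p_plaintiff_side: float) -> str:
    """The plaintiff wins only if her side is more likely than not."""
    return "plaintiff" if p_plaintiff_side > 0.5 else "defendant"


P_PLAINTIFF = 0.4  # credence in the plaintiff's explanation
P_DEFENDANT = 0.2  # credence in the defendant's explanation
RESIDUAL = 1.0 - P_PLAINTIFF - P_DEFENDANT  # accounts that nobody has argued


def after_further_inquiry(share_for_plaintiff: float) -> str:
    """Split the residual between the sides, then reapply the preponderance rule.

    share_for_plaintiff is the hypothetical fraction of the unexplained
    probability that turns out to support liability.
    """
    total = P_PLAINTIFF + share_for_plaintiff * RESIDUAL
    return preponderance_winner(total)


print(relative_plausibility_winner(P_PLAINTIFF, P_DEFENDANT))
print(preponderance_winner(P_PLAINTIFF))
print(after_further_inquiry(0.0), after_further_inquiry(0.5))
```

If none of the residual favours the plaintiff, the preponderance rule gives the verdict to the defendant even though the plaintiff's story is twice as likely as the defendant's. Once more than a quarter of the residual turns out to support liability, the plaintiff's side passes 0.5, which is why the ideal reasoner has reason to keep investigating the unexplained remainder.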
Alternatively, perhaps Allen and Pardo mean to argue that the relative plausibility approach to deciding cases gives better prescriptive guidance than a probabilistic approach, given real-world constraints of cognition and resources in typical trials. This claim has more intuitive appeal than the former one, but it is not self-evidencing. Let us start by considering some facts on their side of the ledger. There is evidence from mock jury studies and other psychological experiments suggesting that lay people often make elementary errors when asked to reason probabilistically (see e.g. Koehler, 1993: 212–216). Such experiments suggest that explicitly updating probabilities could quickly go astray due to simple innumeracy. Moreover, the cognitive effort involved in trying to construct a plausible likelihood ratio for each piece of evidence would quickly become intractable. For this reason, few proponents of probabilistic reasoning suggest that jurors should do this for every individual statement that a witness makes. At best, jurors might holistically estimate probability based on a large body of evidence, and then be guided by an expert witness towards combining it with a small quantity of explicitly probabilistic evidence. Ergo, the ability of methodological Bayesianism to improve jury decisions is probably fairly limited.
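To see what item-by-item updating would demand, consider the odds form of Bayes' rule, in which the prior odds are multiplied by one likelihood ratio per piece of evidence. The sketch below is illustrative only: the function name and the numbers are hypothetical, and it assumes that the items of evidence are conditionally independent, which trial evidence rarely is.

```python
from math import prod
from typing import Sequence


def posterior_probability(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Update a prior by multiplying its odds by one likelihood ratio per item.

    Each ratio is P(evidence | claim true) / P(evidence | claim false).
    Conditional independence between items is assumed for simplicity.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical ratios for three items of evidence: one mildly favouring the
# claim, one mildly undercutting it, and one favouring it more strongly.
example_ratios = [2.0, 0.5, 3.0]
print(round(posterior_probability(0.5, example_ratios), 3))
```

The arithmetic itself is trivial; the burden lies in supplying a defensible ratio for every item. A trial with hundreds of discrete statements would require hundreds of such estimates, each open to dispute, which is why holistic estimation is the more realistic expectation.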
Does this mean that relative plausibility is a better prescriptive tool? That is where my own intuitions lie, but we should still consider a few potential obstacles. The first concern is that the explanationist approach might not be very constraining, so that a juror who was otherwise inclined to err might receive little correction from coaching in proper abduction. Allen and Pardo provide a laundry list of factors that go into judging the quality of an explanation: ‘consistency, coherence, fit with background knowledge, simplicity, absence of gaps, and the number of unlikely assumptions that need to be made’ (Allen and Pardo, 2019: 16). But they give no instructions for weighing the factors when they come into conflict. Moreover, it seems unlikely that a deliberating jury would be able to systematically examine the consistency of each item of evidence with multiple competing explanations, or rigorously justify intuitive feelings that one theory was more coherent or simple than another. Instead, they might often default to defending an existing intuition about which party’s case felt stronger, using the factors in an ad hoc way, rather than proceeding systematically from first principles with an open mind (see Spottswood, 2013: 190–191). If so, the prescriptive benefits of an explanationist approach may be surprisingly minimal.
Furthermore, we might question the assumption that one prescriptive theory is best, no matter the identity of the decision-maker. In an important recent study, Philip Tetlock found that numerate lay-people who paid close attention to base rates, specified an initial probabilistic estimate and then updated it frequently as new information became available outperformed both experts and the averaged judgments of large groups when forecasting the likelihood of specific future events (see generally Tetlock and Gardner, 2015). Such skill was rare among people who participated in Tetlock’s experiments, but it could be cultivated through training and it tended to improve with practice. It is hard to imagine applying such insights to quasi-randomly selected lay jurors. But many cases are decided by professional judges, arbitrators and ALJs, who could plausibly be taught similar methods and trained in their use. Such musings are obviously speculative, but they do suggest that we might want to vary our guidance to best suit the needs and abilities of different sorts of fact-finders, rather than taking a ‘one-size-fits-all’ approach.
Conclusion
To sum up, if we wish to assess how well our trials are working, or find ways to improve them, we will need to answer four different kinds of questions. First, how do juries typically reason, if left to their own devices? Second, what does current doctrine assume or encourage about that process? Third, what approach, if applied ideally, will most often lead to normatively optimal trial outcomes? And finally, what kind of prescriptive instructions will get fact-finders to reach those optimal decisions most often, given real-world constraints? If we instead insist on telling the same story about what juries do, what judges want them to do and what they actually should be doing, we might think that the judicial process was working well, but only because we have viewed it through a lens that would hide any of its flaws.
Footnotes
Author’s note
Associate Professor, Florida State University College of Law. I am grateful to Ron Allen and Maggie Wittlin for their edifying comments.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
