Abstract
Attempts to establish a quantitative framework for policy-making in the criminal justice system in recent decades have coalesced around the problem of the standard of proof and Kaplan’s influential 1968 paper. The central thread of work continues to use an equation he put forward while abandoning some of his foundational assumptions, an approach I call ‘Kaplanism’. Despite a growing awareness of deficiencies, elements of this school of thought, such as the parsing of concerns into the two categories of ‘error reduction’ and ‘error distribution’, have entered the general jurisprudential discourse. Here I launch a methodological attack and claim to kill this approach. This allows me to refute Laudan and other ‘consequentialist’ approaches to the standard identified by Walen, Walen’s own approach and an important part of Stein’s underpinnings. The same tools allow me to also refute Laudan’s earlier m/n meta-epistemology, Lippke’s ‘adage’, Stewart’s formalisation of Dworkin, Dahlman’s Bayesian work and (at least in criminal law) Kaplow’s law and economics approach. I also refute Hamer’s ‘conventional rationale’ for the current standard, Lillquist’s approach to the same and what Epps reports as ‘the Blackstone principle’. The law is left with no epistemic basis for policies, which, I argue, leaves it struggling for public trust in the modern era.
The law has to make decisions in a state of uncertainty—how? This is what, following Shapiro, we can call the ‘problem of proof’. Its centrality cannot be overstated and clumsy handling can provoke a crisis with potentially catastrophic consequences. One way to resolve it is to demand certainty, a position that drove traditional Muslim law into irrelevance and Romano-canon law to wed itself to torture (Shapiro, 1991: 241).
Shapiro has described the evolution of such a crisis in the early modern period when English law lost access to divine insight through the medium of Christian conscience. A vital part of the response that allowed the law to convince society at large that the jury retained the divine spark was the development of policies. Thus such policy-making can be seen fundamentally as a mechanism for cultivating trust.
This was the context in which Blackstone deployed his seminal dictum that, ‘…the law holds that it is better that ten guilty persons escape than that one innocent suffer’. 1 He used the dictum as the first step in a three-step argument, the next two being: a general principle that presumptive evidence should be admitted only cautiously; and two particular rules, to never convict of murder or manslaughter unless the body can be produced and to not convict of theft merely because someone will not explain how they came by the goods.
Such policies can of course be justified or denied on many grounds. Some kind of explanation therefore is required both for the enthusiasm with which the dictum was immediately taken up, so that Bentham felt obliged to warn against a vogue for ever higher numbers, and its enduring importance. This need is especially pressing since the basic idea was not new and the dictum is something of a riddle, with Blackstone making no attempt to justify any of the three steps of his argument. 2
The explanation seems to me to likely lie in a combination of four factors: the new need for a justification for policies identified by Shapiro; the usefulness therefore of the very general applicability of steps one and two in Blackstone’s argument; the dictum’s quantitative aspect, which chimed with the beginnings of the general adoption of the quantitative in society at large, a movement which at about the same time gave us Beccaria’s la massima felicità divisa nel maggior numero; and the ability of the dictum to be read as shoving policy-makers towards policies that reduce both the number of wrongful convictions and the ability of the state to seize and convict, which chimed with the new and popular demands of the emerging democracies and was in contrast with other contemporary frameworks, such as that noted by Franklin which invested judges under Robespierre with an absolute discretion (Franklin, 2015: 7).
The dictum therefore can be seen as a new kind of buttress of the law that was required in a new kind of society. By virtue of the 10 and the clear analytical framework it suggests of four objectively distinct outcomes, it partakes of the quantitative; by virtue of this apparent transparency and the shove, it partakes of the democratic; and these two aspects together give it one foot in modernity. But the lack of a clear and cogent argument, of rationality, and the lack of a mechanism of policy-making susceptible to democratic oversight leave it with one foot in the pre-modern.
Today, this half in, half out aspect can still be seen in jurisprudence. On the one hand, and primarily in the United States, attempts have been made to develop a rigorous quantitative framework for making policies in criminal law. These have coalesced around the standard of proof and have largely been based on Kaplan’s 1968 article introducing decision theory. 3
On the other hand, large swathes of jurisprudence, in the United Kingdom for example, have quietly rejected this vein of work and instead developed explicitly unquantitative approaches. Thus we have two bodies of what Kuhn called incommensurable scholarship, so that Hamer’s articulation of a ‘conventional rationale’ for the standard of proof derived from Kaplan co-exists with non-quantitative accounts such as Allen’s explanationism or the moralised work of Lippke or Duff (Allen and Pardo, 2018; Duff, 1991: ch. 4; Kuhn, 1970; Lippke, 2016).
The main point of this article is to definitively refute the contemporary legacy of Kaplan, what I call Kaplanism. I use the word ‘kill’ for this, which may seem aggressive, unscholarly even, but is intended to serve a constructive purpose. The non-quantitative schools have reservations towards Kaplanism that are sufficient for them but not for Kaplanists—the incommensurability. My aim is to convince also the Kaplanists, which can only be achieved through close engagement with their arguments, so that most of this article is devoted to detailed questions of methodology. This requires a fundamental sympathy of outlook and explains why the two most significant forerunners for me are Tribe and DeKay, who both also had a quantitative training; why a Kaplanist, Laudan, provides a critical part of my apparatus; and why a statistician, Dawid, is also essential. ‘Kill’ clearly expresses both how fundamental I consider the problems and how definitive I consider the refutation. Thus its use is an invitation to Kaplanists to either join battle or concede. Rather than going on with co-existence, I am trying to force the issue. However, my overarching purpose is not to hand victory to the non-quantitative approaches. The conclusion I reach is only that Kaplanism is fatally flawed, not that a quantitative approach to the problem of proof is intrinsically impossible. This article is intended as a clearing of the decks in preparation for the development of new quantitative approaches which cannot be so easily discounted by today’s unquantitative schools.
My focus will be on the foundational assumptions of Kaplanism and I will adopt some epistemological machinery that has been defined in more detail by me elsewhere. We will consider a criminal trial a kind of empirical inquiry, so that the rules that a court follows, its policies, are its epistemology. The principles, or meta-rules, that will allow us to choose between the potential policy options then form a meta-epistemology for the law.
A central concern is arguments, like Blackstone’s, in the form ‘because n, X’, where n is the ratio of false acquittals to false convictions and X a policy of some kind. The question is, by what steps of reasoning does one get from n to X? 4
If a ‘because n, X’ argument connecting n with X is cogent, then such an argument will qualify as a meta-rule. This is a ‘hard’ approach derived from Laudan, so that, for example, it is an open question whether the law has this kind of meta-epistemology at all (Cullerne Bown, 2018: 366; Laudan, 2006: 4).
There is no textbook of Kaplanism. Insofar as it differs from Kaplan’s original approach, the differences have been introduced by degrees, by many authors, often without any acknowledgement that there is an innovation requiring justification. To achieve the killing I have therefore ranged widely, both to pull together an accurate picture of Kaplan’s legacy today and to refute all these related arguments. Many of the authors are dealt with only very briefly and it may seem at first glance that this can hardly amount to refutation of what is in many cases substantial work. But their approaches are related and if the methodology of a quantitative argument is flawed the entire edifice may indeed collapse; in each case my implicit claim is to knock out a keystone (or stones).
Some of the steps in my argument have been articulated already by others. I hope this will not inadvertently seduce readers into believing there is nothing original here; a new building may contain some old bricks. I make no effort to cite these earlier authors because very often an argument that is in my view valid is made within a larger (or beside a separate) argument that is invalid. For example, although expressed with different terminology, Risinger, drawing on Allen and Laudan, is aware of one of my points, that n is not a valid objective for the system. 5 But this comes within a discussion of Blackstone’s ratio that is in my view flawed (just as there are in my view flaws in Allen and Laudan’s approach). To cite Risinger (or Allen and Laudan) would then require me to sort the wheat from the chaff, carefully distinguishing which bits of their position are valid in my view and which are not, a task that, if taken up generally, would greatly lengthen this article to little effect. All this means, of course, is that I give up the opportunity to lean on others for my attacks and stand entirely on my own feet.
The article begins by clarifying the distinction between the original Kaplan and today’s Kaplanism. The next section sounds a brief note of caution about the original without claiming to kill it. There are then four sections in which I claim through a variety of arguments to kill Kaplanism four and a half times over. In the Conclusion I take stock of Kaplan’s legacy, consider the more general predicament of the law today in dealing with the problem of proof, argue that to frankly abandon the quantitative would be to risk losing popular trust, and suggest a way forward.
Finally, while I think ‘Kaplanism’ reasonable, I use the term ‘Kaplanist’ only out of explanatory convenience. I don’t think any of the authors mentioned here are defined by a reliance on Kaplan.
Kaplan and Kaplanism
Kaplan approached the question of the standard of proof from the perspective of decision theory. This field has its roots in psychological questions concerning the means by which people, and other creatures, make choices in the moment. Thus he explicitly framed his approach in terms of a ‘personalistic’ theory of probability (Kaplan, 1968: 1067).
Dawid has since provided a classification of the different kinds of probability theory that may be encountered in legal circles. Kaplan’s naturally falls under the ‘subjective’ heading, which may be adopted when it is impossible to conceive of a repeated sequence of events. Thus, if one wants to look at the standard of proof from the point of view of an individual jury or juror, Kaplan’s choice of a school of probability is natural. From this starting point arose Kaplan’s influential inequality:
\[ P > \frac{1}{1 + D_g / D_i} \]
where P is the probability of guilt as assessed by the jury, Dg the disutility of acquitting a guilty person and Di the disutility of convicting an innocent person. When the left-hand side is bigger than the right-hand side the jury should convict and thus if we replace the > with an = we get an equation defining the threshold for conviction. 6
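One standard route to an inequality of this shape is to compare the expected disutility of the two verdicts. The sketch below is offered only as an illustration; it assumes that the two correct verdicts carry zero disutility, and it uses no symbols beyond P, Dg and Di as defined above.

```latex
\begin{align*}
\text{expected disutility of convicting} &= (1 - P)\, D_i \\
\text{expected disutility of acquitting} &= P\, D_g \\
\text{convict when}\quad (1 - P)\, D_i &< P\, D_g \\
\iff\quad P &> \frac{D_i}{D_i + D_g} = \frac{1}{1 + D_g / D_i}
\end{align*}
```

On these assumptions, a juror who regards a false conviction as ten times worse than a false acquittal (Dg/Di = 1/10) arrives at a threshold of 10/11, or roughly 0.91.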
Utility, or its opposite, disutility, here is shorthand for a term of art in decision theory, subjective expected utility, and Kaplan says, ‘The important thing to remember about utility is that…it is a personalistic measure.’ 7 Likewise, Dawid reminds us that this kind of approach retains an ‘irreducibly subjective element’ (Dawid, 2005: 36).
Things start to look quite different if we consider not the predicament of an individual jury but that of someone concerned with the system as a whole, a distinction highlighted by Tribe. 8 Then we wish to use our analysis to set a standard of proof for the whole of the legal system, that is, to establish a policy, and we do have a repeated sequence of events in the trials that come to court. In this case, the obvious kind of probability to adopt is that taught to schoolchildren, what Dawid describes under the headings of classical and statistical probability, and which, to emphasise my point, I will follow Kaplan’s own lead and collect under the heading objective (Kaplan, 1968: 1066). It is perhaps easier to see this distinction clearly now that the pitched battles between objective frequentists and subjective Bayesians of the late twentieth century concerning interpretations of probability have subsided (Franklin, 2015: 9).
This difference can also be expressed in epistemological terms. Dawid identifies subjective probability as that which is most suitable for reasoning in a court, saying: ‘The events at issue in a court of law are most appropriately regarded as unique to the case at issue, rather than as instances of repeatable phenomena.’ In other words, the use of such probability is natural within the epistemology of a court. But when we are defining policies for the entire system, we are not dealing with a unique case, we are dealing with an endless sequence of cases, and the means by which we arrive at our decision is not part of the law’s epistemology but part of its meta-epistemology.
If we take a statistical approach, and say that a crime committed by two criminals counts as two elements of our universe, then our starting point is what I call the 4-gram, which partitions a universe of person-action pairs, some of which elements may be criminal-crime pairs where a conviction is obtained (rightful convictions or true positives), as shown in Figure 1.

Figure 1. The 4-gram, the basis of an objective quantitative analysis of the criminal justice system, but not Kaplan’s.
We may, in theory at least, count up the elements in these subsets to give us the four tallies in the diagram. Obvious as this may seem, and despite the shared terminology of ‘true positives’ and so on, this universe and 4-gram form no part of Kaplan’s approach, for the universe here is composed of the multitude of person-action pairs for which a conviction may be sought, the sequence of trials that is specifically excluded by Kaplan’s subjective approach. Indeed, until you have thrown this Venn diagram out of your head you have not truly embraced subjective probability at all.
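To make the four tallies concrete, the following is a minimal sketch of the 4-gram as a data structure. The class name, field names and figures are hypothetical, introduced only for illustration; nothing here is an empirical estimate, and the true negatives in particular are given a number only so that the arithmetic can be followed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourGram:
    """Tallies for the four subsets of a universe of person-action pairs."""
    true_positives: int   # criminal-crime pairs where a conviction is obtained
    false_positives: int  # convictions where there is no criminal-crime pair
    true_negatives: int   # no crime and no conviction
    false_negatives: int  # a crime but no conviction (false acquittals)

    def universe_size(self) -> int:
        """Total number of person-action pairs in the universe."""
        return (self.true_positives + self.false_positives
                + self.true_negatives + self.false_negatives)

    def n(self) -> float:
        """Ratio of false acquittals to false convictions."""
        if self.false_positives == 0:
            raise ValueError("n is undefined when there are no false convictions")
        return self.false_negatives / self.false_positives


# Hypothetical tallies, chosen only to make the arithmetic visible.
example = FourGram(true_positives=600, false_positives=20,
                   true_negatives=9000, false_negatives=200)
print(example.universe_size())  # 9820
print(example.n())              # 10.0
```

The point of the sketch is only that n, on this objective framing, is a property of counted outcomes across the whole universe, not of any one juror's assessment.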
This problem with Kaplan was touched on by Tribe, who accused him of ‘positing the wrong decisionmaker’ (Tribe, 1971: 1329). To investigate the issue let us first formalise it. A perspective is a pairing that combines the extent of a universe of potentially criminal acts under consideration with a (sometimes notional) actor. We will distinguish particularly between four perspectives: that of a jury; that of the court system as a whole; Risinger’s, with its concern for people arrested or otherwise ‘drawn into’ the system; and that of the sovereign, which is to say the point of view of anyone concerned with the criminal justice system as a whole and its role in society. 9
I will argue that the distinction between the jury and sovereign perspectives, between subjective and objective frameworks, has been muddied in the subsequent literature, leading to problematic elements in Kaplan being magnified into an edifice of reasoning that is incoherent. As a taster, consider as a kind of koan this from Stein:
‘Following John Kaplan, the utility-based criminal proof standard can be formulated as I/(I + G) where I and G denote, respectively, the social damage inflicted by convicting an innocent suspect (I) and by erroneously acquitting a criminal (G).’ 10
If we then try to parse the statement, we start by noting that he is following Kaplan, so the framing is subjective. But then, where Kaplan uses the language of decision theory and refers to the ‘subjective expected utility’ of a decision, Stein strips out the subjective bit and refers instead to a ‘utility-based criminal proof standard’, which sounds like a general policy rooted in an objective modelling of the problem. This is emphasised by the talk of ‘social damage’, which again strips out any sense that the assessment is that of an individual trapped in the jury room, suggesting instead the concerns of a sovereign. 11 In the end, it is not clear that anything of Kaplan survives except the form of the equation. Stein, however, does not find it necessary to account for the shifting foundations; it is as if it is of no account. Importantly, and as suggested by Stein’s language and his decision to cite Kaplan without qualification, this is not a peculiarity of Stein’s. All that he is doing here is continuing a well-established habit in this corner of jurisprudence.
The upshot of such shifting between the subjective and objective is twofold. First, it raises the question of the extent to which there is in fact any correspondence between the two kinds of probability that allows us to go back and forth in arguments between the two. Second, it implies that the contemporary users of Kaplan’s equations can be challenged in ways that Kaplan can’t, the irreducibly subjective character of Kaplan’s approach making it resistant to attacks that require an objective frame of reference. To clarify the discussion, let us call the approach, typical of contemporary work, in which the equations in Kaplan’s original article are used to justify a general policy ‘Kaplanism’. Thus Kaplanism is defined by its methodological foundations rather than any kind of motive or philosophy.
Not everyone working currently with Kaplan’s equations can be described as Kaplanist, precisely because their aim is not a general policy on the standard of proof but something quite different. An example is Lillquist, who overtly champions a variable standard of proof and, if he could think of a way to give understandable instructions to jurors on this point, would explicitly direct them to vary the standard to fit the circumstances presented to them (Lillquist, 2002: 185).
This takes us well beyond an appreciation of the flexibility given to the jury by the current formulations. Lillquist’s approach does not so much give us a basis for determining policy as abandon policy altogether. Kaplan’s foundations in subjective probability then logically become attractive precisely because of their lack of objectivity. The ramifications of this way forward are, however, truly radical and were well discussed by Ligertwood as far back as 1976. For example, if a rational jury considers the utility of the possible outcomes with regard to a poor defendant with family responsibilities, it may well take note of the likelihood of disastrous consequences from a guilty verdict and elect to acquit:
‘The result would be that as against some defendants it would appear that the substantive law was not being enforced. One wonders at this point whether Decision Theory is not mediation under another name!’ (Ligertwood, 1976: 374)
Troubles with Kaplan
Kaplan is the root of Kaplanism, and Stein, Hamer and Walen, for example, still cite the original article without qualification (Hamer, 2004: 83; Stein, 2005: 172; Walen, 2015: 406). Clearly Kaplanism supersedes Kaplan, but this citing implies something about Kaplanism’s view of Kaplan, that the 1968 article was not itself flawed but merely the start of something new. I do not wish to inadvertently seem to be endorsing that view, but neither do I want to be distracted from my purpose, which is Kaplanism. So I will here briefly outline some preliminary concerns I have about Kaplan’s work while delineating concepts that I will rely on later.
First we should note a limitation of Kaplan. Since his approach is predicated on a decision made by the jury, it is plainly unsuitable for many questions of policy, for example the question of whether to put some evidence in front of the jury in the first place. Such policies may be more consequential than the standard of proof itself. Thus if we are seeking a general quantitative approach to justify policies and make ‘because n, X’ arguments, his approach is insufficient. The standard of proof here cannot function as a scholarly crucible, resolving questions that can then be applied to other questions of policy; this is at best a dead end. It hardly seems conceivable that policy-making in the law in general should be uniquely resistant amongst all human affairs to quantitative analysis and this, in itself, should make us, first, wonder about the almost exclusive reliance on Kaplan’s approach in the philosophy of law; and, second, seek to locate the discussion of the standard of proof within a generalised discussion of policy-making, to consider ‘because n, X’ arguments in the round.
There are then four arguments against Kaplan to which I wish to draw attention, though to an extent they overlap. At the risk of labouring the point, the first is that if we want the standard of proof to be consistent—that is, if we want to have a standard of proof at all—then Kaplan’s foundations in his personalistic theory of probability are unsuitable because of the subjectivity they introduce.
Then there are two arguments against Kaplan’s approach that derive from the nature of the relationship between the sovereign and a juror (or jury). Kaplan’s work is framed in terms of the subjective expected utility of the outcomes of a decision, which is defined in terms of the assessment of a rational person, and to ground the argument he cites the example of an employee making decisions on behalf of a manufacturing company (Kaplan, 1968: 1065). Some kind of grounding is required because the most obvious utility to a juror of any outcome in a trial is zero: it makes no difference to them at all. This is a point that we could say is not only a matter of fact but of principle in that today we exclude jurors to whom the verdict would make a difference. However, the relationship between a manufacturer and an employee is different to that between the sovereign and the jury in ways that yield the second and third objections.
Second, before the employee is given any decision-making powers, these must be delegated to them by the manufacturer. The manufacturer does not simply delegate all decisions to all employees; it chooses which decisions to delegate to which employees. Therefore, it needs a positive reason to delegate a particular decision to a particular employee. Kaplan provides no motivation for delegating the decision on the standard of proof to the jury. It can be done, but why? This again can be seen when Tribe accuses Kaplan of positing the wrong decision-maker.
Third, the manufacturer has an ongoing relationship with the employee that has a supervisory element that may be formalised through policies. An employee who persistently makes bad or perverse decisions can expect to be reprimanded and sacked, and this commonly inspires the employee to consider the interests of the employer when making a decision. This ongoing relationship is missing in the case of the jury. Thus we may assume the rationality or intentionality of the jury and philosophise our way to some kind of approach that a jury ought to follow when considering the standard of proof. This could be a question of utility or morals for example. But what makes us think that the jury will follow this approach? If the jury retains ‘the divine spark’, as Shapiro puts it, from the early modern period then perhaps we need not concern ourselves with this question. But otherwise, delegation of the decision on the standard to the jury is an invitation to the capricious, the replacement of rules with a lottery (Shapiro, 1991: 241). 12
The fourth objection lies in the equation Kaplan put forward for deriving the standard of proof. There are four possible outcomes that the juror has to consider the utility of: the rightful conviction (true positive), false conviction (false positive), rightful acquittal (true negative) and false acquittal (false negative). Kaplan discards two of these, the two true outcomes, but gives no justification for the simplification. So this narrowing move, which is essential to the actual equation he relies on to derive the standard of proof, and which is relied on by Kaplanism, lacks motivation. 13
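The effect of the narrowing move can be seen by comparing a threshold computed from all four outcomes with one computed from the two errors alone. The following is a minimal sketch under a textbook expected-utility model, not a reconstruction of Kaplan's or Walen's own working; the function names and all the utility values are hypothetical.

```python
def conviction_threshold(u_tp: float, u_fp: float,
                         u_tn: float, u_fn: float) -> float:
    """Probability of guilt above which convicting has the higher expected utility.

    Convicting yields P*u_tp + (1 - P)*u_fp and acquitting yields
    P*u_fn + (1 - P)*u_tn, so the two are equal at a / (a + b) where
    a = u_tn - u_fp and b = u_tp - u_fn.
    """
    a = u_tn - u_fp  # what is lost by convicting an innocent person
    b = u_tp - u_fn  # what is lost by acquitting a guilty person
    if a <= 0 or b <= 0:
        raise ValueError("each error must be worse than the matching true outcome")
    return a / (a + b)


def narrowed_threshold(d_g: float, d_i: float) -> float:
    """Threshold using only the disutilities of the two errors."""
    return 1 / (1 + d_g / d_i)


# With both true outcomes valued at zero the two calculations agree.
full = conviction_threshold(u_tp=0.0, u_fp=-10.0, u_tn=0.0, u_fn=-1.0)
narrowed = narrowed_threshold(d_g=1.0, d_i=10.0)
print(round(full, 3), round(narrowed, 3))  # 0.909 0.909

# Once the true outcomes are given some value, the threshold moves.
shifted = conviction_threshold(u_tp=2.0, u_fp=-10.0, u_tn=1.0, u_fn=-1.0)
print(round(shifted, 3))  # 0.786
```

On this sketch the narrowing move amounts to assuming that both true outcomes have the same utility as doing nothing, which is precisely the step for which no justification is offered.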
It is not so much that Kaplan’s arguments are flawed; it is more that at critical junctures, like Blackstone, he makes no argument at all. As we shall see, these muddy waters have not been clarified by his successors but muddied further.
Death by irrelevance
Let us now move on to Kaplanism. This is exemplified in Walen’s recent account of what he calls the ‘consequentialist’ approach, though my list of ‘Kaplanists’ differs from his. It excludes one of the two consequentialists he lists (including Laudan but not Epps, who so far as I know never relies for his own arguments on Kaplan’s equations) and includes at least two that Walen explicitly excludes (Stein, whom Walen puts in the opposing ‘maximalist’ camp, and Walen himself), as both use Kaplan’s equation to justify policies. Hamer, although not on Walen’s lists at all, on at least one reading also falls under the heading of Kaplanist, as do Stewart and Dahlman. Walen’s account of the consequentialists nonetheless remains helpful as a reference for the mechanics of the methodology adopted by more ambitious Kaplanists. 14
An important point is that the narrowing move is retained, allowing the consequentialists (and Walen) to retain Kaplan’s main equation. This is the difference between Walen’s equations 5 and 6. In a development since Kaplan, Walen both offers principled reasons to support the narrowing and—a motivation that we will return to when considering Walen in more detail—alludes to the difficulty of making the problem tractable otherwise (Walen, 2015: 407).
One problem arises immediately if we adopt the sovereign perspective, as opposed to that of a jury or the courts, and with it a statistical conception of probability. For then the true negatives and true positives do not represent equally important concerns, on the grounds of achievement, measurability and incentives. To summarise these arguments, previously made by me elsewhere, we start by noting that the universe of concern to the sovereign must embrace all crimes. One then can say that if the criminal justice system convicts a murderer it has achieved something. On the other hand, if it does not convict someone who is not a murderer, it has achieved nothing. Given that someone has fallen under suspicion, then it certainly does achieve something. But that presumption is the starting point for a court and its smaller universe of cases brought to it, not a sovereign. And without that presumption the system’s failure to convict someone of an innocuous act is equivalent to driving around in a circle; nothing has been achieved and the best we can say of it is that we are pleased not to have had an accident along the way (Cullerne Bown, 2018: 380).
Also, the true negatives, actions where no crime is committed and no conviction is obtained, include innocuous handshakes and almost everything else we all do each day and are thus unmeasurable.
Now, one way to try to skirt these issues is to narrow concern down to a smaller universe. To demonstrate how this fails because of incentives, we can take a universe defined by Risinger. In response to Laudan’s assertion of a sovereign-like view, Risinger has said his primary concern is with a narrower universe of those arrested or otherwise ‘drawn into’ the criminal justice system. But this does not help, for Risinger’s universe lacks the fixed character of that considered by the courts; through the police, the system itself chooses who to draw into the system, a problem that infects any attempt to rely on an administratively-defined universe such as that of the crime numbers issued in England and Wales. Any attempt to use true negatives as a yardstick of success within Risinger’s universe would incentivise police and prosecutors to arrest and prosecute large numbers of the transparently innocent.
For these reasons, the sovereign must abandon the true negatives and look instead at the true positives. Clearly the narrowing move, which involves treating both the true outcomes as equally discardable, cannot survive such a fundamental difference and Kaplan’s equation must be abandoned.
If one wants to discuss the criminal justice system and its role in society, to draw for example connections between the actual level of crime and the way the system works, then one is obliged to adopt the sovereign perspective. Otherwise, we have excluded from consideration, for example, all rapes that are not reported. Thus it is immediately clear that Kaplanism’s reliance on the narrowing move entails with it a foundational assumption that, when properly appreciated, makes it irrelevant to the vast bulk of discussions in which it is deployed.
Death by omission
Let us put the above argument to one side now and continue with independent arguments. Look at the right-hand side of Stein’s version of the equation, which we can see depends purely on the ratio G/I. It might look at first glance as if there are two degrees of freedom here, but in fact it is just one. And this ratio is nothing but an assessment of the relative importance of a false acquittal and a false conviction. True, in Stein’s case the assessment that has been chosen is welfarist, but in the end the decision to adopt such an approach is itself a matter of choice. So G/I is nothing but a kind of one-dimensional value judgement about the relative importance of the two possible errors, which is to say, a form of n familiar to Blackstone himself.
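The single degree of freedom can be made explicit with one line of algebra. This is a sketch using only Stein's symbols I and G; the numerical values in the note that follows are hypothetical.

```latex
\[
\frac{I}{I + G} \;=\; \frac{1}{1 + G/I}
\]
```

Multiplying I and G by the same factor leaves the result unchanged: I = 10, G = 1 and I = 100, G = 10 both give 10/11, roughly 0.91. Only the ratio between the two quantities does any work.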
Further, despite variations in terminology, this is the essential character of the formula for the standard of proof presented by all the Kaplanists. It doesn’t matter whether the four possible outcomes are nominally considered in terms of their disvalue, disbenefit, disutility, social damage, harm or cost; by the time the narrowing move has been adopted, the ratio constructed from them amounts to a value judgement of the relative importance of a false acquittal and a false conviction. This is not to change the meaning of the authors’ words, but merely to use algebra to abstract from the different equations that which is common to them all, and Kaplan’s, and which can be expressed as:
\[ \frac{1}{1 + 1/n} \]
For clarity, I will call this the Kaplanist equation. Now, this expression generates a number and as a result in Kaplanism the standard of proof is conceived of as a number, the SoP-number as I will call it. For an example, Walen gives n the value of 10, Blackstone’s ratio, and the expression then yields for him a value of 0.91 (Walen, 2015: 408). But what does such a number signify?
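The arithmetic behind the SoP-number can be checked directly. The sketch below assumes that the expression takes the form 1/(1 + 1/n), which is the form that reproduces the 0.91 reported for n = 10; the function name sop_number is hypothetical.

```python
def sop_number(n: float) -> float:
    """SoP-number for a given n, assuming the form 1 / (1 + 1/n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 / (1 + 1 / n)


# n = 10 is Blackstone's ratio; the other values are for comparison only.
for n in (1, 10, 100):
    print(n, round(sop_number(n), 2))
# 1 0.5
# 10 0.91
# 100 0.99
```

Whatever value of n is chosen, the output is a bare number between 0 and 1; nothing in the calculation says what that number refers to in the world.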
Let us investigate by considering how Kaplanism generates a policy from the SoP-number, that is how it attempts to construct a meta-rule in the form ‘because n, X’. Despite the presence of n in the equation, this requires some effort because it isn’t clear that the number yielded by the equation is equivalent to a policy. If the guidance to a jury signified by the label ‘beyond a reasonable doubt’ is a policy on the standard of proof, then a bare ‘0.91’ by itself falls short. More work is required to turn the number into a policy.
One way to do this is to use the SoP-number as a guidance figure. So instructions are given to the jury along the lines of ‘convict if you are more than 91 per cent sure that the defendant is guilty’. As a threshold, this gives the number a similar role to that it plays in Kaplan’s decision theory, the difference being that the threshold is determined not by the jury but for them.
In this approach we can see that the guidance figures are significant in their relation to each other; giving the jury a value of 90 per cent is likely to lead to fewer convictions than a value of 50 per cent. But it is not immediately clear how the guidance figures relate to either: (i) existing standards such as ‘sure’ or ‘beyond reasonable doubt’; or (ii) outcomes, in terms of actual numbers of people falsely acquitted, falsely convicted, truly acquitted and truly convicted as a consequence of such a policy.
The first problem can be regarded as one of empirical calibration. Through a series of experiments we could estimate the guidance figures that lead to similar cases resulting in convictions as existing non-numerical standards such as ‘beyond a reasonable doubt’, ‘clear and convincing evidence’ and ‘the balance of probabilities’. Indeed, a flurry of such work followed Kaplan’s paper. 15
But we need to be careful about what it is that we are calibrating. This is the use of numbers in guidance to juries. It is a kind of psychological question and at the end of the calibration the numbers involved would still tell us nothing about the objective outcomes of the system.
Turning to the question of any significance for the SoP-number in terms of the four outcomes, if, like Kaplan, your horizons are narrowed down to those of a single jury and your approach is subjective, then such a question is beyond contemplation. But if your perspective is that of someone concerned with the system as a whole, such as the sovereign, the universe of outcomes from a series of trials is of concern.
There are ways to go about establishing a correspondence between the subjective and the objective. 16 But the very existence of such techniques demonstrates that there is no prima facie reason for expecting any correspondence between the subjective analysis that yielded the SoP-number and the objective concerns of the sovereign. And without a resolution to this problem we cannot reason in either direction between theory and the real world; no arguments can be made from empirical evidence, nor can any implications of any policy on the standard of proof be identified.
As far as I can tell, in the legal context possible techniques for establishing a correspondence remain unexplored (and the logic of this article is that they would fail to help anyway, instead merely allowing arguments presented below to flow back and kill Kaplan). What is certain is that none of the works cited in this article, which cover the breadth of contemporary scholarship, try to establish such a correspondence or cite any other to this end. Nonetheless, Kaplanism relies on such a correspondence so that the SoP-number is treated as determining both guidance to the jury of some kind and a ratio of (the risk of) errors in the outcomes in which this guidance will result. 17 Thus part of Kaplanism’s established vernacular concerns “the distribution of errors”, where the idea is to adjust the standard of proof to yield a particular ratio.
This yields a second kill of Kaplanism. It relies on a methodological assumption for which it has offered no positive justification. This is dangerous terrain as one may end up trying to make a soufflé with pebbles on the basis that we lack eggs. This is a fundamental failure of reasoning, and no one can trust conclusions built on an edifice with such ghostly foundations.
Death by DeKay
Disregarding the above and for the sake of argument, let us cede this ground to Kaplanism and continue to follow its course. The jump from the subjective to the objective is assumed to be valid, discussion is framed in terms of achieving a given distribution of errors and discussion turns to the question of what the SoP-number ought to be. 18 One way to appreciate the consequence of this move is to see that n, which started off as a value judgement, acquires a second role, that of an objective.
But DeKay already concluded in 1996 that, ‘To the extent that jurors’, judges’ and legal scholars’ notions of correct standards of proof are based on desires to bring about particular error ratios, such notions are founded on presumptions that are fundamentally invalid’ (DeKay, 1996: 132). Almost 20 years on, DeKay’s argument remains un-rebutted and this in turn has prompted two kinds of reaction: acknowledge-and-tolerate and a retreat into the non-quantitative.
Examples of acknowledge-and-tolerate include Walen and Laudan. 19 Both acknowledge DeKay’s conclusion, but both have then gone on to create a body of work that is meaningful only if such a flaw can be sensibly tolerated.
In Laudan’s case, the movement spans a decade and can be seen in the rigour with which he attempts and then abandons a ‘because n, X’ argument for the standard of proof in 2006, partly thanks to DeKay, 20 but then goes on to rely on the distribution of errors in later work. Thus in 2016, he says: A legal system that produces four-to-five times as much harm from its false negatives as from its false positives (when the actual ratio of those harms is 2-to-1) is a system that is both making more overall errors and generating more overall harm than it would be if it were producing errors in proportion to the harm ratio between those errors. (Laudan, 2016: 80)
In Walen’s case, the movement is compressed into his recent article. He acknowledges DeKay at the outset and says that any calculations can only be ‘suggestive’ but (i) the grounds for even this are not set out; and (ii) the extent to which we can rely on the suggestiveness is not stated (Walen, 2015: 358). Since DeKay has not convinced him to abandon the distribution of errors, he naturally relies on it in his own cause, for example here: It is hard to find statistics regarding how often the actus reus is not in doubt in a trial, but if we assume that it happens in a quarter of cases, and if we continue to assume that the disvalue of convicting the innocent and letting the guilty go free are on a par, then we should adjust the harm of convicting the innocent downward slightly from 2 to 1.75. The SOP then becomes 1/[1 + 1/1.75] = 1/[1.57] = 64%, even lower than before. (Walen, 2015: 415)
The retreat from quantification is significant because if quantification is not the end goal, what is the point of the exercise at all? Examples of it can be found in Hamer’s work and the ‘Blackstone principle’ reported by Epps. Here is how Hamer reported ‘the conventional risk-allocation rationale’ for the criminal standard of proof in 2007: The application of the criminal standard determines the ultimate issue—whether the defendant is convicted or acquitted. There are two possible erroneous outcomes—wrongful conviction (false positive) and mistaken acquittal (false negative). The former is viewed far more seriously than the latter. Setting the criminal standard at a high level favours the defendant over the prosecution. It reduces the risk of the worse error, although at the same time it increases the risk of the less harmful error and the overall expected error rate. (Hamer, 2007: 326)
To avoid putting words into Hamer’s mouth we therefore have to take account of two lines of reasoning. One justifies the standard of proof through the argument that can be constructed from Hamer’s 2004 article, that is via the distribution of errors. In this case, ‘Hamer 1’ falls into a similar category as Laudan and Walen above; he acknowledges the subjective-objective problem (Hamer, 2004: 87) but then proceeds to treat it as tolerable (while, a slight variation, ignoring DeKay’s criticism of the distribution of errors per se). So let us put that behind us and concentrate on ‘Hamer 2’ which, by contrast, is cut off from such a justification. Let us look then at Hamer’s conventional risk-allocation ‘rationale’ on this naked basis.
Most of the text given by Hamer is descriptive; the statement accurately describes some consequences of adjusting the standard of proof. However, when we look for content that is deductive we draw a blank; there is nothing that allows us to derive today’s standard of proof. Acknowledging that there is a dilemma does not resolve it. We understand that moving the standard up or down will have consequences, but what justifies setting it here? What in this argument justifies choosing any particular policy option rather than another that is somewhat stronger or weaker, or indeed much stronger or weaker? 23 Scrutinising this statement through our epistemological lens, we can see that it is in the form ‘because n, X’, with n vaguely defined though clearly greater than 1. But there is no route from this vague n to the conclusion about the standard of proof. The quantitative framework that Kaplanism promises is notable only for its absence.
Or consider the ‘Blackstone principle’ reported by Epps in 2015 as being representative of a wide body of jurisprudential opinion in the United States and the justification for a wide range of policies. Kaplan is not mentioned by Epps, so the quantitative backdrop of Kaplanism, set out so scrupulously by Hamer, is implicit rather than explicit here, revealed by the absorption from Kaplanism of the language of the distribution of errors into the ‘principle’: Blackstone’s ten-to-one ratio and its variations can’t be taken literally. There’s no way to measure the exact ratio between the false convictions and false acquittals our system creates, and no one seriously advocates that it is critical to strive for exactly ten false acquittals for every false conviction. Instead, the ratio serves as shorthand for a less precise—but still important—moral principle about the distribution of errors: we are obliged to design the rules of the criminal justice system to reduce the risk of false convictions—even at the expense of creating more false acquittals and thus more errors overall. (Epps, 2015: 1072)
The retreat from the quantitative that we see in these two examples is also a retreat from cogency. There is no meta-rule here and we end up no clearer about why we are pursuing one policy as opposed to another than we were in Blackstone’s time. They contain in deductive substance nothing more than the observation, obvious to a child, that raising the standard of proof will lead to fewer wrongful convictions and more wrongful acquittals. This is the reason for my inverted commas around ‘principle’ and ‘rationale’, for although these statements purport to offer a justification for policies, they don’t. Kaplan’s theory and the later extensions have become merely a rhetorical or linguistic backdrop; like Alice’s Cheshire Cat, the body has gone and only the grin remains. Thus these formulations abandon the ground that Kaplan originally claimed, that of a quantitative and scientific basis for the standard of proof, and mark an implicit admission that the project has failed. 24 This kills the second line of defence against DeKay and hence yields the third substantive kill.
Death by logic
The continued reliance on the narrowing move and the Kaplanist equation shows that DeKay’s attack and the problems observed by others, however well founded, have left Kaplanism only half dead. Here I will provide new arguments to (i) arrive at a similar conclusion to DeKay; and (ii) go further and show that the problem identified by DeKay cannot be fixed. For those without training in the kind of statistics DeKay relies on, these arguments are, I believe, easier to follow and should allow readers to understand more clearly why the equation cannot be used to arrive at conclusions that are even suggestive. They will also provide some of the weaponry with which I claim to refute the quite independent law-and-economics approach of Kaplow.
The three roles of n
What I am setting out to do now is to kill again the use of Kaplan’s equation in the determining of policies, including the standard of proof. To clarify the picture, I will distinguish between three different and progressively more ambitious roles that we can give to n and which are usually treated by Kaplanism as indistinguishable, interchangeable and indivisible. These are:
n-the-value-judgement, a number expressing a view about the relative importance of the two errors;
n-the-objective, a number that specifies an aspect of the criminal justice system’s performance that we want it to achieve; and
n-the-objective-as-meta-rule, in which the extent to which a policy helps us to achieve the objective n determines which policy we adopt.
The potential importance of these distinctions can be seen in Laudan’s original attempt to develop a meta-epistemology for the criminal justice system, at the point when he turns away from using n to set the standard of proof. The reason he gives for abandoning the attack is that ‘…it is arguable that there is no machinery for generating a SoP that will capture that ratio in question for every conceivable distribution of guilty and innocent defendants’ (Laudan, 2006: 74). The ratio in question is n and this makes clear that it is n-the-objective-as-meta-rule that he is rejecting. This leaves untouched n-the-value-judgement, leaving open the interesting question of whether it might be possible to derive the standard of proof, or other policies, from a meta-rule that requires n to provide only a value judgement.
My approach will be statistical, but it could as easily be framed in terms of classical probability. For if our aim is to allocate the risk of the two errors, then if we are successful over a run of cases the ratio of the tallies of errors will tend towards the ratio of the risks. Thus adopting the distribution of errors strategy, even when expressed in terms of risks of error, is indistinguishable from giving n the role of an objective.
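The claim that the ratio of tallies tends towards the ratio of risks can be illustrated with a small simulation. This is a sketch under assumptions of my own: each case is treated as independent, and the per-case risks (0.10 for a false acquittal, 0.01 for a false conviction) are hypothetical values chosen only so that the ratio of risks is 10.

```python
import random


def tally_ratio(p_fn: float, p_fp: float, cases: int, seed: int = 0) -> float:
    """Simulate `cases` trials; return false acquittals divided by false convictions.

    Each case independently ends in a false acquittal with probability p_fn,
    in a false conviction with probability p_fp, and otherwise in a correct verdict.
    """
    rng = random.Random(seed)
    fn = fp = 0
    for _ in range(cases):
        u = rng.random()
        if u < p_fn:
            fn += 1
        elif u < p_fn + p_fp:
            fp += 1
    if fp == 0:
        raise ZeroDivisionError("no false convictions occurred; ratio undefined")
    return fn / fp


if __name__ == "__main__":
    # The ratio of risks is 0.10 / 0.01 = 10; longer runs should settle near it.
    for cases in (10_000, 50_000, 200_000):
        print(cases, round(tally_ratio(0.10, 0.01, cases), 2))
```

Over a long enough run, allocating risks in the ratio n and achieving a tally of errors in the ratio n come to the same thing, which is why the risk formulation does not escape the arguments that follow.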
I will now set out to kill both n-the-objective and its use as a meta-rule.
n-the-objective leads to perversity
Let us subject n-the-objective to the scrutiny of statistics.
A performance indicator is a number that we can calculate from measurements of a system and to which we give an interpretation so that we know that movements in the number indicate that things are getting better or worse in some respect. It falls within the scope of what Hand refers to as ‘pragmatic measurement’. 25 That is, we are not measuring a quantity, such as weight, given to us by Mother Nature. Rather, we are seeking to invent a meaningful way to assess an aspect of the performance of a system that concerns us.
Universities, like other big organisations, are these days replete with performance indicators such as ‘the number of articles published in highly-ranked journals in the past three years’. The evident limitations of such numbers do not erase their essential character and purpose. Generally, higher is better, but it could be the other way around. Of course, if a higher value for a performance indicator indicates better performance, it is natural to make changes that we believe will lead to higher scores, that is, to make higher scores an objective. To establish that n is not a performance indicator I will demonstrate that it is unconventional and fails under any interpretation to reliably indicate an improvement in performance.
Binary classification is a generic name for processes that allocate things to one of two categories, not always perfectly. Thus all binary classification can be modelled with the 4-gram. Diagnostic tests in medicine are an example.
A conventional performance indicator in binary classification constructs a ratio in which an indication of error is compared with an indication of success. Such indicators include those by the name of precision, specificity, number needed to treat, accuracy and others. Here is a simple example used in medical diagnostics and information retrieval, the discipline of internet search engines, in which false negatives are assessed according to the yardstick of true positives:
sensitivity = TP/(TP + FN)
Now n can be written as FN/FP. This evidently contains no indicator of success and is thus unconventional. We will now see that n suffers from defects that the conventional indicators do not.
The point of any kind of indicator is to tell us something useful. It’s only a number; it can only go up or down. Such movements need to have a natural interpretation so that we reliably know that we are doing better or worse. It is clearly fatal to the usefulness of an indicator if it tells us we are doing better when actually we are doing worse. But with n, neither higher nor lower scores are reliably better.
With n, higher is not obviously better. An increase in the score may result from an increase in the numerator, the false negatives, which is to say from an increase in the number of guilty people acquitted.
At the same time, with n, lower is not obviously better. A decrease in the score may result from an increase in the denominator, the false positives, which is to say from an increase in the number of innocent people convicted.
Consequently, it is not clear whether movements in either direction are good or bad.
Even so, can n instead be retained as the objective for the system, so that if I choose a value for n, then the closer we get to that, the better? No, and for the very reasons given above. If we compare two results, both less than the n that is our objective, then the one that is closer to n may derive purely from an increase in the number of guilty people acquitted. Equally, if we have two results above our objective n, the one that is closer to n may derive purely from convicting more innocent people.
Let us illustrate this argument with two policy-making examples. Suppose we have chosen to use n as an objective and that our target is n = 10. Now suppose we discover that we are scoring too high and that our actual score is n = 20. To bring us closer to n = 10, we can choose to increase the false positives, that is the number of innocents we convict. For example, we could deny the defence the right to cross-examine prosecution witnesses.
Equally, suppose we are scoring too low and achieving only n = 5. Then we can improve our performance as measured by n by increasing the number of the truly guilty that we acquit. For example, we could deny the prosecution the right to cross-examine defence witnesses.
As policy choices, these examples may seem fantastical; they are certainly perverse; but within our purely epistemic frame of reference they are merely one possible and logical response to treating n as a sensible objective. By contrast, let us look again at sensitivity, which is TP/(TP + FN). This indicator provides no encouragement for perverse policies. If you want to improve your score you must either increase the number of true positives (rightful convictions) or reduce the number of false negatives (false acquittals). Some may not care as much as Laudan about false acquittals, but even they can hardly say it is perverse to reduce their number.
Consideration of sensitivity would rule out the idea of denying the prosecution the right to cross-examine defence witnesses as this would likely increase the number of false acquittals (FN) and decrease the number of rightful convictions (TP). Its sister in diagnostics, specificity, would similarly rule out denying the defence the right to cross-examine prosecution witnesses.
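The two policy examples can be put into numbers with a short sketch. All tallies below are hypothetical, invented for a court hearing 500 guilty and 500 innocent defendants. The class name `Outcomes` is my own, and specificity is taken in its usual diagnostic sense of TN/(TN + FP).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    tp: int  # guilty and convicted
    fn: int  # guilty but acquitted
    tn: int  # innocent and acquitted
    fp: int  # innocent but convicted

    @property
    def n(self) -> float:
        return self.fn / self.fp

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Scoring too high (n = 20): convict ten more innocents to reach the target of 10.
too_high = Outcomes(tp=300, fn=200, tn=490, fp=10)
too_high_fixed = Outcomes(tp=300, fn=200, tn=480, fp=20)

# Scoring too low (n = 5): acquit fifty more of the guilty to reach the target of 10.
too_low = Outcomes(tp=450, fn=50, tn=490, fp=10)
too_low_fixed = Outcomes(tp=400, fn=100, tn=490, fp=10)

if __name__ == "__main__":
    for label, o in (("too high", too_high), ("after fix", too_high_fixed),
                     ("too low", too_low), ("after fix", too_low_fixed)):
        print(f"{label}: n = {o.n:.0f}, sensitivity = {o.sensitivity:.2f}, "
              f"specificity = {o.specificity:.2f}")
```

In both cases n lands on its target while one of the conventional indicators falls. The conventional indicators register the perversity that n rewards.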
Recall that to be used as a performance indicator it is not enough for n to be arithmetically computable; we must also give movements in it an interpretation; and movements in the indicator must then reliably indicate improvements or worsenings in actual performance. We have now considered the three possibilities and see that in all three cases n emerges as an unworthy objective because it may so easily mislead us into perverse policies.
The upshot of the above argument is that n cannot in general sustain the role of an objective. Consider a court to which we bring 1,000 cases, in which in point of fact half are truly guilty, half innocent. Suppose we have chosen n = 10 as an objective. Then we can achieve n with any of the outcomes shown in Table 1.
Since a single value of n can be consistent with systems that are vastly different otherwise, how can making it an objective constitute a meta-rule?
As can be seen, the number of true positives, of truly guilty defendants who we manage to convict, varies from 490 to 0. No kind of policy could possibly be justified purely on the basis of n because the variation in the performance of other aspects of the system that it allows is so extraordinarily wide. In particular, the sovereign cannot accept a system designed so that of 500 truly guilty brought to court, all are expected to be acquitted.
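The range of outcomes consistent with a single n can be enumerated directly. The sketch below is an independent enumeration and not a reproduction of Table 1. It assumes n = FN/FP, whole-number tallies and the 500/500 split described above; the function name is illustrative.

```python
def outcomes_meeting_target(guilty: int, innocent: int, n: int):
    """Yield (TP, FN, TN, FP) tallies for which FN / FP equals n exactly."""
    for fp in range(1, innocent + 1):
        fn = n * fp
        if fn > guilty:
            break
        yield guilty - fn, fn, innocent - fp, fp


if __name__ == "__main__":
    rows = list(outcomes_meeting_target(guilty=500, innocent=500, n=10))
    print(f"{len(rows)} distinct outcomes satisfy n = 10")
    print("TP   FN   TN   FP")
    # Show the two extremes and one outcome from the middle of the range.
    for tp, fn, tn, fp in (rows[0], rows[len(rows) // 2], rows[-1]):
        print(f"{tp:<4} {fn:<4} {tn:<4} {fp:<4}")
```

Every row meets the objective equally well, yet the rows run from a court that convicts almost all of the guilty to one that convicts none of them.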
n-the-objective is useless as a meta-rule
If we focus specifically on X = the standard of proof, then an objection to this may be based on the chart from signal detection theory used by DeKay that has often been used in this debate (DeKay, 1996: 101) (Figure 2).

For a given pair of curves, a ratio of fn to fp, which is to say n, uniquely determines the position of the dotted line, which is the threshold of confidence, or standard of proof. 26
If we assume that the rest of the setting is consistent, this shows that in the real world a choice of n determines the standard of proof, and vice-versa. 26
However, observation of this connection does not provide a complete meta-rule. We have no idea what shape the actual curves are in reality for our system and any number of true positives from 0 to 490 may result from our choice of n. In fact, even if we did know what the actual curves are, if we make the decision on the standard of proof purely on the basis of n, we are choosing to ignore what they tell us about the true positives. To be willing to manage the system in this fashion is to be entirely indifferent to the number of true positives, and this is an unsustainable position for the sovereign. For example, if the result was actually 0 true positives, the sovereign would have to lower the standard of proof in order to increase the number of true positives even though this would reduce the value of n. Thus as a meta-rule, n-the-objective is inadequate.
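A signal-detection sketch makes the same point. The assumptions are mine and are not taken from DeKay's chart: two normal curves of equal spread, with innocent defendants scoring around 0 and guilty defendants around a hypothetical `guilty_mean`, and 500 of each. For any one pair of curves, n fixes the threshold; across pairs of curves, the same n yields very different numbers of true positives.

```python
import math


def tail_above(t: float, mean: float, sd: float = 1.0) -> float:
    """P(X > t) for a normal curve; erfc keeps the far tail from rounding to zero."""
    return 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2.0)))


def expected_tallies(t, guilty_mean, guilty=500, innocent=500):
    """Expected (TP, FN, TN, FP) when everyone scoring above threshold t is convicted."""
    tp = guilty * tail_above(t, guilty_mean)
    fp = innocent * tail_above(t, 0.0)
    return tp, guilty - tp, innocent - fp, fp


def threshold_for_n(n, guilty_mean, lo=-10.0, hi=10.0, steps=100):
    """Bisect for the threshold at which expected FN / FP equals n.

    Raising the threshold adds false acquittals and removes false convictions,
    so the ratio rises monotonically and bisection converges.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        _, fn, _, fp = expected_tallies(mid, guilty_mean)
        if fn / fp < n:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for guilty_mean in (0.5, 1.5, 3.0):  # hypothetical separations between the curves
        t = threshold_for_n(10, guilty_mean)
        tp, fn, tn, fp = expected_tallies(t, guilty_mean)
        print(f"separation {guilty_mean}: threshold {t:.2f}, "
              f"TP {tp:.0f}, FN/FP {fn / fp:.2f}")
```

The threshold is recovered from n only once the curves are given. Since the curves are exactly what we do not know, fixing n leaves the number of true positives undetermined.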
A kind of counter-argument to this is to say, well, that is a rather unrealistic scenario. No one thinks our current systems are convicting none of the criminals brought before them. In the real world, we are not doing too badly at convicting the guilty and can use n in this way. But the full fabric of this argument is that it asserts a threshold of acceptability for the rate of true positives (along with a second assertion, that this is being met, and then a third assertion, that any policy option under consideration will continue to meet it). The truth of any of these assertions may be challenged, but the key point is a different one: to rely on these assertions is again to accept that n-the-objective by itself is inadequate as a meta-rule.
Now, this argument is not limited to high values of n. If n is less than one, say 1/10, the opposite of Blackstone’s ratio, a mirror image argument applies. No sovereign can accept a system in which of 500 innocent people taken to court all 500 are convicted. If we go to the middle, choosing n = 1 for example, then things get no better. In that case the sovereign may be asked to accept the doubly unpalatable proposition that out of both of the 500s, all are handed down the wrong verdict.
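The three extreme cases just described can be checked in a few lines. The tallies are those implied by the text for 500 guilty and 500 innocent defendants, with n taken as FN/FP; the dictionary layout and the helper name are illustrative only.

```python
# Tallies for 500 guilty and 500 innocent defendants, keyed by the target n they satisfy.
extreme_outcomes = {
    10.0: {"tp": 0, "fn": 500, "tn": 450, "fp": 50},   # every guilty defendant acquitted
    0.1: {"tp": 450, "fn": 50, "tn": 0, "fp": 500},    # every innocent defendant convicted
    1.0: {"tp": 0, "fn": 500, "tn": 0, "fp": 500},     # every verdict wrong
}


def achieved_n(outcome: dict) -> float:
    """Ratio of false acquittals to false convictions."""
    return outcome["fn"] / outcome["fp"]


if __name__ == "__main__":
    for target, o in extreme_outcomes.items():
        print(f"target n = {target}: achieved n = {achieved_n(o)}, "
              f"TP = {o['tp']}, TN = {o['tn']}")
```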
You may object, like Risinger, that I am setting up a straw man (Risinger, 2010: 999). Of course, you say, no one suggests that the system should be managed only on the basis of n. We also must try to minimise the number of errors altogether; that is an essential part of the context of the discussion about n. This point of view is so common that in 2006 Laudan described the twins of error reduction and error distribution as two of the three values driving the criminal justice system in the United States (Laudan, 2006: 1).
This is to deny n-the-objective a role as a sole meta-rule. Instead, it is now given the role of a mere component in some more complex meta-rule, a larger scheme that has to provide the basis for deciding between policies. Let us look into this. What actually is this larger scheme? It has not been articulated by Kaplanism and we might quite reasonably suggest that 50 years is surely long enough to find the answer if there is one. But let us do better than that and demonstrate that such a larger scheme cannot exist.
To go down this road we must have a second indicator to measure our success in error reduction. One way to do this would be with what in information retrieval is sometimes called accuracy:
accuracy = (TP + TN)/(TP + FN + TN + FP)
Lippke in effect elects to use precision, which focuses on one error, the false positives, assessed against the yardstick of true positives. 27 Laudan instead plumps for the true negatives, putting forward the idea of generating a meta-rule to establish the standard of proof by combining n with what he calls m:
A fourth way, suggested by the counter-argument above concerning true positives, would be to use the rate of true positives, TP/(TP + FN + TN + FP).
However it is done, let us call this second indicator Factor 2. In terms of the policy choices we make, there is no reason why we should expect consideration of n and Factor 2 to point to the same answer. Indeed, the point of having Factor 2 is precisely that it should be capable of leading us to a different conclusion from consideration of n alone. Thus we will need some way of deciding what to do when they point to different conclusions. We can only resolve this question by making a second value judgement, this time about the relative importance to us of n and Factor 2.
This second value judgement, which for want of a better term we may call Judgement Q, is one for which we are unprepared. You can try and wriggle out of making a Judgement Q, for example by saying you will use Factor 2 (or indeed n) as a threshold. But such moves do not save you from making a value judgement. Where will you set a threshold? If it is high, the second consideration will never have any effect; set it low and the first consideration becomes irrelevant. Either choice is laden with both values and consequence. Set it in the middle and sometimes one consideration dominates, sometimes the other; which is again to determine certain outcomes. In short, the point at which the threshold is set is as decisive as any other form of Judgement Q.
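To see how Judgement Q decides the matter, consider a sketch in which Factor 2 is accuracy and the two considerations are combined by a single weight q. Everything here is hypothetical: the two policies, their tallies and above all the linear scoring rule, which is only one of many forms that a Judgement Q could take and is not proposed by any of the authors discussed.

```python
def accuracy(tp, fn, tn, fp):
    """Share of all verdicts that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical expected tallies for 500 guilty and 500 innocent defendants.
policy_a = {"tp": 300, "fn": 200, "tn": 480, "fp": 20}  # n = 10, accuracy 0.78
policy_b = {"tp": 400, "fn": 100, "tn": 475, "fp": 25}  # n = 4, accuracy 0.875


def score(policy, q, target_n=10):
    """Weight accuracy by q and closeness to the target n by (1 - q).

    The weight q stands in for Judgement Q; the linear form is an assumption.
    """
    gap = abs(policy["fn"] / policy["fp"] - target_n) / target_n
    return q * accuracy(**policy) - (1 - q) * gap


def preferred(q):
    return "A" if score(policy_a, q) > score(policy_b, q) else "B"


if __name__ == "__main__":
    for q in (0.5, 0.8, 0.95):
        print(f"q = {q}: prefer policy {preferred(q)}")
```

Policy A hits the target n exactly and Policy B makes fewer errors overall. Which of them is adopted turns entirely on q, a number that n does nothing to supply.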
So, before we turned to n, we knew we had a choice to make about how demanding the standard of proof should be. n was supposed to provide a meta-rule that would save us from ‘having to pull the SoP blindly out of a hat’ as Laudan puts it. But in fact, all it has done is to defer the rabbit-from-hat spectacle to the later, Judgement Q, stage.
Thus inadequacy is not the end of it. Ultimately, in order to fix the inadequacy, we have to make n-the-objective-as-meta-rule redundant by magicking up another value judgement to complete the meta-rule. It is in other words useless.
To put it another way, the choice of a standard of proof in a quantitative framework entails nothing more than choosing a number, which we understand represents a value judgement. By trying to use n-the-objective, we end up having to choose two numbers representing two value judgements.
Now, unhelpful as this evidently is when regarding the standard of proof in isolation, might this actually be OK if we could make decisions about all our policies using these two value judgements? Maybe we should adopt Judgement Q and build a quantitative framework around that.
Consider then a general framework in which we attempt to balance the competing demands of Factor 2 (error reduction) and n-the-objective (error distribution) by agreeing to trade a quantum of one for a quantum of the other according to some set of rules. To make this work, we will need to adopt one of the three possible (and flawed) interpretations of n-the-objective; for argument’s sake, let it be the same as was implicit in the standard of proof discussion above, where the closer we get to n the better. The trading then would entail allowing the overall rate of errors as measured by Factor 2 to increase in order to allow us to get closer to n. But as set out earlier, our sacrifice of accuracy may be buying no more than perversity. To avoid this, we will need to introduce at least one further indicator, call it Factor 3, and one further value judgement, call it Judgement R, the forms of which at this stage are not clear. And who knows whether this would be enough?
The point is, first, that we have not been able to generate a cogent meta-rule, even if we consign n-the-objective to the lesser role of a mere component; then, second, that n-the-objective is not serving the purpose we want of it. It is supposed to be a guiding principle that shapes our policy decisions and saves us from being obliged to make other ad hoc value judgements. But, like pre-Kepler astronomers convinced the planets move in circular orbits, we find ourselves being forced to make clunky fixes that, upon closer examination, leave us needing yet more fixes. The problem is the same in both cases: the underlying mathematisation of the system is flawed. In our case, put simply, the parsing of concerns into the categories of error reduction and error distribution, not found outside jurisprudence, does not work; it is a false dichotomy.
Conclusion
To summarise the kills I am claiming:
1 Death by irrelevance: In order to achieve relevance, Kaplanism is obliged to adopt the sovereign perspective. Once it does so, the Kaplanist equation, based on the narrowing move, ceases to be justifiable.
2 Death by omission: Kaplanism has not provided a positive justification for relying on the Kaplanist equation within an objective conception of probability, so that it has failed to justify the assumption that the SoP-number corresponds to a distribution of errors.
3 Death by DeKay: Kaplanism has failed to rebut DeKay and neither acknowledge-and-tolerate nor the retreat into the non-quantitative are effective in overcoming this: acknowledge-and-tolerate because it fails to set out the positive grounds for believing DeKay can be tolerated; the retreat into the non-quantitative because it entails abandoning the ground of a quantitative rationale for policies.
4 Death by logic: Kaplanism treats n as an objective and then uses that as a meta-rule, roles it cannot support. This demonstrates the failure of the distribution of errors approach, even when combined with attempts to also improve accuracy.
These four kills can be summarised as: Kaplanism was a mistake to start with; no good reason was initially given for adopting it; DeKay said it was a mistake and never received an adequate answer; here’s an explanation of why it can never work. To this we may add a point that is not methodological but is telling. Despite half a century of scholarship, Kaplanism has failed to yield a ‘because n, X’ argument for the standard of proof that the wider reaches of jurisprudence and the law itself are prepared to actually rely on, even in the United States; elsewhere it is largely ignored.
I hope that having been killed four and a half times over Kaplanism will stay dead. But we need to look back on it as a phenomenon in order to take stock of its legacy. I think Kaplanism is best seen as part of a more general attempt around the 1970s, documented by Volokh as ‘n law’, to finally escape the need for the jump in reasoning that can be traced to Blackstone and plant both feet in modernity. Given the central importance of the jury in the Anglo-Saxon imagination of the law, the lure of Kaplan’s juror’s-eye view has proved understandably enormous. The fact remains, however, that this cannot be married to the long-standing and essential reliance on policies. Flaws that were at least implicit in Kaplan’s original paper have been magnified rather than eliminated by later scholarship.
Stein is the leitmotif, for in his case, by virtue of his very seriousness, the failures of Kaplanism spill out, affecting not only conclusions but also assumptions.
As we have seen, our many epistemic policy dilemmas cannot be resolved by parsing concerns into the pair of error reduction and error distribution; this leads only to the incoherence of Judgement Q. But Stein is locked into this false dichotomy and finds himself driven to choose error distribution over error reduction, and thus explicitly rejects the conventional position that a trial is first a search for the truth: ‘Under this theory, the key function of evidence law is to apportion the risk of error in conditions of uncertainty, rather than facilitate the discovery of the truth’ (Stein, 2005: x).
So for scholarship, the lesson is that Kaplanism is a foundational error and it is to be expected that its corrupting influence has been widespread both in thought and language. There are scholars in all the Anglo-Saxon jurisdictions working in this tradition. As we have seen, it spills across Walen’s division of authors into competing camps and out into those, such as Hamer, in neither camp. 29 Linguistically, it has seeped out into the Blackstone ‘principle’, which, if Epps is right about its fundamental position in US legal thinking, makes it highly consequential.
For the law itself, the lesson is the need to confront its incapacity. We can take as a case study the debate stirred up recently by Laudan, in which he claims that the system currently leaves too many criminals unconvicted. Put simply, Laudan lacks a methodology capable of turning his view on the relative harms of false convictions and false acquittals into a decision on policy. His arguments from evidence therefore dissolve, not partially due to the poor evidence available and the difficulty of establishing a value for n that reflects our values in the first place, but entirely. But equally, his opponents have no basis for claiming that their preferred policies lead to better outcomes. From an epistemic point of view, the entire debate is ungrounded.
From my naturalistic perspective, I look at all this and see half a century lost in confusion.
*****
Since Blackstone, the law has lived a double life. In the epistemology of the courtroom it sincerely pursues an objective and demotic truth with all the tools at its disposal. Despite being corralled in a subjective mode of probability, it confronts its predicament in modern fashion. In the meta-epistemology of its policy-making its methodological problems have made it incapable of doing the same. Despite having the possibility of working within an objective form of probability, it has been unable to escape pre-modern methods that lack quantification, empiricism and an effective mechanism for democratic oversight. Thus two quite different ways of thinking have been obliged to co-exist, and have managed it in my view in no small part thanks to the bridging mechanism of Blackstone’s dictum. Whether this way of resolving the problem of proof, inherited from the early modern crisis described by Shapiro, can continue is not so clear.
At the intellectual level, the gap in reasoning in the original argument constructed by Blackstone seems to have caught up with it so that the dictum has now run out of steam. Adherence to the Blackstone ‘principle’ can be seen as an attempt to keep the show on the road, but it seems to be an exclusively American interest. Even there it is striking that Epps, despite asserting that it is the foundation for a very wide range of fundamental principles in the United States, had to articulate it himself in order to attack it. Where are its legions of defenders?
As the Goliath stumbles, two incommensurate and competing attempts have emerged to replace it as a source of authority: Kaplanism and the non-quantitative approaches. Both desire to finally escape the need for the jump in reasoning in Blackstone and both break with the dictum by abandoning any attempt to build a bridge between the quantitative and non-quantitative. Kaplanism does so by attempting to move forward and plant both feet in the quantitative; the non-quantitative approaches do so in exactly the opposite way, by retreating from the element of the quantitative that Blackstone introduced.
The problem with this second option is that quantification is a characteristic element of modernity. To abandon it is to abandon modernity. Porter has described quantification as ‘a technology of trust’, a more potent form of objectivity than the mere promise of experts to deny their personal inclinations and hence turned to especially in contested domains, which has spread through business, government and the social sciences. Today we can see that it is pervasive outside of law and religion. The need for trust in the law is exactly what Shapiro identifies as the driver of the development of policies in Blackstone’s time. If we now frankly abandon the technology of trust, which Blackstone’s dictum seemed to promise, the risk is simply that people will stop trusting the law, that the crisis of the early modern period will resurface. Given the parallel declines in the authority of Blackstone’s dictum and public confidence in our criminal justice systems, we should be alive to the possibility that this has in fact already started (Darbyshire, 2015; Porter, 1996).
Importantly, Porter describes how quantification is generally imposed not by institutions and disciplines but on them. It was not accountants who insisted on the kind of standardised quantification that exists today in company accounts but tax collectors. It was not engineers who insisted on cost-benefit analysis of potential bridges but politicians. It was not teachers who insisted on IQ tests but the army. The same may happen in the law: quantification may be imposed not by jurisprudence but on it, a harbinger of which may be the spreading use of algorithms in bail hearings in the United States (Porter, 1996: 115; Simmons, 2018).
At the practical level, what we would like to do is to bring together the law’s social purpose, its quest for truth, our concern for the two errors and the difference between them, our morals, and our understanding of the practical consequences of any change, in order to make decisions on policy under ultimately democratic governance. This would be normal in every other sphere of our society. But the methodological problem perennially undermines the deliberation by locking out any empirical arguments, and over the past 50 years it has become both clearer that the current approach is failing and less acceptable that it does so.
The papers emerging from the recent Seton Hall symposium contain denunciations of both the level of false convictions and false acquittals in the United States. 30 Lest we think this a peculiarly American problem, note that all Western nations today fail to deal effectively with rape. The same is true of the fight against terrorism which, to the extent that the executive elects to operate within the law at all, it commonly arranges to fall outside the ambit of normal criminal law. These are profound and pervasive failures whose chronic nature indicates that the problem is not one of detail but is rather hardwired into our current conception of criminal law, so that it cannot be fixed through the everyday evolution of policies.
The degree of failure should be a spur to action. Historically, the law was the crucible of probability and many of those remembered today by mathematicians as founding fathers were also lawyers, including Bacon, Leibniz and some of the Bernoullis. The vast theory of probability, we could say, is a spin-off from the law’s struggle with the problem of proof (Franklin, 2015: 1). But today the need runs the other way. If the law is to make progress with quantitative approaches and thus make itself fit for empirical argumentation, it needs to spin in expertise from other disciplines.
*****
Leaving Kaplan and Kaplanism aside, what we should expect of a quantitative approach to the criminal justice system can be simply outlined. At the end of a session, say, a court has heard a number of cases. Some of the defendants have been convicted, some acquitted. Some rightly, some wrongly. Given god-like insight, therefore, we could compile a table with a column for each of the four outcomes. If we now imagine the court having the same cases brought to it but operating under a different set of policies, we would expect the four columns to have different numbers in them. The outcomes of the two scenarios, X and Y, might then look as shown in Table 2.
Different policy frameworks, such as X and Y in this table, may lead to different sets of outcomes. Providing a rule to choose between any X and any Y is the basic problem any quantitative approach to the criminal justice system faces, and which Kaplanism has failed to answer.
We then face the question of which set of outcomes, the X or the Y, we consider better. It is obvious that such a conclusion cannot be reached without making a value judgement, possibly several, about the relative importance to us of the four possible outcomes. But given such value judgements, we want a transparent and appropriate computational framework that will allow us to decide between X and Y. Otherwise, we are not choosing between X and Y and the policies they entail on the basis of the value judgement(s) but on some other, hidden, basis.
To accept the reasonableness of this argument is to accept also the importance of establishing a suitable computational framework. In the hunt for such a thing, one lesson from the Death by Logic argument above is that we should think more carefully about the roles we give to the numbers we can calculate. In particular, we must reject n as an objective or meta-rule based on that objective but have no reason not to retain n as a value judgement. In consequence, rather than thinking about n as an output of the computational framework we could think of it as an input.
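The idea of treating n as an input rather than an output can be illustrated with a minimal sketch. The tallies for X and Y below are invented for illustration (they are not the figures in Table 2), and the scoring rule, which counts one false conviction as n false acquittals, is only one assumed way of turning the value judgement into a comparison; it is not offered as the computational framework the argument calls for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Tallies of the four possible outcomes for one policy framework."""
    tp: int  # guilty and convicted
    fn: int  # guilty but acquitted (false acquittal)
    fp: int  # innocent but convicted (false conviction)
    tn: int  # innocent and acquitted


def weighted_error_cost(o: Outcomes, n: float) -> float:
    """Assumed scoring rule: one false conviction counts as n false acquittals."""
    return n * o.fp + o.fn


def preferred(x: Outcomes, y: Outcomes, n: float) -> str:
    """Return which framework has the lower weighted error cost for a given n."""
    cost_x, cost_y = weighted_error_cost(x, n), weighted_error_cost(y, n)
    if cost_x == cost_y:
        return "tie"
    return "X" if cost_x < cost_y else "Y"


# Hypothetical tallies for the same 1,000 cases under two policy frameworks.
X = Outcomes(tp=400, fn=100, fp=20, tn=480)
Y = Outcomes(tp=450, fn=50, fp=30, tn=470)

for n in (1, 5, 10):
    print(n, preferred(X, Y, n))
```

With these invented numbers Y is preferred at n = 1, the two tie at n = 5 and X is preferred at n = 10, so the value judgement visibly drives the choice between frameworks rather than being derived from it.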
My suggestion is to turn to binary classification, a field which did not exist when Kaplan and Tribe were writing on this, but which has emerged to support the recent explosion of computerised data analysis. This can provide a mathematisation of the system we inhabit that relies on n only for the input of a value judgement. And once we begin making empirical measurements, it allows us to make policy through ‘because n, X’ arguments that are both logically cogent and democratically governed, thus making sense of Blackstone’s dictum and allowing us to begin to have a consequential debate about values of n. This is not an attempt to have mathematics make the decisions for us, deus ex machina, or to bypass moral reasoning. Nor does it wed the law to, for example, a utilitarian approach; morals may be as good a basis for advocating a value for n as any other. Nor does it require us to believe that epistemic considerations are the only ones that matter. Rather, in a world where we rely on policies to manage the system and generate trust, and believe that wrongful convictions and wrongful acquittals are an objective fact, it merely offers us a coherent methodology for considering the trade-offs we inevitably face (Cullerne Bown, 2018).
For many, I suspect the main argument for discounting this kind of approach is the empirical one: even if we solve the theory problem and find a suitable computational framework, we don’t have god-like insight and so can’t obtain the numbers to put against X and Y in the table. With insight gained from other disciplines, I believe this challenge can also be overcome. 31
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Appendix: refuting Kaplow
The techniques developed in this article can also be used to attack work in the law and economics tradition, for example Kaplow, which is an especially validating test given that he does not use n at all.
Kaplow wishes to include criminal law within a general framework for setting the standard of proof in all fields of law. In the criminal context I see three problems in his argument that are sufficient to refute it: an inappropriate framing, a wrong perspective leading to fatal problems being overlooked and ultimately a useless meta-rule.
First, he starts with a version of DeKay’s curves from signal detection theory that we have in Figure 2, but instead of the curves representing the apparent guilt of the truly guilty and truly innocent, they now represent the strength of evidence of harmful and benign acts. In order to determine the appropriate standard of proof he counterposes ‘effects on the deterrence of harmful acts and on the chilling of benign acts’. But this foundational framing is not applicable in the criminal justice system, where the perils of, for example, misidentification mean that it is not necessary to commit any act at all in order to be falsely convicted, and a false conviction is harmful in itself even though it may chill no behaviour.
Second, he envisages the system having two stages: ‘First, some sort of scrutiny brings a portion of individuals into the legal system on account of the acts that they are observed to commit. Second, for those brought into the system, adjudication assesses the evidence in order to decide whether to exonerate or instead to assign liability and impose the applicable sanction…’
Kaplow ends up counterposing two probabilities, PHARMFUL(xT) with PBENIGN(xT). In the context of criminal law, PHARMFUL(xT) is the likelihood that someone whose act was criminal will be convicted, PBENIGN(xT) the likelihood that someone whose act was not criminal will be convicted. For him these are important factors in the degree of deterrence and chilling resulting from a standard of proof defined by xT. In our notation they correspond to tp/(tp + fn) (harmful) and fp/(fp + tn) (benign), where the use of lower case indicates that we are dealing with probabilities rather than tallies; in turn, these are functionally equivalent to the indicators of sensitivity and specificity used in medical diagnostics.
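As a rough illustration of this notation, the two probabilities can be computed from tallies as follows. The tallies are hypothetical and the function names are mine rather than Kaplow’s; the sketch only shows that the ‘harmful’ rate is the sensitivity, while the ‘benign’ rate is the complement of the specificity.

```python
def p_harmful(tp: int, fn: int) -> float:
    """Share of truly criminal acts that end in conviction: tp / (tp + fn)."""
    return tp / (tp + fn)


def p_benign(fp: int, tn: int) -> float:
    """Share of non-criminal acts that end in conviction: fp / (fp + tn)."""
    return fp / (fp + tn)


def specificity(fp: int, tn: int) -> float:
    """Share of non-criminal acts that end in acquittal: tn / (fp + tn)."""
    return tn / (fp + tn)


# Hypothetical tallies, chosen only to make the arithmetic easy to follow.
tp, fn, fp, tn = 400, 100, 20, 480

sensitivity = p_harmful(tp, fn)
false_positive_rate = p_benign(fp, tn)
print(sensitivity, false_positive_rate, specificity(fp, tn))
```

The benign rate and the specificity always sum to one, which is the sense in which the pair carries the same information as sensitivity and specificity.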
The chilling effect is computable within Kaplow’s universe, but that doesn’t really get us to the reality of our interest in chilling. At bottom, that concerns not the likelihood that I will be convicted, given that I have been drawn into the system, but the likelihood that I will be convicted for a non-criminal act without that assumption. And then the relevant set of actions consists of all my acts in the real world that are not criminal, which is evidently not countable. This is one of the reasons I have already given for the sovereign being obliged to avoid relying on TN (or tn in the definition of PBENIGN(xT)). Thus the calculus is incapable of enabling computations when applied to the universe that is required for its results to be relevant.
Third, what Kaplow aims at is a welfarist calculus in which the impact on his two probabilities of altering the standard of proof is then combined with a consideration of the marginal social impact of both changes to establish an equation in which the system will be optimised when the two sides are equal. Thus the ultimate function of the calculus is to determine a ratio between the two probabilities, call it k, that we should aim to achieve by altering the standard of proof. But k-the-objective-as-meta-rule is useless in just the same way as n-the-objective-as-meta-rule. To see this, let us repeat the analysis of Table 1. Consider a court to which we bring 1,000 cases, in which in point of fact half are truly guilty, half innocent. Suppose we have chosen k = 10 as an objective. Then we can achieve this with systems that deliver any of the outcomes shown in Appendix Table A1.
The conclusion is as before. The sovereign still cannot accept an arrangement whereby it is expected that of 500 truly guilty brought to court 490 will be acquitted, and from here the argument proceeds as before, through Judgement Q. Kaplow’s avowed concern for the true positives has been lost in the detail of the calculus he has constructed.
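The point that a target ratio k underdetermines the outcome can be checked numerically. The sketch below is illustrative only: it does not reproduce Appendix Table A1, and it assumes that k is read as the conviction rate for harmful acts divided by the conviction rate for benign acts. For 500 truly guilty and 500 truly innocent defendants it lists whole-number outcomes that all meet k = 10.

```python
def outcomes_meeting_ratio(guilty: int, innocent: int, k: int):
    """Yield (tp, fn, fp, tn) tallies whose conviction-rate ratio equals k.

    Assumption: k = [tp / guilty] / [fp / innocent], tested in integer
    arithmetic as tp * innocent == k * fp * guilty to avoid rounding issues.
    """
    for fp in range(1, innocent + 1):
        numerator = k * fp * guilty
        if numerator % innocent:
            continue  # tp would not be a whole number of defendants
        tp = numerator // innocent
        if tp > guilty:
            break  # tp only grows with fp, so no further rows can qualify
        yield tp, guilty - tp, fp, innocent - fp


rows = list(outcomes_meeting_ratio(guilty=500, innocent=500, k=10))
for tp, fn, fp, tn in (rows[0], rows[len(rows) // 2], rows[-1]):
    print(f"convicted guilty={tp}, acquitted guilty={fn}, "
          f"convicted innocent={fp}, acquitted innocent={tn}")
```

Under that reading, the first row is the case discussed above, in which 490 of the 500 truly guilty are acquitted, while the last convicts all 500 guilty along with 50 innocents. Every row satisfies the same k, which is why k alone cannot serve as a meta-rule.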
