Abstract
Given the prevalence of statistical techniques that use probability to quantify uncertainty, the aim of this article is to highlight the theoretical aspects and implications of the major current probability interpretations that justify the development and use of such techniques. After briefly sketching the origins and development of the notion of probability, its theoretical interpretations will be outlined. Two main trends will be distinguished: one epistemic and one empirical, corresponding to the twofold meaning characterizing probability. The epistemic type embodies the so-called classical theory put forward by Laplace as well as the logical and subjective approaches. By contrast, the frequency and propensity theories are, in theory, empirical in character. This way of understanding probability contrasts with both the tenet that there is a “Bayesian interpretation” of probability and the tendency to conflate Bayesian probability with the subjective interpretation, both of which are misleading for reasons that emerge from the following discussion. The final section of the paper addresses the question of which type of probability is best suited for the organization sciences and suggests the subjective interpretation as the best option by virtue of its pluralism and awareness of context.
Inferences made by quantitative researchers usually rely on statistical techniques, such as estimating regression coefficients or using p values for null hypothesis significance testing. All such techniques use probability to quantify uncertainty, either directly, as with p values, or indirectly, as when a tool such as structural equation modeling with maximum likelihood estimation relies on fit statistics and is justified by estimation theory or Monte Carlo methods, all of which rest on probability. Although the substantive theories that are tested using probability are often well theorized, researchers are usually unaware that there are theories of probability that underlie their methods and that these theories have been the subject of debate for well over a century. This is important because different theories of probability carry different assumptions and have led to different ways of quantifying different kinds of uncertainty. To help researchers understand the theories of probability that can be used to make inferences and to elucidate their assumptions, this paper begins with a historical overview of probability. Then, the major current probability theories are described. The paper concludes by arguing that organization science, as a diverse discipline studying a variety of phenomena with multiple methods in diverse contexts, seems best suited to adopting a subjective theory of probability.
Historical Sketch
Probability is a quantitative notion that assigns a value ranging between 0 and 1 to a hypothesis on the basis of a body of information. This notion of probability emerged around 1660 in the work of Pascal and Fermat. As Hacking notes in The Emergence of Probability, “a problem about games of chance proposed to an austere Jansenist by a man of the world was the origin of the calculus of probabilities” (1975: 57). The reference is to Chevalier de Méré, a gentleman at the court of Louis XIV, who asked Pascal to solve problems in gambling, such as how many dice tosses are needed to have a fair chance to obtain a double-six or how the players should divide the stakes if a game is interrupted. Pascal involved Fermat in the study of such problems, and their collaboration led to formulating the general principles of probability.
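The first of de Méré's questions can be worked through directly. The following is a minimal sketch, assuming two fair dice and independent throws; the function names are illustrative and do not come from the historical sources. It searches for the smallest number of throws at which the chance of at least one double-six reaches one half.

```python
from fractions import Fraction


def prob_double_six(tosses: int) -> Fraction:
    """Probability of at least one double-six in `tosses` throws of two fair dice."""
    miss = Fraction(35, 36)  # chance that a single throw is NOT a double-six
    return 1 - miss ** tosses


def min_tosses_for_even_chance() -> int:
    """Smallest number of throws giving at least a 1/2 chance of a double-six."""
    n = 1
    while prob_double_six(n) < Fraction(1, 2):
        n += 1
    return n


if __name__ == "__main__":
    n = min_tosses_for_even_chance()
    print(n, float(prob_double_six(n - 1)), float(prob_double_six(n)))
```

Under these assumptions, 24 throws fall just short of an even chance and 25 throws just exceed it.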
Obviously, probability existed before Pascal and Fermat, and problems like those studied by them were previously addressed and given solutions by others, such as Cardano and Galileo. What makes Hacking and others fix the birth of probability to the work of Pascal and Fermat is a conviction that they were the first to see the idea of probability developed for games of chance as a general model for reasoning under uncertainty for all sorts of problems.
For this, Hacking emphasizes probability’s duality, wherein probability is “Janus faced. On the one side it is statistical, concerning itself with stochastic laws of chance processes. On the other side it is epistemological, dedicated to assessing reasonable degrees of belief in propositions quite devoid of statistical background” (Hacking, 1975: 12). Thus, probability can describe phenomena that exhibit random behavior—such as observed frequencies—or it can express the degree of belief in propositions that refer to any events whose outcome is uncertain. To illustrate this duality, Hacking refers to Pascal, who addressed problems of random events, like games of chance, but also applied probability to God’s existence in his famous wager, wherein betting on God’s existence is justified as it maximizes expected gains. So the argument goes, if a probability of one half is assigned to God’s existence, it should be bet on anyway, because the expected gain is two lives instead of one; but it should be bet on even with a very small probability for God’s existence, because this is balanced by an infinite expected value of eternal life. Hacking (1975: 12) notes, “It is no matter of chance whether or not God exists, but it is still a question of reasonable belief and action to which . . . probable reasoning can be applied.”
The fact that probability has epistemic (e.g., belief) and empirical (e.g., frequencies) meanings is at the root of the philosophical problem of its interpretation. Although assigning probability a single meaning became predominant in the middle of the 19th century, giving rise to its different interpretations discussed below, there was a long period in which these two interpretations coexisted and even mingled in the literature, such as in the work of Huygens that followed Pascal and Fermat. Huygens pioneered the study of mathematical expectation, in turn creating an outburst of interest in probability, so that by the turn of the 18th century, probability had progressed enormously and widened its scope of application, facilitated by the combinatorial calculus and studies in statistics and demography.
A pivotal role in this history was played by the Bernoulli family, including Jakob, the author of Ars Conjectandi (1713), who started the analysis of direct probability (the probability assigned to a sample based on a known parameter) and proved the “weak law of large numbers,” which was the first limit theorem. Other members of the family, Nikolaus and Daniel, achieved key results in the analysis of mathematical expectation and started the study of distributions of errors of observation (such as today’s “standard error”), which reached its peak in the first half of the 19th century with Gauss. After Jakob, the study of direct probability was carried on by authors such as De Moivre, Laplace, and Poisson, up to the 20th-century mathematicians Borel and Cantelli and the Russians Chebyshev, Markov, Lyapunov, and Kolmogorov.
Thomas Bayes also deserves mention for his method of inverse probability, or the probability assigned to a hypothesis on the basis of evidence, which was communicated to the Royal Society posthumously by Price on December 23, 1763. Whereas direct probability leads from a known parameter to estimated distributions in a sample, inverse probability leads from observed frequencies to estimated distributions of parameters. In the context of Bayes’ rule, inverse probability is also called “probability of causes” because it enables an estimation of the probabilities of the causes underlying an observed event.
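The step from observed evidence back to the "probability of causes" can be illustrated with a small computation. This is only a sketch under stated assumptions: the two jars, their contents, and the equal priors are hypothetical and are not taken from Bayes or Price. The function applies Bayes' rule in its standard form, P(H | E) = P(E | H) P(H) / Σ P(E | H_j) P(H_j).

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # probability of the evidence
    return {h: joint[h] / total for h in joint}


# Hypothetical setup: a white marble is drawn from one of two jars.
priors = {"jar_A": Fraction(1, 2), "jar_B": Fraction(1, 2)}
# Assumed P(white | jar): jar A holds 3/4 white marbles, jar B holds 1/4.
likelihoods = {"jar_A": Fraction(3, 4), "jar_B": Fraction(1, 4)}

post = posterior(priors, likelihoods)
print(post)
```

With these assumed numbers, observing a white marble raises the probability that jar A is the "cause" from 1/2 to 3/4.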
By providing a tool to combine inductive reasoning with probability, Bayes was ahead of his time, when induction and probability were considered separately, and induction was seen as a way to expand knowledge by drawing general conclusions from many observations aiming at certainty, not probability. Laplace was the first to grasp the import of Bayes’ result, which today is the cornerstone of inference by statisticians of the “Bayesian school.” The crucial issue for Bayes’ method is how to fix prior probabilities, that is, the probabilities of hypotheses calculated using background information before additional evidence is gathered. To fix these “priors,” different theories of probability advocate different solutions, and the debate is open.
From the second half of the 18th century, probability grew rapidly. Work in the moral and political sciences in Condorcet’s “social mathematics” paved the way for probability’s use in the “sciences of man,” leading to the study of statistical distributions. This progressed due to the work of Quetelet, Galton, Karl Pearson, Weldon, Gosset, Edgeworth, and others who shaped modern statistics by developing the analysis of correlation and regression, and methods for assessing statistical hypotheses against experimental data through significance tests. Other branches of modern statistics were started by Fisher, who created ANOVA and the likelihood method for comparing hypotheses on the basis of a body of data. Also important are Neyman and Egon Pearson, who extended the method of tests to a comparison between two alternative hypotheses.
In parallel, probability entered natural science not only to master errors of measurement but as a constituent of physics. Starting in 1827 with Robert Brown’s work on the motion of particles suspended in fluid, the use of probability to characterize complex physical phenomena progressed rapidly, leading to the kinetic theory of gases and thermodynamics developed by Maxwell, Boltzmann, and Gibbs. By 1905 to 1906, von Smoluchowski and Einstein brought the study of Brownian motion to completion. In the same years, the analysis of radiation led Einstein and others, such as Planck, Schrödinger, de Broglie, Dirac, Heisenberg, Born, and Bohr, to formulate quantum mechanics, making probability a basic ingredient of the description of matter.
Alongside mathematical probability, theories of probability’s meaning and philosophical implications emerged that privileged it as either epistemic (e.g., belief) or empirical (e.g., frequencies). Before surveying these theories, it is worth noting that probability’s mathematical properties hold independently of the interpretation attached to it. An interpretation of probability can be deemed adequate only if it satisfies the mathematics of probability functions, namely, the properties of the so-called probability calculus. This refers to the work of Andrej Kolmogorov, who in 1933 spelled out the mathematical properties of probability in axioms, separating the mathematical notion of probability and its interpretation. The probability calculus provides the tools for calculating posterior probabilities on the basis of prior ones, for instance, the probability of obtaining two 6s when throwing two dice, given that every face of the dice has a 1/6 probability, or the probability of seeing a king when drawing from a deck of 52 cards, knowing that every card has a 1/52 probability. The evaluation of priors, however, does not fall under the calculus of probabilities. For that, different schools adopt different methods, so that each interpretation has both a philosophical component and a method for evaluating priors.
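The two examples just mentioned can be reproduced in a few lines. The sketch below assumes fair dice, independent throws, and a standard 52-card deck with four kings; the variable names are illustrative. The calculus supplies the product and addition rules, while the initial values of 1/6 and 1/52 are taken as given.

```python
from fractions import Fraction

# Assumed initial values, supplied from outside the calculus.
p_six = Fraction(1, 6)    # each face of a fair die
p_card = Fraction(1, 52)  # each card of a standard deck

# Product rule for two independent throws: both dice show a 6.
p_two_sixes = p_six * p_six

# Addition rule for four mutually exclusive outcomes: any one of the four kings.
p_king = sum(p_card for _ in range(4))

print(p_two_sixes, p_king)
```

The results are 1/36 and 1/13. Nothing in the computation says where 1/6 or 1/52 come from, which is the point made above about the evaluation of priors.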
Theories of Probability
The Classical Interpretation
The so-called classical interpretation was developed at the turn of the 19th century by Laplace, one of the greatest probabilists ever. Called the “Newton of France” for his work in physics, Laplace developed a philosophy of probability rooted in determinism, according to which the universe is ruled by a “principle of sufficient reason,” wherein all things are brought into existence by a cause. The human mind is incapable of grasping every connection of the causal network underlying phenomena, but one can imagine a superior intelligence able to do so. Using mathematical tools aided by probability, man can approach the all-comprehensive view of such an intelligence. Because the focus is on human knowledge, probability has an epistemic meaning, pertaining to knowledge rather than the stochastic nature of phenomena themselves.
Laplace holds that probability is “the ratio of the number of favourable cases to that of all possible cases” (1814/1995: 6). This statement, known as the classical definition, embodies Laplace’s canon for calculating priors, grounded on the assumption that all the alternatives open to a given phenomenon should be regarded as equally possible in the absence of information that would lead to believing otherwise. Then, to determine probabilities, equally possible cases are valued equally probable, leading to a uniform prior probability distribution—for instance, all outcomes of a dice play, or each possible effect of one variable on another, are considered equally probable. This assumption, known as the “principle of insufficient reason”—or “principle of indifference,” a term coined by J. M. Keynes—is the cornerstone of Laplace’s theory of probability.
Before applying his method, Laplace recommends assuring that some outcomes are not more likely to happen than others. If this assumption does not hold, “one must first determine their respective possibilities, the apposite appreciation of which is one of the most delicate points in the theory of chances” (Laplace, 1814/1995: 6). The determination of the different chances to be attributed, say, to the two sides of a biased coin would be a case for counting frequencies. Non-uniform prior distributions are allowed by Laplace, but they are regarded as unnecessary, because, as observed by probability historian Stigler, “the analysis for uniform prior distributions was already sufficiently general to encompass all cases, at least for the large sample problems Laplace had in mind” (1986: 135), because very large samples make priors mostly irrelevant.
An important part of Laplace’s theory is the analysis of the “probability of causes,” which he deals with in a Bayesian fashion by means of a method of inference called “Laplace’s rule,” later labeled by Venn as the rule of succession. In the case of two alternatives—the “occurrence” and “non-occurrence” of an event or an effect—this rule allows the probability of an event to be inferred from the fact that the same event has been observed to happen in some number of cases. In a general formulation, the rule says that if m is the number of observed positive cases, and n that of negative cases, the probability that the next case observed is positive equals (m + 1) / (m + n + 2). If no negative cases are observed, or if one wants to calculate the probability of an event after observing that it has happened “m times in a succession,” the formula reduces to (m + 1) / (m + 2). Laplace’s rule is based on the assumptions of equiprobability of priors (uniform distribution) and independence of trials, conditional on a given parameter—like the contents of a marble jar or the ratio of the number of favorable cases to that of all possible cases.
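Laplace's rule as stated above is easy to compute. The following is a minimal sketch; the function name and the argument checks are illustrative additions, and the formula itself is the one given in the text.

```python
from fractions import Fraction


def rule_of_succession(m: int, n: int = 0) -> Fraction:
    """Probability that the next case is positive after m positive and n negative cases.

    Implements (m + 1) / (m + n + 2), which presupposes a uniform prior and
    trials that are independent conditional on the unknown parameter.
    """
    if m < 0 or n < 0:
        raise ValueError("counts must be non-negative")
    return Fraction(m + 1, m + n + 2)


print(rule_of_succession(0))      # no observations at all
print(rule_of_succession(10))     # ten positive cases in a row
print(rule_of_succession(9, 1))   # nine positive cases and one negative
```

With no observations the rule returns 1/2, reflecting the uniform prior; ten successes in a row give 11/12, and nine successes with one failure give 5/6.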
Laplace’s theory is philosophically obsolete mostly because the deterministic worldview underpinning it no longer holds. Also, the classical notion raised much discussion and criticism, partly because of the difficulties it faces due to the so-called Bertrand’s paradoxes and partly on the grounds that it hinders learning from experience. 1 The authors who later worked on probabilistic inference in the tradition of Bayes and Laplace, including Johnson, Carnap, and de Finetti, opted for the weaker assumption of “conditional independence,” known as “exchangeability.” 2 That said, the classical definition of probability can be used in many applications, such as games of chance, where a set of possible outcomes is identifiable and assumptions of independence apply.
The Empirical Approach
Empirical approaches privilege probability’s meaning as a characteristic of phenomena, such as events or observed effects, rather than as a product of knowledge. Today, this is found in the frequency or “frequentist” theory of probability and the “propensity” theory.
The frequency theory
The frequency theory centers on the idea that probability can be analyzed in terms of observed frequencies: probability is defined as the limit of the relative frequency of a given attribute, as observed in the initial part of an indefinitely long sequence of repeatable events, for instance, the observations (or measurements) generated by an experiment that can be reproduced in identical conditions in an independent way. Although this theory of probability is associated with many social scientific tools, such as maximum likelihood estimation, p values, and confidence intervals developed by Ronald Fisher and Jerzy Neyman, this definition of probability rests on an idealization, because infinite sequences of observations are impossible to obtain and limiting frequencies must therefore be estimated from finite samples. Also, the independence of outcomes and absolute similarity of the experimental conditions may be disputed as unobtainable in practice.
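The idea of a relative frequency settling toward a limit can be illustrated by simulation. The sketch below is an assumption-laden stand-in: it uses Python's pseudo-random generator with a fixed seed and an arbitrarily chosen "true" value of 0.3, and a finite run can only suggest convergence, never establish a limit.

```python
import random


def relative_frequencies(p: float, n_trials: int, seed: int = 0) -> list:
    """Running relative frequency of an attribute over simulated independent trials."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    freqs = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:  # the attribute occurs with assumed probability p
            hits += 1
        freqs.append(hits / i)
    return freqs


freqs = relative_frequencies(p=0.3, n_trials=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

Early values typically fluctuate widely, while later ones stay close to the assumed 0.3. This is the behavior the frequency definition idealizes into a limit.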
Started in the 19th century by Ellis and Venn, the frequency theory reached its climax with Richard von Mises (1883-1953), after whose work it became so popular with physicists and natural scientists as to become the official interpretation of probability in science. Further, in 1900 to 1930, the frequency notion of probability also pervaded mathematical treatments of probability, where limiting theorems on behaviors of relative frequencies were derived on the assumption of probabilistic independence (as when sampling randomly from a population). In the same period, statistics also burgeoned and methods for testing statistical models and estimating parameters were developed. With few exceptions, notably the physicist Jeffreys, frequentism was widely accepted. What appealed to scientists was the “objective” flavor of this interpretation, which holds that “true” or correct probability values exist. Although these values are unknown, the idea is that they can be approached by estimates based on the frequencies observed in larger and larger samples.
A central feature of this approach is that once defined in terms of frequency, probability deals with mass phenomena—phenomena resulting from many elements or consisting of indefinitely repeatable events. Consequently, to speak of the probability of a single event, such as an organization’s success or observing an effect in an empirical study, makes sense only if one refers to the wider class to which such an event belongs. To refer to mass phenomena, von Mises coins the term collective, denoting “a sequence of uniform events or processes which differ by certain observable attributes, say colours, numbers, or anything else” (1928/1957: 12). To qualify as a collective, a sequence of events (observations) should be able to be prolonged indefinitely and exhibit frequencies tending to a limit. The distinctive feature of collectives is randomness, a feature von Mises defines operationally as “insensitivity to place selection,” where place selection means random sampling. Accordingly, randomness obtains when the limiting values of the relative frequencies in a collective are not affected by any of all the possible selections that can be performed on it. Also, the limiting values of relative frequencies observed in the subsequences (samples) obtained by place selection equal those of the original sequence. Von Mises also names this randomness condition the “principle of the impossibility of a gambling system” because it reflects the impossibility of devising a system leading to a certain win.
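Insensitivity to place selection can be illustrated with one particular selection rule: keep only the elements that immediately follow a success. The sketch below is illustrative and rests on assumptions. It checks a single rule rather than all possible selections, it works on finite sequences, and the simulated sequence comes from a seeded pseudo-random generator, so it can only mimic a collective.

```python
import random


def bernoulli_sequence(p: float, n: int, seed: int = 0) -> list:
    """Finite 0/1 sequence of simulated independent trials with success chance p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def select_after_success(seq: list) -> list:
    """Place selection: keep each element whose predecessor was a success."""
    return [seq[i] for i in range(1, len(seq)) if seq[i - 1] == 1]


def rel_freq(seq: list) -> float:
    """Relative frequency of successes in a non-empty 0/1 sequence."""
    return sum(seq) / len(seq)


simulated = bernoulli_sequence(p=0.5, n=200_000)
alternating = [0, 1] * 1_000  # regular sequence 0, 1, 0, 1, ...

print(rel_freq(simulated), rel_freq(select_after_success(simulated)))
print(rel_freq(alternating), rel_freq(select_after_success(alternating)))
```

For the simulated sequence, the selected subsequence shows roughly the same frequency as the whole. For the alternating sequence, the overall frequency is 1/2, but the selection yields only failures, so a gambler betting after each success could exploit it. That sequence therefore fails the randomness condition.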
After defining the notion of a collective, von Mises restates the theory of probability in terms of collectives, by using operations of selection, mixing, partition, and combination. The use of this conceptual machinery is meant to allow von Mises to lay a foundation for an empirical notion of probability that can be operationally reduced to a measurable quantity. The obvious objection to the operational character of this theory is that it uses infinite sequences, which can never be obtained. Von Mises’ answer is that probability as an idealized limit can be compared to other limits encountered in science, such as velocity or density. Also, in connection with the problem of applicability due to the need to find a connection between the sequences of observations, which are finite, and the infinite sequences postulated by the theory, he claims that “the results of a theory based on the notion of the infinite collective can be applied to finite sequences of observations in a way which is not logically definable, but is nevertheless sufficiently exact in practice” (von Mises, 1928/1957: 85; see also von Mises, 1939/1951).
It is worth emphasizing that the existence of a collective for von Mises is a necessary condition for probability because without it, no meaningful probability assignment is possible. The result is that it makes no sense to talk of the probability of single occurrences of events, such as the probability of an organization experiencing success, because probability applies only to a theoretically infinite number of such events. This causes frequentism’s single-case problem.
An attempt to solve it is made by Reichenbach, who develops a more flexible version of frequentism. Reichenbach begins with a weaker concept of randomness relative to a restricted domain of selections “not defined by mathematical rules, but by reference to physical (or psychological) occurrences” (1935/1971: 150). This approach bears some similarity to the notion, familiar to statisticians, of “pseudo-randomness,” which is used by researchers when randomly sampling by using tables of random numbers or random number generators. Use of such methods guarantees that the samples obtained are random for all practical purposes, although someone knowing how to generate the sequence of random numbers itself would be able to predict it. In Reichenbach’s words, “random sequences are characterized by the peculiarity that a person who does not know the attribute of the elements is unable to construct a mathematical selection by which he would, on an average, select more hits than would correspond to the frequency of the major sequence. . . . This might be called a psychological randomness” (1935/1971: 150).
Also, Reichenbach introduces the idea of a practical limit “for sequences that, in dimensions accessible to human observation, converge sufficiently and remain within the interval of convergence” and adds that “it is with sequences having a practical limit that all actual statistics are concerned” (1935/1971: 347-348).
Reichenbach regards a probability evaluation as a posit—“a statement with which we deal as true, although the truth value is unknown” (1935/1971: 373). The posit tries to solve the single-case problem by connecting the probability of a sequence to a single case. The idea is that a posit regarding a single event (e.g., observing a positive effect in an empirical study) receives a weight from the probabilities attached to the reference class to which the event is assigned (e.g., a population), which must be homogeneous. A homogeneous reference class includes as many cases as possible similar to the one under consideration and excludes dissimilar ones. Similarity should be intended relative to any relevant properties, and homogeneity is obtained by successive partitions of the population based on relevant properties. Once a reference class cannot be further partitioned, it is homogeneous. For example, to assign a weight to the probability that some company will go out of business in the next year, the reference class should be chosen to include all relevant properties, like the size, industry, performance, and so on. The probability of going out of business is then determined on the basis of observed relative frequencies, giving the weight assigned to a company under study closing in a year. In Reichenbach’s words, “a weight is what a degree of probability becomes if it is applied to a single case” (1938: 314).
Although Reichenbach’s theory allows single-case probability attributions within the frequency framework, the proposed solution faces a reference class problem, caused by the fact that one can never be sure that all relevant properties and cases with respect to a phenomenon are taken into account and included in the reference class. Identifying the proper reference class raises problems to which no single solution can be given. Only the context in which a study is being conducted can suggest where to stop the search for relevant properties to be included in the reference class, so that a decision depends on context-dependent factors, including the purpose of a study and the use to which a probability will be put, such as explanation, prediction, or control.
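The dependence of a single-case weight on the chosen reference class can be made concrete with a toy computation. Everything in the sketch below is hypothetical: the `Firm` record, its two properties, and the counts are invented purely for illustration and do not describe any real population of companies.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Firm:
    size: str
    industry: str
    closed_within_year: bool


def weight(firms, **properties):
    """Relative frequency of closure in the reference class fixed by `properties`.

    Returns (frequency, class size); raises if no firm matches.
    """
    ref = [f for f in firms
           if all(getattr(f, key) == value for key, value in properties.items())]
    if not ref:
        raise ValueError("empty reference class")
    closed = sum(1 for f in ref if f.closed_within_year)
    return Fraction(closed, len(ref)), len(ref)


# Invented records: 40 firms in total, 5 of which closed within the year.
firms = (
    [Firm("small", "retail", True)] * 3 + [Firm("small", "retail", False)] * 7
    + [Firm("small", "software", True)] * 1 + [Firm("small", "software", False)] * 9
    + [Firm("large", "retail", True)] * 1 + [Firm("large", "retail", False)] * 19
)

print(weight(firms))                                  # all firms
print(weight(firms, size="small"))                    # partition by size
print(weight(firms, size="small", industry="retail")) # partition by size and industry
```

For a small retail firm, the weight in this invented data moves from 1/8 to 1/5 to 3/10 as the class is narrowed, while the class itself shrinks from 40 to 20 to 10 firms. Which value to use, and when to stop partitioning, is not settled by the arithmetic.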
Reichenbach’s distinction between primitive and advanced knowledge is worth recalling. Advanced knowledge exists when prior probabilities are available; primitive knowledge exists when priors are unknown. Reichenbach calls prior probabilities anticipative, or blind, posits, calculated as limiting frequencies according to the frequentist canon. When knowledge of priors is available, one is in a state of advanced knowledge, and appraised posits can be obtained by applying the probability calculus. Bayes’ rule plays a privileged role in advanced knowledge, regarded by Reichenbach as the proper tool for confirming scientific hypotheses, making him an objective Bayesian who holds that priors should be determined on the basis of frequencies alone.
Part of Reichenbach’s Bayesianism is a conception of knowledge acquisition as a self-correcting procedure that starts with blind posits and goes on to devise appraised posits that become part of a complex system. For Reichenbach, scientific knowledge is the result of a continuous interplay among experiencing frequencies and predicting probabilities, made possible by the method of posits, the fundamental ingredient of scientific method. He notes, “By means of the inductive rule we set up posits concerning the limit of the frequency in a sequence and thus establish probability values. The probabilities so constructed can be used as the weights of certain other posits; we are thus able to construct appraised posits by means of anticipative posits” (1935/1971: 461).
The method of posits represents the core of induction, and has a self-correcting character, which is responsible for “the overwhelming success of scientific method” (Reichenbach, 1938: 364). Given the fundamental role Reichenbach gave to Bayesian updating, for him, Bayes’ rule using priors calculated with frequencies is the cornerstone on which all of science rests. 3
The propensity theory
The problem of single-case probabilities in quantum mechanics is the origin of the propensity theory of probability given in the 1950s by Popper, who resumed it in the 1980s to account for various causal tendencies in the world. In quantum mechanics, the single-case problem caused by a frequency theory is critical “because the ψ-function determines the probability of a single electron to take up a certain state, under certain conditions” (Popper, 1957: 66). Popper’s idea is that probability is a property of an experimental setup, or the generating conditions of experiments, liable to be reproduced to form a sequence. Here, probability is not a property of sequences—as in a frequency theory—rather, it is a disposition of generating conditions themselves. As Popper (1959: 37) notes, this modification of the frequency interpretation leads almost inevitably to the conjecture that probabilities are dispositional properties of these conditions—that is to say, propensities. This allows us to interpret the probability of a singular event as a property of the singular event itself.
This interpretation does not associate single events with particular objects, like particles or dice, but rather with the experimental setups that define experiments: “Every experimental arrangement is liable to produce, if we repeat the experiment very often, a sequence with frequencies which depend upon this particular experimental arrangement. These virtual frequencies . . . characterize the disposition, or the propensity, of the experimental arrangement to give rise to certain characteristic frequencies when the experiment is often repeated” (Popper, 1959: 67). 4
Popper regards propensities as physically real albeit non-observable, and he believes his interpretation is very strongly objective. Propensities are the object of a “new physical (or perhaps metaphysical) hypothesis” similar to Newtonian forces (Popper, 1983: 360). When referring to mass phenomena or to repeated experiments, propensities can be measured by means of observed frequencies. Otherwise, they are estimated “speculatively” (Popper, 1990: 17). In all cases, statements about propensities are hypotheses that must be testable. To guarantee testability, Popper distinguishes between probability statements expressing propensities and statistical statements, claiming that probability statements express conjectured frequencies pertaining to virtual sequences of experiments, whereas statistical statements express relative frequencies observed in actual sequences of experiments. The idea here is that probability statements expressing propensities can be tested by means of statistical statements, by comparing the conjectured frequencies against those observed in sequences of experiments that have been performed.
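One way to picture the comparison between a conjectured frequency and an observed one is an exact binomial tail probability. This is a sketch of a standard significance-style calculation, not a procedure specified by Popper; the conjectured value of 0.5, the 100 repetitions, and the observed counts are hypothetical, and independent repetitions of the same setup are assumed.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


conjectured_p = 0.5   # probability statement: conjectured (virtual) frequency
n_experiments = 100   # statistical statement: an actual, finite run

for observed in (55, 60):
    tail = binomial_tail(n_experiments, observed, conjectured_p)
    print(observed, round(tail, 4))
```

Under the conjecture, a result at least as high as 55 successes is unremarkable, whereas one at least as high as 60 is rare enough that many researchers would treat it as evidence against the conjectured value. Where to draw that line is a convention and is not part of the propensity theory itself.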
Popper regards the propensity theory as both the solution to the single-case problem and the basis of an objective view of probability apt to underpin an indeterministic conception of the world. Propensities are endowed with an indeterministic character meant to embrace probabilistic tendencies of all kinds, from physics and biology to the motives of human action.
Popper’s propensity theory strongly influenced philosophers of science. Versions of it are embraced by many contemporary authors, including Mellor, Giere, Miller, and Gillies. 5 However, the propensity theory faces severe difficulties, including a reference class problem akin to that affecting frequentism. In fact, just as single-case attributions made in the frequency framework require identifying the reference class containing all relevant properties, propensity attributions need to be based on the complete description of the surrounding conditions, such as the complete context and procedures used for an actual study. Therefore the problem of identifying a complete set of information still exists. For instance, Miller emphasizes that single-case propensity attributions should depend on the complete description of the state of the universe at the time a given event takes place, or as he puts it, “the probability of an outcome is . . . relative only to the unique situation of the world (or the causally operative part of the world) at the time” (1994: 183).
A further problem affecting the propensity theory amounts to its unsuitability to interpret inverse probabilities. This was noted by Humphreys (1985), who observed that the dispositional character of propensities, defined as tendencies to produce certain outcomes, gives them an asymmetry that goes in the opposite direction from that characterizing inverse probability, with the consequence that Bayes’ rule is inapplicable to propensities. While authors have attempted to circumvent this problem, others, like Salmon (1979), adopt the notion of propensity to represent probabilistic causal tendencies rather than probabilities. Still others, like Suppes (2002: chap. 5), claim that although propensities do not express probabilities, they can play a useful role in the description of certain phenomena, conferring an objective meaning on the probabilities involved.
The Epistemic Approach
Like the classical theory, the modern epistemic approach privileges the epistemic meaning of probability and regards it as pertaining to our knowledge of facts, not to the facts themselves. Two main trends take this approach: the logical and the subjective interpretations.
The logical interpretation
This interpretation sees probability theory as a part of logic, so that probability is a logical relation between two propositions, one of which describes a given body of evidence while the other states a hypothesis, such as “high performance work systems increase organizational performance.” Attaching a logical character to probability is intended to make probability objective. A strong connection is established between probability’s logical and rational character, so probability theory is seen as a theory of reasonable degrees of belief.
Supporters of the logical view include De Morgan, Boole, Jevons, and Keynes. This view reached its climax with the philosopher of science and logician Carnap, who developed probability as the object of inductive logic—the logic of confirmation—conceived as a formalized axiomatic system. Inductive logic applies to measures of confirmation defined on the semantic content of statements and is meant as solid ground for the best probability estimates based on evidence, thus providing the ideal basis for rational decision as opposed to actual decisions. 6
Part of Carnap’s point, and a critical feature of logicism, is the idea that in light of the same evidence, there is only one rational—or correct—probability assignment. To safeguard the goodness of probability evaluations, Carnap imposes on inductive logic the requirement of total evidence, according to which “in the application of inductive logic to a given knowledge situation, the total evidence available must be taken as a basis for determining the degree of confirmation” (Carnap, 1950/1967: 211), which is to say that all relevant information must be used when estimating probabilities. This requirement raises problems because in practice one can never be certain of having taken into account all relevant evidence, making it similar to the reference class homogeneity problem arising in Reichenbach’s frequentism and the equally problematic need to base propensity attributions on a complete description of the generating conditions or the state of the universe surrounding the event of interest.
To use logical probabilities requires working with “confirmation functions” formalized within inductive logic, which belong to the broader family of Bayesian methods. Carnap defines a “continuum of inductive methods” characterized by a blend of a priori and empirical components (see Carnap, 1952). At one end of the continuum stands the frequentist canon that Carnap calls straight rule, which calculates priors based on observed frequencies, while at the other end of the continuum is a function that Carnap calls c+, corresponding to the classical (Laplacean) method, according to which priors depend on the assignment of equal probability to all possible cases. Somewhere in the middle lies the function c*, having the property of exchangeability. Events belonging to a sequence of observations are exchangeable if the probability of h successes in n events is the same, for whatever permutation of the n events and for every n and h ≤ n. In other words, the notion of exchangeability means that the locations of the successes in a sequence make no difference to the evaluation of probability, for instance, when calculating the probability that the next case to be observed will be a success. The function c* is regarded as optimal because it allows learning from experience faster than the stronger assumption of independence whenever available information is scant; obviously, as the bulk of information becomes greater, estimates based on exchangeability tend to converge with those based on independence. Carnap believes that by adopting c*, all inferential methods can be reformulated in inductive logic, including inference from an observed to an unobserved sample, the prediction of a single event, inference by analogy, and estimating parameters in statistical models.
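The continuum and the exchangeability property described above can be made concrete with a small sketch. It assumes the usual one-parameter presentation of Carnap's continuum, in which the probability that the next case is a success after s successes in n observations is (s + λ/k) / (n + λ), with k possible outcome types. The parameter name `lam`, the function names, and the restriction to two outcome types are illustrative assumptions, not notation taken from the text. Under this presentation λ = 0 gives the straight rule, λ = k gives c*, and the limit of very large λ gives the equal-weight end of the continuum.

```python
from fractions import Fraction
from itertools import permutations


def predictive_probability(successes, trials, lam, k=2):
    """Probability that the next observation is a success.

    lam = 0 is the straight rule, lam = k is c*, and lam = None stands in
    for the limit of very large lam (equal weights, no learning).
    The parameterization is an assumption made for illustration.
    """
    if lam is None:
        return Fraction(1, k)
    if trials + lam == 0:
        raise ValueError("the straight rule is undefined before any observation")
    return (Fraction(successes) + Fraction(lam, k)) / (trials + lam)


def sequence_probability(sequence, lam):
    """Probability of a whole 0/1 sequence, built one prediction at a time."""
    prob = Fraction(1)
    successes = 0
    for trials, outcome in enumerate(sequence):
        p_success = predictive_probability(successes, trials, lam, k=2)
        prob *= p_success if outcome == 1 else 1 - p_success
        successes += outcome
    return prob


def is_exchangeable_on(sequence, lam):
    """Check that every reordering of the sequence gets the same probability."""
    values = {sequence_probability(p, lam) for p in permutations(sequence)}
    return len(values) == 1


if __name__ == "__main__":
    # Seven successes in ten observations, evaluated at three points of the continuum.
    for name, lam in [("straight rule", 0), ("c*", 2), ("equal weights", None)]:
        print(name, predictive_probability(7, 10, lam))
    print("c* exchangeable on (1, 1, 0):", is_exchangeable_on((1, 1, 0), lam=2))
```

With two outcome types, c* reduces to (s + 1) / (n + 2), so the position of the successes in the sequence plays no role: only their number matters.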
Carnap notes the role of inductive logic as a logic of decision, but he emphasizes that it is a theory of rational decisions. In other words, inductive logic intends to give a system of rules that form the gist of a normative decision theory designed to advise “human beings in their effort to make their decisions as rational as their limited abilities permit” (Carnap, 1971: 17). The normative character of inductive logic sets Carnap apart from a subjective interpretation of probability, which represents a descriptive approach to decision theory and probability.
A merit of Carnap’s formalization of probability in inductive logic is that it clarifies the assumptions as well as the rules of probabilistic inferences, which are fully explicit once they are axiomatized. Unfortunately, Carnap’s awkward formalism makes inductive logic unpalatable to statisticians and scientists, with the drawback that it has hardly gone beyond the restricted circle of philosophers of science. Yet, discussion of Carnap’s methods mingles with that on Bayesian confirmation—mainstream in literature on probabilistic confirmation—and authors, including Jeffrey and Skyrms, opt for a more eclectic approach, closer to subjective Bayesianism. 7
A slightly different version of logicism is due to the physicist Jeffreys. Jeffreys embraced Bayesianism in his work as a scientist active in fields like seismology and meteorology, where massive data were not available and where one typically had to tackle problems of inverse probability, explaining experimental data with different hypotheses or evaluating general hypotheses in light of changing data. To assign prior probabilities to general hypotheses, Jeffreys, in collaboration with Wrinch, proposed a method based on the assumption that all quantitative laws form an enumerable set and their probabilities form a convergent series, formulating a “simplicity postulate” according to which simpler laws should be assigned a greater prior probability. 8
Jeffreys retains an epistemic notion of probability taken to express a reasonable degree of belief, claiming that given a set of data, the value of probability is uniquely determined. Jeffreys’ conviction that a satisfactory theory has to account for “the existence of unique reasonable degrees of belief” (Jeffreys, 1939: 36) puts him in line with the logical interpretation of probability because it involves a normative aspect, stressing the rationality of degrees of belief. 9 This gives probability an objective character, which, according to Jeffreys, is necessary for its use in science. However, the idea that there are unknown probabilities is alien to Jeffreys’ thought, as is the view that probabilities express properties of the world. Indeed, Jeffreys reverses such a view by claiming that probability comes before notions of objectivity, reality, causality, and external world, in the sense that these are established by inference from experience. A notion of reality obtains when some scientific hypotheses receive a probability so high that on their basis one can draw inferences whose probabilities are practically the same as if the hypotheses were certain. Hypotheses of this kind are taken as certain in the sense that all their parameters “acquire a permanent status.” In such cases, we can assert the associations expressed by the hypotheses in question “as an approximate rule” (Jeffreys, 1937: 69). This thinking brings Jeffreys close to subjectivism, as does his conviction that science is fallible and the admission that empirical information can be “vague and half-forgotten” (Jeffreys, 1931/1973: 406), anticipating literature of a subjective Bayesian inspiration, as represented by authors like Jeffrey, Skyrms, and many others. 10
The subjective interpretation
The subjective view centers on the tenet that probability is the degree of belief entertained by a person regarding the occurrence of an uncertain event on the basis of available information, such as the degree of belief in a positive effect of one variable on another, given observed data. The notion of a “degree of belief” is assumed and needs to be associated with an operational definition to measure degrees of belief as prior probabilities, meaning that there must be a scheme to elicit degrees of belief in the form of probabilities. A long-standing method to do this is the betting scheme, according to which one’s degree of belief is represented by the odds at which one would be ready to bet on the occurrence of an event. In other words, the probability of an event equals the price a player is willing to pay for a unit gain should the event occur. This method, which dates back to the 17th century, raises problems, such as the diminishing marginal utility of money and the fact that different people take different attitudes toward betting, depending on whether they are averse to risk or inclined to seek it. A number of alternative methods have been devised to avoid these shortcomings.
Central to a subjective theory is the notion of coherence in assigning probability to competing hypotheses or events. In betting terms, coherence requires that degrees of belief, when used as the basis of betting ratios, do not expose the bettor to a sure loss (or hand the bettor a sure gain). This leads to what is known in the literature as the Dutch book argument. As an example of a Dutch book, take people betting at odds 4:1 on the occurrence of a random event (staking 4 to win 1) and at odds 2:3 on the non-occurrence of the same event (staking 2 to win 3). This leads to a sure loss of 1, because (a) if the event occurs, they win the first bet but lose the second; namely, they win 1 but pay 2; (b) if the event does not occur, they win the second bet but lose the first; namely, they win 3 but pay 4. The two bets reflect degrees of belief of 4/5 and 2/5, which sum to more than 1. Given that betting at odds 4:1 reflects a degree of belief of 4/5 that the event in question will occur, a coherent person should be ready to bet at odds 1:4 on its non-occurrence, reflecting a degree of belief of 1/5.
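The arithmetic of the Dutch book example can be checked with a short sketch. It assumes the convention that betting at odds a:b on an event means staking a to win b, so that the implied degree of belief is a / (a + b). The function names are illustrative only.

```python
from fractions import Fraction


def implied_belief(stake, gain):
    """Degree of belief implied by betting at odds stake:gain on an event."""
    return Fraction(stake, stake + gain)


def net_payoffs(odds_on, odds_against):
    """Net result of holding both bets, for each way the event can turn out.

    Each odds pair is (stake, gain): the bettor risks `stake` to win `gain`.
    """
    stake_on, gain_on = odds_on
    stake_against, gain_against = odds_against
    return {
        "event occurs": gain_on - stake_against,
        "event does not occur": gain_against - stake_on,
    }


def is_dutch_book(odds_on, odds_against):
    """True when the bettor loses whatever happens."""
    return all(p < 0 for p in net_payoffs(odds_on, odds_against).values())


if __name__ == "__main__":
    # The incoherent pair from the text: 4:1 on the event, 2:3 on its non-occurrence.
    print(net_payoffs((4, 1), (2, 3)), is_dutch_book((4, 1), (2, 3)))
    # The coherent pair: 4:1 on the event, 1:4 on its non-occurrence.
    print(net_payoffs((4, 1), (1, 4)), is_dutch_book((4, 1), (1, 4)))
```

In the incoherent case the implied beliefs are 4/5 and 2/5, which add up to 6/5; in the coherent case they are 4/5 and 1/5, and neither outcome produces a guaranteed loss.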
The British philosopher Ramsey (1903-1930)—who did not adopt the betting machinery but a more general notion of preference—was the first to state in a famous paper written in 1926 called “Truth and Probability” that coherent degrees of belief satisfy the laws of probability. This makes coherence the only condition that needs to be imposed on degrees of belief. A consequence is that the laws of probability “do not depend for their meaning on any degree of belief in a proposition being uniquely determined as the rational one” (Ramsey, 1990: 78). In other words, coherence is the only condition that degrees of belief need obey: If degrees of belief are coherent, there is no further demand of rationality to be met, and it is perfectly admissible that two people on the basis of a given body of information produce different evaluations of the same hypothesis.
Working independently but in the same years as Ramsey, de Finetti (1906-1985) offered a similar definition of probability, moving toward a mature subjectivism using a “representation theorem” that showed how using Bayes’ method, in conjunction with the property of exchangeability, leads to convergence between degrees of belief and observed frequencies. This makes subjective probability applicable to statistical inference, which, according to de Finetti, can be entirely based on it. Being a strong Bayesian, de Finetti regards the shift from prior to posterior probabilities (or, as he said, from initial to final probabilities) as the basis of statistical inference. This shift is given a subjective interpretation in the sense that going from priors to posteriors always implies personal judgment.
Yet, for de Finetti, updating belief with new evidence does not mean changing opinion: “If we reason according to Bayes’ theorem we do not change our opinion. We keep the same opinion and we update it to the new situation. If yesterday I said ‘Today is Wednesday’ today I say ‘Today is Thursday’. Yet, I have not changed my mind, for the day following Wednesday is indeed Thursday” (de Finetti, 1995/2008: 43).
The idea of correcting a previous judgment does not belong to this view, nor does the notion of a self-correcting procedure dear to Reichenbach. Also, there are no correct or rational probabilities: “The subjective theory . . . does not contend that the opinions about probability are uniquely determined and justifiable. Probability does not correspond to a self-proclaimed ‘rational’ belief, but to the effective personal belief of anyone” (de Finetti, 1951: 218). De Finetti’s view here sharply contrasts with logicism, which holds that there is only one rational—or correct—probability assignment given a body of evidence.
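How two people can start from different but equally coherent opinions and still be drawn together by evidence can be illustrated with a minimal sketch. It assumes that each person's initial opinion about an exchangeable sequence of successes and failures is represented by a Beta distribution, a modeling choice made here for convenience and not one stated in the text. The prior parameters and the data counts are hypothetical.

```python
from fractions import Fraction


def posterior_mean(prior_a, prior_b, successes, trials):
    """Updated probability of success on the next trial under a Beta(a, b) prior.

    Only the number of successes enters, not their order, which is what
    exchangeability requires.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return Fraction(prior_a + successes, prior_a + prior_b + trials)


def disagreement(prior_1, prior_2, successes, trials):
    """Absolute gap between two agents' updated probabilities."""
    mean_1 = posterior_mean(*prior_1, successes, trials)
    mean_2 = posterior_mean(*prior_2, successes, trials)
    return abs(mean_1 - mean_2)


if __name__ == "__main__":
    agent_1 = (1, 1)  # hypothetical prior: no initial leaning either way
    agent_2 = (8, 2)  # hypothetical prior: initially expects mostly successes
    for trials in (0, 10, 100, 1000):
        successes = 3 * trials // 10  # observed frequency fixed at 30 percent
        gap = disagreement(agent_1, agent_2, successes, trials)
        print(trials, float(gap))
```

Neither starting opinion is singled out as the rational one, yet as observations accumulate both updated probabilities approach the observed frequency and the gap between them shrinks.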
Regarding the operational definition of probability, de Finetti is pluralistic, allowing different methods. In the 1930s, he introduced a qualitative definition based on the relation “at least as probable as” in addition to the betting scheme, and from the 1960s on, he used a definition based on penalty methods, such as scoring rules, which also improve probability evaluations.
De Finetti’s theory of probability as subjective degree of belief is part of a broader framework that Jeffrey labeled “radical probabilism,” which regards scientific knowledge as a product of human activity, ruled by probability as degree of belief rather than truth or objectivity. 11 The basis of radical probabilism is the subjective notion of probability, which de Finetti deems preferable to all other notions and apt to cover all uses of probability in science and everyday life. This attitude, rooted in a pragmatist philosophy, goes hand in hand with a criticism of the tenet that probability is an objective notion and that there are unknown probabilities. Indicating his commitment to degrees of belief, de Finetti claimed that “probability does not exist.” This claim, epitomizing de Finetti’s anti-metaphysical philosophy, fostered the conviction that subjectivism represents some sort of “anything-goes” approach. Far from embracing an anarchist approach wherein probability can take any value provided that coherence is satisfied, de Finetti took the problem of objectivity of probability evaluations very seriously and made an important contribution to it, partly in collaboration with Savage. 12 The approach adopted is based on penalty methods as in Brier’s rule (Brier, 1950). Scoring rules like Brier’s are devised to oblige those who make probability evaluations to be as accurate as possible and to be honest. Such rules provide a tool for improving the probability evaluations of single agents and of groups, because they can be used as methods for enhancing “self-control” as well as a “comparative control” over probability evaluations (de Finetti, 1980: 1151). Scoring rules receive great attention from Bayesians, who widely discuss “well-calibrated” estimation methods. 13
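Why a penalty of this kind rewards honesty can be seen in a short sketch. It uses the common two-outcome form of the Brier score, the mean squared difference between stated probabilities and outcomes coded as 0 or 1, which is a simplification of the multi-category rule in Brier (1950). The function names and the numerical values are illustrative assumptions.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between stated probabilities and 0/1 outcomes.

    Lower is better; 0 means every forecast was certain and correct.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally long, non-empty forecasts and outcomes")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def expected_penalty(belief, report):
    """Penalty a forecaster expects when believing `belief` but stating `report`."""
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2


def best_report(belief, steps=100):
    """Stated probability, on a grid, that minimizes the expected penalty."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda report: expected_penalty(belief, report))


if __name__ == "__main__":
    # A forecaster whose actual degree of belief is 0.7 (hypothetical value).
    for report in (0.5, 0.7, 0.9):
        print(report, expected_penalty(0.7, report))
    print("best report:", best_report(0.7))
```

Because the expected penalty is smallest when the stated probability coincides with the forecaster's actual degree of belief, overstating or understating one's confidence can only be expected to raise the penalty.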
What de Finetti opposes is objectivism (the idea that probability depends entirely on some aspects of “external reality”), not objectivity. Opposing a tendency to identify objectivity with objectivism, de Finetti sees subjectivism as the only way to responsibly address the issue of objectivity, aware that evaluating probability is a complex procedure in which both objective and subjective elements must be taken into account. As he notes, “every probability evaluation depends on two components: (1) the objective component, consisting of the evidence of known data and facts; and (2) the subjective component, consisting of the opinion concerning unknown facts based on known evidence” (de Finetti, 1974: 7). While reaffirming that factual information is by all means the main ingredient of probability evaluation, de Finetti recommends bearing in mind that it is context dependent. Evidence must be collected carefully and skillfully, and the use one can make of it depends on the judgment of what elements are relevant to the problem being considered and should be taken into account for evaluating related probabilities. Furthermore, the collection and use of evidence is constrained by economic and practical considerations that depend on the situation. In some cases, especially with scant available information, the level of expertise of evaluators who are trying to determine probabilities also matters. All of these elements form the subjective component of probability evaluation, a component whose explicit recognition is for de Finetti a prerequisite for the appraisal of objective elements, and thus an explicit rejection of an “anything-goes” approach to probability as degree of belief.
In this regard, he claims that taking into account the subjective elements of probability judgments will not “destroy the objective elements nor put them aside, but bring forth the implications that originate only after the conjunction of both objective and subjective elements at our disposal” (de Finetti, 1973: 366). In sum, de Finetti sees evaluating probability as a complex process that must be contextualized and cannot be entrusted to one rule or method—be it based on frequencies or a priori considerations regarding the equiprobability of possible alternatives. These are important ingredients of probability evaluations that cannot be ignored, whenever available, provided that they are not used uncritically as automatic rules and simply equated with probability, as often happens with inference under uncertainty, such as with “rules of thumb” that involve, for example, p values at some cutoff level, almost always 0.05.
What Theory of Probability for Organization Science?
Different probability theories typify different ways to treat uncertainty, with implications for scientific inference. With its empirical character, the frequency theory is considered by many the natural candidate for the natural sciences. It underlies most quantitative tools in the social sciences and is very popular with physicists, who accept it in spite of the fact that it clashes with the use of single-case probabilities in quantum mechanics. The propensity theory, put forward by Popper to solve that problem, meets with consensus on the part of philosophers but not scientists. Logicism does not fare much better, as it has mostly remained in philosophical circles, with some exceptions, including Jeffreys and a few others. Subjectivism enjoys wide consensus among social scientists and statisticians, but it retains a halo of arbitrariness that makes it unpalatable to many, not just natural scientists. Forensic scientists, for instance, are suspicious of subjective probability and often turn to logical probability, presumably reassured by the promise of objectivity conveyed by the term logical. Such controversy is far from settled.
Based on the twofold meaning of probability, theories of probability are divided into two camps, empirical and epistemic. Following de Finetti, the line could also be drawn between theories that conflate the definition and the evaluation of probability and those that do not. The first category includes the classical, logical, frequency, and propensity views, all of which share the idea that there are true probability values, uniquely determined by evidence. Such theories take a “rigid” attitude that “consists in defining (by whichever way and whichever concept) the probability of an event, thus univocally determining a function” (de Finetti 1933/1992: 348). 14 Alternatively, a subjective theory separates the definition and evaluation of probability, taking an “elastic” attitude that does not commit the choice of a particular function to a single rule or method. An elastic attitude for de Finetti means including in a probability evaluation, regarded as a complex and largely context-dependent procedure, both empirical objective elements and subjective considerations. Aware that objective elements, taken by themselves, are neither necessary nor sufficient to guarantee objectivity, he urges that judgments be grounded on a “deep analysis of problems” (de Finetti, 1962: 367) addressing all sorts of contextual elements. For him, this is the only way a viable notion of objectivity is possible. De Finetti’s pluralism and his stress on context offer a clue to what theory of probability is best suited for the organization sciences.
Organization science is hugely pluralistic, combining many perspectives, methodologies, and approaches. This is emphasized by, among others, Van de Ven and Johnson, who write, “We take a pluralistic view of science and practice as representing distinct kinds of knowledge that can provide complementary insights for understanding reality” (2006: 808). Attention is called to the need to combine theory and practice and to take into account contextual elements to develop knowledge apt to be shared by a community of researchers. The same authors take a pluralistic attitude toward objectivity, wherein “a pluralistic approach of comparing multiple models of reality is therefore essential for developing objective scientific knowledge” (Van de Ven & Johnson, 2006: 807).
A plea for pluralism is also made in well-known work by Morgan and Smircich, who claim that organization research should profit from quantitative—objective—methods and from qualitative—subjective—inquiry and call attention to the role played by context in understanding social systems. This epistemological position “stresses the importance of monitoring process, the manner in which a phenomenon changes over time in relation to its context” (Morgan & Smircich, 1980: 496). In an attempt to proceed further in the direction of Morgan and Smircich, Cunliffe reaffirms a pluralistic viewpoint wherein “insights from objectivist- and subjectivist-based, statistical and narrative methods can help create a fuller understanding of organizational practices” (2011: 666).
Such a view of epistemology for use in organization science naturally fits the pluralism and contextualism of subjective probability. Specifically, the view of objectivity by organization scientists seems to require the blend of objective and subjective factors that, according to subjectivists, enter into the evaluation of probability. Although organization researchers do not specifically refer to probability, the kind of epistemology they support can—or rather, should—be extended to probability, which is an essential part of knowledge, be it intended for description, explanation, prediction, or control. As long as no one is led astray by the undeserved flavor of arbitrariness that is too often erroneously associated with a subjective interpretation of probability, there is no doubt that it is the natural candidate for organization science.
