Abstract
P values have been critiqued on several grounds but remain entrenched as the dominant inferential method in the empirical sciences. In this article, we elaborate on the fact that in many statistical models, the one-sided P value has a direct Bayesian interpretation as the approximate posterior mass for values lower than zero. The connection between the one-sided P value and posterior probability mass reveals three insights: (1) P values can be interpreted as Bayesian tests of direction, to be used only when the null hypothesis is known from the outset to be false; (2) as a measure of evidence, P values are biased against a point null hypothesis; and (3) with N fixed and effect size variable, there is an approximately linear relation between P values and Bayesian point null hypothesis tests.
Across the empirical sciences—be it in medicine, biology, neuroscience, economics, sociology, or psychology—the classical P value is arguably the single most influential concept for statistical inference. Scientific claims about the presence of hypothesized effects are judged fit for publication only when the associated statistical tests yield
The P value detractors usually do not mince words. For instance, Edwards (1965, p. 400) argued that “classical significance tests are violently biased against the null hypothesis.” Berger and Delampady (1987, p. 330) stated that “when testing precise hypotheses, formal use of P-values should be abandoned. Almost anything will give a better indication of the evidence provided by the data against H0.” Meehl (1978) claimed that
the almost universal reliance on merely refuting the null hypothesis as the standard method for corroborating substantive theories in the soft areas is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology. (p. 817)
Rozeboom (1997, p. 335) echoed this statement when he called P value significance testing “surely the most bone-headedly misguided procedure ever institutionalized in the rote training of science students.”
Undeterred by such verbal onslaughts, some researchers believe that the critiques against P values are overstated or misplaced. For instance, Wainer (1999, p. 212) feels “a little at a loss to understand fully the vehemence and vindictiveness” of the P value critiques; Hagen (1997, p. 22) praises the logic of P value hypothesis testing, calling it “elegant” and “extraordinarily creative”; and Leek and Peng (2015, p. 612) point out that “arguing about the p value is like focusing on a single misspelling, rather than on the faulty logic of a sentence,” and recommend that statisticians “need to stop arguing about P values.”
In this article, we continue to argue over P values. We depart by outlining a well-known Bayesian interpretation of the one-sided P value, and then sketch three immediate consequences. By doing so we hope to increase the field’s awareness of what P values are and what they are not (Schervish, 1996).
Point of Departure: A Bayesian Interpretation of the One-Sided P Value
The Bayesian interpretation of the one-sided P value has a long and ongoing history (e.g., Berger & Mortera, 1999; Casella & Berger, 1987; Greenland & Poole, 2013; Jeffreys, 1961; Lee, 2012; Lindley, 1965; Marin & Robert, 2007; Morey & Wagenmakers, 2014; Pratt, 1965; Pratt, Raiffa, & Schlaifer, 1995; Rouanet, 1996). The main result may be summarized as follows. Consider Bayesian parameter estimation for the location parameter
Thus, for the classical statistician the one-sided P value represents the outcome of a significance test that assumes the null hypothesis is true, whereas for the Bayesian statistician the one-sided P value can be obtained from an estimation procedure (i.e., posterior updating of
Furthermore, in this specific case the Bayesian estimation outcome is directly related to a Bayesian test for direction, one in which we contrast
where
As mentioned above, the relationship is exact for location parameters in models from the exponential family when these parameters are assigned uniform priors; for other parameters and prior distributions the relationship is approximate (e.g., Casella & Berger, 1987; Greenland & Poole, 2013; for a critique, see Gelman, 2013). In what follows we explore three consequences and insights afforded by the Bayesian interpretation of the one-sided P value.
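The exact case can be checked numerically. The following is a minimal sketch that assumes a normal model with known standard deviation and a flat (improper uniform) prior on the mean; the function names and the data summary are hypothetical and are not taken from the article. Under these assumptions the posterior for the mean is normal, centered on the sample mean with standard deviation sigma / sqrt(n), so the posterior mass below zero coincides with the classical one-sided P value.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def one_sided_p_value(sample_mean, sigma, n):
    """Classical one-sided P value for H0: mu = 0 against mu > 0 (z test)."""
    z = sample_mean / (sigma / math.sqrt(n))
    return 1.0 - normal_cdf(z)


def posterior_mass_below_zero(sample_mean, sigma, n):
    """Posterior P(mu < 0 | data) under a flat prior on mu (assumed setup)."""
    posterior_sd = sigma / math.sqrt(n)
    return normal_cdf(0.0, mean=sample_mean, sd=posterior_sd)


# Hypothetical data summary, chosen only for illustration.
p_classical = one_sided_p_value(sample_mean=0.3, sigma=1.0, n=40)
p_bayes = posterior_mass_below_zero(sample_mean=0.3, sigma=1.0, n=40)
print(p_classical, p_bayes)
```

The two printed numbers agree up to floating-point error. With a proper or asymmetric prior, or with a parameter that is not a location parameter, the agreement would only be approximate.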
First Consequence: P Values Are Meaningful Only When the Null Hypothesis Is False
The Bayesian interpretation of the one-sided P value is that it is a test for direction, as the logit of the one-sided P value equals the log of the Bayes factor that contrasts
The interpretation of a one-sided P value as a test for direction—not as a test for the null hypothesis—is relevant because a common critique against the use of P values is that the null hypothesis is nearly always false. For instance, D. H. Johnson (1999) complains,
P is calculated under the assumption that the null hypothesis is true. Most null hypotheses tested, however, state that some parameter equals zero, or that some set of parameters are all equal. These hypotheses, called point null hypotheses, are almost invariably known to be false before any data are collected. (p. 764)
The same sentiment was expressed by Cohen (1990):
A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. It can only be true in the bowels of a computer processor running a Monte Carlo study (and even then a stray electron may make it false). If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it? (p. 1308)
From a Bayesian perspective, however, the one-sided P value is not a test that involves the null hypothesis at all—instead, it is a test for the direction of an effect, suitable exactly for those scenarios where D. H. Johnson (1999) and Cohen (1990) argued it is meaningless. Note that in the Bayesian interpretation, collecting a large enough sample does not confirm the obvious; instead, what will be confirmed is the true direction of the effect. Paradoxically, the threat to the validity of the Bayesian interpretation of the one-sided P value is not that the null hypothesis is false, but that the null hypothesis is true. For when the null is exactly true, the test is between two directional models that are both equally wrong: The truth is literally in the middle (see also Sanborn & Hills, 2014; but see Rouder, 2014).
In sum, from a Bayesian perspective the one-sided P value represents a test for direction, a test that is valid only when the null hypothesis is false. For readers familiar with the popular argument against P values (i.e., “the null is never true”) this line of argumentation may come as a surprise.
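The identity behind this test for direction can be written out in a few lines. The notation below is an assumption made for illustration: \delta denotes the location parameter, y the data, p the one-sided P value, and H_- and H_+ the hypotheses \delta < 0 and \delta > 0. With a prior that is symmetric around zero, the prior odds of the two directions equal 1, so the Bayes factor equals the posterior odds.

```latex
\begin{align}
p &\approx \Pr(\delta < 0 \mid y), \\
\mathrm{BF}_{-+}
  &= \frac{\Pr(\delta < 0 \mid y)}{\Pr(\delta > 0 \mid y)}
     \Big/ \frac{\Pr(\delta < 0)}{\Pr(\delta > 0)}
   = \frac{\Pr(\delta < 0 \mid y)}{\Pr(\delta > 0 \mid y)}
   \approx \frac{p}{1 - p}, \\
\log \mathrm{BF}_{-+} &\approx \log \frac{p}{1 - p} = \operatorname{logit}(p).
\end{align}
```

For example, a one-sided P value of .05 corresponds to posterior odds of .95/.05 = 19 to 1 that the effect is positive rather than negative.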
Second Consequence: P Values Are Biased Against the Point Null Hypothesis
As alluded to earlier, several statisticians have remarked that P values overestimate the evidence against a point null hypothesis (e.g., Berger & Delampady, 1987; Dickey, 1977; Edwards, Lindman, & Savage, 1963; V. E. Johnson, 2013; Sellke, Bayarri, & Berger, 2001). The relation expressed in Equation 2 allows us to bypass mathematical details and present an intuitive argument: the one-sided P value corresponds to a Bayesian test for direction, in which
For example, consider a match between two avid Rummikub players. After six games, Player A is leading Player B by 4-2. If the choice is between
In sum, tests for direction are easier than tests for existence: when applied to the same data, tests for direction are more diagnostic than tests for existence. From a Bayesian perspective, the one-sided P value is a test for direction; when this test is misinterpreted as a test for existence—as classical statisticians are wont to do—this overstates the true evidence that the data provide against a point null hypothesis.
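The contrast between the two kinds of test can be made concrete for the 4-2 score. The sketch below rests on modelling assumptions that are ours, not necessarily the article's: games are independent, Player A wins each game with probability theta, and theta has a uniform prior on (0, 1). The test for direction compares theta > 1/2 with theta < 1/2; the test for existence compares the point null theta = 1/2 with the uniform alternative. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def posterior_prob_a_better(wins_a, wins_b):
    """P(theta > 1/2 | data) under a uniform prior on theta (assumed).

    The posterior is Beta(wins_a + 1, wins_b + 1). For integer parameters,
    the Beta CDF at 1/2 equals a binomial tail probability, so the result
    can be computed exactly.
    """
    a, b = wins_a + 1, wins_b + 1
    m = a + b - 1
    # P(Beta(a, b) <= 1/2) = P(Binomial(m, 1/2) >= a)
    cdf_half = Fraction(sum(comb(m, k) for k in range(a, m + 1)), 2 ** m)
    return 1 - cdf_half


def bayes_factor_null_vs_alternative(wins_a, wins_b):
    """BF01 for H0: theta = 1/2 against H1: theta ~ Uniform(0, 1) (assumed)."""
    n = wins_a + wins_b
    marginal_h0 = Fraction(comb(n, wins_a), 2 ** n)
    # Under a uniform prior every possible win count is equally likely.
    marginal_h1 = Fraction(1, n + 1)
    return marginal_h0 / marginal_h1


p_better = posterior_prob_a_better(4, 2)
# The uniform prior is symmetric around 1/2, so the prior odds are 1 and
# the posterior odds equal the Bayes factor for direction.
odds_direction = p_better / (1 - p_better)
bf01 = bayes_factor_null_vs_alternative(4, 2)
print(float(odds_direction), float(bf01))
```

Under these assumptions the odds that Player A is the better player are 99/29, or about 3.4 to 1, whereas the test for existence yields a Bayes factor of 105/64, or about 1.6, in favor of equal ability. The same six games are thus mildly informative about direction and nearly uninformative about existence.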
Third Consequence: With N Fixed, the Relation Between P Values and Bayesian Point Null Hypothesis Tests Is Approximately Linear
Several authors have explored the lawlike relationship between the classical P value and the Bayes factor against a point null hypothesis (e.g., Rouder, Morey, Speckman, & Province, 2012; Wetzels et al., 2011). Specifically, when sample size N is relatively stable and only effect size varies, lower P values will be accompanied by higher Bayes factors against the point null hypothesis. Figure 1 shows the empirical relation for

Figure 1. The highly regular relationship between one-sided P values and point null Bayes factor hypothesis tests for 440 t test results reported by Wetzels et al. (2011) and reanalyzed by Rouder et al. (2012).
We now formalize the relation between P values and Bayes factors for point null hypotheses by exploiting two facts. The first fact is that the one-sided P value is the posterior mass to the left of zero (i.e., Equation 1). The second fact is that the Bayes factor hypothesis test for a point null hypothesis
In words, the Bayes factor in favor of the null hypothesis
We examine the following simplified scenario. The prior for the location parameter

Figure 2. Prior and posterior distribution for a hypothetical data set. The shaded area of the posterior distribution indicates the mass that is lower than zero, whereas the two dots visualize the Savage–Dickey density ratio. As the posterior distribution shifts to the right, the shaded area and the posterior ordinate at
The nature of these simultaneous changes is shown in Figure 3 for values of

Figure 3. Lawlike relation between the one-sided P value and the point null Bayes factor
In this demonstration, the lower end-point corresponds to a value of
An interesting observation about the relations shown in Figure 3 is that they are invariant across different choices of N and the choice of prior variance for the location parameter
In sum, for a fixed value of N there exists a lawlike relation between the (approximate) one-sided P value and the Bayes factor for a point-null hypothesis. This relation implies that one can traverse from the one-sided P value to the Bayes factor and vice versa. Assuming that the relation between
Concluding Comments
We have demonstrated that one-sided P values can be given a Bayesian interpretation as an approximate test of direction, that is, a test of whether a latent effect is negative or positive. From a Bayesian perspective, this means that P values may be used when the null hypothesis is false or when its veracity is not at issue (and when a diffuse, symmetric prior on the location parameter is acceptable). When misinterpreted as tests of existence, P values overestimate the evidence against the null hypothesis, as a test for direction is generally easier than a test for existence. Finally, with N fixed and effect size variable, P values and point null Bayesian hypothesis tests are approximately linearly related on the log scale. This latter finding may falsely suggest that tests for direction and tests for existence are closely related. Although we have demonstrated this to be the case for N fixed, the situation changes if N is variable (e.g., Cano, Carazo, & Salmerón, 2013; Girón, Martínez, Moreno, & Torres, 2006). With N variable, sharp conflicts between tests of direction and tests of existence are unavoidable, a phenomenon known as Lindley’s paradox (Lindley, 1957). Consider the scenario shown in Figure 2 and imagine that more data are collected, causing the posterior distribution to become more peaked. At the same time, imagine that the posterior mean moves toward zero such that the posterior area lower than zero remains constant; when this happens the posterior ordinate at zero will increase, and this strengthens the evidence in favor of the point null hypothesis. Thus, as N increases and the posterior area lower than zero remains constant, the evidence in favor of the point null hypothesis increases indefinitely. This means that in a test for direction, one may be relatively certain that the effect is positive rather than negative; for the same data, a test for existence may reveal that the null hypothesis is much more strongly supported than the alternative hypothesis.
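This divergence can be illustrated numerically. The sketch below is a simplified setup of our own choosing rather than the article's exact demonstration: data are normal with known standard deviation, the alternative hypothesis assigns the mean a Normal(0, prior_sd^2) prior, and the sample mean shrinks with n so that the classical one-sided P value stays fixed (the posterior mass below zero is then approximately constant). The Bayes factor is computed as the Savage–Dickey density ratio, the posterior density at zero divided by the prior density at zero. Function names and default values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(z, n, sigma=1.0, prior_sd=1.0):
    """BF01 as posterior density at 0 divided by prior density at 0.

    Assumed setup: data are normal with known sigma; under H1 the mean has
    a Normal(0, prior_sd^2) prior. The sample mean is set so that the z
    statistic, and hence the classical P value, stays fixed as n grows.
    """
    sample_mean = z * sigma / math.sqrt(n)
    data_precision = n / sigma ** 2
    prior_precision = 1.0 / prior_sd ** 2
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * data_precision * sample_mean
    posterior_at_zero = normal_pdf(0.0, post_mean, math.sqrt(post_var))
    prior_at_zero = normal_pdf(0.0, 0.0, prior_sd)
    return posterior_at_zero / prior_at_zero


z_fixed = 1.96  # classical one-sided P value of about .025 at every n
sample_sizes = (10, 100, 1000, 10000, 100000)
bf01_by_n = {n: savage_dickey_bf01(z_fixed, n) for n in sample_sizes}
for n, bf in bf01_by_n.items():
    print(n, round(bf, 3))
```

In this setup the Bayes factor favors the alternative at n = 10 (BF01 below 1) and then grows without bound in favor of the point null as n increases, even though the test for direction reports the same evidence throughout.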
Of course, the paradox ceases to feel like a paradox as soon as it is properly understood. In the foreword to his monograph Theory of Probability, Jeffreys already underscores the main point:
The most beneficial result that I can hope for as a consequence of this work is that more attention will be paid to the precise statement of the alternatives involved in the questions asked. It is sometimes considered a paradox that the answer depends not only on the observations but on the question; it should be a platitude. (p. x)
The Bayesian interpretation of the one-sided P value presents a double-edged sword. On the one hand, researchers can feel more confident in their use of the one-sided P value; after all, it has a Bayesian interpretation and it is valid when the null hypothesis is false (and when a diffuse, symmetric prior on the location parameter is acceptable). On the other hand, it is clear that the Bayesian interpretation of the one-sided P value presents a test of direction, not a test of existence. Despite the fact that many statisticians and methodologists have argued that tests of direction are more meaningful than tests of existence, we are not convinced that their arguments resonate with medical researchers, geneticists, experimental psychologists, and researchers in similar fields where general laws and invariances are regularly tested by means of empirical investigations.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the ERC Grant “Bayes or Bust!” from the European Research Council.
