Abstract
In a 1974 commencement address, Richard Feynman described scientific integrity as a kind of utter honesty, a kind of leaning over backwards to tell the whole truth. We argue that investigators could tell more of the truth and increase the value of their papers by highlighting and discussing unexplained variation, a major source of which is individual differences. An argument that unexplained individual differences must have many sources is presented, and means of representing that variation are illustrated. We believe that such a change in reporting of research results is likely to advance the progress of scientific psychology, but perhaps the most compelling argument for what we propose is simply that telling the whole story as fully as possible is good scientific practice. The Appendix provides two examples of what we are urging, taken from recent psychological literature.
In his 1974 commencement address at the California Institute of Technology, Richard Feynman described scientific integrity as a principle of scientific thought that corresponds to a kind of utter honesty—a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked—to make sure the other fellow can tell they have been eliminated. Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can—if you know anything at all wrong, or possibly wrong—to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition. In summary, the idea is to try to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another. (Feynman, 1985, p. 341)
Feynman went on to contrast this image of science with advertising. His example involved a statement that was literally, but selectively, true: it implied that the product was unique in an important respect when, in that respect, it was in fact no different from its competitors.
Meehl on “the slow progress of soft psychology”
In 1978, Paul Meehl observed: “There is a period of enthusiasm about a new theory, a period of attempted application to several fact domains, a period of disillusionment as the negative data come in, a growing bafflement about inconsistent and unpredictable empirical results, multiple resort to ad hoc excuses, and then finally people just sort of lose interest in the thing and pursue other endeavors” (p. 807).
With Meehl, we recognize the “terrible intrinsic difficulty of the pursuit of understanding in our subject matter” (p. 807). Meehl listed 20 features that make the science of human psychology difficult, but his central point was that a key source of the slow progress was excessive reliance on significance testing, which, we note, leads naturally to inferences of an all-or-none nature. Given the enormous societal investment in psychological research over the intervening three decades, we do not believe that the progress soft psychology has, or has not, attained would have led Meehl to recant his position, either on the 20 features or on his core argument regarding significance testing (see Miller, 2004). In fact, two decades after Meehl, the Task Force on Statistical Inference (TFSI) of the American Psychological Association (Wilkinson & The Task Force on Statistical Inference, 1999) reiterated some of Meehl’s concerns, as did Kline (2004).
Meehl was highly critical of null-hypothesis testing; however, we believe that the consequences he lamented may be due more to the way in which investigators, in representing and discussing research results, focus almost solely on those results that support the generalizations made. The focus of the present paper is on the representation and discussion of unexplained variation, a major contributor to which is individual differences, and we note that several of Meehl’s 20 features reflect individual differences. He explicitly named individual differences, as well as polygenic heredity, the idiographic problem, unknown critical events, and others. Furthermore, whereas Meehl addressed his concerns to what he called soft psychology, many of the problems he raised characterize the entire spectrum of psychological research. We further propose that another reason for “the slow progress of soft psychology” is the tendency of psychologists, when reporting research results, to tell only part of the story. One important part of the story that is not well told is the unexplained variation that characterizes behavioral research. Highlighting unexplained variation would be one way, in Feynman’s words, “to try to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another” (1985, p. 341).
Meehl on appraising and amending theories
In 1990, Meehl presented a formal analysis of theory appraisal, in which he described the logical form of theory testing as follows:
$$(T \cdot A_t \cdot C_p \cdot A_i \cdot C_n) \vdash (O_1 \supset O_2)$$
where $T$ is the theory of interest, $A_t$ and $A_i$ auxiliaries, and $C_n$ the research results. The right side is the material conditional, if you observe $O_1$ then you will observe $O_2$. This is not the place to explore Meehl’s entire analysis, but note that from our perspective, $C_p$ (“a Ceteris paribus clause (all other things being equal)…,” Meehl, 1990, p. 109) is the key term that reflects a fundamental reason for the lack of greater progress in psychological research. It is why clinicians, in the words of Baker, McFall, and Shoham (2008), “are deeply ambivalent about the role of science in informing their practice… and value personal clinical experience over research evidence” (p. 77). But in no field in psychology, much less clinical psychology, are all other things ever equal. The fact that there is variance across people on virtually every behavioral measure precludes the possibility that all other things will be equal. How can unexplained individual differences be highlighted and explored?
Representing and exploring unexplained individual differences
Kluckhohn and Murray posited that “Everyman is in certain respects (a) like all other men, (b) like some other men, (c) like no other man” (1948, p. 35). While the language is outdated, the ideas are not. These three statements encompass most, if not all, of the questions constituting contemporary psychological research, research that has principally, but not exclusively, been conducted under the rubric of what Suppe (1977) and others term the received view of science. The goal dictated by this view is the achievement of general, causal laws of behavior, underlying the pursuit of which is the tacit assumption that every man is, with regard to the behavior studied, like all other men.
Clearly, research in domains characterized by Kluckhohn and Murray’s assertions a and b challenges the creativity and the technical skills of researchers and clinicians alike, but the greater challenge for scientific psychology emerges from the fact that in some important respects every person is unique (Allport, 1937, 1961). This makes it exceedingly difficult to represent and talk about results from traditional research designs in a satisfactory way. Traditional nomothetic designs, by the very nature of the statistical analysis of group averages and of the consequent inferences in the tradition of null-hypothesis testing, treat within-group individual differences as error, even to the point of labeling them as such.
The nomothetic model that dominates psychological research leads researchers and the consumers of research to state or imply that “X and Y are causally related.” Such statements are, in the simple case, based on a statistically significant difference between an observed mean of an experimental group and a mean consistent with some null hypothesis. The within-group variability is labeled, as noted before, “error.” But that variability is not simply error. Some unknown amount of it represents potentially highly interesting individual differences in behavior. Research results are typically presented so that the significant effects, whether experimental or correlational, are the focus of the results and discussion, and almost invariably the sole focus of presentations in secondary sources. Treating research findings as though the experimental effects or the obtained correlations were the whole story is inconsistent with the scientific imperative and Feynman’s admonition. The degree of unexplained individual differences can often be estimated if the data presented allow calculation of the effect size, but whatever real individual differences are reflected in the error term are often ignored. If the measurement instruments possess acceptable reliability, then unless the effect size is extraordinarily large, the portion of the “error” attributable to real individual differences must be large.
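As a rough numerical sketch of this last point, suppose (hypothetically) that an effect accounts for 9% of the observed variance and that the dependent measure has a reliability of .80. Under classical test theory, and assuming that measurement error is unrelated to the independent variable, the share of observed variance left to real but unexplained individual differences can be computed as below. The function name and both input values are illustrative assumptions, not figures from any study.

```python
def unexplained_true_variance(explained: float, reliability: float) -> float:
    """Share of observed variance that is neither explained nor measurement error.

    Assumes classical test theory: reliability is the ratio of true-score
    variance to observed variance, and the explained variance is entirely
    true-score variance (measurement error is unrelated to the predictor).
    """
    if not 0.0 <= explained <= reliability <= 1.0:
        raise ValueError("need 0 <= explained <= reliability <= 1")
    return reliability - explained


# Hypothetical inputs: r^2 = .09 and a reliability of .80.
share = unexplained_true_variance(explained=0.09, reliability=0.80)
print(f"measurement error share: {1 - 0.80:.2f}")
print(f"real but unexplained individual differences: {share:.2f}")
```

With these illustrative inputs, measurement error accounts for about a fifth of the observed variance, and real individual differences that the predictor leaves unexplained account for about seven tenths.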
The cost of ignoring unexplained variation
There is an important payoff in the simplification achieved by drawing conclusions and making generalizations as though only the mean matters, essentially treating everyone as pretty much the same. Doing so simplifies the results, thus facilitating both understanding and communication of the findings. But what are the costs? It has long been argued that there is a cost to ignoring individual differences, and many have argued for more attention to individual behavior. These include Cronbach (1957) in his classic paper, “Two Disciplines of Scientific Psychology,” Allport (1937, 1961) in his call for idiographic research (see Lamiell, 1981, for a review), and those behind the current interest in qualitative approaches (e.g., Lincoln & Guba, 1985). Despite these pleas, nomothetic research, with its focus on null-hypothesis testing and its reliance on means, correlations, and other group statistics, is undoubtedly here to stay. Even when the variation due to individual differences dwarfs the so-called systematic variance, experimenters typically treat the independent variable as the sole cause worth discussing. Unexplained variation fundamentally implicates multiple causes, but only one, the independent variable, is typically considered and discussed. From the literature review, through the conception and conduct of the research, to the final writing and the comments of referees, the investigator very likely learns a great deal about the multiplicity of potential influences on the behavioral phenomenon represented by the dependent variable. However, that knowledge too rarely finds its way into the published paper. In those cases in which the variance due to individual differences is considerable but we focus solely on the significant effects, we find ourselves in the odd position of saying that “people tend to do X” when it is equally or more true that “people tend not to do X.”
The theoretical composition of unexplained variation
Unexplained variation includes some degree of simple measurement error, the magnitude of which can be estimated by establishing the reliability of the dependent variable. But the study of behavior potentially includes two different superordinate classes of confounding variables: (a) environmental variables and (b) organismic variables. Potential sources of environmental confounding are controlled by keeping constant all conditions other than the independent variable, and are not addressed here. In this paper we focus on organismic variables, by which we mean the myriad ways in which people differ, including physiological, emotional, personological, and cognitive. Potential confounding by such variables is routinely addressed by random assignment, and sometimes by other methods such as covariance analysis and control groups. Considering the many variables investigated by personality psychologists, social psychologists, sociologists, physiologists, and so forth, it seems self-evident that there is a multitude of organismic variables that may covary with any interesting behavior under study.
Random assignment to the experimental and control groups is intended to make the two samples probabilistically equivalent before manipulation of the independent variable. That is, the experimental and control groups are assumed to be equivalent not only on whatever organismic variables the investigator believes may influence the dependent variable, but also on all other organismic variables. Fisher put it succinctly: “Randomisation properly carried out… relieves the experimenter from the anxiety of considering and estimating the magnitude of the innumerable causes by which his data may be disturbed” (1935, p. 49). We argue that Fisher overstated the case. Instead, we claim that experimenters have little or no warrant to treat unexplained variation as random variation and to dismiss it as error.
Estimating the number of potentially confounding organismic variables
For present purposes we restrict ourselves to simple two-group, between-subjects designs with a single independent variable. Assume that an experimenter has randomly sampled 200 participants from some population, randomly assigned 100 to each of the control and experimental groups, manipulated an independent variable, measured a dependent variable, and analyzed the data via an independent-groups t test. The null hypothesis has been rejected, and a causal relationship has been inferred and discussed. We make a conditional estimate of k, the number of confounding organismic variables expected to be statistically significantly associated with the dependent variable even after randomization.
By what reasoning might we arrive at a defensible estimate of the total number, V, of organismic variables? One class of such variables is personality traits. Another class includes factors proposed to constitute intelligence. Other categories include variations in physiological characteristics, educational differences, age, interests, differential response biases, health-related factors, etc. The number of organismic variables, V, must be very large. For the sake of illustration in the argument below, let V = 1,000.
Assume the simplest case, that is, all 1,000 potentially confounding organismic variables, $X_i$, are independent of one another and, in the population, uncorrelated with $Y$, the dependent variable. For any experimental study, given that the participants have been randomly assigned to experimental and control groups, there are V sample correlations between $X_i$ and $Y$. The distribution of sample correlations is expected to be normal, with a mean of 0 and a standard error determined solely by the sample size. It follows that the expected number of statistically significant confounding organismic variables, k, in this hypothetical case, is simply
$$k = V\alpha$$
Given V = 1,000 and α = .05, then k = 1,000 × .05 = 50, and we can be certain with a confidence of .99 that the true value of k lies in the interval of
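A minimal sketch of how such an interval might be obtained, under an assumption of ours that the text does not spell out: if each of the V independent variables has probability α of reaching significance by chance, the count of significant variables is binomial, and a normal approximation gives a .99 interval around the expected count. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def significant_count_interval(v: int, alpha: float, confidence: float):
    """Expected count of chance-significant variables, with an approximate interval.

    Assumes (our assumption) that the count is Binomial(v, alpha) and uses
    the normal approximation to that binomial.
    """
    expected = v * alpha
    sd = sqrt(v * alpha * (1 - alpha))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return expected, expected - z * sd, expected + z * sd


k, low, high = significant_count_interval(v=1000, alpha=0.05, confidence=0.99)
print(f"expected k = {k:.0f}; approximate .99 interval = [{low:.1f}, {high:.1f}]")
```

Under these assumptions the interval runs from roughly 32 to roughly 68 variables, so even its lower bound leaves dozens of organismic variables significantly associated with the dependent variable by chance alone.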
The power of randomization is supposed to be such that knowledge of what the extraneous variables are is unnecessary. Yet, the possibility, or rather virtual inevitability, of confounding is not that easily dismissed. Replication may be presented as the answer to the problem explored here, as the mean effect of the extraneous variables will be zero across replications. But on any given replication, one or more extraneous variables will almost certainly have a significant correlation with the dependent variable. One may argue that multiple replications should take care of the issue, but even single replications are rare enough in most research domains in our field. As noted, the previous calculation assumes that organismic variables are independent of one another, as well as having a true effect size of zero. The independence assumption and the assumption of zero true effects of all organismic variables are, of course, unrealistic, but they are convenient in that they lay bare the argument. Relaxing those assumptions does not, in principle, weaken the argument. On the contrary, relaxing the assumption that all of the V organismic variables in the population are uncorrelated with Y increases k. Relaxing the assumption that the V organismic variables are independent of one another would, on the other hand, diminish k. But setting V = 1,000 is wildly conservative.
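The claim that chance alone yields dozens of “significant” organismic correlates on any one replication can be checked by simulation. The sketch below uses the simplest case described above: V = 1,000 mutually independent variables, each with a true correlation of zero with Y, and a sample of 200. The seed, the function name, and the use of the Fisher z approximation for the significance test are our choices for illustration only.

```python
import random
from math import atanh, sqrt


def count_chance_correlates(v: int, n: int, seed: int) -> int:
    """Count null variables whose sample correlation with Y is 'significant'.

    All v variables and Y are independent standard normal draws, so every
    population correlation is zero. Significance uses the Fisher z
    approximation with a two-sided critical value of 1.96 (alpha = .05).
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    mean_y = sum(y) / n
    dev_y = [value - mean_y for value in y]
    ss_y = sum(d * d for d in dev_y)

    hits = 0
    for _ in range(v):
        x = [rng.gauss(0, 1) for _ in range(n)]
        mean_x = sum(x) / n
        dev_x = [value - mean_x for value in x]
        ss_x = sum(d * d for d in dev_x)
        r = sum(a * b for a, b in zip(dev_x, dev_y)) / sqrt(ss_x * ss_y)
        if abs(atanh(r)) * sqrt(n - 3) > 1.96:
            hits += 1
    return hits


hits = count_chance_correlates(v=1000, n=200, seed=1)
print(f"{hits} of 1000 null variables correlate 'significantly' with Y")
```

Changing the seed stands in for running a new replication: the particular variables that reach significance change each time, but the count should stay in the neighborhood of 50.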
There is an analogous problem in typical correlational designs. Two measures of behavior are assessed on a group of people, a coefficient of correlation is calculated, and if the null hypothesis is rejected, the coefficient of correlation with its associated p value is reported, and the substantive conclusion that the variables are related is drawn. The theoretical conclusion is often framed in terms of a universal generalization, even though, as before, the residual individual differences variation taken with respect to the regression line may be far larger than the variance explained by the predictor. Again, the term “error” or “error variance” is used in such a way that the potential degree of unexplained but real individual variation is ignored. The all-or-none nature of the significance test fosters this tendency to discount or ignore unexplained individual differences, and we applaud the often recommended procedure of reporting effect sizes rather than p values, but doing so does not address the problem described as it focuses strictly on the effect.
Principles guiding the prominent representation of unexplained variation
There are five principles underlying our recommendations: (a) present the unexplained individual differences in multiple ways, graphic, numeric, and verbal; (b) stay as close to the observed data as possible; (c) represent the unexplained individual differences, if possible, on the same scale as the explained variance; (d) when feasible, report frequencies (see Gigerenzer & Hoffrage, 1995); and (e) the most fundamental principle concerning representation of unexplained variation must be that as scientists, we ought to strive to tell the whole truth. Putting the best face on results may be tempting, but it is a temptation we ought to resist. We are not advertisers, pitchmen, or politicians. Feynman’s injunctions were framed primarily in terms of alternative explanations. They apply equally to representing unexplained variation with as much prominence as we represent “systematic” behavior. Scientific psychology, if it is to fulfill its promise, must drop the convenient fiction that some large components of total variation are “error,” hence to be ignored. Again, doing so places the investigator in the position of saying that “people tend to do X” when it is also true that “people tend not to do X.”
Suggested methods of representing unexplained variation
Example 1: Pearson correlation
Consider a hypothetical study in which a correlation coefficient is calculated on paired observations from 100 participants. A statistically significant correlation of .30 emerges (p < .01), and the theoretical implications of the relationship are explored. While the discussion focuses upon the 9% of the variance explained, considerable variance is unexplained and goes unrepresented.
Consider two possible graphic representations. One, the scatterplot, has been described by Chambers, Cleveland, Kleiner, and Tukey (1983) as possibly the single most powerful statistical tool for analyzing the relationship between two variables, x and y, and goes a long way toward satisfying the principles listed above. A scatterplot should, whenever possible, be presented along with the value of r. A scatterplot visually confronts the user with the unexplained variation. Scatterplots are, of course, commonly used, but we think that they should always accompany reports of correlation coefficients. Even where large numbers of coefficients are reported, these can be shown in scatterplot matrices (Cleveland & McGill, 1988). They should certainly be presented when an r value is the central means of communicating a result. Figure 1 shows a scatterplot for r = .30, n = 100, constructed according to the recommendations of Doherty and Anderson (2009).

Figure 1. A scatterplot with n = 100 and r = .30.
Another compelling visual portrayal of the unexplained variation would be a display of three frequency distributions side by side: the first being the distribution of the measure of the personality variable, the second the distribution of the “errors,” and the third the distribution of the predicted z-scores. The three distributions shown in Figure 2 have means of 0, and standard deviations of 1.00, .96, and .30, respectively.

Figure 2. Three frequency distributions based on a correlational analysis with n = 100 and r = .30. Panel A shows the distribution of the dependent variable. Panel B shows the deviations of the $z_y$ values from the predicted $z_y$ values. Panel C shows the values of $z_y$ predicted from $z_x$. The variance of the terms in Panel B, .92, is often termed the “error variance,” whereas the variance of the terms in Panel C, .09, is often termed the “explained variance” or “common variance.”
The unexplained variation can be communicated transparently in a numerical fashion as well. It would be obvious to researchers that the proportion of variance in the personality measure not accounted for by the independent variable is $1 - r^2$, or .91. But consider the recommendation that the measure of unexplained individual differences should be on the same scale as the measure of degree of association. The statistic commonly reported ($r$) is, of course, the standard deviation of the predicted z-scores. If we report the standard deviation that takes into account the influence of the predictor, then should we not also report the standard deviation of the errors of prediction, the coefficient of alienation, $(1 - r^2)^{.5}$, which is .96 in the example at hand?
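The two standard deviations can be computed side by side, as in the sketch below. For r = .30 exactly, the unadjusted coefficient of alienation is about .954; applying an (n − 1)/(n − 2) degrees-of-freedom adjustment for n = 100 gives about .959. Whether the .96 quoted for the example reflects such an adjustment or unrounded sample values is not stated, so the second function is offered only as one possibility, and both function names are hypothetical.

```python
from math import sqrt


def alienation(r: float) -> float:
    """Coefficient of alienation: SD of the errors of prediction in z-score units."""
    return sqrt(1 - r * r)


def standard_error_of_estimate(r: float, n: int) -> float:
    """Same quantity with an (n - 1)/(n - 2) adjustment (our assumption)."""
    return sqrt((1 - r * r) * (n - 1) / (n - 2))


r, n = 0.30, 100
print(f"r (SD of predicted z-scores)        = {r:.2f}")
print(f"coefficient of alienation           = {alienation(r):.3f}")
print(f"df-adjusted SD of prediction errors = {standard_error_of_estimate(r, n):.3f}")
```

Reported together, the two numbers put the explained and unexplained portions on the same scale: a standard deviation of .30 for what the predictor accounts for, against roughly .95 for what it does not.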
Example 2: The phi coefficient and chi squared
A scatterplot shows the observations directly, but a simpler representation may also be desirable. It may be enlightening to a nonspecialist to show the data in a 2 x 2 table, with both variables dichotomized at their means, or at some other meaningful points such as cut scores, medians, etc. Table 1 shows the data from Figure 1, dichotomized at the means.
Table 1. The data from Figure 1, with the Pearson correlation, dichotomized at the means
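To give a sense of what such a table looks like, the sketch below computes the cell frequencies one would expect for r = .30 and n = 100 when both variables are split at their means. It assumes bivariate normality, which is our assumption and not a property claimed for the data in Figure 1, so the counts are expected values and not the entries of Table 1. The function names are hypothetical.

```python
from math import asin, pi, sqrt


def quadrant_proportions(rho: float):
    """Expected 2 x 2 cell proportions when both variables are split at their means.

    Assumes bivariate normality (our assumption). Returns proportions for
    (both above, x above / y below, x below / y above, both below).
    """
    same = 0.25 + asin(rho) / (2 * pi)  # both above, and equally both below
    diff = 0.5 - same                   # each of the two discordant cells
    return same, diff, diff, same


def phi(a: float, b: float, c: float, d: float) -> float:
    """Phi coefficient of a 2 x 2 table with cells a, b (top row) and c, d (bottom row)."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


n = 100
cells = quadrant_proportions(0.30)
expected_counts = [round(n * p) for p in cells]
print("expected counts:", expected_counts)
print(f"phi = {phi(*cells):.2f}")
```

Under these assumptions about 60 of 100 people fall in the two concordant cells and about 40 in the two discordant cells, which makes plain how many individuals run counter to the reported relationship.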
Example 3: A t test
Next, consider the analysis of a simple between-groups experiment with 50 participants in each of two groups. The group means are 12 and 16. The standard deviations are both 10. The t value for such a result is 2.020, which is significant with p < .05. In addition to the summary statistics routinely presented, the unexplained variation could be given more prominence by presenting the frequency distributions of the two groups on the same axis. Doing so would give the reader something that looks like the overlapping distributions of a typical figure in the theory of signal detectability. Would the suggested distribution of scores be impressively far apart, suggesting a substantial effect, or would the overlap impress the reader with the degree of unexplained individual differences? Figure 3 shows the suggested representation. Of course, the t-test data could also be represented in a striking fashion as the 2 x 2 table in Table 2, with cells being the frequencies of each of the two groups that exceed the mean of the control group.

Figure 3. The data from the hypothetical t test, but as overlapping distributions of the raw data.
Table 2. The data from the hypothetical t test, in the form of frequencies that exceed the mean of the control group
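The quantities behind these two representations can be approximated from the summary statistics alone, as in the sketch below. It treats each group as normally distributed with the stated mean and standard deviation, which is our assumption, so the overlap and the frequencies are expected values and not the entries of Figure 3 or Table 2. Computed from the rounded summary statistics, t comes out at 2.00, close to the value reported in the text.

```python
from math import sqrt
from statistics import NormalDist

n_per_group = 50
mean_control, mean_experimental, sd = 12.0, 16.0, 10.0

# Independent-groups t from the summary statistics (equal n, equal SD).
standard_error = sd * sqrt(2 / n_per_group)
t_value = (mean_experimental - mean_control) / standard_error
cohen_d = (mean_experimental - mean_control) / sd

# Normal curves standing in for the two score distributions (our assumption).
control = NormalDist(mean_control, sd)
experimental = NormalDist(mean_experimental, sd)
overlap = control.overlap(experimental)

# Expected frequencies above the control-group mean in each group.
above_control = round(n_per_group * (1 - control.cdf(mean_control)))
above_experimental = round(n_per_group * (1 - experimental.cdf(mean_control)))

print(f"t = {t_value:.2f}, d = {cohen_d:.2f}, overlap = {overlap:.2f}")
print(f"above control mean: control {above_control}, experimental {above_experimental}")
```

On these assumptions the two distributions share about 84% of their area, and about 33 of 50 experimental participants exceed the control mean compared with 25 of 50 control participants. A statistically significant effect therefore coexists with a large number of individuals who do not show it.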
Example 4: ANOVA
The kind of representation shown in Figure 3 would not work well for multigroup experiments. The figure would be too complicated to portray the data well. Consider a hypothetical experiment, in an independent-groups design with four levels of an independent variable. Multiple-group data for a one-way ANOVA with an interval-level independent variable are often presented on a graph, with the group means connected by a statistically derived function. If the independent variable is nominal or ordinal, the abscissa will have arbitrary spacing. The variation is represented by error bars that are almost always standard errors of the mean. In order to read the differences between adjacent means on the dependent variable, we simply check the values on the ordinate because the mean differences are on the raw score scale, but the unexplained individual differences are represented as the standard errors of the mean, which can be, with a large number of df, minuscule in spite of substantial variation. Such a practice faithfully reflects the analysis, but the data would be much more faithfully reflected if the unexplained individual differences were represented by error bars in standard deviations, or even by representations of the distribution of observations, as shown in Figure 4, rather than by standard errors.

Figure 4. The results of an ANOVA showing the individual observations instead of the usual standard error bars.
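The contrast between the two kinds of error bar is easy to quantify. The sketch below uses a hypothetical within-group standard deviation of 10 and a few hypothetical group sizes to show how the standard error of the mean shrinks as the group size grows while the variation among individuals stays the same.

```python
from math import sqrt


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of the mean for a group of n observations."""
    return sd / sqrt(n)


sd = 10.0  # hypothetical within-group standard deviation
for n in (25, 100, 400):
    sem = standard_error_of_mean(sd, n)
    print(f"n = {n:3d}: SD bar = ±{sd:.1f}, SE bar = ±{sem:.1f}")
```

With 400 observations per group, the standard error bar is one twentieth the length of the standard deviation bar, although the people in that group differ from one another exactly as much as before.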
The potential payoff of exploring unexplained variation
Early in this paper we quoted Meehl to the effect that after a period of enthusiasm, there often comes “a period of attempted application to several fact domains, a period of disillusionment as the negative data come in, a growing bafflement about inconsistent and unpredictable empirical results, multiple resort to ad hoc excuses, and then finally people just sort of lose interest in the thing and pursue other endeavors” (1978, p. 807).
An example of a contemporary enthusiasm is a very popular, decades-old educational practice: sorting children according to learning styles. Pashler, McDaniel, Rohrer, and Bjork’s (2009) review of the literature showed that this practice has little or no empirical warrant, in spite of the enormous resources that have been expended thereon. Even in the light of such a devastating critique, it may be optimistic to assume that educators will lose interest and pursue scientifically supported interventions. We propose that those initial rushes of enthusiasm for what turn out to be costly blind alleys might well be tempered if the investigators were to describe their results with as much focus on unexplained variation as on the effect. Would those rushes of enthusiasm be leavened by the doses of reality administered by, for example, error bars in standard deviations rather than standard errors? By scatterplots and coefficients of alienation as well as correlation coefficients? Should not researchers of all sorts provide those who use the research with all of the information needed to judge the value of the research?
Our call for the prominent representation and exploration of unexplained variation is related to a criticism voiced by Campbell (1990; see also McGuire, 1986, 1989). Campbell’s misgivings were in a commentary on Meehl (1990), in which he lamented the practice of reporting only results that had resulted in rejection of the null hypothesis. These authors also identified the hypothesis testing ideology as the root cause of selective reporting that ignores unexplained variation. The suppression of pilot-study evidence of contexts where the theory does not hold is unexplained variation in a different context than that which we address in this paper, but it is nevertheless unexplained variation.
A foundation for a theory of the behavior assessed by the dependent variable
What do we mean by exploring unexplained variation? The term has a well-established statistical connotation, but what, specifically, does the recommendation to explore it mean? Recall that we argued above that the study of behavior potentially includes two different classes of confounding variables. This proposal calls for the discussion sections of empirical papers to include a subsection that presents two lists, with brief discussions of the items listed in each. The first list would deal with the potentially confounding organismic variables that are the main focus of this paper. The proposal is that authors list and discuss organismic variables other than the ones that might have been assessed for a covariance analysis—that is, those unknown variables that were presumptively ruled out by random assignment, but that are known from other sources to be significant sources of variation in the dependent variable, or dependent variables closely related to dependent variable(s) in the study. Random assignment, as crucial as it may be, does not preclude unknown organismic variables from being correlated with either the independent variable, the dependent variable, or both. Such a list would be an attempt to account for the unknown variation in the study involved, as measured by the usual statistical yardsticks, and would serve to strengthen the empirical foundation for future theorizing about the behavior in question.
The second list would explore all independent variables, other than those manipulated in an experimental investigation that are known by the investigator to influence the dependent variable. Potential contributions of such variables to the unexplained variation are, in principle, ruled out by proper experimental controls, or perhaps statistically via covariance analysis. But such a subsection in a discussion section would deal, however incompletely and imperfectly, with unexplained variation in a larger ecological sense. That is, it would be relevant to the variation in behavior not associated with the independent variable in the wide variety of other contexts in which the behavior naturally occurs. An investigator publishing a paper likely knows far more about the various influences on the behavior represented by the dependent variable than many of the readers of that paper. The two subsections of the discussion suggested before are vehicles for communicating that knowledge.
Perhaps the most compelling argument for what we propose is simply that good scientific practice involves, in the words of Richard Feynman, “trying to give all of the information to help others to judge the value of your contribution” (1985, p. 341).
Appendix
In order to illustrate more concretely what we mean by exploring unexplained variation, we comment on two investigations. We have selected two investigations, each coauthored by renowned investigators. We argue that had the authors given serious consideration to the unexplained variation, the papers could have had more value to the scientific community. One case is from the judgment and decision-making literature and is, we think, relatively noncontroversial. The second is a recent, highly controversial paper on parapsychology.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Michael E. Doherty, Professor Emeritus of Psychology at Bowling Green State University, obtained his PhD from the University of Connecticut in 1965. His early research involved Bayesian modeling in perception, after which he devoted his professional career to research in human inference, judgment and decision making, and the psychology of science, the latter with Clifford Mynatt and Ryan Tweney. Recent papers with Richard Anderson dealt with signal detection approaches to inferences about correlation. An early book (Asking Questions About Behavior: An Introduction to What Psychologists Do) with Kenneth Shemberg was aimed at introductory psychology students. Address: Department of Psychology, Bowling Green State University, Bowling Green, OH 43403, USA.
Kenneth M. Shemberg, Professor of Psychology at Bowling Green State University, obtained his PhD from the University of Nebraska in 1966. His early writing and research was on the treatment of outpatient psychotics, and on professional issues. Collaboration was often with Donald Leventhal and Stuart Keeley. His most recent professional involvement is in the teaching and supervision of doctoral students in the Psychological Services Center at Bowling Green State University’s Department of Psychology. The book Asking Questions About Behavior: An Introduction to What Psychologists Do, aimed at introductory psychology students, was written with Michael E. Doherty. Address: Department of Psychology, Bowling Green State University, Bowling Green, OH 43403, USA.
Richard B. Anderson is an Associate Professor of Psychology at Bowling Green State University. He obtained his PhD from the Pennsylvania State University in 1992. He currently teaches graduate and undergraduate courses in cognitive psychology and in statistics, as well as an undergraduate course in general psychology. His research, some of which has been funded by the National Science Foundation, pertains to intuitive statistical judgment and reasoning, memory, and cognitive simulation. Address: Department of Psychology, Bowling Green State University, Bowling Green, OH 43403, USA.
Ryan D. Tweney is Professor Emeritus of Psychology at Bowling Green State University, where he has been since receiving his PhD from Wayne State University in 1970. He is the coeditor (with Michael E. Doherty and Clifford R. Mynatt) of On Scientific Thinking (1981). His historical-cognitive research on scientific thinking has resulted in papers and books on the experimental researches of Michael Faraday. Currently, he is working on the metaphoric underpinnings of mathematical physics, centering on the thinking of James Clerk Maxwell. Address: Department of Psychology, Bowling Green State University, Bowling Green, OH 43403, USA.
