Abstract
I suggest that the recent, highly visible, and often heated debate over failures to replicate results in the social sciences reveals more than the need for greater attention to the pragmatics and value of empirical falsification. It is also a symptom of a serious issue—the under-developed state of theory in many areas of psychology. While I focus on the phenomenon of “social priming”—since it figures centrally in current debate—it is not the only area of psychological inquiry to which my critique applies. I first discuss some of the key issues in the “social priming” debate and then attempt to show that many of the problems thus far identified are traceable to a lack of specificity of theory. Finally, I hint at the possibility that adherence to the materialist tenets of modern psychological theory may have a limiting effect on our full appreciation of the phenomena under scrutiny.
During the past decade the discipline of psychology has experienced several blows to its collective self-esteem. Some of these are rather serious—involving accusations of fraud—that have resulted in more than 60 articles being retracted from respected journals such as Cognition, the Journal of Personality and Social Psychology, and Science.
Less serious than data manipulation, but still discomforting, has been the inability of several well-known social psychological findings to withstand the scrutiny of replication. In particular, work by Bargh and colleagues (Bargh, Chen, & Burrows, 1996; Williams & Bargh, 2008) and Dijksterhuis and van Knippenberg (1998) on the phenomenon of social priming has been taken to task due to the failure of a number of labs to reproduce published effects (Doyen, Klein, Pichon, & Cleeremans, 2012; Nieuwenstein & van Rijn, 2012; Pashler, Coburn, & Harris, 2012; Shanks et al., 2013). Social priming refers to performance on tasks in which a stimulus—presented (often, though not always) in a manner falling outside the participant’s awareness—subsequently is found to affect his or her behavior in a manner that rationally can be tied to information embodied in the stimulus.
Replication failure, of course, is an accepted and anticipated aspect of scientific inquiry (e.g., Godfrey-Smith, 2003; Kuhn, 1962; Ladyman, 2002; Popper, 1963/2004; Trusted, 1979). In light of this, the reaction of the psychological community appears somewhat disproportionate to the severity of the assumed transgression: articles and editorials addressing the failed replications have appeared in the popular media (e.g., The New Yorker; Psychology Today) and more scholarly venues (e.g., Nature; The Chronicle of Higher Education; Science News); academic journals have issued responses and suggested remedies (e.g., PLOS ONE; The Psychologist); there have been calls for more studies devoted to replication (e.g., the Reproducibility Project), and the creation of outlets for reporting findings (e.g., Perspectives on Psychological Science). In addition, numerous psychology blogs have made it a point to reassure their readers that replication failure is a normal part of scientific practice (e.g., Discovery Magazine; Evolutionary Psychology; Social Brain, Social Mind)—something that, one would think, is already common knowledge among their audience.
I certainly applaud a strong interest in, and concerted effort toward, greater attention to issues of replicability. Increased concern with such things as the “file drawer” problem, the need for accessible databases, more detailed recounting of experimental method, clear enumeration of the criteria for a competently conducted replication study, the establishment of venues whose primary mission is to support such studies, and the like, are all positive developments. To the extent that replication studies have taken a back seat in the pursuit of psychological knowledge (and they have—e.g., the Journal of Personality and Social Psychology has a policy of refusing to publish replication studies; E. Smith, personal communication, January 20, 2013), renewed appreciation of their importance is a good thing.
Is that all there is?
Not surprisingly, every one of the blogs, articles, and editorials cited above has identified as the remedy for concerns about data reliability a need to accord replication studies a more valued status in psychological research. However, I do not believe this prescription goes far enough. Anxiety over the lack of appreciation of the merits of replication (and the need to redress this oversight) does not, by itself, seem sufficient to account for the intensity and acrimony of the debate between key players (one blogger characterized the response from an author whose data failed a replication attempt as a “scathing personal attack”; Yong, 2012a).
One explanation for the passionate, and occasionally accusatory, nature of the expressed concerns is that social scientists are worried that the public will conflate replication failure with data fraud, thereby calling into question the scientific legitimacy of the psychological enterprise. In support of this, an open email (9/26/2012) to the psychological community from Nobel Laureate Daniel Kahneman argues that social psychology is now seen as “the poster child for doubts about the integrity of psychological research” and that there will be a “train wreck looming” unless psychologists take immediate steps to clean up their act. Kahneman offers several correctives he hopes can resolve the perceived replicability crisis and “rehabilitate the field” (Yong, 2012b).
However, both the problem and its resolution, as I see it, require more than increased attention to conditions that enable proficient replication. The impassioned quality of the disputes between the authors whose findings are in question and those unable to reproduce them suggests more than normal, justifiable interest in the logic and practice of scientific falsification is at work. In my opinion, much of the tension surrounding concern over the pragmatics of falsification is symptomatic of a far deeper, but largely unidentified problem—i.e., the failure of much of psychological theory to attain the standards characterizing theory in the non-social sciences.
The relation between theory and experiment: What is being replicated?
A well-conducted replication requires, at a minimum, that the essential conditions of the study match those of the to-be-replicated study as closely as possible (e.g., Brunswik, 1947/1956). Methodological deviations can make all the difference between success and failure to replicate. But what determines what counts as an essential condition? The answer is simple and direct—it is the function of theory to enumerate those conditions (e.g., Brunswik, 1947/1956; Fodor, 1968; Hanson, 1958; Popper, 1963/2004; Trusted, 1979). Fodor (1968) makes the point, using as his example the relation between the theory of projectile motion and its experimental instantiation:
It would be hoped that sufficient information about initial states, together with a viable theory of actions, would, in principle, permit the theorist to compute the pattern of motions that will realize a given action on a given occasion. The fact that under different conditions the same action may be realized as different patterns of motion is irrelevant to the feasibility of this goal, just as the fact that under different initial conditions the same interaction of forces determines different trajectories is irrelevant to the feasibility of constructing a theory of mechanics. (p. 43)
Thus, the class of essential conditions required for a successful, quantifiably predictable test of a scientific theory is specified by abstract principles embodied in theory. Other causally potent, but conceptually extraneous factors (a potentially infinite set) typically can be reasonably well controlled via the random assignment of representative samples to experimental conditions.
In the social priming debate, by contrast, successful replication appears to pivot precariously on the reinstatement of such factors (for a more detailed enumeration, see below) as whether or not a cubicle was used for testing, the type of technology employed to time participants’ movements, etc.—i.e., aspects of testing that, on the surface, seem far removed from the set of theoretically mandated “variables of interest.” Failed attempts to reproduce social priming results are met with accusations that seem to imply that almost any deviation from the original protocol can be responsible for causing the anticipated effect to wither and die, rather than simply alter in a theoretically predictable manner. That is, the predicted outcome appears to be all or none rather than principled variation in outcome.
While changes in conditions can—and should—be expected to affect an experimental outcome, they should do so in ways dictated by theory. They should not simply banish the experimental outcome to the realm of interpretive oblivion. A well-formulated scientific theory makes explicit the conditions for its own empirical evaluation. For example, with regard to effects of age-related primes on rate of walking (Bargh et al., 1996), a theory of social priming should clearly state how the effects of age-related primes on bodily movement will vary with parametric changes in causally relevant essential factors. Unfortunately, this is not what we see in the social psychological “replication wars.” Rather, the key players, citing the volatility of social priming, argue that even small changes in experimental conditions can eliminate the phenomenon.
A poorly designed experiment cannot reasonably be expected to recapture the effects under scrutiny. But the conditions that constitute a “poorly designed” social priming study seem unusually broad by scientific standards. For example, Dijksterhuis (2013) argues that factors such as the failure to conduct testing in cubicles, the number of experimental conditions being examined, the number of participants tested, the method of participant recruitment, the degree of heterogeneity permissible as a function of sample size, presumptions about what participants may or may not have believed, whether undergraduate or graduate students conducted the study, etc., are sufficient to obliterate the effect. Concessions to the extreme sensitivity of method are also voiced by Bargh (2012), who notes among the culprits the number of age-relevant words employed in the priming task, whether participants are given explicit or implicit directions where to walk (one study examined changes in participants’ walking speed as a function of age-related primes), the technology used to time their rate of walking, etc. In short, the binary opposition of “phenomenon present or phenomenon absent” appears to be the level of predictive certitude that theories of social priming can, at present, comfortably accommodate.
Of course, the conditions of replication can derive from pragmatic as well as theoretical considerations. The former can be reasonably addressed by (a) simple logic (e.g., visually impaired individuals should not take part in visual perception studies, participants should not be told the experimental hypothesis in advance of the study, etc.) or, when not so easily identified, (b) random assignment. The latter (i.e., theoretical) considerations, by contrast, are accommodated by the formal features of the theory: changes in theoretically specified factors should lead to predictable, measurable alterations in experimental outcome (only one of which is its complete elimination).
Can we construct a well-specified scientific theory from a set of binary oppositions?
In what follows I focus on social theory that fails to adhere to the tenets of the hypothetico-deductive model of scientific inquiry. (It is also the case that some replication failures concern phenomena which are described outside any well-specified theoretical framework. I do not explicitly address these phenomena; however, they too, of logical necessity, fall victim to issues of theoretical under-specification that contribute to replication failures.)
An abstract theory is, of necessity, tested by a particular set of empirical tasks. Hopefully those tasks capture the essential components embodied in the theory (e.g., Brunswik, 1947/1956). However, when the criteria for a properly conducted test are stipulated to include precise reinstatement of all (or most) task conditions, it almost seems as though it is the task, not the theory that is undergoing evaluation.
But is this necessarily a bad thing? Perhaps such an approach might ultimately lead, via a steady accumulation of outcomes sharing a folk-psychological family resemblance, to a set of formal principles enabling precise quantitative prediction. Perhaps a nomological network (e.g., Margenau, 1950) can be pieced together from the outcomes of individual studies.
Unfortunately, in advance of the establishment of a set of computational principles capable of predicting how specific variations in initial conditions will affect task performance, it is hard to imagine that predictive sophistication can easily transcend the binary opposition of “effect present” or “effect absent.” As Newell observed 40 years ago:
As I examine the fate of our oppositions, looking at those already in existence as a guide to how they fare and shape the course of science, it seems to me that clarity is never achieved. Matters simply become muddier and muddier as we go down through time. Thus far from providing the rungs of a ladder by which psychology gradually climbs to clarity, this form of conceptual structure leads rather to an ever increasing pile of issues, which we weary of or become diverted from, but never settle. (1973, pp. 288–289)
He continues: “We never seem in the experimental literature to put the results of all the experiments together” (p. 298).
Theories are not static; they grow and flourish, or they are cast aside, as new data confirm or disconfirm their core principles (e.g., Heisenberg, 1958/1999; Kuhn, 1962; Ladyman, 2002; Margenau, 1961; Popper, 1963/2004). Moreover, theories have to come from somewhere: they are not created ex nihilo. In the non-social sciences, deductive principles (e.g., Margenau, 1950; Trusted, 1991) as well as the accumulation of experimental data conjoined with inductive inference (e.g., Medawar, 1968/1980) provide the grist for construction of theoretical propositions capable of supporting parametric predictions that go well beyond binary oppositions.
This is observed in psychological science as well, particularly, but not limited to, areas such as psychophysics, perception, memory research, language, and problem solving. One thing these areas of inquiry have in common is a tendency to focus on conceptual quantification while either ignoring or downplaying the individual’s phenomenology (though, as I have argued with respect to memory research, this may be a serious mistake; Klein, 2013, 2014). A person’s experiences are (hopefully) recognized as being involved—in some way—with the acts being measured; but his or her experiences, per se, are assumed either to be epiphenomenal (i.e., mere concomitants with no causal potency), or only indirectly relevant to the attainment of a quantifiable set of nomological principles.
However, as I touch on in the final section of this paper, while this approach to theory construction may be “workable” in some areas of psychological inquiry, it comes at a cost. It relegates first-person experience to a second-class citizenship whose particulars can be largely ignored in the pursuit of the construction of a scientific theory of psychological phenomena.
Thus, while certain domains of psychological research can, and do, rely on the outcomes of individual studies for theory construction, they often do so by forgoing detailed analysis of personal phenomenology. And, a science of the mind that removes mind from empirical consideration runs the serious risk of obscuring the very issues that should be of focal concern in the psychological enterprise (e.g., James, 1890; for extensive, recent discussion see Klein, 2014). To the extent that these issues (i.e., mental constructs) are accorded empirical attention, the difficulty (perhaps the impossibility; e.g., Klein, 2014; Martin, 2008; Nagel, 2012; Varela, Thompson, & Rosch, 1993; Wallace, 2003) of reducing them to a material instantiation means they are most often examined via binary oppositions rather than quantitative manipulation. And, as Newell (1973) and others have argued, this does not bode well for theory construction (but see Kosslyn, 2006, for a different perspective).
Theory serves a directive and interpretive function: it identifies the essential conditions that enable prediction of a range of effects, and it affords meaning to those outcomes. It is understood that factors not dictated by theory (e.g., participant expectancies, sample size, ambient noise, time of day) can, and do, influence test results—hence the need for random assignment of representative samples from the population(s) being tested. But a theory whose predicted outcomes vanish with slight deviations in task construction—e.g., presence or absence of cubicles, academic pedigree of the experimenter, etc.—hints strongly that the token (task), not the type (theory), is being subject to empirical scrutiny.
To put it slightly differently, as the number of permissible variations in experimental conditions approaches zero, we have reason to be concerned that it is the task—not the theory for which the task was pressed into service—that is being evaluated. When the test of a “theory” admits only (or most often) to the binary outcome of “present or absent” the theory becomes conceptually inseparable from the specific task used in its evaluation.
In summary, I have no doubt that the complete reinstatement of experimental conditions will ensure a successful replication of a task’s outcome. How could it be otherwise? Such attention to detail logically guarantees the reemergence of the specific cause/effect relation.
But if this is the requirement for a successful replication, the question must be asked “What is the cause in service of?” The answer, of course, is the effect. But then the question becomes “And what is the effect in reference to?” The answer is (or should be) the theory that predicted the obtained outcome. But if the very presence of a cause/effect relation (rather than predictable alterations in the strength or quality of that relation) depends on exact (or nearly exact) reproduction of task details (some of which have no obvious connection to an underlying theory), then the effect appears to be a task-specific outcome rather than a theory-based prediction. This, in turn, suggests either (a) there is no actual theory being tested; rather there is, at most, a taxonomy consisting of similarly classifiable phenomena in search of a well-specified theory or (b) if a theory exists, it is so lacking in predictive power and generalizability that its utility for understanding of the phenomena at hand must be seriously questioned.
Psychological research and scientific theory
The field of psychology is awash in data. What is often missing, however, are overarching, computationally specified theories by which the data can attain conceptual relevance. Fodor frames the problem in the following way:
Psychological metatheory has remained seriously underdeveloped … a psychologist is likely to appeal his decisions about research strategies directly to general methodological principles to an extent to which a physicist or chemist does not … a consequence of the unsettled state of psychological metatheory is thus that schools of psychology are distinguished as much by the kinds of experiments that their adherents typically perform as by the theories they espouse. (1968, pp. xiv–xv)
Wittgenstein sees things similarly:
The confusion and barrenness of psychology is not to be explained by its being a “young science”; its state is not comparable with that of physics, for instance, in its beginnings … For in psychology, there are experimental methods and conceptual confusion. The existence of the experimental method makes us think that we have the means of getting rid of the problems which trouble us; but problem and method pass one another by. (1953/2009, A Fragment XIV, 371)
In short, social science often seems to be lacking scientifically credible nomological networks (e.g., Cronbach & Meehl, 1955; Margenau, 1950; Torgerson, 1958)—that is, theoretical devices capable of clearly linking physical observation to a well-formulated, conceptually sophisticated, and rationally integrated set of abstract constructs—thereby enabling computationally rigorous predictions (as well as conceptually satisfying explanations). Absent such a guide, we have no way of knowing whether earlier studies are commensurate with, or antithetical to, whatever studies are presently under examination (e.g., Newell, 1973).
This evaluation is not meant as an indictment of the entire field of psychology. There are a number of sub-disciplines in which sophisticated theoretical propositions specify the links between constructs and observation, thus providing a clear direction to empiricism. The fields of perception, learning, language, and problem solving are obvious examples. But the question then arises “what is it that we are attempting to explain?” (e.g., Klein, 2014). To what do our theoretical findings refer? Can we explain what needs to be explained by embracing the modern scientific world view, and attempting, in a somewhat procrustean fashion, to map “psychological reality” onto an exclusively materialist picture of nature?
The possibility of a scientific psychology: Some concerns about the feasibility of a quantitative materialist reduction as the modus operandi of psychological inquiry
An important, yet often overlooked, question is whether such quantitatively precise formulation is necessarily a good thing for the field of psychology, taken in toto. The criticisms I have offered are directed at research that trades on the idea that current scientific formalism is the ideal toward which psychology should aspire. For such endeavors, it is appropriate to hold investigators to the standards of their adopted paradigmatic approach.
But a broader question is whether a quantifiable, reductionist approach based in materialist ideology is the proper venue for psychological research. In what follows I briefly touch on some of my concerns. I realize I appear to simultaneously be (a) complaining about the relative paucity of scientific theory in the psychological sciences, but, when examples are presented (b) questioning whether they are appropriate ways of acquiring psychological knowledge. I can only beg the readers’ indulgence. In this section I merely hint at reasons for my equivocation, after which the reader can decide if my concerns merit additional consideration (a fuller treatment is provided in Klein, 2014).
Thus far I have argued that much (though certainly not all!) of psychological theory has failed in its attempt to more closely approximate theory in more mature sciences (i.e., the usual suspects such as chemistry and physics). One reason is that researchers, particularly in certain areas of psychology, have found it both financially and psychically rewarding to produce “sexy” findings. This practice often consists in one-off demonstrations of curious relations between such unlikely variables as environmental disorder and discriminatory tendencies (Stapel & Lindenberg, 2011) or physical height and virtuous acts (Sanna, Chang, Miceli, & Lundberg, 2011). Such catchy, media-friendly findings are quick to grab the public’s attention, thereby garnering their authors acclaim as experts on mind and behavior. Unfortunately, such studies, more often than not, are in the service of the phenomenon rather than theory (though the phenomenon may have a tenuous relation to some vaguely specified theoretical consideration) and thus have little to say about the complex mechanisms and processes that might underlie such flashy demonstrations.
However, this is a small part of the problem. A much larger issue, as I see it, stems from the nearly universal adoption of Western science’s presumption that reality, in its entirety, must be composed of quantifiable, material substances. Galileo captured this sentiment at the dawn of modern science with his famous dictum that anything not involving the study of the quantifiable properties of material bodies does not deserve to be called a science (similar sentiments are found at least as far back as the writings of Pythagoras; e.g., Koestler, 1989). While new and more sophisticated versions of the materialist metaphysic have been proposed since Galileo (e.g., Bunge, 2010; Churchland, 1986; Kim, 1998; Melnyk, 2003), the notion of materialism he espoused still holds sway among both physical and behavioral scientists (e.g., Klein, 2014; Koons & Bealer, 2010).
In accordance with the materialist manifesto, science is not science unless it involves the quantitative treatment of material reality. Many great advances have been made by expressing reality in terms of mathematically formulated physical laws (for discussion, see Elvee, 1992; Hanson, 1958; Ladyman, 2002; Margenau, 1950; Rescher, 1984, 1996; Spencer Brown, 1957). Measurements and equations are held to sharpen our thinking. And indeed they have done so in those areas of psychology in which a materialist reduction “appears” to make sense—e.g., neural computation, and similar domains of inquiry in which experience can be accorded a trivial role, or eliminated from consideration with “apparent” impunity.
It is clear that materialist doctrine lies at the center of Western thought (e.g., Papa-Grimaldi, 2010). Although psychology’s scientific objects often are functional categories, it is likely that the majority of psychologists are not even aware of the materialist assumptions underlying their conceptual categories (e.g., psycho-neural identity theory; Place, 1956). Many psychological phenomena can be classified as objects that are, in some way, tied to the material aspects of reality even if they are not in themselves material (for recent discussion, see Klein, 2014).
But, is materialist reduction appropriate for all of psychology? It is undeniable that many, if not all, of the great achievements in modern science were made possible by the exclusion of “mind” from the world around us. However, as Nagel (2012) points out, at some point “it will be necessary to make a new start on a more comprehensive understanding (of reality) [parenthesis added] that includes the mind” (p. 8). Not all aspects of reality can be restricted or reduced to quantifiable, material facts. Adopting the idealized, quantitative formalizations, such as those of mathematical physics, as a model for the study of human experience does not, and cannot (at least at present), adequately capture the richness of human phenomenology (e.g., Earle, 1955, 1972; Gallagher & Zahavi, 2008; Klein, 2014; Meixner, 2005; Tulving & Szpunar, 2012; Varela et al., 1993; Wallace, 2003).
In short, to maintain that all reality can be captured by a single set of methods (e.g., those of current science) is to maintain that reality consists in its entirety of objects, processes, systems, and relations, i.e., those aspects capable of being grasped by a particular set of methodologies and theoretical assumptions. But as we currently have no way of surveying the whole of reality, this amounts to little more than metaphysical dogma.
Further discussion of this terribly important issue would, I fear, take us far afield from the main purpose of this paper—i.e., consideration of the possibility that replication failures in the social sciences stem not only from the normal issues attending scientific research; they also reflect a lack of theoretical specification in many areas of psychological research. For readers interested in a treatment of the role of the materialist dogma in psychological inquiry (the libraries are full of such discussions), see Klein (2014) for some discussion and possible direction.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
