Abstract
In their critique of Klein (2014b), Trafimow and Earp present two theses. First, they argue that, contra Klein, a well-specified theory is not a necessary condition for successful replication. Second, they contend that even when there is a well-specified theory, replication depends more on auxiliary assumptions than on theory proper. I take issue with both claims, arguing that (a) their first thesis confuses a material conditional (what I said) with a modal claim (Trafimow and Earp’s misreading of what I said) and (b) their second thesis has the unfortunate consequence of refuting their first thesis.
In their critique of Klein (2014b), Trafimow and Earp (2016) identify two points of concern: “First, as we have shown, having a well-specified theory is not a prerequisite [emphasis added] for having replicable findings; hence the blame for apparent replication failures should not be placed upon ill-specified theories. And second, when there is a relevant [emphasis added] theory, experimental predictions depend much more strongly than Klein (2014) seems to appreciate on auxiliary assumptions, as opposed to on the theory proper” (Trafimow & Earp, 2016, p. 545).
These are curious concerns. Well-formed theories specify the conditions of their falsification as well as criteria for determining the empirical relevance of many—but not all (see Note 6)—auxiliary assumptions. Thus, it is a truism that, absent such a theory, an investigator cannot be sure of what is to be replicated (e.g., Newell, 1973).
Rather than restate my thesis (Klein, 2014b), I focus herein on Trafimow and Earp’s two points of concern. I argue that the first (see first section) confuses a material conditional (what I said) with a modal claim (Trafimow and Earp’s misreading of what I said). The second (see second section) has the unfortunate consequence of refuting their first thesis. I discuss each in turn.
Erecting and tearing down a straw man
Klein (2014b) took the position that psychology generally, and social psychology in particular, lack well-specified scientific theories. One consequence of this, I argued, is that in the absence of well-specified scientific theory, replication attempts can be seriously compromised.
Since words matter when philosophical argument takes the form of verbal statements (as opposed to mathematical or logical formalisms), I took care to state what I meant by a “well-specified theory.” Drawing largely on work by Margenau (1950), Fodor (1968), and Torgerson (1958), I defined “well-specified theory” as consisting in a set of propositions “capable of clearly linking physical observation to a well-formulated, conceptually sophisticated, and rationally integrated set of abstract constructs—thereby enabling computationally rigorous predictions (as well as conceptually satisfying explanations).” Continuing, “Absent such a guide, we have no way of knowing whether earlier studies are commensurate with, or antithetical to, whatever studies are presently under examination” (Klein, 2014b, p. 332).
I then discussed the paucity of theories in the social priming literature (the major battleground in the current psychological “replication wars”—although I made clear that my argument was not restricted to this particular domain) that meet these criteria. For example, since most social priming theories only permit deduction of the binary outcome “effect present/effect absent,” they fail to provide the predictive precision expected from a well-specified scientific theory.
Despite my care to avoid claims about necessity (or sufficiency), Trafimow and Earp (2016) saddle my definition of well-specified theory with an implicational structure that goes well beyond what I stated. Specifically, they assert that according to Klein (2014b) such theory is a necessary condition for the proper conduct of a replication effort: “a well-specified theory is not a prerequisite [emphasis added] for having replicable findings” (Trafimow & Earp, 2016, p. 545).
Having effected this conceptual makeover (i.e., insertion of an “if and only if” condition), they take aim at and refute their self-generated straw man by citing examples of successful replications unaccompanied by well-specified theory (but see the subsection “Are Trafimow and Earp’s examples counterexamples?”): “as we have shown, having a well-specified theory is not a prerequisite for having replicable findings; hence the blame for apparent replication failures should not be placed upon ill-specified theories” (Trafimow & Earp, 2016, p. 545).
For the sake of argument, let’s accept Trafimow and Earp’s claim that Klein (2014b) argues a well-specified scientific theory is necessary for successful replication. 1 A necessary condition is a prerequisite either as a formal or as an informal axiom. The former case yields tautologies, as in the axiom of equality (X = X). The latter is axiomatic by way of self-evident implicature: theoretical condition X implies observation Y. Trafimow and Earp thus commit the elementary mistake of confusing the replication of an effect with the replication of the theory-driven method whereby the effect is brought about.
But this is moot, since Klein (2014b) made no claims about necessity (or sufficiency). I argued only that theoretical specification provides guidance (see the above quote from Klein, 2014b, p. 332) in identifying conditions essential for a replication effort, not that well-specified theory is essential to identifying those conditions. Specifically, “A well-conducted replication requires, at a minimum, that the essential conditions of the study match those of the to-be-replicated study as closely as possible” (Klein, 2014b, p. 328). 2 I continued: “the class of essential conditions required for a successful, quantifiably predictable test of a scientific theory is [i.e., can be, not “only can be”] specified by abstract principles embodied in theory” (Klein, 2014b, p. 328).
Thus, while I accord theoretical specificity an important role in identifying conditions essential for a properly conducted replication effort, I never claim that theory plays an essential role. A well-specified theory is a potent—but not the sole—means by which one can re-instate the essential conditions for a replication attempt.
More formally, Trafimow and Earp (2016) maintain that their counterexamples show that -P (poorly-specified theory) can be accompanied by Q (successful replication). From this they infer that P (well-specified theory) is not necessary for Q. But a defective theory implies nothing! 3 Moreover, the truth or falsity of their conclusion is irrelevant to my stated position.
Nor does Trafimow and Earp’s creative reformulation warrant inferences about logical sufficiency. A reasonable reading of their argument is that counterexamples demonstrate that -P (poorly-specified theory) is not sufficient for -Q (unsuccessful replication). From this they infer that -P cannot cause -Q. But this follows only if they also assume: if -P is not sufficient for -Q, then -P cannot cause -Q. And it’s clear why we should reject this assumption: it is an instance of the general, but fallacious, principle that one can infer that -P doesn’t cause -Q from the fact that -P is not sufficient for -Q (e.g., an event often has multiple causes, none of which is sufficient on its own to bring about the event).
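The necessity and sufficiency readings at issue here can be laid out as a small truth table. What follows is a minimal sketch, not part of the original argument: it assumes P stands for "well-specified theory" and Q for "successful replication", and the function names are hypothetical. It shows only that a single case of -P together with Q falsifies "P is necessary for Q" and, equivalently, "-P is sufficient for -Q". It is silent on whether -P can contribute causally to -Q.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b

def p_necessary_for_q(p: bool, q: bool) -> bool:
    # "P is necessary for Q", read as: Q implies P.
    return implies(q, p)

def not_p_sufficient_for_not_q(p: bool, q: bool) -> bool:
    # "-P is sufficient for -Q", read as: not-P implies not-Q.
    return implies(not p, not q)

# All four combinations of truth values for (P, Q).
rows = list(product([True, False], repeat=2))

# The two readings are contrapositives, so they agree on every row.
equivalent = all(
    p_necessary_for_q(p, q) == not_p_sufficient_for_not_q(p, q)
    for p, q in rows
)

# The only row that falsifies either reading is -P together with Q.
falsifying_rows = [(p, q) for p, q in rows if not p_necessary_for_q(p, q)]

print(equivalent)       # True
print(falsifying_rows)  # [(False, True)]
```

Because the table contains nothing but truth values, it cannot represent causal contribution at all, which is why the step from "not sufficient" to "cannot cause" needs the extra assumption rejected above.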
Indeed, it is easy to fashion cases in which X causes Y without X being either a necessary or a sufficient condition for Y. For example, the finding that some heavy smokers (X) enjoy healthy lives (Y) does not sanction the conclusion that X can have no causal role in one’s health (Y). Or, borrowing an example from Laudan (1990), because surgery (X) is not always necessary or sufficient to cure gallstones (Y), it does not follow that surgery is never a useful means for treating gallstones.
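The smoking example can be put in numerical form. The sketch below uses invented counts that are purely hypothetical (they are not data from any study), and the variable names are likewise assumptions. It is meant only to show that a factor can fail to be necessary and fail to be sufficient for an outcome while the outcome's frequency still differs with the factor. A difference in frequency does not by itself establish causation; the point is simply that the failure of necessity and sufficiency does not rule a causal role out.

```python
from fractions import Fraction

# Hypothetical counts, invented purely for illustration (not real data).
# Keys are (heavy_smoker, ill); values are numbers of people.
counts = {
    (True, True): 30,
    (True, False): 70,
    (False, True): 10,
    (False, False): 90,
}

def risk_of_illness(heavy_smoker: bool) -> Fraction:
    """Share of the given group that is ill, as an exact fraction."""
    ill = counts[(heavy_smoker, True)]
    healthy = counts[(heavy_smoker, False)]
    return Fraction(ill, ill + healthy)

# X would be sufficient for illness only if no heavy smoker stayed healthy.
x_is_sufficient = counts[(True, False)] == 0
# X would be necessary for illness only if no non-smoker fell ill.
x_is_necessary = counts[(False, True)] == 0
# Even so, the two groups differ in their risk of illness.
risk_difference = risk_of_illness(True) - risk_of_illness(False)

print(x_is_sufficient, x_is_necessary, risk_difference)  # False False 1/5
```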
In summary, Trafimow and Earp (2016) first change my argument from an if-then claim (a material conditional) to a claim about necessity (a modal claim), and then set out to defeat their re-formulation. But Trafimow and Earp (2016) and Klein (2014b) are asserting different things.
Are Trafimow and Earp’s examples counterexamples?
To support their argument, Trafimow and Earp (2016) turn to the history of science for instances of poorly specified theories associated with replicable outcomes (e.g., phlogiston and aether theory). Such “counterexamples,” they claim, demonstrate that “important and replicable findings” (Trafimow & Earp, 2016, p. 541) can be obtained even with poorly-specified theory.
While there are a number of unaddressed issues surrounding Trafimow and Earp’s choice of “counterexamples” (e.g., naturally occurring regularities versus regularities based on theoretical deduction, differences between theory in the physical versus social sciences, 4 and so forth), discussion of these problems would take us far afield (and push my commentary well over its word count limit). Accordingly, I restrict discussion to whether phlogiston theory—the “counterexample” to which Trafimow and Earp (2016) devote their most sustained attention—supports their claim that poorly specified theory can be associated with replicable outcomes.
Trafimow and Earp state that “the theory of phlogiston is a blatantly wrong and ill specified theory—at least from the perspective of hindsight … —which nevertheless dominated the field from approximately the late 17th century to the late 18th century … this theory held that the fire-like element of phlogiston was responsible for combustion, although the specific nature of this relationship was never precisely articulated. Nevertheless, despite this lack of specification, researchers were able to demonstrate—and replicate—the existence of oxygen (wrongly considered to be ‘dephlogisticated’ air), nitrogen (‘phlogisticated’ air), and other major elements” (2016, p. 541).
As Trafimow and Earp see it, phlogiston theory is a prime example of a poorly specified theory that enabled successful replication.
But what sanctions their assertion that phlogiston theory was “ill-specified”? The opposite case can be (and often has been) made. Ladyman (2011, p. 98; see also Chang, 2010), for example, provides a partial summary of the observational regularities subsumed by phlogiston theory:
Metal + heat (in air) → calx [metal oxide] + phlogiston [de-oxygenated air]
Calx + charcoal (source of phlogiston) → metal (+fixed air [carbon dioxide])
Metal + water = calx + inflammable air
Water = inflammable air [hydrogen] + dephlogisticated air [oxygen]
These empirical phenomena can be explained by phlogiston theory. A few examples:
Metal = calx + phlogiston (explaining what it is that metals have in common)
Charcoal = fixed air + phlogiston
Phlogiston theory even made novel predictions, such as the existence of new acids (e.g., formic; Scheele, 1931).
Thus, contra Trafimow and Earp’s assertion, phlogiston theory is a decidedly questionable instance of the type of counterexample they require to support their first thesis. If anything, the history of phlogiston is a case study in which a well-specified relation between phenomena (theory and empirical outcomes) is preserved in subsequent science even though aspects of the ontology of the theory are not (i.e., the replacement of phlogiston by oxygen). In fact, phlogiston and oxygen theory make virtually identical predictions if one assumes phlogiston is negative oxygen (e.g., Chang, 2010; Wisniak, 2004). Negative oxygen, however, requires negative mass, a property that makes phlogiston harder to accept.
In summary, although Trafimow and Earp claim that “important and replicable findings” can be obtained despite poorly specified theory, the theories in which phlogiston (or aether) was explanatory were, in fact, well-specified. This is why failure of their predictions led to their abandonment. For example, objects taking on phlogiston should show a change in weight, but do not. Trafimow and Earp thus confuse a “wrong” theory with a “poorly specified” theory. Ladyman (2011) is adamant on this point: “Phlogiston theory identified a number of real patterns in nature and it correctly described aspects of the causal/nomological structure of the world as expressed in the unification of reactions into phlogiston and dephlogistication” (p. 100). A similar conclusion is reached by Chang (2010): “Some people think that phlogiston theory deserved to be consigned to the dustbin of history because phlogiston was just an imaginary entity, not based on anything empirical. This is a basic misconception, as phlogiston had some detailed [emphasis added] links with observed phenomena and with very concrete practical operations. And Lavoisier’s theory relied essentially on caloric, the material fluid of heat, which was just as unobservable or hypothetical as phlogiston … Lavoisier’s theory won because it was inherently simpler than phlogiston theory” (p. 57).
The take-away message is that oxygen theory replaced phlogiston theory not because the latter was poorly specified, but because phlogiston theory entailed some controversial assumptions (e.g., negative mass) and made some predictions that were shown to be false. Phlogiston also fell victim to Occam’s razor (i.e., it unnecessarily complicated things).
Thus Trafimow and Earp make the error here (and in other “counterexamples” 5 ) of confusing poorly specified theory with well-specified, but incorrect theory. Phlogiston theory (like aether theory) played the part of all well-specified scientific theories: it specified the conditions of its refutation.
Auxiliary assumptions and the self-refuting thesis
Drawing on what appears to be a variant of the Duhem–Quine thesis (that is, the proposal that most scientific hypotheses only make testable predictions relative to the background assumptions, or auxiliary hypotheses, that tie them to the evidence), Trafimow and Earp (2016) argue that “even in the case where there is a clear theory to draw upon, it is important to remember that empirical predictions come from the combination of a theory and auxiliary assumptions rather than from a theory alone” (p. 542). They further assert that auxiliaries actually are more important to replication efforts than is theory proper: “experimental predictions depend much more strongly than Klein (2014) seems to appreciate on auxiliary assumptions, as opposed to on the theory proper” (Trafimow & Earp, 2016, p. 545). Their reasons for this preferential ranking, however, are never made clear.
Scholarly discourse on the precise nature of auxiliary assumptions is somewhat disputatious (even Quine and Duhem did not hold identical views—for example, Duhem, unlike Quine, maintained that his arguments did not apply to the social sciences [see also Note 3]; for reviews see Klee, 1997; Laudan, 1990). Accordingly, it is important to understand how Trafimow and Earp (2016) conceptualize auxiliaries. The authors, though brief, are admirably clear: An auxiliary is a “logical assumption that is required to link the theory to an actual observation” (p. 542). In effect, they are saying that to qualify as an empirically relevant and theoretically defensible auxiliary assumption, the auxiliary must be grounded in a logical relation to the theory under investigation. This helps avoid recruitment of ad hoc stipulations (i.e., auxiliaries lacking clear logical connection with the theory being tested—e.g., number of cubicles used, type of timing device employed, number of age-relevant words presented, academic status of the individual conducting the study, and so forth; for discussion, see Klein, 2014b) to insulate the theory from empirical refutation.
If a theory is an explanation of the cause of a phenomenon, the theory will include either statements that impose theoretically mandated conditions on the observance of the phenomenon, or impose an implicit default condition (e.g., this phenomenon can always be observed under any conditions operating at the time). A well-specified theory identifies these conditions as part of the explanatory reasoning of the cause of the observed phenomenon. These conditions need to be justified as a logical consequence of the theoretical structure being invoked—as would any theory-claim—and their alleged effect(s) tested as one might any other claim within that theory. 6 To avoid ad hoc stipulation, the “conditions-required-to-be-present-in-order-that-the-phenomenon-might-be-observed” need justification themselves, not simply “I assume such and such” (e.g., much as many in psychology say “I assume the variable is real-valued and continuous” with no explicit justification for why that should be so; Michell, 1999; Uttal, 2008). 7 That is, we cannot simply assume an auxiliary is applicable to a particular domain of inquiry: its relevance to the theory figuring in the replication effort must be logically justified. 8
To their credit, Trafimow and Earp include a non-ad hoc provision (2016, p. 543) that requires a theory to be sufficiently well-specified (or, to use their terminology, “relevant”) so that logical connections between theory and auxiliary can be forged. Unfortunately, this requirement has the consequence of refuting their initial thesis (i.e., that a well-specified theory is not necessary for a successful replication). Put differently, Trafimow and Earp present two theses: (a) well-specified theory is not necessary for replication and (b) well-specified theory is necessary for replication in virtue of its role in establishing the epistemic warrant of auxiliary assumptions. Because these two claims contradict each other, they cannot both be true; holding both guarantees that at least one is false.
Footnotes
Acknowledgements
Special thanks go to Dan Robinson, Kirk Michaelian, Galen Strawson, Carl Craver, Tim Lane, Sven Bernecker, Byron Kaldis, Robert Klee, Charles Talieferro, Myra Schectman, Alba Papa-Grimaldi, and Paul Barrett for excellent comments and suggestions.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
