Abstract
Reviewers of manuscripts or grant proposals often react positively when authors use in-favor study techniques and negatively when authors use not-in-favor study techniques. A tacit assumption is that the in-favor technique is superior to alternative techniques. However, study techniques for theory testing depend on auxiliary assumptions that connect nonobservational terms in theories with observational terms in empirical hypotheses. Therefore, the extent to which a technique is useful depends on the theory and empirical hypothesis under investigation. A technique might be useful from one theoretical perspective and not from another. Or a technique might successfully connect to one empirical hypothesis but not to another. The present work threshes out some of the relevant philosophical issues.
Controversy in psychology is exciting. Basic researchers debate competing theories, applied researchers debate competing applications, and there are statistical debates that permeate both basic and applied research. But this is not to say that psychology researchers debate everything. It sometimes happens, depending on the area of psychology, that study techniques become in-favor or not-in-favor (Iso-Ahola, 2017; Marks & An, in press; Michell, 1999; Mischela, 1990; Peters & Crutzen, 2017; Richters, 2021; Runyan, 1983; Woodside, 2019). The term technique can refer to many kinds of entities. An in-favor technique could be a dependent variable, independent variable, study paradigm, measurement paradigm, or statistical paradigm that most reviewers or editors evaluate positively. In-favor and not-in-favor study techniques can be distinguished probabilistically: manuscripts featuring in-favor study techniques are more likely than manuscripts featuring not-in-favor study techniques to be published.
One example concerns psychometrics, and whether it is permissible to use techniques originating from classical test theory or whether researchers should use more modern techniques based on item response theory, factor analysis, and so on.¹ Because modern techniques are more powerful, in the sense that they are capable of leading to stronger conclusions, it may seem obvious that they are preferable to classical techniques. Indeed, relatively few classically based measurement articles have appeared in the last decade; most recent measurement contributions depend on a more modern technique or a mixture of more modern techniques. And yet, it is possible to argue that the power of modern techniques is a disadvantage, rather than an advantage, because it comes from stronger assumptions that are less likely to be true. In the classical scheme, there is no necessity to assume an underlying trait, whereas in modern schemes there is such a necessity (Gulliksen, 1987; Hulin et al., 1983; Lord & Novick, 1968). If there is no underlying trait with respect to a particular modern measurement application, then that application is wide of the mark, and a classical perspective may be better. If one is unsure whether there is an underlying trait, and one can meet one’s goals either the classical way or a more modern way, the classical way might still be preferred to reduce the chance of making a wrong assumption.
Moreover, modern techniques make an additional, though implicit, assumption that Richters (2021) termed the qualitative homogeneity assumption: although people might differ quantitatively with respect to a trait, everyone is qualitatively homogeneous in that the underlying psychological structures and processes function the same way for everyone. Richters made a strong case against the qualitative homogeneity assumption (also see Molenaar, 2004). The larger point is not that classical techniques should be preferred over more modern ones, or the reverse, but rather that what to prefer need not be obvious, and the answer may depend on idiosyncratic aspects of the research question and goal. It is a mistake to automatically assume that one measurement technique is better than another or to automatically reject manuscripts that use an out-of-favor measurement technique, such as a classical one. The present aim is to provide a nuanced philosophical exploration of what renders, or does not render, one study technique superior to another. It is hoped that this discussion will increase (a) tolerance of not-in-favor techniques, (b) researchers’ willingness to use not-in-favor techniques, and (c) the ability of such researchers to defend against, or forestall, criticisms from reviewers for using not-in-favor techniques. To begin at a general level, let us briefly contrast three well-known philosophical perspectives: Kuhn, Popper, and Feyerabend.
Kuhn versus Popper versus Feyerabend
Of the three famous philosophers, Kuhn (1962) likely would be most favorable to insisting on in-favor techniques and Feyerabend least favorable. Kuhn’s normal scientist accepts basic dogma in the field, which would include the in-favor technique, because science is a co-operative enterprise that cannot function smoothly without shared perspectives. It is possible to interpret Kuhn descriptively, normatively, or both. Interpreting Kuhn descriptively, the need to share perspectives in the interest of co-operation is a sociology of science force that pushes researchers to accept basic dogma. Interpreting Kuhn normatively, scientists not only do accept basic dogma, but ought to do so, to facilitate co-operation between scientists. For present purposes, interpreting Kuhn descriptively is sufficient, though the incommensurability issue is perhaps more dramatic in the context of a normative interpretation.
The need to share perspectives, whether interpreted descriptively or normatively, is intensified by Kuhn’s use of the notion of incommensurability. The idea is that opposing theories do not share the same language, even if they share similar terms. For example, mass was used very differently by Newton (1642–1727) and Einstein (1879–1955), though neither defined it (Lederman, 1993). Because of the different meanings, there is no way to directly set the two theories against each other. However, it is possible to argue that Kuhnian incommensurability, even if true, is overstated because commensurability at the theoretical level is not necessary if it is there at the empirical level. Remaining with Newton and Einstein, even if they cannot agree on what mass means, they nevertheless can agree on a clock reading, which may be sufficient to distinguish between the two theories. And if incommensurability is less definitive than Kuhn had thought, that lack of definitiveness suggests that Kuhn may have overstated the need for shared perspectives and acceptance of basic dogma.
In contrast, Popper’s (e.g., 1959, 1963, 1972, 1983) emphasis on falsificationism departs significantly from Kuhn. In the Popperian scheme, the goal is to attempt to disconfirm theories. Either an attempt succeeds, in which case scientists can replace the falsified theory with a better one, or an attempt fails, in which case the theory gains verisimilitude; to use Popper’s term, it is corroborated (but not proved). Either way, science gains. From Popper’s perspective, the best technique to use is the one that stands the best chance of resulting in theory falsification. As the in-favor technique has likely been used extensively already (which is why it achieved in-favor status), a Popperian might doubt that it would be particularly likely to result in theory falsification. In contrast, an alternative technique might stand a better chance. A Kuhnian counterargument might be that, due to the sociology of science, scientists would be unlikely to accept that an alternative technique really has falsified the theory. This potential Kuhnian counterargument might be a reason for even a Popperian to remain with an in-favor technique. That said, if we were making bets, the present bet would be that Popper would suggest not worrying about the sociology of science and would rather have scientists use alternative techniques that carry increased likelihoods of falsification.
Feyerabend (1975, 1993) would likely be most antagonistic to researchers insisting on in-favor techniques, as he presented himself as favoring an anarchist view of science and rejected the idea that there are universal methodological rules. This viewpoint suggests it is downright silly to have in-favor techniques. If there are no universal a priori methodological rules, then there is no reason for in-favor techniques to have achieved in-favor status, and there is no reason for scientists to take those techniques more seriously than other techniques. A common criticism of Feyerabend is that, given his insistence that there are no universal methodological rules, there is no way to demarcate science from nonscience. A Popperian, for example, with falsifiability as the salient demarcation principle, would not tolerate what might be considered a too-loose approach. And a Kuhnian likely would argue that science cannot work without consensus, which, taking Feyerabend at face value, could not be achieved if scientists adopted his anarchic approach. In fairness to Feyerabend, however, he did an admirable job of showing historical examples where scientists went against in-favor techniques, to the blatant betterment of science (Feyerabend, 1993). Therefore, a Feyerabendian could concede that Feyerabend cannot demarcate science from nonscience, but nevertheless claim that the demarcation is not strictly necessary.
There is a sociology of science component to Feyerabend’s (1993) writing. In his description of the accomplishments of Galileo Galilei (1564–1642), Feyerabend took pains to describe how there were good reasons, at the time, not to believe the Galilean scheme. For example, the parallactic effects that follow from Galileo’s insistence that the Earth orbits the Sun had not been observed. Second, if the Earth moves, it seemed strange that nobody could feel it. Galileo explained the lack of parallactic effects by contending that the stars are too far away.² And Galilean relativity (the Earth is not moving from our frame of reference because we are moving with it) explains why nobody feels the Earth move. However, the extreme distance of the stars did not seem credible at a time when the universe was considered much smaller than it is considered today, nor did Galileo’s mostly unsupported relativity notion yet seem convincing. Thus, Galileo’s scientific victories were not based solely on scientific grounds; he needed a sociology of science victory to help achieve a scientific victory.
As Feyerabend (1993) emphasized, Galileo was a master of rhetoric; his famous book, Dialogue Concerning the Two Chief World Systems (Galilei, 1632/1953), was a rhetorical masterpiece.³ In this book, Galileo presented a conversation among three people. One represents his own thinking, one represents conventional thinking, and the third is an intelligent layperson who just happens to ask all the right questions. Galileo’s argument, through this rhetorical device, is both entertaining and convincing. Hence, some of Galileo’s success was due to his rhetorical ability, which helped him overcome credible objections. This is not to say Galileo was not a great scientist; Feyerabend (1993) acknowledged Galileo’s greatness. However, Feyerabend also argued that Galileo’s scientific brilliance might not have won the day had it not been for the rhetorical ability that enabled him to excel with respect to the sociology of science. It was, in important part, through rhetoric that Galileo obtained a sociology of science victory, which eventually became a scientific victory too.
The present goal is not to claim that any of the three perspectives is best, as that would require its own article. Rather, the more limited goal in bringing up Kuhn, Popper, and Feyerabend, in the context of a discussion about technique, is to illustrate the relevance of general philosophical perspectives to decisions about remaining with in-favor techniques or exploring not-in-favor ones. A further goal is to encourage scientists to think more philosophically about methodological issues and to make decisions based on more sophisticated and nuanced epistemological stances.
Auxiliary assumptions
Theory tests do not come from theories alone, but also from auxiliary assumptions. Halley’s comet provides a dramatic example. Edmond Halley (1656–1742) suggested that various appearances of comets that were a matter of historical record were really reappearances of the same comet (Grewing et al., 1988). Halley used Newton’s (1642–1727) theory to predict that the next reappearance would be in 1758. However, Halley’s prediction depended not solely on Newton’s theory, but also on assumptions about the accuracy of recorded observations, assumptions about the effects of gravitationally relevant astronomical bodies, and others. These additional assumptions are often termed auxiliary assumptions. Thus, it would be naïve to say that Halley’s spectacular prediction provided a pure test of Newton’s theory; it did not. Rather, the prediction provided a test of the combination of Newton’s theory and the auxiliary assumptions that Halley added to it.
Well, then, suppose that Halley’s prediction had failed. The empirical defeat would not have automatically falsified Newton’s theory because it could have been attributed to a wrong auxiliary assumption (Duhem, 1906/1954). Nor does the empirical victory that Halley obtained prove Newton’s theory true because empirical victories can be attributed to auxiliary assumptions, just as empirical defeats can be so attributed (Trafimow, 2017a). The major point, then, is that auxiliary assumptions are crucial for testing theories.
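The logic of the preceding paragraph can be written out schematically. In this sketch the notation is ours, not the author’s: $T$ is the theory, $A_1, \dots, A_n$ are the auxiliary assumptions, and $O$ is the predicted observation (for Halley, the comet reappearing in 1758).

$$(T \land A_1 \land \cdots \land A_n) \rightarrow O$$

$$\lnot O \;\Rightarrow\; \lnot (T \land A_1 \land \cdots \land A_n) \;\equiv\; \lnot T \lor \lnot A_1 \lor \cdots \lor \lnot A_n$$

A failed prediction establishes only that at least one conjunct is false; it does not say which one. Conversely, observing $O$ does not license inferring $T$, because that would affirm the consequent: $O$ could also follow from a different combination of theory and auxiliary assumptions.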
Auxiliary validity and the connective function of auxiliary assumptions
A large literature on validity includes different kinds of validity such as face validity (Gravetter & Forzano, 2012), content validity (Lawshe, 1975), predictive validity (Lynn, 1982), construct validity (Cronbach & Meehl, 1955; Kane, 2001), and others. But the type of validity that is of most relevance here is auxiliary validity, which applies to both manipulations and measures (Trafimow, 2012).
The notion of auxiliary validity came from an explicit realization that theories contain nonobservational terms whereas empirical hypotheses contain observational terms. One function of auxiliary assumptions is to establish initial conditions (Hempel, 1965). An example might be the assumption that random assignment of participants to conditions renders the two groups equivalent (or close enough) on all causally relevant variables. Another function of auxiliary assumptions is to connect nonobservational terms in theories with observational terms in empirical hypotheses. It is this latter function that is of present emphasis.
Consider another Newton example: force = mass × acceleration. Nobel laureate Leon Lederman (1993) considered this the most important equation in the history of physics, despite emphasizing that Newton never defined any of the terms in the equation, particularly mass. Mass is a nonobservational term that should not be confused with weight, which is observational. The difference becomes obvious upon considering that the same object would have very different weights, but the same mass, on Earth and on the Moon. Nevertheless, it is possible to make auxiliary assumptions that connect weight with mass, so that data pertaining to weights can be used to draw theoretical conclusions about masses.
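A minimal worked illustration of such a bridging assumption uses the standard relation between weight and mass near a planetary surface. The numbers below are textbook approximations chosen for illustration and do not come from the source. Weight $W$ is a measured force, and the auxiliary assumption is that $W = mg$ (a special case of force = mass × acceleration) with a known local gravitational acceleration $g$.

$$W = m g \quad\Longrightarrow\quad m = \frac{W}{g}$$

$$m = 10\ \text{kg}: \qquad W_{\text{Earth}} \approx 10 \times 9.81 = 98.1\ \text{N}, \qquad W_{\text{Moon}} \approx 10 \times 1.62 = 16.2\ \text{N}$$

Both readings yield the same inferred mass of 10 kg only if the correct value of $g$ is assumed for each location. Dividing the Moon reading by the Earth value of $g$ would instead suggest a mass of about 1.65 kg, so the conclusion about mass is only as good as the auxiliary assumption about $g$.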
Of course, the distinction between nonobservational and observational terms is not as clear-cut as the example suggests (Quine, 1951; Trafimow, 2012). And perhaps a better way would be to describe terms as being relatively-more-nonobservational or relatively-more-observational, but gray areas are not of present interest, so we shall remain with the simple nonobservational–observational distinction. To put this in a psychological context, whereas there is much disagreement about the nature of attitudes (nonobservational term), there is much consensus that if a person chooses “3” on a scale that purports to measure attitude, the “3” really is a “3”—participant scale choices are observable. What the “3” actually means is, of course, another matter that relates to arguments about the validity of the measure.
Once the crucial role of auxiliary assumptions in bridging the gap between nonobservational terms in theories and observational terms in empirical hypotheses is made salient, it follows that the validity of techniques, which provide the basis for empirical hypotheses, depends on the quality of the auxiliary assumptions on which they are based, that is, on auxiliary validity (Trafimow, 2012). Put simply, techniques are no better than the auxiliary assumptions upon which they depend. And the moment has come to face an important question squarely: On what is the truth of auxiliary assumptions based?
A return to theories and empirical hypotheses
As auxiliary assumptions connect nonobservational terms in theories and observational terms in empirical hypotheses, it may be tempting to conclude that the truth of theories and empirical hypotheses is crucial for the truth of auxiliary assumptions that connect them. However, this conclusion is false.
To see the falsity, consider a Geiger counter that provides a technique for measuring radiation. Suppose that a theory specifies that an entity to which a construct refers causes radiation. The empirical hypothesis is that a particular manipulation should cause more radiation to appear in the experimental than in the control condition. Now, suppose that the theory and the empirical hypothesis are both false. The dual falsities do not invalidate the auxiliary assumption that the Geiger counter validly measures radiation. It is possible, of course, that the empirical defeat is due to a defective Geiger counter, but it also is possible that the theory and empirical hypothesis are wrong. More to the present point, the truth of auxiliary assumptions need not depend on the truth of theories and empirical hypotheses that auxiliary assumptions connect. Rather, the truth of auxiliary assumptions is subjunctive: if radiation were present, the Geiger counter would give a reading. If our hypothetical researcher subsequently performed independent tests of the Geiger counter and showed that it gives readings under conditions where radiation is known to occur, that would constitute an argument that the original empirical defeat is due to the theory being wrong or due to another auxiliary assumption being wrong (say, that the radiation-creating apparatus did not work correctly), and not due to a defective Geiger counter.
However, denying that auxiliary validity depends on the truth of theories or empirical hypotheses does not imply that theories and empirical hypotheses are irrelevant to the usefulness of techniques. Consider again that techniques depend on auxiliary assumptions that connect nonobservational terms in theories to observational terms in empirical hypotheses. Clearly, then, the terms in theories and empirical hypotheses that serve as terminals for the connections will have to matter. As an analogy, American Airlines flight 8080 connects Chicago, Illinois, and El Paso, Texas. But if one wished to fly from El Paso to Dallas, flight 8080 would be insufficient and a different flight would be necessary. American Airlines flight 8080 is fine for the Chicago–El Paso connection, but it does not work for the El Paso–Dallas connection. Returning to theories and empirical hypotheses, consider again the connection between attitude and a particular attitude measure. A useful auxiliary assumption might be that check marks on the measure really do measure attitude. But note that the auxiliary assumption must contain, for example, the term attitude, or it cannot connect the theoretical construct with the empirical hypothesis.
Taken together, the two foregoing paragraphs imply at least two points. First, the truth of auxiliary assumptions does not depend on the truth of the theory or the truth of the empirical hypothesis. Second, the usefulness of a set of auxiliary assumptions nevertheless depends strongly on the theory and on the empirical hypothesis, and researchers should not confuse truth with usefulness. It is impossible for a set of auxiliary assumptions to bridge the gap between terms in theories and terms in empirical hypotheses unless the set actually contains those terms. This second implication does not require that any single auxiliary assumption contain all relevant terms, so long as the set does. To see this, suppose that the theory contains the nonobservational term A and the empirical hypothesis contains the observational term D. One auxiliary assumption connects term A and term B, another connects term B and term C, and a third connects term C and term D. In that case, although none of the auxiliary assumptions is sufficient to connect nonobservational term A and observational term D, the set of auxiliary assumptions, taken as a whole, is sufficient to make the connection.
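The A-B-C-D argument can be sketched as a reachability check. This is a minimal illustration under a simplifying assumption of our own: each auxiliary assumption is modeled as an undirected link between two terms, and a set of assumptions "bridges" a theoretical term and an observational term when a chain of links joins them. The function name `set_connects` is hypothetical.

```python
from collections import deque


def set_connects(assumptions, theory_term, observed_term):
    """Return True if a chain of auxiliary assumptions links the two terms.

    Each assumption is modeled (hypothetically) as a pair of terms it connects.
    """
    # Build an undirected adjacency map: an assumption linking X and Y
    # lets the chain pass from X to Y or from Y to X.
    graph = {}
    for x, y in assumptions:
        graph.setdefault(x, set()).add(y)
        graph.setdefault(y, set()).add(x)

    # Breadth-first search outward from the theoretical term.
    seen = {theory_term}
    queue = deque([theory_term])
    while queue:
        term = queue.popleft()
        if term == observed_term:
            return True
        for neighbor in graph.get(term, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# The three assumptions from the text: A-B, B-C, C-D.
chain = [("A", "B"), ("B", "C"), ("C", "D")]

# No single assumption bridges A and D ...
print(any(set_connects([a], "A", "D") for a in chain))  # False
# ... but the set taken as a whole does.
print(set_connects(chain, "A", "D"))  # True
# Dropping the middle link breaks the bridge.
print(set_connects([("A", "B"), ("C", "D")], "A", "D"))  # False
```

The last call echoes the airline analogy: two perfectly good links that do not share a terminal cannot carry the connection.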
Two psychology examples and a biology example pertaining to technique utility
There are consequences of the fact that auxiliary assumptions connect theories to empirical hypotheses. The main consequence of interest here is that it is extremely difficult to show that one technique is superior to another technique. That techniques become in-favor in particular areas of psychology need not indicate that the techniques are wonderful. And that particular techniques are not-in-favor need not indicate that they are not wonderful.
Suppose that a particular technique is used, with its associated auxiliary assumptions, to connect a theory and empirical hypothesis. Now, suppose that a different technique, with its associated auxiliary assumptions, connects a different theory and different empirical hypothesis. It is not clear which technique is superior. It would be quite reasonable to say that one technique is superior for testing one theory–empirical hypothesis pair whereas the other technique is superior for testing the other theory–empirical hypothesis pair.
Now, let us suppose that, as seems quite common in some areas of psychology, a technique reaches in-favor status. It is deemed generally superior. Let us consider what it would take to justify that status.
Because the basic research goal is to test theories, to say that a technique is generally superior, within an area of psychology, would be to say that the technique is better than all competing techniques for connecting all theories in that area with all potential empirical hypotheses. Stated baldly, this is blatantly ridiculous. But perhaps the notion can be saved by recognizing that the auxiliary assumptions associated with the different techniques are embedded in larger sets of auxiliary assumptions that connect theories to empirical hypotheses. Thus, it is possible to make a less blatantly ridiculous case by employing a modification. Specifically, one could argue that a technique is generally superior because it is more capable than competing techniques of being embedded in larger sets of auxiliary assumptions that connect theories with empirical hypotheses. Thus, it is the quality of “ability to be embedded in many large sets of auxiliary assumptions” that confers superiority onto one technique relative to competing techniques.
Although the modification may seem reasonable, there are two main difficulties. First, even moving to large sets of auxiliary assumptions does not obviate the need for researchers to know that the large sets containing the ostensibly superior technique work better than sets containing competing techniques, for every potential theory and potential empirical hypothesis in the area. To say this is unlikely would be an understatement.
Second, there is the issue of generalizability. Researchers want their theories to be generalizable, but one of the questions that can be asked is: Across what do you wish the theory to generalize? Using a taxonomy of assumptions recently published by Trafimow (2019a), one domain across which theories might be said to generalize is auxiliary assumptions or sets of auxiliary assumptions. Under the ceteris paribus (all else being equal) condition, theories that consistently make correct predictions when combined with many different sets of auxiliary assumptions are to be preferred over theories that do not. Theories that are generalizable in this way instill confidence that the empirical victories really are due, in important part, to the soundness of the theory. In contrast, when combining a theory with alternative sets of auxiliary assumptions results in incorrect predictions, the empirical defeats give more reason to question the soundness of the theory. Of course, the ceteris paribus condition might not apply (Trafimow, 2019b), but that is a different matter to be addressed later. For now, let us turn to three examples.
Two different attitude techniques
There has long been controversy about how to define attitudes. One popular definition has been the tripartite view: that attitudes have affective, cognitive, and behavioral components (see Augoustinos et al., 2014, for a review). An alternative conception is that attitudes are evaluations of behaviors (see Fishbein & Ajzen, 2010, for a review). We might ask whether techniques based on the tripartite view or those based on the evaluative view are better.
Historically, the common assumption has been that attitudes should be good predictors of behaviors, as almost all attitude theories include an attitude–behavior link, albeit not necessarily a direct one. But in the 1960s, researchers repeatedly failed to obtain respectable attitude–behavior correlations, and the problem culminated in a famous review by Wicker (1969), who showed that unimpressive attitude–behavior correlations were the rule rather than the exception. As attitudes had generally been considered the most important construct in social psychology (Allport, 1935), Wicker’s review precipitated a crisis in the field.
Two solutions emerged. One was to continue to measure attitudes in the traditional way, but to invoke particular experimental manipulations. Thus, for example, Fazio (1990a) showed that rendering attitudes more accessible increases attitude–behavior associations. The other was to adopt the definition of an attitude as an evaluation of a behavior and then note that behaviors have four elements: target, action, time, and context. By insisting that attitude measures must correspond with behavior measures with respect to the four elements, Fishbein (1980; Fishbein & Ajzen, 1975, 2010) obtained attitude–behavior correlations that had previously been considered impossible to achieve. So, which technique is better?
It depends. If one is willing to make the theoretical commitment that an attitude is an evaluation of a behavior, then Fishbein’s evaluation-based technique clearly results in impressive correlations. But if one makes a theoretical commitment to the tripartite attitude view, then the evaluation-based measure is wrong, by definition, no matter the strength of the correlations that are obtained. In fact, from a tripartite perspective, it is possible to criticize impressive attitude–behavior correlations obtained by the evaluation-based technique by insisting that correspondence in measurement results in duplicate measures of the same construct! Hence, what seem to be impressive correlations are not impressive, after all. The point is not that one technique is right and the other wrong; rather, it is that one technique is clearly more compatible with one theoretical perspective whereas the other technique is more compatible with the other theoretical perspective. Thus, technique superiority is relative; it can depend on one’s theoretical commitment.
The technique of reaction time versus the technique of free recall
In cognitive psychology and social cognition, reaction time has achieved in-favor status (Fazio, 1990b). If a researcher wishes to publish a manuscript, and fails to use the technique of reaction time, reviewer criticism is a likely outcome. And yet, it is instructive to consider a multiexperiment article by Srull et al. (1985) in a prestigious cognitive journal, Journal of Experimental Psychology: Learning, Memory, and Cognition. Srull et al. wished to test an associative network theory of person memory, where a crucial assumption was that items incongruent with a prior expectancy about a person are more difficult to understand than congruent items, and therefore incongruent items receive more processing. In particular, to understand incongruent items, participants associate them with both congruent items and other incongruent items whereas congruent items are associated with incongruent items but not with other congruent items. Consequently, two predictions ensue. Because incongruent items have more associative pathways leading to them than do congruent items, incongruent items should be better recalled than congruent items. Secondly, because congruent items are directly connected with incongruent items, but not with other congruent items, it should take more time to proceed to a congruent item from another congruent item than from an incongruent item. Srull et al. confirmed both predictions.
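Both predictions follow from the shape of the hypothesized network, which a toy version makes visible. This is a minimal sketch under our own simplifying assumptions, not Srull et al.’s model: the item labels, the split of three incongruent and five congruent items, and the helper names `build_network` and `steps_between` are hypothetical. The number of direct links to an item stands in for "associative pathways leading to it", and the number of associative steps between two items stands in for reaction time.

```python
from collections import deque


def build_network(incongruent, congruent):
    """Toy associative network (illustrative assumption, not Srull et al.'s model).

    Incongruent items link to every other item; congruent items link only to
    incongruent items, never to each other.
    """
    links = {item: set() for item in incongruent + congruent}
    for i in incongruent:
        for other in incongruent + congruent:
            if other != i:
                links[i].add(other)
                links[other].add(i)
    return links


def steps_between(links, start, goal):
    """Fewest associative steps from start to goal (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        item = queue.popleft()
        if item == goal:
            return dist[item]
        for nxt in links[item]:
            if nxt not in dist:
                dist[nxt] = dist[item] + 1
                queue.append(nxt)
    return None


incongruent = ["I1", "I2", "I3"]
congruent = ["C1", "C2", "C3", "C4", "C5"]
net = build_network(incongruent, congruent)

# Prediction 1 (free recall): more pathways lead to incongruent items.
print(len(net["I1"]), len(net["C1"]))  # 7 3
# Prediction 2 (reaction time): congruent-to-congruent needs an extra step.
print(steps_between(net, "C1", "C2"), steps_between(net, "I1", "C2"))  # 2 1
```

Because congruent items are linked only through incongruent ones, every congruent-to-congruent transition has to pass through an incongruent item, which is the structural source of the second prediction.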
But was the free recall technique superior to the reaction time technique, or was the reverse true? Note that, unlike in the attitude example, the theory is the same in both cases even though the empirical hypotheses differ. That is, both techniques were used under the assumption that incongruent items have direct associative pathways to other incongruent items and to congruent items, whereas congruent items have direct associative pathways only to incongruent items.
Another interesting characteristic of the Srull et al. (1985) research is that the findings contradict a theory, popular at the time, that invoked the notion of cognitive schemas. From a schema perspective, congruent items are preserved in the schema, with two consequences. First, the preservation of congruent items in cognitive schemas should lead to their being recalled better than incongruent items. Second, to activate a congruent item after activating another congruent item, participants merely need to read down the schema mentally, and so the time taken to recall congruent items adjacently to each other should be short. Srull et al.’s recall data falsify the first prediction, and their reaction time data falsify the second prediction. Whether we focus on how the Srull et al. (1985) data support their own associative network theory or on how they disconfirm a popular competing theory, it remains unclear why reaction time is superior to free recall or the reverse.
In the Srull et al. (1985) case, the two techniques are not competing at all, but rather are complementary. That both techniques resulted in correct predictions lends more generalizability to the theory than would either technique alone. A consequence of this increased theory generalizability is that the overall case is rendered stronger, not weaker, by including techniques that are not-in-favor. Nevertheless, the increase in theory generalizability has its limits. Consider, for example, that the theorized underlying cause of the associative pathways is the cognitive processing devoted to understanding the incongruent items. An alternative possibility is that people attempt to update their impressions of the target person, and it is during the process of updating that they form the hypothesized associations. According to this updating notion, incongruent items suggest that more updating is needed relative to congruent items, and so incongruent items are checked against both congruent items and other incongruent items. Both the free recall and reaction time evidence are as consistent with the updating notion as with the original notion pertaining to understanding incongruent items. By employing yet a third technique—measuring impression change—Trafimow and Porter (1997) obtained evidence that was more consistent with updating than with understanding, thereby demonstrating the interpretive complexities that come with careful thinking about alternative possibilities and multiple study techniques.
Boiling and contamination in spontaneous generation studies
In the first example, pertaining to attitude research, we saw that the best technique depends on what one assumes. In the second example, pertaining to person memory, we saw that there is no best technique; rather, the techniques can be complementary. In addition, the introduction of a third technique can overturn the implications of two previous, complementary techniques. It is instructive to step outside of psychology, to the issue of spontaneous generation, to demonstrate a further point: sometimes auxiliary assumptions, and the techniques that depend on them, are plain wrong.
At least since the ancient Greeks, there had been argument about whether life stems only from life or whether it could arise spontaneously from nonlife (spontaneous generation). Moving to the 18th century, let us consider the work of John Turberville Needham (1713–1781). He boiled open flasks containing broth mixtures (or, later, tainted wheat) to kill any existing microorganisms. The flasks were subsequently allowed to cool in the open air and then sealed. Within a few days, microorganisms were present, seemingly demonstrating spontaneous generation (Levine & Evers, 1990).
In contrast, later work by Lazzaro Spallanzani (1729–1799) and, in the following century, by Louis Pasteur (1822–1895), using different techniques that featured longer boiling times and better protection against contamination, did not produce microorganisms. These researchers argued against spontaneous generation (Levine & Evers, 1990).
It is now known that Needham's boiling times were insufficient to kill all microorganisms and that he did not guard sufficiently against contamination. The auxiliary assumptions he had used, that boiling killed existing microorganisms and that the postboiling procedure prevented contamination, are now known to be false. Depending on the nature of the solution, boiling times might need to be quite long to kill all existing microorganisms. Thus, longer boiling times are better than shorter ones, and there is no ambiguity. Moreover, letting flasks cool in the open air promotes contamination, and there is no ambiguity here either. Needham's auxiliary assumptions were plain wrong, thereby rendering his technique plain wrong too (Levine & Evers, 1990).
Discussion
The examples demonstrate that, for a variety of reasons, the in-favor technique need not be best for all purposes. The in-favor technique might not be consistent with an alternative theoretical perspective, as we saw in the attitude example. Or the in-favor technique, even if it remains useful, might work best when supplemented with another technique, in which case a technique that might be considered competing is better considered complementary. Moreover, the implications of complementary techniques can be overturned by introducing yet a third technique. Then, too, the spontaneous generation case shows that the auxiliary assumptions of a technique might be plain wrong. And returning to the example of classical versus more modern measurement, a technique can be in-favor because of stronger assumptions that, although they facilitate stronger conclusions, are more likely to be wrong. A weaker not-in-favor technique may be preferable if it fulfills the researcher's goals, despite its relative weakness, because it is less likely to contain a wrong assumption. These considerations push in the direction of tolerance towards multiple techniques, including not-in-favor ones. However, the call is not Pollyannaish. Although the spontaneous generation example shows that techniques are sometimes plain wrong because their auxiliary assumptions are wrong, even here there is a caveat. Suppose that Needham's goal had not been to support spontaneous generation but rather to demonstrate subtle contamination effects. In that case, his technique might deserve a more positive evaluation. And there may be other complications, to which we turn in the following two subsections.
When the ceteris paribus condition does not apply
We have already seen that, under the ceteris paribus condition, demonstrating the ability of theories to generalize across different sets of auxiliary assumptions is desirable, but we postponed discussing what happens when the ceteris paribus condition does not apply. Let us consider that now in the context of falling objects. According to Aristotle (384–322 BCE), objects fall because it is in their nature to fall; the heavier the object, the more of this nature it has, and the faster it falls (Wicklund, 1990). In essence, as Wicklund (1990; also see Lewin, 1931) summarized, this is a trait theory of falling objects. Although the ancient Greeks were not addicted to formal experimentation, it is not difficult to imagine a counterfactual world where they were so addicted and performed numerous experiments dropping heavier objects (e.g., spears, shields) and lighter objects (e.g., feathers, papyrus scraps) from the same height at the same time to determine which would first contact the ground. These experiments could be performed using drops of different heights, in different locations, and so on. It is not difficult to imagine that Aristotle's theory would generalize across the many different sets of implied auxiliary assumptions (Trafimow, 2019b). Under the ceteris paribus condition, we could assert that Aristotle's theory would have been well supported.
However, as Galileo (1564–1642) figured out nearly two millennia later, the ceteris paribus assumption is wrong in this case. The many different sets of auxiliary assumptions share a common flaw: a failure to account for the interaction between atmosphere and object characteristics. Galileo performed experiments rolling balls of different weights down inclined ramps, which had two advantages. One advantage was that Galileo's procedure largely eliminated the interaction between atmosphere and object characteristics. A second advantage was that, by slowing the speed of descent, the procedure made time measurement feasible with the limited means then available for measuring the passage of time. Eventually, through the principles of inertia and Galilean relativity, Galileo succeeded in dramatically improving our understanding of the universe. Aristotle was wrong and Galileo was right, or at least, importantly, less wrong (Asimov, 1966).
The example of falling objects shows that, although the ability of a theory to generalize across sets of auxiliary assumptions is positive under the ceteris paribus condition, such generalization can be highly misleading if the ceteris paribus condition does not apply. Moreover, the recognition that the ceteris paribus condition might not apply does not save researchers who wish to promote an in-favor technique at the expense of not-in-favor techniques. To "save" the in-favor technique when the ceteris paribus condition fails, one must implicitly assume that the failure is a convenient one; that is, that it fails in exactly the way necessary to sustain the validity of the in-favor technique or to undermine the validity of an alternative technique. Why should we assume this?
What if one technique really is better for the goal at hand?
Imagine five study techniques, A through E, where technique A is better than techniques B, C, D, and E in two senses. First, there is impressive independent evidence that the auxiliary assumptions associated with technique A are true, whereas such independent support is lacking for the others. Second, technique A is clearly related to the theoretical term of interest through its associated auxiliary assumptions, whereas this connection is less clear for the other techniques. Because of these two strong advantages, technique A has achieved in-favor status, whereas the alternative techniques are not-in-favor.
Despite the foregoing comments touting the consideration of not-in-favor techniques, there are nevertheless times when reviewers and journal editors are justified in insisting upon an in-favor technique. Alternate techniques that are not clearly tied to the theoretical term of interest by appropriate auxiliary assumptions cannot provide strong theory tests, because strong theory tests demand strong connections to the theory. Moreover, if there really are plausible grounds for doubting that the associated auxiliary assumptions are true (say, there are good reasons to believe the Geiger counter does not work), then the ostensible benefits are compromised.
But the sword cuts both ways. If aficionados of an in-favor technique have the right to question alternate techniques on grounds such as poor connection to theoretical terms or lack of independent evidence that the associated auxiliary assumptions are true, then an adopter of an alternate technique likewise has the right to question the in-favor technique on both grounds. How well does the in-favor technique really connect to the theoretical term of interest? Why should researchers believe that the auxiliary assumptions associated with the in-favor technique are true? A strong suspicion is that many in-favor techniques across various psychology domains would not fare well if subjected to intense questioning along these lines. Of course, sociology-of-science factors favor applying such intense questioning to not-in-favor techniques more than to in-favor techniques, though whether this trend is desirable hearkens back to the foregoing discussion about Kuhn, Popper, and Feyerabend.
Conclusion
Possibly, the few logical positivists who still exist would disagree about the importance of auxiliary assumptions. From an extreme logical positivist perspective, one defines a construct by how it is manipulated or measured, and so what we consider to be auxiliary assumptions are already included in the operationalizations. There are crucial problems with logical positivism that others have reviewed at length (Suppe, 1977). For example, an early version of logical positivism eschewed theory in favor of observation. However, as it became increasingly clear that theory is necessary in science, the argument was modified to allow theory but to insist that the meaning of theoretical terms is defined by how they are operationalized. On this view, what it means to heat a pan of water would differ depending on whether the researcher uses an electric burner, a campfire, and so on. Although Suppe presented convincing arguments against this view, Peters and Crutzen (2017) recently suggested a psychology version by insisting that psychological constructs should be defined by how they are measured. However, a problem with this insistence is that all discussion about validity goes out the window, as Trafimow (2017b) illustrated with the following quotation:
As an example, if I operationalise attitude towards exercising with the silly item, "I like/dislike red pillows," it nevertheless is valid because attitude is defined by its operationalisation. By taking the surplus meaning out of nonobservational terms in theories, there is no straightforward way to make the obviously true point that "I like/dislike exercising" is a more valid measure of attitude towards exercising than the pillow item. (p. 123)
Or, put in terms of technique, a strictly logical positivist perspective necessitates that there is no way to distinguish the validity of one technique from the validity of another, because any technique that is used is correct by fiat.
But if we reject extreme logical positivism, as practically all philosophically sophisticated people do today, then we must admit that there are theoretical terms with surplus meaning above and beyond what can be captured by any single measurement technique (operationalization). And once that is admitted, the relevance of auxiliary assumptions to connecting theories and empirical hypotheses is clear. Nor are matters changed importantly by subscribing to a more sophisticated view of the observational–nonobservational distinction. As we suggested earlier, although it is possible to proffer examples where the extent to which a term is observational or nonobservational is unclear, there is no necessity to dwell in the gray areas. Just as the admission that one does not know where to draw the line between an acorn and an oak tree does not obviate the distinction between acorn and oak tree, the admission that there are gray areas with respect to the observational–nonobservational distinction does not obviate that distinction. All that is required is a difference between a term in a theory and a corresponding term in the empirical hypothesis; given such a difference, the importance of auxiliary assumptions in bridging the gap is undeniable.
In summary, arguments about study techniques come down to arguments about auxiliary assumptions, or at least should do so. In turn, arguments about auxiliary assumptions must consider not only the auxiliary assumptions themselves but also the theory and the empirical hypothesis they connect. This is not to say that the truth of auxiliary assumptions depends on the truth of theories and empirical hypotheses, a conclusion shown in a previous section to be wrong; but theories and empirical hypotheses nevertheless strongly influence the usefulness of auxiliary assumptions and of the techniques with which they are associated. Hopefully, the present philosophical perspective will stimulate researchers to consider more nuanced arguments about the techniques they use, or consider using, arguments that take into account theories, empirical hypotheses, and the auxiliary assumptions that connect them. An additional hope is that the present perspective will provide researchers who wish to use alternative techniques with an argument with which to respond to, or even forestall, criticisms about using a technique that is not-in-favor. The argument can take the form of showing that the in-favor technique depends on auxiliary assumptions that fail to adequately connect nonobservational terms in the theory with observational terms in the empirical hypothesis, whereas the not-in-favor technique does a better job. Or the argument can take the form of showing that one or more auxiliary assumptions of the in-favor technique are likely false, whereas the not-in-favor technique suffers less from this problem. Either way, the researcher who wishes to use a not-in-favor technique might successfully address potential reviewer concerns pertaining to technique, thereby increasing the probability of eventual publication.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
