Abstract
Causal inference via process tracing has received increasing attention in recent years. A 2 × 2 typology of hypothesis tests takes a central place in this debate. A discussion of the typology demonstrates that its role for causal inference can be improved in three respects. First, I formulate case selection principles for each of the four tests. Second, focusing on the uniqueness dimension of the 2 × 2 typology, I show that it is important to distinguish between theoretical and empirical uniqueness when choosing cases and generating inferences via process tracing. Third, I demonstrate that the standard reading of the so-called doubly decisive test is misleading: it conflates unique implications of a hypothesis with contradictory implications between one hypothesis and another. To remedy the current ambiguity of the uniqueness dimension, I propose an expanded typology of hypothesis tests constituted by three dimensions.
Introduction
Causal inference via process tracing has received increasing attention in recent years. A 2 × 2 typology of hypothesis tests, developed by Van Evera (1997) in a general discussion of social science methods and rediscovered by the recent literature on process tracing (Bennett 2008, 2010; Collier 2011; Mahoney 2012), takes a central place in this debate. The typology is constituted by two dimensions related to the observable implications that one can derive from a hypothesis. The first dimension is called certitude or certainty and captures how likely it is that a specific observable implication will be confirmed in process tracing. The second dimension is called uniqueness and asks whether an observable implication can be derived from a single hypothesis or from multiple hypotheses. 1 In a categorical view, the intersection of high and low certitude with uniqueness and nonuniqueness yields four tests that allow the derivation of inferences about a working hypothesis and rival hypotheses, conditional on whether the working hypothesis passes or fails the test (Bennett 2010; Collier 2011).
The 2 × 2 typology adds valuable insight and rigor to hypothesis testing via process tracing. However, a review and discussion of the typology in this article demonstrates that its role for causal inference can be improved further in three respects. First, there has not been a comprehensive discussion as to whether and how case study researchers can choose between the four tests. This is an important matter because each of the tests has different implications for causal inferences. Empirical researchers need to be equipped with knowledge about how to craft their design in order to realize one type of test or the other. One major aim of this article is to formulate case selection principles for each of the four tests.
Case selection related to the dimension of certainty has received some attention in the past (Bennett 2008), while there has been virtually no explicit reflection on the dimension of uniqueness. 2 In the discussion of case selection, particular attention is directed to two salient problems with the current interpretation of testing unique implications. Solving these two problems constitutes the second and third ways in which this article seeks to improve the typology.
Second, it is essential to distinguish between the theoretical uniqueness and empirical uniqueness of an observable implication. Theoretical uniqueness is present when there is only one hypothesis that yields a given prediction in principle. Empirical uniqueness is present when we can only test one hypothesis on the basis of a specific case. If this requirement is met, it is evident that empirical uniqueness is sufficient for generating an unambiguous inference for a hypothesis. However, I demonstrate that empirical uniqueness is not necessarily the best criterion to pursue. For some constellations involving a working hypothesis and a rival proposition, it is only possible to produce inferences on both when the selected case lacks empirical uniqueness. Consequently, establishing empirical uniqueness in such process-tracing studies unnecessarily constrains the range of inferences we can make and effectively limits causal inference to the working hypothesis.
Furthermore, I show that failing to distinguish between the two forms of uniqueness is problematic because there is no unequivocal link between them. Considering empirical uniqueness alone is deficient because the test of an empirically unique implication does not convey all relevant information for causal inference on the hypotheses under scrutiny. On the other hand, an exclusive focus on theoretical uniqueness is inappropriate because it does not automatically ensure unambiguous causal inferences in the empirical analysis. Theoretical and empirical uniqueness should always be considered together, and systematic case selection is the instrument linking the two in the ways described in this article.
The variety of uniqueness-related constellations that one can confront is systematized by clarifying whether two hypotheses center on exactly the same cause, on two exclusive causes, only one of which can be present in a single case (a country is rich or poor, a person is male or female, etc.), or on two nonexclusive causes (a country is rich and democratic, a person is male and old, etc.). The same distinction is made for the hypothesized outcome and the underlying mechanism. The distinction between identical, exclusive, and nonexclusive causes, outcomes, and mechanisms highlights the variety of designs in comparative hypothesis testing and the complex interplay between theoretical uniqueness, empirical uniqueness, and the generation of causal inferences.
Building on this systematization, the third direction in which the typology is improved is specifically related to the so-called doubly decisive test. The literature attaches strong implications to the doubly decisive test combining high certitude with uniqueness. In the standard reading of the test, confirmation of one hypothesis automatically disconfirms the competing propositions (Bennett 2010:211; Collier 2011:825). A close examination of the dimension of uniqueness shows that this reading conflates the uniqueness of a proposition with mutually exclusive implications made by a working hypothesis and a competing hypothesis. The elaboration of multiple designs in which a working hypothesis and competing hypothesis can be involved highlights that unique implications do not necessarily entail contradictory implications.
Based on the insight that the doubly decisive test intermingles two separate dimensions, I propose an expanded typology of hypothesis tests that is constituted by three dimensions and enhances case selection and causal inference: certainty, uniqueness, and mutual exclusiveness. Although a 2 × 2 × 2 typology is not as handsome as a 2 × 2 typology, the expansion is necessary in order to remove the ambiguity currently inherent in the latter. Throughout the article, principled methodological arguments are illustrated with empirical examples from various fields of research in the social sciences (without the intention of making any substantive contribution to the studies addressed).
The 2 × 2 Typology of Hypothesis Tests
The typology of hypothesis tests recently rediscovered in the case study literature dates back to Van Evera’s (1997:30-32) distinction between the uniqueness and the certitude or certainty of an observable implication derived from a hypothesis. “Observable implication” can refer to anything that can be derived as a prediction from a theory. In the context of process tracing, an observable implication often refers to what is now called a causal process observation (CPO, see Collier, Brady, and Seawright 2004). 3 CPOs are only observations and do not stand on their own. They can be tied to the cause of interest, the outcome, the mechanism linking cause to outcome, or to additional causal relationships that should hold true if the causal relationship of prime interest holds true (Mahoney 2010). Ziblatt’s (2009) analysis of electoral fraud in nineteenth-century Germany is an example of the various roles of CPOs. He hypothesizes that electoral fraud in an electoral district becomes increasingly likely as landholding inequality rises. The so-called capture mechanism that accounts for this link is that landowners capture the local administration in charge of the elections. The rationale is that landowners feel threatened by democratization because it undermines the power and status they derive from their wealth. Ziblatt measures the outcome, electoral fraud, by consulting and coding petitions to the German Reichstag charging electoral fraud in a specific district (Ziblatt 2009:5-6). 4 In process tracing, the capture mechanism is then accessed via the collection of CPOs such as the observation that a landowner, who was head of the local administration, urged pub owners not to offer their rooms for meetings of prodemocratization parties (Ziblatt 2009:16). An example of an auxiliary outcome substantiated with CPOs is that landowners align with other groups that have an equal interest in hindering democratization through the manipulation of elections. Ziblatt presents CPOs showing that representatives of the incumbent Conservative Party, who felt threatened by the transition process and the Social Democratic Party, collaborated with the landowners (Ziblatt 2009:16). 5 CPOs thus can serve measurement purposes and allow us to generate causal inferences on causal effects and causal mechanisms. Following the existing literature on the 2 × 2 typology, I focus on empirical studies and examples that invoke process tracing in order to collect CPOs substantiating the claim that a cause is connected to the outcome and/or that a causal mechanism is operative. 6
The Ziblatt study can also be used to illustrate the logic behind the 2 × 2 typology, which is constituted by two dimensions referring to different aspects of an observable implication. The dimension of uniqueness denotes whether a given observable implication is specific to one explanation or can be derived from multiple hypotheses. 7 In Ziblatt’s research, both uniqueness and nonuniqueness are present at the level of mechanisms. The capture mechanism entails that a landowner occupies the leading position in the local administration (Landrat) and uses this position to manipulate the elections. This expectation cannot be derived from the social power mechanism, which is the mechanism against which Ziblatt tests his preferred capture mechanism. According to the social power mechanism, landowners use their social power to threaten and coerce dependents not to vote at all or at least not in favor of prodemocratization parties (Ziblatt 2009:3). Since exerting social power does not require landowners to become head of the local administration, the two mechanisms yield unique predictions. This differs for the conservative party mechanism because it entails the same observable implication as the capture mechanism. 8 Conservative parties were the parties of the landed elites (Ziblatt 2009:9). This means that the political stakes for the Conservative Party were particularly high in districts in which the landed elites held a large share of the land, highlighting that the conservative party mechanism is linked to high levels of landholding inequality as well. Since the heads of the local administration were representatives of the German state, the incumbent Conservative Party faced a strong incentive to manipulate elections by instructing the local administration to do so. Indeed, Ziblatt presents evidence that the Conservative Party relied on the local administration for the manipulation of elections at the district level (Ziblatt 2009:16).
The 2 × 2 typology intersects the dimension of uniqueness with the dimension of certitude or certainty. On this dimension, one asks about the probability with which one expects to find observable implications confirmed. When certitude is high, one is relatively confident of gathering supportive evidence, whereas low certainty means that it is relatively unlikely to collect confirming evidence for a given implication (Bennett 2008:706; Van Evera 1997:30). As regards the Ziblatt study, certainty is high for the region that he selects for process tracing because East Elbia is characterized by a high degree of landholding inequality (Ziblatt 2009:14). If landowners felt threatened by democratization and took countermeasures, then this should be observable in East Elbia, where they had the most to lose. 9 The same holds true for the conservative party mechanism because the more unequal the land distribution, the higher the stakes for the Conservative Party and the more likely it is that the latter manipulates the elections. Correspondingly, the more equally land is distributed in a region, the less likely landowners are to subvert the democratization process.
The intersection of the two dimensions produces the four types of hypothesis tests summarized in Table 1. The typology is presented in combination with a recent amendment by Bennett (2010:210-11), which asks whether a successful test is a necessary or sufficient criterion for inferring that the hypothesis is correct (see also Collier 2011). This amendment is not to be confused with necessary and sufficient causation in set-relational research because the typology generally applies to all types of hypotheses. This means that the necessary and sufficient criteria for hypothesis testing apply equally to hypotheses about necessary conditions and to hypotheses about sufficient conditions. The four types of tests and the criteria of necessity and sufficiency are now clarified step by step.
Table 1. Types of Hypothesis Tests and Causal Inference.
The weakest of all tests is the straw-in-the-wind test because it is marked by low uniqueness and low certainty (Collier 2011:826). A test of the capture mechanism against the conservative party mechanism in a district with low landholding inequality is a straw-in-the-wind test: one is unlikely to find confirming evidence because land is equally distributed, and the implications are not unique because both mechanisms yield the same prediction. A passed straw-in-the-wind test is not sufficient to infer that the capture hypothesis is correct because the conservative party hypothesis would be confirmed as well, which is reflected in low uniqueness. Furthermore, passing the test is not necessary for a confirmatory causal inference because failure is of little surprise in light of low certainty. This means that negative evidence counts against the hypothesis, but the hypothesis remains eligible for a follow-up test in which the ex ante likelihood of finding it confirmed is high (i.e., we should conduct a hoop test or a doubly decisive test).
A hoop test is characterized by high certainty and no uniqueness. A test of the capture mechanism against the conservative party mechanism in a district with high landholding inequality represents a hoop test. Certainty is high because one expects to find confirming evidence in such a district, but uniqueness is absent because the two hypotheses entail the same prediction. Similar to a straw-in-the-wind test, a passed hoop test is not sufficient for inferring causation due to low uniqueness. But passing the test is necessary because failing a test that was likely to be passed casts serious doubt on the hypothesis.
Smoking gun tests have high discriminatory power deriving from high uniqueness, which is combined with a low certainty of finding an implication confirmed. A smoking gun test of the capture mechanism involves selecting a district with an equal distribution of land and testing this mechanism against the social power mechanism. A successful smoking gun test is sufficient for a confirmatory causal inference because it lends credence to the capture hypothesis only. At the same time, passing the test is not necessary because failure does not come as a surprise. 10
Doubly decisive tests are deemed to be the most powerful type because they combine high uniqueness and high certainty (Van Evera 1997:31-32). With regard to the Ziblatt example, a doubly decisive test is performed when choosing a district with high landholding inequality for a test of the capture mechanism against the social power mechanism. The intersection of high uniqueness and certainty means that passing the test is necessary and sufficient for a confirmatory causal inference. According to the conventional reading, supportive evidence for a unique implication with high certainty permits a confirmatory inference for the capture hypothesis while at the same time refuting the social power hypothesis (Bennett 2010:211).
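The four descriptions above can be condensed into a small lookup table. The following Python sketch is only an illustration under the assumption that certainty and uniqueness are treated as binary, as in the categorical reading of the typology; the names `HypothesisTest`, `TYPOLOGY`, and `classify_test` are hypothetical. It records, for each test, whether passing is necessary and whether it is sufficient for a confirmatory inference, following Bennett's amendment.

```python
from typing import NamedTuple


class HypothesisTest(NamedTuple):
    name: str
    passing_necessary: bool   # failing the test casts serious doubt on the hypothesis
    passing_sufficient: bool  # passing the test lends credence to this hypothesis only


# Keys are (high_certainty, unique); the values restate the four descriptions above.
TYPOLOGY = {
    (False, False): HypothesisTest("straw-in-the-wind", False, False),
    (True, False): HypothesisTest("hoop", True, False),
    (False, True): HypothesisTest("smoking gun", False, True),
    (True, True): HypothesisTest("doubly decisive", True, True),
}


def classify_test(high_certainty: bool, unique: bool) -> HypothesisTest:
    """Look up the test type and its necessity/sufficiency criteria."""
    return TYPOLOGY[(high_certainty, unique)]


if __name__ == "__main__":
    # High-inequality district, capture vs. social power mechanism
    print(classify_test(high_certainty=True, unique=True).name)
```

Written out this way, the table makes explicit that, in the conventional interpretation, the necessity criterion tracks certainty and the sufficiency criterion tracks uniqueness.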
The discussion of the four types of tests demonstrates that the realization of one test or another has very different implications for causal inference. In light of this, one may ask whether researchers can navigate between the types by designing the analysis accordingly. Although this is a salient issue, the literature on the typology has not yet dealt with this matter comprehensively. The remainder of this article discusses how case selection can be used to adjudicate between the tests, starting with the dimension of certainty and then turning to the role of uniqueness for causal inference.
Certainty and Case Selection
The dimension of certainty is closely tied to matters of case selection. 11 The reason is that, for a specific case, theoretical considerations allow us to determine whether an implication entails a high or low certainty of being confirmed in process tracing. This became implicitly apparent in the empirical example when I argued that the analysis of a district with low landholding inequality entails low certitude for the capture hypothesis, while the choice of a district with an unequal distribution of land implies high certainty. When we theorize that the level of landholding inequality influences electoral fraud, this level is an essential case-specific feature that shapes the likelihood with which we expect to find the hypotheses on the mechanisms confirmed. It follows that the intentional choice of cases with a high or low level of certitude allows one to navigate between a doubly decisive test and a hoop test on one hand and a smoking gun test and a straw-in-the-wind test on the other.
The case study literature offers two established types of cases that fit squarely into the distinction between high and low certainty (Eckstein 1975). A most likely case implies a high probability of gathering particular pieces of evidence, which is equivalent to high certitude. Correspondingly, a least likely case entails a low likelihood of finding specific evidence (Bennett 2008:719; Levy 2008:12-13), which translates into low certitude. 12 Linking the dimension of certainty with the most likely and least likely case study demonstrates that the certainty with which one expects to gather observations is neither high nor low per se, but only with respect to a particular case.
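To make the link between case selection and the type of test concrete, the sketch below codes the Ziblatt illustration used for the four tests. It assumes a two-level coding of landholding inequality (high-inequality districts as most likely cases, low-inequality districts as least likely cases) and treats uniqueness as fixed by the choice of rival mechanism. Both codings and all names (`SHARES_CAPTURE_IMPLICATION`, `case_type`, `select_test`) are hypothetical simplifications, not Ziblatt's own design.

```python
# Hypothetical coding of the Ziblatt illustration (not Ziblatt's own design).
# Does the rival mechanism share the capture mechanism's observable implication?
SHARES_CAPTURE_IMPLICATION = {
    "conservative party": True,  # also predicts manipulation via the local administration
    "social power": False,       # coercing dependents does not require the Landrat post
}

TEST_NAMES = {
    (False, False): "straw-in-the-wind",
    (True, False): "hoop",
    (False, True): "smoking gun",
    (True, True): "doubly decisive",
}


def case_type(landholding_inequality: str) -> str:
    """Assumed two-level coding: high inequality = most likely case for the capture hypothesis."""
    return "most likely" if landholding_inequality == "high" else "least likely"


def select_test(landholding_inequality: str, rival: str) -> str:
    """Type of test of the capture mechanism produced by a given district and rival."""
    high_certainty = case_type(landholding_inequality) == "most likely"
    unique = not SHARES_CAPTURE_IMPLICATION[rival]
    return TEST_NAMES[(high_certainty, unique)]


if __name__ == "__main__":
    for inequality in ("high", "low"):
        for rival in SHARES_CAPTURE_IMPLICATION:
            print(f"{inequality} inequality, rival = {rival}: {select_test(inequality, rival)}")
```

Holding the rival fixed and varying only the district moves the analysis between a hoop test and a straw-in-the-wind test, or between a doubly decisive test and a smoking gun test, which is the sense in which certainty is a matter of case selection.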
Given the benefit of high certainty noted above, namely that passing a test becomes necessary for a confirmatory causal inference, there are apparent advantages to choosing most likely cases. However, high certainty might coincide with low uniqueness and leave one with only a hoop test, which allows for less powerful causal inference than a doubly decisive test. Consequently, there are also benefits to making an intentional choice between tests characterized by high and low uniqueness, respectively.
Uniqueness and Contrasts in Comparative Tests
In contrast to the dimension of certainty, the dimension of uniqueness is not automatically tied to matters of case selection. It is possible to specify observable implications and assess their uniqueness without considering any case against which these predictions can be tested. The implication of the capture hypothesis that landowners seize the local administration is unrelated to any specific electoral district one could choose for process tracing. However, cases of course must be chosen in order to be able to test the observable implications of a theory. As is demonstrated in the following, case selection is not an easy matter because there is no guarantee that an implication unique to one theory also has high uniqueness for the selected case. With regard to the Ziblatt study, an example of this discrepancy would be the choice of an electoral district in which landowners hold social power and capture the local administration by filling the position of a Landrat. While the social power hypothesis and the capture hypothesis entail a unique implication concerning the mechanism underlying electoral fraud, we have chosen a case that meets the preconditions for a simultaneous test of both hypotheses. This means that in regard to the chosen electoral district, the implications are not unique in the sense that process tracing might deliver empirical evidence that simultaneously substantiates both propositions. It might be that one finds confirming evidence only for the capture hypothesis or the social power hypothesis, but this would be due to chance and not due to the purposeful choice of a suitable case.
The extant literature on the 2 × 2 typology simply refers to whether the uniqueness of an observable implication is given or not (e.g., Bennett 2008). The example just given demonstrates that the mere reference to the uniqueness or nonuniqueness of an implication is ambiguous. The ambiguity can be removed by distinguishing between the theoretical uniqueness and empirical uniqueness of observable implications. Theoretical uniqueness refers to the question of whether one or more hypotheses yield a specific observable implication in principle. Empirical uniqueness asks whether we can test the observable implication of one or more than one hypothesis given a specific case. 13 As indicated above and as I show in the following, theoretical and empirical (non)uniqueness do not necessarily go together, which makes it important to understand their interplay and the role of case selection as the connection between the two forms of uniqueness. 14
In the following elaboration of the interplay between theoretical uniqueness, case selection, and empirical uniqueness, comparative hypothesis testing involves a working hypothesis (W) and a rival (or competing) hypothesis (R). The 10 designs presented in Table 2 capture all relevant constellations between the working hypothesis and the rival hypothesis by asking whether the two hypotheses center on the same or a different cause or condition, 15 the same or a different mechanism, and the same or a different outcome. 16 When the cause, mechanism, or outcome differs between the propositions, the systematization follows the idea of contrast classes. The idea behind contrast classes is to ask why one specific cause or mechanism, rather than another cause or mechanism, is expected to produce a specific outcome, rather than an alternative outcome (see Northcott 2008). The contrast thus lies in the comparison of two states of the cause, mechanism, or outcome, one of which occurred and one of which did not occur in a given case. 17
Table 2. Designs for Tests of Working Hypothesis and Rival Hypothesis.
For contrastive mechanisms, causes, and outcomes, I further distinguish between exclusive and nonexclusive contrasts. In the second column of Table 2, I present a formalized view of the constellations between the working and the rival hypothesis. A and B denote conditions or contrast classes that are different but not exclusive, while ∼A (“not-A”) signifies a condition that is mutually exclusive with A. Correspondingly, M and N capture nonexclusive mechanisms, and ∼M denotes a mechanism that directly contradicts M. Y and Z mark two potential outcomes, and ∼Y the outcome that is mutually exclusive with Y. (An extended formalized account is provided in the online appendix to this article, which can be found at http://smr.sagepub.com/supplemental/.) 18
The following discussion of the designs and their implications for case selection and causal inference is organized along the dimension of the condition (see below). Following this logic, the first four constellations involve working and rival hypotheses centering on the same condition. Among the four designs, scenarios 1a and 1b specifically focus on exclusive and nonexclusive mechanisms. Designs 4 to 6 involve propositions with mutually exclusive conditions (A vs. ∼A), while the last three designs (7–9) include nonexclusive conditions (A vs. B).
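The notation can be read mechanically. The sketch below assumes that each hypothesis is written as a (condition, mechanism, outcome) triple of plain symbols, with a leading `~` (or `∼`) marking the exclusive contrast. It only recovers the grouping described in the text (designs 1a and 1b, the remaining same-condition designs, designs 4 to 6, and designs 7 to 9) and does not reproduce the further columns of Table 2; `contrast` and `locate_design` are hypothetical names.

```python
def _base(symbol: str) -> str:
    # Drop a leading negation sign; both ASCII "~" and "∼" (U+223C) are accepted.
    return symbol.lstrip("~\u223c")


def contrast(x: str, y: str) -> str:
    """Classify two components written in the Table 2 notation."""
    if x == y:
        return "identical"      # e.g. A vs A
    if _base(x) == _base(y):
        return "exclusive"      # e.g. A vs ~A
    return "nonexclusive"       # e.g. A vs B


def locate_design(w: tuple, r: tuple) -> str:
    """Design group for W and R, each given as (condition, mechanism, outcome)."""
    if w == r:
        raise ValueError("the rival must differ from the working hypothesis")
    condition = contrast(w[0], r[0])
    if condition == "identical":
        # Same condition and outcome, so the mechanisms must differ: design 1a or 1b
        if w[2] == r[2]:
            return "1a" if contrast(w[1], r[1]) == "exclusive" else "1b"
        return "2 or 3"
    return "4-6" if condition == "exclusive" else "7-9"
```

For example, `locate_design(("A", "M", "Y"), ("A", "~M", "Y"))` places the pair in design 1a, the constellation of the Grieco example discussed below.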
In order to keep the discussion of the designs comprehensible, only designs 1a and 1b explicitly deal with contrastive mechanisms. Apart from the fact that a three-dimensional visualization would be demanding and potentially more confusing than insightful, 19 Table 2 suffices to give a full exposition of the range of constellations one may confront. All the arguments that I make for designs 1a and 1b extend to the other eight designs, as these can involve exclusive and nonexclusive mechanisms as well. From an inferential point of view, though, contrastive mechanisms are not needed here and are not taken into account when elaborating designs 2 to 9. While the causal statements are located at the cross-case level for these designs, the case studies that I present in the following involve process tracing at the empirical level because the authors use process tracing to substantiate the claim that there is a causal link between the cause and the outcome.
In the following sections, each of the 10 designs is illustrated with an empirical study with regard to the following features. First, do the two hypotheses yield unique implications on a theoretical level (column three of Table 2)? While, for reasons elaborated above, the certainty of each observable implication matters for causal inference, certainty is not further considered here in order to keep the focus on the dimension of uniqueness (see also below). Second, do the two hypotheses imply mutually exclusive, that is, directly contradictory, theoretical implications (column four)? Third, I explain what the optimal case selection strategy is, which requires elaborating what “optimal” means. 20 The primary goal is to generate an unambiguous causal inference related to the working hypothesis. 21 The case selection column focuses on the working hypothesis because it is, by definition, the hypothesis that one favors and seeks to test. On one hand, it is evident that empirical uniqueness always allows one to produce an inference for the working proposition only, whereas the following sections show that it is sometimes impossible to achieve empirical uniqueness via case selection.
On the other hand, the following sections show that even when empirical uniqueness can be achieved, it is not always advisable to establish it, because doing so may preclude joint inferences on the working hypothesis and the rival account. If we extend the view beyond the working hypothesis and also want to seize the opportunity for inferences on the rival proposition, it follows that empirical uniqueness for the working hypothesis is not necessarily the best strategy (at least as long as this does not conflict with the goal of unambiguous inferences on the working hypothesis). In the following sections, I show that for some designs an unambiguous inference on the working proposition requires choosing a case that does not allow us to test the rival hypothesis at the same time. In other designs, however, the intentional choice of a suitable case permits us to test the working hypothesis and the rival proposition simultaneously, which is superior to testing the former hypothesis only. Fourth, since it is not always possible or necessary to achieve empirical uniqueness for the working hypothesis via case selection, column 6 indicates whether the recommended case selection rule establishes empirical uniqueness for the working hypothesis.
With regard to column 5, note that it specifically indicates the positive cases that we need for a test of the working hypothesis and, depending on the design, the rival proposition. 22 Here, I understand a “positive case” as a case displaying the condition or mechanism entailed by the working hypothesis. Epistemologically, of course, causal inference that follows the criterion of difference making demands a negative case (Northcott 2008), allowing one to assess whether the condition or mechanism makes a difference to the outcome (see Machamer 2004). 23 The negative case can be an actual one (Mackie 1974: chap. 3) or introduced by means of a counterfactual (Lewis 1973), a matter that need not concern us here. (See the online appendix, which can be found at http://smr.sagepub.com/supplemental/, for a discussion of negative cases in all 10 designs.)
Fifth and finally, I clarify the consequences of the confirmation and disconfirmation of the working hypothesis for the rival account, given the theoretical constellation between them and assuming that the optimal case selection strategy is followed. It generally holds that if a confirmed working hypothesis has no ramifications for the competing account (which is true for the majority of designs), then a failed working hypothesis does not entail anything for the rival hypothesis either. On the other hand, when the confirmation of a working hypothesis automatically implies the disconfirmation of its competitor, the competitor is directly confirmed when one gathers disconfirming evidence for the working proposition.
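The rule stated in this paragraph can be written as a short function. Following the text, the sketch assumes that the recommended case selection strategy has been followed and that "contradictory" means exactly one of the two hypotheses can hold; `rival_verdict` is a hypothetical name.

```python
from typing import Optional


def rival_verdict(working_confirmed: bool, contradictory: bool) -> Optional[bool]:
    """Verdict on the rival R implied by the verdict on the working hypothesis W.

    Returns True (R confirmed), False (R disconfirmed) or None (nothing follows for R).
    Assumes, as the text does, that with contradictory implications exactly one of W, R holds.
    """
    if not contradictory:
        return None  # W's fate has no ramifications for R, whether W passes or fails
    return not working_confirmed  # W confirmed -> R disconfirmed; W disconfirmed -> R confirmed


if __name__ == "__main__":
    for contradictory in (True, False):
        for working_confirmed in (True, False):
            print(contradictory, working_confirmed, rival_verdict(working_confirmed, contradictory))
```

The asymmetry is deliberate: only when the implications are contradictory does a verdict on the working hypothesis carry over, in both directions, to the rival.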
I also note that it is possible to elaborate more complex hypotheses and explanations than those in the empirical examples presented below. With respect to Table 2, A, M, and so on can denote a single mechanism or cause, but could also include conjunctions of conditions or mechanisms, INUS causes (Insufficient, but Necessary elements of a conjunction that is Unnecessary, but Sufficient), explanations involving a fine-grained chain of intervening steps, sequencing, and so on (Mahoney 2000, 2010; Mahoney, Kimball, and Koivu 2009; Pierson 2004). However, the complexity of hypotheses and explanations about effects and mechanisms does not change the arguments that I develop in the following. Formulating a complex explanation, like specifying a simple hypothesis about a mechanism, means deriving observable implications. The elaboration of a complex explanation promises inferential benefits because the more specific an explanation is, the more likely it is to be unique. Still, a salient question is whether the explanation is incompatible (exclusive) or compatible (nonexclusive) with a rival explanation. It is also evident that Table 2 captures only the most basic constellations involving two hypotheses positing different conditions, mechanisms, and outcomes. In practice, one is likely to operate with more hypotheses related to each other in multiple, probably complex ways in terms of similar, exclusive, and nonexclusive implications. In principle, however, what is elaborated for two hypotheses extends to tests of multiple hypotheses. 24
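Because an INUS cause is defined by four conditions at once, a brute-force check can help keep them apart. The sketch assumes a purely hypothetical Boolean structure in which the outcome Y occurs if A and B are jointly present or if C is present; it enumerates all eight configurations and tests each clause of the INUS definition. None of the names (`outcome`, `sufficient`, `is_inus`, and so on) refer to an existing library.

```python
from itertools import product


def outcome(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical causal structure: Y = (A AND B) OR C
    return (a and b) or c


# All 2^3 configurations of the three conditions (index 0 = A, 1 = B, 2 = C).
CONFIGS = list(product([False, True], repeat=3))


def present(indices, config) -> bool:
    """True if every condition in `indices` is present in the configuration."""
    return all(config[i] for i in indices)


def sufficient(indices) -> bool:
    """A (conjunction of) condition(s) is sufficient if Y occurs whenever it is present."""
    return all(outcome(*cfg) for cfg in CONFIGS if present(indices, cfg))


def is_inus(factor: int, conjunction: tuple) -> bool:
    """Check the four INUS clauses for `factor` as part of `conjunction`."""
    rest = tuple(i for i in conjunction if i != factor)
    insufficient = not sufficient((factor,))        # I: not sufficient on its own
    non_redundant = not sufficient(rest)            # N: conjunction fails without it
    unnecessary = any(outcome(*cfg) and not present(conjunction, cfg)
                      for cfg in CONFIGS)           # U: Y can occur without the conjunction
    conj_sufficient = sufficient(conjunction)       # S: the conjunction suffices for Y
    return insufficient and non_redundant and unnecessary and conj_sufficient
```

Under this structure, A and B are each INUS causes as parts of the conjunction AB, whereas C is not, because C is sufficient on its own.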
I further note that whenever the following discussion explicitly touches on the doubly decisive test as introduced above, the assumption is that uniqueness is complemented with high certainty, as this combination is constitutive of this test. Furthermore, when the discussion deals with the empirical confirmation of the working and the rival hypothesis, the two might of course differ in their respective certainty, with ramifications for causal inference. 25 In order to highlight the dimension of uniqueness, the implicit assumption is that the two propositions do not display noteworthy differences in the certainty of their respective observable implications.
Finally, it should be emphasized that all examples introduced so far and to be introduced below are statements about the sufficiency of a condition. Hypotheses on necessary conditions and correlational cause–effect relationships are not covered because this would require the discussion of empirical examples in this article. On the practical side, the focus on sufficiency can be justified to the extent that process tracing and case studies often involve statements about sufficient conditions (Blatter and Haverland 2012; Goertz and Mahoney 2012; Mahoney 2004; Mahoney and Goertz 2006). Equally important, the major conclusions that I derive from Table 2 fully extend to correlational and necessary condition case studies, which is elaborated in detail in the online appendix to this article. 26
Working and Rival Hypothesis With Same Condition
The first two designs in Table 2 capture a constellation in which two hypotheses share the same cause and outcome, but differ as to the underlying causal mechanism, which can be exclusive (1a) or nonexclusive (1b). An example of design 1a with exclusive mechanisms is Grieco's (1990) study of the implementation of the so-called Codes negotiated at the GATT (General Agreement on Tariffs and Trade) Tokyo Round in the 1970s. 27 In his case study, Grieco tests neoliberalism against neorealism. This test involves contrasting mechanisms because neorealism claims that states seek relative gains in international cooperation. In contrast, neoliberalism stipulates that states are satisfied when they can achieve absolute gains. Absolute gain seekers thus also cooperate with other countries when the latter can reap more benefits from the agreement. These two predictions on the mechanisms of cooperation imply that neorealism and neoliberalism achieve theoretical uniqueness and stipulate mutually exclusive implications. 28 Given the presence of contradictory predictions, the confirmation of one hypothesis automatically disconfirms the other. 29 Design 1a is thus in line with the conventional reading of the doubly decisive test.
The choice of a suitable case for a comparative analysis is uncontroversial because one only needs to choose a case with the condition and the outcome present; in this example, these are the Tokyo mandate to negotiate about the implementation of the Codes and the actual outcome of the negotiations. Process tracing allows one to discern whether countries sought absolute or relative gains and to confirm one hypothesis while disconfirming the other. The case that we need to test the working hypothesis and its rival necessarily achieves empirical uniqueness because both propositions predict different mechanisms when the condition and the outcome are present.
The situation is different when the two mechanisms are not exclusive, captured by design 1b. Ziblatt's (2009) study, introduced previously, is a prime example of a design with nonexclusive mechanisms. As explained, three possible mechanisms, the social power mechanism, the capture mechanism, and the conservative party mechanism, link landholding inequality to electoral fraud and exhibit theoretical uniqueness. The salient difference between Grieco's and Ziblatt's case studies is that all three mechanisms could find empirical confirmation in a single-case analysis. Empirically, it may be that landowners use their social power to manipulate elections; that they additionally take control of the local administration in order to gain further leverage for electoral fraud; and that the incumbent conservative party has its own interest in manipulating elections via the head of the local administration.
Design 1b highlights that theoretically unique implications do not automatically entail contradictions: each hypothesis yields a unique mechanism with unique observable implications, but the confirmation of one hypothesis does not simultaneously disconfirm the other two. The Ziblatt example is an instance of equifinality or substitutability at the level of mechanisms, that is, multiple mechanisms potentially connect the same cause to the same outcome. In fact, the argument that the confirmation of one hypothesis automatically rejects the other two is a misreading of the hypotheses because each of them allows the other two to be empirically accurate as well. As regards the necessary and sufficient criteria for hypothesis testing introduced above, it follows that passing a test of a hypothesis that includes a unique but nonexclusive mechanism is not sufficient for both confirming this proposition and refuting its rival.
The correct interpretation of uniqueness and nonexclusive mechanisms has further implications for the collection of empirical observations and their use for causal inference. In the misleading interpretation of the doubly decisive test, a CPO is taken as supportive evidence for one hypothesis and disconfirming evidence for the other proposition. However, if the implications of two hypotheses are unique but nonexclusive, it is mandatory to treat each hypothesis on its own empirical ground. One has to search for evidence lending support to each of the two hypotheses and determine their validity separately. The working and rival hypotheses are confirmed by the respective presence of supportive evidence, and disconfirmed by the absence of such evidence or the presence of countervailing evidence. 30
Ziblatt is well aware of the relation between his hypotheses and the proper collection and interpretation of evidence, as shown by his evaluation of the social power hypothesis and the capture hypothesis on separate empirical grounds. 31 This means that he looks for observable implications confirming or disconfirming the capture hypothesis, and separately searches for evidence on implications specifically corroborating or disconfirming the social power hypothesis.
In the face of unique and nonexclusive mechanisms, the challenge of creating the best possible basis for causal inference on the working hypothesis lies in achieving empirical uniqueness via case selection. In the context of design 1b, empirical uniqueness requires choosing a case that only allows us to test for the presence of the mechanism stipulated by the working hypothesis. In Ziblatt's study, empirical uniqueness is absent because the selected region of East Elbia permits one to test all three mechanisms simultaneously. Ziblatt finds disconfirming evidence for the social power mechanism, but is left with supportive evidence for both the capture mechanism and the conservative party mechanism. This study thus highlights that case selection is the transmission belt between theoretical and empirical uniqueness. In Ziblatt's analysis, an unambiguous causal inference would have been feasible by looking for a region where one would expect to find positive evidence for the capture mechanism (his preferred one), but where the antecedents for a test of the other two mechanisms were not met. Such a region would permit a focus on one hypothesis and the generation of inferences for this hypothesis only. 32
It is true that this analysis would no longer be a comparative test because the social power hypothesis and the conservative party hypothesis have been removed from the equation by means of case selection. For design 1b (and other designs that follow), this is the price one has to pay to be certain that high theoretical uniqueness is complemented with high empirical uniqueness in a single case study. If one wants to perform a genuine comparative test of the three mechanisms, one must select three positive cases, each of which is appropriate for the test of one hypothesis only.
These problems of comparative hypothesis testing and case selection are not new; they have long been discussed in the case study literature (Lieberson 1991; Lijphart 1971; Zelditch 1971). However, the role of case selection is worth pointing out because it demonstrates that theoretical uniqueness does not necessarily imply unambiguous confirmatory (or disconfirmatory) causal inferences. When the working and rival hypotheses only distinguish themselves through a nonexclusive mechanism, theoretical uniqueness must be translated into empirical uniqueness by informed case selection. Whether or not the appropriate cases are available is an empirical question, meaning that it might or might not be possible to select a case that achieves empirical uniqueness. In principle, though, a deliberate choice of cases is the vehicle for avoiding a lack of empirical uniqueness and the corresponding problems for causal inference.
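The contrast between designs 1a and 1b can be made concrete with a small possible-worlds check. The sketch below is illustrative and not part of the original study: it treats each hypothesis as a boolean ("does its mechanism operate in the selected case?"), encodes exclusivity as the constraint that both mechanisms cannot operate at once, and asks whether every admissible world in which the working hypothesis holds is one in which the rival fails. The function names are hypothetical.

```python
from itertools import product


def admissible_worlds(exclusive: bool) -> list[tuple[bool, bool]]:
    """Enumerate (working_holds, rival_holds) pairs for a single case.

    If the mechanisms are mutually exclusive (design 1a), the world in
    which both operate is ruled out; otherwise (design 1b) every
    combination remains possible.
    """
    worlds = []
    for working, rival in product([True, False], repeat=2):
        if exclusive and working and rival:
            continue  # exclusive mechanisms cannot both operate
        worlds.append((working, rival))
    return worlds


def confirmation_refutes_rival(exclusive: bool) -> bool:
    """True if confirming the working hypothesis logically disconfirms the rival."""
    return all(not rival for working, rival in admissible_worlds(exclusive) if working)


print("design 1a (exclusive):", confirmation_refutes_rival(exclusive=True))
print("design 1b (nonexclusive):", confirmation_refutes_rival(exclusive=False))
```

Only the exclusivity constraint differs between the two runs, while both hypotheses remain theoretically unique; this mirrors the point that uniqueness alone does not settle whether a test is doubly decisive.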
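The case selection logic for design 1b can be sketched as a filter over candidate cases. In the sketch below, each candidate records which hypotheses' antecedents it satisfies; a case is treated as empirically unique for a hypothesis if it satisfies that hypothesis's antecedents and no rival's. The case names and their codings are invented placeholders, not data from Ziblatt's study, and the labels merely echo the three mechanisms discussed above.

```python
# Hypothetical candidate cases: for each, the set of hypotheses whose
# antecedents are met (i.e., which mechanisms could be tested there).
# Illustrative placeholders only, not empirical codings.
CANDIDATES = {
    "case_A": {"social_power", "capture", "conservative_party"},
    "case_B": {"capture"},
    "case_C": {"social_power"},
    "case_D": {"capture", "conservative_party"},
    "case_E": {"conservative_party"},
}


def empirically_unique_cases(hypothesis: str, candidates: dict[str, set[str]]) -> list[str]:
    """Return the cases in which only `hypothesis` can be tested."""
    return sorted(name for name, testable in candidates.items() if testable == {hypothesis})


def comparative_design(hypotheses: list[str], candidates: dict[str, set[str]]) -> dict[str, list[str]]:
    """For a comparative test, list the uniquely suited cases for each hypothesis."""
    return {h: empirically_unique_cases(h, candidates) for h in hypotheses}


design = comparative_design(["social_power", "capture", "conservative_party"], CANDIDATES)
for hypothesis, cases in design.items():
    print(hypothesis, "->", cases or "no empirically unique case available")
```

Here case_A plays the role that East Elbia plays in the text: all three mechanisms can be tested there, so it appears in no list. An empty list would correspond to the empirical situation in which no suitable case is available.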
Design 2 in Table 2 combines two hypotheses that focus on the same cause and include a mutually exclusive outcome. Zuber’s (2011) case study on party strategies in ethnically divided societies offers an example of such a design. The starting point of Zuber’s analysis is the established hypothesis that competition between political parties representing ethnic minorities leads to the ideological radicalization of these parties. Drawing on the literature on party competition, Zuber conjectures that ethnic minority parties also face incentives for ideological moderation, that is, taking a position that is closer to the center of the ideological spectrum rather than at the extreme poles. A case study of ethnic-minority party strategies in Serbia shows that they do opt for ideological moderation and, additionally, sheds light on why this is done. The collection of supportive evidence for ideological moderation directly contradicts the radicalization argument because a party can either moderate or radicalize its position at a given election.
In this design, theoretical uniqueness is given because the two causal arguments center on different outcomes. Moreover, the two propositions are mutually exclusive because only one of them can be confirmed in a single case. Case selection is straightforward because all one needs to ascertain is the presence of the condition, that is, that the selected party represents an ethnic minority competing with another minority party. Interestingly, such a case necessarily lacks empirical uniqueness because the presence of the condition makes it possible to test both competing hypotheses. Overall, a design with the same cause and mutually exclusive outcomes exhibits the features of a doubly decisive test.
Case selection and causal inference follow different lines when the cause is identical and the outcomes are not exclusive (design 3). For an illustration of this design, I rely on Kammer's (2013) research on the consequences of globalization for redistribution in advanced democracies. Kammer develops the hypothesis that globalization has two consequences for redistribution: it leads to less redistribution via the tax system (first outcome of interest) and more redistribution via the welfare system (second outcome of interest). There is no inherent link between the two trajectories of redistribution, meaning that more or less redistribution (or no change at all) through taxation can coincide with more or less redistribution through welfare spending. 33
Theoretical uniqueness is given for each hypothesis because they posit different and nonexclusive outcomes, but these propositions do not contradict each other. Again, this shows that theoretical uniqueness is not sufficient for mutually exclusive implications. Theoretical uniqueness can be accompanied by a lack of empirical uniqueness because the analysis of a globalized country, that is, a case with the condition in place, could deliver supportive evidence for both propositions. If one wants to prevent both hypotheses from receiving empirical support at the same time, one has to choose a case that is suitable for a test of one proposition only. For example, one could select a country about which one knows that redistribution through the tax system is low at the outset, allowing one to focus on the consequences of globalization for welfare system-based redistribution. 34 But since we are interested in comparative hypothesis testing and are dealing with two nonexclusive outcomes here, the question is whether empirical uniqueness is desirable at all. In other words, there is a tension between case selection achieving empirical uniqueness on the one hand, and the choice of a case permitting us to test both hypotheses on the other. If we choose a case allowing us to test both propositions at the same time, we contribute more to the generation of knowledge than in a design establishing empirical uniqueness for the working hypothesis. In this view, design 3 is one of the constellations for which there are benefits in deviating from the idea of achieving empirical uniqueness for one hypothesis only. 35
Working and Rival Hypothesis With Exclusive X
For the exposition of designs 4 to 6, I rely on studies that play exclusive causes against each other. An example mirroring design 4, which involves two propositions with exclusive causes and the same outcome, is Hendriks and Michels’ (2011) study of the spread of elements of direct democracy in the United Kingdom and the Netherlands. They hypothesize that the advent of direct democracy is attributable to an international debate about direct democracy that equally influences the United Kingdom and the Netherlands and thus renders their political regimes more similar. The comparison of a majoritarian democracy (United Kingdom) with a consensus democracy (the Netherlands) further permits the assessment of the counterclaim that either consensus or majoritarian political regimes are the actual driving force behind the invention of features of direct democracy. 36 For purposes of illustration, I focus on this claim here and take as the working hypothesis that consensus democracies exhibit more elements of direct democracy than majoritarian countries. 37
Theoretical uniqueness is not given in this design because both hypotheses entail the same outcome. However, a lack of theoretical uniqueness is not a serious problem for either case selection or causal inference. Design 4 becomes an instance of a doubly decisive test when one chooses a case with the condition of the working hypothesis in place; if we find that consensus democracies have more elements of direct democracy installed than do majoritarian countries, it directly follows that the hypothesis according to which the majoritarian nature of a political regime is the cause of more direct democracy in a country is wrong (and vice versa). 38 With regard to necessary and sufficient criteria for causal inference, design 4 shows that theoretical uniqueness is not necessary for unambiguous causal inference. A successful test of a hypothesis lacking theoretical uniqueness can result in confirmatory causal inferences if it involves two propositions that contrast exclusive causes and center on the same outcome.
The discussion of design 5 draws on Eckert's (2010) inquiry into regulatory autonomy in the European Union (EU). Eckert argues that the type of capitalist system in a country influences various dimensions of the autonomy of regulatory agencies in the postal sector. Among other things, Eckert (2010:1235) hypothesizes that liberal market economies grant encompassing competencies, coordinated market economies confer medium competencies, and state-led market economies grant limited competencies.
Each of the three propositions displays theoretical uniqueness because they settle on different outcomes. In addition, empirical uniqueness is automatically guaranteed for a selected positive case because a country is subsumed under only one variety of capitalism (which is the condition in this example). One case thus allows one to focus on the corresponding hypothesis about the consequences of this type for regulatory autonomy. With regard to causal inference, the exclusivity of the causes and the outcomes implies that all propositions can be empirically valid, each for a different set of cases. In fact, design 5 stands out from the other nine constellations because the confirmation of the working hypothesis automatically entails the confirmation of the rival hypothesis. This implication of design 5 follows from the logic of counterfactual inference in case studies. In abstract terms, for the positive case of the working hypothesis, I must show that A is a cause of Y by demonstrating that ∼A leads to ∼Y (Mackie 1974: chap. 3). As the formalization of design 5 in Table 2 shows, the requirement for causal inference on the working hypothesis (A → Y), that is, the negative case we need for comparison (∼A → ∼Y), is identical to the constellation posited by the rival account. Consequently, the inference that the working hypothesis is correct also confirms the rival hypothesis. Design 5 is thus not a doubly decisive test, but, in fact, reverses its logic. 39
Design 6 covers two hypotheses that contrast exclusive causes and nonexclusive outcomes. 40 Research on international trade cooperation illustrates this design and its ramifications for causal inference and case selection. When deciding on the institutional design of international trade cooperation, countries can be guided either by distributional concerns or by concerns about the transaction costs of cooperation (Koremenos, Lipson, and Snidal 2001). Distributional concerns derive from the distributional implications that liberalizing trade cooperation has for domestic economic actors. Income is redistributed from domestic producers that compete with imported goods to consumers and exporters as the actors that benefit from reciprocal liberalization (Pahre 1998). A country's concerns about transaction costs relate to the time and resources a country needs to invest in trade negotiations. Transaction costs are manageable when the number of cooperating countries is small, but become a burden to successful liberalization when the number is large (Zawahri and Mitchell 2011).
Concerns about distribution and transaction costs are exclusive causes because a country cannot attach priority to both at the same time, which becomes manifest in the working and rival hypotheses that are tested. The working hypothesis states that a country's concern about distribution (condition A) leads it to favor bilateral cooperation (outcome Y) over multilateral cooperation (Rixen 2010; Rixen and Rohlfing 2007). The rival proposition stipulates that concerns about transaction costs (condition B) lead to the pursuit of the formula approach (outcome Z) in negotiations (Rohlfing 2009). Regarding the working hypothesis, the rationale is that a country can fine-tune its offers and demands for concessions in country-by-country negotiations, while this is impossible in multilateral settings where one offer extends to all other negotiating partners. 41 The rival hypothesis states that countries with ongoing concerns about transaction costs opt for the formula approach toward trade cooperation as opposed to the item-by-item method. The formula and item-by-item methods represent two different approaches toward the handling of items (i.e., commodities and goods) in trade negotiations. In item-by-item negotiations, each commodity is negotiated on its own terms; for example, one negotiates a different tariff for small cars, SUVs, trucks, cars with hybrid engines, and so on. While this allows the fine-tuning of concessions in bargaining, it can clearly become cumbersome with hundreds or thousands of items. The formula method is advantageous in this regard because one concession, for example, a tariff cut of 15 percent, is applied to all commodities. The formula approach thus saves transaction costs.
The two propositions about bilateralism and the formula method exhibit theoretical uniqueness because they focus on different outcomes. Moreover, empirical uniqueness for the working hypothesis is easy to establish by examining a country about which one knows that it maintained concerns about distribution. Concerning causal inference, designs with exclusive causes and nonexclusive outcomes entail that the confirmation of one hypothesis does not invalidate the other proposition. Gathering confirming evidence for the proposition that concerns about distribution lead to bilateralism does not provide any clues about the veracity of the hypothesis on the consequences of concerns about transaction costs. Design 6 thus shows that neither theoretical nor empirical uniqueness automatically implies that one is conducting a doubly decisive test.
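For two hypotheses, the reversal in design 5 can be written out as a short derivation. It assumes, as a simplification of Eckert's three-way typology, dichotomous and exclusive causes and outcomes, so that the rival's condition B is the absence of A and its outcome Z is the absence of Y; the labels H_W and H_R for the working and rival hypotheses are introduced here for convenience.

```latex
\begin{align*}
H_W &: A \rightarrow Y \\
H_R &: B \rightarrow Z, \qquad B \equiv {\sim}A,\; Z \equiv {\sim}Y \\
\Rightarrow\; H_R &: {\sim}A \rightarrow {\sim}Y \\
\text{counterfactual required to confirm } H_W &: {\sim}A \rightarrow {\sim}Y
\end{align*}
```

The last two lines coincide: the negative case that supports the counterfactual for H_W is a positive case for H_R, so passing the test of the working hypothesis confirms, rather than disconfirms, the rival.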
Working and Rival Hypothesis With Nonexclusive X
Design 7 involves two hypotheses that center on nonexclusive causes and the same outcome. Samford's (2010:389-95) inquiry into rapid trade liberalization in Latin America offers an example of this design. He argues that rapid trade liberalization occurs when a country experiences devaluation in combination with a government that enjoys few institutional constraints, or when the executive is largely unconstrained and the country is hit by hyperinflation. 42 Theoretical uniqueness is not given because both causal claims stipulate the same outcome. Moreover, the confirmation of one hypothesis does not have any consequences for the other hypothesis. Whether or not a country undergoes rapid trade liberalization when an unconstrained executive operates in a country with hyperinflation is independent of whether it does so when an unconstrained executive is situated in a country experiencing devaluation.
On an empirical level, uniqueness is not automatically achieved by the choice of a suitable case for the working hypothesis because the case might also be a positive instance of the rival proposition. For Samford's study, empirical uniqueness would not be given if one selected a country with an unconstrained executive that is marked by both hyperinflation and a devaluing currency. Since we, above all, want to arrive at an unambiguous causal inference for the working hypothesis, we have to choose a case that establishes empirical uniqueness for this proposition. 43 For design 7, systematic case selection is thus the key to achieving empirical uniqueness, which is what matters in this design.
Design 8 is realized in a case study when the two hypotheses settle on nonexclusive causes and contrast mutually exclusive outcomes. An illustration of such a design can be found in welfare state research interested in the question of whether welfare state spending expanded or declined recently (Genschel 2004). For instance, one can develop the hypotheses that globalization leads to reduced welfare spending because it increases competitive pressure, while a competing hypothesis maintains that left-wing governments account for more spending because of principled ideological reasons (Obinger et al. 2010). Both hypotheses are theoretically unique as they predict a different outcome.
Any case that has the condition of the working hypothesis present, but not that of the rival, achieves empirical uniqueness for this proposition. But in order to get the most out of design 8, one should choose a country that counts as globalized and has a left-wing government in place, that is, a case that lacks empirical uniqueness. The case selection strategy for design 8 is thus exactly the reverse of that for design 7: empirical nonuniqueness is desirable. The choice of such a case is interesting because it allows us to test which of the two conditions prevails over the other. With regard to the empirical example, we would be able to test whether a left-wing government increases spending even if the country is globalized, or whether globalization also leads to decreased spending in a country with a left-wing government in place. This means that only one hypothesis can find empirical confirmation on the basis of the selected case. Although the implications of the working hypothesis and the rival are not mutually exclusive on a theoretical level, the choice of a proper case permits one to confirm one hypothesis and automatically disconfirm the other.
The rationale for selecting such a case is that only one hypothesis can turn out to be correct, as spending levels can be either increasing or decreasing. Design 8 thus meets the requirements of a doubly decisive test if the selected case displays the conditions of both the working hypothesis and the rival hypothesis. 44 When one has no opportunity to analyze a suitable case because it is not available empirically, the results of the test of one hypothesis are unrelated to the other hypothesis and the test is not doubly decisive. If one finds that globalization leads to less spending due to competitive pressure in a country with no left-wing government in place, it is still an open question whether left-wing governments are a cause of higher spending (and vice versa). Informed case selection is thus again central for laying the best possible basis for causal inference.
The last design to be discussed, design 9, includes two hypotheses contrasting nonexclusive causes and outcomes. Jakobsen's (2010) analysis of liberalization in the Danish telecommunications and electricity sectors offers hypotheses meeting these criteria. He examines whether differences in the level of Europeanization and globalization can explain the differences in the extent and process of liberalization in the two sectors. 45 As regards the process, Jakobsen hypothesizes that the timing of liberalization follows an increase in competitive pressure when liberalization is driven by globalization. An observable implication related to the influence of Europeanization concerns the observation that liberalization follows an EU timetable. Both hypotheses center on nonexclusive causes because globalization and Europeanization are mutually compatible determinants of liberalization (which is what Jakobsen finds in his study). Similarly, the outcomes are nonexclusive because it is possible that the timing of liberalization follows both an increase in competitive pressure and the EU timetable.
The two hypotheses on the timing of liberalization have theoretical uniqueness and potentially lack empirical uniqueness. They display theoretical uniqueness because they entail different outcomes. But although theoretical uniqueness is given, the requirements of a doubly decisive test are not met because the confirmation of one hypothesis is unrelated to the confirmation of the other. Empirical uniqueness is only given when one selects a case that displays the condition of the working hypothesis and does not exhibit the condition of the competing hypothesis. From the viewpoint of knowledge generation, though, it is beneficial to select a case with both conditions present because it allows us to test two (noncontradictory) hypotheses at the same time.
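The inferential difference between a case with both conditions present and a case with only one can be sketched as a small classification function. This is a hypothetical encoding for illustration: it reduces process tracing to a pass/fail verdict per hypothesis, treats spending as strictly increasing or decreasing (ignoring no change, as the text does), and uses the globalization and left-wing government hypotheses from the example above.

```python
from enum import Enum


class Spending(Enum):
    INCREASE = "increase"
    DECREASE = "decrease"


def evaluate_design_8(globalized: bool, left_government: bool, observed: Spending) -> dict[str, str]:
    """Return a verdict for each hypothesis in a single case.

    Working hypothesis (hypothetical encoding): globalization -> decreasing spending.
    Rival hypothesis: left-wing government -> increasing spending.
    A hypothesis can only be tested in a case where its condition is present.
    """
    verdicts = {}
    if globalized:
        verdicts["globalization"] = "passes" if observed is Spending.DECREASE else "fails"
    else:
        verdicts["globalization"] = "not tested"
    if left_government:
        verdicts["left_government"] = "passes" if observed is Spending.INCREASE else "fails"
    else:
        verdicts["left_government"] = "not tested"
    return verdicts


# Both conditions present: the observed outcome decides between the hypotheses.
print(evaluate_design_8(globalized=True, left_government=True, observed=Spending.DECREASE))
# Only the working hypothesis's condition present: the rival remains untested.
print(evaluate_design_8(globalized=True, left_government=False, observed=Spending.DECREASE))
```

Because the two outcomes are mutually exclusive, a case with both conditions present always produces one pass and one fail, which is the doubly decisive constellation; dropping either condition leaves one hypothesis untested.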
Uniqueness Reconsidered
The discussion of uniqueness based on Table 2 shows that the role of this dimension is much more complex than has been appreciated so far. Summarizing the previous sections, three issues stand out. First, it is fallacious to simply speak of "uniqueness" in comparative hypothesis tests. If uniqueness is implicitly understood as empirical uniqueness, it is true that a successful test of the working hypothesis is sufficient for an unambiguous causal inference. This is long-standing knowledge in the case study literature, which has discussed these matters under the rubric of determinate versus indeterminate causal inference and control for rival explanations (Collier et al. 2004; Hall 2008). However, an interesting insight of the previous sections is that empirical uniqueness is not necessary for an unambiguous inference. In design 9, for instance, it is possible to generate an unambiguous inference on the working hypothesis even if the selected case also permits one to test the competing proposition.
Furthermore, I explained that empirical uniqueness alone does not tell us whether the working and rival hypotheses yield theoretically exclusive or nonexclusive observable implications. This is essential information because it helps determine whether the successful test of the working hypothesis also permits one to refute the rival hypothesis (and vice versa). Similarly, the implicit equation of uniqueness with theoretical uniqueness is misleading. Theoretical uniqueness does not ensure empirical uniqueness, potentially leaving one with an ambiguous empirical picture after process tracing has been done. I have shown that the confirmation of a theoretically unique implication is not sufficient for a confirmatory causal inference (e.g., design 1b). Moreover, I have demonstrated that passing a test of a theoretically unique implication is not necessary for an unambiguous causal inference (design 4). This severely undermines the conventional reading of the doubly decisive test: one should not casually jump from a successful test of an empirically and/or theoretically unique implication to inferences about the competing hypothesis.
Second, and relatedly, the dimensions of uniqueness and certainty do not exhaust all relevant features for comparative hypothesis testing. For the causes, outcomes, and mechanisms covered by the working and rival hypotheses, it is additionally necessary to consider whether they are identical, exclusive, or nonexclusive. The interplay of exclusiveness and nonexclusiveness of causes, outcomes, and mechanisms determines whether two hypotheses yield exclusive observable implications that allow one to realize a doubly decisive test.
Third, a doubly decisive test can be performed in comparative hypothesis testing, but it is only one possible variant of tests. Table 2 shows that a doubly decisive test is attached to 4 of the 10 possible designs. In 6 of the 10 designs, discarding the rival hypothesis when finding the working hypothesis confirmed (and vice versa) is not justified even when the latter entails a theoretically unique implication. This insight again casts doubt on the widespread use of the idea of “rival” and “competing” hypotheses that is also used here. The discussion of the 10 designs shows that only four justify speaking of rival propositions because they are directly competing with each other. In the other six designs, the two hypotheses are not mutually exclusive and are not directly rivaling each other in terms of finding empirical confirmation.
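The count of 4 out of 10 designs can be checked against a small catalogue of the designs. The catalogue below is reconstructed from the prose of this section rather than copied from Table 2; the outcome relation for design 5 (exclusive outcomes) is inferred from the layout of designs 4 to 6, which share exclusive causes and are described as having the same outcome (4) and nonexclusive outcomes (6). The comments record the case selection conditions the text attaches to designs 4 and 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    label: str
    causes: str           # relation between the two causes: same, exclusive, nonexclusive
    link: str             # relation between mechanisms (1a/1b) or outcomes (2-9)
    doubly_decisive: bool


# Reconstructed from the text of this section, not from Table 2 itself.
DESIGNS = [
    Design("1a", "same", "exclusive mechanisms", True),
    Design("1b", "same", "nonexclusive mechanisms", False),
    Design("2", "same", "exclusive outcomes", True),
    Design("3", "same", "nonexclusive outcomes", False),
    Design("4", "exclusive", "same outcome", True),          # case with working condition present
    Design("5", "exclusive", "exclusive outcomes", False),   # reverses the logic of the test
    Design("6", "exclusive", "nonexclusive outcomes", False),
    Design("7", "nonexclusive", "same outcome", False),
    Design("8", "nonexclusive", "exclusive outcomes", True),  # only if both conditions are present
    Design("9", "nonexclusive", "nonexclusive outcomes", False),
]

decisive = [d.label for d in DESIGNS if d.doubly_decisive]
print(f"{len(decisive)} of {len(DESIGNS)} designs are doubly decisive:", ", ".join(decisive))
```

Every doubly decisive design pairs its cause relation with exclusive mechanisms or outcomes, or with exclusive causes and a shared outcome; none of the designs with nonexclusive mechanisms or outcomes appears in the list.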
Mutual Exclusiveness in Hypotheses
In the previous discussion of uniqueness, it was presumed that the hypotheses are formulated in nonexclusive terms. In Ziblatt’s (2009) study, the hypothesis that a capture mechanism links landholding inequality to electoral fraud does not rule out that a social power mechanism is viable and could achieve the same. The present section serves to discuss the implications of explicitly formulating exclusive hypotheses. Slightly rephrasing the capture mechanism hypothesis, it now reads “the capture mechanism is the only mechanism connecting landholding inequality to electoral fraud.” This rephrased capture hypothesis is now effectively a claim about monocausation, that is, a monocausal hypothesis, as the capture of the local administration and nothing else is conceived of as a viable mechanism. Although monocausal hypotheses entail a strong theoretical claim that is rarely made in the social sciences, it is a feasible one, the implications of which need to be considered in more detail.
The formulation of a monocausal hypothesis is without further implications for the empirical analysis when there is no compelling theoretical reason or empirical evidence to argue that another mechanism or condition can bring about the outcome as well (which seems unlikely). When the hypothesis is found confirmed, one can infer that the relationship is causal and the outcome an instance of monocausation. A different and more realistic situation is when the monocausal hypothesis is rivaled by a second proposition. In the Ziblatt example, the monocausal capture hypothesis is challenged by the social power hypothesis. 46 In terms of the previous discussion of uniqueness, a comparative test of the two propositions involves nonexclusive mechanisms linking the same cause to the same outcome (design 1b). Further assume that one finds supportive evidence for the monocausal capture hypothesis. Considering that this proposition is formulated in exclusive terms, does this have any implications for the social power mechanism? No, it does not. The monocausal proposition makes a claim about capture as the only mechanism leading from inequality to fraud. This is an empirical claim that can be wrong and whose accuracy cannot be determined by testing this hypothesis alone. Consequently, a full-fledged test of the monocausal capture proposition mandates a test of the rival hypothesis as well. The monocausal hypothesis is only substantiated when it is backed by confirming evidence and when the rival hypothesis fails its test.
Conversely, a monocausal hypothesis can be rejected when the rival hypothesis is confirmed, even if no evidence on the former has been collected. If we are able to gather evidence that social power matters in bringing about fraud, the argument that capture is the only way in which fraud can occur is automatically invalidated. In this example, a failed test of the social power hypothesis is therefore a necessary condition for a confirmatory inference on the monocausal capture hypothesis. At the same time, failure of the social power proposition is not sufficient for confirming the monocausal capture hypothesis, because the latter could be wrong as well.
In sum, formulating and testing a monocausal hypothesis does not change the arguments made in the previous section. The analysis of a monocausal hypothesis allows for a doubly decisive test only when the design takes one of the forms in which the working and rival hypotheses make mutually exclusive predictions. If one does not realize one of these special designs, testing a monocausal hypothesis against a competitor adds a further requirement: the competitor must be shown to be empirically invalid, and the monocausal hypothesis must be shown to be empirically accurate.
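The inference rules for a monocausal hypothesis tested against a nonexclusive rival (design 1b) can be condensed into a small decision function. The following Python sketch is illustrative only. The names `Verdict` and `assess_monocausal` are hypothetical, and the sketch assumes that each test result can be coded as passed, failed, or not yet conducted, abstracting from the certainty and uniqueness of the underlying observable implications.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SUBSTANTIATED = "substantiated"
    REJECTED = "rejected"
    NOT_SUBSTANTIATED = "not substantiated"


def assess_monocausal(working_passes: Optional[bool],
                      rival_passes: Optional[bool]) -> Verdict:
    """Assess a monocausal hypothesis tested against one nonexclusive rival.

    Hypothetical helper. Each argument codes a test result as True (passed),
    False (failed), or None (not conducted); certainty and uniqueness of the
    implications are deliberately ignored in this simplification.
    """
    # Confirming the rival mechanism refutes the claim that the working
    # mechanism is the only one, even if the working hypothesis was never tested.
    if rival_passes is True:
        return Verdict.REJECTED
    # Substantiation needs both confirming evidence for the working
    # hypothesis and a failed test of the rival.
    if working_passes is True and rival_passes is False:
        return Verdict.SUBSTANTIATED
    # A failed rival test is necessary but not sufficient; every other
    # combination leaves the monocausal claim unsubstantiated.
    return Verdict.NOT_SUBSTANTIATED


# Usage: capture hypothesis confirmed, social power hypothesis not yet tested.
print(assess_monocausal(True, None).value)  # prints: not substantiated
```

The asymmetry is visible in the branch order: a confirmed rival settles the matter on its own, whereas substantiation always requires results on both hypotheses.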
Certainty, Uniqueness, and Mutual Exclusiveness: An Expanded Typology
In light of the previous discussion of the dimension of uniqueness, two modifications of the typology are warranted in order to provide a better basis for case selection and to strengthen causal inference. First, since the uniqueness of an observable implication can be determined on a theoretical and an empirical level, it is necessary to clarify what the dimension of uniqueness covers. Knowing the theoretical uniqueness of an implication is of limited value because the link between theoretical and empirical uniqueness is ambiguous. Knowing the empirical uniqueness of implications, by contrast, is more relevant because it bears directly on the generation of unambiguous causal inferences. For this reason, the typology should be amended to clarify that uniqueness refers to the empirical (non-)uniqueness of implications. I noted above that an exclusive focus on empirical uniqueness is never wrong in process tracing, but that it is not necessarily the best case selection strategy with regard to causal inference. However, “uniqueness” is a constitutive and important dimension of the 2 × 2 typology, and the case selection rules summarized in Table 2 reflect a complex interplay between identical, exclusive, and nonexclusive causes, mechanisms, and outcomes. This interplay cannot easily be integrated into a comprehensible typology, which is why I limit the discussion to empirical uniqueness here.
Second, the previous section shows that the typology assigns too much inferential weight to tests of implications with high empirical uniqueness. This deficiency is remedied by adding a third dimension to the 2 × 2 typology. I call this dimension mutual exclusiveness; it captures whether the working and rival hypotheses entail mutually exclusive observable implications. 47 Table 3 expands the 2 × 2 table introduced above by this dimension. 48 The four types of tests covered by the original 2 × 2 typology are placed in this table to show where they fall once mutual exclusiveness is taken into account.
Table 3. A 2 × 2 × 2 Typology of Hypothesis Tests.
The assignment of the four types to cells of the expanded typology highlights its twofold value. First, it underscores that the current reading of the doubly decisive test conflates high uniqueness of an observable implication with the presence of contradictory implications. Second, the expanded typology shows that the original 2 × 2 typology does not permit inferences about the rival hypothesis. To learn whether empirical uniqueness is complemented by a mutually exclusive implication, one must bring the corresponding dimension into the picture. When the test one conducts falls into the first or third column of Table 3, a successful test of the working hypothesis permits one to reject the rival hypothesis. 49 By contrast, one should refrain from inferring anything about the rival hypothesis from the test result of the working hypothesis when the test is located in column 2 or 4.
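The separation of inferences about the working hypothesis from inferences about the rival can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each dimension is coded as a binary attribute (the categorical view of the typology), uniqueness is read as empirical uniqueness, the inference labels loosely follow the standard reading of the four tests, and the names `Implication`, `classify_test`, and `infer` are hypothetical. The sketch does not reproduce the cell layout of Table 3.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Implication:
    high_certainty: bool      # very likely to be observed if the working hypothesis holds
    high_uniqueness: bool     # empirical uniqueness: no rival accounts for it in the chosen case
    mutually_exclusive: bool  # the rival hypothesis predicts the contradictory observation


def classify_test(imp: Implication) -> str:
    """Locate the implication in the original 2 x 2 typology (ignores the third dimension)."""
    names = {
        (False, False): "straw in the wind",
        (True, False): "hoop",
        (False, True): "smoking gun",
        (True, True): "doubly decisive",
    }
    return names[(imp.high_certainty, imp.high_uniqueness)]


def infer(imp: Implication, working_passes: bool) -> Tuple[str, str]:
    """Return (inference on the working hypothesis, inference on the rival)."""
    # Certainty and uniqueness govern what we learn about the working hypothesis.
    if working_passes:
        working = "confirmed" if imp.high_uniqueness else "relevance affirmed, not confirmed"
    else:
        working = "eliminated" if imp.high_certainty else "weakened, not eliminated"
    # Only mutual exclusiveness licenses rejecting the rival after a successful test.
    rival = "rejected" if working_passes and imp.mutually_exclusive else "no inference"
    return working, rival


# Usage: a doubly decisive test without mutually exclusive implications.
imp = Implication(high_certainty=True, high_uniqueness=True, mutually_exclusive=False)
print(classify_test(imp), infer(imp, working_passes=True))
```

With `mutually_exclusive=False`, a passed doubly decisive test confirms the working hypothesis but yields "no inference" for the rival; setting it to `True` changes only the rival verdict, which is the distinction the third dimension is meant to capture.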
Conclusion
Process tracing is a valuable tool for causal inference and has received increasing attention in the recent literature. Particular emphasis has been placed on a 2 × 2 typology of hypothesis tests. The present article contributes to the existing discussion in three ways. First, I have introduced the distinction between theoretical and empirical uniqueness of observable implications. This distinction is salient because case selection is the vehicle for translating theoretical uniqueness into empirical uniqueness, which is a prerequisite for unambiguous causal inference. However, I have also demonstrated that there are specific constellations between the working hypothesis and its competitor for which it is advantageous to choose a case that lacks empirical uniqueness.
Second, I have shown that, in principle, it is possible to choose among the tests in empirical research. The purposeful choice of cases in which certainty, theoretical uniqueness, and exclusiveness assume the desired level makes it possible to select among the available tests. This does not mean that the required cases are always available for process tracing; that is an empirical matter. However, the discussion of case selection strategies demonstrates which cases one should look for in order to realize one type of test or another.
Third, it has become apparent that the standard interpretation of the doubly decisive test conflates unique implications with implications for which the working and the rival hypothesis contradict each other. Adding a dimension of “mutual exclusiveness” remedies this shortcoming and underscores the difference between unique and mutually exclusive implications. When the available propositions do not yield contradictory predictions on a theoretical level, it is, of course, impossible to introduce such contradictions into the analysis through the purposeful choice of cases. When theoretical contradictions are present, however, they should play a role in the case selection stage of a process-tracing analysis, because they make it possible to realize a doubly decisive test, the most powerful test on offer.
Declaration of Conflicting Interests
The author(s) declared no conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
