Abstract
Social scientists using statistical models and more qualitative techniques frequently employ divergent approaches to thinking about causality. Statistical methodologies tend to draw on probabilistic understandings of causality. Qualitative research traditions, however, have advanced a sophisticated framework around necessary and sufficient conditions. In particular, the qualitative comparative analysis approach has embraced theory development that emphasizes equifinality and complex causal relationships. This article reviews the two traditions and explores how a causal framework grounded in necessary and sufficient conditions can be adapted to statistical models. A logistic regression analysis of major contributions to peacekeeping missions is used to illustrate both the viability of blending the two traditions and the potential for more sophisticated theory development and testing.
Introduction
Consider a simple problem: What differentiates someone who joins a rebel group from someone who chooses not to join? Different theoretical approaches might produce different explanations. For example, rational choice theorists may emphasize the private benefits available to individual rebels choosing rebellion (see Lichbach, 1994; Tullock, 1971), while scholars building on the Marxist tradition may emphasize class- or group-based grievances (see Dahrendorf, 1959; Gurr, 1970; Muller & Seligson, 1987). It is very much to be expected (even unremarkable) that different theories would focus and direct our causal explanations. Yet, less often recognized, the epistemological and methodological assumptions that scholars hold also shape how a causal explanation is structured.
An explanation of who joins a rebel group that is grounded in statistical methodologies might look at any number of variables derived from multiple theories (see Humphreys & Weinstein, 2008). The causal links connecting variables to outcomes are treated as accumulating risk factors, each increasing the probability that an individual will join a rebellion. Yet an explanation grounded in the qualitative tradition would be more likely to focus on identifying a combination of factors that work together to produce rebel group participation. Elizabeth Wood’s account of rebel participation in El Salvador is an excellent demonstration of this alternative approach to constructing causal arguments. Wood’s (2003) argument is that various motivations (participation, defiance, and pleasure in agency) combine with one of two pathways (local past patterns of violence and proximity to insurgent forces) to facilitate collective action and by extension participation in rebellion.
Rather than an additive approach to causation, qualitative scholars, such as Wood, often develop a series of explanations each representing pathways that can produce a given outcome. Goertz and Starr (2002) also observe this ontological pattern, noting that qualitative scholars tend to structure combinational causal explanations using the logic of necessary and sufficient conditions. The statistical and qualitative approaches to conceptualizing causal relationships have often operated in opposition and isolation with little crossover. Yet, Braumoeller and Goertz (2000) demonstrate that “a large number of theoretically important necessary condition hypothesis exist” (p. 854), suggesting that social science needs better tools for evaluating these kinds of relationships. In this article, I argue that there is significant scientific ground to be gained by quantitatively oriented scholars through an understanding of causality that can accommodate necessary and sufficient conditions. Furthermore, the strategies needed to adapt a typical regression analysis to do so are relatively simple and relatively easy to communicate.
This article begins with an introduction of the logic of necessary and sufficient conditions before turning to a review of how this approach to causality converges and diverges from the statistical/probabilistic approach to causality. I then sketch out a simple strategy for using the logic of necessary and sufficient conditions in a regression context. This strategy provides a way to test hypotheses that have a mix of “additive” and “combinational” causal mechanisms. Furthermore, this approach allows a researcher to draw support for a hypothesis not just from the rejection of the null hypothesis where hypothesized but also from the absence of statistically significant patterns where necessary conditions are absent. This approach is illustrated with an example, testing various theories of peacekeeper provision. Finally, I offer advice for deploying the necessary and sufficient conditions logic in a statistical context.
Necessary and Sufficient Conditions and Causality
The epistemological and methodological divide between qualitative and quantitative scholarship is deeply entrenched in many contemporary social science traditions (see Mahoney & Goertz, 2006); thus, it is easy to forget that the two traditions grew up in parallel. Druckman (2005) notes that the comparative method and the statistical method both emerged from the positivist epistemological tradition. Indeed, Lijphart (1971) does not see the statistical and comparative methods as distinct but rather on a continuum reflecting the number of cases available. This was also the view of Smelser (1973), who argued that the comparative method was the default approach when the number of cases was too small to facilitate effective statistical analysis.
Yet as Hollis and Smith (1996) note, our methodologies are interrelated with our ontologies and epistemologies. And indeed, resting at the heart of the two approaches is a categorically different way of working with causality. Charles Ragin (1987) provides the sharpest articulation of this difference in his rearticulation of the comparative method as qualitative comparative analysis (QCA). Ragin’s method extends the necessary and sufficient conditions approach to causality using set theory and Boolean algebra. He similarly differentiates the qualitative and statistical traditions in terms of “variable-oriented” research and “case-oriented” analysis. In the variable-oriented tradition researchers seek out a list of variables that are associated by way of correlation with an outcome variable. Researchers seek to estimate average causal effects that can be generalized to a population of cases. Druckman and Wagner’s (2019) recent comparative study of justice in peace negotiations provides an example of analyses of a large number of cases with control variables. In contrast, the case-oriented approach ideally develops a small set of explanations that can account for the outcome variable across all cases. Set theory and Boolean algebra rather than the statistically derived equation of a line serve to describe the causal relationships in QCA.
It is of course possible to find variables that are individually necessary or sufficient to produce a particular outcome. Barrington Moore’s (1966) famous assertion, “No bourgeois, no democracy” (p. 418), is a clear statement of necessity. Yet more often social scientific arguments involve constellations of variables that may be necessary and work together to produce a jointly sufficient explanation for an outcome. Two types of causal relationships are of particular relevance for scholars employing QCA.
The first is the identification of variables that are themselves insufficient but can combine with other variables to produce sufficiency. A variable that is a necessary component of a jointly sufficient set of variables is termed an Insufficient but Necessary part of an Unnecessary but Sufficient set (INUS; see Figure 1) (Mackie, 1965). For example, Wright (1977) argues that early state formation emerges out of the interplay of resource availability (an INUS condition) and one of several different types of catalyzing challenges that force a reworking of how information is managed within an administrative system. The combination of these factors can be read as sufficient to facilitate the jump in organizational complexity that archeologists associate with state formation.

Figure 1. INUS and SUIN conditions and equifinality.
The second type of causal relationship that QCA researchers focus on involves variables that are neither necessary nor sufficient but often play a role in larger causal processes. This can occur when a variable plays an important role in a causal process, but there are several other mechanisms that could play a similar role. In such a situation each of the variables representing a different causal pathway would be termed a Sufficient but Unnecessary part of an Insufficient but Necessary set (SUIN; see Figure 1) (Ragin, 1987). Returning to Wright’s argument on state formation, there are any number of factors that could catalyze an administrative transformation. Dynamics of conquest identified by Carneiro (1970) or the necessity of public works projects to fend off starvation (Wittfogel, 1957) could both catalyze social transformation. Wright’s framework treats these factors as SUIN conditions. Neither factor is necessary to produce state formation, but some factor is necessary to force administrative innovation.
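The INUS and SUIN labels can be checked mechanically with a truth table. The sketch below is only an illustration: it assumes a simplified Boolean reading of the state formation example, in which the outcome occurs when resources are available AND at least one catalyst (conquest OR public works) is present. The function and variable names are hypothetical and are not drawn from Wright's own formulation.

```python
from itertools import product

def state_formation(resources: bool, conquest: bool, public_works: bool) -> bool:
    """Hypothetical Boolean reading: resources AND (conquest OR public_works)."""
    catalyst = conquest or public_works  # either factor satisfies the catalyst condition
    return resources and catalyst

# Enumerate every combination of the three conditions.
rows = list(product([False, True], repeat=3))
outcomes = {row: state_formation(*row) for row in rows}

def is_necessary(index: int) -> bool:
    """A condition is necessary if it is present in every row where the outcome occurs."""
    return all(row[index] for row, y in outcomes.items() if y)

def is_sufficient(index: int) -> bool:
    """A condition is sufficient if the outcome occurs in every row where it is present."""
    return all(y for row, y in outcomes.items() if row[index])

# Conquest (index 1) is neither necessary nor sufficient on its own,
# yet it is sufficient for the catalyst condition, which is itself necessary.
catalyst_necessary = all((row[1] or row[2]) for row, y in outcomes.items() if y)

for name, i in (("resources", 0), ("conquest", 1), ("public_works", 2)):
    print(name, "necessary:", is_necessary(i), "sufficient:", is_sufficient(i))
print("some catalyst necessary:", catalyst_necessary)
```

In this simplified reading, the two jointly sufficient pathways are (resources AND conquest) and (resources AND public works). Neither pathway is required, which is the equifinality that Figure 1 describes.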
Given the categorical structure implied by the language of INUS and SUIN conditions, many causal arguments built off of the necessary and sufficient conditions structure seem to operate with dichotomous variables. This observation was also made by Braumoeller and Goertz (2000) who develop a statistical test for necessity to be used with binary data. They note that necessary and sufficient conditions hypotheses are rarely articulated in terms of continuous variables. Yet Ragin has been at the forefront of creating greater flexibility in the measurement of variables while describing causally complex patterns. His work adapting the qualitative comparative method for use with fuzzy-set variables is perhaps the most well-developed adaptation of necessary and sufficient conditions away from discrete categories.
Modeling Complexity in Regression
Thus far, the correlational and necessary and sufficient conditions logics have been presented as competing and divergent approaches to causality; approaches that all but demand a distinct set of tools. Certainly, these two traditions have operated largely independently of each other. For example, Mahoney and Goertz (2006) argue that Ragin’s QCA and work on fuzzy sets have been largely ignored by quantitatively oriented scholars working in the statistical/probabilistic tradition. Similarly, Mahoney (1999) notes that scholars frequently invoke necessary and sufficient conditions theories when working with nominal variables, but often shift the structure of hypotheses to correlational statements when using interval ratio variables. 1
Attempts to bridge the gaps between the two research communities have often taken the form of creating tests of necessary conditions that can communicate findings more effectively to scholars working in the quantitative and probabilistic tradition. There have been several different attempts to develop or adapt statistical tests for necessary and sufficient conditions, including Yule’s Q (Bueno de Mesquita, 1981) and the “del statistic” (Hildebrand, Laing, & Rosenthal, 1976; Siverson & Starr, 1989). Most recently, and arguably most precisely, Braumoeller and Goertz (2000) develop tests for both the presence of a necessary condition and for the condition’s “triviality” in a relationship. While the strengths and weaknesses of each statistic can be debated, these statistics tend to focus on evaluating necessity in a bivariate relationship rather than attempting to model complexity more generally.
Modeling complexity necessarily begins with a solid theoretical foundation to help distinguish the different combinations of processes that might contribute to the same outcome. In qualitative research, this kind of theorizing on equifinality is relatively common (see George & Bennett, 2005). In a regression context, this means reflecting on how the relationship between one or more independent variables and a dependent variable may change. Ragin (1987) notes that “a specific cause may have opposite effects depending on context” (p. 27).
To some extent this is widely recognized and well understood. The phenomenon is discussed by Druckman (2005) in his treatment of experimental designs. He provides an example, shown in Table 1, involving the effect of mediation on bargaining duration where attempts to resolve a dispute involve either groups or individual representatives.
Table 1. Crossover Effect Illustration.
Note. From Druckman (2005).
Assuming that the average duration represented by each cell in Table 1 reflects an equal number of observations, then we could infer that the group representation and self-representation rows have the same 20-minute duration. Yet the effect of mediation for the two groups is the exact opposite. Mediation accelerates conflict settlement for groups but slows it in situations when individuals are representing themselves.
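The arithmetic behind a crossover effect can be made concrete with a small sketch. The cell values below are hypothetical placeholders chosen only so that both rows average 20 minutes; they are not the figures reported by Druckman (2005). The sketch also assumes, as in the text, that each cell reflects an equal number of observations.

```python
from statistics import mean

# Hypothetical cell means in minutes (NOT the values from Druckman, 2005).
duration = {
    ("group", "mediation"): 10,
    ("group", "no mediation"): 30,
    ("self", "mediation"): 30,
    ("self", "no mediation"): 10,
}

representations = ("group", "self")
conditions = ("mediation", "no mediation")

# Row means: average duration for each type of representation.
row_means = {
    rep: mean(duration[(rep, cond)] for cond in conditions)
    for rep in representations
}

# Effect of mediation within each row (mediation minus no mediation).
mediation_effect = {
    rep: duration[(rep, "mediation")] - duration[(rep, "no mediation")]
    for rep in representations
}

# Pooling the rows averages the two opposite effects together.
pooled_effect = mean(mediation_effect.values())

print(row_means)
print(mediation_effect)
print(pooled_effect)
```

With these placeholder values, the two rows are indistinguishable on average and the pooled effect of mediation is zero, even though mediation shortens bargaining in one row and lengthens it in the other.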
In statistical analysis, “moderation” (i.e., the use of interaction terms) is the standard approach used to model this kind of causal conditional complexity. The review of mediation and moderation by Hayes and Rockwood in this issue provides a useful illustration of how regression can be turned to the task of modeling complexity. Similarly, Brambor et al. (2006, p. 64) embrace interaction effects as an essential methodological and theoretical approach to modeling, noting “it could be argued that any causal claim implies a set of conditions that need to be satisfied before a purported cause is sufficient to bring about its effect.” Ragin (1987, p. 65) accepts that interaction effects can facilitate “multiple conjunctural causation” but is somewhat skeptical that this is an alternative to the comparative method. He notes that these tests work well only under limited and very specific conditions.
Mahoney and Goertz (2006), however, are highly skeptical that interaction effects can possibly capture causal complexity in the way it is understood in a necessary and sufficient conditions context. They offer three concerns. First, they argue that including all relevant interaction terms would be prohibitively impractical. On this point, Hayes and Rockwood’s discussion in this issue on complex structures of moderation and mediation is highly relevant. While there may be a practical upper limit, there is still significant potential in the thoughtful modeling of complexity. Second, Mahoney and Goertz (2006) argue that while interaction effects can approximate conditional relationships represented by the Boolean “AND” operator, there is no comparable technique to model the Boolean “OR” operator. Their third critique notes that interaction effects do not resolve the basic problem that relationships can change across subsets of data: “Imagine that in a statistical study the impact of X1 is strongly positive in the population. Does this mean that X1 cannot have a strongly negative impact for a particular subset of cases? The answer, of course, is ‘no.’ The impact of X1 as one moves from a superset to subsets is always contingent in statistical models; there is no mathematical reason why X1 could not be negatively related to the phenomena. Similarly, the estimate of the parameter β12X1 × X2 could change dramatically when moving from the whole population to a subset” (p. 236).
The contingency of coefficient estimates is certainly a possibility in any statistical analysis, and while good practice should always strive to fit statistical models fully, there is certainly something to be said for simplifying assumptions and parsimony (Healy, 2017; Schrodt, 2014).
While interaction effects are a useful tool for modeling complexity in a limited sense, this article explores a different strategy that I will refer to as regression disaggregation. Regression disaggregation involves splitting a data set for separate analysis. This approach was used by Humphreys and Weinstein (2008) in their analysis of who joined rebel groups. Their initial survey data included individuals who joined freely and those who were kidnapped and forced into militia service. Not surprisingly, separating out the two groups of survey respondents yielded different patterns.
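A minimal sketch of the idea follows. It makes several simplifying assumptions: the data are invented, the subgroup labels `joined_freely` and `forced` are hypothetical stand-ins for a categorical splitting variable, and an ordinary least squares slope is used in place of a full regression model purely to keep the example short.

```python
def ols_slope(xs, ys):
    """Slope of a bivariate least squares line fitted to paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def split_by_group(records):
    """Partition (group, x, y) records into one list of (x, y) pairs per group."""
    groups = {}
    for group, x, y in records:
        groups.setdefault(group, []).append((x, y))
    return groups

# Invented data: x raises y in one subgroup and lowers it in the other.
records = (
    [("joined_freely", x, 2.0 * x + 1.0) for x in range(5)]
    + [("forced", x, -2.0 * x + 9.0) for x in range(5)]
)

# Pooled analysis: one slope for the full data set.
pooled_slope = ols_slope([x for _, x, _ in records], [y for _, _, y in records])

# Disaggregated analysis: one slope per subgroup.
subgroup_slopes = {
    group: ols_slope([x for x, _ in pairs], [y for _, y in pairs])
    for group, pairs in split_by_group(records).items()
}

print("pooled:", pooled_slope)
print("by subgroup:", subgroup_slopes)
```

In this contrived case the pooled slope is zero while the subgroup slopes are equal and opposite, which is the kind of pattern that a single pooled model can hide.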
The logic of this approach is quite straightforward when data are disaggregated using a categorical variable, yet it is also possible and potentially useful to disaggregate our analysis at natural or theoretically salient breakpoints in continuous variables. My own project (Urlacher, 2013) on peace overtures made by governments to rebel groups attempts to do this by separating out conflicts that produced a large number of fatalities from those that produced a small number of fatalities. Such an analysis revealed a crossover effect like the one illustrated in Table 1 for multiple independent variables.
This approach to modeling complexity was proposed by Ragin (1987) but quickly and summarily dismissed because splitting a sample would reduce the sample size and the corresponding degrees of freedom. He posits that “there are strong pressures on the variable-oriented researcher to keep sample splitting to a minimum” (pp. 66-67). Yet, if indeed there is unmodeled complexity operating within a set of data, this will have the effect of inflating standard errors. Correctly specifying a model in terms of independent variables, the functional form of those variables, and the complex interconnections between them should in general contribute to smaller standard errors, which may potentially offset the loss of statistical power resulting from splitting a sample.
Yet, even if the change in standard errors resulting from disaggregating a regression analysis is a wash, this approach when paired with a solid theoretical foundation provides multiple opportunities to evaluate a theory. Conditional hypotheses should identify not only the anticipated relationship between variables but also the subgroups in which those relationships are likely to be observed and the subgroups in which a relationship should not be relevant. This implies, contrary to common statistical practice, that failure to reject the null hypothesis can constitute positive evidence for a causal argument, when the null hypothesis is correctly anticipated. This possibility raises a number of issues, which are addressed later in this article, following a demonstration of how a disaggregated regression design can constructively advance a statistical analysis of equifinality. The example hypothesizes conditional relationships that are statistically significant in some models but not significant in others.
Illustration: Peacekeeper Provision
Most peacekeeping deployments fall well under 25 troops. Yet UN (United Nations) peacekeeping missions may involve the deployment of tens of thousands of peacekeepers for years on end. Thus, states that supply large numbers of peacekeepers play a critical role in sustaining the UN peacekeeping system. Understanding who makes major contributions to peacekeeping missions therefore has practical implications. I proposed that large-scale peacekeeper provision was driven by the combination of capacity and incentives, both of which were necessary to motivate major contributions of troops to peacekeeping missions (Urlacher, 2008). This pairing of motivation and capabilities mirrors a common formulation put forward by Cioffi-Revilla and Starr (2003) and by Starr (1978).
Similarly, Bove and Elia (2011) argue that some states have a comparative advantage in contributing peacekeepers. This might explain why only a handful of states make major sustained contributions of troops even though the UN stipend would be an attractive sum of money for many developing states. Indeed, for some states peacekeeper provision is profitable in strict financial terms. Pakistan is often cited as a state that has taken peacekeeping on as a for-profit activity. In contrast, most high-income states would struggle to deploy a peacekeeper for the roughly $1,000 per month per soldier allocated to UN peacekeeping missions. Given the differences in labor costs it is plausible to expect a different dynamic driving contributions of rich and poor states.
The UN stipend represents a viable way for some states to meet the incentive condition; however, there are a number of other factors that might satisfy this condition as well. Specifically, I consider the role of former colonial relationships and immediate regional security. The thinking is that former colonial relationships as measured by Hensel (2009) might activate either a sense of “obligation” on the part of a former colonial power or domestic political networks that might push for a significant contribution to international peacekeeping efforts. Anecdotally, Great Britain, France, and Italy have been actively engaged in African peacekeeping missions involving former colonies. Similarly, the proximity of a conflict could ramp up the interest of a state in contributing to a peacekeeping mission. Proximity is measured by the logged number of miles between capital cities of a potential contributor and the state in which a mission is based. Information on geographic distances is taken from data collected by Gleditsch and Ward (2001). In the logic of necessary and sufficient conditions, the UN stipend, geographic proximity, and colonial links would each be Sufficient but Unnecessary parts of an Insufficient but Necessary condition (i.e., SUIN).
When considering the capacity to contribute peacekeepers two factors seem relevant. First and foremost a state must be able to field a large and operationally capable number of soldiers. While all states have a security apparatus not all states prepare their forces for deployment abroad. While the logistics and transportation of peacekeepers is often facilitated by great powers (Neack, 1995), the military institutions of a state may not be well suited for deployment abroad (Roomy, 2004). Operating away from one’s home territory or on different terrain creates practical and institutional challenges. To overcome these challenges, the militaries of great powers practice maneuvers in varied locations and under varied conditions. To try to estimate military preparation for foreign deployment, I use the COW National Material Capabilities Data (Version 3.02) (Singer et al., 1972). These data are used to construct an index of force projection according to the process described by Fordham (2006). 2 In the language of necessary and sufficient conditions military capacity is likely to be a necessary condition, yet for wealthy states it is likely to be trivial. The vast majority of advanced industrial states have invested in building some offensive military capacity or participate in an alliance network that regularly conducts large-scale joint military operations. For poorer states, there is likely to be greater variation in military capacity and this is likely to be a more salient factor.
In addition to the ability to deploy a large contingent of soldiers abroad, the capacity to participate in an international peacekeeping force is augmented by the cultural and linguistic ability (i.e., linguistic interoperability) of the officer corps and to a lesser extent the enlisted soldiers. Linguistic interoperability is essential if orders are to be effectively disseminated and if peacekeepers are to be able to communicate with units from other states (Crossey, 2005). Diehl (1988), in his review of peacekeeping missions, points out that “the most common command and control problem . . . was language. By organizing forces from many different nations, it was often difficult for commanders to communicate on a one-to-one basis (much less in any larger aggregation) with their subordinates” (p. 497).
Peacekeeping missions are partially able to mitigate this coordination challenge because troops generally operate under their own commanders in their own units. Still, a military that can operate in a major world language (particularly English and French) is better positioned to contribute large numbers of peacekeepers than militaries that are only able to operate in languages with few speakers internationally. In measuring linguistic interoperability, I used the CIA World Factbook to code whether countries had English or French as official languages or whether the languages were widely used.
Linguistic interoperability contributes to the realization of a necessary condition but it likely contributes in a way that follows the probabilistic and additive understanding of causality typically theorized in regression models. That is to say English or French language competence may make it easier for a state to deploy a large number of peacekeepers, but a dearth of linguistic skills need not be a barrier in the way that the absence of force projection capacity would be.
Research Design
Having laid out the necessary and sufficient conditions logic of peacekeeper provision, I now turn to the process of modeling major peacekeeper contributions. I collected annual data covering 2000 to 2012 on the number of soldiers each country provided to each of 16 different UN missions. Cases in which a country contributed more than 200 troops to a mission were coded as Major Contributions. Because the dependent variable in this analysis is dichotomous, logistic regression is employed. 3 In an effort to address clustering in the standard errors around the 16 UN missions from which observations are drawn, clustered robust standard errors are employed. Clustered robust standard errors are adjusted to account for the correlation in error terms that results when groups of cases are drawn from a common situation. To address the time dependence of contribution or noncontribution of troops to UN missions, temporal variables are included (although not reported) in all models, following the recommendations of Carter and Signorino (2010).
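The coding of the dependent variable and the temporal controls can be sketched as follows. This is an illustration under stated assumptions rather than the article's actual procedure: the troop series is invented, the function names are hypothetical, and the time counter uses one common convention (the number of consecutive prior years without a major contribution). The cubic polynomial of that counter reflects the general recommendation in Carter and Signorino (2010); the exact temporal variables used in the models are not reported here.

```python
def code_major(troops: int, threshold: int = 200) -> int:
    """Return 1 when a contribution exceeds the threshold ("more than 200 troops"), else 0."""
    return 1 if troops > threshold else 0

def spell_counter(events):
    """Count consecutive prior years without an event for each year in a 0/1 series."""
    counters = []
    t = 0
    for event in events:
        counters.append(t)              # record years elapsed before this observation
        t = 0 if event == 1 else t + 1  # reset after an event, otherwise keep counting
    return counters

def cubic_terms(t: int):
    """Return the time counter with its square and cube, for use as temporal controls."""
    return (t, t ** 2, t ** 3)

# Invented yearly troop counts for one hypothetical country-mission pair.
troops_by_year = [0, 250, 300, 50, 0, 0, 400]

events = [code_major(n) for n in troops_by_year]
spell = spell_counter(events)
time_terms = [cubic_terms(t) for t in spell]

print(events)
print(spell)
print(time_terms)
```

Entering the three time terms alongside the substantive variables lets the baseline probability of a contribution vary smoothly with the time since the last one.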
Because the theoretical framework presented in this article suggests that many variables will have different effects as states become wealthier, the analysis is disaggregated into three separate models, each incorporating a different range of gross national product (GNP) per capita. While there is no accepted standard for dividing states along the continuum of rich and poor, the division adopted here is a slight modification of the World Bank’s country classification scheme. The World Bank classification system sorts countries as low, lower middle, upper middle, and high income. In 2012, the threshold for high-income status was a GNP per capita of a little over US$12,000. This reflects the top 30% of the world’s states. Low-income states are defined here as countries with a GNP per capita of less than $4,000. This approximates the World Bank’s low income and lower middle-income groups in 2012. Approximately 37% of states fall into this group. This leaves 33% of countries in the upper middle-income range, which are referred to in this analysis simply as “middle-income states.”
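The three-way split can be expressed as a simple classification rule. In the sketch below, the $4,000 cutoff comes from the text, while the high-income cutoff of exactly $12,000 is an assumption standing in for "a little over US$12,000"; the country labels and values are invented.

```python
LOW_CUTOFF = 4_000    # from the text: low income is below $4,000 GNP per capita
HIGH_CUTOFF = 12_000  # assumption: placeholder for "a little over US$12,000"

def income_group(gnp_per_capita: float) -> str:
    """Assign a state to the low-, middle-, or high-income model."""
    if gnp_per_capita < LOW_CUTOFF:
        return "low"
    if gnp_per_capita >= HIGH_CUTOFF:
        return "high"
    return "middle"

def disaggregate(observations):
    """Group (label, gnp_per_capita) observations into one list per income category."""
    samples = {"low": [], "middle": [], "high": []}
    for label, gnp in observations:
        samples[income_group(gnp)].append(label)
    return samples

# Invented observations for illustration only.
observations = [("A", 900.0), ("B", 3_999.0), ("C", 4_000.0), ("D", 11_500.0), ("E", 45_000.0)]
samples = disaggregate(observations)
print(samples)
```

Each of the three resulting samples would then be analyzed with its own model, so a coefficient is free to differ, or to vanish, across income categories.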
Data on the GNP per capita of a state are drawn from the World Bank’s World Development Indicators (World Bank, 2012). These data are used to disaggregate the regression analysis and are also included as a statistical control. Several other independent variables are included in this analysis. For example, the number of authorized troops is included for each mission. The logic behind this variable is that missions with higher authorizations are more likely to have major contributions than smaller missions.
Additionally two variables measuring economic linkages are included. The COW International Trade, 1870-2014 (Version 4.0) data (Barbieri, Keshk, & Pollins, 2009) are used to measure the relative economic importance of a peacekeeping mission to a potential contributor. Exports from a contributor to the state (or states) receiving peacekeepers are divided by a potential contributor’s gross domestic product. Similarly, imports are divided by gross domestic product. Both variables indicate the scale of trade relations relative to a potential contributor’s economy. 4
The democratic character of a potential contributor is also included. The Polity IV data (Jaggers & Gurr, 1995) are employed to measure a country’s autocratic and democratic features. The Polity Score for each state is the associated democracy score, less a country’s autocracy score. This measure potentially runs from −10 to 10.
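The trade and regime measures described above are simple ratios and differences, sketched below. The function names and example values are hypothetical; the sketch assumes that trade flows and gross domestic product are expressed in the same currency units, and that the democracy and autocracy components each run from 0 to 10.

```python
def trade_share(flow: float, gdp: float) -> float:
    """Trade flow (exports or imports) as a share of the contributor's GDP."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return flow / gdp

def polity_score(democracy: int, autocracy: int) -> int:
    """Democracy score minus autocracy score, giving a value from -10 to 10."""
    for component in (democracy, autocracy):
        if not 0 <= component <= 10:
            raise ValueError("component scores are assumed to run from 0 to 10")
    return democracy - autocracy

# Invented values for illustration only.
export_share = trade_share(flow=50.0, gdp=1_000.0)
import_share = trade_share(flow=20.0, gdp=1_000.0)
score = polity_score(democracy=8, autocracy=1)

print(export_share, import_share, score)
```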
The risk associated with contributing peacekeepers is measured in terms of mission-related fatalities. Data on fatalities are drawn from online United Nations’ data on mission-related fatalities broken out by year. The number of fatalities due to hostile action (rather than illness or accidents) associated with a mission in the preceding year is used as an indicator of the risk a mission poses to peacekeepers.
Last, following on work by Victor (2010), the internal stability of potential contributors is included. The World Bank’s “political stability” measure is used as a proxy for internal stability (Kaufman et al., 2014). The political stability measure is part of the Worldwide Governance Indicators, which are scaled from 0 to 100. Higher scores for the political stability measure reflect greater political stability and a lower likelihood that a political system will be overthrown by irregular or violent means.
Analysis
The results of the analysis of peacekeeper provision are presented in Table 2. The hypotheses about major peacekeeper contributions suggested that for poor states (and to a lesser extent for middle-income states), the incentive condition was already met by the UN stipend. Thus, the capacity variables would be more salient for these states. Indeed, all three of the capacity variables were statistically significant in the low-income states model. Notably, a 1 standard deviation increase in the Force Projection Score is predicted to increase the likelihood of making a major peacekeeping contribution by 10.9%. Similarly, countries that are French speaking are 41.2% more likely to contribute peacekeepers than non–French-speaking countries. The effect for English-speaking countries is even greater, increasing the likelihood of deploying 200 or more troops by as much as 46.5%. For middle-income states the Force Projection Score was also salient, increasing the likelihood of contributions by 25.6%. The effect of military capacity is substantively and statistically significant in Models 1 and 2. Also of theoretical importance, the three capacity variables seem to be less salient for wealthier states.
Table 2. Logistic Regression Analysis of Major Peacekeeping Operations.
Note. GDP = gross domestic product; GNP = gross national product. “LN” indicates that a variable has been logged to reflect diminishing returns. “M” indicates that a variable has been scaled by 1,000. “D” indicates that a variable is dichotomous.
Asterisks correspond to the following thresholds for rejection of the null hypothesis in a two-tailed test of significance: *p < .05. **p < .01. ***p < .001.
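Because logit coefficients are expressed in log-odds, substantive effects such as those reported alongside Table 2 are usually communicated as changes in predicted probability. The article does not state how its percentage changes were computed, so the sketch below shows only one common approach: a first difference for a 1 standard deviation increase, with all other terms held fixed. The coefficient, intercept, and standard deviation are hypothetical and are not taken from Table 2.

```python
import math

def inv_logit(z: float) -> float:
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def first_difference(intercept: float, beta: float, x0: float, sd: float,
                     other_terms: float = 0.0) -> float:
    """Change in predicted probability when x rises from x0 to x0 + sd."""
    p_before = inv_logit(intercept + beta * x0 + other_terms)
    p_after = inv_logit(intercept + beta * (x0 + sd) + other_terms)
    return p_after - p_before

# Hypothetical values: the same coefficient evaluated at two different baselines.
change_mid_baseline = first_difference(intercept=0.0, beta=1.0, x0=0.0, sd=1.0)
change_low_baseline = first_difference(intercept=-3.0, beta=1.0, x0=0.0, sd=1.0)

print(round(change_mid_baseline, 3))
print(round(change_low_baseline, 3))
```

The same coefficient produces a smaller change in probability when the baseline probability is close to zero, so a reported effect should be read with the values at which the other variables are held in mind.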
The correlation of GNP per capita with higher levels of education and greater military capacity means that wealthier states will more reliably possess the capacity to participate in large-scale international peacekeeping operations should they choose to do so. Indeed, Figure 2 illustrates the trend toward greater military capacity for wealthier states. Thus, while these factors have little effect as statistical predictors for wealthy states, they might still be salient in a causal process that requires both capacity and incentives. If the vast majority of states in the high-income category have met the necessary threshold for capacity, the condition becomes trivial and the important factors will relate to incentives.

Figure 2. Average force projection by income category.
Indeed, in Model 3, the three variables proposed as sources of motivation are statistically significant in the analysis of high-income states. As the distance between capitals increases, the geostrategic urgency of the conflict appears to decrease. A 1 standard deviation increase in the distance between state capitals yields a 13.7% decrease in the likelihood that high-income states will contribute large numbers of peacekeepers. Similarly, former colonial ties seem to mobilize high-income states to intervene. Former colonial ties increase the likelihood that a large peacekeeper contribution will be made by 43.2%. Last, safer UN peacekeeping missions seem to draw more contributions from high-income states. As mission fatalities increase, high-income states are less likely to make major contributions.
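Effects of this kind, stated as a change in likelihood for a 1 standard deviation increase, are often obtained by comparing predicted probabilities from the fitted logistic model. The sketch below shows one common way to do so (a first difference with the other covariates held fixed). It is offered only as an illustration: the text does not state how the reported percentages were computed, and the intercept, coefficient, and standard deviation used here are hypothetical values, not estimates from Table 2.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def first_difference(intercept: float, coef: float,
                     x_base: float, x_sd: float) -> float:
    """Change in predicted probability when x rises by one standard deviation.

    All other covariates are assumed to be held fixed and folded into
    `intercept`.
    """
    p_base = logistic(intercept + coef * x_base)
    p_plus = logistic(intercept + coef * (x_base + x_sd))
    return p_plus - p_base


# Hypothetical values for illustration only (not taken from Table 2).
change = first_difference(intercept=-2.0, coef=0.8, x_base=0.0, x_sd=1.0)
print(f"Change in predicted probability: {change:.3f}")
```

Because the logistic curve is nonlinear, the size of the first difference depends on the baseline chosen for the other covariates, so the baseline should be reported alongside any such figure.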
In evaluating the theoretical argument put forward to explain peacekeeper provision, it is important to understand and to anticipate which combinations of variables will be statistically significant in specific models and to anticipate the conditions where these variables will be less salient. Thus, given the analysis presented here, there is considerable evidence to support the initial argument that major peacekeeper provision is driven by two necessary conditions (capacity and incentives). As Table 3 seeks to highlight, for high-income states incentive factors are statistically significant while capacity-related factors are not. Given that capacity-related factors are correlated with income level, it is not surprising that the capacity variables are trivial for high-income states.
Evaluating Conditions for Major Peacekeeper Provision.
Similarly, for low-income states, there is potentially sufficient incentive in the UN stipend. While the distance between capital cities does play a role in further incentivizing participation, it is the capacity variables that are the primary drivers of contributions. As Table 3 highlights, for low-income states the capacity variables are statistically significant and are in alignment with the theoretical expectations. The disaggregation of these different situations helps illustrate the importance of both capacity and incentives in the process and that these conditions can be met through a number of different pathways.
While sample size for each model was in the 2,000s rather than the more than 7,000 cases covered in the entire data set, the decision to disaggregate the models provides opportunities to test multiple conditional hypotheses at the same time. This allows for a more strenuous test, first, because the sample size is reduced. This has the effect of slightly increasing the size of the standard errors and thus requiring a greater difference from the null hypothesis to justify rejection of the null. Second, the rigor of the test is also increased because the hypotheses require greater specificity to be supported. The ability to predict in which model a statistically significant relationship will occur indicates a deeper understanding of the causal processes that are in operation. From the perspective of theory testing, there is clear value in a merger of the necessary and sufficient conditions framework with regression analysis. Yet from the perspective of statistical practice this approach is fraught.
Drawing Conclusions From Null Findings
The possibility of drawing affirmative support for a theory from null findings both inverts common statistical practice and opens the door wide for researchers to engage in ex post theorizing. Indeed, there are good reasons to discourage scholars from reading too much into the differences observed among subgroups. Richard Peto famously highlights the potential absurdity of subgroup exploration. Asked to revise a paper submitted to The Lancet to include a discussion of subgroup effects, Peto included a subgroup analysis of individuals according to zodiac sign. The implication is that a statistical fishing expedition is likely to turn up significant results by dint of randomness. To the extent that nonsignificant results do not work against an argument, this problem is made doubly worse.
To some extent, the necessary and sufficient conditions causal framing provided by QCA should generate a level of specificity that would guard against this kind of atheoretical statistical search. As with all frequentist statistics, it is incumbent on the researcher to state a hypothesis before testing it. For a researcher employing a disaggregated regression approach, a theory should be able to state relationships between independent and dependent variables with a level of specificity that will declare where these relationships will be statistically significant and where they will not be. A conditional theory that cannot anticipate statistically nonsignificant relationships is a conditional theory without support.
Yet clear statements of theory only address a small part of the larger problem. After all, a null finding can result from an absence of a relationship in the broader population, insufficient statistical power, a Type II error in which the sample incorrectly reflects the population, or model specification decisions made by the researcher. Failing to correctly specify a model or selecting an incorrect functional form can each contribute to null findings. Thus, a researcher might seek an affirmative statistical test.
An affirmative statistical test could be conducted in two different ways. First, researchers could seek, for purposes of parsimony, to present disaggregated models, but also conduct a statistical analysis of all variables that interact with the disaggregating condition. Thus, in the analysis presented above, 11 of the independent variables would interact with the GNP per capita variable, bringing the total number of variables in the model to 23. For robustness purposes, I conducted this analysis. Notably, the Force Projection Score variable was statistically significant as would be anticipated given the results from Model 1, but the interaction term was not statistically significant as might be expected given the results of Model 3. The fully specified interaction model can serve as a check against potentially arbitrary thresholds and in some instances can provide affirmative evidence.
In my analysis, the Force Projection Score variable was positive and statistically significant. The interaction term was also positive, but a large standard error left the effect not statistically significant. If, however, the result had been negative and statistically significant, it would be possible to show that the pattern operating for low-income states dissipated as states become wealthier.
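The logic of reading an interaction model this way can be sketched numerically. In a model with a main effect and an interaction with a moderator, the slope of the main variable at a given moderator value is the main coefficient plus the interaction coefficient times that value, and its standard error follows from the variances and covariance of the two estimates. The numbers below are hypothetical and deliberately use a negative interaction, the counterfactual case described above; they are not the estimates from the robustness analysis, and the function names are illustrative assumptions.

```python
import math


def conditional_slope(b_main: float, b_inter: float, moderator: float) -> float:
    """Slope (on the log-odds scale) of the main variable at a moderator value."""
    return b_main + b_inter * moderator


def conditional_se(var_main: float, var_inter: float, cov: float,
                   moderator: float) -> float:
    """Standard error of the conditional slope from the estimates' (co)variances."""
    variance = var_main + moderator ** 2 * var_inter + 2.0 * moderator * cov
    return math.sqrt(variance)


# Hypothetical estimates for illustration only.
b_main, b_inter = 0.60, -0.05
var_main, var_inter, cov = 0.04, 0.0025, -0.005

slopes = {}
for moderator in (6.0, 8.0, 10.0):
    slope = conditional_slope(b_main, b_inter, moderator)
    se = conditional_se(var_main, var_inter, cov, moderator)
    slopes[moderator] = (slope, se)
    print(f"moderator={moderator:4.1f}  slope={slope:5.2f}  se={se:5.3f}  z={slope / se:5.2f}")
```

With these illustrative values the slope shrinks as the moderator grows, which is the pattern that would indicate an effect dissipating as states become wealthier.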
A second type of affirmative test would be to conduct a separate test of the coefficients associated with Force Projection Score in Models 1 and 3. Again, if a researcher could show that the slope from Model 1 is positive and is statistically larger than the slope in Model 3, even taking into account sampling variation, it would be possible to conclude that two distinct statistical processes were in operation. Statistical tests for this kind of comparison are not readily available in most statistical packages but have been explored by Dupont and Plummer (1998) in a linear regression context and by Long and Mustillo (2018) and by Allison (1999) in a logistic regression context.
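For intuition, the simplest large-sample version of such a comparison is a Wald-type z test on the difference between two coefficients estimated on separate, non-overlapping samples. The sketch below is a minimal illustration under that independence assumption. It is not the procedure developed in any of the works cited above, and for logistic models it ignores the concern, raised by Allison (1999), that coefficients can differ across groups because of differences in residual variation. The coefficient and standard error values are hypothetical.

```python
import math


def coefficient_difference_test(b1: float, se1: float,
                                b2: float, se2: float) -> tuple[float, float]:
    """z statistic and two-sided p value for H0: b1 == b2.

    Assumes the two estimates come from independent samples and are
    approximately normally distributed.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = (b1 - b2) / se_diff
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical slopes and standard errors for illustration only.
z, p = coefficient_difference_test(b1=0.50, se1=0.10, b2=0.10, se2=0.15)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A small p value here would suggest that the two slopes differ by more than sampling variation alone would produce, subject to the assumptions stated above.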
Yet researchers operating in the QCA tradition might be less concerned with demonstrating that two distinct statistical processes are in operation. Indeed, such a framing is largely grounded in the “variable-oriented” research tradition. The necessary and sufficient conditions framework, in contrast, is “case oriented.” That is, the goal is to build combinations of conditions that are useful for understanding different clusters of cases. Thus, a QCA researcher is less interested in demonstrating a statistically significant difference in the slopes associated with a variable for different combinations of cases. Rather, knowing that a condition is important in one context is useful. And knowing that a condition is not important in another context is also useful. It may be worth demonstrating that “not important” means a statistically significant but substantively meaningless relationship, but it is unclear that this is more theoretically salient than a situation in which the standard errors are so large that it is not possible to determine what if any systematic effect is operating.
Concluding Thoughts
This project began (and has ended) with the recognition that there is often a divide on how qualitatively and quantitatively oriented scholars articulate causal arguments. Mahoney and Goertz (2006) observe that while qualitative researchers often focus on identifying multiple causal pathways behind a particular outcome, this mode of thinking is rarely invoked in quantitative studies. In practice scholars often articulate their theories using a mix of correlational and necessary and sufficient conditions (Goertz & Starr, 2002), but statistical analysis rarely seeks to test causal complexity.
There is of course no mathematical barrier to organizing statistical analysis to test a combination of correlational and necessary and sufficient conditions causal arguments (as I have tried to show in the peacekeeping project), but it is not commonly part of the practice of quantitative research in the social sciences. The effectiveness of the approach demonstrated above requires navigating three design challenges.
First, researchers must identify a dimension to disaggregate. Ideally, this dimension would contribute to a causal process as an INUS condition. That is, the condition should be a necessary component of a causal process but relevant only when other factors are present. Thus, an outcome can be observed in the form of statistically significant relationships across models where the condition is satisfied, but these relationships will not appear where the INUS condition is not met. If needed, additional interaction effects can be incorporated into these disaggregated models to better model complex causality.
Second, researchers must identify appropriate thresholds for disaggregating statistical analysis. These thresholds may result from the structure of a variable (a categorical variable creating salient points of disaggregation). Empirically a structural break in the data may also create a defensible point for disaggregating analysis. For example, Garasky (2002) considers migration patterns of young adults, disaggregating his analysis to consider patterns in urban areas and rural areas. Yet, when no obvious break point exists for disaggregation, a researcher may need to test multiple disaggregation points. In my own analysis of how different levels of violence affected the willingness of conflict participants to pursue negotiations, I considered situations with low levels of violence (less than 50 and less than 100 battle deaths annually) and situations with high levels of violence (more than 1,000 and more than 2,000 battle deaths annually; Urlacher, 2013). Testing multiple thresholds can help demonstrate that findings are robust. This is particularly important when there is not a strong theoretical reason to disaggregate the analysis at particular points.
Third, researchers should clearly specify in their hypotheses under which conditions they expect to see statistically significant relationships occurring and when they do not anticipate statistically significant relationships to occur. The ability to identify, for a specific variable, both where it will appear statistically significant and where it will not demonstrates that a complex causal process is operating and is reasonably well understood.
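These three design choices can be sketched together on synthetic data. The example below is an illustration under stated assumptions, not the peacekeeping analysis: it simulates cases in which a predictor affects the outcome only when a moderator falls below 5, splits the cases at several candidate thresholds, and fits a separate logistic regression in each subset. The helper `fit_logit`, the variable names, and every number are hypothetical, and the small Newton-Raphson routine stands in for whatever statistical package a researcher would normally use.

```python
import math
import random


def fit_logit(xs, ys, iterations=25):
    """Fit P(y=1) = logistic(a + b*x) by Newton-Raphson; return (b, se_b)."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to the intercept
            g1 += (y - p) * x      # gradient with respect to the slope
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Slope variance is the matching diagonal entry of the inverse information
    # matrix, taken from the final iteration (essentially converged by then).
    return b, math.sqrt(h00 / det)


# Synthetic cases (an assumption for illustration): x matters only when the
# moderator is below 5, mimicking a condition that is salient in one context
# and trivial in another.
rng = random.Random(42)
cases = []
for _ in range(4000):
    moderator = rng.uniform(0.0, 10.0)
    x = rng.gauss(0.0, 1.0)
    true_slope = 1.0 if moderator < 5.0 else 0.0
    p = 1.0 / (1.0 + math.exp(-(-0.5 + true_slope * x)))
    y = 1 if rng.random() < p else 0
    cases.append((moderator, x, y))

# Disaggregate at several candidate thresholds to check robustness.
results = {}
for cutoff in (4.0, 5.0, 6.0):
    below = [(x, y) for m, x, y in cases if m < cutoff]
    above = [(x, y) for m, x, y in cases if m >= cutoff]
    fits = []
    for label, subset in (("below", below), ("above", above)):
        xs, ys = zip(*subset)
        b, se = fit_logit(xs, ys)
        fits.append((b, se))
        print(f"cutoff={cutoff:>4} {label}: n={len(subset):4d} "
              f"slope={b:6.3f} z={b / se:6.2f}")
    results[cutoff] = tuple(fits)
```

A hypothesis stated in advance would predict a clearly positive slope in the "below" subsets and a slope near zero in the "above" subsets; seeing that pattern hold across neighboring cutoffs is the kind of robustness check described in the second design challenge.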
While disaggregated regression is a technique that may not be appropriate in every situation, social science often pushes researchers to consider conditional and complex causal arguments. Testing such arguments necessitates that researchers consider a range of techniques and methods to more directly test conditional relationships (see also the Hayes and Rockwood article in this issue). It is also the hope that as methods for testing complex and conditional arguments become more widely used, researchers will reflect more on the conditions under which correlational relationships hold.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
