Abstract

Let me begin by situating myself. I am a quantitative researcher. I have used regression in 80 percent of my published empirical papers and qualitative comparative analysis (QCA) in 33 percent of them. I have taught conventional graduate statistics courses at Berkeley and Duke as well as several short courses on QCA. I have argued elsewhere that more connections are needed between conventional methods and QCA (see Vaisey 2009). For these reasons, I was happy to see my former colleague Sam Lucas and Alisa Szatrowski (this volume, pp. 1–79; hereafter L&S) take on QCA from a “quant” perspective.
Although I found some of L&S’s ideas useful, I also found several important errors in their analyses and interpretations. I fear that some researchers, skimming the paper or abstract, will conclude that L&S have dealt a significant blow to QCA and that the method is not worthy of further investigation. That would be unfortunate. In the brief space that follows, I outline what I see as the most important of L&S’s errors. I then look at what they get right and conclude with some ideas about future work at the intersection of QCA and conventional statistics.
What Lucas and Szatrowski Get Wrong
Simulation Errors
I agree with L&S that simulations play an important role in learning about the properties of numerical methods. On the basis of their first simulation (Study 1, Table 4), L&S conclude that QCA gets the wrong answer even in optimal circumstances. When I analyze the same data with the same program, however, QCA gets the right answer. Because this analysis also forms the basis of L&S’s (incorrect) claim that QCA cannot handle “overdetermined” causality (pp. 39–41), this mistake negates two of their most important points.
I was also unable to reproduce L&S’s results from Table 5. The information they provided did not allow an exact replication, but my own simulation showed that QCA was able to find the right parsimonious solutions even with a noncausal factor included. The noncausal variable did show up in the complex solutions, however, suggesting that relying on complex solutions without additional (counterfactual) assumptions will be misleading in the absence of complete cell coverage (more on this below). When cell coverage is complete, my simulations show that the complex solution is accurate.
Misinterpreting Configurations
L&S continually refer to QCA solution terms as “interactions” when they should be called “configurations,” and despite their claims, there is a difference. Configurations do not necessarily imply multiplicative effects, because they can be the result of additive processes reaching a threshold. Consider a sample of children who either have or do not have one of each of three coins: a nickel (N), a dime (D), and a quarter (Q). Imagine we want to analyze which combination of coins is sufficient to allow a child to buy a 15-cent piece of candy. The QCA solution would be Q + D*N (either a quarter or a dime and a nickel). This does not mean that QCA is telling us that dimes and nickels “interact” to be worth more than the sum of their parts but that they add up enough to cross a threshold. (Unfortunately, some users of QCA seem to misunderstand this as well.)
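The coin example can be checked with a minimal Python sketch. The function names are hypothetical, and the search is not a full truth-table minimization: it looks only for the smallest sets of held coins that guarantee a purchase, a shortcut that is valid here only because holding an extra coin can never hurt. The outcome is defined by nothing more than adding up coin values and comparing the total with the price.

```python
from itertools import combinations

VALUES = {"N": 5, "D": 10, "Q": 25}  # coin values in cents
PRICE = 15                           # price of the candy in cents


def can_buy(held):
    """Purely additive rule: total value of held coins meets the price."""
    return sum(VALUES[c] for c in held) >= PRICE


def is_sufficient(term):
    """True if every child holding at least the coins in `term` can buy."""
    others = [c for c in VALUES if c not in term]
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            if not can_buy(set(term) | set(extra)):
                return False
    return True


def minimal_sufficient_terms():
    """Smallest coin sets that are sufficient, skipping supersets of earlier ones."""
    found = []
    coins = sorted(VALUES)
    for size in range(1, len(coins) + 1):
        for term in combinations(coins, size):
            redundant = any(set(f) <= set(term) for f in found)
            if is_sufficient(term) and not redundant:
                found.append(term)
    return found


solution = minimal_sufficient_terms()
print(" + ".join("*".join(t) for t in solution))  # prints: Q + D*N
```

The configuration D*N appears in the solution even though no product term exists anywhere in the rule that generates the outcome.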
Asymmetric Causation
L&S are correct that the parameters of an equation for Pr(Y = 0) must be the negation of the parameters for Pr(Y = 1). QCA, however, does not model the probabilities directly; it models the combinations of factors that lead to the crossing of a probability threshold. If, as with QCA, the goal is to model the necessity and sufficiency of causes for a particular outcome, it is perfectly reasonable to ask what leads to probabilities, say, above .9 or below .1. For rare events, there may be many pathways to “practically never” and only a few (or one) to “almost certainly.” This is a substantively interesting form of asymmetry even if the underlying model of the process is “symmetric” in L&S’s sense. Of course, the answers depend on the choice of threshold, but this doesn’t strike me as particularly problematic. Arbitrary cutoffs and rules of thumb are inescapable in the practice of data analysis.
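A small sketch illustrates this kind of threshold asymmetry. Everything in it is an assumption chosen for illustration: a logistic model for a rare event with four binary causes, an intercept of -6, and a common coefficient of 2.5. The model is symmetric in L&S’s sense, yet the configurations above .9 and below .1 are not mirror images of each other.

```python
from itertools import product
from math import exp

# Hypothetical parameters, chosen only to illustrate a rare event.
INTERCEPT = -6.0
COEF = 2.5
K = 4  # number of binary causal conditions


def prob(config):
    """Pr(Y = 1) under an additive logistic model with equal coefficients."""
    logit = INTERCEPT + COEF * sum(config)
    return 1.0 / (1.0 + exp(-logit))


configs = list(product([0, 1], repeat=K))

# Configurations that cross the two thresholds discussed in the text.
almost_certain = [c for c in configs if prob(c) > 0.9]
practically_never = [c for c in configs if prob(c) < 0.1]

print(len(almost_certain), "configuration(s) with Pr(Y = 1) > .9")
print(len(practically_never), "configuration(s) with Pr(Y = 1) < .1")
```

Under these assumed values, only the configuration with all four conditions present exceeds .9, whereas five configurations (none present, or exactly one present) fall below .1. Different thresholds or coefficients would give different counts.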
What Lucas and Szatrowski Get Right
QCA Is a Numerical Technique
L&S are right that regardless of the spirit that animates its users, QCA is fundamentally a technique for manipulating numerical data. Knowledge of cases is ideal, but not necessary, for using QCA. L&S are also right that the use of techniques such as regression does not prevent a researcher’s going back to those cases for which such knowledge is available. Calling crisp-set QCA “cell oriented” is therefore accurate, and it is perfectly reasonable to compare it with, say, log-linear models.
Rather than focusing on its case orientation, it would be more accurate to say that QCA is complexity oriented. It begins by assuming that everything attached to Y = 1 matters and then simplifies only with positive evidence (or counterfactual assumption). Regression begins with simplicity (extrapolating across empty regions of the vector space) and adds terms only with positive evidence. Because of the curse of dimensionality, the data space will almost always be too empty for the techniques to meet in the middle. (But even where this is possible, QCA’s focus on thresholds rather than on marginal effects will complicate comparison.)
Naive Use of QCA Can Lead to Spurious Results
L&S’s Study 6 is intended to show that noncausal data can yield “causal” results in QCA. This is indeed a risk, and I have seen it several times in papers I have reviewed. Because QCA begins with maximum complexity and simplifies only with positive evidence (or counterfactual assumption), it is easy for it to assert that any configuration that happens to be attached to a positive outcome actually caused it. Complex solutions are most vulnerable to this because configurations without members cannot appear in the solution, leaving rows containing noncausal components without the “partners” they need to reduce those components out. For this reason, complex solutions should almost never be considered “solutions” at all. Because they confound real relationships with patterns of missingness, they should be considered starting points at best (unless cell coverage is complete).
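The role of missing “partners” can be shown with a toy sketch. It assumes two conditions, a causal A and a noncausal X, and uses a bare-bones pairwise reduction in the style of Quine-McCluskey; the function names are hypothetical, and this is not the algorithm of any particular QCA package. Unobserved rows are simply left out, as in a complex solution.

```python
NAMES = ("A", "X")  # by construction, A causes Y and X is noise


def reduce_rows(rows):
    """Repeatedly merge terms that differ in exactly one condition.

    Terms are tuples of 1 (present), 0 (absent), or None (reduced out).
    """
    terms = set(rows)
    while True:
        merged, used = set(), set()
        for a in terms:
            for b in terms:
                diff = [i for i in range(len(a)) if a[i] != b[i]]
                if (len(diff) == 1 and a[diff[0]] is not None
                        and b[diff[0]] is not None):
                    i = diff[0]
                    merged.add(a[:i] + (None,) + a[i + 1:])
                    used.update((a, b))
        if not merged:
            return terms
        terms = merged | (terms - used)


def show(terms):
    """Uppercase for presence, lowercase for absence, '*' for conjunction."""
    def fmt(term):
        return "*".join(n if v == 1 else n.lower()
                        for n, v in zip(NAMES, term) if v is not None)
    return " + ".join(sorted(fmt(t) for t in terms))


# Complete coverage: both positive rows (A=1, X=1) and (A=1, X=0) are observed.
complete = show(reduce_rows({(1, 1), (1, 0)}))
# Incomplete coverage: the row (A=1, X=0) has no members and cannot appear.
incomplete = show(reduce_rows({(1, 1)}))

print(complete)    # prints: A
print(incomplete)  # prints: A*X
```

With the empty cell, X survives in the solution purely because the row that would have reduced it out was never observed.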
These problems have straightforward workarounds, however, even if they are not used often enough. A combination of higher frequency thresholds, statistical testing of both configurations and individual sets, and—most important—transparent counterfactual assumptions for all predictors (see Ragin 2008, chaps. 8 and 9) can all but eliminate false positives. The use of fuzzy sets also helps because the information from a single case is allocated to many different configurations. To illustrate this, I replicated L&S’s Study 6 using fuzzy sets and the Bonferroni-adjusted statistical test outlined in Vaisey (2007). I found false positives in only 3 percent to 4 percent of analyses—about what one would expect by chance. Unfortunately, many naive users will simply dump variables into the software and then start making up stories without considering these issues.
It is worth pointing out, however, that regression is hardly immune to this problem. Raftery (1995:119–20), among others, showed that it is all too easy to find statistically significant relationships between noise and noise, even with relatively small samples (n = 100). I concede, however, that the naive use of QCA poses a greater risk for false-positive results than the naive use of regression, because QCA’s complex solutions can be driven entirely by cell missingness. But experienced and knowledgeable users will not have this problem.
A Way Forward
There is a genuine need for more engagement between conventional quantitative researchers and QCA practitioners. With more space, I could outline several paths this engagement might take. For example, truth-table analyses of predicted probabilities from logistic regression might be useful, as might the use of regression or matching to establish plausible counterfactual assumptions for truth-table reduction. In what I think is the best extant example of integration, Eliason and Stryker (2009) showed how fuzzy-set QCA can incorporate statistical testing and measurement error and how it can build upward from single causes while preserving many of its other strengths. Connections such as these will provide valuable additions to the discipline’s methodological toolkit and help us see patterns we would otherwise miss.
At first glance, it seemed as if L&S wanted to be part of building these bridges. But despite their correct claim that QCA is fundamentally a numerical technique and their emphasis on the real risks of spurious findings in the hands of inexperienced users, I fear that the tone of their article (and its several errors) does not advance the discussion in the way I had hoped.
