Drawing Lessons from Case Studies by Enhancing Comparability

Abstract

External validity is typically regarded as the downside of case study research by methodologists and social scientists; case studies, however, are often aimed at drawing lessons that are generalizable to new contexts. The gap between the generalizability potential of case studies and the research goals demands closer scrutiny. I suggest that the conclusion that case study research is weak in external validity follows from a set of assumptions that I term the “traditional view,” which are disputable at best. In this view, external validity is treated as a matter of mere representativeness. I argue that it is best understood instead as a problem of inference and that the emphasis should be placed on the comparability of the study rather than on the typicality of the case. By making case studies highly comparable, their external validity can be reliably and efficiently assessed and, in this way, their generalizability potential enhanced.

Keywords

external validity; case study research; comparability

In an influential book on the principles and practice of the case study method, John Gerring defines the case study as “the intensive study of a single case where the purpose of that study is—at least in part—to shed light on a larger class of cases (a population)” (2007, 20). And, in fact, case studies are often performed with the purpose of “drawing lessons” in the form of conclusions that apply beyond the single case and explain other outcomes in addition to the one studied directly. Case studies in fields such as economics, political science, and educational research are also used to suggest hypotheses that help inform policy decisions in other contexts. In the former case, issues of mere generalizability arise: they regard the range of conditions under which the conclusions of the case study are expected to hold. In the latter case, these concerns about generalizability are further complicated by the need to formulate guidelines on how to intervene in unstudied contexts.

In the philosophical literature, issues of generalizability are usually discussed under the name of external validity (Campbell and Stanley 1963; Cook and Campbell 1979; Guala 2005, 2010; Steel 2008, 2010). External validity is more easily understood by way of its opposite. A scientific hypothesis is said to be “internally valid” when it is true of the studied context. It is said to be “externally valid” when it is also true of contexts that have not been studied yet. These concepts were first introduced by Campbell and Stanley (1963) in their work on experimental designs. And, indeed, the relevance of the distinction is immediately apparent when thinking of laboratory sciences. Experiments are usually set up with the ultimate purpose of teaching lessons about the nonexperimental world. Results that are only true of the studied sample and have no bearing on nonexperimental populations would be of questionable relevance. Knowing the conditions under which these results are applicable outside the laboratory becomes therefore a prominent concern. Though a less pressing concern, generalizing also counts among the goals of case study researchers. External validity is thus an issue for this methodology as well.

Philosophers, however, have so far hardly worried about the generalizability of case study research. For their part, social scientists who make use of case studies have proved rather timid in addressing this issue. This tendency is now partly changing. Interest in case studies and their methodological riddles has experienced an upsurge in the last decade, especially among political scientists (Brady and Collier 2004; Gerring 2004, 2007; George and Bennett 2005; Mahoney and Goertz 2006b). And the renewed attention to the method has brought to the forefront the problems of generalizability it encounters. External validity is here addressed in terms of a trade-off. These authors emphasize, in fact, the specificity of case study research by describing its advantages and disadvantages with respect to other empirical methods. Furthermore, they usually assume the existence of a trade-off between internal and external validity and, in the latter, find the downside of the case study design. Treated as a comparative weakness of case study research, external validity is, however, only briefly discussed and quickly dismissed.¹ This situation generates an interesting tension and calls for attention: there is, in fact, a gap to bridge between purposes and means. Generalizing is set forth by case study researchers as a prominent goal, but methodological discussions related to it seemingly conclude with a gloomy perspective.

In this article I first examine the reasons that led most scholars to the conclusion that case study research is a weak methodology with regard to establishing external validity. As we shall see, this conclusion is based on assumptions that are, at best, disputable. In the second section, I revisit this line of reasoning that underwrites what I term “the traditional view” on external validity and outline some of the objections raised against it. I focus in particular on the role that the concept of typicality plays within this view and argue that the centrality given to it diverts debate from the real issue of external validity. One unfortunate result of this has been to lead debate to the dead end where it stands now. I propose to refocus the debate on the external validity of case studies by bringing the concept of comparability to the forefront. This refocusing has two major beneficial effects. First, my analysis demonstrates why it would be best to situate external validity as a problem of inference rather than mere representativeness, as in the traditional view. Second, the approach that I develop suggests strategies for strengthening the generalizability potential of case studies. The goal, in short, is not so much a refutation of the traditional view’s account of the pitfalls of case study research with respect to external validity but a shifting of perspective that reveals unnoticed room for improvement.

The Traditional View on the External Validity of Case Study Research

The use of case studies is common in the social sciences and apparently increasing (Gerring 2007). Interestingly, the case study method has started to be used as an autonomous tool of investigation even in fields that typically relegated it to an ancillary position, such as economics (Rodrik 2003; Bates et al. 1998). John Gerring (2004, 2007) notes, quite surprisingly, that even though it is widely employed across the sciences, the case study method is still regarded as a weak methodology, and he attributes the low regard in which it is held to the general lack of understanding that still surrounds it. Several scholars have lately tried to rehabilitate this methodology by providing a thorough analysis of its specificity. Brady and Collier (2004), George and Bennett (2005), Gerring (2004, 2007), Mahoney and Goertz (2006a), and Ragin (1992, 2000) all contribute to the methodological reflections on the case-based method by emphasizing its distinctiveness with respect to the other research designs. These works find some convergence in their understanding of what case study research is good for. Nonetheless, they tend to agree that external validity counts as a weakness of the method. This conclusion is supported by a set of assumptions on what external validity is and how it should be evaluated. I term these the “traditional view” on external validity.

Some of these beliefs are widely shared in the generic literature on external validity and are thus not confined within the debate among case study researchers. At the same time, not all scholars above would probably endorse each of these assumptions with the same degree of confidence. Even if it is not fully expressed by any of these authors, I take George and Bennett (2005), Mahoney and Goertz (2006b), and Gerring (2004, 2007) as holding firmly to this view. The assumptions on which it rests are, in fact, traceable in the following excerpts:

Recurrent trade-offs [of the case-study methods] include . . . the related tension between achieving high internal validity and good historical explanations of particular cases versus making generalizations that apply to broad populations. The inherent limitations include a relative inability to render judgments on the frequency or representativeness of particular cases. (George and Bennett 2005, 22)

Questions of validity are often distinguished according to those that are internal to the sample under study and those that are external (i.e., applying to a broader—unstudied—population). The latter may be conceptualized as a problem of representativeness between sample and population. Cross-case research is always more representative of the population of interest than case study research. . . . Case study research suffers problems of representativeness because it includes, by definition, only a small number of cases of some more general phenomenon. Are the men chosen by Robert Lane typical of white, immigrant, working-class American males? Is Middletown representative of other cities in America? These sorts of questions forever haunt case study research. This means that case study research is generally weaker with respect to external validity than its cross case cousin. The corresponding virtue of case study research is its internal validity. (Gerring 2007, 43)

In qualitative research, it is common for investigators to define the scope of their theories narrowly such that inferences are generalizable to only a limited range of cases. Indeed, in some qualitative works, the cases analyzed in the study represent the full scope of the theory. By contrast, in quantitative research, scholars usually define their scope more broadly and seek to make generalizations about large numbers of cases. Quantitative scholars often view the cases they analyze simply as a sample of a potentially much larger universe. (Mahoney and Goertz 2006b, 237)

Even though it is not fully developed or thoroughly discussed by the authors who endorse it, the traditional view displays some internal and external coherence. Internal coherence among the assumptions enables the conclusion that external validity is a comparative weakness of case study research. External coherence is granted by the fact that this conclusion sits comfortably in a theory of case study research that ascribes to the method comparative advantages and disadvantages with respect to the other research designs. Specific normative implications are then derived regarding when the case study design is the appropriate method to use and how to make it stronger.

I discuss these assumptions and their normative implications below. Some of them have been independently challenged in the extant literature on external validity. I mention them and the related criticisms only briefly. I focus instead on the assumptions whose normative implications, to the best of my knowledge, have not been challenged yet. The set of beliefs that constitutes the traditional view is the following:

1. External validity is a property of research designs and of the scientific results they deliver.

2. Internal and external validity stand in a trade-off relation.

3. External validity is a matter of representativeness.

4. External validity is a quantifiable property. Whether it is high or low depends on the scope (breadth) of the population to which the results of the study apply.

In virtue of these assumptions, the case study method is characterized as comparatively weak in external validity. Assumption 1 treats external validity as depending on intrinsic features of the research design: a given method is thus characterized as good or bad at providing generalizable results. This is at odds with the original formulation by Cook and Campbell (1979), where external validity is used to qualify solely the result of an experiment. In this formulation, an experiment is externally valid if its results can be generalized to a broader population. More generally, recent literature now commonly treats external validity as a property of a whole design rather than of a particular application of it (Lucas 2003). Assumption 2 asserts a trade-off between internal and external validity. It follows from assumption 2 that a design described as having a comparative advantage in the former respect is, in virtue of the trade-off, comparatively weaker in the latter. Assumptions 1 and 2 enabled the scholars above to qualify case study research as high in internal validity and low in external validity.² These assumptions rationalize the methodological prescription that recommends the use of case study research when the main goal is achieving internal validity and other designs, such as the statistical methods, when the goal is deriving broad generalizations instead.

The soundness of this methodological principle that counsels the use of case studies is certainly disputable once assumptions 1 and 2 are also disputed. Assumption 1 has been criticized by Lucas (2003) in the context of a debate on the external validity of the experimental design. Lucas mounts a defense of the experimental method. Although the method is dismissed by several scholars as poor in external validity, Lucas rejects the criticism as essentially misdirected. Specifically, he responds that “critiques of investigative techniques as being low in external validity because findings cannot be generalized quite often should be directed at the theory under test, rather than at the methodology employed to test it” (238). Assumption 2 has been addressed by Jimenez-Buedo and Miller (2010). They notice a tension between two widely held beliefs: that internal and external validity stand in a trade-off relation and, at the same time, that the former is a prerequisite for the latter. They conclude, upon analysis of experimental practice, that the alleged trade-off relation is far less cogent than the traditional view would have one believe. These criticisms suggest that the methodological norm based on assumptions 1 and 2 does not have the self-evident status that the traditional view presumes.

Let us now turn to assumptions 3 and 4. They can be rephrased as follows:

3.1. A case study is externally valid if the case it studies is representative of a broader population.

4.1. The broader the target population, the higher the external validity of the research design.

Assumption 3 sets a condition for the generalizability of scientific results. Results that are obtained within a study apply outside of it only if the context studied represents the target context in some sense to be specified. The traditional view borrows its idea of representativeness from the statistical discourse. The external validity inference is here conceived as an inference from sample to population, legitimized by the former being statistically representative of the latter.

Translated into a qualitative framework, a case is said to stand in a sample-to-population relation with the target universe of cases when it is a typical case within that universe. Within the traditional view, typicality is therefore understood as the key requirement for securing the external validity of a case study. In this perspective, methodological precepts aimed at strengthening the external validity of case studies would all take the form of rules for the selection of the cases. The following excerpt is an example:

The typical case study focuses on a case that exemplifies a stable, cross-case relationship. By construction, the typical case may also be considered a representative case, according to the terms of whatever cross-case model is employed. . . . One may identify a typical case from a larger population of potential cases by looking for the smallest possible residuals . . . for all cases in a multivariate analysis. In a large sample, there will often be many cases with almost identical near-zero residuals. . . . Thus researchers may randomly select from the set of cases with very high typicality. (Seawright and Gerring 2008, 299)³

Reduced to a matter of representativeness, the problem of external validity thus amounts to adopting the selection procedure that maximizes the probability of choosing the case most typical of the target of interest.
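The selection procedure described in the excerpt can be sketched in a few lines. What follows is only a minimal illustration under simplifying assumptions, not Seawright and Gerring's own implementation: it uses a single predictor rather than a multivariate model, the helper names `fit_line` and `typical_cases` are hypothetical, and the data are invented for the example. It fits a cross-case regression and ranks the cases by the absolute size of their residuals, so that the "most typical" cases come first.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def typical_cases(cases, k=1):
    """Return the names of the k cases with the smallest absolute residuals.

    `cases` is a list of (name, predictor, outcome) tuples.
    """
    xs = [x for _, x, _ in cases]
    ys = [y for _, _, y in cases]
    intercept, slope = fit_line(xs, ys)
    # Absolute distance of each case from the fitted cross-case relationship.
    residuals = {name: abs(y - (intercept + slope * x)) for name, x, y in cases}
    return sorted(residuals, key=residuals.get)[:k]


# Invented illustrative data: (case, predictor, outcome).
cases = [
    ("A", 1.0, 2.1),
    ("B", 2.0, 3.9),
    ("C", 3.0, 6.0),
    ("D", 4.0, 9.5),
    ("E", 5.0, 10.0),
]

print(typical_cases(cases, k=2))
```

Note that the ranking is only defined relative to a cross-case model that has already been fitted to the whole population of cases; the sketch takes that model as given.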

In the traditional view, two major problems threaten the external validity of case study research. The first lies in the difficulty of reliably establishing the typicality of the case selected, the solution to which consists in further refining the procedure for selecting the case to study. The second is the intrinsic limitation on the degree of external validity that case study research can reach. According to assumptions 4 and 4.1, the degree of external validity depends on the breadth of the population to which the results are generalizable. In virtue of assumptions 3 and 4, case study research is low in external validity because its capacity for being representative of a broad universe of cases is very limited indeed. Even if one succeeds in identifying typical cases, so the argument goes, their typicality is always confined to a small population. Case study research, in fact, studies intensively either one case or a very small set whose degree of representativeness is not only hard to establish but also very limited. Representativeness, it is said, increases with the size of the sample and so in turn does external validity.

External Validity in Case Study Research: From Typicality to Comparability

The traditional view treats the problem of external validity as a problem of representativeness. This has two major normative implications. First, its methodological precepts are all and only oriented to guide the selection of the “right” case, understood as a typical one. Second, the traditional view identifies the difficulty that case study research has in putting together a representative sample as the source of the method’s incapacity, or extreme weakness, in achieving external validity. But this reasoning goes wrong already at the first step, and this makes its gloomy conclusion problematic. External validity is not, I argue, essentially a problem of representativeness but rather one of inference, and so a problem to which the representativeness of the case might offer one possible solution. The challenge of external validity consists in fact in correctly identifying the circumstances under which the results of a study can be generalized to other cases. The inference from the studied case to some new contexts thus needs to be justified by some factors that give us reason to believe that what was found true of the former is most probably true of the latter as well. Typicality might be one of these factors. The information that the case at hand is typical, in fact, backs up the inference through which we conclude that what is true of the case is also true of the target. Finding the typical case is therefore a pragmatic solution to what truly is an epistemic problem. Typicality is a solution to the problem of generalizability; typicality per se is not the ultimate problem to solve. The traditional view conflates these two concepts, the typicality of a case and its generalizability, and, in so doing, not only fails to capture the essential distinction but also confuses one solution with the entire problem. As a consequence, the methodological norms it imparts to guide the selection of the case cannot respond to the epistemic challenge of external validity.

The traditional view, in fact, confines the methodological discourse on external validity to the stage of the selection of the cases and in so doing implicitly suggests that the problem of external validity is fully solved by singling out the representative case from the target universe. Representativeness, however, only offers a solution if the strategy used to establish it properly responds to the epistemic challenges posed by external validity. That is, the typical case cannot be identified by presupposing knowledge that its identification is expected to deliver in the first place. The problem with the strategy described by Gerring in the excerpt above is exactly this one. The way he suggests for the selection of the typical case presupposes a knowledge of the cases that we are not supposed to possess when the problem at hand is correctly described as one of external validity. If we already know beforehand the causal relationship that we are interested in generalizing, there is nothing left to generalize in the first place. Gerring’s strategy probably succeeds in solving issues of representativeness but cannot double as a solution to an inferential problem.

The scholars who hold the traditional view failed to respond properly to this challenge because they failed to distinguish between conditions for the external validity of the results and epistemic criteria that help establish whether these conditions hold. The conditions for external validity are the circumstances that justify the generalization; typicality is the one explicitly acknowledged by these scholars:⁴

CEV:⁵ If the case is typical of a broader universe of cases, the result obtained in the former is generalizable to the latter.

Typicality, however, cannot double as an epistemic criterion for the assessment of external validity. Once the conditions for the generalizability of the results have been defined, independent strategies should be devised that help establish whether those conditions hold. What is needed are epistemic criteria that inform us about the representativeness of the case and, at the same time, do not presuppose the knowledge of the target that we are expected to extract from the case study itself. The criterion I propose is comparability.

EC:⁶ Comparability of the study is required to establish whether the case is typical of the target universe of cases and the result hence generalizable.

If the case study is comparable in the appropriate respects to the target, it enables us to elicit from the case both the information that is to be generalized and the information that is required to decide about the generalizability of the same results.

The notion of comparability has been introduced by LeCompte and Goetz (1982) in a work on the validity of the ethnographic methods. LeCompte and Goetz understand external validity in terms of typicality and comparability:

The fieldworker’s problem is to demonstrate what Wolcott conceptualizes as the typicality of a phenomenon, or the extent to which it compares and contrasts along relevant dimensions with other phenomena. Consequently, external validity depends on the identification and description of those characteristics of phenomena salient for comparison with other, similar types. Once the typicality of a phenomenon is established, bases for comparison may be assumed. (51)

LeCompte and Goetz have the merit of hinting at the epistemic problem at the core of external validity but still fail to disentangle fully conditions for validity and criteria of assessment. Typicality refers to the condition that the case has to satisfy for the results of the study to be generalizable. Typicality, however, cannot be established a priori, nor can it be inferred from knowledge of the target universe of cases that the generalization itself is expected to provide. The typicality of the case and the generalizability of the results are established upon comparison with other/new cases. As already widely discussed in the philosophical literature, external validity is truly an empirical hypothesis and has to be settled on a case-by-case basis (Guala 2005, 2010; Steel 2008, 2010). Comparability is therefore the epistemic requirement to be imposed on the design of the study in such a way that, by contrasting its results with what we observe in other situations, we are able to adjudicate the typicality of the case at hand and the generalizability of its results.

By rendering the case study comparable in the appropriate respects, the problem of its external validity becomes ultimately solvable—that is, decidable. This does not mean that external validity is in this way granted, only that it can be reliably established. The discourse on external validity so far developed among case study researchers is misdirected and therefore not helpful to this end. Required in addition are strategies that are epistemically viable for assessing the typicality of the case and the generalizability of the findings. Disentangling the two issues by distinguishing neatly between typicality and comparability is a first step in this direction.

A second move that needs to be introduced involves making the notion of comparability more precise. I return to this point below. Before turning to this aspect, however, I want to emphasize a final point in relation to the traditional view. Its focus on representativeness, as if it were the ultimate challenge to establishing external validity, has biased the debate and led to its current dead end. That is, the consensus has it that external validity constitutes an irremediable weakness of case study research. And this is attributed to the fact that it studies a very limited number of cases and is in this way subject to an inherent limit on the achievable degree of external validity. This conclusion, however, links high external validity to a capacity for offering broad generalizations, and this assumption about the requisite breadth of the generalization is disputed in the literature.

There are scholars who hold the view that the generalizations that science allows are always very limited in scope. In a discussion on the external validity of experiments, Francesco Guala (2002) mentions Bruno Latour, David Gooding, and Andy Pickering as promoting a form of radical localism and Ian Hacking and Nancy Cartwright as defending milder positions in a similar spirit. In its extreme version, radical localism denies any external validity to scientific hypotheses except when the outside world can be carefully engineered and made similar to the laboratory such that the experimental results can be directly exported (1196). Guala defends a less skeptical position that admits of several ways to solve the problem of external validity. According to Guala,⁷ the problem amounts to minimizing the error in the inference from the experiment to the outside world; this is achieved by making the two contexts as similar as possible. One way to this end is what he calls “engineering the world.” Another strategy is adapting the experimental setting to the outside conditions. For instance, the experimental setting can be modified so as to reproduce nonexperimental settings more accurately. Even though various strategies exist to generalize reliably from experiments to the outer world, external validity is bound to remain a “local” matter. That is, the generalizations that science allows never travel too far and never apply too broadly. In this perspective, case studies pose no special problem; all studies possess only limited generalizability.

Moreover, from the perspective of this alternative approach to external validity, similarity between the two contexts is the condition that grants generalizability to the results:

CEV₁: If the case is similar to the target case/cases, the results obtained in the former are generalizable to the latter.

Similarity is a broader concept than typicality. Typicality presupposes similarity between the case and its target but further requires a sample-to-population kind of relation between the two. This idea, originating as it does in a statistical context, fits badly with case study research, where random sampling is not a feasible strategy. Furthermore, it is restrictive in that it asks that the relevant population be clearly defined before engaging in the study of the case that is supposedly representative of it. This led to the type of discourse on external validity I discussed above. Similarity instead does not require any a priori definition of the relevant target and leaves the issue of generalizability open to the empirical analysis that would follow the case study research. In case study research, unlike in experiments, the studied context (the case) and its target cannot be “made” similar in the way an experiment and the nonexperimental setting can. The case, in fact, cannot be adapted in any meaningful sense to the target, whereas, as Guala suggests, the experimental settings can be slightly modified to fit some features of the outside world. And, in general, given the complexity of the phenomena that case studies examine, the idea of engineering the outside world so as to reproduce the study conditions is simply not practicable. The way that is open to case study research is finding similarity between the studied system and the target in vivo.

Improving the External Validity of Case Study Research: Enhancing Comparability

On the basis of what is said above, we can thus relax EC so as to encompass the broader condition of similarity:

EC₁: Comparability of the study is required to establish whether the case is similar to the target case/cases and the result hence generalizable.

Even though intuitively appealing, comparability as described by EC₁ is too vague to be compelling. We thus need to refine the criterion further to distinguish what qualifies as a comparable case study and what does not. Furthermore, one needs to specify what makes a case study highly comparable and what detracts from it. In this way we would hint at some principles that might help strengthen the external validity of case study research by making its assessment more reliable. To evaluate comparability, it is worth keeping in mind that when assessing the generalizability of a result, one faces severe epistemic constraints. The external validity hypothesis is, in fact, empirically settled by the comparison between the case studied and the target case, of which we know very little. If we knew of the target what we already know of the case, there would be no worry about external validity in the first place. Certainly we do not know whether the result or hypothesis that is true of the case is also true of the target, since this is what one aims to establish. But, in general, any inference of external validity is bounded by the limited knowledge of the target case (Steel 2008). Hence, provided that it is correct, the inferential strategy that is epistemically cheaper is the one to be preferred.

With this proviso in mind, I suggest that comparability is of the right kind if it is effective. Effectiveness requires that the study render available the information necessary to establish whether, upon comparison, the case is sufficiently similar to the target context so as to justify the generalization of the results obtained. Take, for instance, the most common case in which the result to be generalized is a causal relationship. The case and the target have to be similar in the respects that are causally relevant to the hypothesis for this to be valid in the latter as well (Guala 2010; Steel 2010). The study then needs to inform us about the respects that matter to the causal relation in the case such that we can proceed to the comparison with the target context and eventually to the inference.

In case study research, where neither engineering the world nor adapting the case to the target is an option,⁸ knowledge of the relevant causal factors is what enables the inference. Without it, even a tentative assessment of external validity would not be possible. Complete knowledge of the relevant causal factors is, however, only an epistemic ideal. Ideal aside, one can say that the more complete this knowledge is, the more reliable the inference will be.

Provided that comparability is effective in the sense described above, we can then distinguish between high and low comparability depending on whether the inference from the study is more or less epistemically efficient. The more information about the target case is required to establish the external validity of the hypothesis, the lower the comparability of the case study will be. If the case study only describes the causal factors that are relevant to the causal relationship, then the comparison between the studied case and the target needs to be fully articulated, that is, contrasted along all relevant respects, before the hypothesis can be generalized to the new case. Since epistemic efficiency is a virtue when external validity is at stake, a case study that requires full comparison is low in comparability. If the study describes the causal structure instead, then comparison can be partial and only regard certain elements in it. A causal structure, or causal mechanism, is the set of interconnected, regularly operating causal relationships that generate one or more regularities between (observable and unobservable) events (Guala 2010, 1072). It requires that the causal relations among the relevant factors be fully spelled out. If the study also describes the causal structure, its comparability is high indeed because limited comparison between the case and the target in a subset of the causal factors is sufficient to draw conclusions about the behavior of the whole mechanism.

The examples that follow illustrate the distinction just sketched between a “full” causal relationship, on the one hand, and a “causal structure” or “causal mechanism,” on the other. The first case study, on Botswana, is a case of high comparability and enables in fact partial comparison between the causal structures of the case and those of the targets. The second case study is an example of low comparability, where comparison between the cases needs instead to be full before confirming any hypotheses of external validity.

* * *

In a case study on Botswana, Acemoglu, Johnson, and Robinson (2003) explain the unusually good economic performance observed in the country in the last decades. If compared with the average in sub-Saharan Africa, in fact, Botswana performed outstandingly in terms of per capita income growth rate in the last 35 years. The authors start with the assumption, based on previous studies, that the proximate determinants of its economic success are the institutions and the related policies that the country developed over time. Institutions are conducive to growth when they correspond to a social organization that ensures effective property rights to a broad cross section of the society. The authors refer to this cluster as property right institutions.

What they aim to explain is, however, why Botswana was able to develop the institutions it now possesses, and they thus search for what can be defined as the deep determinants of growth (Rodrik 2003, 3). To this end, they adopt a case-oriented methodology. They finally offer a country narrative in which they reconstruct the processes through which Botswana developed the institutions it actually has.

The examination of the country history suggests five (structural) features as plausibly responsible for its property right institutions and good economic policies:

1. Botswana is very rich in natural-resource wealth.

2. It had unusual precolonial political institutions that enabled an unusual degree of participation in the political process and placed restrictions on the political power of elites (Kgotla).⁹

3. British colonial rule in Botswana was limited. This allowed the precolonial institutions to survive to the independence era.

4. Exploiting the comparative advantage of the nation after 1966 directly increased the incomes of the members of the elite.

5. The political leadership of BDP,¹⁰ particularly that of Seretse Khama,¹¹ inherited the legitimacy of these institutions, which gave it a broad political base.

The causal influence of each of these features on Botswana's modern institutions is justified by describing the mechanisms through which this influence is conveyed. These mechanisms, theorized in the background literature on development economics to which the authors refer, are used to explain why and how these factors are causally relevant to the emergence of property right institutions. Consider, for instance, the mechanism of political losers (Acemoglu and Robinson 2000):

An institutional setup encouraging investment and adoption of new technologies may be blocked by elites when they fear that this process of growth and social change will make it more likely that they will be replaced by other interests—that they will be political losers. Similarly, a stable political system where the elites are not threatened is less likely to encourage inefficient methods of redistribution as a way of maintaining power. (Acemoglu, Johnson, and Robinson 2003, 103)

According to the mechanism of political losers, elites do not oppose the adoption of institutions and policies favorable to growth if they do not feel that their power is threatened by the change. In Botswana, precolonial institutions (feature 2) ensured some degree of political stability and went almost unaffected by the imposition of British colonial rule (feature 3). In addition, the legitimacy of Seretse Khama and the broad coalition he formed further strengthened this stability (feature 5). The authors conclude that features 2, 3, and 5 influenced the building of property right institutions by setting the mechanism of political losers to work and thus ensuring a high degree of political security to the existing elites. A similar use is made of the other two theoretical mechanisms: they trace the causal relations among the factors that jointly determine the emergence of property right institutions and the ensuing economic growth.

In the final section of the work, Botswana is compared with four African countries—namely, Somalia, Lesotho, Cote d’Ivoire, and Ghana. The comparison is, however, only partial. It has the purpose of checking the hypothesis that the difference in one factor is sufficient to disrupt one of the mechanisms conducive to property right institutions and therefore to growth. Each country is thus compared to Botswana with respect to one or two of the features above and never along all five dimensions. For instance, consider the third mechanism that Acemoglu, Johnson, and Robinson (2003) describe, which they call constraints.

When (precolonial) institutions limit the powers of rulers and the range of distortionary policies that they can pursue, good policies are more likely to arise (see Acemoglu and Robinson 1999). Constraints on political elites are also useful through two indirect channels. First, they reduce the political stakes and contribute to political stability (mechanism of political losers), since, with such constraints in place, it becomes less attractive to fight to take control of the state apparatus. Second, these constraints also imply that other groups have less reason to fear expropriation by the elites and are more willing to delegate power to the state. (104)

The mechanism of constraints is set into operation by factor 2—that is, by the type of precolonial institutions. If these institutions have the right properties, as kgotla had in Botswana, then they are effective in placing constraints on rulers. These, in turn, affect the emergence of property right institutions both directly and indirectly. At the same time, factor 3 can have the opposite effect of inhibiting the mechanism of constraints. In fact, strong British colonial rule alters precolonial institutions and disrupts the mechanism of constraints if it was in place before.

Acemoglu, Johnson, and Robinson thus compare Botswana and Somalia with respect to factors 2 and 3 and find them similar in factor 3 but different in factor 2. As in Botswana, the British government had in fact only marginal interest in Somalia and imposed very soft colonial rule. Somalia had, however, precolonial institutions that induced intense factional conflict and were therefore incapable of placing constraints on political elites. From this partial difference in the type of precolonial institutions between the two countries (factor 2), Acemoglu, Johnson, and Robinson infer that the mechanism of constraints was not operating properly in Somalia and that it thus also impeded the proper working of the mechanism of political losers. As a consequence, property right institutions did not emerge and, in turn, the country's economic performance faltered.

A similar line of reasoning is then applied to the comparison with the other countries. The comparison is limited to one or two structural features whose difference is taken as evidence that the corresponding mechanism is not operating properly in the context of comparison. These observations license the inference that this explains the absence of appropriate institutions and, so, the bad economic performance.

It seems that Acemoglu, Johnson, and Robinson reason in a way akin to what Daniel Steel (2008) calls comparative process tracing. Comparative process tracing consists in comparing the mechanisms in the study and the target at the “critical junctures.” From this limited comparison, one can establish the presence (or absence) of the mechanism in the target and, thus, the presence (or absence) of the causal relationship that ensues. In this case study, Botswana is compared to each target case in only a limited number of factors involved in the working of the mechanisms that are conducive to growth through the emergence of property right institutions. Partial comparison is possible because the causal structure (or causal mechanisms) through which the relevant factors convey their causal influence is specified. If the causal relations among the factors are described, the difference in a subset of them is sufficient to formulate conclusions about the behavior of the causal mechanism and the causal factors connected to it. This study therefore can be said to attain a high degree of comparability because a limited amount of information about the target case is sufficient for formulating an assessment of its external validity. In particular, it shows higher comparability than the study I illustrate below. Nevertheless, even in this case, there is still room for improvement. Mechanisms are, in fact, sketched rather than fully specified. Epistemic efficiency increases when the causal structure is more precisely displayed.
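The logic of partial comparison can be made concrete with a minimal sketch. It is only an illustration under simplifying assumptions, not a reconstruction of the authors' method: the names `MECHANISMS`, `features_to_compare`, and `disrupted_mechanisms` are hypothetical, each mechanism is reduced to the set of features that the discussion above associates with it (constraints with features 2 and 3, political losers with features 2, 3, and 5), and a difference in any one of those features is taken to disrupt the mechanism.

```python
# Hypothetical encoding of the five structural features listed for Botswana.
ALL_FEATURES = {1, 2, 3, 4, 5}

# Assumption: each mechanism is represented only by the features it depends on,
# as read off the discussion of the Botswana case study.
MECHANISMS = {
    "constraints": {2, 3},
    "political_losers": {2, 3, 5},
}


def features_to_compare(mechanism=None):
    """Return the features on which case and target must be compared.

    If only a list of relevant factors is available (no mechanism given),
    the comparison must be full. If the causal structure is described,
    only the features feeding the chosen mechanism need to be checked.
    """
    if mechanism is None:
        return set(ALL_FEATURES)
    return set(MECHANISMS[mechanism])


def disrupted_mechanisms(differing_features):
    """Return the mechanisms that depend on a feature in which the target differs."""
    differing = set(differing_features)
    return sorted(name for name, needs in MECHANISMS.items() if needs & differing)


# Somalia is reported as similar to Botswana in feature 3 but different in feature 2.
somalia_differs_in = {2}
print(features_to_compare("constraints"))
print(disrupted_mechanisms(somalia_differs_in))
```

Under these assumptions, checking features 2 and 3 alone is enough to conclude that both mechanisms are impaired in the Somali case, whereas a study that supplied only the list of factors would require comparison along all five.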

* * *

In the second case, the hypothesis whose generalizability is at stake is the policy hypothesis about the effectiveness of community-based programs to defeat malnutrition. It draws on a report by the World Bank in the early 1990s on successful nutritional programs in Africa (Kennedy 1991). It aims to identify the factors that make programs against malnutrition work. To this end, it combines the use of two qualitative methodologies—namely, large sample survey and case-oriented studies. Eileen Kennedy (1991) motivates the study by appealing to the fact that the literature on malnutrition in the 1970s and 1980s focused on only the types of interventions implemented. The 1989 meeting of the International Nutritional Planners in Seoul, however, concluded that “how” a program is implemented is as important as, or maybe more important than, the type of intervention for successful programming (1). In line with the recommendations of nutritional planners in Seoul, Kennedy thus uses survey and case studies jointly to identify what factors matter for the effective implementation of programs against malnutrition. The ultimate goal is to learn lessons that can be generalized to other African contexts (2). The evidence is combined in the following way.

The survey offers prima facie evidence of what factors are required for successful implementation. It combines the findings from 110 answers received from policy makers and program implementers. The respondents answered two types of questions: whether the program was successful and what factors they thought were key to its success.¹² Furthermore, 6 programs among the 66 found to be successful were selected for an in-depth analysis: the Macina Child Health Project, in Segou Region, Mali; the Infant Feeding Project, in Togo; the Imo State Child Survival Project, in Nigeria; the Applied Nutrition Program, in Ghana; the Mali Institutional Development Enterprise and Nutrition Program, in Mali; and the Nutrition Project, in Kinshasa, (formerly) Zaire. These projects were selected out of the 66 because they represent different types of community-based programs that can succeed in combating malnutrition. The survey and additional interviews singled out seven factors as important for successful implementation: community participation, program flexibility, institutional structure, recurrent cost recovery, multifaceted program activities, training and staff qualifications, and infrastructure (Kennedy 1991, 7). The same factors were found to be present in almost all 6 cases. The case studies then focus on how the seven factors were actually implemented in the specific context.

The main conclusion drawn from the report is that different types of programs can succeed in defeating malnutrition. Whether they actually succeed, however, depends on a set of conditions regarding how the programs are in fact implemented. Even though the study cannot be considered in itself sufficient to definitely establish which of the factors identified are ultimately necessary to the effectiveness of the program (to this end, a quantitative analysis is further required), it offers prima facie evidence for it. It thus gives support to the idea that there is no such thing as a one-size-fits-all intervention, but, rather, different ways can be pursued. The strength of this study derives from the effort it makes to single out the causal factors that matter to the successful implementation of the programs against malnutrition. Whether the ones identified are all and only the factors that are jointly necessary for the causal relationship to ensue is a problem of internal validity. Nevertheless, we can consider the study effectively comparable in the sense described above. By listing the set of causal respects that matter to the causal hypothesis, it enables the comparison with new cases along the relevant dimensions. This allows, then, for the formulation of conclusions about its external validity. Whether these conclusions are reliable depends in the end on the confidence we have in the causal inference drawn within the studied case.

These case studies, however, fail to develop a narrative that explains why and how the factors identified matter to the success of the program. Their contribution is, in fact, limited to a more or less exhaustive list in which the relevant factors are described in detail, and so are the modifications in the specifics of implementation that occurred over time. An analysis of the causal processes at work is, however, lacking, despite the fact that this was set as a research goal at the very beginning of the paper (Kennedy 1991). Unlike in the case discussed above, the causal relationships among the factors of implementation are not specified, and the underlying causal structure is thus not even hinted at. As a consequence, even though we can consider these studies effectively comparable to new target contexts, they have low epistemic efficiency. The comparison needs in fact to be full. This would provide an identification of the work done by each factor in all relevant causal respects in the case before any attempt to evaluate the generalizability of the hypothesis of interest.

Conclusion

The debate on the external validity of case study research is stunted and has so far developed under the influence of the statistical perspective. The resulting approach was biased in an unfruitful direction that ultimately led the debate to a dead end, where it seems to stand now. Reaching external validity in case study research was, in fact, essentially regarded as a hopeless endeavor. This conclusion stands in stark contrast with the struggle for generalizations in which case study researchers engage at the same time. One way to solve this tension is by making external validity a decidable issue. This can be done once we abandon the old paradigm, the traditional view, and its focus on representativeness as the problem of external validity. External validity becomes decidable only if the study first renders available the evidence that is necessary to circumvent the epistemic impasse in which any inference of this kind finds itself. Making case studies stronger in external validity therefore means strengthening the design in such a way that it helps researchers reach judgments of external validity with higher confidence and epistemic efficiency. One way to this end is by enhancing its comparability.

External validity should be right on the agenda of philosophers and methodologists who worry about making case study research a design better understood and better used. Generalizing is not only a valuable goal in itself for scientific practice but also the middle step on the way to sound policy making. Drawing the right lessons from the contexts we know already is part and parcel of planning effective interventions in contexts with which we are not acquainted yet. One way to learn how to use the knowledge obtained from the former in the latter is by discussing issues of external validity. The inferential problem that it underpins is in fact the first the scholar encounters when transferring the knowledge gained from epistemically privileged systems to less privileged ones. Experimentalists were the first to worry about it, and they still reflect upon it extensively. I think case study researchers should do the same.

Footnotes

Acknowledgements

I want to thank Paul Roth and Stephen Turner for their valuable comments on earlier versions of this article and the audiences of the Philosophy of Social Science Roundtable in Paris (March 2011), the Erasmus Institute for Philosophy and Economics PhD seminar in Rotterdam (May 2011), and the conference of the International Network for Economic Methodology in Helsinki (September 2011). I am also grateful to Lara Kutschenko, René Lazcano, and Luis Mireles Flores for much helpful discussion. Remaining errors and omissions are mine.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The author(s) received no financial support for the research, authorship, and/or publication of this article.

1

Gerring (2007) is an interesting case in point. His book on the principles and practice of case study research devotes two chapters to investigating techniques to strengthen the internal validity of case studies; external validity, after being briefly mentioned in the introductory chapter, is only discussed indirectly in the chapter on case selection.

2

They are not, by themselves, sufficient for this conclusion.

3

4

Typicality might be taken as a restrictive condition. In the literature on external validity, several scholars endorse similarity instead, which is a broader concept. I discuss the relation between typicality and similarity below.

5

CEV stands for condition for external validity.

6

EC stands for epistemic criterion for assessing the external validity of the study.

7

Guala refers to Deborah Mayo as a defender of this view.

8

Guala says that in experimental economics, this causal knowledge can sometimes be black-boxed without implications for the external validity inference (Guala 2010, 1080).

9

Kgotla is an assembly of adult males in which issues of public interest are discussed (Acemoglu, Johnson, and Robinson 2003, 93).

10

BDP is the dominant party in the country and stands for Botswana Democratic Party.

11

Seretse Khama was BDP political leader and president of Botswana from 1965 until 1980.

12

In sum, 110 individuals/institutions responded to the mail survey, out of the 330 initially contacted. In addition to this first pool, 78 individuals involved in various ways in the implementation of programs against malnutrition were interviewed.

References

Acemoglu, D., S. Johnson, and J. A. Robinson. 2003. An African success story: Botswana. In In search of prosperity: Analytic narratives on economic growth, edited by D. Rodrik, 80-119. Princeton, NJ: Princeton University Press.

Acemoglu, D., and J. A. Robinson. 1999. The political economy of institutions and development. Background paper for World Development Report 2001. Washington, DC: World Bank.

Acemoglu, D., and J. A. Robinson. 2000. Political losers as a barrier to economic development. American Economic Review 90:126-30.

Bates, R. H., A. Greif, M. Levi, J.-L. Rosenthal, and B. R. Weingast. 1998. Analytic narratives. Princeton, NJ: Princeton University Press.

Brady, H. E., and D. Collier. 2004. Rethinking social inquiry: Diverse tools, shared standards. Lanham, MD: Rowman & Littlefield.

Campbell, D. T., and J. C. Stanley. 1963. Experimental and quasi-experimental designs for research. Chicago: Rand McNally.

Cook, T. D., and D. T. Campbell, eds. 1979. Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.

George, A. L., and A. Bennett. 2005. Case studies and theory development in the social sciences. Cambridge, MA: MIT Press.

Gerring, J. 2004. What is a case study and what is it good for? American Political Science Review 98:341-54.

Gerring, J. 2007. Case study research: Principles and practices. Cambridge, UK: Cambridge University Press.

Guala, F. 2002. Experimental localism and external validity. Philosophy of Science 70:1195-205.

Guala, F. 2005. The methodology of experimental economics. Cambridge, UK: Cambridge University Press.

Guala, F. 2010. Extrapolation, analogy, and comparative process tracing. Philosophy of Science 77:1070-82.

Jimenez-Buedo, M., and L. M. Miller. 2010. Why a trade-off? The relationship between the external and internal validity of experiments. THEORIA 25:301-21.

Kennedy, E. T. 1991. Successful nutrition programs in Africa: What makes them work? Washington, DC: World Bank.

LeCompte, M. D., and J. P. Goetz. 1982. Problems of reliability and validity in ethnographic research. Review of Educational Research 52:31-60.

Lucas, J. W. 2003. Theory-testing, generalization, and the problem of external validity. Sociological Theory 21:236-53.

Mahoney, J., and G. Goertz. 2006a. Scope in case study research. Unpublished working paper.

Mahoney, J., and G. Goertz. 2006b. A tale of two cultures: Contrasting quantitative and qualitative research. Political Analysis 14:227-49.

Ragin, C. C. 1992. What is a case? Exploring the foundations of social inquiry. Cambridge, UK: Cambridge University Press.

Ragin, C. C. 2000. Fuzzy-set social science. Chicago: University of Chicago Press.

Rodrik, D. 2003. In search of prosperity: Analytic narratives on economic growth. Princeton, NJ: Princeton University Press.

Seawright, J., and J. Gerring. 2008. Case selection techniques in case study research. Political Research Quarterly 61:294-308.

Steel, D. 2008. Across the boundaries: Extrapolation in biology and social science. New York: Oxford University Press.

Steel, D. 2010. A new approach to argument by analogy: Extrapolation and chain graphs. Philosophy of Science 77:1058-69.