Abstract
The historical turn in social science has prompted scholars to engage with the work of historians on a large scale. Here, social scientists face two standard problems of selection bias: confirmation bias and convenience sampling. So far, the record of dealing with these problems has been poor, and little has been done to specify how social scientists can sift a broader body of historiography. We present a criterial framework that describes how social scientists can mitigate bias when using historical studies. We term the idea behind this framework the Ulysses Principle because it can be understood as a way of avoiding the siren call of confirmation bias when using historical sources as the primary evidentiary base. The framework includes considerations about conceptual consistency, the theoretical vantage points of historical sources, and updated evidence. The three criteria and the trade-offs between them are illustrated using two recent examples from comparative historical analyses.
He who is deficient in the art of selection may, by showing nothing but the truth, produce the effect of the grossest falsehood.
However, there is a catch with which every social scientist working with historical data is painfully familiar. When going historical, do we “really have a representative body of data from which to draw conclusions?” (Webb, Campbell, and Schwartz 1970:54). As we turn to distant times, primary sources become scarcer, and the secondary data available to social scientists mainly take the form of narrative works by historians, which are often riven by disagreements (cf. Skocpol 1979:14-15). 3 This raises the problem of how to process an often bewildering variety of sources, that is, historiography, in ways that allow social scientists to identify the relevant facts and to translate these into something that fits their conceptual containers.
The fact that historians also interpret the evidence complicates matters further as their implicit theories and historiographical perspectives (e.g., Marxist or Annales) influence the evidence they present (Becker 2008:109-45; cf. Kuhn 1962). 4 These biases are then imported by social scientists who use these historical sources without systematically considering the vantage point and premises on which they are based. A series of problems pertaining to selection bias 5 are likely to follow (Goldthorpe 1991; Lustick 1996). First, there is an inherent danger of confirmation bias as social scientists will be prone to invoke “works by historians using implicit theories about how events unfold and how people behave very similar to the theory under consideration” (Lustick 1996:607). Second, the use of historical works is often characterized by convenience sampling or what has been termed the fortuitous fallacy (Fischer 1970:97-99), which comes in two forms: Either scholars simply select the first pieces of relevant information they come across 6 or they browse several secondary sources and accept the dominant interpretation without weighing the quality of the different accounts.
Needless to say, these problems apply not only to historical analysis. 7 In fact, confirmation bias in particular is a natural propensity or inclination of human beings (Nickerson 1998). As Evans (1989) points out, “[c]onfirmation is perhaps the best known and most widely accepted notion of inferential error to come out of the literature on human reasoning” (p. 41). However, it is likely to be especially pronounced in historical analysis because any historical description that is not exhaustive is by definition a result of selection (McCullagh 2000:42). Therefore, selection bias has to be held in check when the purpose is to use historical evidence to make descriptive or causal inferences.
Historians are acutely aware of this risk because they are, by training and general inclination, “suspicious of the way [other] historians constructed proofs of their hypotheses out of nonquantitative data” (Hexter 1979:250). However, it seems fair to say that much recent social science “doing history” has had a poor record in this respect (see, e.g., Kreuzer 2010; Møller 2016). There are probably many reasons why social scientists tend to treat history in a superficial way including scarce resources (for instance, limited language skills and training in collecting and analyzing primary sources), lack of interest in the historical facts in themselves, and incentives to frame findings in a resounding way and to publish them quickly. However, at least part of the problem may be that social scientists lack consciousness about the challenges they face and that they lack clear criteria for how to sift a broader body of historiography. 8
It is therefore surprising that the issue of how to process historiography has been widely ignored in recent methodological debates. For instance, the topic has not been systematically addressed in the recent debates about data access and research transparency (see Büthe and Jacobs 2015; Lupia and Elman 2014), it has received sparse attention in works on process tracing (see Beach and Pedersen 2013, 2016, Ch. 6; George and Bennett 2005), and the recent calls for a historical turn in the social sciences have barely touched upon this challenge (Capoccia and Ziblatt 2010; Mahoney and Thelen 2015).
Against this backdrop, we propose three criteria for how social scientists can mitigate bias when using the works of historians in their research. These criteria are operational and relatively straightforward to apply. They include being conceptually consistent when processing the work of historians, giving less weight to historical works of other social scientists and to work by historians with similar theoretical claims as the one under consideration, and favoring updated empirical evidence (see Figure 1). The three criteria can be used both as a guide for scholars who directly engage with historical accounts and as a checklist for scholars who wish to determine the quality of already published work in order to decide whether or not to invoke a given finding or set of findings.

Figure 1. A criterial framework for the use of historical work.
We term the general idea behind the criterial framework the “Ulysses Principle” because it can be seen as an attempt to tie oneself to the mast to avoid the Siren call of confirmation bias and convenience bias when navigating the treacherous waters of historiography. This article proceeds as follows. First, we review previous advice about how social scientists can enlist the findings of historians. Second, we elaborate and specify the three criteria. Third, we illustrate them by revisiting two important recent contributions to comparative historical analysis.
State of the Art
The most elaborate attempt to discuss how social scientists can use historical sources remains Lustick’s (1996) two-decade-old exposition of the problems that follow from an unconscious or merely nonchalant reading of historiography when generating historical data (see also Goldthorpe 1991). 9 Lustick’s core advice, which we endorse, is simple: Social scientists must have a deeper understanding of the works of historians to avoid selection bias due to, for example, overemphasizing unconvincing or disputed evidence that supports their theories. That is, we need to account for the patterns in historiography on a particular subject and to try to find a systematic way of choosing which historical interpretation to rely on (and which to avoid) when the works of historians serve as the evidentiary base that we use to appraise certain hypotheses. This point is relevant both when social scientists use historical evidence to assign values to variables that can be used in cross-case analysis and when they use historical evidence to analyze a particular historical process based on within-case analysis or historical narratives (see Beach and Pedersen 2016; Lange 2012). It is even relevant when quantitative-minded scholars whose inferences are based on statistical analysis enlist historical anecdotes for illustrative purposes. 10
More particularly, Lustick (1996) urges social scientists to “treat our database as ‘historiography’ or ‘histories’ and not ‘History’,” thereby increasing the number of cases “from the number of episodes to the number of accounts of those episodes” (p. 605). This procedure enables scholars to adjudicate between conflicting interpretations by assuming a normal distribution of points of view and then opting for the consensus. 11
We are sympathetic to the idea that social scientists should look at historiography rather than “history,” but we find Lustick’s more particular suggestions problematic. The idea that we can expand the N by treating each historical account as a “case” presumes an independence of observations which is implausible considering that historiography is basically a product of interactions (often vehement) between historians. The idea of a normal distribution falls prey to what Fischer (1970) terms the “fallacy of the prevalent proof” (p. 51). After all, majority opinion may be wrong. Most problematically, Lustick fails to take the notion of research cycles where knowledge evolves into consideration. In fact, he is silent about whether updated evidence should be treated in different ways than older evidence. 12 If we followed what might be termed the “consensus theory of truth,” 13 a number of convincing breakthroughs within historiography—normally based on new primary evidence—would be left out of consideration because they would, by definition, make up a minority viewpoint until enough further work had corroborated them. This way of thinking would therefore mean that social scientists would be biased toward invoking older historical evidence at the expense of newer evidence, which is the exact opposite of what we advise below.
Instead, we argue that social scientists should consider more carefully the shape of the distribution within historiography if the evidence we extract is not to be affected by selection bias. This is the core logic behind what we term the Ulysses Principle. More particularly, we urge scholars to factor in the nonindependent character of the historical observations, some of which are likely to be biased by the general approach or explanatory thesis of their authors. To this end, we propose a criterial framework that serves to guide social scientists selecting historical evidence. This is based on the idea underlying Gerring’s (2001:ch. 2) more general “criterial framework” for good social science, the premise of which is that practical research is characterized by circularity and trade-offs between different criteria due to the high level of interdependence between the associated tasks. 14 After formulating the three criteria, we discuss a series of possible trade-offs that may affect how they can be applied in practice. As in the case of Gerring’s criterial framework, the list of criteria we propose is not exhaustive. We include some tentative suggestions about additional criteria later in this article, but we believe that the three criteria formulated and discussed below are the most important ones if the objective is to avoid selection bias when social scientists enlist the narrative work of historians.
More particularly, the Ulysses Principle (and hence our criterial framework) is inspired by the way social scientists doing experiments take precautions against influencing the evidence, for example, by using double-blind designs and—following the most recent trend—by preregistering propositions and research designs (see Miguel et al. 2014; Webster and Sell 2014). Seen from this vantage point, our criteria can be understood as a way of facilitating what has been termed symmetrical testing, that is, assessing one’s own explanation as critically as one would assess alternative explanations (cf. Bennett and Checkel 2015:24). 15
Criterion I: Conceptual Consistency
The first thing social scientists should consider when processing historical studies is what we term conceptual consistency, that is, whether there is an equivalence between their concepts and those used by the scholars whose historical descriptions are being enlisted as empirical evidence (Goertz 2006:95-96). 16
Indeed, this is the most important criterion to observe when enlisting historical data as the analysis becomes meaningless when violated. Fischer (1970) elaborates this point nicely: …every fact in history is an answer to a question, and that evidence which is useful and true and sufficient in answer to question B may be false and useless in answer to question A. A historian must not merely get the facts right. He must get the right facts right. From this simple rule of relevance may be deduced: historical evidence must be a direct answer to the question asked and not to some other question. (p. 62)
The criterion of conceptual consistency does not mean that social scientists can only invoke the findings of historians if there is perfect equivalence between their concepts and those used by these historians. The point is that social scientists need to make the case that the historians (or, for that matter, other social scientists) are describing empirical features that, at a minimum, are relevant based on their own definitions—whether they are using this information for historical narratives or to code variables. The challenge of conceptual consistency is likely to be especially pronounced if the concept in question is multidimensional—as in the case of feudalism in the examples drawn from the literature that we use below. With more specific unidimensional concepts containing fewer defining attributes, it is often easier to know whether historians’ findings can be invoked. A good example is war, which is normally defined solely on the basis of the number of battle deaths (although a second dimension concerning the identity of the parties to the conflict is often introduced to decide whether it is an internal or external war).
When working with multidimensional concepts, social scientists can follow Sartori’s (1970) advice to ascend the ladder of generality, that is, they can apply more abstract concepts that subsume the context-sensitive meanings of historians. Likewise, social scientists can score disaggregated indicators based on broader historical findings and then proceed to aggregate a composite concept. However, even in these cases, social scientists need to be self-conscious about establishing equivalence. From this follows:
We can illustrate the kind of consistency we are urging by touching upon the burgeoning literature on the advent and character of medieval representative institutions (Abramson and Boix 2017; Blaydes and Chaney 2013; Stasavage 2011; van Zanden, Buringh, and Bosker 2012). Here, social scientists are in the fortunate situation that among historians, these institutions have attracted “perhaps more scholarly attention than any other subject within the institutional history of medieval Europe” (Cerda 2011:62) and that historians broadly use the same concepts as social scientists. Already Marongiu (1968) contrasted “pre-parliaments” or assemblies of notables with genuine parliaments or representative institutions (pp. 52-53), and historians and social scientists broadly agree on the operationalization of the latter. A pre-parliament or assembly of notables turned into a genuine, representative institution the moment localities—town councils or shires—began to send representatives (proctors; Kagay 1981; Maddicott 2010; Procter 1980; van Zanden et al. 2012). This means that it is relatively easy to achieve a high conceptual consistency when enlisting the works of historians (see Abramson and Boix 2017; Møller 2017).
Criterion II: The Vantage Point of Historical Accounts
Social scientists often make empirical distinctions based on prior work by historians or by historical sociologists without any discussion of the implicit or explicit theoretical perspectives of these prior studies. However, the eye of the beholder matters for what we see. All history is written from some vantage point, and all historical facts available to social scientists have in some sense been preselected (McCullagh 2000:42). Therefore, we need to take the purpose of prior historical works into consideration when sifting the empirical evidence presented.
Social science work invoking history is virtually always guided by an explicit explanatory purpose; the works of historians often are not, as they simply seek to describe a set of events in the most valid way possible. Here, there is at least a difference in degree concerning the extent to which the reading of the evidence is colored by the theoretical purpose. 18 It follows from this that one should beware of building empirical edifices on the foundation of historical analysis done by other social scientists. It is simply dubious to assume that such work combines the primary or secondary evidence of historians in a way that makes it possible to establish valid empirical distinctions. At most, one can use social science work as an entry point for identifying historical sources to consult, and even here, one should recognize that these sources have been preselected from a broader body of historiography. Hence, as a general rule, social scientists should shy away from using the works of, for example, historical sociologists or economic historians as data sources and instead directly enlist the primary or secondary sources presented by trained historians. 19
However, even when dealing with works by historians, we face some of the same challenges. Indeed, some historians present accounts that clearly are theoretically colored. 20 As anticipated in the discussion of Criterion I, this is not solely a question of whether the researcher promotes a particular explanatory claim. The more general approach (e.g., a Marxist study or a study in the Annales tradition) will tend to shape everything from the selection of the object of study, to the sources emphasized, and the notion of causality employed. What to do in this situation?
It follows from our considerations that the ideal sources to turn to are studies by historians that do not propound an explicit explanatory thesis. As the Kagay example at the end of this section shows, these are obviously those least “polluted” by the theoretical vantage point. Moreover, there is often a positive correlation between a narrow temporal and spatial scope and less theory-driven works. More atheoretical works normally attempt to cover less ground, which is a further advantage when relying on them for valid data.
However, if this is our only yardstick, there is a danger that we will end up relying on the lesser historians or—more generally—disregarding much needed data. Hence, if good atheoretical works by historians are not available, we urge social scientists to consider whether or not the theoretical vantage point of the historical work is in line with the hypotheses under consideration. A theory-laden study based on a theoretical vantage point that conflicts with the thesis that the social scientist is probing will sometimes provide even stronger evidence in favor of a particular thesis than a relatively atheoretical study. Thus, our warning against studies with a strong theoretical bent mainly concerns those where the explanatory thesis or the general approach corresponds with the expectations of the social scientist enlisting historical evidence. Obviously, this is where the danger of confirmation bias is greatest. If social scientists wish to include this kind of work as part of their evidentiary record, they at the very least need to cross-check whether the information they extract from such sources is corroborated by other work that does not conform to the same vantage point. From this follows:
This criterion can be seen as a more particular version of Webb et al.’s (1970:5) general advice about measurement: “Components ideally should be weighed according to the amount of extraneous variation each is known to have and, taken in combination, according to their independence from similar sources of bias.” On an even more abstract level, this criterion can be said to rest on a more general principle of science, which sustains a number of common practices in empirical research. For instance, attempts to come up with many observable implications, each of which a theoretical argument could fail, and the use of placebo tests, where no causal effect is expected, are related ways of avoiding confirmation bias. 21
We can once again use the literature on representative institutions to illustrate our point. A good example of the kind of secondary work unaffected by a general explanatory bias is Kagay’s (1981) overview of the development of the corts of Catalonia and the cortes of Aragon and Valencia, respectively, in the period 1064–1327. Kagay reviews primary sources, mainly from archives and chronicles, and some secondary sources by other historians with the simple aim to establish (i) when assemblies convened, (ii) which groups were summoned, (iii) what was on the agenda, (iv) what was decided, and (v) whether groups arrived as representatives or not. Every now and then, Kagay reflects on possible causes of these developments—for instance, by relating some of his findings to other historical work—but he does not propound a particular thesis. This is probably as good as it gets for social scientists trying to avoid bias when enlisting works by historians (see Møller 2017).
Criterion III: Updated Evidence
Social scientists routinely invoke older historical interpretations in their work. This probably reflects either that they sometimes stumble over studies that they find interesting or that these older studies have become famous outside of the academic discipline of history. This is not necessarily problematic. Indeed, many insights can be gleaned from such studies. The problem is that social scientists often use these older works as part of their evidentiary record without taking into account more recent developments within the field. This might make sense, but it need not, and if older studies are to be used for historical evidence, they must first be situated within a broader body of historiography. Thus, before invoking older evidence, we need to probe how it fits with the general development of literature on the subject and, more particularly, whether more recent evidence has overturned it.
Let us push a bit more at these issues. We can begin by noting that a large number of theoretical insights formulated by scholars doing comparative historical analysis seem to have sprung from reading older work by historians. For instance, Moore’s ([1966] 1991) insight in Social Origins of Dictatorship and Democracy that English democracy was not a product of gradualism but based on a bourgeois revolution is to a large extent based on Tawney’s works, some of which date back to 1912 (Lustick 1996:608-10). Skocpol’s Protecting Soldiers and Mothers from 1992 was, according to her own account, triggered by reading Rubinow’s Social Insurance, With Special Reference to American Conditions from 1913. Some of the most important theoretical insights of Thomas Ertman’s (1997) Birth of the Leviathan were based on a reading of two essays by Hintze, which had originally been written in the late 1920s and early 1930s (Hintze [1929] 1962, [1930] 1962). Finally, Hartz’s (1955) and Lipset’s (1963) famous works on the roots and content of a special American political culture were inspired by observations made by Tocqueville ([1835/1840] 2012) in Democracy in America.
However, gaining theoretical insights is one thing; enlisting empirical evidence is another. One of the key criticisms of Moore’s Social Origins is that he invokes Tawney’s (1912, 1941) work not only to formulate his theory but to corroborate it empirically (see Goldthorpe 1991; Lange 2012:147; Lustick 1996:608-10). There are several problems with this. The first is that Moore favors Tawney’s work over other historical interpretations of the English development, which is potentially problematic considering that Tawney’s thesis is similar to Moore’s in key respects (see Criterion II). Yet an often ignored point is that most of Tawney’s work was rather dated when Moore used it for empirical evidence. For this to make sense, it would be necessary to show that more recent empirical evidence had not overturned Tawney’s findings. To his credit, Moore ([1966] 1991) attempts to do this by arguing that Tawney’s interpretation is superior in spite of more recent work contradicting it (pp. 6-8, fn. 6, 8). This is not entirely convincing, though, considering that later studies have been able to consider Tawney’s findings and enlist new evidence.
We here formulate a simple guideline based on the assumption that scholarship is more often than not cumulative: The empirical findings of older studies must be set beside the more recent interpretations within the field. If there is a conflict between the older findings and the more recent interpretations, we would normally favor the present knowledge, at least if this is based on new evidence that prior scholarship has not taken into account. If the new evidence changes our interpretation of particular events, it means that the older evidence has become outdated.
This is exactly how scholars normally invoke evidence or arguments. Knowing prior debates and how the literature as a whole has developed, they routinely disregard outdated evidence and invoke the more recent evidence instead. Needless to say, a research field might not simply leave classic interpretations behind but sometimes bring them back in. The point remains that before we can lift evidence out of a particular field, we need to become acquainted with these general developments, for example, by reading review essays that sketch how the literature has progressed over time. Moreover, social scientists should in general shy away from invoking very old work for evidence. It is paradoxical to come across social scientists who freely base their empirical claims on historical research done 60 or 80 years ago, considering that they would very rarely (if ever) do so within their own field of study.
The scenario where new evidence overturns older interpretations is where we part ways most categorically with Lustick’s (1996) notion of a normal distribution of historical interpretations. New historical research has the advantage that it can consider older interpretations and enlist new evidence that has become accessible, for example, due to the opening of archives or new archeological discoveries. It would be nonsensical to disregard new evidence simply because it—by overturning conventional wisdom—makes up a minority point of view in numeric terms when we review the broader literature. From this follows:
It is important for us to stress that the point of Criterion III is that new information is preferable and should therefore be given more weight when it adds new knowledge, that is, it is not automatically preferable. We should not be dictated by the fads of the field; old knowledge should only be abandoned when the evidence in favor of new knowledge is strong. Here, we urge social scientists to consider carefully whether the new interpretation seems to follow from new evidence or simply from a reinterpretation of already existing evidence.
Numerous examples from the literature could be used to illustrate this criterion as new scholarship often overturns the descriptive findings of older scholarship and hence questions the empirical basis of their causal interpretations (see, e.g., Kreuzer 2010; Mahoney 2003). In the context of revisiting extant analyses below, we include a telling example based on feudalism.
Weighing the Evidence
Our criteria do not mean that historical works with discrepant definitions, studies based on strong theoretical vantage points that correspond with the thesis under consideration, and older accounts are irrelevant for social scientists. The point is that such sources should be given less weight than historical works with similar definitions, relatively atheoretical studies or work based on competing theoretical vantage points, and accounts presenting more updated evidence, especially if there is a conflict between the former and the latter.
This way of thinking can be further refined by drawing on Bayesian reasoning. This presents a way of explicitly factoring in whether new findings should be attributed more weight than older ones. The point here is that if there is enough new information that contradicts established priors then the posterior will change. A good illustration of how this can be used to get at historical data can be found in Blyth’s (1995) reestimation of the number of deaths in the Gulag using a Bayesian framework. Our criteria can be seen as a simpler version of this approach to weighting historical evidence, that is, they may be said to be based on what has been termed “folk Bayesianism.”
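The updating logic described above can be sketched in a few lines of Python. This is only an illustration of Bayes' rule under stated assumptions: the function name `update` and all probabilities are hypothetical and are not taken from any of the studies cited here.

```python
def update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return the posterior probability of an interpretation after one new piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return numerator / denominator


# Hypothetical numbers: we start out fairly confident in an older interpretation.
belief = 0.8

# Two independent archival findings, each assumed to be four times more likely
# if the older interpretation is false (0.8) than if it is true (0.2).
for _ in range(2):
    belief = update(belief, p_evidence_if_true=0.2, p_evidence_if_false=0.8)
```

With these assumed inputs, one contradicting finding weakens the prior and a second one shifts the balance further, which mirrors the point that the posterior changes only when enough new information accumulates. A mere reinterpretation of existing evidence would correspond to likelihoods that are close to equal and would move the posterior very little.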
The three criteria can thus be summarized as three variations of the same caveat: Before enlisting evidence from studies premised on different understandings of key concepts, we need to probe whether this information is corroborated by studies whose definitions are more in line. Similarly, before enlisting evidence from accounts with strong theoretical vantage points that are congruent with the thesis of the social scientist, we need to probe whether this information is corroborated by accounts that are relatively atheoretical or that are based on competing theoretical vantage points. Finally, before enlisting evidence from older sources, we need to probe whether more recent evidence has overturned it.
If scholars are coding variables based on historical sources, this implicit Bayesian logic can be formalized by constructing a systematic scoreboard that assigns scores to the selected historical works according to the degree of conceptual consistency, to what extent the theoretical vantage point is in line with the thesis under consideration, and how updated the evidence is. If scholars are using the studies to make historical narratives, the same reasoning applies, but it can be used in a less formalized and hence more qualitative way where the scholars take the criteria into consideration when interpreting the sources.
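One possible form of such a scoreboard is sketched below in Python. Everything in it is an assumption made for illustration: the class and function names, the 0 to 1 scales, the vantage-point scores, and the equal weighting of the three criteria are hypothetical choices, not a scheme proposed by the sources discussed here.

```python
from dataclasses import dataclass

# Hypothetical Criterion II scores: atheoretical or competing vantage points
# count fully, a vantage point congruent with the tested thesis is discounted.
VANTAGE_SCORES = {
    "atheoretical": 1.0,
    "competing": 1.0,
    "congruent": 0.25,
}


@dataclass
class HistoricalWork:
    label: str
    conceptual_consistency: float  # Criterion I: 0 (discrepant) to 1 (equivalent)
    vantage: str                   # Criterion II: a key of VANTAGE_SCORES
    updated: float                 # Criterion III: 0 (overturned) to 1 (current evidence)


def weight(work: HistoricalWork) -> float:
    """Unweighted mean of the three criterion scores (an illustrative assumption)."""
    for score in (work.conceptual_consistency, work.updated):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return (work.conceptual_consistency + VANTAGE_SCORES[work.vantage] + work.updated) / 3


def weighted_coding(codings: list[tuple[HistoricalWork, float]]) -> float:
    """Weighted average of the numeric codings that each work supports."""
    total = sum(weight(work) for work, _ in codings)
    if total == 0.0:
        raise ValueError("all works have zero weight")
    return sum(weight(work) * value for work, value in codings) / total


# Hypothetical example: two works disagree on whether a case should be coded 1 or 0.
recent_atheoretical = HistoricalWork("recent archival study", 1.0, "atheoretical", 1.0)
older_congruent = HistoricalWork("older thesis-driven study", 0.5, "congruent", 0.25)
score = weighted_coding([(recent_atheoretical, 0.0), (older_congruent, 1.0)])
```

In this made-up example the older, thesis-congruent work still contributes to the coding but counts for only a third as much as the recent atheoretical one. Because conceptual consistency is treated as the most important criterion, a scholar might reasonably give Criterion I more than an equal share, or use it as a threshold that a work must pass before it is scored at all.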
Potential Trade-Offs Between the Criteria
The very idea of a criterial framework recognizes that in specific research situations, scholars will encounter trade-offs between the criteria that also need to be factored in (cf. Gerring 2001). To illustrate this, we can first return to Criterion III. Here, we face the potential risk of another source of bias, namely, what could be termed “recency bias,” but is more commonly known as argument ad novitam (Fischer 1970:299-302). In historical fields where much research has been carried out (say, regarding the causes of the French Revolution or the First World War), it is sometimes necessary to present a completely novel or even extreme interpretation to be able to publish a new work. In this case, older studies might be less biased or polluted by the attempt to make the case for a general explanatory thesis (Criterion II). This is something that social scientists need to consider when making themselves acquainted with the development of the historical literature.
Another obvious example would be a tension between considerations about conceptual consistency (Criterion I), on the one hand, and considerations about the theoretical vantage point of extant work (Criterion II) and/or older versus more updated studies (Criterion III), on the other hand. For instance, social scientists might find that unbiased works generally have different definitions of core concepts than what is needed for their analysis, that older works are more in line than newer works with respect to conceptual consistency, or that relatively atheoretical studies are vaguer with respect to (the often implicit) definitions. In these cases, considerations about conceptual consistency (Criterion I) might well trump considerations about Criteria II and III. Here, we can also return to the notion of research cycles. One aspect of these cycles is that vantage points often change in general; for instance, in many fields of study, bottom-up social history has tended to crowd out top-down political history. This might mean that older works are preferable when working with particular concepts.
Thus, there can be good reasons to favor older studies, but in these cases, one must explicitly argue why they are superior given the task at hand (and very old studies should still be avoided). Hence, when employing the criteria, scholars should carefully consider whether trade-offs mean that one or more of the guidelines must be jettisoned to safeguard the others. If so, these trade-offs need to be factored in when weighting the evidence—whether or not this is done in a quantitative or qualitative way.
Finally, we wish to emphasize that our three criteria do not present an exhaustive laundry list of what to consider when social scientists enlist evidence from historians. The reasoning about evidence that we have presented justifies other cautions. For instance, Hexter (1979) pertinently warns against what he terms “source-mining” or “the examination of a corpus of writing solely with a view to discovering what it says on a particular matter narrowly defined—going through the indexes and leafing through the pages” (p. 241). As Hexter concedes, due to limited resources, all researchers will of course perforate historical works to some extent. However, source-mining increases the danger of confirmation bias because the social scientist (or historian) mining the sources is apt to read particular observations out of their context. That is, the danger of finding what one is looking for increases when the broader interpretation of the historian is not used as a check. 22 This could be developed into another criterion. Likewise, issues such as the number of different sources and the extent to which different kinds of data are enlisted as evidence could also be added to an extended list of criteria for the proper use of history in social science analysis (see Webb et al. 1970:55-56). More important than any particular criterion is the way of thinking about historical evidence on which the criteria are based, that is, the Ulysses Principle.
Illustrating the Criteria
The final part of this article illustrates the criterial framework. To do so, we have chosen to revisit two interesting examples from recent comparative historical analysis, both of which enlist evidence from historical studies to “measure” the concept of feudalism. We have selected these examples because feudalism is a concept with which most social scientists are familiar. The examples show the problems that follow from not observing some or all of our criteria. More particularly, they clearly illustrate our points about the importance of staying updated with the research cycles within the neighboring discipline of history.
Illustration I: Blaydes and Chaney
In a prizewinning article, Blaydes and Chaney (2013) argue and demonstrate empirically that one of the causes of medieval representative institutions in Europe (and of their absence in the Islamic world) was a feudal military organization that strengthened aristocracies at the expense of monarchs.
One of the great strengths of the article is that Blaydes and Chaney enlist new numismatic data to measure ruler spells in Europe and the Middle East, respectively. These data are used to proxy their dependent variable “constraints on the sovereign.” However, we will focus on how they attempt to measure their key independent variable, that is, feudalism. Here, Blaydes and Chaney make some simple empirical distinctions between areas of Europe that were feudal and areas that were not, and they establish when the former became feudal. More particularly, they claim that feudalism arose in the eighth century in the Carolingian area and that it had spread to the rest of Western Europe by 1100 AD. They also claim that it had reached all Catholic parts of Eastern Europe by the 14th century whereas it never made genuine inroads in the Orthodox parts of Eastern Europe (Blaydes and Chaney 2013:24, 29-30). It is somewhat nebulous exactly how they document these empirical patterns. 23 However, based on a closer reading, their key claim that feudalism first emerged in the Carolingian Empire seems to be based on Strayer (1970), whereas their description of how it spread across the European space is based primarily on the historical description of Anderson (1974). 24 Let us try to see how this fits with our criteria.
Criterion I (conceptual consistency)
Blaydes and Chaney (2013:20, fn. 13) define feudalism “as a system of military mobilization and organization distinct from manorialism, the economic system that provides the basis for feudalism.” Strayer 25 and Anderson operate with a very different understanding of feudalism, which has more to do with the political regime than with military organization (Fulbrook and Skocpol 1984:183-84; Møller 2015). 26 Consequently, Blaydes and Chaney’s work is characterized by conceptual inconsistency as a particular case might not be feudal in a military sense just because it is feudal in a political sense (Møller 2015, 2016).
Criterion II (the vantage point of historical accounts)
A social science work such as Anderson’s book is not a good source for making empirical distinctions for the simple reason that his interpretation of the works of historians is likely to be colored by the explanatory purpose: Empirical evidence is preselected to support a particular interpretation. 27 To return to one of our previous examples (see fn. 19), there is little difference between referring to Anderson’s Lineages of the Absolutist State for empirical evidence and citing Marx’s Das Kapital for a similar purpose. 28 More particularly, by enlisting Anderson’s and Strayer’s studies to make empirical distinctions with respect to the feudalism variable, Blaydes and Chaney fall into the trap of confirmation bias. This is the case because both Strayer’s and Anderson’s definitions include the attribute of “fragmentation of political power.” In fact, in Anderson’s case, the concept of feudalism rather explicitly includes institutions of constraints such as representative institutions (cf. Fulbrook and Skocpol 1984:183-84). In this way, Blaydes and Chaney end up scoring their independent variable based on what are political correlates (or even constitutive attributes) of the institutions of constraints they attempt to explain with this variable. 29
Criterion III (updated evidence)
Strayer’s and Anderson’s accounts are also clearly dated. Indeed, the extent to which this is the case is rather staggering. Blaydes and Chaney completely disregard that most of the historians working on the subject in recent decades have been hugely skeptical of whether feudalism ever characterized medieval Europe (see Bisson 2009; Brown 1974; Reynolds 1994, 2012; Ward 1985). This body of work is commonly known as “the revolt against feudalism.” The most important work here is that of Reynolds (1994, 2012), which Blaydes and Chaney do not consult.
It should be noted that many of these more recent works present a different definition of feudalism than that of Blaydes and Chaney. For instance, Reynolds (1994) uses a narrow definition based on what she terms “feudo-vassalic institutions.” In that respect, Reynolds’ work might not provide a good measure of Blaydes and Chaney’s feudalism variable, which as described above construes feudalism as a system of military organization. However, neither do the works of Strayer and Anderson, and in any case, the subject of feudalism is a clear example of a situation where historians have overturned prior “conventional wisdom.” This is something that Blaydes and Chaney obviously need to discuss before making empirical distinctions about feudalism based on older studies.
Illustration II: Hui
We next turn to Victoria Hui’s (2005) prizewinning book War and State Formation in Ancient China and Early Modern Europe. Hui’s endeavor is to explain why a relatively similar geopolitical pressure, anchored in the existence of multistate systems, was conducive to internal checks and balances in European polities and to perpetuating interstate balance in the European multistate system, whereas it created the exact opposite in ancient China, that is, a coercive universal empire that suffocated the competition of the multistate system. This work also has numerous strengths, which we do not consider here. We focus on the way Hui deals with feudalism as a variable that could potentially condition the effect of geopolitical pressure.
In the book, Hui rejects that initial differences in state–society relations can explain the identified divergence in outcomes between ancient China and early modern Europe. The reason is that both contexts were characterized by similar state–society relations in the form of “feudalism” before the intensification of geopolitical pressure (see also Hui 2001). Hui (2005) examines this claim by processing historiography on feudalism in medieval Europe and ancient China (the so-called Zhou feudalism), thus showing that historians argue that these were broadly equivalent (pp. 195-205).
Criterion I (conceptual consistency)
To define the concept of feudalism, Hui (2005:196) resorts to Downing (1992), who construes feudalism as a political regime form. This makes sense in theoretical terms because Hui’s very interest in feudalism is sparked by the potential objection that the later European development of political checks and balances (or constraints) could be endogenous to prior state–society relations. Given this purpose, feudalism should be conceived in terms of incipient political constraints rather than as a mode of production (based on manorialism or landlordism) or a military system (based on the armed vassal). Moreover, Hui (2005) directly bases her empirical observation that feudalism characterized medieval Europe before the intensification of geopolitical pressure on Downing’s work (pp. 195-205), thereby obviously securing a high conceptual consistency on the European side. Hui (2005:196) is also consistent on the Chinese side of the comparison where she invokes Creel’s (1970) work on feudalism in ancient China to show that in this context, too, feudalism preceded the intensification of geopolitical pressure. Creel’s (1970:196, fn. 143, 32, fn. 10, 319-320) work is based on Strayer’s ([1965] 1987) definition of feudalism as a method of government, meaning that Hui is looking at historical evidence based on the same concept in medieval Europe and in ancient China.
Criterion II (the vantage point of historical accounts)
Downing’s book is a historical work by a social scientist whereas Creel’s book is a work by a historian. Both have a rather clear explanatory thesis about the existence of feudalism in medieval Europe and ancient China, respectively. Downing (1992:10) uses the prior existence of feudalism to make the case for a crucial premise of his analysis, namely, the existence in Latin Christendom (and its absence elsewhere) of “medieval constitutionalism” before the 16th-century military revolution. A major hypothesis of Creel is that ancient China was characterized by a feudalism uncannily similar to that later found in medieval Europe. 30 Hui thereby enlists work in support of the observation that feudalism characterized both medieval Europe and ancient China that is likely to have preselected sources or interpreted them in a way that biases the analysis toward this conclusion.
Criterion III (updated evidence)
Hui meanwhile ignores that more recent historiography has presented very different findings about feudalism. In the context of discussing Blaydes and Chaney (2013), we have already mentioned that more recent work by historians has questioned the extent to which feudalism characterized medieval Europe. To avoid repetition, suffice it to say here that Hui does not in any way discuss this “revolt against feudalism.” For instance, she does not reference Susan Reynolds’ work either.
Let us see whether the same objection can be made with respect to the Chinese side of Hui’s comparison. As mentioned above, to a large extent, Hui bases her identification of ancient China as an instance of feudalism on Creel. Doing so, she ignores that more recent historiography on China has been critical about identifying ancient China as feudal (e.g., Cook 1997; Li 2003, 2006, 2008). Hui’s analysis here falls prey to what we above termed the fallacy of prevalent proof. 31 Many historians echo Creel in describing ancient China as feudal, even today (e.g., Hsu 1999). However, scholars in Creel’s generation had little actual access to China, especially with respect to archeology. After the opening of China in the 1980s, a massive amount of new data has therefore been released (cf. Loewe and Shaughnessy 1999:5; von Falkenhausen 2006:18). The point is that this more recent work—which explicitly seeks to establish whether or not feudalism can be identified empirically, and which is based on better archeological evidence—has reached a very different conclusion than the older work (see particularly Li 2003).
Consequences for the General Findings
In the case of feudalism, both Blaydes and Chaney (2013) and Hui (2005) fare rather poorly with respect to our three criteria. In fact, Blaydes and Chaney’s analysis falls short on all three points: The historical accounts they enlist are based on different definitions than their own, they are clearly biased toward identifying feudalism in the areas that fit the predictions of Blaydes and Chaney’s theory, and their conclusions cannot be corroborated based on more updated evidence. Hui’s analysis does well on conceptual consistency, but she, too, enlists dated historical accounts with strong theoretical vantage points likely to create confirmation bias.
As such, the illustrations show that our criteria are not trivial. When enlisting historical studies to deal with a concept with as rich a history as feudalism, social scientists must review the broader historiography to avoid selection bias. To what extent do these problems undermine the findings of Blaydes and Chaney and Hui, respectively? Blaydes and Chaney might well have identified an important disjunction between Western Europe and the Middle East (and to a lesser extent Eastern Europe) taking shape in the Middle Ages. Moreover, this disjunction might have to do with the presence of a strong nobility in Western Europe, as they claim. However, Blaydes and Chaney present virtually no historical evidence that this had anything to do with feudalism as a system of military organization. Obviously, there could be many other historical factors at work, setting Western Europe apart from these other regions in the period from the late 8th to the 15th century.
Paradoxically, the criticism made above does not necessarily undermine Hui’s general findings. One possible reading of the more recent work on feudalism is that neither ancient China nor medieval Europe was characterized by feudalism, that is, the exact opposite of Hui’s empirical claim. This, too, would mean that there were no important initial differences in state–society relations that could condition the effect of geopolitical pressure. However, this is a pure coincidence, and in any case, there is little basis for Hui’s secondary argument that both medieval Europe and ancient China were characterized by (uncannily similar) feudalism. Indeed, we see Hui’s analysis of feudalism in ancient China as a clear specimen of a situation where social scientists must rely on updated evidence rather than on older studies, even if the latter outnumber the former. It also shows why we cannot simply rely on the majority opinion in historiography as the findings of the specialized literature on feudalism have only affected the more general historical descriptions of ancient China to a limited extent. 32 Only a qualitative familiarity with historiography allows us to reach the most valid conclusion in this case.
Conclusion
In this article, our point of departure is that social scientists increasingly enlist the work of historians for evidence but that they often do so in a methodologically unconscious way, which is prone to create problems of selection bias. It is important for us to emphasize just how strong this siren call is. However, we also stress that it can be resisted by taking proper precautions, just as Ulysses did by having himself tied to the mast. Based on an experiment, Mynatt, Doherty, and Tweney (1977) have shown that people tend to choose research settings that do not allow tests of alternative hypotheses. Yet they also show that if falsifying information is explicitly provided, people tend to use it to reject incorrect propositions. Our criteria are meant to ensure that scholars resist the temptation of easy confirmation. The logic behind them is inspired by new developments by scholars doing experiments including double-blind designs and registering preanalysis plans.
We share Lustick’s (1996) concerns about how social scientists engage historiography. However, we have gone further in detailing some of the most important challenges and giving hands-on advice on how to solve them. On this basis, we have formulated three criteria for proper use of historical work in social science analysis. The three criteria direct attention to the fact that for social scientists, some historical sources are likely to be less biased than others. Our criterial framework can be used to systematically factor in this point when weighing the evidence, whether this is done in a quantitative way in order to construct variables or in a qualitative way to do in-depth historical analysis in the form of narratives or process tracing.
The total edifice of our criterial framework might at present seem unrealistic for social scientists doing history. Nevertheless, even if they cannot run the gamut, scholars can still try to adopt this way of thinking about historical evidence and thus use it more self-consciously. If these issues are ignored, social scientists enlisting the prior work of historians are likely to import bias, which by definition will create erroneous claims. Seen in this light, it is quite surprising that recent methodological debates have had so little to say about these issues.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
