Abstract

Looking back over the past four decades, one cannot fail to be impressed by the advances in the scientific study of conflict processes. One aspect of this is simply the growth in the number of people who consider themselves “peace scientists” and the general acceptance that the field now enjoys. While individual studies continue to be challenged on methodological and epistemological grounds, as they should be, it is rare for someone to claim that international relations simply cannot be studied in a rigorous, systematic and reproducible (i.e. scientific) way. This was not always the case (see, for example, the collection of essays in Knorr and Rosenau, 1969).
More importantly, we have made tremendous progress in becoming a true discipline. Forty years ago, most debates in the field of international relations were merely unresolvable arguments over which of several “isms” (e.g. realism, idealism, Marxism) provided the best explanation for state behavior. Few theories were based on formal logic, and providing “evidence” generally involved demonstrating how a case or two could be interpreted in a manner consistent with one’s argument. The lack of transparency in method, the absence of widely accepted data and the focus on ambiguous grand perspectives made it impossible to cumulate knowledge. Our recent emphasis on stating our theories precisely, our insistence on reproducibility in the use of evidence and our efforts to link closely the development of theoretical and empirical work have brought us to a point where we do build on one another’s work and where we can see a clear progression in our understanding of international conflict processes. It is now commonplace to see the results from empirical work lead us to change our theories, and for these new theories to guide explicitly the next round of empirical research.
Our ability to progress has been due in no small part to our willingness to adopt the tools of science. By applying the rules of formal logic, we make our theories far less ambiguous, which in turn makes it possible to test hypotheses. We can no longer rely on ambiguity in our theory to claim that whatever we observe empirically can be explained. By relying on statistical analyses and the rules of inference, we make it possible to judge whether the evidence we observe is consistent with our theoretical expectations. If we base our empirical conclusions on some fallacy, a reader will eventually point that out. By assembling our data systematically and by paying close attention to the reliability and validity of our measures, we ensure that our tests are appropriate for our theories. We can judge whether key concepts are being measured appropriately and whether our variables are represented on scales that are sufficiently precise.
My purpose in this essay is to evaluate how well we (the community of peace scientists) have done at developing the scientific tools used by our discipline. I will argue that we have made absolutely phenomenal advances over the past 40 years in the techniques we use to develop theory and in the tools we use to analyze data, but that we have lagged woefully in our efforts to improve the quality of our data. I will identify some of the problems with our data and conclude with a call to the next generation of peace scientists to devote at least some of their creative energies to improving the quality of the data available.
Scientific progress
“Science” is the process by which we improve our understanding of the world around us. Any science has two primary components: theory, which constitutes possible explanations for things that do, or do not, happen; and empirics, which provide a basis for judging whether our theories tell us anything useful about the real world. Generally, we think of scientific progress in terms of theoretical and empirical advances. That is, if we extend old theories or develop new ones that allow us to explain more than we did before, or if we produce new empirical knowledge, we have advanced our scientific understanding. Scientific progress depends largely on the creativity and hard work of scientists, but the ability to achieve progress is often driven by advances in the technologies used in research. Let us consider how these technologies have improved for international relations scholars over the past 40 years.
Theory
Until the latter half of the 20th century, theorizing in peace studies consisted primarily of verbal presentations of grand perspectives. These “isms” (realism, idealism, Marxism, etc.) provided very general views of how international politics “works” and incorporated general notions regarding why wars occur. To be sure, many of those advancing or defending these perspectives attempted to present logically valid arguments, but there was little in the way of identifying specific cause–effect relationships and there were practically no attempts at identifying falsifiable hypotheses. After World War II, there was a significant effort to systematize the arguments underlying at least some of these perspectives. Today’s scholars might see little in the works from that era that would appear rigorous, but I recall one of my graduate school professors relating that one of his graduate school professors dismissed Morgenthau’s work on the grounds that it was “too scientific” to be of any use in the study of international politics. It is absolutely clear that scholars of that era were attempting to be very specific in identifying the assumptions upon which their arguments were based—see Gulick’s (1955) attempt to specify the assumptions underlying balance of power theory for my personal favorite.
Through the 1960s and 1970s, peace scientists began using the tools of formal mathematical logic to refine their theoretical arguments. These early efforts generally followed the lines of Richardson’s arms race model (Richardson, 1960a), which was based on differential equations, or used simple 2 × 2 game theory models. One early example of such work can be found in Schelling (1968: 40): he modeled a deterrence situation with a 2 × 2 game in which column could either behave or misbehave and row could either punish or not. He imposed specific payoffs for the outcomes (as opposed to representing them with variables) and identified a dominant strategy equilibrium.¹ Contemporaries who were not fans of formal theory criticized such efforts as oversimplified and not particularly helpful; after all, once you have asserted that column always prefers outcomes associated with misbehaving and row always prefers outcomes associated with not punishing, it is pretty obvious what you will conclude, and it is equally obvious that your conclusion will be wrong empirically in many cases.
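To make the logic of such a model concrete, the following sketch encodes a 2 × 2 deterrence game and searches each player’s strategies for one that is strictly dominant. The payoff numbers are hypothetical and are not Schelling’s; they are chosen only so that, as described above, column prefers outcomes in which it misbehaves and row prefers outcomes in which it does not punish. The names `PAYOFFS` and `strictly_dominant` are illustrative assumptions.

```python
# Hypothetical payoffs (row_payoff, column_payoff); NOT Schelling's actual numbers.
# Row chooses whether to punish; column chooses whether to behave.
ROW_MOVES = ("punish", "not punish")
COL_MOVES = ("behave", "misbehave")
PAYOFFS = {
    ("punish", "behave"): (1, 1),
    ("punish", "misbehave"): (0, 2),
    ("not punish", "behave"): (3, 3),
    ("not punish", "misbehave"): (2, 4),
}


def strictly_dominant(player):
    """Return the strategy of player 0 (row) or 1 (column) that beats every
    alternative against every opponent move, or None if no such strategy exists."""
    own, other = (ROW_MOVES, COL_MOVES) if player == 0 else (COL_MOVES, ROW_MOVES)

    def payoff(mine, theirs):
        # Cells are keyed (row move, column move) regardless of who is choosing.
        cell = (mine, theirs) if player == 0 else (theirs, mine)
        return PAYOFFS[cell][player]

    for s in own:
        if all(payoff(s, t) > payoff(alt, t) for t in other for alt in own if alt != s):
            return s
    return None


row_choice = strictly_dominant(0)
col_choice = strictly_dominant(1)
print(row_choice, col_choice, PAYOFFS[(row_choice, col_choice)])
```

With these payoffs the dominant strategy equilibrium is (not punish, misbehave), which illustrates the critics’ complaint: once the payoffs are asserted, the conclusion follows almost mechanically.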
The usefulness of the early efforts at formalizing our theory lay less in the insight they provided (although it is easy to underestimate this) than in the fact that they laid the foundation for what was to come. These models were founded on an axiomatic system that makes it possible to trace each argument back to its primitive terms and to follow the logic explicitly to identify any fallacies or contradictions. This established a method for theory building that makes for more rigorous, systematic theory and that makes it possible to see how our knowledge cumulates.² These early formal models led fairly quickly to the use of other types of formal models, including expected utility models, spatial models, bargaining models and differential equation models. It is particularly interesting to note how far game models have come since the time of Schelling. We now allow for multiple moves, multiple actors, incomplete information, learning, continuous choice sets and fully generalizable utility functions, and we have a host of solution concepts. One need only look at any recent article that uses game theory to see how the tool has evolved (see, e.g. Powell, 2004).
My point here is that we have made great advances in the technology of theory building over the past 40 years. The increase in the level of math required to read theoretical work is alone sufficient to prove this point. That would not matter if those advances had not also led to significant improvements in our theoretical understanding. Clearly, they have. Early formal work demonstrated that the underlying logic of the grand theories is fallacious (see, e.g. Niou et al.’s [1989] argument that balance of power theory is vacuous, or Bueno de Mesquita and Lalman’s [1992] demonstration that realism leads to hypotheses that find no empirical support). Other formal theoretic work has led us to understand, among other things, selection effects, commitment problems, audience costs and foreign policy substitutability. It is beyond my purpose here to discuss these theoretical advances further, but it is unquestionable that we have developed a much better theoretical understanding of the causes of war and peace over the past 40 years because of the advances in formal modeling, and that the method allows us to see the cumulation and growth in this knowledge.
Data analysis
In a very strict sense, peace scientists have been engaged in quantitative empirical analysis for millennia: David was smaller than Goliath and the story would not have been interesting enough to record had it been otherwise. Nearly everyone who claims we should not count, or compare different cases, or generalize, actually does so, just not systematically. There were early, sporadic, attempts to bring systematic analysis to peace science (Richardson, 1960b; Wright, 1965), but the major, sustained effort began with Singer’s Correlates of War project. There is no doubt that much of the significance of this project rests with the data collection effort, but having systematically collected data only matters if the means for analyzing it are available.
The advances in the technology available for data analysis that have been made in the past 40 years are astonishing. Much of the early work by Singer and his colleagues relied on bivariate correlation analysis. For example, Singer and Wallace (1970) were being quite sophisticated when they wrote “we run these correlations in two ways, using both a rank-order (Kendall’s Tau) and product-moment (Pearson’s r) statistic” (emphasis added). Today, we expect much more from statistical analysis. Consider a table in an article by Dixon and Senese (2002: 563), in which we are told “Main entries are second-stage censored probit estimates with robust standard errors in parentheses. First stage estimations (not shown) include controls for …”. While Singer and Wallace’s work was cutting edge in 1970, I use Dixon and Senese as an example precisely because their work is not cutting edge in the 2000s. Quite the contrary, their methodology is common, and we expect everyone to understand it.³
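For readers who have not computed them by hand, the two statistics Singer and Wallace mention can be sketched in a few lines. The data below are hypothetical, and the rank-order statistic is the simple tau-a variant, which assumes no tied values; the text does not say which variant Singer and Wallace used. Pearson’s r responds to the magnitude of each observation, whereas Kendall’s tau depends only on whether pairs of cases are ordered the same way on both variables, so a single extreme case can pull the two apart.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_a(x, y):
    """Rank-order correlation: (concordant - discordant) / all pairs; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for six cases; the last case is an extreme outlier.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 20]

print(f"Pearson's r:   {pearson_r(x, y):.3f}")
print(f"Kendall's tau: {kendall_tau_a(x, y):.3f}")
```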
These advances have greatly improved our ability to learn from the empirical evidence. Early advances allowed us to move beyond analyzing only bivariate, linear models. Among other things, this permitted us to consider contingent relationships. We then began adopting methods designed to evaluate specific types of phenomena (e.g. survival models, selection models). Now, we are further refining our methods in ways that significantly reduce the bias in our estimators, that allow our empirical analyses to fit more closely with our theoretical expectations and that provide us with a broader range of choices regarding how we capture empirically the processes we study.
Data collection
In stark contrast to the advances we have made in our methods for developing theory and for analyzing data, our technologies for creating data are nearly identical to those used in 1970. In many cases, we use the very data that were collected decades ago: in the Dixon and Senese article referenced above, “power” was measured using the Correlates of War (COW) Composite Index of National Capabilities (CINC) score, the same measure used in the early 1970s (see Singer, 1972). To be sure, improvements, refinements, updates and corrections have been made to our frequently used data sets. We have made great efforts to improve the accuracy and reliability of our measures. Moreover, technological advances have reduced the number of errors that creep into any data set.⁴ My point is that we are still relying on measures that were developed decades ago, and that may therefore no longer “fit” our theories, and that we are using the same data-gathering methods that the COW project employed in its early years.
Generally, we rely on only a few types of data. There are events data (based on specific “happenings” like militarized disputes or economic sanctions) and data generated by some entity at regular intervals (such as governmentally reported data on budgets or gross domestic product (GDP), or measures of concepts like democracy or human rights records that are coded by some individual or organization). For the former type of data, we turn to the historical record and have a coder—presumably well trained and armed with a well-prepared and systematic set of coding rules—decide upon the values of the variables that characterize the event. For the latter type of data, we rely on the information provided by the organization and are restricted to their definitions, measurement techniques, veracity and time periods of observation. For all the advances we have made in theory development and in quantitative analysis, we have generally chosen to force existing data to serve our purposes rather than deciding what we need to explore the empirical world as best we can.
This relative lack of progress in our data collection procedures would not be a problem if our data were adequate for testing our theories. They are not. Our newer theoretical developments are based on more rigorous conceptual definitions and they lead to more specific and contingent expectations than did earlier theories. Modern statistical methods give us more efficient, less biased estimators, and we are making great efforts to align these methods with our theories. These advances help little, however, if our data are not capturing theoretically important concepts or contain significant, and possibly biased, measurement error. Several types of problems with our data are becoming increasingly severe.
First, there is a strong tendency for researchers to operationalize their concepts using existing data, rather than determining what data are needed and collecting them. This is completely understandable given the cost, in time and money, of collecting data and the scarcity of resources for doing so. Particularly since virtually all of our indicators are surrogates anyway, it is easy to justify using what is already available. Scientific peace research has long been criticized on this point, but refinements in theory and statistical techniques make it more of an issue.
Consider the concept of “power,” which many would consider to be the central concept in the study of international relations. The COW project incorporated measures of six variables into its index of national capabilities, CINC. This index is highly correlated with other possible measures of power, such as GDP or military expenditures, in part because military expenditure is itself one of CINC’s components. Early studies that utilized this measure were essentially testing hypotheses derived from realism, which was notoriously ambiguous regarding what is meant by “power” and quite imprecise regarding what effect “power” has on anything. In testing such hypotheses using measures of correlation, imprecision in measurement was not much of a problem. Newer theories are much more precise regarding the nature of “power” (e.g. whether it derives from military capability, economic dependence, etc.) and much more specific regarding what is affected by it, by how much, and under what conditions, and our statistical models are able to address these issues. A general, ambiguous measure does not capture what is theoretically important.
Second, we often use indices to capture our underlying concepts. CINC is one example; measuring democracy with the POLITY data is another, and there are many more. There is nothing inherently wrong with using an index, provided that care is taken to ensure that the index is a valid measure of the theoretical concept it is meant to represent. The validity of most of the indices we use rests simply on the fact that some group of researchers adopted them because they seemed reasonable and that, over time and with repeated use, they became the standard. The components of CINC, for example, are equally weighted, but there is no particular justification for that. Scientifically accepted methods of index construction exist, but, to my knowledge, they have rarely been applied, particularly to the indices we use most frequently.⁵ Again, this is not a major concern when our theory is ambiguous and our data analysis is crude, but it begins to matter when we are testing precise, nuanced and contingent expectations.
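The weighting point can be illustrated with a short sketch. CINC is conventionally computed as the average of a state’s share of the system total on each of its six components (military expenditure, military personnel, iron and steel production, primary energy consumption, total population and urban population). The sketch below generalizes that average to arbitrary weights; the three states, all their values, and the function name `capability_index` are hypothetical. With these made-up numbers, equal weights rank state B first, while placing all the weight on military expenditure ranks state A first.

```python
# Short labels for the six CINC components (labels chosen for this sketch).
COMPONENTS = ("milex", "milper", "irst", "pec", "tpop", "upop")


def capability_index(states, weights=None):
    """Weighted sum of each state's share of the system total on every component.

    With equal weights (the default) this mirrors the CINC construction.
    """
    if weights is None:
        weights = {c: 1 / len(COMPONENTS) for c in COMPONENTS}
    totals = {c: sum(vals[c] for vals in states.values()) for c in COMPONENTS}
    return {
        name: sum(weights[c] * vals[c] / totals[c] for c in COMPONENTS)
        for name, vals in states.items()
    }


# Hypothetical raw values for a three-state system (units arbitrary).
states = {
    "A": {"milex": 500, "milper": 1000, "irst": 80, "pec": 2000, "tpop": 300, "upop": 200},
    "B": {"milex": 300, "milper": 2000, "irst": 40, "pec": 1000, "tpop": 900, "upop": 300},
    "C": {"milex": 200, "milper": 1000, "irst": 80, "pec": 1000, "tpop": 300, "upop": 100},
}

equal = capability_index(states)
military_only = capability_index(
    states, weights={c: (1.0 if c == "milex" else 0.0) for c in COMPONENTS}
)

print("strongest, equal weights:", max(equal, key=equal.get))
print("strongest, military only:", max(military_only, key=military_only.get))
```

Nothing in the index itself tells us which weighting is right; that judgment has to come from the theory the measure is meant to serve.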
Third, our newer techniques for developing theory lead to expectations that are far more precise than is captured in our existing data. I can illustrate this with a personal example. One of my graduate students, Yoshiharu Kobayashi, and I developed a model that, among other things, led us to hypotheses relating changes in foreign aid to changes in defense expenditures for the recipient. We tried to test these hypotheses in the typical, crude way—trying to determine simply if defense expenditures went up when aid went up—and found absolutely nothing. The theory was able to provide much more specific hypotheses, however, and in looking at these we found that our theory would lead us to believe (given the levels of defense expenditures, the amounts of foreign aid and the changes in foreign aid that exist in the data) that almost all of the expected changes in defense expenditure would be less than $500,000. In fact, the vast majority of these were expected to be far less than that. The precision of our defense expenditure data was not up to the challenge: the data were rounded to the nearest $1,000,000. That is, the rounding error in the data swamped the changes our theory led us to expect.
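The rounding problem is easy to reproduce. The sketch below uses a hypothetical defense-expenditure series (not the data used in the study described above) whose true year-to-year changes are all smaller than $500,000, and rounds each value to the nearest $1,000,000. In the rounded series, a $250,000 increase shows up as a $1,000,000 increase, a $150,000 increase disappears entirely, and a $320,000 decrease shows up as a $1,000,000 cut.

```python
def round_to_million(amount):
    """Round a non-negative whole-dollar amount to the nearest $1,000,000 (halves round up)."""
    return (amount + 500_000) // 1_000_000 * 1_000_000


def year_to_year_changes(series):
    """Differences between consecutive annual observations."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical annual defense expenditures in whole dollars.
true_spending = [241_300_000, 241_550_000, 241_700_000, 241_380_000]
reported = [round_to_million(v) for v in true_spending]

print("true changes:    ", year_to_year_changes(true_spending))
print("reported changes:", year_to_year_changes(reported))
```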
Finally, we have not paid sufficient attention to when our measurements are taken. Many of our variables are measured annually (e.g. GDP, trade figures, democracy), which may, or may not, correspond with our theoretical expectations. We often try to compensate for this by incorporating lags into our analysis, but in reality this is an ad hoc solution that might not solve the problem. Consider the following: if you see an explosion some distance away, you will not hear the explosion for several seconds. We know that this is because light travels faster than sound and, if we know the distance, we can calculate how long it will take to hear the explosion after we have seen it. If we took a single observation, we might conclude that explosions make a flash, but not a sound (or a sound, but no flash). Moreover, we would not pick up the sound with even two observations, unless the second occurred at precisely the right time. So, if we used a constant lag, we would conclude that only explosions x miles away make a sound. We peace scientists know that many of our effects do not accompany their causes instantaneously. When we adopt a constant lag, we are implicitly assuming that the errors in doing so will average out. As the explosion example shows, however, this is probably a bad assumption.
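The explosion analogy can be made concrete with a little arithmetic. The sketch treats light as arriving instantly, assumes a speed of sound of roughly 343 m/s (dry air at about 20 °C), and models a single observation taken at a fixed lag after the flash with a one-second listening window; the lag, the window and the distances are illustrative assumptions. With a nine-second lag, only the explosion two miles away is “heard.”

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air at about 20 °C
METERS_PER_MILE = 1609.344


def sound_delay_s(distance_miles):
    """Seconds between seeing the flash and hearing the bang (light treated as instant)."""
    return distance_miles * METERS_PER_MILE / SPEED_OF_SOUND_M_PER_S


def heard_at_fixed_lag(distance_miles, lag_s, window_s=1.0):
    """True if the sound arrives during a window that opens lag_s seconds after the flash."""
    delay = sound_delay_s(distance_miles)
    return lag_s <= delay < lag_s + window_s


distances = [0.5, 1.0, 2.0, 3.0]  # miles
for d in distances:
    print(f"{d:>4} miles: delay {sound_delay_s(d):5.2f} s, "
          f"heard with 9 s lag: {heard_at_fixed_lag(d, lag_s=9.0)}")
```

Changing the lag merely changes which distance appears to produce a sound; no single constant lag recovers the fact that every explosion does.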
To show that this can occur in peace research, I again offer a personal example. Glenn Palmer and I were examining the effect of being the target of a militarized interstate dispute on military expenditures and on social expenditures. We had reason to believe that the effect would be positive for the former and negative for the latter. If we examined the relationship at time t, we found the expected positive relationship for military expenditures but no relationship for social expenditures; on the other hand, if we considered time t + 1, we found no relationship for military expenditures and the expected negative relationship for social expenditures. It might appear that we have avoided the problem by conducting the analysis with two different lags. We have not—this is an ad hoc solution and the data problem remains. We have not taken our observations at the times the theory says we should; rather, we are restricted to using the gross, annual observations.
Herein lies the source of the title of this address. The Iron Chef television program pits the world’s greatest chefs against each other in competitions to see who can produce the best menu using a secret ingredient that is revealed one hour before the tasting that determines the winner. They have access to the most technologically advanced kitchens possible. Even the world’s greatest chef, however, using the best equipment available, could not produce a culinary delight if the ingredient were road-kill. Peace scientists have extraordinarily well-developed theory and impressive technologies for testing the hypotheses that follow from it. Yet, if the ingredients in our analyses are not desirable, the results will leave us wanting.
What should we do?
So, what do I think we should do? Let me begin by saying that I believe we should continue our efforts to advance theory and our ability to examine the evidence to test that theory. One of the things I have most enjoyed about being a political scientist over the past 30 years is that I have seen firsthand the advances that come when we are able to build on one another’s work and when there is a useful interplay between theory and evidence. We are a discipline and our research does cumulate. Our debates are usually grounded in logic and the rules of inference, and there are recognized, acceptable standards by which we judge our contributions.⁶ That was not so much the case 60 years ago. My point here is that further significant advances require that we think seriously about our data. We need to improve them along several dimensions. The question, then, is how do we do that?
First, we have to rely less on what information is readily available and more on what our theory tells us is important. It really does ultimately come back to theory and our ability to develop concepts in an unambiguous fashion. Major breakthroughs in any science come when scholars are able to escape the constraints of looking at things in the same way as their predecessors. We need to develop our data at a level of precision that is equal to that in our theories and we need to take our measurements at times that can test our theories. We need to acquire the data that allow us really to test our theories rather than forcing our theories to speak to the data that are available.
Second, we need to become much more creative in thinking about how to test our theories. Theories do not have to be tested in their entirety, nor do they have to be tested directly.⁷ Many of us have said, tongue-in-cheek, that it would be much easier to study war if we could create wars as a part of an experimental design. This is unnecessary. If we use our theories to derive hypotheses that can be tested, either in the real world or in the laboratory, then we are doing our job as scientists. Our theory might be about war, but if it leads to expectations about how college freshmen behave in experimental situations, then we can test those hypotheses, and that is a test of the theory. We are better off using our creativity to develop hypotheses that can be adequately tested, even if they do not address the central questions of our theory, than conducting tests that rely on poor surrogate measures of our central concepts. It is true that we cannot create wars in the laboratory, but neither can physicists recreate the big bang in the laboratory. They test their theories by using them to derive hypotheses about things that can be observed.
Third, we should get in the habit of being as clear and precise in our expectations as our theories allow. We often test hypotheses that are less precise than our theories provide simply because that is all the data will allow. We test whether y increases as x increases when our theory can tell us by how much y should increase when x increases by one unit. Such precision has the benefit of reducing the number of cases we need to test hypotheses adequately. If a theory of planetary motion tells us that the moon will rise at a specific location at precisely 20:31, that it will appear 90.4% illuminated, and that it is waning, a single observation is a test of the theory.⁸ We have theories that lead to very precise expectations about specific cases; we should test them precisely.
Fourth, we should not shy away from making predictions and from using them as tests of our theoretical understanding. We have all been told that the social sciences are not “real” sciences because their practitioners cannot predict, while “real” scientists like physicists, chemists and biologists can. Keep in mind, though, that the weather is governed entirely by physical and chemical forces, and even physical scientists have a hard time predicting it; prediction is difficult for anyone faced with a moderately complex system. If we consider the nature of the things we are dealing with and the sheer amount of resources and technology devoted to predicting the weather, I think a case can be made that we actually do better. Our predictions have to be specific and theoretically driven, and they have to be tested with quality data. While the value of what we do will ultimately be judged by our ability to predict things in the real world, I again stress the usefulness of making, and testing, predictions in more controlled environments.
Fifth, we should not be deterred by the paucity of resources devoted to supporting research in our area of study. I recognize that this is a very real obstacle: the current annual National Science Foundation budget for research in all of political science is $10 million, while that for physics is $279 million and the budget for NASA is $19 billion. That is, you could more than fund political science with the interest on the money it costs for one NASA project. Physicists did not always have the resources necessary to provide the technologies that they now take for granted. The timepiece Isaac Newton used in conducting his experiments was basically the 17th-century English equivalent of “One-Mississippi, Two-Mississippi ….” Nor did physicists wait for someone to provide those amounts of money and then figure out what to do with it. They developed their science, determined what technologies were necessary to provide the data to test their theories, and made the case that governments and private enterprises should provide the resources.
I think there is reason for great optimism. We are gaining access to much more precise data for many things (geographic information systems are one example), and some of us are testing our theories in the laboratory or basing actual predictions on our science. The past 40 years have brought great progress in our scientific discipline. I think the next 40 will be as exciting, particularly if the next generation of peace scientists devotes at least part of its creative energies to developing the technologies necessary to provide the quality of data that our theoretical and empirical methodologies require.
Notes
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
