Fighting words

Abstract

This article examines the effectiveness of public statements of resolve in international conflict. Several prominent theories, including domestic audience cost theory and theories regarding international reputation, suggest that issuing resolved statements can help a leader achieve a more favorable outcome in conflict bargaining. Because they entail costs for backing down, these statements are believed to credibly convey resolve to an adversary. This can help to alleviate the uncertainty created by private information about resolve and persuade the adversary to back down. Despite the prevalence of this theoretical logic, the effectiveness of statements of resolve at influencing conflict outcomes has rarely been subjected to direct tests, and some recent empirical work has raised doubts about statements’ effectiveness. This article is the first to directly examine the effect of resolved statements on conflict outcomes using large-N analysis. It introduces original data, created using content analysis, which directly measure the level of resolved statements made by US presidents during militarized interstate disputes (MIDs). Analysis of these data demonstrates that a higher level of resolved statements is indeed associated with a greater chance of prevailing in disputes. This finding is substantively significant and robust, providing support for the argument that public statements play an important role in international conflict.

Keywords

audience costs, conflict bargaining, reputation, statements of resolve

Introduction

In September 2012, as Israeli concern over Iran’s uranium enrichment program grew, the Israeli government urged the Obama Administration to issue a ‘red line’ statement regarding the program. Demonstrating the importance they placed on such a US statement, Israeli officials repeatedly made the case for it in public, including to US news outlets and the United Nations General Assembly. They also indicated that without a US red line statement, Israel would be more likely to strike Iran unilaterally (Greenberg, 2012; Netanyahu, 2012; Reuters, 2012). Although the Obama Administration had already made statements of resolve regarding Iran’s enrichment program, it declined to issue a more specific statement or set deadlines for Iranian cooperation (Greenberg, 2012). While US and Israeli leaders disagreed about whether a red line statement was desirable, they both clearly agreed that such a US statement would be important. Israeli Prime Minister Netanyahu’s public comments indicate he believed that a US red line statement was the ‘only’ way to ‘peacefully prevent Iran from getting atomic bombs’ (Netanyahu, 2012). Privately, Netanyahu may have also believed that if deterrence failed, such a statement would commit the United States to take action. On the US side, the policy community seemed to believe that such a statement could tie the hands of the US president and force the United States into military action (Ignatius, 2012; Zakaria, 2012).

This anecdote illustrates the great importance which international leaders place on statements of resolve and the belief among policymakers that statements, despite not having any direct physical cost, can have a profound impact on international conflicts. This belief is shared by many international relations scholars. Fearon (1994) started the recent trend of interest in resolved statements with his theory of domestic audience costs, which many scholars have built upon. Other scholars have argued that statements of resolve are effective due to international reputational costs (Sartori, 2005) or the danger of escalation (Trager, 2010). While the mechanisms in these theories differ, most theories agree that making resolved statements should lead to a more favorable outcome in conflict bargaining.

Despite the persistent belief among policymakers and scholars that statements of resolve can influence international conflict outcomes, this belief has rarely been subjected to direct tests, and some recent empirical work has raised doubts about statements’ effectiveness. This article is the first to directly examine the effect of resolved statements on conflict outcomes using large-N analysis. It introduces new data on resolved statements made by US presidents during militarized interstate disputes (MIDs). These data were obtained through content analysis of public statements by US presidents. I use these data to investigate the relationship between statements of resolve and MID outcomes. The results show that a higher level of resolved statements is indeed associated with a greater chance of prevailing in MIDs. This finding is both statistically and substantively significant and is robust to various specifications. The largest concern raised by this result is the possibility of reverse causation, but I use statistical tests and logical argumentation to make the case that the primary causal effect is indeed the effect of resolved statements on conflict outcomes.

This article consists of five sections. The first section discusses the basis for the theoretical expectation that resolved statements will influence conflict outcomes and the existing research that attempts to address whether this is empirically true. The second section describes the research design, including the coding of new data on US presidential statements of resolve and the setup of the statistical tests. The third section discusses the results, including their substantive significance. The fourth section discusses robustness. The final section concludes.

Theoretical expectations

It has been noted that states involved in conflict bargaining could benefit from being able to honestly convey information about their respective capabilities and resolve. This would allow them (barring other complications) to reach a settlement reflecting the likely outcome of a military conflict without actually bearing the costs of war. Unfortunately, states have private information about their capabilities and resolve, and they have the incentive to make exaggerated claims about these things in order to obtain a better settlement (Fearon, 1995). Given this incentive to misrepresent, we might be tempted to dismiss statements of resolve as a waste of breath. Yet, several scholars have put forward theories regarding how statements might be able to effectively convey resolve to adversaries and thus influence the outcomes of international disputes.

The most prominent theory explaining how statements might be able to successfully convey resolve is domestic audience cost theory. Fearon (1994) began the recent trend of interest in domestic audience costs with a model in which leaders face domestic punishment if they take escalating actions, such as threats or mobilization, in public view and subsequently back down. Escalating under these circumstances effectively signals a leader’s resolve, while raising the risk that the leader will become locked into fighting by the cost of backing down. Other scholars have built on Fearon’s theory by attempting to explain why the domestic audience punishes leaders who back down. For example, Guisinger & Smith (2002) show that when a country’s reputation for credibility resides with its leader, it is rational for the domestic audience to remove a leader who backs down from commitments in order to restore the country’s reputation. Other scholars, such as Smith (1996) and Slantchev (2006), have created models in which voters punish leaders who back down because this is a sign of incompetence. Many other scholars have also incorporated the concept of domestic audience costs into their work.

Other scholars have offered different explanations for how statements might be effective at conveying resolve. Sartori (2005) has advanced the concept of international reputational costs. According to Sartori’s model, if a country makes a statement and subsequently backs down, it will develop a reputation for bluffing. States thus have a disincentive to bluff because losing the ability to communicate credibly makes a state less able to attain its future goals. This disincentive to bluff makes resolved statements credible and effective in bargaining among states with honest reputations. More recently, Trager (2010) has argued that threatening statements can be informative because of the danger that the statements will result in hostile behavior from the target of the statements. If a leader makes statements despite this danger, this is a credible demonstration of resolve.

Each of these theories relies on some sort of consequence or ‘cost’ associated with making statements as an explanation for their effectiveness.¹ The cost may be a decline in domestic support, as in audience cost theory, or harm to the state’s international position, as in Sartori’s and Trager’s theories. Regardless of the nature of the cost, it has a similar effect. The cost deters leaders from making statements lightly or constantly and therefore allows the statements to be genuinely informative about the issuer’s resolve. Specifically, this happens through a signaling mechanism, which differentiates the behavior of resolved and unresolved leaders, and/or through a commitment mechanism, which commits a leader to follow through on statements because of the added cost that statements create for backing down.

If statements are informative due to these mechanisms, then an adversary that hears statements of resolve should be more likely to believe that the issuer of the statements is actually resolved to stand firm. As the adversary’s belief that the issuer of statements is resolved increases, the adversary itself should become more likely to back down. This is because any adversary that underestimated the issuer’s resolve should have its belief corrected by the statements, and once an adversary is convinced that the issuer of statements is fully resolved to fight or continue fighting, the adversary must back down unless it is willing to do the same. If the adversary is more likely to back down, then the issuer of resolved statements should be more likely to obtain a favorable outcome. Therefore, there is reason to expect that resolved statements will increase the probability of a favorable conflict outcome.

This logic should apply to both conflicts that are decided without the use of force and conflicts in which force is used. Although most formal models used to derive theories about statements’ effectiveness focus on pre-force bargaining, the calculations involved in making the decision to fight and making the decision to continue fighting are very similar. The main difference between pre-conflict bargaining and bargaining while fighting is that information can be learned from battle outcomes while fighting (Powell, 2004; Slantchev, 2003; Wagner, 2000). However, battle outcomes mostly convey information about capabilities and do not negate the role of statements in conveying resolve. For example, insurgents fighting US troops in Iraq and Afghanistan knew that the United States was very militarily capable, but were uncertain about the costs the USA was willing to bear. Therefore, these groups arguably learned at least as much from US statements as from the outcomes of battles with US troops.

It should be noted that there are some theories which suggest that making statements of resolve is not always desirable. Leventoglu & Tarar (2005) and Kurizaki (2007) argue that public threats might be less efficient than private threats because of the potential to create audience costs on both sides. However, these models do not indicate that public statements are ineffective, just less efficient under some circumstances. A bigger challenge is raised by Slantchev (2010) and Trager (2010), who each argue that signaling resolve to an adversary can be counterproductive because it can prompt the adversary to make military preparations that decrease the signaler’s chance of victory. However, Slantchev and Trager predict that countries are more likely to refrain from signaling resolve when this risk exists. Therefore, we should still expect resolved statements that are actually issued to have a generally beneficial effect on conflict outcomes.

In sum, while it might not be optimal to make public statements under every circumstance, there appears to be widespread theoretical support for the idea that statements of resolve should have a generally positive impact on conflict outcomes. There are, of course, differing explanations for why this is the case, but the purpose of this article is not to adjudicate between these explanations. Rather, this article asks the more fundamental question of whether the theoretical expectation derived from all of these theories is empirically valid. Despite the prevalence of theories suggesting that resolved statements should be effective in international conflict, it has not yet been clearly demonstrated that this is true.

Some empirical work has attempted to investigate the mechanisms that could make statements effective. Several scholars have tried to prove the existence of domestic audience costs and international reputational costs by examining the consequences of being caught in a bluff. For example, survey experiments have found evidence that citizens disapprove of leaders who make resolved statements and then back down, indicating support for the existence of domestic audience costs (Tomz, 2007; Trager & Vavreck, 2011). Sartori (2005) finds some evidence of international reputational costs in a statistical analysis of MIDs, which shows that states with a history of bluffing are less able to deter attacks. However, these studies only address the question of whether resolved statements are costly, not whether they have any effect on conflict outcomes.

Another group of scholars has tried to link the theory of domestic audience costs to conflict outcomes by showing that democracies and other regimes in which it is easier for domestic audiences to punish the leader tend to experience more favorable conflict outcomes (Eyerman & Hart, 1996; Gelpi & Griesdorf, 2001; Partell & Palmer, 1999; Schultz, 1999; Weeks, 2008). While these results are suggestive, none of these studies brings resolved statements into the analysis. Therefore, we do not know if it is truly effective signaling or some other aspect of regime type which is driving these results. Raising further doubt about some of these findings, Downes & Sechser (2012) show that democracy is not a significant predictor of threat success in a dataset consisting solely of compellent threats.

Only a few studies have attempted to look at the impact of statements of resolve directly, and these studies have found a mixed record of success at best. Snyder & Borghard (2011) argue that there is little evidence that resolved statements were effective in three out of four cases they examine. Trachtenberg (2012) finds a mixed record of success for public statements in a dozen great power crises. Based on time-series analysis of the relationship between US presidential ‘saber rattling’ statements and event data measuring adversary behavior, Wood (2012: 129–132) argues that saber rattling either has no effect on adversary behavior or makes it more hostile.

These findings are surprising given the widespread belief among scholars and policymakers that statements are effective. However, these studies have limitations. The research designs based on qualitative case studies are naturally limited in scope, and even though Wood uses large-N analysis, he only examines the behavior of three US adversaries. In addition, Wood’s analysis indicates that resolved statements sometimes precede hostile adversary behavior, but does not examine the ultimate outcome of disputes. Therefore, these existing studies do not provide a fully satisfying answer to the question of whether statements of resolve have an impact on international conflict outcomes. This article attempts to present more persuasive evidence regarding whether resolved statements are effective in international conflict, as is suggested by so many theories. Therefore, the following hypothesis is tested:

Hypothesis: All else equal, issuing a greater level of resolved statements will increase a country’s probability of achieving a more favorable conflict outcome.

Research design

I analyze the effectiveness of statements of resolve in a dataset of dyadic militarized interstate disputes (MIDs) involving the United States. The dependent variable is the outcome of the MID. The independent variable of interest is the level of resolved statements made by the US president during the MID, which is measured using content analysis. The hypothesis above would predict a positive relationship between the level of statements and the likelihood of a more favorable MID outcome. The subsections below describe the data used in more detail and address potential concerns about the data.

Unit of observation

The unit of observation in this analysis is the dyadic MID. A MID is defined as any event in which a state threatens, displays, or uses force against another state (Ghosn, Palmer & Bremer, 2004). For purposes of this analysis, MIDs, many of which are multilateral, are broken down into dyadic MIDs, which are pairs of countries on opposite sides of a MID. Because of limits on my ability to measure statements across all countries, I restrict my analysis to dyadic MIDs in which the United States was involved between 1950 and 2010. This results in a dataset of 272 dyadic MIDs, although most models are estimated based on slightly fewer observations because of limitations on the availability of some control variables after 2005.² The dyadic MID observations are drawn from the MID 4.0 Dataset for the years 2002–10, the MID 3.1 Dyadic Dataset for the years 1993–2001, and the Maoz Dyadic MID Dataset for the years 1950–92 (Ghosn, Palmer & Bremer, 2004; Maoz, 2005). Only dyads which actually interacted during a MID were included, and I used conflict start and end dates which were adjusted to be correct for each specific dyad.

Using the MID dataset allows me to test my hypothesis on the largest possible set of relevant observations. As argued by Dafoe, Renshon & Huth (2014), most MIDs engage a state’s reputation, so they should be good cases for testing expectations derived from domestic audience cost theory and international reputational cost theory. As discussed in the previous section, there is theoretical reason to believe that resolved statements can play an important role in conflicts involving force as well as conflicts that have not yet escalated to force. If many MIDs were won entirely by brute force, then the MID dataset would be inappropriate to use, despite these theoretical arguments. However, out of the 240 distinct MIDs in my dataset, only six were decisively won by force.³ Therefore, resolved statements should have the potential to play an important role in the vast majority of MIDs in the dataset.

A separate concern about the MID data might be whether it is appropriate to use MIDs as the unit of analysis for studying statements when, in some cases, the presence of a statement (i.e. a threat) is what defines the existence of a MID. The result of this research design is that clear threats should usually be captured because they constitute MIDs in themselves, while milder statements of resolve are not captured unless they were made in conjunction with other events that constitute MIDs. This should not bias the results for statements within MIDs, but it does suggest that the results in this article cannot necessarily speak to the effect of mildly resolved statements made in non-militarized disputes, which may be either more or less effective.

Dependent variable

The dependent variable used in the analysis, Conflict outcome, is derived from the outcome variable in the MID 4.0 dataset (Ghosn, Palmer & Bremer, 2004). It is coded as 3 if the outcome clearly favors the United States, 2 if the outcome is neutral, and 1 if the outcome clearly favors the adversary. In creating this variable, I treated outcomes coded in the MID data as victory or yield as clearly favoring one side or the other. I treated outcomes coded as compromise, stalemate, released, unclear, or missing as neutral.⁴ Due to the ordinal nature of the dependent variable, the functional form is an ordered probit. Using alternate dependent variables, such as escalation to force or reciprocation, was considered but deemed infeasible because accurate escalation and reciprocation dates are not available for many dyadic MIDs. Using the wrong dates to collect statements to predict escalation or reciprocation could result in predicting these events with statements made after the events occurred or leaving out relevant statements made before the events.
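The three-level recoding described above can be sketched in a few lines of Python. This is only an illustration: the string labels are hypothetical stand-ins for the MID outcome categories named in the text, not the dataset's actual codes, and splitting victory and yield by side is an assumption about how those categories map onto favoring one party.

```python
# Hypothetical labels standing in for the MID outcome categories (assumption).
US_FAVORABLE = {"victory_us", "yield_adversary"}
ADVERSARY_FAVORABLE = {"victory_adversary", "yield_us"}
NEUTRAL = {"compromise", "stalemate", "released", "unclear", "missing"}


def conflict_outcome(label):
    """Map a MID outcome label to the ordinal Conflict outcome scale.

    3 = clearly favors the United States, 2 = neutral, 1 = clearly favors the adversary.
    """
    if label in US_FAVORABLE:
        return 3
    if label in ADVERSARY_FAVORABLE:
        return 1
    if label in NEUTRAL:
        return 2
    raise ValueError("unknown outcome label: " + repr(label))
```

Raising an error on unrecognized labels, rather than silently treating them as neutral, is a design choice of this sketch; the text itself assigns missing outcomes to the neutral category explicitly.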

Independent variable

The crucial independent variable for this analysis is public statements of resolve. Public statements are defined as statements made to the press and/or a public audience. The concept of resolve is more difficult to define. Although the idea of signaling resolve is often referenced, it is typically not clearly defined. Most existing studies which have operationalized the concept of signaling resolve have done so by measuring behavior, such as actions taken in MIDs (Sartori, 2005). A few scholars have coded explicit deterrent or compellent threats (Huth, 1988; Sechser, 2011). In addition, Wood (2012: 35–36) attempts to code US presidential statements which constitute ‘saber rattling’, defined as ‘hostile foreign policy rhetoric of all styles and contexts’. However, no one has attempted to fully and explicitly define the universe of statements which signal resolve.

I define statements of resolve as statements that indicate a state is committed to a position. The statements which indicate the strongest level of resolve are those that make an explicit threat. This type of statement clearly establishes the country’s position and creates specific expectations about what will happen if the adversary fails to comply. A slightly lower level of resolve can be conveyed by statements which express a concrete demand or concrete refusal without a threat. Even though these statements do not promise specific action, they still create specific expectations about the state’s position, and any backing down from that position is easy to observe.

Although these types of statements are important in creating strong expectations, they are not the only type of statement that can convey resolve. More implicit statements can also convey a certain measure of resolve. Any public statement by a country which characterizes the status quo or another state’s behavior negatively can raise expectations that the country is committed to changing the situation or behavior. Thus, statements which complain, denounce, or make other negative characterizations about a country or situation can also be viewed as a milder form of resolved statements. While explicit threats are likely to convey more resolve, leaving more implicit statements out of the analysis would leave the majority of statements made during international conflict unaddressed. Therefore, my primary measure of resolved statements includes negative characterizations as well as threats, demands, and refusals.

My measure of statements of resolve was created based on content analysis of the public statements made by US presidents between 1950 and 2010. The decision to use statements by US presidents, and not other world leaders, was driven by the fact that a comprehensive record of US presidential statements was readily available and posed no language or cultural barriers to identifying resolved words. A potential downside of this decision is that the results for the effectiveness of US statements might not necessarily be applicable to other countries because the United States has some unique characteristics. In particular, high US military capabilities could arguably make US statements of resolve more convincing. Nonetheless, because the USA is not always willing to use force to impose its will, it still faces the same basic challenge as other countries in conveying when it is truly resolved. Establishing whether or not statements are effective at conveying resolve for even just one country is an important empirical step forward given that this has never been clearly demonstrated.

Another concern might be whether statements by additional US officials should be included. Unfortunately, no equally comprehensive source of statements by other US officials is available. However, since most administrations try to stay ‘on message’, the president’s statements should be somewhat representative of statements by other officials. Furthermore, if a lower-level official makes a particularly hawkish statement, the press is likely to ask the president about it, ensuring that the president will reiterate or refute most controversial statements by other officials. Finally, while statements by all officials may play a role, presidential statements will usually carry the most weight since the president has the sole authority to authorize force.

I now turn to discussing how the primary measure of statements of resolve was created.⁵ All public presidential statements and remarks released by the Office of the Press Secretary since 1929 are available in the Public Papers of the Presidents of the United States (Peters & Woolley, 2014). I searched this resource and collected all presidential statements between 1950 and 2010 which were made in the context of dyadic MIDs involving the United States. I consider a statement to be made in the context of a dyadic MID if (1) it was made within the time frame of the dyadic MID or within 30 days before, and (2) it was about the dyadic MID adversary. I began the process of identifying these statements by searching the Public Papers for the name of the US adversary in each dyadic MID within the specified time frame. Out of the search results, I excluded statements by people other than the president, statements which were not spoken,⁶ and paragraphs which were not about the adversary.
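The two-part inclusion rule above can be expressed as a small predicate. The sketch below is an assumption-laden illustration: the function name and arguments are hypothetical, and whether a statement is about the adversary is taken as an input because the text describes that judgment as made by reading the search results.

```python
from datetime import date, timedelta


def in_mid_context(statement_date, mid_start, mid_end, about_adversary, lead_days=30):
    """Return True if a statement counts as made in the context of a dyadic MID.

    Condition (1): dated within the dyadic MID or up to lead_days before its start.
    Condition (2): the statement is about the dyadic MID adversary.
    """
    window_start = mid_start - timedelta(days=lead_days)
    in_window = window_start <= statement_date <= mid_end
    return in_window and about_adversary


# Usage with made-up dates: a statement 10 days before the MID starts qualifies.
example = in_mid_context(date(1990, 7, 23), date(1990, 8, 2), date(1991, 4, 11), True)
```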

I coded the set of full statements made in the context of each dyadic MID using the Yoshikoder content analysis program and a content analysis dictionary which I created to measure resolve. I created the content analysis dictionary through an inductive process, reading the statements I collected, identifying resolved words or phrases, and adding these items to the dictionary. After gathering most of the dictionary items in this manner, I also added a few words from Wood’s (2012) dictionary measuring ‘saber rattling’ and words recommended by colleagues. I then used the Yoshikoder program to find all instances in which the dictionary items appeared in the statements. I examined the Yoshikoder output to identify dictionary items which appeared to have inflated tallies, investigated the instances in which these suspicious words or phrases appeared in statements, and edited the dictionary as necessary. The final dictionary includes 264 words and phrases.

The primary weighting scheme which I use for the dictionary has three tiers, corresponding to the conceptualization of the relative strength of different types of statements given in the definition of statements of resolve above. Words and phrases that are commonly associated with explicit threats, for example ‘take action’ or ‘whatever is necessary’, were weighted as 3.⁷ Words and phrases commonly associated with demands or refusals, such as ‘unwavering’ or ‘not negotiable’, were weighted as 2. Finally, words and phrases typically associated with negative characterizations, for example ‘dangerous’ or ‘barbaric’, were weighted as 1.⁸ The most appropriate category for each dictionary item was determined based on consultation with seven colleagues. The categorization of most dictionary items was not controversial.

Using the Yoshikoder program, I obtained a score for each dyadic MID which equals the sum of all weights for each use of a dictionary item by the president in the context of the dyadic MID. To obtain the variable used in the statistical analysis, I normalized these scores, dividing them by the number of days over which statements were collected (the number of days in the dyadic MID plus 30 days before). Figure 1 shows a histogram of the US statements variable. The most notable feature of the histogram is the clustering of statement scores around 0 with only a few high outliers. The six outliers which stand out the most are the Gulf War, three minor disputes with the Soviet Union in the 1980s, the Afghanistan War, and a border dispute between Yugoslavia and Macedonia during the Bosnia conflict.⁹ I will demonstrate later that these outliers can be dropped with little change in the results.

Figure 1. Histogram of statements scores
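The weighted, day-normalized score described in this subsection can be illustrated with a short Python sketch. It is not the Yoshikoder program and does not reproduce the 264-item dictionary: the mini-dictionary below contains only the six example phrases quoted in the text, and the simple case-insensitive whole-phrase matching is an assumption.

```python
import re

# Hypothetical mini-dictionary: only the example phrases from the text, with their tier weights.
WEIGHTS = {
    "take action": 3,            # explicit threats
    "whatever is necessary": 3,
    "unwavering": 2,             # demands or refusals
    "not negotiable": 2,
    "dangerous": 1,              # negative characterizations
    "barbaric": 1,
}


def raw_score(text):
    """Sum the weights of every dictionary item occurrence in one statement."""
    lowered = text.lower()
    total = 0
    for phrase, weight in WEIGHTS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += weight * len(re.findall(pattern, lowered))
    return total


def statement_score(statements, mid_days, lead_days=30):
    """Normalize the summed weights by the number of days over which statements were collected."""
    total = sum(raw_score(s) for s in statements)
    return total / (mid_days + lead_days)
```

For instance, two statements containing "take action", "dangerous", and "not negotiable" once each sum to 6; over a 30-day dispute plus the 30-day lead period, the normalized score would be 6 / 60 = 0.1.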

Control variables

As noted in the hypothesis, I expect resolved statements to have an effect on conflict outcomes, all else equal. I do not suggest that statements necessarily outweigh other factors, such as military power. In order to take the other factors which are likely to affect conflict outcome into account, I include several control variables.¹⁰ I present two main models, the first of which is more parsimonious. The parsimonious model controls for the two factors other than statements that I consider most likely to affect the conflict outcome: Relative capabilities, which measures the percentage of total capabilities in the dyad held by the United States (Singer, Bremer & Stuckey, 1972), and Rival democracy, an indicator variable coded as 1 if the US rival in the dyadic MID has a Polity score of at least 7 (Marshall, Jaggers & Gurr, 2010). It also includes a time trend consisting of the year at the midpoint of the dyadic MID and its square and cube to account for the possibility that the dynamics of US MIDs and probability of victory have changed over time.¹¹
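The controls in the parsimonious model can be sketched as simple transformations. The function names and inputs below are hypothetical; expressing the capability share as a proportion between 0 and 1 rather than a percentage is an assumption, as is the choice of centering value for the time trend (the note to Table I states only that year was centered before squaring and cubing).

```python
def relative_capabilities(us_capabilities, rival_capabilities):
    """Share of total dyadic capabilities held by the United States (0 to 1, by assumption)."""
    return us_capabilities / (us_capabilities + rival_capabilities)


def rival_democracy(polity_score):
    """Indicator coded 1 if the rival's Polity score is at least 7, else 0."""
    return 1 if polity_score >= 7 else 0


def time_trend(midpoint_year, center):
    """Centered year with its square and cube; the centering value is an assumption."""
    c = midpoint_year - center
    return c, c ** 2, c ** 3
```

Centering before taking powers is a common way to reduce collinearity among polynomial terms, which may be why the table note mentions it.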

The second model includes additional control variables. US side A is an indicator variable that captures whether the United States was on the initiating side of the MID (Ghosn, Palmer & Bremer, 2004). Defense pact is an indicator variable coded as 1 if the rival had a defense pact with the United States (Gibler, 2009). Affinity is a measure of the affinity between the adversary and the United States, based on UN General Assembly votes (Gartzke, 2006; Voeten & Merdzanovic, 2009). Territory revision, Regime revision, and Policy revision are indicator variables for the primary type of revision to the status quo sought in the MID. The omitted comparison category for these dummies is MIDs in which neither side sought to revise the status quo or the revision type is coded as ‘other’. US hostility and Rival hostility are ordinal variables giving the highest hostility level¹² reached by each side in the dyadic MID (Ghosn, Palmer & Bremer, 2004; Maoz, 2005). Sanctions is an indicator variable coded as 1 if the United States imposed new economic sanctions upon its adversary at any point during the dyadic MID (Morgan, Bapat & Kobayashi, 2013). I also include dummy variables for the seven countries with which the United States had the most MIDs: the Soviet Union/Russia, China, North Korea, Cuba, Libya, Iraq, and Iran.¹³ These dummies are intended to control for the possibility that certain conflict rivals may be harder to defeat than others due to unmeasured factors.

One control variable that is arguably missing is the US adversary’s statements of resolve. It is not realistic to collect resolved statements for all MID adversaries, so the effect of the other side’s statements is in the error term. However, this is unlikely to create much bias because theory indicates that making more statements should improve the outcome for the United States regardless of what the other side does. Though the effect of statements may cancel out if both sides make them, the USA can still convey more resolve by making statements and letting them cancel out than by remaining silent and giving the other side a potential advantage. Thus, US statements should have a consistently positive effect regardless of the adversary’s statements.

Results

The results of both main models examining the impact of statements of resolve on conflict outcomes are given in Table I. The models are ordered probits using Conflict outcome as the dependent variable.

Table I.

Main results

	Model 1 (N = 262)			Model 2 (N = 258)
Variable	Coef.	SE	p-value	Coef.	SE	p-value
Statement score	0.226	0.071	0.001	0.271	0.065	<0.001
Relative capabilities	1.844	0.625	0.003	1.813	1.926	0.347
Rival democracy	−0.723	0.526	0.169	−1.216	0.556	0.029
Year	0.032	0.015	0.033	0.034	0.019	0.077
Year squared	0.001	0.000	0.055	0.002	0.001	0.002
Year cubed	0.000	0.000	0.013	0.000	0.000	0.024
US side A				−0.415	0.322	0.198
US hostility				0.370	0.160	0.021
Rival hostility				−0.055	0.104	0.594
Sanctions				−1.603	0.560	0.004
Affinity				0.246	0.470	0.600
Defense pact				0.908	0.397	0.022
Territory revision				−0.195	0.342	0.568
Regime revision				0.901	0.459	0.050
Policy revision				−0.001	0.288	0.996
USSR/Russia				0.458	0.702	0.514
North Korea				−0.375	0.392	0.339
China				0.005	0.813	0.995
Cuba				0.440	0.475	0.354
Iraq				0.605	0.481	0.208
Libya				1.074	0.580	0.064
Iran				1.134	0.537	0.035

These are ordered probit models predicting conflict outcome. Huber-White standard errors are used. Results which are significant at the 90% confidence level are in bold. Year was centered before squaring and cubing.

We see that the score for statements of resolve is highly significant in both models. The positive coefficient indicates that a more favorable outcome is increasingly likely as the level of resolved statements increases. Specifically, this indicates that an increase in the statement score moves the expected conflict outcome closer toward the next threshold, either the threshold between a loss and a draw or the threshold between a draw and a win. Thus, the hypothesis that resolved statements should increase the probability of a favorable conflict outcome is supported.
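For readers less familiar with the estimator, the threshold logic described above can be written out in the standard textbook form of an ordered probit. The notation here is generic rather than taken from the article: y* is a latent outcome, x_i the covariates, β the coefficients, and τ_1 and τ_2 the two estimated thresholds. The coding 1 = loss, 2 = draw, 3 = win is an assumption based on the ordering of outcomes described in the text.

```latex
\begin{align*}
y_i^{*} &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1) \\
y_i &=
\begin{cases}
1 \ (\text{loss}) & \text{if } y_i^{*} \le \tau_1 \\
2 \ (\text{draw}) & \text{if } \tau_1 < y_i^{*} \le \tau_2 \\
3 \ (\text{win})  & \text{if } y_i^{*} > \tau_2
\end{cases} \\
\Pr(y_i = 1 \mid x_i) &= \Phi(\tau_1 - x_i'\beta) \\
\Pr(y_i = 2 \mid x_i) &= \Phi(\tau_2 - x_i'\beta) - \Phi(\tau_1 - x_i'\beta) \\
\Pr(y_i = 3 \mid x_i) &= 1 - \Phi(\tau_2 - x_i'\beta)
\end{align*}
```

Under this setup, a positive coefficient on the statement score raises the linear predictor, which lowers the probability of a loss and raises the probability of a win. The effect on the probability of a draw depends on where the observation sits relative to the two thresholds.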

The results for the control variables are largely unsurprising. The USA is more likely to achieve a favorable outcome when it is relatively more powerful (although Relative capabilities is significant only in Model 1), when it uses a higher level of force, when the MID concerns regime revision, and when the MID is with an ally. The statistical significance of the last result is driven by MIDs with members of the Organization of American States. The USA is likely to receive a less favorable outcome when its rival is a democracy (significant only in Model 2) and when it employs sanctions. In addition, all of the time trend variables are significant at the 90% confidence level or above, indicating that the USA has achieved better MID outcomes over time. Two of the adversary dummy variables are significant, indicating that the USA does better in MIDs against Libya and Iran.

Turning back to the main result, although Table I shows that the effect of statements on conflict outcomes is statistically significant, it is also helpful to look at the substantive significance. The substantive effect of statements of resolve on the chance of a favorable conflict outcome can be seen by calculating predicted probabilities. As recommended by Hanmer & Kalkan (2013), I use average predicted probabilities.¹⁴ This provides a more realistic picture of the substantive effect than calculating the predicted probability for just one fictional ‘average’ case. Figures 2 and 3 show how the predicted probability of a winning outcome (Conflict outcome = 3) increases with the statement score in both models. We see a fairly dramatic increase in the predicted probability of a winning outcome as the level of resolved statements increases, going from near 0 to around 80%.
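The difference between average predicted probabilities and a single ‘average case’ prediction can be illustrated with a short sketch. It sets the statement score to a chosen value for every observation, leaves the other covariates at their observed values, and averages the resulting probabilities of a win. The data, coefficients, and upper cutpoint below are hypothetical toy values chosen for illustration (the estimated cutpoints are not reported in this section), and the function names are mine rather than the article's.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def prob_win(xb, upper_cutpoint):
    """Pr(outcome = 3) in an ordered probit: 1 - Phi(tau_2 - xb)."""
    return 1.0 - STANDARD_NORMAL.cdf(upper_cutpoint - xb)


def average_predicted_probability(rows, coefs, upper_cutpoint, variable, value):
    """Set `variable` to `value` for every observation, keep all other
    covariates at their observed values, and average Pr(win)."""
    total = 0.0
    for row in rows:
        counterfactual = dict(row)
        counterfactual[variable] = value
        xb = sum(coefs[name] * counterfactual[name] for name in coefs)
        total += prob_win(xb, upper_cutpoint)
    return total / len(rows)


# Hypothetical toy inputs, NOT the article's data or estimates.
toy_rows = [
    {"statement_score": 0.0, "relative_capabilities": 0.90},
    {"statement_score": 1.5, "relative_capabilities": 0.70},
    {"statement_score": 6.0, "relative_capabilities": 0.95},
]
toy_coefs = {"statement_score": 0.25, "relative_capabilities": 1.5}
toy_cutpoint = 3.0

for score in (0, 2, 4, 8):
    p = average_predicted_probability(
        toy_rows, toy_coefs, toy_cutpoint, "statement_score", score
    )
    print(f"statement score {score}: average Pr(win) = {p:.3f}")
```

Because the normal CDF is nonlinear, averaging probabilities over observed cases generally gives a different answer from plugging covariate means into the model once, which is the motivation for the averaging approach.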

Figure 2.

Predicted probability of a winning outcome based on Model 1

Figure 3.

Predicted probability of a winning outcome based on Model 2

However, it should be noted that between the Statement score values of 0 and 2, where most observations lie, the rate of change in the predicted probability is relatively flat.¹⁵ Even at such low levels of statements, there appears to be some incremental benefit from making more statements, which might be viewed as non-trivial by presidents in conflict. Still, it is plain that a relatively high level of resolved statements is necessary in order for statements to have a large substantive impact on conflict outcomes. This raises the question of why such a high level of statements is rarely observed in practice. This is a question for future research, but a likely explanation is that making statements carries costs as well as benefits.

We might also wonder which observations have the biggest predicted effect of resolved statements. Table II lists the dyadic MIDs with the largest predicted marginal effects in each model.¹⁶ Most of these MIDs were relatively serious disputes. Most of them also involved some use or display of force in conjunction with statements, but only the Afghanistan War, the invasion of Grenada, and the pressure on Libya over terrorism, which ended with US strikes on terrorist camps, were won primarily by force.¹⁷ It is also interesting to note that most of the MIDs with the top marginal effects involved weaker adversaries, particularly in the Middle East.

Robustness

Although the results above are statistically and substantively significant, it is important to address potential concerns with the models and check for robustness. The results of all of the robustness checks are summarized in Table III.¹⁸ The first concern to be addressed is whether the result might be driven by reverse causality. An alternate explanation for the relationship between higher levels of statements and better conflict outcomes could be that presidents make fewer resolved statements when they anticipate losing and more resolved statements when they anticipate winning. This type of behavior could make it appear that statements affect the conflict outcome, even if they do not.

However, this alternate view of the direction of causality seems less theoretically plausible. While it is plausible that presidents might make fewer statements when they expect to lose in order to avoid the costs of backing down, this perspective has difficulty explaining the positive incentive for presidents to make statements. If statements of resolve do not affect conflict outcomes, why make them? One possibility is that they are made to gain domestic credit for winning, but as Baum (2004: 609–610) argues, foreign policy successes are much less likely than failures to affect the president’s electoral prospects.

Table II.

Observations with the highest predicted marginal effects of statements

Model 1
MID	Adversary	Description	Start of US involvement	Conflict outcome	Marginal effect
2227	USSR	US threat regarding Yugoslavia	1980	2	0.090
4046	Yugoslavia	Conflict on Yugoslavia–Macedonia border	1994	2	0.090
4283	Afghanistan	Afghanistan War	2001	3	0.090
4137	Yugoslavia	Kosovo conflict	1998	3	0.085
2226	USSR	Baltic Sea maneuvers	1980	2	0.076
3974	Iraq	Gulf War aftermath	1991	2	0.076
2353	Nicaragua	Show of force during Nicaraguan civil war	1986	2	0.075
3568	Iraq	Dispute over WMD inspections	1993	2	0.070
4271	Iraq	Response to Iraqi mobilization	1996	3	0.068
3636	Libya	Pressure on Libya over terrorism	1986	3	0.068
Model 2
MID	Adversary	Description	Start of US involvement	Conflict outcome	Marginal effect
61	Cuba	Cuban Missile Crisis	1962	3	0.108
3634	Libya	Libyan intervention in Chad	1983	2	0.108
3636	Libya	Pressure on Libya over terrorism	1986	3	0.108
3098	Libya	Libyan air raids into Sudan	1981	2	0.108
3058	Grenada	Invasion of Grenada	1983	3	0.108
4519	Iran	Exchange of fire across Iraq–Iran border	2004	3	0.107
4273	Yugoslavia	Kosovo conflict	1998	3	0.107
3973	Iran	Attack on US ship	1991	2	0.105
2834	Iran	Iran targets tankers in Gulf	1988	2	0.102
2353	Nicaragua	Show of force during Nicaraguan civil war	1986	2	0.102

Marginal effects are calculated for the probability of a winning outcome.

It is also possible that presidents might seek to create a rally-around-the-flag effect by drawing attention to conflict, but such effects are typically small and short-lived and thus have limited benefits (James & Rioux, 1998).¹⁹ Therefore, it is difficult to explain exactly why presidents make statements if they do not have an impact on conflict outcomes.

It is difficult to definitively rule out the possibility of reverse causation with statistical analysis, but one method that might partially alleviate concerns about this is matching. The reverse causality problem can also be thought of as a problem of treatment selection. The concern is that rather than assigning the treatment, that is, high levels of statements, to certain conflicts randomly, presidents choose to make high levels of statements in conflicts that have characteristics which make them likely to win. By creating a matched sample of observations that are very similar to each other except for differing in the level of statements, it is possible to approximate a situation in which treatment was assigned randomly and thus reduce the bias from non-random treatment selection based on observable factors (Simmons & Hopkins, 2005).

To create a matched sample, I first converted the continuous statement score variable to a dummy indicating whether the level of statements was over the median. This became the treatment variable. I then matched on variables that I expected to influence the decision to make statements, namely, the identity of the president during the MID, Relative capabilities, Rival democracy, Defense pact, US side A, Territory revision, and Regime revision. I used Coarsened Exact Matching, which creates a completely balanced sample based on coarsened versions of the variables used in matching (Iacus, King & Porro, 2012).²⁰
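The matching steps described above can be sketched in a few lines. This is a simplified illustration of the idea behind Coarsened Exact Matching, not the procedure or software used in the article: it builds the over-median treatment dummy, coarsens one continuous covariate into bins, matches exactly on a binary covariate, and keeps only strata that contain both treated and control observations. The variable names, bin cutpoints, and toy data are hypothetical, and the stratum weights that a full implementation would attach to matched observations are omitted.

```python
from collections import defaultdict
from statistics import median


def coarsen(value, cutpoints):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cutpoints)


def cem_match(rows, treatment, bins, exact=()):
    """Keep observations from strata that contain both treated (1)
    and control (0) units. `bins` maps continuous variables to
    cutpoints; `exact` lists variables matched without coarsening."""
    strata = defaultdict(list)
    for row in rows:
        key = tuple(coarsen(row[name], cuts) for name, cuts in bins.items())
        key += tuple(row[name] for name in exact)
        strata[key].append(row)
    matched = []
    for members in strata.values():
        if {row[treatment] for row in members} == {0, 1}:
            matched.extend(members)
    return matched


# Hypothetical toy observations, NOT the article's data.
toy = [
    {"id": 1, "statement_score": 0.1, "relative_capabilities": 0.95, "rival_democracy": 0},
    {"id": 2, "statement_score": 3.0, "relative_capabilities": 0.97, "rival_democracy": 0},
    {"id": 3, "statement_score": 0.5, "relative_capabilities": 0.60, "rival_democracy": 1},
    {"id": 4, "statement_score": 4.0, "relative_capabilities": 0.92, "rival_democracy": 1},
    {"id": 5, "statement_score": 0.2, "relative_capabilities": 0.55, "rival_democracy": 1},
]

# Treatment: statement score above the sample median.
cutoff = median(row["statement_score"] for row in toy)
for row in toy:
    row["high_statements"] = int(row["statement_score"] > cutoff)

matched = cem_match(
    toy,
    treatment="high_statements",
    bins={"relative_capabilities": [0.75, 0.90]},  # assumed cutpoints
    exact=("rival_democracy",),
)
matched_ids = sorted(row["id"] for row in matched)
print(matched_ids)
```

Observations in strata with only treated or only control units are discarded, which is why matching typically shrinks the sample: comparability on the matched variables is gained at the cost of observations.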

Table III.

Robustness check results for the variable Statement score

	Model 1			Model 2
Robustness check	Coef.	SE	p-value	Coef.	SE	p-value
Matched sample (original score)	0.499	0.209	0.017	0.965	0.338	0.004
Matched sample (binary score)	0.650	0.333	0.051	1.002	0.554	0.071
Controlling for NYT articles	0.143	0.071	0.043	0.205	0.067	0.002
Dropping top 10 outliers	0.538	0.240	0.025	0.509	0.243	0.036
Dropping top 20 outliers	0.725	0.359	0.044	0.662	0.370	0.074
Dropping top 10 Cook’s D values	0.256	0.111	0.021	0.332	0.103	0.001
Dropping top 20 Cook’s D values	0.142	0.177	0.423	0.358	0.135	0.008
Dummy for score over median	0.727	0.276	0.008	0.488	0.278	0.079
Dummy for score over 75th percentile	0.867	0.332	0.009	0.950	0.335	0.005
Natural log of statement score	0.867	0.253	0.001	0.995	0.244	<0.001
Rank of statement score	0.008	0.003	0.005	0.007	0.003	0.008
10-day pre-MID collection period	0.224	0.074	0.003	0.252	0.068	<0.001
20-day pre-MID collection period	0.286	0.096	0.003	0.314	0.076	<0.001
40-day pre-MID collection period	0.265	0.084	0.002	0.312	0.072	<0.001
60-day pre-MID collection period	0.232	0.083	0.005	0.269	0.091	0.003
One-weight dictionary	0.354	0.113	0.002	0.423	0.102	<0.001
Ten-weight dictionary	0.074	0.023	0.002	0.088	0.022	<0.001
Dropping negative characterizations	0.322	0.099	0.001	0.386	0.093	<0.001
Dropping all statements but threats	2.287	0.755	0.002	2.737	0.741	<0.001
Controlling for a count of ‘the’	0.249	0.172	0.148	0.312	0.160	0.052
Controlling for a count of ‘that’	0.204	0.144	0.158	0.237	0.134	0.077
Retaining only one dyad per MID	0.235	0.073	0.001	0.319	0.072	<0.001
Dropping overlapping MIDs	0.236	0.091	0.009	0.403	0.087	<0.001
Dropping one-day MIDs	0.293	0.104	0.005	0.278	0.076	<0.001
Dropping non-revisionist MIDs	0.210	0.073	0.004	0.307	0.074	<0.001
Dropping MIDs with ‘released’ outcome	0.228	0.074	0.002	0.269	0.069	<0.001
Linear time trend	0.240	0.070	0.001	0.259	0.063	<0.001
Natural cubic spline of time	0.242	0.070	0.001	0.286	0.063	<0.001
President dummy variables	0.299	0.086	0.001	0.376	0.079	<0.001
No temporal controls	0.255	0.072	<0.001	0.272	0.066	<0.001
Dropping relative capabilities	0.151	0.064	0.019
Correcting proportional odds violations	0.447	0.136	0.001	0.634	0.157	<0.001
Heckman ordered probit	0.208	0.067	0.002	0.284	0.081	<0.001
Regular (non-robust) standard errors	0.226	0.081	0.005	0.271	0.097	0.005
Ordered logit	0.413	0.124	0.001	0.502	0.118	<0.001
OLS	0.057	0.019	0.003	0.054	0.019	0.005
Dropping unclear and missing outcomes	0.227	0.072	0.002	0.270	0.064	<0.001

After matching, I re-estimated Models 1 and 2 in the matched sample, using both the original continuous measure of statements and the binary measure used for matching. In each case, the statement score remained significant at the 90% confidence level or above. Matching is not a panacea for the problem of reverse causality because it cannot account for unobserved factors that might affect the probability of winning and the decision to make statements. Still, the matching results, together with the weaknesses in the logic underpinning the argument for reverse causality, should increase our confidence that reverse causality is not the most likely explanation for the relationship between statements and conflict outcomes.

Another way in which the result could give the wrong impression about the relationship between resolved statements and conflict outcomes is if the coding of statements is actually picking up the effect of salience because presidents are likely to make more statements about salient conflicts. Many of the factors already included as controls, such as the revision type, should capture the salience of disputes. However, I also tried controlling for salience in one additional way, by including a measure of the average number of New York Times articles per day mentioning the dyadic MID adversary during the dyadic MID time frame. This measure captures salience more directly since the Times staff arguably has a good sense of which issues are salient and also plays a role in making them salient. I found that the coefficient for statements of resolve remained significant after including this additional control, indicating that statements are not merely acting as a proxy for salience.

An additional concern about the result might be whether it is driven by a few outliers. There are several ways to examine this possibility. I initially tested to see how many observations with the highest statement scores I could drop without losing significance for the result. I found that resolved statements did not lose significance in either model until I dropped more than 20 of the top outliers. Next, I tested to see how many observations with the highest values of Cook’s distance, a commonly used measure of influence, I could drop. I found that I could drop the top 13 most influential observations from Model 1 and the top 21 most influential observations from Model 2 without losing the significant result.²¹ I also examined the influence of outliers by transforming the measure of resolved statements in various ways. I found that dummy variables coding whether the statement score was greater than its median value and whether it was greater than its 75th percentile value were also significant predictors of conflict outcome. In addition, I found that the statement score remained significant when I converted it to its natural logarithm and then to ranks.²²
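The transformations used in these outlier checks each compress the upper tail of the statement score in a different way, as a brief sketch shows. The scores below are hypothetical. The use of log(1 + score) is an assumption made here because scores of zero occur; this section does not say how zeros were handled before taking logarithms. A 75th-percentile dummy would follow the same pattern as the median dummy.

```python
import math
from statistics import median


def over_median(scores):
    """Dummy equal to 1 when a score is above the sample median."""
    cutoff = median(scores)
    return [int(s > cutoff) for s in scores]


def log_scores(scores):
    """Natural log of (1 + score); the added 1 is an assumption to handle zeros."""
    return [math.log1p(s) for s in scores]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical scores with one large outlier.
toy_scores = [0.0, 2.0, 0.0, 5.0, 40.0]
print(over_median(toy_scores))
print([round(v, 2) for v in log_scores(toy_scores)])
print(average_ranks(toy_scores))
```

In the rank and dummy versions, the outlying score of 40 is no further from its neighbours than any other adjacent pair, so a result that survives these transformations is unlikely to rest on a handful of extreme values.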

Another important test of robustness is determining if the results are robust to different methods of coding statements of resolve. I experimented with varying the time period for collecting statements and using different dictionaries and weighting schemes for measuring resolve. First, I varied the number of days before the dyadic MID when I started collecting statements between ten and 60 days. Second, I developed two alternate weighting schemes for my dictionary. In one, each word is weighted equally. In the other, words are weighted between one and ten. Third, I tried dropping certain statement types from the calculation of the statement score. I initially dropped only negative characterizations, and then I dropped demands and refusals as well. I found that none of these things made much difference in the significance of the result.

Although the tests above indicate that the result is not dependent on any particular choice in identifying or weighting statements of resolve, there might also be concern about whether the statement score is truly capturing the impact of resolved words or simply proxying for the frequency and length of statements. It is true that the score for resolved words is correlated with the length and frequency of statements because the majority of statements made about MID adversaries are resolved. However, if the words that the president says are meaningful, then resolved words should have a bigger impact on the conflict outcome than neutral words. To test this, I created a count of the word ‘that’ and a count of the word ‘the’ in the statements I collected. After normalizing these counts by the duration of statement collection, I inserted them one by one into Models 1 and 2. I found that the score for resolved statements remained significant at approximately the 85–95% confidence level in all of the regressions. In contrast, the neutral word counts did not reach significance even at the 25% confidence level, and two of their coefficients had negative signs. Therefore, it does appear that resolved words matter more than neutral words and that my score for statements of resolve is not solely serving as a proxy for the length of statements.
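A neutral-word control of the kind described above can be built with a simple whole-word count divided by the length of the collection window. The sketch below is only an illustration of that normalization: the example sentences, the function name, and the choice of days as the unit of duration are assumptions, not details from the article.

```python
import re


def word_rate(statements, word, collection_days):
    """Count whole-word, case-insensitive occurrences of `word` across
    all statements and divide by the number of days of collection."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags=re.IGNORECASE)
    count = sum(len(pattern.findall(text)) for text in statements)
    return count / collection_days


# Hypothetical statements, invented for illustration.
toy_statements = [
    "The United States will not accept that.",
    "That is the line.",
]
the_rate = word_rate(toy_statements, "the", collection_days=4)
that_rate = word_rate(toy_statements, "that", collection_days=4)
print(the_rate, that_rate)
```

Matching on word boundaries keeps ‘the’ from being counted inside words such as ‘that’ or ‘other’, so the two neutral counts remain distinct measures of statement length.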

Turning to another issue, although I have addressed concerns about the influence of particular observations, there might also be more general concerns about which observations are included. In particular, some observations might not be independent from each other, and others might not be serious disputes. To address whether non-independence of dyadic MIDs that are part of the same overall MID biases the result, I re-estimated the regressions in a subsample that contained only the dyad with the highest statement score in each MID. To address another source of non-independence, I tried dropping all of the overlapping MIDs with the same adversary. Finally, I weeded out minor and accidental MIDs by dropping MIDs which lasted only one day, MIDs in which neither side was coded as revisionist, and MIDs in which the MID dataset coded the outcome as ‘released’. In each case, statements of resolve retained significance.

I also performed a variety of other robustness checks. I tried various ways of controlling for time and dropping temporal controls altogether. I tried dropping Relative capabilities, which is the only variable that prevents observations after 2007 from being included in Model 1. I tried correcting violations of the proportional odds assumption (Williams, 2006), using a Heckman model to account for non-random sample selection,²³ using regular standard errors, using an ordered logit model, using OLS, and dropping the few observations with unclear or missing outcomes. In all cases, resolved statements remained significant. In addition, I tested for multicollinearity and found no evidence that it is problematic.²⁴ In sum, we can conclude that the statistical significance of resolved statements is highly robust.

Conclusion

This article has demonstrated a positive relationship between statements of resolve and the probability of achieving a more favorable MID outcome. This result is robustly statistically significant and is substantively significant as well, particularly for high levels of statements. As is common in statistical analysis, the possibility of reverse causality cannot be eliminated entirely, but my theoretical argumentation and testing suggest that it is not highly likely. On the whole, my findings give support to the hypothesis that all else equal, an increase in a country’s statements of resolve increases the probability that the country will achieve a favorable conflict outcome. This indicates that resolved statements, rather than being empty words, are indeed effective at communicating resolve.

Since this article only examines US statements of resolve, it is not entirely certain that the result would be equally applicable to other countries. The United States might be able to issue more effective statements due to its military prowess, the extent of its international commitments, or its democratic system of government. However, it is still an important step forward to establish that statements can be effective at all. Until now, the effectiveness of statements in conflict bargaining has been a prevalent, but largely untested, theory. This is the first study to find that resolved statements are effective using large-N analysis of real-world data on statements.

There are several directions for future research suggested by this article. One obvious direction would be to extend this research design to other countries. A second possible direction might be to examine resolved statements in a time-series or event history context to see how they evolve over the course of a conflict. Third, further research is necessary to understand why statements are effective, that is, whether this is due to domestic audience costs, international reputational costs, and/or some other mechanism. Finally, since this article only examines statements in MIDs, further research is necessary to determine whether resolved statements are also regularly issued in lower-level disagreements or for general deterrence and whether their effectiveness is similar in these situations.

Footnotes

Replication data

The online appendix, dataset, and Stata do-file for the empirical analysis in this article are available at .

Acknowledgements

I am grateful to my anonymous reviewers, Christopher Clary, Katja Favretto, Patrick Kearney, Andrew Kydd, Lisa Martin, David Ohls, Jon Pevehouse, Inken von Borzyskowski, and participants at the Wisconsin International Relations Colloquium, the 2012 ISA Convention, and the 2012 MPSA Conference for feedback. I am grateful to Dan Wood for sharing his coding materials.

Notes

References

Baum, Matthew A (2004) Going private: Public opinion, presidential rhetoric, and the domestic politics of audience costs in U.S. foreign policy crises. Journal of Conflict Resolution 48(5): 603–631.

Bennett, D Scott & Allan C Stam (2000) EUGene: A conceptual manual. International Interactions 26(2): 179–204.

Dafoe, Allan; Jonathan Renshon & Paul Huth (2014) Reputation and status as motives for war. Annual Review of Political Science 17: 371–393.

Downes, Alexander B & Todd S Sechser (2012) The illusion of democratic credibility. International Organization 66(3): 457–489.

Eyerman, Joe & Robert A Hart, Jr (1996) An empirical test of the audience cost proposition: Democracy speaks louder than words. Journal of Conflict Resolution 40(4): 597–616.

Fearon, James D (1994) Domestic political audiences and the escalation of international disputes. American Political Science Review 88(3): 577–592.

Fearon, James D (1995) Rationalist explanations for war. International Organization 49(3): 379–414.

Gartzke, Erik (2006) The affinity of nations index, 1946–2002. Version 4 (http://pages.ucsd.edu/~egartzke/datasets.htm).

Gelpi, Christopher F & Michael Griesdorf (2001) Winners or losers? Democracies in international crisis, 1918–94. American Political Science Review 95(3): 633–647.

Ghosn, Faten; Glenn Palmer & Stuart Bremer (2004) The MID3 data set, 1993–2001: Procedures, coding rules, and description. Version 4.0. Conflict Management and Peace Science 21(2): 133–154.

Gibler, Douglas M (2009) International Military Alliances, 1648–2008. Washington, DC: CQ.

Greenberg, Joel (2012) US, Israel at odds over drawing a ‘red line’ for Iran. Washington Post, 11 September: A9.

Guisinger, Alexandra & Alastair Smith (2002) Honest threats: The interaction of reputation and political institutions in international crises. Journal of Conflict Resolution 46(2): 175–200.

Hanmer, Michael J & Kerem Ozan Kalkan (2013) Behind the curve: Clarifying the best approach to calculating predicted probabilities and marginal effects from limited dependent variable models. American Journal of Political Science 57(1): 263–277.

Huth, Paul K (1988) Extended Deterrence and the Prevention of War. New Haven, CT: Yale University Press.

Iacus, Stefano M; Gary King & Giuseppe Porro (2012) Causal inference without balance checking: Coarsened exact matching. Political Analysis 20(1): 1–24.

Ignatius, David (2012) The ‘red line’ herring. Washington Post, 16 September: A23.

James, Patrick & Jean Sébastien Rioux (1998) International crises and linkage politics: The experiences of the United States, 1953–1994. Political Research Quarterly 51(3): 781–812.

Kurizaki, Shuhei (2007) Efficient secrecy: Public versus private threats in crisis diplomacy. American Political Science Review 101(3): 543–558.

Leventoglu, Bahar & Ahmer Tarar (2005) Prenegotiation public commitment in domestic and international bargaining. American Political Science Review 99(3): 419–433.

Maoz, Zeev (2005) Dyadic MID dataset (Version 2.0) (http://vanity.dss.ucdavis.edu/~maoz/datasets.htm).

Marshall, Monty G; Keith Jaggers & Ted Robert Gurr (2010) Polity IV project: Political regime characteristics and transitions, 1800–2010 (http://www.systemicpeace.org/inscrdata.html).

McManus, Roseanne W (2014) The role of statements of resolve in international conflict. Dissertation, University of Wisconsin–Madison.

Morgan, T Clifton; Navin A Bapat & Yoshiharu Kobayashi (2013) Threat and imposition of sanctions (TIES) data 4.0 users’ manual (http://www.unc.edu/~bapat/TIES.htm).

Netanyahu, Benjamin (2012) Speech to the United Nations General Assembly, 27 September. Quoted in Reuters, Key portions of Israeli PM Netanyahu’s U.N. speech on Iran, 27 September (http://www.reuters.com/article/2012/09/27/us-un-assembly-israel-text-idUSBRE88Q1RR20120927).

Partell, Peter J & Glenn Palmer (1999) Audience costs and interstate crises: An empirical assessment of Fearon’s model of dispute outcomes. International Studies Quarterly 43(2): 389–405.

Peters, Gerhard & John Woolley (2014) American Presidency Project. University of California, Santa Barbara (http://www.presidency.ucsb.edu).

Powell, Robert (2004) Bargaining and learning while fighting. American Journal of Political Science 48(2): 344–361.

Reuters (2012) Netanyahu takes case to U.S. public. Washington Post, 17 September: A7.

Sartori, Anne E (2005) Deterrence by Diplomacy. Princeton, NJ: Princeton University Press.

Schultz, Kenneth A (1999) Do democratic institutions constrain or inform? Contrasting two institutional perspectives on democracy and war. International Organization 53(2): 233–266.

Sechser, Todd S (2011) Militarized compellent threats, 1918–2001. Conflict Management and Peace Science 28(4): 377–410.

Simmons, Beth A & Daniel J Hopkins (2005) The constraining power of international treaties: Theory and method. American Political Science Review 99(4): 623–631.

Singer, J David; Stuart Bremer & John Stuckey (1972) Capability distribution, uncertainty, and major power war, 1820–1965. In: Bruce Russett (ed.) Peace, War, and Numbers. Beverly Hills, CA: Sage, 19–48.

Slantchev, Branislav L (2003) The principle of convergence in wartime negotiations. American Political Science Review 97(4): 621–632.

Slantchev, Branislav L (2006) Politicians, the media, and domestic audience costs. International Studies Quarterly 50(2): 445–477.

Slantchev, Branislav L (2010) Feigning weakness. International Organization 64(3): 357–388.

Smith, Alastair (1998) International crises and domestic politics. American Political Science Review 92(3): 623–638.

Snyder, Jack & Erica Borghard (2011) The cost of empty threats: A penny, not a pound. American Political Science Review 105(3): 437–456.

Tomz, Michael (2007) Domestic audience costs in international relations: An experimental approach. International Organization 61(4): 821–840.

Trachtenberg, Marc (2012) Audience costs: An historical analysis. Security Studies 21(1): 3–42.

Trager, Robert F (2010) Diplomatic calculus in anarchy: How communication matters. American Political Science Review 104(2): 347–368.

Trager, Robert F & Lynn Vavreck (2011) The political costs of crisis bargaining: Presidential rhetoric and the role of party. American Journal of Political Science 55(3): 526–545.

Voeten, Erik & Adis Merdzanovic (2009) United Nations General Assembly voting data (http://thedata.harvard.edu/dvn/dv/Voeten/faces/study/StudyPage.xhtml?studyId=38311&versionNumber=1).

Wagner, R Harrison (2000) Bargaining and war. American Journal of Political Science 44(3): 469–484.

Weeks, Jessica L (2008) Autocratic audience costs: Regime type and signaling resolve. International Organization 62(1): 35–64.

Werner, Suzanne (2000) The effects of political similarity on the onset of militarized disputes, 1816–1985. Political Research Quarterly 53(2): 343–374.

Williams, Richard (2006) Generalized ordered logit/partial proportional odds models for ordinal dependent variables. Stata Journal 6(1): 58–82.

Wood, B Dan (2012) Presidential Saber Rattling: Causes & Consequences. New York: Cambridge University Press.

Zakaria, Fareed (2012) The folly of a ‘red line’. Washington Post, 14 September: A19.