Abstract
Regional trade agreements (RTAs) all over the world share many of the same institutional features—from dispute-settlement mechanisms to legalized language to escape clauses—and espouse many of the same goals. But many of those features are unimplemented in different contexts across the world. Many observers have pointed to an “implementation gap” in RTAs—most of which promise much but deliver relatively little—but offer no explanation for this gap. I argue that this is explained by domestic capacity. In countries where the rule of law is weak and infrastructure is insufficient—and particularly when those elements interact—it is difficult to implement trade agreements. This article utilizes expert survey data to demonstrate that member states with poor infrastructure and little respect for the rule of law are unable to carry out the obligations they agree to in their trade agreements. Domestic capacity—both physical and institutional—can explain the implementation gap between what agreements promise and what they deliver, as well as the variation in the economic performance of these RTAs.
Introduction
International agreements shape the landscape of interstate cooperation today, and understanding how these agreements work is crucial to our grasp of the globalized world. Regional trade agreements (RTAs) in particular are the fastest growing type of international organization today, and they have been credited with bringing all the classically understood cooperative benefits of international organization (Hafner-Burton, 2005; Pevehouse, 2002). Yet, unlike the more prominent international organizations—such as the North American Free Trade Agreement (NAFTA) and the European Union (EU)—most of these agreements are signed exclusively among smaller, developing countries. Indeed, roughly one third of preferential trade agreements (PTAs) comprise countries that are outside the Organisation for Economic Co-operation and Development (OECD). This points to a critical problem in the study of the implementation of international agreements. Given the huge variance in the type of state that signs on to these agreements, we risk assuming that the properties of some of the better-known organizations in OECD countries apply to all of these agreements. Furthermore, the language of these agreements alone may be insufficient for understanding how RTAs actually function in practice.
This article argues that divergent capacities in member states impact the likelihood that agreements will be carried out. Even if governments have every intention of sticking to their international commitments, technical or infrastructural barriers in their own country can make compliance problematic. Indeed, why would we expect that in countries where infrastructure is cripplingly poor and where domestic legal structures are weak, signing onto an international agreement would magically sidestep those problems and enable heads of state to make credible commitments to certain policy paths—when decades of foreign aid and domestic policy reform have failed to do so? 1 Many scholars have argued persuasively that domestic capacity is crucial in examining the durability of international cooperation. 2 The literature in international law has long argued for the importance of domestic capacity in explaining compliance (Chayes & Chayes, 1993, 1998; Guzman, 2002; Raustiala & Slaughter, 2002; Simmons, 1998, 2002; Tallberg, 2002). But these arguments have not been subject to decisive, systematic testing, particularly in the area of RTAs.
Anecdotal evidence abounds that trade agreements often work differently in practice than they do on paper. In RTAs, states habitually make commitments to liberalization timelines that they do not achieve. For example, one scholar in Latin America dubbed the practice of approving norms that never come into force as the “Mercosur syndrome” (Vaillant, 2007) after the trade agreement among Paraguay, Uruguay, Brazil, and Argentina. Other scholars describe Mercosur’s record as consisting of “rising rhetoric and declining achievement” (Malamud, 2005) as a function of its poor performance even as a free-trade area by General Agreement on Tariffs and Trade (GATT) definition (Bouzas, 2002), let alone the customs union that the agreement claims to be. 3 Given that reality, it is difficult to evaluate the raft of international and regional agreements that have sprung up in the past 30 years on the basis of their language alone. The commitments to which countries agree at the bargaining table do not always translate into action on the ground—and with most international organizations lacking monitoring and enforcement mechanisms, researchers have few objective tools for examining the implementation records of these agreements in a systematic manner.
The contribution here is twofold. First, this article uses a new data set of expert surveys to assess the actual workings of these agreements. Due to the vast number and diversity of international agreements, it is often infeasible to measure in a systematic manner the actual functioning of these agreements. Expert assessments from think tanks, government offices, and trade agreement secretariats themselves add a whole new dimension to our understanding of how these agreements actually work on a day-to-day basis. Second, the central argument takes into account domestic capacity in examining the record of implementation of RTAs. These contributions add a critical dimension to our understanding of international cooperation.
This article proceeds as follows. The section below links together some of the broader arguments from the literature on international cooperation with the insights from the literature on comparative regionalism. The latter field has long advanced (though not systematically tested) the argument that infrastructural barriers and institutional paucity make meaningful international cooperation difficult, particularly in the area of trade liberalization, which depends on member states’ access to well-developed road networks and respect for the rule of law at the borders. I then set out testable hypotheses: Agreements should increase trade, and their “implementation gap” should decrease, if the rule of law is strong and infrastructure is well developed. I also propose an interaction between these two variables, such that the combination of poor infrastructure and institutions creates an added effect on implementation. After discussing rival explanations, I review the expert survey method—which is especially valuable in this context, where we want to capture elements of trade cooperation that are otherwise difficult to measure or that could not be operationalized by looking at the language of agreements alone—and describe this particular venture, which culls expert opinion from policy makers and commentators all over the world. Due to the relatively small number of observations—a common feature of expert surveys—I pursue a variety of different techniques to build an empirical case for the central argument. I also show that the rival explanations for the variation of effectiveness in trade agreements do not hold up to expert assessment of these agreements’ functioning. The final section concludes.
Understanding Regionalism Through Comparative and International Politics
The growth in PTAs in the international system has been mirrored by studies trying to examine their determinants and effects. 4 Based in part on the primacy of legal scholarship and the rational design arguments as well as the availability of data, the literature on preferential trading arrangements has long conflated institutionalization with effectiveness. 5 But many of those authors assume that the rule of law, if enshrined at the international level, will also be upheld at the domestic level. Implicitly, this is assumed to be true irrespective of the quality of courts or bureaucracy among the signatories of a given agreement. It ignores the possibility of what Wallace (1990) called “formal vs informal integration,” or de facto versus de jure integration (Higgot, 1998). Even within the EU, there has been a significant gap between its written rules and countries’ adherence to those rules (Bearce, 2009; Boerzel, 2005; Jensen, 2007). 6 There may be a gap between what is formally signed at the international level and what actually works in practice. Such a gap has long been recognized in the research on central bank independence, where it is implied that focusing on the legal texts is insufficient to understand the behavior of central banks (Cukierman & Webb, 2002; Cukierman, Webb, & Neyapti, 1992; Mas, 1995; Maxfield, 1994, 1998).
Only recently have international relations (IR) scholars begun to acknowledge the gap between what is agreed on and what is implemented in RTAs (Büthe & Milner, 2011; Haftel, 2007, 2013; Kim, 2011). 7 As of yet, though, few authors have offered a systematic explanation for this gap. So when and why do countries fail to live up to the agreements they have reached at the international level?
Most of the literature on comparative regionalism notes that wide variation exists in the actual economic output of RTAs (Breslin & Higgot, 2003; Payne & Gamble, 1996; Rosamond, 2007; Sbragia, 1992). Many of the prominent authors in this field focus on the track record of specific regional arrangements—and usually these scholars take a fairly dim view of these agreements. 8 Area specialists criticize the regional arrangements in Asia, for example, as failing to live up to their potential, particularly in terms of political implementation (Baldwin, 2007; Hughes, 1991; Ravenhill, 2002, 2008a, 2008b). In Asia, surveys estimate the utilization rate for free-trade agreements (FTAs) in the region to be around 24% (Takahashi & Urata, 2009), with only 3% for ASEAN’s FTA (Baldwin, 2007; Dayaratna-Banda & Whalley, 2005). In contrast, for European FTAs, many consider utilization rates below 50% to be low (Augier, Gasiorek, & Lai-Tong, 2005). Similarly, a comparative analysis of SADC (Southern African Development Community), MRU (Mano River Union), and COMESA (Common Market for Eastern and Southern Africa) in Africa has shown that all three have a “largely dismal track record” of implementation (Soderbaum, 2005), and even at the time of the formation of these agreements, many scholars did not hold out much hope for their success (Bourename, 2002). According to one study, it is “no secret that the majority of projects in SADC’s project portfolio were more of a direction of intent, even a ‘wishing list’ rather than a development plan with a realistic funding plan” (Soderbaum, 2005, p. 194). This points to the gap between what these agreements look like on paper and what they actually achieve. Even though these agreements boasted many of the calling cards of good international institutions—including dispute-settlement mechanisms and legalized language—these measures do not seem to have been implemented in any real sense.
In fact, among agreements in the developing world, most nations trade more heavily with developed countries—such as the United States and the EU nations—than they do with one another, despite the presence of regional agreements.
This literature also assumes that a regionalism that hinges on legalism, institutions, and the acceptance of supranationality will not function anywhere outside of Europe (Higgot, 1998; Jetschke, 2010; Laffin, 1998; Sbragia, 1992; Soderbaum & Sbragia, 2010; Wallace, 1990). This, many scholars argue, comes at least in part from the fact that many of these agreements simply copy the structures of the more prominent regional arrangements (Boerzel & Risse, 2009a; Jo & Namgung, 2012; Lenz, 2012). The Andean Community was explicitly modeled on EU institutions, as was Mercosur to some extent (Camargo, 1999; Saldías, 2007; Vasconcelos, 2007). Although this modeling was not coerced in its initial stages, in the late 1990s, the EU began offering incentives in the form of structural funds to promote the institutionalization of the Andean Community, Mercosur, Caribbean Community (CARICOM), and various African agreements (Boerzel & Risse, 2009a; Schimmelfennig, 2009). But these institutions—such as courts and consultative bodies—are scarcely used (Bouzas & Soltz, 2001). Similarly, the 2007 ASEAN Charter “heavily emulates EU concepts and terminology and presents what could have been a lean version of the Constitutional Treaty” (Boerzel & Risse, 2009b, p. 13). Despite all that, however, ASEAN still lacks true supranational authority, with no independent authority of delegates to reach decisions and no independent dispute-settlement mechanism. Along those same lines, the Southern African Customs Union’s (SACU) revised treaty, from 2002, established an intergovernmental institution to take charge of creating and implementing policies related to the customs union and delegated authority for trade negotiation. This, however, is the only example of its kind among all the trade agreements in Africa.
For example, even though the revised treaty of the Economic Community of West African States (ECOWAS), from 1993, incorporates principles of supranationality in the style of the EU, member states do not adhere to those principles (Bach, 1983). Thus, institutional design alone may give researchers an overly optimistic view of these agreements, given how closely their design resembles that of effective organizations. Indeed, often there is little domestic capacity for implementing the substance of these arrangements, no matter what government intentions may be.
Domestic Capacity and the Implementation Gap
Trade cooperation is different from simply agreeing that countries will not fight with one another. Meaningful reciprocal trade liberalization involves not just the lifting of tariffs, which is essentially a matter of government commitment and its resolve in overcoming domestic interests that might oppose liberalization (Grossman & Helpman, 1995). It also means that—even if heads of state and key parts of the domestic public are willing to lift protection on sectors—the country must have the basic infrastructure to realize the promised gains from trade as the result of a new agreement. Thus, for agreements to be considered successful, previously unrealized trade potential must be unlocked. In other words, trade must actually increase to an extent that would not have been possible without an agreement. 9 This basic requirement aside, many agreements add more ambitious goals. If they establish dispute-settlement mechanisms, those mechanisms must be deployed by member states and deliver just verdicts. If they tackle trade issues that go beyond the simple reduction of tariffs on goods—such as trade in services or the reduction of nontariff barriers—those commitments must be upheld as well. All these add up to a level of compliance, technical competence, and effectiveness that would be difficult to achieve no matter how honorable government intentions might be. Without sufficient domestic capacity, the promise of international agreements may not always be fulfilled.
We know from the literature on the political economy of development that implementing effective policy reform can hit all sorts of barriers in countries where domestic institutions are weak and the stock of basic facilities and capital equipment is inadequate. Without measures specifically targeting these areas, it is perhaps unreasonable to expect that government commitments at the international level will translate into action domestically. Indeed, poor infrastructure is one of the major hindrances to the distribution of goods all over the world. Developing countries especially tend to have serious problems with basic transportation infrastructure—in particular ports, roads, and airports, all of which are necessary for businesses to send their goods abroad and for other countries’ goods to reach domestic consumers.
This potential obstacle has been noted in studies examining the record of countries’ trade flows altogether (Dollar, 1992). Similarly, many studies have found that infrastructure is a stronger determinant of growth in trade than tariff reduction (Bougheas, Demetriades, & Morgenroth, 1999). Others have noted that infrastructure in developing countries poses a significant barrier even if policy reform lists toward openness (Buys, Deichmann, & Wheeler, 2010; Edwards, 1993; Hill, Chae, & Park, 2012; Limão & Venables, 2001). As of yet, however, this link has not been carried over to evaluating the effectiveness of international agreements involving trade.
Road transportation plays a crucial role in cross-border trade to and from third countries, and politicians and policy specialists have also long acknowledged this problem. One of the obstacles of integration in Asia, for example, which businessmen have long acknowledged, is the lack of harmonization of infrastructure; current calls for deeper integration hinge on “developing region-wide infrastructure that would help ease the distribution of goods.” 10 Similarly, in Namibia and Botswana, transportation-related transaction costs for moving a 20-foot full container load from a port to its final destination come to US$3,000 and US$5,000, respectively, compared with US$800 and US$500 for Germany and Sweden, respectively. 11 Poor infrastructure accounts for much of the cost. Indeed, SADC’s executive secretary stated in 2008 that any increases in intra-SADC trade hinged on “a vibrant transport infrastructure.” 12
In addition, the implementation of international agreements on the domestic level requires that contracts of all types be enforced on the ground. Upholding the rule of law is another prerequisite for maintaining credible commitments. Contract enforcement and the respect for legal commitments at the level of the member state more generally should factor heavily into whether a country is able to fulfill its international commitments. The role of institutions such as the rule of law in the successful implementation of domestic policy reform has been exhaustively documented (Acemoglu, Johnson, & Robinson, 2002; Rodrik, 1999), and though many claim that this relationship spills over into the international arena as well (Büthe & Milner, 2008; Mansfield & Reinhardt, 2008; Martin, 2003), the mechanisms behind this are infrequently tested.
Here, too, anecdotal evidence of the difficulty of implementing trade agreements in the absence of the rule of law abounds. A business survey for SADC found that intra-SADC exports and imports represented the smallest share of trade for every member country, and that businesspeople cited inadequate transportation infrastructure, along with varying institutional quality and corruption, as deterrents to their engaging in greater regional trade. 13
One COMESA representative described the situation as follows:
Governments are always facing these problems once the agreements actually come face to face with the lower officials. Customs can be a problem. It is one thing to agree on some standard when goods cross borders, but is the guy sitting on the border in the customs office actually going to take this into account? Maybe, maybe not. 14
There is reason to believe that there is also an interactive effect between these two variables. On their own, both might have significant explanatory power, but a country can have top-quality infrastructure and be relatively corrupt, and that infrastructure will serve no purpose in trade promotion. Similarly, countries with strong governance but poor infrastructure will not be able to put their good intentions into practice. Thus, these two variables in combination could also be expected to have a positive effect on the implementation of RTAs.
In sum, we might expect that the same factors that make trade possible at the domestic level—specifically, domestic capacity in terms of infrastructure as well as institutions—will also make leaders able to uphold the trade commitments that they make at the international level. 15 To put it in a more succinct and falsifiable manner, I outline the following testable hypotheses:
Hypothesis 1: Where the rule of law in member states is strong, the ability of the trade agreements to increase trade and to meet their own goals will be high.
Hypothesis 2: When infrastructure in member states is good, the ability of the trade agreements to increase trade and to meet their own goals will be high.
Hypothesis 3: Good infrastructure interacted with strong rule of law will have an added, positive effect on the ability of the trade agreements to increase trade and to meet their own goals.
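One way to make the three hypotheses concrete is to write them as a single linear specification at the agreement level. This is only an illustrative sketch: the notation below (the subscript a for agreements, the variable names, and the control vector X) is an assumption introduced here, and the text up to this point does not commit to a particular estimator.

```latex
% Illustrative agreement-level specification (notation assumed, not from the source).
% Y_a     : expert score for agreement a (trade generated, or scope fulfilled)
% RoL_a   : average rule of law among members of agreement a
% Infra_a : average infrastructure (road density) among members of agreement a
% X_a     : rival explanations (e.g., civil law tradition, veto players, log average GDP)
\begin{equation}
Y_a = \beta_0 + \beta_1\,\mathrm{RoL}_a + \beta_2\,\mathrm{Infra}_a
      + \beta_3\,\bigl(\mathrm{RoL}_a \times \mathrm{Infra}_a\bigr)
      + \gamma^{\top} X_a + \varepsilon_a
\end{equation}
```

Under this reading, Hypothesis 1 corresponds to a positive coefficient on the rule of law, Hypothesis 2 to a positive coefficient on infrastructure, and Hypothesis 3 to a positive coefficient on their product. Once the product term is included, the first two coefficients describe the effect of each variable when the other equals zero, so they are best interpreted together with the interaction coefficient.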
These explanations stand in contrast to arguments that hinge on the binding nature of formal commitments or on political arrangements in a given region. For example, Duina (2008) argues that nations with civil law as their dominant legal tradition will have more institutionalized regional arrangements, whereas states with traditions of common law will take a more gradual approach to institutionalization. Many authors (Koremenos, Lipson, & Snidal, 2001; McCall Smith, 2000) equate that institutionalization with the effectiveness of the agreements. Others argue that the number of veto players in a country—that is, actors who could potentially stand in the way of reform—dictates the success of the implementation of meaningful policy change. 16
Another possibility is that domestic capacity could have a direct effect on trade itself, rather than an indirect effect through the ability of trade agreements to be effective. We might still see a positive relationship between capacity and agreement effectiveness, but the channel might come through domestic capacity’s impact on trade rather than on the ability of those agreements to work well. Thus, experts might be more likely to say that PTAs work well in those cases where the member states are trading significantly with one another, regardless of the real reason for the trade.
These additional arguments will be evaluated alongside the hypotheses put forward above. The following section tackles these concerns head on through empirical testing of these propositions.
Operationalization and Data Analysis
First, how might we effectively capture the degree to which RTAs actually work? Many scholarly efforts in the social sciences involve substantial, extensive coding efforts that hinge on the provisions covered in various treaties. 17 Many of those coding efforts, however, arrive at different conclusions. 18 This is understandable and probably even reflective of reality: Agreements that seem rigorous by one person’s standard may be less convincing to another. What this reflects, though, is an implicit acknowledgment that the effectiveness of these agreements lies to some extent in the eyes of the beholder. There is a latent dimension of effectiveness that has yet to be captured by simply looking at, say, trade flows or tariff reduction among members of a given agreement.
This points to an opportunity to implement expert surveys, a method that is widely used in comparative politics but as of yet has made relatively little headway in IR. 19 Expert surveys are useful when we want to capture latent dimensions of a particular issue for which data may not exist. 20 International agreements are a strong candidate for this method. Some aspects of these agreements—in the case of RTAs, their product coverage, implementation schedules, degree of legal obligation imposed, and trade targets—are measurable, simply through looking at the agreements that member states sign. But the most important aspects of those agreements—whether member states implement the product coverage thanks to the RTA or to some other force, whether they respect the agreement’s legal language in practice, and whether the changes in trade match the full potential of the region—are not directly measurable, despite anecdotal evidence that might support one conclusion or another. However, that information can be gathered through systematic surveys of experts who deal directly with those agreements as they function in the world.
A drawback of these survey methods is that they are by definition a snapshot in time: They cannot capture the changes in an agreement’s life span. For example, Mercosur by many accounts started out as relatively successful in liberalizing trade among its member states, but as the “easy” products were liberalized, states simply began to supplant tariffs with nontariff barriers, and today the bloc is considered to be somewhat protectionist. 21 Similarly, ASEAN started out with a very weak commitment to institutionalization and integration, but after the East Asian financial crisis in 1997, it strengthened its mandate for economic cooperation (Ravenhill, 2004). As these surveys were conducted at essentially the same time (all between 2008 and 2010), they only represent expert assessment of a given RTA’s performance at that moment. This unfortunately precludes the possibility of a dynamic analysis of these agreements. It also means that the number of cases will be relatively small; there are only so many RTAs in the world, and the lack of a temporal component to the data means that the analysis will be cross-sectional, not cross-time. In addition, because the survey focused on the more institutionalized aspects of the agreements, experts tended to score not the 400-some PTAs currently in existence, but rather the agreements with secretariats or those that have a permanent physical structure with some level of permanent bureaucracy; this reduces the number of possible agreements to around 40 (Haftel, 2013). However, the advantage of obtaining quantifiable measures of the functionality of these agreements from the people who are involved on a regular basis with those organizations outweighs those potential downsides.
To this end, this article utilizes the expert survey method to cull an assessment of the functioning of regional and preferential trade arrangements in the international system. 22 Data come from Gray and Slapin (2012). Surveys were distributed between 2008 and 2010, to experts from trade negotiating offices and think tanks in the United States, Europe, Southern Africa, Asia, and Latin America. The response rate was about 15% for a total of 27 experts. This is typical for most surveys, where response rates are usually around 20%. Experts were guaranteed anonymity, though their place of work was recorded. Table 1 lists the agreements scored by experts.
Table 1. List of Agreements.
Experts rated various dimensions of regional agreements along a 10-point scale, similar to “feeling thermometer” questions in surveys. The questions presented a particular dimension with statements that characterized either end of the extreme. The survey data span many different areas of a regional agreement’s functioning, but to best capture the argument put forward in this article, I focus on just two: the first on trade generated, and the second on whether the agreement fulfills all the agenda items it covers. The first is the trade among RTA members that might have otherwise gone to other countries. 23 This dimension captures the trade that would have gone to countries outside of the preferential arrangement but instead goes to members of the RTA. It is crucial to have expert opinion in quantifying this dimension because otherwise the counterfactual is almost impossible to attain; how would we know whether trade generated among members of a regional arrangement might have gone somewhere else in the absence of that agreement? The scores on this variable (for which there are 28 agreements ranked) range from 1 to 8, with a mean of 4.4. It is worth noting that the standard deviation for these scores is 2.04, which indicates that there is a relatively tight consensus of experts as to how effective these agreements are in generating trade that would have otherwise been impossible without the agreement.
The second operationalizes whether the items in an agreement’s scope (its breadth, not the number of members) are actually fulfilled. 24 Again, expert assessment is necessary here to assess whether the member states of a trading arrangement have actually lived up to the international commitments that they made. This variable (n = 33) ranges from 0.1 to 9.5 and has a mean of 2.86, with a standard deviation of 2.09—again, representing a relatively solid expert consensus. It should also be observed that the average value of this variable is relatively low given the range, a sign that many of these agreements do not implement the full breadth of their scope. These two measures will serve as the dependent variables in the analysis to follow.
To capture the explanatory variables described in the hypotheses above, I proceed as follows. I operationalize the rule of law, as proposed in the first hypothesis, using the International Country Risk Guide’s (ICRG) measure of the rule of law across countries. ICRG assigns risk points to a group of political risk components ranging from 0 to around 10; the original variable is scaled such that the higher the score, the higher the level of risk, but I invert this scale to make the measure more intuitive. It should be noted that this is a relatively blunt measure of institutional capacity. Veto players, oversight mechanisms (Pahre, 1997), or partisan politics such as party ideology or whether party preferences are mainstream or “niche” (Jensen & Spoon, 2010) can drive the actual workings of institutional structures. For example, we can imagine that generalized rule of law works similarly in developed countries, but different court structures or political factors might result in different types or degrees of implementation. As is true for many comparative coding efforts, though, there is a trade-off between the level of detail and overall coverage in data that seek to capture institutional capacity. Many of these efforts are restricted to a few countries, and usually those in Western Europe or the OECD. Because the survey data set covers agreements all over the world, these more elaborate measures unfortunately cannot be used in this particular setting.
I measure infrastructure as the ratio of the length of the country’s total road network to the country’s land area, a variable gathered by the World Bank. 25 I collapse these variables into averages for all the countries in a given agreement. Higher values on both of these variables—that is, higher road density and stronger rule of law—should lead to RTAs with better implementation records.
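The road-density measure and its collapse to the agreement level can be sketched as follows. This is a minimal illustration, not the article's code: the country names and figures are hypothetical, the units (kilometers and square kilometers) are an assumption, and the plain average shown here is only one reading of "averages".

```python
from statistics import fmean


def road_density(road_length_km: float, land_area_sq_km: float) -> float:
    """Length of the road network divided by land area (km of road per sq. km)."""
    if land_area_sq_km <= 0:
        raise ValueError("land area must be positive")
    return road_length_km / land_area_sq_km


# Hypothetical members of one agreement: (road length in km, land area in sq. km).
members = {
    "Country A": (120_000.0, 400_000.0),
    "Country B": (15_000.0, 300_000.0),
}

# Country-level densities, then a simple (unweighted) agreement-level average.
densities = {name: road_density(*figures) for name, figures in members.items()}
agreement_density = fmean(densities.values())
```

In this made-up example one member has six times the road density of the other, and the agreement-level value sits halfway between them, which shows how averaging can hide large differences among members.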
In addition, I examine whether member states tend to have civil law traditions; as mentioned above, these should lead to more highly institutionalized—and, some argue, more effective by definition—regional arrangements (Duina, 2008). Countries with civil law systems—that is, structures based on Roman law, which give precedence to written law and a systematic codification of their legal system—make up nearly 40% of those worldwide. These data are available from the website on World Legal Systems. As with the rule of law measure, this general distinction can mask important differences. For example, Falkner (2010) describes crucial variation among Central European legal systems, even though all of them are technically civil law systems; Moe (1990) and Moe and Caldwell (1994) elaborate on key differences between U.K. and U.S. legal systems, although both are common law; and Alivizatos (1995) discusses the importance of judges in variation in policy implementation even within the same legal system. Again, a more precise analysis of these structures is limited by the availability of detailed coding efforts that span the globe, but the distinction between civil law and common law is a first cut.
To capture the argument about veto players, I use Henisz’s (2002) data on political constraints, which count the number of actors who serve as veto players in a given country; a higher number of constraints makes policy reform less likely. As before, this measure is not ideal; one would at least want to know the positions of the relevant veto players toward compliance and not just the overall number. However, especially for developing countries, data on ideological preferences regarding treaty compliance or other aspects of regional trade more generally are not currently available. A simple count nonetheless generates predictions for the overall tractability of the political environment, so it is a satisfactory proxy in this case.
Finally, to demonstrate that the expert survey assessments are capturing a direct effect on agreement performance and not on trade itself, I construct a proxy variable for the expected levels of trade among PTA members. In line with the literature on gravity models, I calculate the average distance between the capitals of members in the agreement, logged to normalize the distribution.
I turn all of these monadic explanatory variables into variables at the agreement level by constructing a weighted average (weighted by population) 26 for each variable for all the countries in the agreement for 2008. For countries that signed onto those agreements after the organization’s initial formation, I used the values for the explanatory variables at the year of their joining. The following section conducts analysis on these data.
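The aggregation step can be illustrated with a minimal sketch. The function name, the dictionary keys, and the numbers below are hypothetical and chosen only for illustration; they are not the article's data or code. The sketch assumes each country record carries its agreement, its population, and the value of one explanatory variable.

```python
from collections import defaultdict


def agreement_weighted_means(rows, variable):
    """Collapse country-level values into population-weighted agreement means.

    rows: iterable of dicts with keys 'agreement', 'population', and `variable`
    (hypothetical field names used only for this sketch).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in rows:
        value = row[variable]
        if value is None:  # skip countries with missing data
            continue
        weighted_sum[row["agreement"]] += row["population"] * value
        weight_total[row["agreement"]] += row["population"]
    return {a: weighted_sum[a] / weight_total[a] for a in weighted_sum}


# Hypothetical illustrative values, not the article's data.
countries = [
    {"agreement": "A", "population": 10.0, "rule_of_law": 1.0},
    {"agreement": "A", "population": 30.0, "rule_of_law": 0.0},
    {"agreement": "B", "population": 5.0, "rule_of_law": -0.5},
    {"agreement": "B", "population": 5.0, "rule_of_law": 0.5},
]
means = agreement_weighted_means(countries, "rule_of_law")
```

Weighting by population means that a large member state dominates the agreement-level score, which is one reasonable reading of the procedure described above; an unweighted mean would treat every member equally.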
Data Analysis
First, let us examine how these survey data hold up against the main independent variables. Figures 1 to 4 show simple bivariate plots of these relationships, with best-fit lines drawn in.

Figure 1. Increased trade and the rule of law.

Figure 2. Increased trade and infrastructure.

Figure 3. Implementation and the rule of law.

Figure 4. Implementation and infrastructure.
Starting with the first two graphs, it seems as though substantial variation exists in the degree to which regional arrangements promote trade. Experts deem about half of the agreements under consideration to have been effective at generating trade among members who might have otherwise gone elsewhere. Some of the better known agreements—including the EU, Mercosur, and NAFTA, along with the Central European Free Trade Agreement (CEFTA) and the Gulf Cooperation Council (GCC)—are credited with the highest levels of trade among members. Agreements at the lower end of this scale include, perhaps unsurprisingly, lesser known ones, including the Economic Community of Central African States (ECCAS), the Melanesian Spearhead Group (MSG), and the Organization of East Caribbean States (OECS). The average rule of law in member states explains about 30% of the variation in trade generation (Pearson’s r = .55). As predicted in Hypothesis 1, where member states have higher average values on the rule of law—such as in the European agreements and in NAFTA, as well as the South Pacific Regional Trade and Economic Co-operation Agreement (SPARTECA), a South Pacific trade agreement in which Australia is the largest economy—they also seem to be associated with what experts deem high levels of trade. However, this variable falls short in describing a few of the agreements; Mercosur and the South Asian Free Trade Agreement (SAFTA) have relatively weak rule of law in their members and yet manage to promote trade. In terms of road density, though the overall correlation is fairly high (r = .61), we see that it is the higher end of trade volumes that is less well explained; the Commonwealth of Independent States (CIS), the Arab Maghreb Union (AMU), and the EU all have relatively high road density on average, but their ability to promote trade among their members ranges from the very top of the scale to the very bottom.
This foreshadows the need to include an interactive component to these two explanatory variables; on their own, they do not fully explain the variation in the trade-promoting power of these arrangements.
The second two graphs depict the relationship between those same explanatory variables and the implementation record of these agreements (specifically, the breadth of their scope and their ability to meet those goals). First, we see that the pattern on the outcome of implementation scores is somewhat similar to that for the trade scores; the better known agreements in the OECD countries (such as the EU, CEFTA, and NAFTA) score higher, whereas agreements in the developing world (MSG, Economic and Monetary Community of Central Africa [CEMAC], and ECCAS) score lower. Interestingly, there is some divergence: Both SAFTA and SPARTECA scored relatively high on trade promotion, but their implementation record in terms of the breadth of their commitments is lower. This might indicate that, although these agreements are somewhat successful in generating trade, they have not upheld the other aspects of their agreements.
Comparing the implementation record with the domestic capacity variables shows a pattern similar to the one above; though the rule of law is highly correlated with implementation (r = .61), it does not tell the whole story. SPARTECA, for example, is a notable outlier, with implementation scores far lower than one might expect given its governance. That variable, again, does have explanatory power at the lower end of the scale; some of the worst performing agreements also have low average governance quality among their member states. The story is similar for road density; the correlation is fairly high (r = .40), but many agreements whose members have dense road networks vary widely in implementation, with the EU at the top and SAFTA and the CIS at the bottom. Again, this shows that these variables explain a nontrivial part of the variation in these agreements but are not sufficient on their own to describe the full picture.
Combining the variable for road density with that for the rule of law, however, shows how these two factors can work in tandem to explain the implementation of RTAs. Figures 5 and 6 show the relationship between those same two outcomes and the interaction between the rule of law and road density.

Figure 5. Implementation and capacity (Law × Infrastructure).

Figure 6. Trade and capacity (Law × Infrastructure).
We see that the best-fit line is much tighter once these variables are combined. Similarly, the correlation coefficients for both of these outcomes are substantially higher once these two variables are interacted: .561 for implementation and .69 for trade. The most notable outliers appear on the implementation graph, where SPARTECA, the CIS, AMU, and SAFTA still have relatively lower implementation scores than would be predicted by the combination of good roads and respect for law. This indicates that the prevalent lack of success of PTAs still needs further theorizing. Yet, the interaction of law and infrastructure seems to be highly correlated with well-functioning trade agreements. The following section evaluates some of the previously described rival explanations through nonparametric tests.
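As a rough illustration of how an interacted capacity measure can be compared with its components, the sketch below multiplies the two agreement-level variables and computes Pearson's r against an outcome. All values are hypothetical, and the sketch assumes both components have been rescaled to the 0 to 1 range so that their product is easy to interpret; the article does not state how the interaction was scaled.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical agreement-level scores, assumed rescaled to [0, 1].
rule_of_law = [0.2, 0.9, 0.4, 0.8, 0.1]
road_density = [0.8, 0.9, 0.3, 0.2, 0.1]
implementation = [2.0, 5.0, 1.5, 2.5, 1.0]  # hypothetical expert scores

# Interaction term: capacity is high only when both components are high.
capacity = [law * roads for law, roads in zip(rule_of_law, road_density)]

r_law = pearson_r(rule_of_law, implementation)
r_roads = pearson_r(road_density, implementation)
r_capacity = pearson_r(capacity, implementation)
```

The product form encodes the idea that neither strong law nor dense roads is enough on its own: an agreement scores high on capacity only when its members score high on both.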
Rival Explanations
Of course, these correlations do not take into account possible alternative explanations. As described in the section above, many argue that institutionalization can predict effectiveness, or that domestic political constraints in the form of veto players can make actual policy reform less likely, or that the expert survey data are simply capturing trade increases and not the effects of the agreements themselves. The relatively small data set presents a challenge for evaluating these claims through standard techniques, but I use two different correlation tests to do so. Nonparametric tests are more appropriate for this data set, but stripped-down ordinary least squares (OLS) models revealed the same fundamental pattern as what those tests and the scatterplots describe: positive and statistically significant associations with both trade and implementation for infrastructure and the rule of law. 27
I present the correlations of the political constraints variable as well as that for civil law systems alongside those of the variables put forward in the main hypotheses. To this end, I use Spearman’s nonparametric (or distribution-free) rank statistic, which evaluates how well the relationship between two variables can be described using a monotonic function. This technique converts the raw scores X_i and Y_i to ranks (x_i and y_i), and then calculates the differences between the ranks of each observation. If two variables are perfectly monotonically related, even if the relationship is not linear, the Spearman correlation will be 1 (or −1 for a decreasing relationship), whereas the Pearson correlation can fall short of that. I also calculate Kendall’s tau, which is specifically intended to be used with small- and moderate-sized data sets, as is the case here. I display below the τa statistic, which tests the strength of the association of cross-tabulations, as well as the τb statistic, which makes adjustments for ties. The Spearman and the Kendall statistics test the null hypothesis of independence of any two variables.
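The rank statistics described above can be sketched in a few lines. This is an illustrative implementation using only the standard library, not the code behind the article's results, and the example data are hypothetical. Spearman's coefficient is computed here as the Pearson correlation of average ranks, which matches the rank-difference formula when there are no ties; Kendall's τa and τb follow their usual definitions in terms of concordant and discordant pairs.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Return (tau_a, tau_b); tau_b adjusts the denominator for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    tau_a = (concordant - discordant) / n_pairs
    tau_b = (concordant - discordant) / math.sqrt(
        (n_pairs - ties_x) * (n_pairs - ties_y)
    )
    return tau_a, tau_b


# Hypothetical data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
tau_a, tau_b = kendall_tau(x, y)
r = pearson_r(x, y)
```

In this example the rank-based statistics reach their maximum because the ordering of the two variables agrees perfectly, while Pearson's r stays below 1 because the relationship is curved.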
Table 2 shows the correlations between the two outcome variables of interest (trade and implementation) with the main explanatory variables in the hypothesis, as well as the variables capturing the rival explanations.
Table 2. Spearman’s and Kendall’s Tau Correlations.
* p < .01, ** p < .05, *** p < .10
We see that—consistent with the presentation of the data in the scatterplots—the correlations between the variables describing well-functioning trade agreements and those describing domestic capacity (in terms of the rule of law, infrastructure, and their interactions) are all positive and statistically significant below the .10 level. This confirms the three hypotheses described above. The relationship is particularly strong when describing increased trade. This further confirms the pattern shown in the scatterplots; domestic capacity is a clear and significant driver of increased trade and implementation across regional agreements.
By contrast, neither the variable for institutionalization (recall the prediction that countries with civil law traditions will prefer more institutionalized international arrangements, which in turn should arguably be more effective) nor that for domestic veto players (which should lead to decreased effectiveness) has a statistically significant correlation with the outcome variables; the correlations are not significantly different from zero. In addition, the proxy for expected trade (distances from capitals in the agreements) is not significantly correlated with either of the outcome variables, which perhaps makes sense given the lack of pattern of distances in a wide variety of agreements that do not perform particularly well. This helps rebut those alternate explanations.
This very small sample limits what we can estimate in terms of a causal effect, as well as the ability to control directly for the effects of these variables. Nonetheless, it is evident that the basic correlations of the outcomes with standard gravity-model predictions and with other political variables are not as strong as those with domestic infrastructure and the rule of law.
Conclusion
This article has proposed that international agreements are only as effective as the domestic capabilities of their member states. Specifically, the implementation gap in RTAs as well as their ability to actually generate trade that would not have otherwise been created can be explained in large part by member states’ domestic infrastructure as well as the strength of the rule of law in those states. If these two areas are lacking, government commitments at the international level will not be effectively carried out. The article uses expert survey data to give quantitative assessments of the actual implementation record of these agreements. In a departure from much cross-sectional research, I show that expert opinion on the performance of these agreements correlates strongly with variables measuring domestic infrastructure and the rule of law.
In sum, without taking into account the characteristics and limitations of member states, we risk being left with an overly optimistic view of the possibilities of international agreements to constrain or otherwise change the behavior of those states. Furthermore, without explicitly measuring how these agreements work in practice—and not just what they look like on paper—we cannot correctly identify the real degree to which states have tied their hands. Thus, an understanding of the extent to which international agreements of any sort compel states to make credible commitments is incomplete without seeing how states actually internalize those agreements, as well as giving serious consideration to the conditions on the ground in those states that make implementation possible in the first place.
These findings also have implications for the literature on the rational design of institutions. If domestic capacity is low, no institutional design—no matter how ambitious or legalized—can make states do what they are fundamentally unable to accomplish at the domestic level. If that is the case, why, then, do states sign on to these commitments? Are they unaware of their own capacity limitations? Do they sign these agreements in the hope that they will spur reform, and then end up disappointed? Is it simply an empty promise that they have no intention of carrying out? The incentives that drive heads of state to sign overly ambitious agreements deserve extensive further study.
Future research should explore in greater detail the domestic constraints that make government commitments difficult to implement. This is an important consideration, because the resolve of leaders at the international level—even if it results in strong and ambitious agreements—will not amount to much unless explicit attention is given to those leaders’ ability to carry out their promises at the domestic level. Trade agreements, then, might be better served by taking domestic capacity into account before they set lofty goals that their members might not be able to meet.
Footnotes
Acknowledgements
Thanks to Despina Alexiadou, Alex Baturo, David Bearce, Daniela Donno, Yoram Haftel, Michael Goodhart, Alexandra Guisinger, Raymond Hicks, Joe Jupille, Moonhawk Kim, Soo-Yeon Kim, Helen Milner, Ed Mansfield, Krzysztof Pelc, Nita Rudra, Alberta Sbragia, Beth Simmons, and Jon Slapin for comments. My gratitude as well to James Caporaso and two anonymous reviewers for helpful suggestions.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Thanks to the University of Pittsburgh’s European Union Center of Excellence for financial support for the fieldwork behind this article.
