Abstract
To understand the limitations of discrete regime type data for studying authoritarianism, I scrutinize three regime type data sets provided by Cheibub, Gandhi, and Vreeland; Hadenius and Teorell; and Geddes. The political narratives of Nicaragua, Colombia, and Brazil show that the different data sets on regime type lend themselves to concept stretching and misuse, which threatens measurement validity. In an extension of Fjelde’s analysis of civil conflict onset, I demonstrate that using the data sets interchangeably leads to divergent predictions, that the results are sensitive to outliers, and that the data ignore certain institutions. The critique expounds on special issues with discrete data on regime type so that scholars can make more informed choices and are better able to compare results. The mixed-methods assessment of discrete data on regime type demonstrates the importance of proper concept formation in theory testing. Maximizing the impact of such data requires the scholar to make more theoretically informed choices.
Introduction
Scholarship on authoritarian regimes has seen a surge in quantitative studies investigating the effects of institutional choice on various outcomes. Much of the recent work makes use of discrete data on authoritarian regime types. By pointing to the limitations of using a continuous, unidimensional score to characterize regimes, Gleditsch and Ward (1997) and Vreeland (2008) highlight the importance of differentiating authoritarian regimes into distinct types. Some of the available regime classifications focus almost exclusively on adding nuance to transitioning states that are semidemocratic. Others, such as Cheibub, Gandhi, and Vreeland (2010); Geddes (2003); Hadenius and Teorell (2007); and Wright (2008), are concerned with classifying the full breadth of authoritarianism. This latter group of data sets is the subject of inquiry. In this article, I evaluate data sets provided by Cheibub et al., Geddes, and Hadenius and Teorell and consider their impacts on the comparative study of regime type.
What are the limitations of discrete regime type data? This inquiry follows in the tradition of scholars who examined the empirical limitations of data on democracy (Casper & Tufis, 2003; Gleditsch & Ward, 1997; Vreeland, 2008). The next step is to apply the same scrutiny to data on authoritarian regime types to understand how measurement can be improved. This has so far not been done. For one, it is important to hold the creators of these data to similar standards and to provide a basis for comparison. More importantly, it is apparent that scholars are not fully aware of the potential issues surrounding data selection. This is suggested by recent studies that substitute the data sets for each other and use different data sets to test similar theories (Charron & Lapuente, 2011; Fjelde, 2010; Hankla & Kuthy, 2011). Such a practice exposes conceptual issues and glosses over concerns about measurement validity.
My critique is not meant to demonstrate the superiority of one data set over another nor does it suggest that continuous data on regime type are preferable. Rather, the problem is one of improper concept formation, which can weaken the meaning that can be placed upon a set of results. This article illustrates potential limitations of different data sets so that scholars can maximize the use of discrete data on regime type and better compare results (Casper & Tufis, 2003). To accomplish this, I summarize three regime type data sets and the coding rules used to create them. Both qualitatively and quantitatively, I evaluate two types of threats to validity (Adcock & Collier, 2001). Criterion validation is necessary to show that a test adequately classifies a set of traits; content validation concerns how accurately a measure captures the concept of interest. I build my case from narratives on institutional change in three Latin American countries—Brazil, Colombia, and Nicaragua—and explore each issue in the context of a recent quantitative model of civil war onset. I conclude that discrete data sets on regime type are not good substitutes and that they are vulnerable to concept stretching. The issues raised herein should not be thought of as simply measurement issues. The larger issue concerns concept formation and using the appropriate measure to capture the stated theoretical concept. My evaluation of the limitations of discrete regime type data thus underscores proper data selection as a crucial element of best practice political science research. To this end, I offer practical solutions that encourage rather than discourage the use of discrete regime type data.
Theory
Among scholars who study authoritarian regimes, a number of explanations have been offered that link regime type to various outcomes. One approach links regime type to the incentives that elites have to stay in power (Geddes, 2003; Svolik, 2008, 2009; Wright, 2008). Faced with the prospect of losing office, a dictator’s options include coercion or cooptation, which are determined by the sources of support that are available. Abundant resources can alienate a dictator from the populace by providing a cheap source of income and making a large supporting coalition unnecessary. Where possible, an entrepreneurial leader might try to alienate opposition and consolidate personal power. Such leaders rely on personal networks, bribery, and secret police to avoid creating coalitions that can be binding (Bratton & Van de Walle, 1994, 1997; Geddes, 2003; Wright, 2009).
In the presence of a security dilemma, a leader can also look for support by creating a standing military, which offers prospects for future military interventions (Cheibub, 2007; Svolik, 2008). In a military that is highly involved in politics, officers are focused on security and are less interested in maintaining office at the risk of elite disunity (Geddes, 2003; Nordlinger, 1977). Where necessary, however, the military is willing to step in to secure its own interests. The actions of military governments have been attributed to their organization (Fontana, 1987; Geddes, 2003) as well as their executives (Horowitz & Stam, 2012).
Leaders can also rely on electoral institutions to preserve their longevity or to safeguard their exit (Cox, 2008; Debs, 2010; Gandhi & Lust-Okar, 2009). Some create legislatures to appease threatening opposition (Conrad, 2011; Gandhi, 2008). When binding, legislatures in authoritarian regimes also attract investment (Haber, 2006; Wright, 2008). Parties perform similar functions, in that they bind divisive elites, spread information, and prevent defection (Brownlee, 2007; Gandhi, 2008; Gandhi & Lust-Okar, 2009; Kinne, 2005; Magaloni, 2007). They are also an effective means of distributing money and positions (Chang & Golden, 2009; Magaloni, 2007).
Among scholars who focus on the institutions on which leaders base their support, a general consensus has emerged that distinguishes among authoritarian regimes that are personalist or narrowly supported, military-led, and party-based. Nevertheless, the ways in which scholars classify authoritarian regimes differ considerably. Geddes (2003) argues that continuous data on regime types are not applicable to certain research questions and proposes instead a discrete classification based on leaders’ incentives for maintaining power. Her research question is whether, on the basis of having leaders with different incentive structures, authoritarian regimes have different survival times. Wright (2008) updated these data to include monarchies, regimes lasting less than 3 years, and prior Soviet-era countries. Hadenius and Teorell (2007) built upon Geddes (2003) with the question of whether some authoritarian regime types were more likely to democratize than others. The Cheibub et al. (2010) coding scheme is based on the dichotomous classification of democracies and dictatorships introduced in Alvarez, Cheibub, Limongi, and Przeworski (1996) and Przeworski, Alvarez, Cheibub, and Limongi (2000). The central focus of these authors is on the broad electoral rules that distinguish types of regimes and the type of leader turnover.
Table 1 summarizes the composition of each of the three data sets, listing the coding rules for each regime type as well as the temporal/geographical coverage. Important similarities can be found across the three data sets, which characterize a specific approach to the study of authoritarianism. Each of these authors asserts that rules and institutions coincide systematically, comprising distinct “types.” There is also an implied similarity in their starting point for classifying regimes, namely the criteria for executive selection (i.e., “rules by which leaders and policies are chosen,” “political power maintenance,” “rules by which the leader is replaced”). To different extents, each data set also distinguishes between military- and civilian-based authoritarian regimes. The Cheibub et al. (2010) and Hadenius and Teorell (2007) data sets include democracies. The three data sets also have overlapping temporal and spatial domains.
Table 1. Summary of Three Discrete Data Sets on Regime Type.
Despite their semblances, the data sets exhibit fundamental differences that stem from the research question that each author had in mind when constructing the data. For one, the Geddes data emphasize personalistic rule as a type, which others do not. Hadenius and Teorell (2007) do not distinguish between democracies in their sample, and Geddes does not code democracies at all. There is also considerable difference in how the authors treat civilian/party-based authoritarian regimes. None of the authors were in consensus on how to classify party-based autocracies. Beyond the differences in the three data sets regarding concept categories, there are more fundamental differences that may have empirical consequences. For example, the Geddes and Hadenius and Teorell data sets acknowledge when regimes do not fit perfectly into one type or the other, but the Cheibub et al. (2010) categories are exclusive. The authors also differ in how to handle ambiguous cases and the timing of regime change. The Cheibub et al. data set codes regimes using ex-post information about how the leader was replaced to determine what the regime was for the duration of the leader’s tenure. Moreover, the Geddes data set has a tendency to outlast the other two data sets by 1 year due to its emphasis on the effective year of regime change (denoting regimes that lasted beyond December 31).
The choice provided by alternative data sets on authoritarian regime type is good for researchers seeking a second opinion or looking for a specific construction of data. Nevertheless, the use of these data sets is not without problems. A cursory look at the literature on authoritarianism suggests a variety of causal mechanisms. It is somewhat common for scholars to test these mechanisms with one of the data sets without justifying the particular data set they use. The problem, however, is that the data sets on authoritarian regime type do not measure the same things; they are conceptually distinct and thus are not equally suitable for testing particular causal mechanisms. More commonly, scholars test their theories with more than one of the data sets. Examples include Hadenius and Teorell (2006), Hankla and Kuthy (2012), Hanson (2012), Charron and Lapuente (2011), Fjelde (2010), Cornell (2012), and Wallace (2011). The idea behind doing so is that using an alternative discrete data set that partly overlaps provides a conservative test of one’s theory, thereby demonstrating robustness. All the same, finding a similar result using different data is not necessarily a good thing. To the extent that categories contain different cases, a similar result suggests that the mechanism under investigation (the question for which a particular data set is justified) is not the proper cause of an outcome. What is more, regime type is a blunt (albeit sometimes necessary) approach to theory testing, which makes it difficult to reject alternative hypotheses about the effect of particular institutions associated with regimes. For these reasons, one must be acutely aware of how the data were coded and the threats to validity caused by their improper use, on which this study elaborates.
Qualitative Analysis
Constructing the data required substantial descriptive knowledge and precise identification of a country’s institutional make-up. It is nevertheless difficult to classify the entire universe of cases. Criticisms of other data sets by Hadenius and Teorell (2007) center on “truly categorical regime traits,” thus emphasizing the need to understand their qualitative differences (p. 144). To this end, I evaluate the three data sets in the context of Latin American cases, following Mainwaring, Brinks, and Pérez-Liñán (2001). I focus on political narratives in Brazil, Colombia, and Nicaragua, drawing from the case-selection criteria of Seawright and Gerring (2008). Table A1 in the appendix outlines my case-selection strategies. I provide examples representing an extreme case, a deviant case, and a typical case of authoritarianism in Latin America. Nicaragua exemplifies one which is highly variable across data sets and which can take on quite different values depending on which data set is used. Colombia is an outlier, proving to be less reliable as a case than is suggested by the three data sets. Brazil shows a great deal of institutional variation corresponding to the Institutional Acts, but such change does not register in the discrete data sets. Each of these cases demonstrates how data selection affects the different types of test validity that concern proper research design.
Criterion Validation
Criterion validity is a special case of convergent validation in which one indicator is taken as a standard of reference, and is used to evaluate other indicators (Adcock & Collier, 2001). More broadly, convergent validation is an assessment of the extent to which a measure is similar to (converges on) theoretically similar indicators. In this case, there is not a standard against which to compare the validity of a particular data set, but one can compare the discrepancy of each data set to another as a test of convergent validation.
Nicaragua As an Example of Convergent Validation
Of the Latin American cases, few are more discrepant than Nicaragua, thus making it an extreme case. It exemplifies disagreement between the authors—there are different levels and sources of disagreement over 50 years of its history. In the early 1930s, Augusto Cesar Sandino led a guerilla campaign against U.S. occupation. Sandino was subsequently assassinated in 1934 on the orders of the National Guard commander General Anastasio Somoza Garcia. General Somoza was elected president 3 years later (Millett, 2007; Walker & Wade, 2011; Wynia, 1990). The earliest data that I compared begins in 1946. Rule under Somoza Garcia is coded by Cheibub et al. (2010) as a military regime, due to his prior post in the National Guard. Following his assassination in 1956, his son Luis Somoza Debayle took over. Luis did not have prior military experience like his father. Thus, Cheibub et al. coded the period of rule under Luis Somoza as civilian rule. Luis commanded significant military presence; his brother, Anastasio Somoza Debayle, headed the National Guard during this time. In 1967, Luis Somoza died of a heart attack at the age of 45. He was succeeded by his brother Anastasio, which according to Cheibub et al. returned the country to military rule (Millett, 2007; Walker & Wade, 2011; Wynia, 1990).
Throughout the 44-year-long family-run dictatorship, Geddes codes the country as being under personalist rule. Corruption and abuses were prevalent; power was consolidated enough that it could be passed between family members; and the deaths of Anastasio Somoza Garcia and Luis Somoza affected subsequent politics (Millett, 2007; Walker & Wade, 2011; Wynia, 1990). An interest in the personal hold on power by the Somoza family is justifiably different from the focus of Cheibub et al. (2010), who show power transitioning between military and civilian hands.
The apparent discrepancies in classifying Nicaragua increase with the introduction of the Hadenius and Teorell data in 1972, in which the authors coded Nicaragua as neither military nor personalist but instead a limited multiparty system. This is due to the presence of opposition parties such as the Broad Opposition Front (Frente Amplio Opositor) and the National Patriotic Front (Frente Patriotico Nacional), which were active in destabilizing the Somoza hold on power (Castillo, 1979). A military offensive on the part of the Sandinista National Liberation Front (Frente Sandinista de Liberacion Nacional [FSLN]) led to Somoza Debayle’s ouster in 1979 (Smith, 1997). Hadenius and Teorell coded the insertion of the FSLN into politics as a rebel regime (it is considered a military regime in their condensed data). While the other authors may not disagree that the FSLN was revolutionary and militarist, they did not code rebel regimes as military regimes. Cheibub et al. (2010) coded it as civilian and Geddes as a single-party regime. The behavior of the FSLN suggests that it was more than a rebel group in the lead-up to the 1984 election, however. In 1980, they established a legislature, which they replaced with a more liberal legislature in 1983 to undercut anti-Sandinista aggression by the United States (McConnell, 1996).
The 1984 elections were “[r]elatively clean, if imperfectly competitive” (McConnell, 1996). As the FSLN candidate, Daniel Ortega won the election fairly easily. According to Hadenius and Teorell, 1984 marked the involvement of the FSLN as a political party. For Cheibub et al. (2010), however, this election signaled the beginning of institutionalized democracy in Nicaragua, since Ortega came to power via an election and would pass it on to Chamorro via the same electoral rules. By the standards of Geddes and Hadenius and Teorell, Ortega’s term was still not sufficient to be considered a democracy. Geddes’s coding of Nicaragua stops at the 1990 election, in which Violeta Chamorro defeated Ortega as the UNO candidate with 55% of the vote. Chamorro was nevertheless criticized for rejecting constitutional reforms that would have prohibited nepotism, required legislative budget approval, shortened the presidential term, and expanded civil liberties (Prevost & Vanden, 2002). Thus, Hadenius and Teorell code Nicaragua as a limited government until the election of Arnoldo Aleman in 1996.
There is correspondence between the authors of the three data sets over the changes that surrounded the ouster of Anastasio Debayle, though they disagreed on whether certain somocistas acted as civilians, militarists, personalists, or party members. There was also disagreement over whether the FSLN immediately represented a party or remained a rebel group during its years in office before the 1984 elections. Ortega’s victory as the FSLN candidate would suggest that it had established itself as a party before then. Moreover, the FSLN created institutions beyond what would have been expected of a rebel group. The authors do not agree that Nicaragua had reached the status of democracy until 1996, though Cheibub et al. (2010) considered it one over a decade prior. There was also disagreement between Hadenius and Teorell and Cheibub et al. over when a country should be considered democratic. Other differences can be attributed to the simultaneous focus on elections, nonelectoral events, and individual leaders.
Interchangeably using data sets on regime type threatens criterion validity in research designs that set out to gauge the impact of regime type. If discrete data sets on regime type are discrepant in the classes to which they each assign observations, it then follows that they may generate different predictions about outcomes that are conditional on regime type. The case of Nicaragua exemplifies such discrepancy. To the extent that the discrete regime types do not correlate highly with each other in their ability to predict an outcome, this reflects that they do not measure the same set of traits and thus lack criterion validity.
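One way to make the discrepancy argument concrete is to compute how often two codings assign the same class to the same country-years. The sketch below is a minimal illustration, not a procedure taken from any of the three data sets: it assumes the codings have already been harmonized to a common set of labels (itself a consequential choice), the function names are hypothetical, and the example values are invented for illustration rather than drawn from the actual data.

```python
from collections import Counter


def joint_pairs(coding_a, coding_b):
    """Keep only country-years coded by both data sets (None = not coded)."""
    return [(a, b) for a, b in zip(coding_a, coding_b)
            if a is not None and b is not None]


def agreement_rate(coding_a, coding_b):
    """Share of jointly coded country-years assigned the same regime type."""
    pairs = joint_pairs(coding_a, coding_b)
    if not pairs:
        raise ValueError("no jointly coded country-years")
    return sum(a == b for a, b in pairs) / len(pairs)


def cohens_kappa(coding_a, coding_b):
    """Agreement corrected for the agreement expected by chance."""
    pairs = joint_pairs(coding_a, coding_b)
    if not pairs:
        raise ValueError("no jointly coded country-years")
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    # Chance agreement: product of the marginal shares, summed over labels.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:  # both codings constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented, harmonized labels for six country-years (illustration only).
data_set_a = ["military", "military", "civilian", "civilian",
              "democracy", "democracy"]
data_set_b = ["military", "civilian", "civilian", "civilian",
              "civilian", None]

print(round(agreement_rate(data_set_a, data_set_b), 2))  # 0.6
print(round(cohens_kappa(data_set_a, data_set_b), 2))    # 0.33
```

A low agreement rate or kappa between two codings would be one signal that substituting one data set for the other changes which cases fall into each category, which is the concern raised above.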
Content Validation
Content validity concerns how accurately a measure represents the content domain that it was designed to measure. A set of coding rules may be incongruent to the content that it purports to represent (Adcock & Collier, 2001). Content validity can be undermined when the measure is too general to adequately differentiate cases, which depends on the purpose. Surely, each of the authors undertook coding with objectivity and an interest in precision. Creating a comprehensive global data set nevertheless entails holding complex cases to equal standards and risks overlooking important qualitative differences and uniqueness.
In some cases, elections were held but did not qualify as democratic (Haiti, 1990; Dominican Republic, 1961; El Salvador, 1990-1991; Nicaragua, 1984-1995). Others are coded as democratic although subsequent elections were preempted by a coup (Guatemala, 1958-1962 and 1966-1981; Honduras, 1957-1962 and 1971; Venezuela, 1946-1947; Ecuador, 1946; Peru, 1956-1967). Still more are cases where uncertainty persists and democracy was short-lived (Panama, 1948-1951; Argentina, 1962-1982). There are also cases in which democracy was widely undisputed but there were signs that authoritarian behaviors had persisted: Venezuela (1974 and 1998-2005); Bolivia (2000 and 2007); and Uruguay (1972). Rule-by-decree, legislative purges, martial law, and suspension of liberties have been observed under these periods of democracy.
Some cases are difficult to code because group activism is unclear, while other cases are difficult to code because the importance of leader-specific characteristics is vague.
Colombia As an Example of Content Validation
An interesting case that epitomizes the incongruity of coding decisions deals with institutional rules in Colombia under the National Front (1958-1974). I use the Colombian example to show a deviant case, one that differs from cross-country notions of democracy. General Gustavo Rojas Pinilla abdicated power in 1957, after which an interim junta assumed the role of governing (Kline & Gray, 2007; Wynia, 1990). During this time, Alberto Lleras Camargo and Laureano Gomez, leaders of the Liberal and Conservative parties, forged a pact against Rojas through the “Declaration of Sitges” in Spain. Both parties introduced the National Front in their efforts to end a protracted period of violent conflict between them referred to as La Violencia, the very crisis that prompted Rojas’s military solution (Kline & Gray, 2007; Wynia, 1990). Following the plan’s acceptance in two national referenda, the government alternated for 16 years between the Conservative and Liberal parties. Elections were held to fill seats in the senate and lower house, which were shared equally by both parties. The party of the president was predetermined, but as its term approached, the presiding party had to present a list of nominees to compete against each other in the election. In this way, elites hoped to preserve a democratic system during a period of particular instability (Kline & Gray, 2007; Library of Congress, 1988; Schmidt, 1974; Wynia, 1990).
The National Front embodied a consociational democracy, a broad coalition of leaders representing much of Colombia (Dix, 1980). There is considerable debate, however, over whether this period can be classified as democratic. As Dix (1980) notes, inhibiting social change was an implicit intent of the oligarchy in both parties. Thus, although ANAPO politicians gained seats by aligning themselves with a traditional party, the party was sufficiently marginalized for Rojas to lead the formation of an insurrectionist group known as the Movimiento 19 de Abril (M-19; Nielson & Shugart, 1999; Schmidt, 1974). A contribution of the National Front was to devalue, though not completely eradicate, brokerage and clientelist politics (Schmidt, 1974). Despite the representation provided by intraparty competition, National Front politics was still perceived by many to be quite exclusive. There remained “widespread discontent about the practice of politics in the country and about the content of the policies that the political system produced” (Nielson & Shugart, 1999, p. 324).
Some scholars considered Colombia during the National Front period to be a diminished form of democracy. As the National Front pact as well as the electoral rules excluded third parties and limited competition between the two majority parties, the regime has been described as semicompetitive, restricted, or limited (Bejarano & Pizarro, 2001). According to Bejarano and Pizarro (2001), “[d]uring the National Front period, democracy’s limitations resulted from restrictions on political participation and political competition” (p. 1). The authors referred to Colombia in the 1960s as a “besieged democracy” because exogenous factors made it nearly impossible for democracy to function adequately. By Mainwaring et al. (2001)’s three-part classification of regime type, this period was only semidemocratic (Altman & Pérez-Liñán, 2002; Mainwaring, 1999; Mainwaring et al., 2001). Others considered the period representative of an “inclusionary authoritarian regime” (Bagley, 1984; Collier & Levitsky, 1997). Nevertheless, if Colombia was a democracy at this time, it was one characterized by restrictions on competition resulting from the 1957 institutional pact (Bejarano & Pizarro, 2001). There is sufficient scholarly debate to demonstrate that the issue is not as clear as the data suggest.
How does one distinguish between politics under the National Front, in which party representation was not a choice, and post-National Front Colombia, which in 1975 held “fully democratic” elections (Dix, 1980)? More importantly, how might the difference in representation affect the predictions made concerning the behavior of democracies? Content validity is a concern when the measure on which predictions are made does not do a good job of defining the concept. As such, scholars would do well to know the uncertainty that surrounds classification on the basis of discrete coding rules. If coding rules are not sufficient to perfectly distinguish among concepts (i.e., democracy from nondemocracy), then the presence of outliers is a risk—deviant cases may result from improper classification.
The content validity of a research design can also be undermined when one uses a measure that does not contain theoretically meaningful variation. A scholar who is unaware that a particular indicator does not contain the feature on which the theory is based stands to make spurious claims on the basis of the missing component. It is therefore helpful to know the extent to which each of the data sets captures relevant political changes. There are several examples of missing political information. Unlike the other authors, Geddes does not resume coding after an initial period of democracy. The data are therefore unlikely to indicate that democratic leaders have adopted authoritarian behaviors, as has been observed with Fujimori, Putin, and Chavez. Another example is where the regime type coded by one or more authors remains constant over a period of time, but where the data on legislatures and judicial independence change. Throughout the 1960s, Cheibub et al. (2010) coded Ecuador as a military regime and Geddes as a single-party military regime, although legislative changes occurred in Ecuador in 1960 and 1963. Over a 2-year period in Venezuela (2001-2002), Chavez pursued reforms that eliminated its second legislature and eradicated independent judicial review, though its regime type status did not change.
Brazil As an Example of Content Validation
One of the best examples of political change that does not correspond with a change in the data on regime type is Brazil, 1964-1989. Brazil represents a “typical” Latin American case, one which had a protracted period of military rule during which it experienced political fluctuations (Remmer, 1991). Throughout this period, each of the three sets of authors consistently coded Brazil as being a military dictatorship. After the 1964 coup, officers drafted a new constitution and instituted several acts in response to increasing resistance (Wiarda, 2007; Wynia, 1990). The military’s claim to power was founded on the threat of Communism in Brazil and the promise to restore democracy, albeit through the use of repression. The First Institutional Act in 1964 expanded executive powers to expedite the restoration of the country. The military also embarked on an aggressive campaign to purge the government of a broad swath of political actors, including members of the Catholic Church, politicians, labor organizations, academics, and political activists (Breneman, 1995; Wiarda, 2007; Wynia, 1990).
The Brazilian military continued to eliminate leftists and institutionalize control throughout the 1960s. Direct elections were maintained, but political activism was severely restricted. Still, opposition parties continued to hold ground in local elections, which prompted threats by military hardliners. In response, General Castello Branco issued the Second Institutional Act abolishing political parties and suspending the direct elections of governors. The Fifth Institutional Act effectively suspended all other political activities and censored remaining opposition. By 1968, Brazil had reached a high point of repression and censorship (Breneman, 1995; Wiarda, 2007; Wynia, 1990).
General Geisel began to reverse the military’s stronghold when he took office as president in 1974. Geisel sought to reinstate democracy in a slow and orderly fashion. His decompression plan involved controlling hardliners in the military and maintaining growth. Following Geisel’s efforts, President and former General Figueiredo released political prisoners and ended party restrictions created by the Second Institutional Act. The military installed civilian Jose Sarney as president in 1985, and in 1989 Fernando Collor de Mello took office via the first direct elections in decades (Breneman, 1995; Wiarda, 2007; Wynia, 1990).
Over the span of military rule in Brazil (1964-1984), Cheibub et al. (2010), Hadenius and Teorell, and Wright were in agreement on its status as a regime type. No change was observed across the data, though other scholars have described significant impacts during this period on legislative institutions, the constitution, civil rights, political parties, and electoral competition. Breneman (1995) argued that the consolidation of military power occurred in distinct phases, the apex of which was reached with the Fifth Institutional Act. In contrast to the tyrannical dicta of earlier acts, the Fifth Institutional Act simultaneously dissolved Congress and state legislatures, suspended the constitution, and imposed censorship. It is thus surprising that the data—whether measured by elections, constitutions, or institutional support—do not reflect changes as momentous as those imposed in 1968. In contrast, the Polity IV score does reflect the Geisel-led decompression plan in 1974, as well as variation for the years 1954, 1958, 1983, 1991, and 2003.
Institutional changes that the discrete data sets do not capture, changes which I argue were quite significant in Brazil’s history, are present in the other cases in my sample. The type of institutional variation that is represented also varies by data set. Depending on the research question, scholars must consider how the coding rules of these data account for the features and variation that they hope to explain. The Colombian case illustrates a threat to content validity arising from imprecise rules. In contrast, the data on Brazil suggest that the improper selection of data, which varies by research question, can weaken the content validity of a research design by ignoring the theoretically relevant concept altogether.
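The check described in this section, whether a regime type series registers institutional change, can be sketched as a simple comparison of two yearly series for one country. The following is a minimal illustration under stated assumptions: the function name is hypothetical, the institutional indicator (here, whether a legislature is open) must come from a separate source chosen by the researcher, and the example values are invented to mimic the Brazilian pattern rather than taken from any of the data sets.

```python
def unregistered_changes(years, regime_type, institution):
    """Return years in which an institutional indicator changes
    while the coded regime type stays the same as the year before."""
    if not (len(years) == len(regime_type) == len(institution)):
        raise ValueError("all series must have the same length")
    flagged = []
    for i in range(1, len(years)):
        institution_changed = institution[i] != institution[i - 1]
        regime_unchanged = regime_type[i] == regime_type[i - 1]
        if institution_changed and regime_unchanged:
            flagged.append(years[i])
    return flagged


# Invented values for illustration: a constant regime code alongside
# a legislature indicator that switches once.
years = [1966, 1967, 1968, 1969]
regime_type = ["military", "military", "military", "military"]
legislature = ["open", "open", "closed", "closed"]

print(unregistered_changes(years, regime_type, legislature))  # [1968]
```

Years flagged in this way are candidates for closer qualitative inspection; whether they matter depends on whether the institution in question is the one on which the theory rests.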
Quantitative Analysis
To understand the empirical implications of the potential threats to test validity when using discrete data on regime type, I replicate and extend Fjelde’s (2010) recent analysis of civil conflict onset. Fjelde argues that authoritarian leaders can coerce or co-opt rivals but differ in their ability to rely on these strategies to stay in power. Party-based autocracies are more capable of utilizing both co-optative and coercive strategies, she claims, while military and monarchical regimes rely on a narrower set of responses to opposition challenges. Using Hadenius and Teorell’s (2007) data, Fjelde compares the propensities of military, monarchical, single-party, and multiparty regimes to experience civil war onset over the years 1973-2004. She finds that military and multiparty autocracies are more likely to experience the onset of civil war, compared with single-party regimes and democracies. Substituting Wright’s (2008) data for the Hadenius and Teorell data shows similar results. By testing her predictions using both data sets, the author shows the findings, which largely confirm her hypotheses, to be robust.
I approximate Fjelde’s (2010) model. The dependent variable is armed conflict onset, which I define as a government-rebel conflict with 25 or more battle-deaths appearing after 3 years of nonviolence. The data on conflict onset come from the Uppsala Conflict Data Program/PRIO Armed Conflict Dataset (ACD; UCDP/PRIO 2009). Like Fjelde, my main independent variables are regime type indicators, with single-party regimes as the reference category. Alongside the Hadenius and Teorell (2007) and Wright (2008) data sets, I also include the Cheibub et al. (2010) data set for comparison. To the Wright regime type dummies, I add a proxy for democracies by including the Polity score for missing observations for which the Polity IV score is greater than six (Marshall & Jaggers, 2008). My control variables are the same as Fjelde’s. Data on the real GDP per capita and population (logged) come from the Penn World Tables (Heston, Summers, & Aten, 2009). I add Fearon and Laitin’s (2003) measure of ethnic fractionalization. Like Fjelde, I include the number of years since last regime change, from the Polity IV project, to measure regime durability. I control for prior conflict by lagging the dependent variable 1 year. I use the binary time-series cross-sectional (BTSCS) approach developed by Beck, Katz, and Tucker (1998), reporting robust standard errors, for the years 1946-2008.
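The onset definition above can be illustrated with a minimal sketch. It assumes one country's yearly battle-death counts are available as a simple list; the function name, the toy series, and the treatment of years before the series begins (counted as peaceful) are illustrative assumptions, not part of the original coding.

```python
from typing import List


def code_onsets(battle_deaths: List[int],
                threshold: int = 25,
                peace_window: int = 3) -> List[int]:
    """Flag years in which conflict is active after `peace_window` quiet years."""
    # A year is "active" when battle-deaths reach the threshold.
    active = [1 if deaths >= threshold else 0 for deaths in battle_deaths]
    onsets = []
    for t, is_active in enumerate(active):
        # Assumption: years before the start of the series count as peaceful.
        prior = active[max(0, t - peace_window):t]
        quiet_before = all(a == 0 for a in prior)
        onsets.append(1 if is_active and quiet_before else 0)
    return onsets


# Hypothetical yearly battle-death counts for a single country.
deaths = [0, 40, 60, 0, 10, 30, 0, 0, 0, 100]
print(code_onsets(deaths))
```

In this toy series the renewed fighting in the sixth year is not counted as a new onset, because fewer than 3 peaceful years precede it.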
The results are shown in Table 2, which I briefly describe. Model 1 uses the Hadenius and Teorell (2007) data. The log-odds coefficients indicate that relative to single-party regimes, multiparty autocracies and atypical regimes (other) are more likely to experience armed conflict onset after 3 years of peace, followed by democracies. These coefficients are significant at a probability of error below 0.10. Military regimes and monarchies are not statistically distinguishable from single-party regimes. I obtain similar results using the Wright (2008) data. Personalist and hybrid regimes are more likely than party-based autocracies to experience armed conflict onset, but military regimes are not differentiable from party-based regimes. None of the Cheibub et al. (2010) regime types are differentiable from civilian regimes.
Replication of Fjelde (2010): Logit Analysis of Armed Conflict Onset, 1946-2010.
Standard errors are in parentheses.
Proxy for democracy—not part of Wright’s (2008) original coding.
*p < .10. **p < .05.
For each of the potential issues illustrated by the cases of Nicaragua, Colombia, and Brazil, I explore its impact on the general model. The problem illustrated by Nicaragua is that the different regime type data sets may generate different predictions concerning the likelihood of experiencing a civil war onset. To demonstrate this, I generated linear predictions from each of the three models, substituting only the regime type variables. The predictions are not uniformly highly correlated: those from the model using Hadenius and Teorell data correlate with those from the model using Cheibub et al. (2010) at 0.940, but with those from the Wright model at only 0.595. The predictions from the model using Cheibub et al. correlate with predictions from the model using Wright data at 0.737. Figure 1 is a three-dimensional plot showing that predictions are more consistent for observations that are less likely to experience armed conflict; at higher levels of conflict likelihood the models make very different predictions. This divergence stems from the conditional impact of the regime type data. One would therefore come to different conclusions about which particular observations are likely to experience conflict, depending on which regime type data set is used. This in part explains why the coefficient sizes and significance differ by model.

Scatterplot of predicted values from models in Table 1.
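The pairwise comparison of linear predictions described above can be sketched as follows. The prediction values are hypothetical placeholders for the same five country-years under each model, not the article's estimates, and the dictionary keys are illustrative labels.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(preds):
    """Correlate every pair of models' predictions for the same observations."""
    return {(a, b): pearson(preds[a], preds[b])
            for a, b in combinations(preds, 2)}


# Hypothetical linear predictions (log-odds) for the same five country-years.
predictions = {
    "hadenius_teorell": [-4.1, -3.6, -3.0, -2.2, -1.5],
    "cheibub_et_al": [-4.0, -3.7, -2.9, -2.4, -1.9],
    "wright": [-3.9, -3.8, -3.1, -1.2, -2.8],
}

for pair, r in pairwise_correlations(predictions).items():
    print(pair, round(r, 3))
```

Correlating on the log-odds scale rather than on predicted probabilities is a design choice made here for simplicity; the two can differ when predictions cluster near zero.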
The Colombian case underscores a threat to content validity arising from imprecise rules. If the coding rules of a particular data set do not perfectly classify the population of cases, it follows that one should be wary of the potential effects of outliers. To demonstrate this, I generated the Delta-D influence statistic for each observation, for each data set. The Hosmer and Lemeshow (2000) Delta-D statistic captures how much the model deviance changes when an observation is removed, and so reflects the gap between the predicted and observed values of that observation. I reran each of the models to the exclusion of the 5, 10, and 15 most influential observations. The differences are striking. Omitting the 5 and 10 most influential observations from the model using Hadenius and Teorell data improves the strength of the already-significant coefficients. When the 15 most influential observations are omitted, military regimes become statistically distinguishable from single-party authoritarian regimes. What is more, omitting the top 5 influential observations from the model using the Wright data shows military regimes to be statistically distinguishable from party-based regimes as well. Dropping the 5 to 15 most influential observations does not yield significance for the coefficients on the Cheibub et al. (2010) regime types, however. The 15 most influential observations for each model are listed in the appendix (Table A2).
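The influence screening described above can be sketched as follows. The sketch assumes the common approximation of Delta-D as the squared deviance residual divided by one minus the leverage; the article does not state which variant or software was used. Fitted probabilities and leverages are assumed to come from the estimation software, and all values shown are hypothetical.

```python
import math


def deviance_residual(y, p):
    """Signed deviance residual for a binary outcome y and fitted probability p (0 < p < 1)."""
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * log_lik)


def delta_d(y, p, leverage):
    """Approximate change in deviance from deleting one observation (assumed form)."""
    d = deviance_residual(y, p)
    return d * d / (1.0 - leverage)


def drop_most_influential(rows, k):
    """rows: (label, y, p, leverage) tuples; return rows without the k largest Delta-D."""
    ranked = sorted(rows, key=lambda r: delta_d(r[1], r[2], r[3]), reverse=True)
    dropped = {r[0] for r in ranked[:k]}
    return [r for r in rows if r[0] not in dropped]


# Hypothetical country-years: (label, observed onset, fitted probability, leverage).
rows = [
    ("A-1990", 1, 0.05, 0.10),  # onset that the model considered very unlikely
    ("B-1990", 0, 0.05, 0.10),
    ("C-1990", 1, 0.90, 0.10),
]
print([r[0] for r in drop_most_influential(rows, 1)])
```

The retained rows would then be used to re-estimate the model, mirroring the 5, 10, and 15 observation exclusions in the text.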
Finally, the narrative of Brazil suggests that the improper selection of data can miss important variation, depending on the theory. Fjelde’s (2010) use of regime type, for example, was predicated on the dictator’s ability to use institutions to co-opt or coerce opponents. As Gandhi and Przeworski (2007) note, “partisan legislatures incorporate potential opposition forces, investing them with a stake in the ruler’s survival” (p. 1280). The neutralizing effect of institutions such as legislatures is therefore central to a theory of coercion and cooptation, but this is largely ignored in the construction of regime type data. Studies have nevertheless asserted that legislatures matter in authoritarian regimes, and not just in party-based regimes (Wright 2008). I added to each model an ordinal variable from Cheibub et al. (2010) indicating whether the legislature was closed, appointed, or elected. Contrary to what might be expected of the distribution of institutions across regime type, legislatures are not more likely in single-party regimes. Roughly 93% of civilian regimes in the Cheibub et al. data have legislatures, compared with military regimes (64%) and monarchies (76%). Of the Wright regime types, half of all military regime years include a legislature; roughly 75% of personalist and monarchical regimes have a legislature as well. Legislatures are also common across the Hadenius and Teorell regime types: 97% of one-party regimes, 99% of multiparty regimes, and 80% of monarchies. The correlation between legislatures and regime type is thus not as clear as one might expect.
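The shares reported above amount to a cross-tabulation of regime type against legislature status. A minimal sketch follows, assuming each regime-year is a (regime type, legislature status) pair and that "having a legislature" means the status is appointed or elected; the records are hypothetical and do not reproduce the percentages in the text.

```python
from collections import defaultdict


def legislature_share(records):
    """Share of regime-years with a legislature, by regime type.

    records: iterable of (regime_type, status) pairs, where status is
    "closed", "appointed", or "elected".
    """
    totals = defaultdict(int)
    with_legislature = defaultdict(int)
    for regime, status in records:
        totals[regime] += 1
        # Assumption: appointed and elected both count as having a legislature.
        if status != "closed":
            with_legislature[regime] += 1
    return {regime: with_legislature[regime] / totals[regime] for regime in totals}


# Hypothetical regime-year records.
records = [
    ("military", "closed"),
    ("military", "elected"),
    ("monarchy", "appointed"),
    ("civilian", "elected"),
    ("civilian", "elected"),
]
print(legislature_share(records))
```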
The results are reported in Table 3. When legislatures are accounted for, military regimes are strongly distinguishable from single-party regimes, below a 5% probability of error. The other regime type variables also become more sharply distinguishable from single-party regimes. This is true for both the models based on Hadenius and Teorell and those based on Wright. Accounting for regimes which have a legislature thus helps to sharpen the distinction among regime types. The implication is that cooptation is “[a] combination of a legislature that absorbs the political energies of groups that otherwise might attempt to overthrow the dictator, coupled with a single party or a front that extends the reach of the regime into the society” (Gandhi & Przeworski, 2006, p. 15). Knowing how the regime type data were constructed helps to mitigate erroneous conclusions about cooptation theory.
Models From Table 1, Controlling for Legislatures.
Standard errors are in parentheses.
Proxy for democracy—not part of Wright’s (2008) original coding.
*p < .10. **p < .05.
Discussion
Despite the discrepancies presented herein regarding the choice of discrete data on authoritarian regime type, I do not offer a new data set. The shortcomings of existing data sets lie more in how authors use them. There are, however, some practical answers that provide a conceptual roadmap for better using discrete data. First, it is critical that scholars using these data know what they are working with. This warning hearkens back to Vreeland (2008), who demonstrates the problem of confounding concept with measurement. The second piece of advice is that one should avoid concept stretching by not simply using measures out of convenience. Rather, one can use the individual components by which regime type measures were created to more effectively test a particular theory. I demonstrated the benefit of directly comparing the impact of legislatures in authoritarian regimes, for example, data for which are available online. It is also worthwhile to draw on the strengths of more than one of the discrete data sets on authoritarian regime type to create a measure that is uniquely tailored to a question. There are several examples in which authors paid close attention to the differences in the data sets and used them to their advantage (Brownlee, 2009; Magaloni & Wallace, 2008; Weeks, 2012).
The kind of information that needs to be incorporated concerning leaders and institutions varies by researcher. For that reason, none of the three data sets examined herein can be considered superior to the others. To promote more unified theory through consistent empirical results, scholars need to refocus on proper data selection as a cornerstone of best practice research; it is a critical part of proper research design. Use of any data should home in on the concept in question, but this study illustrates specific threats to validity that arise from misusing discrete data on authoritarian regime types. It is not enough to compare the results based on alternative data sources, but it is also not difficult to show a genuine concern for making valid inferences on the basis of regime type.
Conclusion
The field has increasingly suggested the importance of conceptual distinctions among nondemocracies. In support of that interest, multiple discrete data sets on authoritarian regime type are available, examples of which include Cheibub et al. (2010), Hadenius and Teorell (2007), and Geddes (2003). Because they are concerned with capturing similar attributes, an understandable response is to use these data sets interchangeably. As I have demonstrated, however, this is not recommended. My initial assessment of the data was qualitative, focusing on checking the data against political narratives in Latin America. The concerns illustrated by the cases of Nicaragua, Colombia, and Brazil can affect quantitative results, which I examined using Fjelde (2010). The results suggest that scholars should be aware of how comparable data sets on regime type are, how easily they classify observations and indicate uncertainty, and what variation they do and do not capture. The problem is not measurement error on the part of the coders so much as it is concept stretching by the researcher.
The study of authoritarianism would benefit from a more nuanced understanding of the theoretical underpinnings of the available data on regime type. Scholars must be aware of the particular background concept for which the data were created and then assess the extent to which the data are amenable to their own research question. The critical problem underscored by this analysis is the consequence of not properly specifying the meaning and components of the systematized concept to be measured. There are consequences for providing “just a one-sentence definition” of the testable element of one’s theory (Adcock & Collier, 2001). Validity is threatened by the improper use of discrete data sets on regime type. For this reason, it is imprudent to substitute discrete data sets to demonstrate a theory’s robustness. A better practice is to focus on proper data selection.
My critique of discrete data is not meant to invalidate their use. On the contrary, scholars’ interest in sets of institutional features should pave the way for others to find new ways to use them. Discrete data on regime type offer substantial benefits to comparativists, and they are a useful bridge between qualitative and quantitative political science. The ability to compare transitions and to create predictive models, however, is dependent on scholars’ confidence in the validity of such patterns (Abbott & DeViney, 1992; Scherer, 2001). Understanding discrepancies and limitations among discrete data sets on regime type serves not to undercut them, but to refocus scholars on proper research design.
Appendix
Fifteen Most Influential Country-Year Observations, by Model (Table 1).
| Country (Hadenius and Teorell, 2007) | Year | Reg. type | Country (Wright, 2008) | Year | Reg. type | Country (Cheibub, Gandhi, and Vreeland, 2010) | Year | Reg. type |
|---|---|---|---|---|---|---|---|---|
| Macedonia | 2005 | Democracy | Angola | 2002 | Party | Macedonia | 2005 | Democracy |
| Spain | 1991 | Democracy | Eritrea | 2003 | Hybrid | Spain | 1991 | Democracy |
| Mali | 1994 | Democracy | Tunisia | 1980 | Party | Eritrea | 2003 | Civilian |
| Eritrea | 2003 | Military | Liberia | 1989 | Personal | Mali | 1994 | Democracy |
| Guinea | 2005 | Multiparty | Malaysia | 1981 | Party | Guinea | 2005 | Military |
| Uganda | 1981 | Multiparty | Malaysia | 1963 | Party | United Kingdom | 2005 | Democracy |
| Swaziland | 2005 | Monarchy | Croatia | 1995 | Personal | Uganda | 1981 | Democracy |
| Sudan | 1976 | Military | South Africa | 1966 | Party | Swaziland | 2005 | Monarchy |
| Liberia | 1980 | Military | El Salvador | 1979 | Hybrid | United Kingdom | 1998 | Democracy |
| United Kingdom | 2005 | Democracy | Indonesia | 1965 | Personal | Indonesia | 1965 | Civilian |
| Tunisia | 1980 | Single party | Sudan | 1976 | Personal | Croatia | 1995 | Democracy |
| Cuba | 2005 | Single party | Liberia | 1980 | Personal | Jamaica | 2005 | Democracy |
| Jamaica | 2005 | Democracy | Guinea | 1970 | Party | Paraguay | 1954 | Military |
| Bahrain | 2005 | Monarchy | Laos | 1989 | Party | Liberia | 1989 | Military |
| Botswana | 2005 | Democracy | Sierra Leone | 1991 | Party | El Salvador | 1979 | Military |
Acknowledgements
I would like to thank Gretchen Casper, Joseph Wright, and Douglas Lemke for their comments and encouragement on the project, and Kristian Skrede Gleditsch and Cameron Theis for their feedback. A prior version of this article was presented at the European Consortium for Political Research Joint Sessions of Workshops, St. Gallen, Switzerland, April 2011.
Author Note
The form or content of this material has not been published elsewhere, in any likeness.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Support provided by NSF Graduate Research Fellowship, Grant DGE-0750756.
