Abstract
A rich theoretical literature argues that, in contradiction to Duverger’s law, the plurality voting rule can fail to produce a two-party system when voters do not share common information about the electoral situation. We present an empirical operationalization and a series of tests of this informational hypothesis in the case of India using constituency- and individual-level data. In highly illiterate constituencies where access to information and information sharing among voters are low, voters often fail to coordinate on the two most viable parties. In highly literate constituencies, voters are far more successful at avoiding vote-wasting, in line with the informational hypothesis. At the microlevel, these aggregate-level patterns are driven by the interaction of individual information and the informational context: In dense informational environments, even low-information voters can successfully identify viable parties and vote for them, but in sparse informational environments, individual access to information is essential for successful strategic voting.
Had I voted for the flower while others in my village voted for the hand, my vote would have been wasted. All the three candidates have a good reputation . . . I am not sure who to support.
Why do electoral institutions sometimes fail to produce party systems as expected? According to Duverger’s law, in single-member districts with plurality rule, only two serious parties should be competing for the seat (Duverger, 1954; Riker, 1982). Although the law often does hold, deviations are not uncommon (Diwakar, 2007; Gaines, 1999, 2009; Grofman, Bowler, & Blais, 2009). To explain such inconsistencies, the empirical literature has focused on various structural factors that make it difficult for the law to operate as expected: ethnic diversity, short history of elections, weakly institutionalized parties, or electoral volatility (Clark & Golder, 2006; Crisp, Olivella, & Potter, 2012; Moser, 1999; Ordeshook & Shvetsova, 1994; Selb, 2012; Singer, 2013; Tavits & Annus, 2006).
The theoretical literature, however, usually explains the failures of Duverger’s law by focusing on its informational foundations. Many scholars have argued that electoral coordination may fail even in the most favorable structural conditions if voters are not well informed about politics and do not have shared information about the electoral situation (Clough, 2007a; Cox, 1994; Fey, 1997; Myatt, 2007; Myerson & Weber, 1993; Palfrey, 1989). Thus, even if the society is homogeneous and the party system is institutionalized, Duverger’s law might still fail if voters do not share common expectations about each other’s voting intentions.
Although this informational hypothesis is the backbone of the theoretical literature on Duverger’s law, there is little empirical evidence for it. In this article, we operationalize and thoroughly test the informational hypothesis using constituency-level and individual-level data from India, a country whose highly diverse levels of party system fragmentation and highly variable informational environments make it especially suitable for such a test. We argue that an appropriately operationalized empirical test of the informational hypothesis should distinguish between two types of information: individual information and common information. Drawing on ethnographic literature and new quantitative evidence, we show that literate Indian voters are more likely to be informed about politics and more likely to share information among themselves and with less literate voters. This suggests that high-literacy environments are more likely to be high common information environments.
Building on this link between literacy and information, we draw two novel testable implications of the informational argument. The first new hypothesis is formulated at the aggregate level, and states that literate constituencies have less fragmented party systems. To test this hypothesis, we use highly granular electoral data from six elections to the Indian Lok Sabha and demographic data from two Indian censuses. Using various measurement strategies and statistical designs (cross-sectional, fixed effects, differences in differences, and instrumental variables [IVs]), we document strong evidence that Duverger’s law is more likely to fail in illiterate constituencies than in literate ones. We consider various alternative mechanisms to explain this finding, and conclude that the data are mostly consistent with the informational mechanism.
We are not the first ones to invoke an informational argument in an empirical study of electoral coordination. Scholars have used the historical electoral volatility, entry of new political parties, and the length of electoral history as alternative indirect measures of information (Crisp et al., 2012; Selb, 2012; Tavits & Annus, 2006). We complement this empirical literature by providing an alternative, theoretically motivated and empirically grounded measure of information, which can be used to explain both temporal and cross-sectional variations in electoral coordination and can be used (depending on the level of aggregation) to measure individual as well as common information. This approach is especially productive in the study of politics in the developing world, where literacy varies significantly across time and space.
To understand the microlevel mechanism behind the aggregate-level findings, we also formulate an individual-level hypothesis about the role of information in electoral coordination. We argue that an appropriately specified individual-level test of the informational hypothesis should account for the interdependence of individual information and the informational context. In communities with many well-informed voters, the poorly informed ones can obtain information indirectly through social interactions with better informed ones. To account for this interdependence, we integrate individual- and aggregate-level data, and find that individual literacy has much less impact on voters’ ability to vote strategically in high-literacy constituencies than in low-literacy constituencies.
The existing individual-level studies of strategic voting assume that information and behavior are independent across individuals (Black, 1978; Catt, 1989; Choi, 2009; Franklin, Niemi, & Whitten, 1994; Niemi, Whitten, & Franklin, 1992). This assumption is inconsistent with the theoretical accounts of electoral coordination that strongly emphasize interdependence of individuals’ information and behavior (McKelvey & Ordeshook, 1985; Osborne & Rubinstein, 2003; Myatt, 2007). Our study suggests an interesting mechanism of interdependence of individual information and the informational context in strategic coordination. Previous studies have emphasized the importance of informational context for individual behavior in ethnic mobilization (Chandra, 2004) and turnout (Abrams, Iversen, & Soskice, 2010). We extend this approach to study electoral coordination—this idea follows naturally from the theoretical literature, but it has not, to our knowledge, been adopted in the empirical one.
Understanding when Duverger’s law fails is important beyond electoral politics because of the link between political fragmentation and democratic accountability (Lijphart, 1994; Powell & Vanberg, 2000). Scholars of India have pointed out that its highly fragmented political system leads to legislative stalemates and weak governance (Kapur & Mehta, 1998, 2007). However, it is important to recognize that Duverger’s law in India does not fail universally, and that there is vast variation in how electoral politics plays out at the level of electoral constituency.
The existing accounts of party system fragmentation in India mostly focus on very broad patterns, and do not explain well the very local variation of party systems from constituency to constituency. For instance, a lot of literature focuses on the decline of the dominant Congress Party, the subsequent opening of the political opportunity structure, or increasing decentralization to account for the rise of several new parties (Brass, 1980; Chhibber & Kollman, 2004; Diwakar, 2010; Sridharan, 2002; Wallace, 1980; Wyatt, 2010). Empirically, their foci have been on the national average of constituency-level party fragmentation (Chhibber & Kollman, 2004) and the state-level average fragmentation (Chhibber & Murali, 2006; Diwakar, 2010).
While these highly aggregated accounts have helped this rich body of literature to explain temporal shifts in the country’s party system, the cross-sectional constituency-level differences in party systems—including within-state variation—require a different explanation. We offer such an explanation by drawing attention to the differences in information contexts that characterize the country’s electoral constituencies. Furthermore, by discussing how Indian voters acquire and share information that aids their voting decisions, we refine the current understanding of Indian politics by highlighting the role voters—not merely political elites—play in shaping India’s party systems.
Information and Electoral Coordination
Duverger’s law draws on the idea that, in elections in single-member districts with simple plurality rule, a vote given to a candidate with no chance of winning is a wasted vote (Duverger, 1954; Riker, 1982). This logic of vote wasting relies on several key behavioral conditions: Voters are myopic, they approach voting strictly instrumentally, and they are sufficiently informed about which candidates are viable and which are hopeless (Cox, 1997). The last condition, which we refer to as the “informational hypothesis,” has recurred most frequently in the theoretical debates on Duverger’s law (Clough, 2007a; Cox, 1994; Fey, 1997; Myatt, 2007; Myerson & Weber, 1993; Osborne & Rubinstein, 2003; Palfrey, 1989).
To be able to cast a strategic vote, a voter has to solve two types of informational problems: the individual information problem and the common information problem. The individual information problem relates to the voter’s ability to identify candidates with high expected support, so that he can choose between the two most viable ones. As Cox (1994) puts it, “If voters have no information regarding candidate chances (and diffuse priors), then . . . one does not expect objectively trailing candidates (those who have fewer voters ranking them first) to lose their instrumental support” (p. 613). When voters have a poor notion as to which candidates are likely to win and which ones are hopeless, Duverger’s law is bound to fail.
However, for electoral coordination to succeed, voters’ subjective beliefs not only have to be precise but they should also be sufficiently shared between them (McKelvey & Ordeshook, 1985; Myatt, 2007; Osborne & Rubinstein, 2003). An individual voter can cast a strategically sound vote only if he correctly anticipates the behavior of other voters in his constituency. When the degree of common information between voters is low, individual expectations of different voters can diverge—each voter believes that he is voting for a viable candidate, but the electoral coordination fails in aggregate. Thus, successful electoral coordination requires not only that voters have access to information, but also that they share that information through social interactions and political networks (Myatt, 2007; Osborne & Rubinstein, 2003).
The reasoning of the Indian voter from the Rewa district, quoted at the start of the article, provides a good example of the common information problem: The voter knew that he would have wasted his vote by voting for the “flower,” but only conditionally on him believing that other voters in his district would vote for the “hand.” If the voter had made a wrong conjecture that others were going to vote for the “flower,” he would have likely ended up wasting his vote by voting for the “flower.” When voters lack common information, such erroneous conjectures about the actions of others may become quite common and lead to the failure of Duverger’s law.
Thus, when evaluating the empirical validity of the informational hypothesis, it is important to recognize two of its features, largely overlooked in the previous literature. First, electoral coordination is a phenomenon that operates at the level of the electorate, not merely at the level of an individual voter. Second, as voters have incentives to coordinate their behaviors and as their ability to do so depends on the degree of common information, the information and behaviors of individual voters are interdependent. Strategic voting requires voters to anticipate the decisions of others. Moreover, the desire to coordinate behavior at the mass level creates informational spillovers between voters, leading to interdependence of the individual and the aggregate-level information.
The existing empirical literature has overlooked these two features of the informational argument. Typically, scholars ask how information (measured via literacy or education) affects the ability of individual voters to identify and vote for viable parties (Alvarez, Boehmke, & Nagler, 2006; Black, 1978; Cain, 1978; Catt, 1989; Choi, 2009; Niemi et al., 1992). By focusing on individuals only, the literature fails to recognize that electoral coordination is an aggregate-level phenomenon, which should be studied as such. Knowing whether informed citizens vote strategically is not equivalent to knowing whether informed constituencies are more likely to coordinate on two parties. Even when the focus is on individual behavior, it is important to recognize the interdependence of individual beliefs and behaviors of voters; otherwise one can misspecify the microlevel mechanism underlying electoral coordination. With these points in mind, we now operationalize the informational hypothesis in the context of Indian politics and draw two testable hypotheses that are closer to the theoretical logic of electoral coordination than the previous tests.
Operationalizing the Informational Hypothesis
To better understand voting behavior and test the informational hypothesis, we need to measure not only individual but also common information. The existing literature has suggested multiple factors that can serve as proxy measures of information: availability of opinion surveys (Cox, 1997; Fey, 1997) or history of previous elections (Cox, 1997; Crisp et al., 2012; Selb, 2012). While these measures are plausible, their application for the purposes of our study is quite limited. Public opinion surveys are rarely available to voters at the level of constituency where coordination takes place. This applies especially to India, where broadcasting or publishing poll results is not allowed at all during elections (Anand & Jenkins, 2004; McMillan, 2012). The history of election results (e.g., volatility of past elections) is also a potentially problematic measure because volatility itself may be endogenous to past information.
To operationalize the informational hypothesis, we follow the long tradition in comparative and American politics that uses literacy (or education) as an indirect measure of political information (Aidt, Golden, & Tiwari, 2014; Black, 1978; Choi, 2009; Delli Carpini, 1996; Highton, 2009; Jennings, 1996). Although illiteracy does not preclude individuals from gaining political information, it surely makes it more difficult to do so. If a constituency has a large proportion of voters for whom information is costly, it is reasonable to expect that such a constituency is information poor in the sense that it contains many voters who are not able to solve the individual information problem.
More importantly, when measured at the aggregate level, literacy can also capture how well a given constituency can solve the common information problem. An old tradition in political science, dating back to at least Deutsch (1961), argues that literate citizens are not only better informed about politics but also more involved in politics and have wider social networks (Rosenstone & Hansen, 1993; Wolfinger & Rosenstone, 1980). By virtue of being able to access more political information, literate citizens are likely to know better which candidates are locally viable. Furthermore, by virtue of being more involved in politics, literate voters can transmit this information through social interactions and so make the knowledge more common.
Multiple ethnographic studies support the idea that literate Indians share political information with their illiterate peers through various informal institutions, even across ethnic and caste lines. In Kerala, reading rooms and village libraries serve as centers where people assemble to listen to the reading of newspapers and to discuss daily political news (Jeffrey, 2000; Nair, 1998). Nair (1998) notes how historically the libraries and reading rooms “became centres that villagers could approach without any psychological barriers” (p. 176). In Chhattisgarh state, baithaks (group sit-ins) are held where the literate read out and analyze news for those who cannot read (Ninan, 2007). Similar evidence of information flow from literates to illiterates has been documented in urban metropolis such as New Delhi (Peterson, 2010). Educated members of different ethnic groups oftentimes form ethnic associations (sabhas, samajs, and sanghams) to disseminate political knowledge among their illiterate coethnics (Mukherjee, 1994; Rao, 1968; Rudolph & Rudolph, 1960). Remarkably, even in regions with strong caste prejudices, such as in Tamil Nadu, newspaper reading “provides a major point of intercaste, as well as inter-village discussion and argument” (Cody, 2011, p. 290).
Survey evidence is also highly supportive of the claim that literacy is a good measure of common information. Using the 2004 Indian National Election Study (INES), we evaluated whether literate Indians are more likely to take part in political activities through which information can be effectively shared. We considered three types of activities where information exchange and spillovers are likely to occur: participation in political rallies, political meetings, and canvassing for parties during election campaigns. We then estimated three logistic regressions with the individual’s literacy as an independent variable and a number of control variables (individual’s income, gender, religion, and caste).
Figure 1 shows the key results from these regressions that are relevant for our argument (regression tables are available in the online appendix). The displayed quantities represent the sample-averaged relative predicted probabilities of engagement for literate versus illiterate voters. For participation in canvassing, we see that this estimated relative probability is equal to 1.48. This means that, on average, literate voters are 48% more likely to canvass for political campaigns compared with illiterate ones. Similarly, literate voters are 26% more likely to participate in political rallies and 47% more likely to take part in political meetings. For all three outcomes, these effects are significant at the 95% confidence level.
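A relative predicted probability of this kind can be sketched as follows. This is a minimal illustration, assuming that "sample-averaged" means the ratio of the average predicted probability with literacy switched on to the average with literacy switched off, holding each respondent's controls at their observed values. The function names, the `control_terms` input, and the coefficients are hypothetical and are not estimates from the INES data.

```python
import math


def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def relative_predicted_probability(intercept, beta_literacy, control_terms):
    """Ratio of sample-averaged predicted probabilities (literate / illiterate).

    control_terms holds, for each respondent, the summed contribution of the
    control variables to the linear predictor (a hypothetical input layout).
    """
    p_literate = [logistic(intercept + beta_literacy + c) for c in control_terms]
    p_illiterate = [logistic(intercept + c) for c in control_terms]
    mean_literate = sum(p_literate) / len(p_literate)
    mean_illiterate = sum(p_illiterate) / len(p_illiterate)
    return mean_literate / mean_illiterate


# Toy illustration with made-up coefficients, not results from the article.
ratio = relative_predicted_probability(-2.0, 0.5, [-0.3, 0.0, 0.4])
print(f"relative predicted probability: {ratio:.2f}")
```

Under this reading, a ratio of 1.48 corresponds to literate voters being 48% more likely to engage in the activity than illiterate ones.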

Figure 1. Associations between literacy and three measures of political engagement.
In sum, the ethnographic and quantitative evidence strongly suggests that individual literacy can measure individual information, and aggregate-level literacy can measure the degree of common information, at least in the context of India. Given this, we can formulate two testable research hypotheses. At the level of electoral constituency, we hypothesize the following:
Hypothesis 1: Constituencies with higher literacy rates have less fragmented party systems.
The second hypothesis concerns the microlevel mechanism of how information of individual voters affects their ability to avoid vote wasting. As we argued earlier, to appropriately specify the individual-level test of the informational hypothesis we need to take into account the interdependence of individual information and the informational context. In low-literacy environments where information is not widely shared, an individual voter will have to obtain a lot of information to avoid vote wasting. However, in a constituency where many voters are well informed and information is widely shared among them, the informational spillovers can enable even voters with little access to political information to vote strategically. Thus, we formulate the following individual-level hypothesis in which individual and aggregate-level literacy are treated interactively:
Hypothesis 2: Individual literacy improves a voter’s ability to avoid vote wasting, and this effect is weaker in constituencies with higher aggregate literacy.
Of course, literacy may capture not only the informational context but also multiple other factors that might be correlated with party systems, most notably economic and social factors. Furthermore, the proliferation of mass media (as long as it is not too fragmented) should make it easier even for illiterate voters to acquire political information, and can serve as an alternative channel through which a greater degree of common knowledge can be acquired. Therefore, in our empirical analyses, we devote considerable effort to accounting for these alternative pathways through which literacy could affect electoral coordination.
Aggregate Literacy and Party Systems
In this section, we test the hypothesis that higher aggregate-level literacy rates are associated with lower levels of party system fragmentation (Hypothesis 1), using data from six elections to India’s Lok Sabha (1989, 1991, 1996, 1998, 1999, 2004). Indian party systems have been extensively studied before (Chandra, 2004; Chhibber & Kollman, 2004; Diwakar, 2007; Nuna, 1989) but not directly with respect to the informational hypothesis and not with respect to the constituency-level variation.
Data
The electoral data come from the Constituency-Level Elections Archive (Kollman, Hicken, Caramani, & Backer, 2013). The demographic data are obtained from the 1991 and 2001 censuses of India. The geographic units at which census data are reported do not coincide with the parliamentary constituencies. To match the census and the electoral data, we used the Geographic Information System (GIS). The GIS-coded India census data were obtained from ML Infomap company and directly from the Registrar General & Census Commissioner’s office in New Delhi. To match the census and election data as precisely as possible, we used the smallest geographic unit at which census reports the variables we use—the census block (also referred to as Community Development block or subdistrict in rural and urban areas, respectively).
Most census blocks are contained fully within parliamentary constituencies—on average, there are about 11 census blocks per constituency. However, there are two exceptions to this rule. First, in large metropolitan areas, one census block can cover multiple constituencies. Since in such cases, election and census data cannot be credibly merged, we removed from the analysis constituencies that fall fully within a given census block. This affects only constituencies in highly concentrated urban areas, constituting about 5% of all constituencies (thus, the overall impact of this exclusion on our results is likely to be very minor). Second, when a census block covers the border of several constituencies, we classify the census block as belonging to a given parliamentary constituency if its geographic centroid is located inside that constituency. The constituency-level values of the demographic variables were then calculated by taking population-weighted averages across census blocks assigned to a given constituency.
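The aggregation step described above can be sketched as a population-weighted mean. This is a minimal illustration that assumes the blocks have already been assigned to a constituency (for example, by the centroid rule); the function name, the input layout, and the numbers are hypothetical.

```python
def weighted_literacy(blocks):
    """Population-weighted mean literacy across the blocks of one constituency.

    blocks: list of (population, literacy_rate) pairs, one per census block
    assigned to the constituency (hypothetical layout).
    """
    total_population = sum(population for population, _ in blocks)
    if total_population == 0:
        raise ValueError("constituency has no population")
    weighted_sum = sum(population * rate for population, rate in blocks)
    return weighted_sum / total_population


# Toy blocks, not census figures: a larger low-literacy block and a
# smaller high-literacy block.
blocks = [(120_000, 0.40), (80_000, 0.65)]
constituency_literacy = weighted_literacy(blocks)
print(f"constituency literacy: {constituency_literacy:.2f}")
```

Weighting by population keeps a small, highly literate block from counting as much as a large, less literate one, which an unweighted mean of block rates would allow.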
The census data were then matched with the electoral data based on the temporal proximity of elections and censuses. The 1991 census data were matched with the 1989, 1991, and 1996 elections, and the 2001 census data were matched with the 1998, 1999, and 2004 elections. The consequence of this design is that the independent variables vary between the two election groups (1989-1996 and 1998-2004) but not between individual elections, which might result in artificially small standard errors (Greene, 2010). We address this issue in several ways: We cluster the standard errors at the level of electoral constituency, and we also show that the results continue to hold in a reduced dataset where the dependent variables are averaged within the two groups of elections and when we conduct the analyses separately for each election.
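The matching rule stated above can be written down as a simple lookup. This is a sketch only: the dictionary reproduces the pairing given in the text, while the names `CENSUS_YEAR` and `election_group` are hypothetical.

```python
# Election year -> census year, exactly as paired in the text.
CENSUS_YEAR = {
    1989: 1991,
    1991: 1991,
    1996: 1991,
    1998: 2001,
    1999: 2001,
    2004: 2001,
}


def election_group(election_year):
    """Return the label of the election group that shares one census."""
    census = CENSUS_YEAR[election_year]
    return "1989-1996" if census == 1991 else "1998-2004"


for year in sorted(CENSUS_YEAR):
    print(year, CENSUS_YEAR[year], election_group(year))
```

An explicit table is used rather than a nearest-year rule because 1996 lies equally far from both censuses, so proximity alone would not determine its match.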
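The dependent variable is the effective number of electoral parties (ENEP), defined as $\mathrm{ENEP} = 1 / \sum_{i} v_i^2$, where $v_i$ is the vote share of party $i$ in the constituency.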
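The effective number of parties is conventionally computed as the inverse of the sum of squared vote shares (the Laakso-Taagepera index). The sketch below assumes that convention applies here; the function name and the example vote shares are hypothetical.

```python
def enep(votes):
    """Effective number of electoral parties (Laakso-Taagepera index).

    votes: vote counts or vote shares for the parties in one constituency.
    They are normalized internally, so either form can be passed.
    """
    total = sum(votes)
    if total <= 0:
        raise ValueError("votes must sum to a positive number")
    shares = [v / total for v in votes]
    return 1.0 / sum(s * s for s in shares)


# Two equal parties give exactly two effective parties; a third party
# with a fifth of the vote pushes the index toward three.
print(f"{enep([0.5, 0.5]):.2f}")
print(f"{enep([0.4, 0.4, 0.2]):.2f}")
```

An index near 2 indicates that votes are concentrated on two parties, as Duverger's law predicts, while larger values indicate more fragmentation.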
The key independent variable Literacy is defined as the constituency-level proportion of literate population. Literacy rates are generally correlated with ethnic and religious diversity, and the latter are known to be related to political fragmentation (Clark & Golder, 2006; Ordeshook & Shvetsova, 1994). Measuring ethnic and religious diversity in India, however, is notoriously hard due to multiple overlapping identities (Chandra & Wilkinson, 2008; Manor, 1996). We therefore include measures for the proportion of Scheduled castes and Scheduled tribes.
For the years where data are available, we also control for Religious fragmentation defined as
We also include a set of variables measuring various aspects of socioeconomic development often correlated with literacy. Urbanization refers to the proportion of persons living in urban areas. TV/Radio ownership stands for the share of households owning television sets or radios, and is included to separate the effects of literacy from the media effect. Finally, to control for the degree of socioeconomic development, we include a variable Banking access defined as the share of households availing banking services. Controlling for socioeconomic development and urbanization allows us to separate the impact of literacy from the more general “modernization effect.” We also used the proportion of households with permanent housing and the proportion of workers in the manufacturing sector as alternative measures of socioeconomic development, without substantial effect on the main results. The summary statistics of the key variables are presented in Table 1.
Table 1. Descriptive Statistics.
ENEP variable is obtained from Kollman, Hicken, Caramani, and Backer (2013), and the remaining variables are obtained from 1991 and 2001 censuses of India. ENEP = effective number of electoral parties.
Specification and Results
The spatial spread of the two main variables is shown in Figure 2. The figures indicate substantial variation in literacy and party system fragmentation across India. As we observe literacy and ENEP at a few points in time (literacy at two points and ENEP at six points), we are able to exploit not only the between-constituency but also within-constituency variation in literacy rates. The figures also suggest that the two variables are regionally clustered. This is because India is a federal state and electoral constituencies within the same state exhibit similarities due to shared state-level factors. Parties build state-level alliances and seat-sharing agreements (Pai, 1996), and party competition at the state level influences public spending policies, including spending on education (Chhibber & Nooruddin, 2004; Saez & Sinha, 2010; Thachil & Teitelbaum, 2015). We address this by controlling for state-level fixed effects. To adjust for secular trends in literacy and party system fragmentation, we also add cubic time polynomials, as suggested in Carter and Signorino (2010).

Figure 2. Literacy and party system fragmentation across India.
Table 2 shows estimates from five ordinary-least-squares regressions. Column 1 provides the baseline model that includes only the variables available for all six elections as well as state-level fixed effects and time trends. In this baseline specification, the coefficient for Literacy is negative and statistically significant. The effect of literacy on party system fragmentation is substantively meaningful: A one-standard-deviation increase in literacy rates (about 14%) is associated with about a 0.11-point reduction in the ENEP measure, which constitutes about 15% of its standard deviation. Given that the coefficient for Literacy is much larger in magnitude than those for the other covariates, we can say that literacy is an important factor for the fragmentation of local party systems in India.
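The magnitudes quoted above imply two further quantities that can be backed out with simple arithmetic. The sketch below is a back-of-the-envelope check on the rounded figures in the text, assuming Literacy is measured as a proportion so that one standard deviation is about 0.14. The derived values are approximations and are not numbers reported in Table 2.

```python
# Rounded figures quoted in the text.
sd_literacy = 0.14          # one standard deviation of Literacy (proportion)
enep_change = -0.11         # associated change in ENEP
share_of_enep_sd = 0.15     # that change as a share of the SD of ENEP

# Implied regression coefficient: change in ENEP per unit change in Literacy.
implied_coefficient = enep_change / sd_literacy

# Implied standard deviation of ENEP in the sample.
implied_sd_enep = abs(enep_change) / share_of_enep_sd

print(f"implied Literacy coefficient: {implied_coefficient:.2f}")
print(f"implied SD of ENEP: {implied_sd_enep:.2f}")
```

Read this way, moving a constituency from no literacy to full literacy would be associated with a drop of a little under 0.8 effective parties, although such an extrapolation goes well beyond the observed range of the data.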
Table 2. Regression Results for ENEP.
Standard errors (in parentheses) are clustered by constituency. ENEP = effective number of electoral parties.
†p < .10. *p < .05. **p < .01. ***p < .001.
In column 2, we add the lagged value of ENEP to adjust for levels of political fragmentation in previous elections. This serves two purposes: Methodologically, it addresses the issue of temporal correlation in local party systems, and substantively, it addresses the proposition made in the previous literature that voters use results of previous elections to identify focal parties and coordinate their votes on them (Clough, 2007a; Cox, 1997; Crisp et al., 2012; Forsythe, Myerson, Rietz, & Weber, 1993; Selb, 2012). The estimated coefficient for Literacy remains significant even after adjusting for the lagged ENEP values, though it is smaller in magnitude. Note, however, that the estimate in column 2 represents the marginal effect of literacy conditional on the lagged ENEP values. The more relevant quantity of interest is the marginal effect of literacy on the unconditional expectation of ENEP. To calculate this effect, we have to divide the coefficient for Literacy by $1 - \rho$, where $\rho$ is the coefficient for lagged ENEP.
The magnitude and statistical significance of the coefficient for lagged ENEP indicate that voters in India are relying on previous election results to identify viable parties, consistent with the existing observational and experimental evidence from other places (Cox, 1997; Crisp et al., 2012; Forsythe et al., 1993; Selb, 2012). Substantively, in a constituency with two effective parties in the previous election, one expects to see about 2.6 effective parties. In comparison, the predicted ENEP in a constituency with four effective parties in the previous election is 3.14. Thus, from the dynamic perspective, the Indian party system is not converging toward a two-party equilibrium. This finding is in line with the previous work on India (Diwakar, 2007) but in contrast to the previous findings in other contexts (Selb, 2012; Tavits & Annus, 2006). Most likely, this is because the period of our study begins after India had already undergone multiple election cycles under the first-past-the-post voting rule, and so it is likely that the learning effect took place prior to the period of our study. Furthermore, any learning effect created by time could be dampened or even undone by temporal changes in other factors, for example, the rise of a more fragmented media environment in India (Roy, 2011).
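The two predictions quoted above are enough to back out the implied dynamics. The sketch below assumes that both predictions come from the same linear model with all other covariates held at the same values, so that the line through them gives the coefficient on lagged ENEP. The derived quantities are approximations from rounded figures, not estimates reported in the article.

```python
# Predictions quoted in the text: (lagged ENEP, predicted current ENEP).
low = (2.0, 2.6)
high = (4.0, 3.14)

# Implied coefficient on lagged ENEP: slope between the two predictions.
rho = (high[1] - low[1]) / (high[0] - low[0])

# Implied intercept, with the other covariates absorbed at fixed values.
intercept = low[1] - rho * low[0]

# Steady state: the ENEP level that reproduces itself, ENEP* = a + rho * ENEP*.
steady_state = intercept / (1.0 - rho)

# Long-run multiplier applied to any covariate's short-run coefficient.
long_run_multiplier = 1.0 / (1.0 - rho)

print(f"rho: {rho:.2f}")
print(f"steady-state ENEP: {steady_state:.2f}")
print(f"long-run multiplier: {long_run_multiplier:.2f}")
```

Under these assumptions the implied steady state is close to 2.8 effective parties rather than 2, which is one way to see why the text concludes that the system is not converging toward a two-party equilibrium.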
In column 3, we add three control variables: Religious fragmentation, TV/Radio ownership, and Banking access. These three variables are available only from the 2001 census, which results in a much smaller sample in column 3. Of these three control variables, only TV/Radio ownership is significantly associated with party system fragmentation: Constituencies with a larger share of TV and radio owners have less fragmented party systems. The size and statistical significance of the coefficient on TV/Radio ownership suggest that mass media is an important alternative channel for political information. However, as the coefficient for Literacy in column 3 remains very similar to that in columns 1 and 2, we can infer that literacy provides a route for information access that is quite different from that provided by the mass media. This is not too surprising, because other studies have found that, even with the increased presence of mass media, voters in India still rely on interpersonal networks to inform their vote (Karan, 2009).
We also considered whether availability of broadcasting media could serve as a substitute or a complement for literacy in electoral coordination. 14 We did so by estimating the model in column 3 with an interaction term between literacy and TV/Radio ownership. The estimated interaction coefficient was equal to
It is important to note that the results in column 3 also suggest that the effect of literacy on party system fragmentation is unlikely to be driven by a “modernization effect”; that is, literate constituencies do not see more successful electoral coordination simply because literacy captures one dimension of modernization. If that were the case, we should not observe such a strong correlation between literacy and ENEP after adjusting for urbanization, TV/radio ownership, and especially economic modernization, as measured by access to banking.
In column 4, we add constituency-level fixed effects, and so we effectively exploit only within-constituency variation in the literacy rates. Finally, in column 5, we conduct a cross-section analysis, where we average election results and demographic variables by constituency across all elections. In the cross-section specification, therefore, we exploit only between-constituency variation in the literacy rates. In columns 4 and 5, the results are highly consistent with the previous ones. In sum, the relationship between literacy and party system fragmentation is substantively large and negative across a wide range of regression models.
Robustness
We have conducted a number of additional analyses to ensure that our results are robust. To further address the problem of “unequal frequencies,” we reestimated the model in column 1 on an “averaged dataset,” in which the dependent variables were averaged within two election groups (1989-1996 and 1998-2004). In addition, we conducted analyses breaking up the dataset by individual election. We also reestimated our baseline model using year fixed effects instead of cubic time polynomials. Finally, we implemented additional analyses that control for state-specific time trends and use random constituency effects. The results of these analyses are reported in the online appendix, and they are consistent with the estimates reported here.
We have also tested Hypothesis 1 using two alternative designs. First, we considered a differenced regression model to see whether temporal changes in literacy are associated with temporal changes in the ENEP. To test this proposition, we first calculate the relative change in ENEP between the 1991 elections (closest to the 1991 census) and the 1999 elections (closest to the 2001 census):
We then used the same formula to calculate relative changes in literacy, urbanization, and the percentages of scheduled tribes and scheduled castes between 1991 and 2001 (the census years). Using these differenced data, we then estimated a regression model with ∆
Second, in the online appendix, we use an instrumental variable (IV) analysis to provide additional evidence that the effect of literacy on party system fragmentation is not spurious. We use the geographic density of schooling at the start of the 20th century as an instrument for current literacy rates. In the IV setting, we find that literacy has a negative effect on party system fragmentation, and the magnitude of that effect is generally larger than the ordinary-least-squares (OLS) estimates reported here.
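The differenced design (the first of the two alternative designs above) rests on relative changes between two time points. A minimal sketch follows, assuming the conventional definition of a relative change, (later − earlier) / earlier. The function name and the numbers are hypothetical and serve only to show the sign convention.

```python
# Relative change between two census-linked time points.
# Assumption (ours): "relative change" means (later - earlier) / earlier.

def relative_change(earlier, later):
    """Return (later - earlier) / earlier; earlier must be nonzero."""
    if earlier == 0:
        raise ValueError("relative change is undefined when the earlier value is 0")
    return (later - earlier) / earlier

# Hypothetical constituency; values invented purely for illustration.
enep_1991, enep_1999 = 3.0, 2.4
literacy_1991, literacy_2001 = 0.40, 0.50

d_enep = relative_change(enep_1991, enep_1999)              # negative: fragmentation fell
d_literacy = relative_change(literacy_1991, literacy_2001)  # positive: literacy rose
```

In a differenced regression, a negative association between `d_literacy` and `d_enep` across constituencies would correspond to the pattern Hypothesis 1 predicts.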
Alternative Explanations
Although the relationship between high literacy and low party system fragmentation appears to be strong and robust, it is important to consider alternative explanations of this relationship. The first set of alternative explanations we consider relates to the challenges of measuring party system fragmentation.
The negative coefficient for literacy in a regression where the dependent variable is ENEP might not mean that voters are better at coordinating in literate environments. This is because the size of ENEP depends on two factors: the distribution of votes across parties and the number of parties (
Relatedly, political scientists have long complained that low values of the ENEP index conflate two qualitatively different scenarios: when a single party dominates the electoral field and when a few small parties dominate the field (Golosov, 2010; Molinar, 1991). Again, this has implications for the interpretation of our findings: Successful electoral coordination means that voters concentrate their votes among exactly two leading parties. If, however, the empirical results are instead driven by increased one-party dominance, we cannot conclude that they are actually supportive of the informational hypothesis.
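The conflation described above is easy to see numerically. The sketch below assumes that ENEP is the standard Laakso-Taagepera effective number of parties, that is, the inverse of the sum of squared vote shares. The function name `enep` and the vote-share vectors are illustrative and are not taken from the data.

```python
def enep(shares):
    """Effective number of parties: 1 / sum of squared vote shares."""
    total = sum(shares)
    normalized = [s / total for s in shares]  # guard against rounding in inputs
    return 1.0 / sum(p * p for p in normalized)

two_party = [0.5, 0.5]                 # balanced two-party contest
dominant = [0.7, 0.1, 0.1, 0.1]        # one dominant party, three minor ones
fragmented = [0.25, 0.25, 0.25, 0.25]  # four equal parties

values = {
    "two_party": enep(two_party),
    "dominant": enep(dominant),
    "fragmented": enep(fragmented),
}
```

In this illustration the dominant-party profile yields an index slightly below that of the balanced two-party contest, even though it is not a two-party race, which is why a low ENEP alone cannot distinguish the two scenarios.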
To address these problems, we investigated the robustness of our results to three alternative measures of party system fragmentation: (a) the second-to-first ratio (SF), which is the ratio of the vote share of the second-ranked loser to that of the first-ranked loser (Cox, 1997), (b) the share of wasted votes (Singer, 2013), and (c) Molinar’s (1991) index of party system size. As we report in the online appendix, after adjusting for the baseline covariates as well as state-level fixed effects and time trends, literacy is negatively associated with all three alternative measures of party system fragmentation (all these effects are significant at the 95% confidence level).
While this robustness to alternative measures is reassuring, it does not fully rule out the two alternative explanations suggested earlier. Even though the three alternative measures differ from ENEP, they are highly correlated with it and suffer from similar shortcomings. For example, the percentage of wasted votes (which has a correlation of .95 with ENEP) may be small when voters actually coordinate on two parties, but it can also be small (0, in fact) when only two parties are running. Similarly, the SF ratio (which has a correlation of .76 with ENEP) might be low when there is a really strong two-party system, but it may also be low when a single party is highly dominant or when few parties are running. To make a more convincing case that high literacy drives party systems toward more Duvergerian equilibria, we have to show that, in high-literacy constituencies, voters tend to concentrate their votes on two leading parties, accounting for the possibility that the number of parties (and not only the distribution of votes among them) may also depend on literacy.
We do so using the compositional methodology of party system analysis (Rozenas, 2012b). The principal feature of this methodology is that it does not rely on any index of party system size. Instead, the compositional method simultaneously predicts the number of parties and the distribution of votes across those parties as a function of variables of interest. In our case, by studying how the predicted distributions of party vote shares change as a function of literacy and the covariates, we can evaluate whether the empirical patterns are consistent with the theoretical predictions without relying on any index of fragmentation. Most importantly, our goal with this compositional analysis was to establish whether greater literacy benefits exactly two leading parties. 15
In the compositional analysis, we control for the same covariates as in the baseline model: urbanization and the shares of scheduled castes and scheduled tribes. Instead of reporting the estimated parameters of the compositional model (which are difficult to interpret directly), we present its key substantive result relevant for our study in Figure 3. The figure shows the expected changes in the vote shares of parties, ranked from the largest to the smallest, as a result of increasing literacy from 20% to 70% while holding the covariates at their sample mean values. In other words, the figure shows how the vote shares of the largest, second-largest, and other parties change as we increase literacy rates by 50 percentage points, while holding other factors constant.

Figure 3. Change in the predicted vote shares as a result of going from 20% to 70% literacy, holding the covariates constant.
The pattern in the figure is quite clear: As literacy increases from 20% to 70%, the two largest parties gain substantially more votes, each about 10 percentage points. In contrast, all smaller parties lose votes when literacy increases (the 95% credible intervals do not cross the zero line, indicating that these effects are unlikely to be due to chance). Increasing literacy does not simply make party systems less fragmented in a broad sense; it does so in a very specific way, through increased concentration of votes among the two largest parties at the expense of the smaller ones. That is, party systems do not become less fragmented because the largest party becomes more dominant in high-literacy places (if that were the case, we would see the positive effect only for the first-ranked party but not for the second-ranked party). Nor does fragmentation fall only because fewer parties compete in elections in high-literacy constituencies. In sum, we can rule out the two alternative explanations that our results are driven by single-party dominance or by the number of parties competing in elections.
Another alternative interpretation of our findings is more India specific. India has more than one thousand political parties and many more independent candidates running in elections. This multitude of choice creates ample opportunities for voters to waste their votes: they can choose among regional and national parties, highly ideological and catch-all parties, and new and established parties. With such a large supply of options, it is possible that literate and illiterate voters have very different political preferences, and our earlier results simply pick up the differences in preferences, not the difference in information. For example, if literate voters are more likely to vote for, say, the national parties or well-institutionalized parties (which are few) and illiterate voters are more likely to vote for the regional or newly established parties (which are many), we would observe more concentrated distributions of votes in literate constituencies. Alternatively, easily recognizable electoral symbols (such as those of an established national party, the Congress) could convey cues about electoral viability that make voters in information-scarce settings support the national parties, without any information sharing or electoral coordination taking place. Finally, it could be the case that literate voters are more likely to support party-affiliated candidates, who are typically more viable than the independent ones. 16
To evaluate the plausibility of these alternative explanations, we reestimated our baseline regression model using the vote shares of three national parties separately: the Congress (established in 1885), the Bharatiya Janata Party (BJP, founded in 1980), and the Communists (established in 1925), which are all well known but vary in age. We also used the vote shares of all national parties combined and those of the independent candidates. 17 As the Election Commission of India assigns random symbols to independent candidates just prior to every round of elections, their election symbols convey little of the information about electoral viability that established party symbols do. Independents could be especially obscure in information-scarce contexts where voters seek information cues from electoral symbols. National parties, however, are more recognizable and should hold an advantage in electoral coordination.
The results of these analyses are shown in Table 3. We show only the coefficients for literacy and do not display coefficients for the control variables. The results strongly indicate that differences in the political preferences of literate and illiterate voters, the age of political parties, or the information that party symbols convey cannot explain our findings: the coefficient estimates for the Congress and the BJP separately, for the national parties combined, and for the independents are essentially 0. This means two things. First, the two parties that secure the lion’s share of votes in literate constituencies are not necessarily the national parties; literate constituencies are as likely to support smaller regional parties as they are to vote for the national ones. Likewise, voters seem not to distinguish between older and newer parties, as the coefficient estimates for the Congress and the BJP reveal. Second, whatever information electoral symbols convey, it cannot explain the voting behavior we observe across India. Notably, the electoral prospects of independents with no established symbols are no different from those of established political parties across informational contexts.
Table 3. Regressions for Vote-Shares of Individual Parties and Party Blocks (Independent Candidates and National-Level Parties).
Note. All estimations control for urbanization, share of scheduled castes, share of scheduled tribes, as well as state fixed effects and time trends. Each model includes only those constituencies in which the respective parties were on the ticket. Standard errors (in parentheses) are clustered by constituency. BJP = Bharatiya Janata Party.
*p < .05. **p < .01. ***p < .001.
The only significant finding in Table 3 is that literate constituencies tend to vote more for the Communists. However, this by itself cannot account for greater party system concentration in literate constituencies, because no other national party is systematically preferred by literate voters. 18 Furthermore, the electoral viability of the Communists is geographically limited to constituencies in the states of Kerala and West Bengal (about 10% of the total observations). This trend, as Huntington (1968) noted, reflects a special affinity that educated constituents in developing countries have for the Communist ideology, unrelated to the informational contexts we examine. To be specific, literate constituencies in India in fact display a diverse array of two-party contests, only a small part of which involves the Communists. In a vast portion of the other two-party systems, the leading parties are other national parties such as the Congress and the BJP, a national party and a regional party (the Congress and the Dravida Munnetra Kazhakam [DMK]), or two regional parties (for instance, the DMK and the Anna Dravida Munnetra Kazhakam [ADMK]).
The Role of the Informational Context
The analysis of the previous section considered the informational hypothesis only at the level of electoral constituency (Hypothesis 1). In this section, we consider the microlevel mechanism behind these aggregate-level findings, as specified in Hypothesis 2. As discussed earlier, the informational theories of electoral coordination do not simply imply that “information matters” but make a far more specific prediction. That is, electoral behavior of individual voters depends not only on their personal access to information but also on the informational context in which they make decisions.
The existing individual-level studies on the role of information in strategic voting have universally approached this question by looking for associations between individual-level voting decisions and individual-level education or literacy (Alvarez et al., 2006; Black, 1978; Catt, 1989; Choi, 2009; Niemi et al., 1992). Such approaches effectively assume an “atomistic” model, in which one individual’s information and behavior have no bearing on another individual’s information and behavior. We pursue a different path and consider the interactive effect between individual-level and aggregate-level literacy. In more literate environments, information can spill over from better informed to less informed voters, and it is more likely to propagate across all types of voters (as we have shown empirically earlier). For these two reasons, in highly literate environments, even illiterate voters with low personal access to information can make strategically sound choices because of the informational spillovers from more literate voters.
Is this prediction supported in the case of India? To answer this question, we combine the aggregate-level census data with the individual-level survey data from India’s 2004 INES, which sampled 27,189 individuals across 420 electoral constituencies. To measure an individual’s literacy, we use the INES education measure and classify a respondent as literate if he or she has at least primary education. 19 For constituency-level literacy, we use the census-based literacy rate, as in the earlier analyses.
The outcome of interest in this analysis is a voter’s vote-wasting propensity. We define a vote as wasted if and only if it was cast for a candidate who did not end up among the top two contenders. This definition of vote wasting cannot be applied universally. If there are only two parties competing in an election, no votes can be wasted; thus, we restrict our attention to constituencies with three or more parties or candidates running in the election. Even when there are more than two parties running, vote-wasting incentives will differ strongly across constituencies. For example, if there are three candidates and the proportions of their votes are .31, .30, and .29, then it is not reasonable to treat all votes that went to the third candidate as wasted. However, if instead the proportions of votes are .41, .40, and .19, then it is more reasonable to say that the votes for the third candidate were wasted. Generally, a vote for a third party can be qualified as genuinely wasted if the SF ratio is small (cf. Cox, 1994). To take this into account, we only use data from the constituencies where the SF ratio is below one half (in the online appendix, we show that the results are robust to alternative SF thresholds).
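The two three-candidate examples above can be checked against the one-half threshold directly. The sketch below computes the SF ratio as the third-largest vote share divided by the second-largest, following the definition attributed to Cox (1997) earlier in the article. The helper names are ours.

```python
def sf_ratio(shares):
    """Second-to-first loser ratio: third-largest share / second-largest share."""
    ranked = sorted(shares, reverse=True)
    if len(ranked) < 3:
        raise ValueError("SF ratio needs at least three candidates")
    return ranked[2] / ranked[1]

def passes_sf_restriction(shares, threshold=0.5):
    """Sample restriction described in the text: SF ratio below one half."""
    return sf_ratio(shares) < threshold

close_race = [0.31, 0.30, 0.29]  # third candidate is nearly level with the second
clear_race = [0.41, 0.40, 0.19]  # third candidate trails far behind

close_kept = passes_sf_restriction(close_race)
clear_kept = passes_sf_restriction(clear_race)
```

The near three-way tie has an SF ratio close to 1 and is excluded, while the second profile falls just under one half and is retained, matching the intuition in the text about when third-party votes are genuinely wasted.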
To estimate how the individual- and constituency-level information affects voters’ ability not to waste the vote, we build a flexible hierarchical logit model (see Gelman & Hill, 2007, Ch. 15), where we allow the effects of individual-level literacy to vary by constituency-level literacy. Formally, let
This regression model closely captures the theoretical story of how electoral coordination should work: We allow the choices of individual voters to be correlated within the constituency (through the random intercept
As a direct interpretation of estimates from the above model is extremely difficult, we study its results visually through simulations. Our quantity of interest is the predicted relative risk of vote wasting for illiterate versus literate individuals as a function of the literacy in their constituency. When this relative risk is greater than 1, an illiterate voter is more likely to waste a vote than a literate voter. When this relative risk is smaller than 1, an illiterate voter is less likely to waste a vote than a literate one. Finally, when this relative risk is equal to 1, individual literacy does not relate to the risk of vote wasting. According to Hypothesis 2, we expect this relative risk to be larger than 1 in low-literacy constituencies and approximately equal to 1 in high-literacy constituencies.
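The relative risk can be written out explicitly as the ratio of two predicted probabilities. The sketch below is not the hierarchical model estimated in the article. It uses a plain logit with made-up coefficients (`B0`, `B_ILLIT`, `B_CONTEXT`, and `B_INTERACT` are all hypothetical) only to show how the ratio is formed and how an interaction with constituency literacy can pull it toward 1.

```python
import math

# Hypothetical coefficients, chosen for illustration only; not estimates.
B0 = -0.5          # intercept
B_ILLIT = 0.9      # individual illiteracy raises the log-odds of vote wasting
B_CONTEXT = -1.0   # constituency literacy lowers it
B_INTERACT = -1.5  # illiteracy penalty shrinks as constituency literacy rises

def p_waste(illiterate, constituency_literacy):
    """Predicted probability of a wasted vote under the toy logit."""
    eta = (B0 + B_ILLIT * illiterate + B_CONTEXT * constituency_literacy
           + B_INTERACT * illiterate * constituency_literacy)
    return 1.0 / (1.0 + math.exp(-eta))

def relative_risk(constituency_literacy):
    """P(waste | illiterate) / P(waste | literate) at a given literacy rate."""
    return p_waste(1, constituency_literacy) / p_waste(0, constituency_literacy)

rr_low = relative_risk(0.12)   # sparse informational context
rr_high = relative_risk(0.60)  # dense informational context
```

With these invented numbers the ratio is well above 1 at 12% constituency literacy and essentially 1 at 60%, which is the qualitative pattern Hypothesis 2 predicts. The actual magnitudes can only come from the fitted model.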
Figure 4 displays these relative risks in two ways: the gray curves show the estimated relative risks of vote wasting separately for each constituency, and the black solid line is the population-averaged relative risk of vote wasting. 21 Intuitively, we can think of the constituency-specific relative risk as the average relative risk of vote wasting for a randomly drawn individual from a given constituency. Similarly, we can think of the population-averaged relative risk as the average relative risk of vote wasting for a randomly drawn individual across the entire sample.

Figure 4. Constituency-specific (gray curves) and population-averaged (solid curve) relative risk of vote wasting with 95% confidence bounds (dashed lines), after adjusting for individual income, gender, caste, and religion.
We observe from Figure 4 that there is substantial variation between the constituency-specific relative risks (the gray lines), but they all follow a similar pattern: they hover well above 1 when constituency-level literacy is low, and they move downward as constituency-level literacy increases. A similar message is borne out by the population-averaged relative risks (displayed by the solid black line): the relative risk of vote wasting is large at low levels of constituency literacy and is close to 1 otherwise. In substantive terms, after adjusting for individual-level covariates (income, gender, caste, and religion), when the constituency-level literacy rates are at their lowest point (around 12%), illiterate voters are about 1.75 times more likely to waste votes than literate ones. However, as constituency-level literacy increases, the relative risk of vote wasting approaches 1, meaning that illiterate voters are about as likely to waste votes as literate ones. When constituency-level literacy is roughly above 38% (this is the case in 53% of constituencies), the vote-wasting propensities of literate and illiterate voters are statistically indistinguishable, as the 95% confidence bounds cover 1.
These findings have three important implications. First, they lend support to the idea that individual voting depends not only on one’s information but also on the informational environment (Hypothesis 2). Individual access to information matters more in places where the informational environment is sparse, and less in places where that environment is dense. In dense informational environments, even voters for whom political information is more costly can acquire relevant electoral information from their social interactions with more informed voters. Due to these informational spillovers, in places with high literacy, illiterate voters are able to identify the viable parties as well as their literate peers.
Second, these findings provide a partial explanation for why the previous literature on the individual-level relationship between education/literacy and strategic voting has yielded conflicting results (Alvarez et al., 2006; Black, 1978; Catt, 1989; Choi, 2009; Niemi et al., 1992). Seen from the vantage point of our findings in India, the conflicting findings in this literature do not appear surprising, because the impact of information on strategic voting is highly contextual: it may or may not exist, depending on the informational environment.
Third, this finding offers new insight into Indian politics: In information-rich constituencies, voters can overcome limitations imposed by their individual illiteracy, identify viable candidates, and save their votes from being wasted. For illustration, consider an illiterate voter in a poor rural state such as Bihar: We estimate that her likelihood of wasting a vote is 44% in an information-scarce environment (a constituency with 20% literacy), but it falls to 25% in an information-rich environment in the same state (a constituency with 50% literacy). In fact, in the information-rich constituencies of Bihar, she is no more or less likely to waste her vote than her literate counterparts. This speaks to the largely unacknowledged influence of informational contexts and education on voting behavior in India, which should complement the current focus of the scholarship on state-level party alliances, political opportunity structures, and decentralization.
Conclusion
The evidence from India suggests that the informational hypothesis, according to which electoral coordination can fail due to scarcity of common information, does have empirical support. At the aggregate level, we find that in places where effective sharing of political information is unlikely due to high illiteracy rates, electoral coordination is likely to fail and lead to highly fragmented party systems. In literate constituencies, however, voters are able to coordinate their behavior more effectively, leading to less fragmented party systems with most votes concentrated on two leading parties, precisely as Duverger’s law predicts.
At the level of an individual voter, we documented evidence that the effects of individual information on voting behavior depend on the informational context: Individual-level information is most useful for strategic voting when the informational environment is sparse; in dense informational environments, individual access to information becomes less important as even low-information voters can acquire electoral information due to informational spillovers from the better informed voters. This microlevel mechanism is not only consistent with the reasoning behind the informational models of electoral coordination but it also underscores the interdependence of beliefs and behaviors of individual voters, often overlooked in the literature.
The findings of this article also provide a new perspective on why Duverger’s law seems to hold in some places in India but not in others. The existing literature has emphasized a number of structural conditions that may impede Duverger’s law, such as increasing decentralization that shifts the balance in favor of smaller regional parties, and the blocked opportunities of new elites and groups within older parties, which lead to the creation of new political parties. Our analyses shift the focus to the role voters play in the creation of party systems in India. We show how information at the aggregate and individual levels shapes whether voters coordinate or diverge in their electoral decisions, which ultimately affects the variation in party systems that characterizes the country’s politics.
Our study focused on the single case of India, which is a particularly attractive case precisely because the two quantities of interest, literacy and party system fragmentation, vary significantly in time and space. By exploiting this within-regional variation in literacy and party systems, we aimed for internal validity. However, one should be careful not to overinterpret these findings as far as external validity is concerned. The Indian party system is known for several notable features: incumbency disadvantage (Uppal, 2009), electoral volatility (Nooruddin & Chhibber, 2008), shifting party alliances (Sadanandan, 2012), and immense diversity (Sridharan, 2002). Although these features of the Indian party system are far from unique, it remains an open question for future research whether our findings would travel outside India.
Finally, one of the most interesting avenues for future research in this area is to study more closely the role of political elites in electoral coordination. In cases where voters are expected to punish parties that do not appear electorally viable, politicians should strategically aim to manipulate the informational environment to benefit from electoral coordination. Myerson and Weber (1993) refer to this kind of informational cuing as “campaigning on viability,” as opposed to campaigning on a policy platform. Our research provides some interesting initial insights as to when such campaigning on viability is likely to occur. To the extent that cuing viability is most effective in places where information can easily propagate between voters, we would expect such behavior by politicians to occur most often in places with high literacy. Studying the effectiveness of political elites in such focal manipulation of voters’ beliefs would be a valuable addition to our understanding of the informational foundations of Duverger’s law.
Footnotes
Acknowledgements
We are grateful to Rachel Brule, Kanchan Chandra, Patrick Egan, Miriam Golden, Francesca Refsum Jensenius, Herbert Kitschelt, Shanker Satyanath, and Matthew Singer for their comments, and to Himanshu Mistry for assistance with the Census of India data and Geographic Information System (GIS). Arturas Rozenas is grateful to Hoover Institution at Stanford University, where he was a 2016-2017 National Fellow.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
