Abstract
Researchers today have access to an unprecedented amount of geo-referenced, disaggregated data on political conflict. Because these new data sources use disparate event typologies and units of analysis, findings are rarely comparable across studies. As a result, we are unable to answer basic questions like ‘what does conflict A tell us about conflict B?’ This article introduces xSub – a ‘database of databases’ for disaggregated research on political conflict (www.x-sub.org). xSub reduces barriers to comparative subnational research, by empowering researchers to quickly construct custom, analysis-ready datasets. xSub currently features subnational data on conflict in 156 countries, from 21 sources, including large data collections and data from individual scholars. To facilitate comparisons across countries and sources, xSub organizes these data into consistent event categories, actors, spatial units (country, province, district, grid cell, electoral constituency), and time units (year, month, week, and day). This article introduces xSub and illustrates its potential, by investigating the impact of repression on dissent across thousands of subnational datasets.
In the last two decades, social scientists have produced a tremendous amount of disaggregated data on political conflict and violence. Large-scale collection projects (e.g. Schrodt, Davis & Weddle, 1994; Raleigh et al., 2010; Salehyan et al., 2012; Sundberg & Melander, 2013) and specialized studies of individual countries (e.g. Sullivan, Loyle & Davenport, 2012; Verpoorten, 2012; Osorio, 2015) have extracted georeferenced information on political events from press reports, social media, and state archives, manually or with automated techniques. These data have fueled new waves of subnational research, employing novel research designs at granular levels of analysis.
Disaggregation has advanced scholarship in numerous ways, but five interrelated problems have impeded progress: (1) most studies that use subnational data nevertheless conduct analysis at a highly aggregated, macro level; (2) most micro-level studies focus on one or few countries; (3) cross-dataset comparisons are rare; (4) operational definitions vary; and (5) there are no consistent units of analysis, which might otherwise enable direct comparisons. As a result, idiosyncratic and contradictory findings have become common in subnational research, leaving unanswered basic questions about the causes, dynamics, and consequences of conflict.
The barrier to generalizability is not a lack of data – in many cases these data exist and are in the public domain. Rather, it is that no one has undertaken the entrepreneurial effort to merge and combine disparate subnational conflict datasets into a unified, analysis-ready format, with consistent definitions, measures, and units. This is no small task: it involves geo-locating events, classifying them by type, assigning them to administrative and geographic units, aggregating over time, and repeating for each country, dataset, and variable. Without such infrastructure-building efforts, the field cannot move forward.
With these thoughts in mind, we present xSub, a ‘database of databases’ for cross-national research on subnational conflict. xSub currently features subnational datasets from 21 sources, covering 156 countries, organized into consistent categories and units, by space (country, province, district, grid cell, electoral constituency), time (year, month, week, day), actors (government, opposition, civilian, unaffiliated), and actions. xSub also provides data on local demographics, geography, ethnicity, weather, and other covariates. These data are freely available through x-sub.org or the xSub R package.
xSub has applications across a wide range of substantive areas. Scholars may use it to examine relationships between climate variability and conflict, terrain and insurgency, ethnicity and communal fighting, elections and protests, spillover effects, conflict duration, recurrence, micro-dynamics of non-state activities, and – as we demonstrate below – the relationship between repression and dissent. Users may also contribute original data to the platform, making their work available to new researchers asking new questions.
Why xSub?
Traditionally, research on political conflict – including genocide, (counter-)revolution, (counter-)insurgency, (counter-)terrorism, protest (policing), (violations of) civil liberties and human rights – has maintained a cross-national focus, tracking macro-level variation between nation-states. This work has produced innumerable insights about why conflicts begin (Collier & Hoeffler, 2004), become increasingly lethal (Poe & Tate, 1994), employ particular tactics (Kalyvas & Balcells, 2010), respond to diverse factors of contention (Davenport, 1995), endure (Fearon, 2004), and reoccur (Walter, 2004) – in some countries but not others. It has been less informative about variation within conflicts, and why contentious events occur at particular places and times, within the nation-state.
To fill this gap, a growing movement of subnational research has disaggregated actors and actions to a more micro level. This movement has helped illuminate the local and short-term dynamics of conflict, advancing our understanding of when, where, and why unrest will likely emerge (Dube & Vargas, 2013), spread (Schutte & Weidmann, 2011), and vary in response to government action (Lyall, 2009).
While resolving some problems, subnational research has created others. To take stock of this rapidly growing literature, we surveyed the universe of topically related studies published between 2006 and 2017 in top disciplinary and general scientific journals. We reviewed 392 articles and organized them by topic, geographic and temporal scope, unit of analysis, and methodology (Online appendix A1). Our survey revealed five common problems.
xSub addresses these problems directly, by pulling together hundreds of existing subnational datasets, and aggregating conflict events and covariates to consistent units of analysis across countries. As a public good, xSub significantly reduces barriers to comparative subnational research, empowering researchers to quickly construct custom, analysis-ready datasets and compare their findings across countries and sources.
Similar initiatives already exist for macro-level conflict research, like NewGene (Bennett, Poast & Stam, 2017). There is no counterpart at the subnational level, despite several important integration efforts, including PRIO-GRID, a spatio-temporal grid structure for data compilation and analysis (Tollefsen, Strand & Buhaug, 2012), GROWup, a data platform for ethnic settlement patterns and conflict (Girardin et al., 2015), and geomerge, an R package to construct spatial panel datasets (Linke & Donnay, 2017). xSub builds on these efforts by offering barrier-free access to preprocessed event data, from multiple sources, at multiple resolutions. By providing a user-friendly web interface and an R package with more advanced functionality, xSub serves a methodologically diverse user base, from undergraduates to senior researchers. While not eliminating the need to understand the limitations of individual data sources, xSub makes these data more accessible, removing key obstacles to the accumulation of knowledge.
What is xSub?
xSub’s online repository and accompanying R package currently feature 25,112 datasets on the location, dynamics, and intensity of conflict events, in 156 countries (1969–2017), from 21 data sources, with consistent categories and customizable spatio-temporal units. These datasets include popular covariates on weather, ethnicity, demographics, and geography. As such, xSub is well suited for single-country and cross-national analyses of conflict onset, (de)escalation, diffusion, termination, recurrence, and legacy.
xSub data are available as individual events and customizable spatial panel datasets, and are designed for integration with standard statistical software (e.g. Stata, R) and geographic information systems. The platform is updated annually, with changes that include new data sources and bug fixes.
Figure 1. Number of unique data sources per country
Data sources
xSub features event data on political conflict and violence from 21 sources, including widely used large-scale data collections and boutique datasets from scholars who volunteered to be early contributors to the platform. xSub also welcomes uploads of users’ original data, with submission guidelines specified at x-sub.org/data-upload.
Figure 1 shows the geographic distribution of xSub data, with darker colors indicating more data sources per country. At the atomic level, the data are individual events, with information on locations, dates, actors, and tactics. We use these to construct local event counts, at various levels of analysis, as well as pooled, multiple-source datasets, integrated with the MELTT software package (Donnay et al., 2019).
Actors
xSub organizes events into initiator-target dyads, with four categories of actors: government (Side A), opposition (Side B), civilian (Side C), and unaffiliated (Side D).
Side A: The government category includes state security forces, pro-government militias, activists, and third parties acting on the incumbent’s behalf (e.g. foreign troops, contractors). It excludes mutinous military factions, and supporters of ousted regimes.
Table I. xSub actor typology for all data sources
Table II. xSub event typology for all data sources
Side C: Civilians are individuals who abstain from willful participation in politically contentious behavior. Civilians generally enter the dataset not as initiators of conflict events, but as unarmed victims of violence by any side.
Side D: The final category (other) includes militias, tribes, self-defense units, and other actors not directly challenging or supporting the government. This group also includes factions that do not fall neatly into the first three categories due to political non-affiliation (e.g. intercommunal groups, criminal organizations, peacekeepers).
Table I illustrates the actor typology. Where data sources do not explicitly distinguish between the initiators and targets of conflict events (e.g. ACLED, UCDP-GED), the variables DYAD_* are to be interpreted as ‘event involving actors X and Y’ (undirected dyads). In all other cases, the interpretation is ‘action by actor X against actor Y’ (directed dyads).
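The difference between directed and undirected dyads can be made concrete with a short sketch. The Python below is illustrative only: the function names, the single-letter side codes, and the `DYAD_X_Y` label format are assumptions made for this example, and only the `DYAD_` prefix comes from the text above.

```python
# Side codes follow the actor typology: government, opposition, civilian, unaffiliated.
SIDES = ("A", "B", "C", "D")


def dyad_label(initiator, target, directed):
    """Return a hypothetical DYAD_* label for one event."""
    if initiator not in SIDES or target not in SIDES:
        raise ValueError("unknown side code")
    if directed:
        # 'action by actor X against actor Y': order carries meaning.
        first, second = initiator, target
    else:
        # 'event involving actors X and Y': order is discarded by sorting.
        first, second = sorted((initiator, target))
    return f"DYAD_{first}_{second}"


def count_dyads(events, directed):
    """Count events per dyad label; events is a list of (initiator, target)."""
    counts = {}
    for initiator, target in events:
        label = dyad_label(initiator, target, directed)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up events for illustration.
events = [("A", "B"), ("B", "A"), ("A", "C")]
directed_counts = count_dyads(events, directed=True)
undirected_counts = count_dyads(events, directed=False)
```

With directed coding, government-on-opposition and opposition-on-government events are counted separately. With undirected coding, they collapse into a single count for the pair.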
Actions
xSub categorizes events into four categories of actions: (1) Any use of force, (2) Indirect force, including shelling, air strikes, and chemical weapons, (3) Direct force, including firefights, arrests, and assassinations, and (4) Protests, both violent and nonviolent. While some data sources contain detailed information on tactics, most do not. Where there were no details, we coded actions as Any, or consulted the original article or author(s). Where data sources instead provided textual descriptions of events, we constructed a custom action dictionary, and used natural language processing to assign each event to one of these categories. Table II summarizes the event typology. xSub also provides event-level data on specific actions, like air strikes, ambushes, armored offensives, arrests, artillery shelling, bombings, and chemical weapons.
Figure 2. Spatial aggregation
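As a rough illustration of dictionary-based action coding, the sketch below assigns a textual event description to one of the action categories by keyword matching. It is a simplification under stated assumptions: the keyword lists, category labels, and first-match rule are hypothetical, and plain substring matching stands in for the natural language processing used to build xSub.

```python
import re

# Hypothetical keyword stems; the actual xSub action dictionary is not reproduced here.
ACTION_DICTIONARY = {
    "INDIRECT": ("shelling", "shelled", "air strike", "airstrike", "chemical"),
    "DIRECT": ("firefight", "arrest", "assassinat", "ambush"),
    "PROTEST": ("protest", "demonstrat", "rally"),
}


def classify_action(description):
    """Return the first category whose stems appear in the description.

    Falls back to 'ANY' when no stem matches, mirroring the rule that
    events without tactical detail are coded as Any.
    """
    text = re.sub(r"\s+", " ", description.lower())
    for category, stems in ACTION_DICTIONARY.items():
        if any(stem in text for stem in stems):
            return category
    return "ANY"


# Made-up descriptions for illustration.
labels = [
    classify_action("Army shelled the village overnight"),
    classify_action("Police arrested three activists"),
    classify_action("Students held a peaceful demonstration"),
    classify_action("Clashes reported near the border"),
]
```

Because the first matching category wins, the order of the dictionary acts as a priority rule. A description mentioning both arrests and protests would be coded as direct force in this sketch, which is a design choice rather than a documented xSub rule.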
Covariates
In addition to conflict, xSub includes other variables frequently used in subnational research: local demographics (e.g. population density), geography (e.g. elevation, roads, land cover), ethnicity (e.g. local nationalities, linguistic groups), and weather (e.g. temperature, rainfall). To make these covariates consistent and internationally comparable, we drew them from publicly available GIS datasets with global coverage. Some of the covariates are time-variant and date back to 1900 (e.g. precipitation). Others are static (e.g. elevation).
Units of analysis
xSub provides event-level and spatial panel datasets, the latter of which aggregate events and covariates to users’ preferred spatial and temporal units. Geographic units include countries, provinces (first-order administrative divisions), districts (second-order divisions), PRIO-GRID cells (0.5 x 0.5 decimal degree lattice; Tollefsen, Strand & Buhaug, 2012), and electoral constituencies (Kollman et al., 2017). Temporal units include years, months, weeks, and days.
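The sketch below shows, under simplifying assumptions, how individual events can be binned into a 0.5 x 0.5 decimal degree lattice and a chosen time unit to produce panel-style event counts. The cell indexing scheme and function names are hypothetical and do not reproduce PRIO-GRID cell identifiers or xSub's own code.

```python
import math
from collections import Counter
from datetime import date

CELL_SIZE = 0.5  # decimal degrees, matching the lattice resolution described above


def grid_cell(lon, lat, size=CELL_SIZE):
    """Return hypothetical (column, row) indices of the cell containing a point."""
    return (math.floor((lon + 180.0) / size), math.floor((lat + 90.0) / size))


def time_unit(day, unit):
    """Map a date to a year, month, ISO week, or day key."""
    if unit == "year":
        return (day.year,)
    if unit == "month":
        return (day.year, day.month)
    if unit == "week":
        iso = day.isocalendar()
        return (iso[0], iso[1])
    if unit == "day":
        return (day.year, day.month, day.day)
    raise ValueError("unit must be 'year', 'month', 'week', or 'day'")


def event_counts(events, unit):
    """Count events per (cell, time unit); events is a list of (lon, lat, date)."""
    counts = Counter()
    for lon, lat, day in events:
        counts[(grid_cell(lon, lat), time_unit(day, unit))] += 1
    return counts


# Made-up events for illustration.
events = [
    (77.21, 28.61, date(2015, 3, 2)),
    (77.30, 28.70, date(2015, 3, 20)),
    (72.88, 19.08, date(2015, 4, 1)),
]
monthly = event_counts(events, "month")
yearly = event_counts(events, "year")
```

The first two events fall in the same cell and month, so they form a single cell-month observation with a count of two. Switching `unit` changes only the temporal key, which is how the same events can yield yearly, monthly, weekly, or daily panels.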
Figure 2 illustrates the spatial aggregation procedure, where raw data are (a) points, like event locations, and (b) polygons or grid cells, like weather. The upper left overlays raw data with spatial units of interest (here, electoral constituencies, CLEA_CST_N). The lower left displays aggregated measures for each unit, and the right pane shows the same in tabular form. For points (Figure 2a), we identify spatial units that contain each event, and generate local event counts at each time interval. Because polygon borders do not always align (Figure 2b), we aggregate such data with area-weighted means.
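To illustrate the area-weighted mean for misaligned polygons, the following sketch uses axis-aligned rectangles in planar coordinates. This is a deliberate simplification: real spatial units are irregular polygons and would normally be handled with GIS tooling and an equal-area projection. All names and values are hypothetical.

```python
def overlap_area(a, b):
    """Area of intersection of two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def area_weighted_mean(unit, cells):
    """Average cell values, weighting each by its area of overlap with the unit.

    cells is a list of (rectangle, value). Returns None if nothing overlaps.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for rect, value in cells:
        area = overlap_area(unit, rect)
        weighted_sum += area * value
        total_area += area
    return weighted_sum / total_area if total_area > 0 else None


# A unit covered 75% by a cell with value 20 and 25% by a cell with value 28;
# the third cell does not overlap the unit and receives zero weight.
unit = (0.0, 0.0, 1.0, 1.0)
cells = [
    ((0.0, 0.0, 0.75, 1.0), 20.0),
    ((0.75, 0.0, 1.5, 1.0), 28.0),
    ((2.0, 2.0, 3.0, 3.0), 99.0),
]
mean_value = area_weighted_mean(unit, cells)
```

Here the result is 0.75 × 20 + 0.25 × 28 = 22, so a cell that covers more of the unit contributes proportionally more to the unit's value.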
Figure 3 illustrates how one’s choices of units affect the spatial and temporal distribution of conflict data in India: smaller units yield more variation, but also more sparsity.
User interface
xSub offers two options for customizing and downloading data. The first is an interactive web-based interface, at x-sub.org/data-download, where users can select countries, data sources, and units of analysis, preview the data, and download a zipped archive with the requested data and supporting documentation. For example, the selection ‘Country: India’, ‘Source: UCDP GED’, ‘Space: district’, ‘Time: week’ will generate weekly observations for India’s districts, with local event counts for each week (from UCDP GED, broken down by actor and tactics), and local average statistics for weather and other covariates. The second option, for more advanced uses, is the xSub R package.
Figure 3. Geographic units of analysis
Illustration of use: Repression and dissent
To demonstrate how scholars might use xSub, we investigate the empirical relationship between overt manifestations of repression and dissent. A longstanding topic in civil conflict research is the contentious interaction between governments and challengers: how the actions and tactics of one side influence the actions and tactics of another, and whether escalation sparks reciprocal steps (Davenport, 2007). The dominant view is that crackdowns on opposition tend to inflame dissent (Gurr & Lichbach, 1986; Mason & Krane, 1989; Sullivan, Loyle & Davenport, 2012). Another school holds that, by raising the costs of behavioral challenges, repression is more likely to deter dissent (Tilly, 1978; Lyall, 2009). Some maintain that repression can have both effects, inflaming dissent at intermediate levels but deterring it at extremes – an ‘inverted U’ (Gurr, 1970; DeNardo, 2014) – or decreasing dissent at intermediate levels and increasing at extremes – a ‘U-shape’ (Lichbach & Gurr, 1981). Due to a plurality of idiosyncratic research designs and data sources – cross-national and subnational – the field has produced contradictory findings about which of these patterns is dominant.
To take stock of whether repression (i.e. government violence) increases or decreases opposition activity, we use xSub to conduct a meta-analysis across thousands of subnational datasets. For each country and data source, we fit the following core model:
where
Our interest is in how the
We repeated this analysis across thousands of xSub spatial panel datasets, limiting our inquiry to yearly and monthly datasets with at least ten incidents of government violence and ten protests. This narrowed our empirical domain to 14,299 datasets from 113 countries, at multiple units of analysis.
Figure 4 reports the predicted shape of the repression–dissent relationship across spatio-temporal units and data sources, based on a weighted average of the estimates from these models.
The results show a general tendency toward an ‘inverted-U’ relationship between repression and dissent: ‘anger’ at lower levels, ‘fear’ at higher levels.
The relationship is highly sensitive to sources and spatio-temporal units: as units become smaller, the slope becomes increasingly positive. The curve’s steepness also varies regionally: European states reach the inflection point earlier, on average, than African states – potentially indicating greater coercive capacity, lower opposition resolve, or both.
The goal of this analysis has been purely illustrative, demonstrating how one may use xSub to assess the local relationship between repression and dissent, and how it varies – in direction and magnitude – across conflicts and data sources. These models cannot – and are not intended to – identify a causal effect. A more rigorous analysis with xSub might consider additional sources of variation and bias, like the endogeneity of repression, interdependence, tactical shifts, and spatial autocorrelation.
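The pooling step of such a meta-analysis can be sketched as follows. This is not the article's specification: the sketch assumes, purely for illustration, that each dataset yields a linear coefficient `b1` and a quadratic coefficient `b2` on repression, and that each estimate carries a positive weight (for example an inverse variance). The estimates and weights below are made up.

```python
def weighted_average(values, weights):
    """Weighted mean of per-dataset estimates."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def classify_shape(b1, b2):
    """Label the curve implied by b1 * x + b2 * x**2 for non-negative x."""
    if b1 == 0 and b2 == 0:
        return "flat"
    if b1 > 0 and b2 < 0:
        return "inverted-U"
    if b1 < 0 and b2 > 0:
        return "U-shape"
    if b1 >= 0 and b2 >= 0:
        return "increasing"
    return "decreasing"


def turning_point(b1, b2):
    """Level of repression at which the fitted curve turns, if any."""
    return -b1 / (2.0 * b2) if b2 != 0 else None


# Hypothetical per-dataset estimates: (b1, b2, weight).
estimates = [(0.8, -0.1, 4.0), (0.4, -0.05, 2.0), (-0.2, 0.02, 2.0)]
weights = [w for _, _, w in estimates]
pooled_b1 = weighted_average([b1 for b1, _, _ in estimates], weights)
pooled_b2 = weighted_average([b2 for _, b2, _ in estimates], weights)
shape = classify_shape(pooled_b1, pooled_b2)
peak = turning_point(pooled_b1, pooled_b2)
```

In this made-up example, two datasets point to an inverted U and one to a U-shape, and the pooled coefficients retain the inverted-U pattern because the first two carry more weight. Grouping the estimates by region, source, or unit of analysis before pooling would show how the shape varies across those dimensions.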

Figure 4. Meta-analysis of repression and dissent
Further building on this example, xSub allows scholars to uncover many sources of heterogeneity. For instance, we may expect indirect artillery shelling to have a very different effect on dissent than arrests and detentions. Protesters may react to repression differently than armed rebels. Political actors may also behave differently in locations closer or farther from their primary bases of support. While space constraints limit the scope of our inquiry here, xSub users can easily assess the generalizability of empirical relationships, switching from aggregated (country-year) to disaggregated (district-month) scales, replicating results across countries and datasets, with consistent definitions and units.
Conclusion
xSub reduces the barriers to comparative subnational research, by empowering researchers to quickly construct custom, analysis-ready datasets, pre-loaded with several popular covariates. xSub also offers a platform for scholars to contribute and distribute their own, original data. One of the reasons for fragmentation in subnational research is that many individual data collection efforts are project-specific: scholars assemble a new dataset for a one-off paper, and – apart from posting a replication archive – never re-use those data again. Rather than allow a dataset to ‘die’ with a paper, xSub enables researchers to give their data a second life, in the hands of new researchers, asking a new set of questions. Let a thousand flowers bloom.
Supplemental Material
Supplemental Material, JPR836697_Appendix_1-3 for Introducing xSub: A new portal for cross-national data on subnational violence by Yuri M Zhukov, Christian Davenport and Nadiya Kostyuk in Journal of Peace Research
Footnotes
Acknowledgments
We thank David Backer, Alex Braithwaite, David Cunningham, Karsten Donnay, Eric Dunford, Kathleen Gallagher Cunningham, Gary Goertz, Benjamin Graham, Macartan Humphreys, Volodymyr Ishchenko, Sebastian Kraus, Andrew Linke, Yonatan Lupu, Erin McGrath, Thomas O’Mealia, Andy Owsiak, Paul Staniland, Sebastian Schutte, Christopher Sullivan, Andreas Tollefsen, Henrik Urdal, Camber Warren, Nils Weidmann, Julian Wucherpfennig, and Thomas Zeitzoff for helpful comments, and Yioryos Nardis and Andrew Versalle for technical support.
Funding
We are grateful for financial support from the Research Council of Norway (Davenport; Project 250441). Nadiya Kostyuk’s work was partially supported by the William and Flora Hewlett Foundation.
