Abstract
This article presents the UCDP Georeferenced Event Dataset (UCDP GED). The UCDP GED is an event dataset that disaggregates three types of organized violence (state-based conflict, non-state conflict, and one-sided violence) both spatially and temporally. Each event – defined as an instance of organized violence with at least one fatality – comes with date, geographical location, and identifiers that allow the dataset to be linked to and merged with other UCDP datasets. The first version of the dataset covers events of fatal violence on the African continent between 1989 and 2010. This article, firstly, introduces the rationale for the new dataset, and explains the basic coding procedures as well as the quality controls. Secondly, we discuss some of the data’s potential weaknesses in representing the universe of organized violence, as well as some potential biases induced by the operationalizations. Thirdly, we provide an example of how the data can be used, by illustrating the association between cities and organized violence, taking population density into account. The UCDP GED is a useful resource for conflict analyses below the state and country-year levels, and can provide us with new insights into the geographical determinants and temporal sequencing of warfare and violence.
Introduction
The last few years have seen an impressive burgeoning of studies within the strand of so-called ‘subnational’ analyses (see for instance Balcells, 2010; Raleigh & Hegre, 2009; Urdal, 2008). This trend has partially been spawned by the insight that the predominant country-year approach is flawed for certain types of research questions: oftentimes the theoretical approach used does not mesh well with the data and statistical methods available. As noted by Raleigh et al. (2010), when motivating the creation of the Armed Conflict Location and Event Dataset (ACLED), subnational studies have shown that overly aggregated approaches risk misconstruing the ‘correlates and patterns of internal conflict’, as one attempts to draw conclusions based on a mismatch between theory and available data. A geographically disaggregated approach may thus serve not only to allow us to answer research questions we previously could not, but also to facilitate better empirical tests of hypotheses and theories that are viewed as strongly and repeatedly validated. Data that are more temporally and spatially fine-grained can allow the research community to make great strides concerning both the former and the latter.
This article introduces the UCDP’s Georeferenced Event Dataset (UCDP GED) (codebook by Sundberg, Lindgren & Padskocimaite, 2010). The UCDP GED contains observations of fatal violence at the event level, meaning that each instance of organized violence with at least one fatality is included in the dataset as a unit of analysis. Each such event is accompanied by information on actors, dyad, conflict, geographic location and coordinates, as well as the specific dates on which the violence took place. Though the first version covers only the African continent from 1989 to 2010, future versions of the dataset will cover the entire globe and be updated annually.
This article proceeds as follows. Firstly, we expand on the rationale for the new dataset and its many possible contributions. Secondly, we go into detail on our definitions, operationalizations, data collection procedures, spatial and temporal coding, and the mechanisms used to ensure data quality and consistency. Thirdly, we briefly outline the limitations of the dataset and explain our reasoning behind some of the theoretical and operational choices. Fourthly, we present some descriptive statistics as well as an example of how UCDP GED can be used to illustrate the incidence and intensity of organized violence in rural and urban areas. With this illustration we hope to stimulate interest in the many potential uses of the dataset for research questions at the subnational level of analysis.
Why a new dataset?
For a long time quantitatively oriented research within peace and conflict studies relied heavily on the aggregated country-year level of analysis. However, researchers have become increasingly interested in the promises that ‘subnational’ or ‘disaggregated’ approaches hold for the field. As has been pointed out by Raleigh et al. (2010) and Eck (2012) in their respective articles on georeferenced event data, this coming generation of conflict studies attempts to get at those mechanisms of war and peace that classical country-year studies have difficulties in engaging with (see also Buhaug & Lujala, 2005; Gleditsch & Weidmann, 2012). Some of these recent subnational studies have already produced interesting results, such as Balcells’s (2010) study on violence against civilians in over 1,000 municipalities during the Spanish Civil War, as well as Østby et al.’s (2011) study on population pressures, resource scarcity, and violence in Indonesia. It is becoming ever clearer that for some of the pathways the discipline wishes to pursue – and to be able to resolve some of the puzzles of war – disaggregated approaches and data will be useful.
However, the promising trend in this direction was long hampered by a lack of available data (Eck, 2012). Most of the projects that use subnational approaches and disaggregated conflict data have been restricted to the specific case under study, meaning that few comparable data exist across cases. Thus, the recent introductions of datasets such as ACLED (Raleigh et al., 2010), the Social Conflict in Africa Database (SCAD) (Salehyan et al., 2012), and the Militarized Interstate Dispute Location Dataset (MIDLOC) (Braithwaite, 2010), with a wider scope than the individual case, provide welcome possibilities for cross-case work.
In order to strengthen the availability of disaggregated conflict data the UCDP has now created an events dataset, the UCDP GED, which takes as its point of departure the theoretical definitions and operationalizations that for over ten years have guided UCDP data collection. The UCDP country-year datasets include several different types of organized violence and are among the world’s most widely used (Dixon, 2009). In contrast to most other event datasets the UCDP GED can easily be integrated with a number of other UCDP datasets, something which should facilitate engagement with a broad range of research questions. Spatially, the dataset is also completely synchronized at the event level with PRIO-GRID (Tollefsen, Strand & Buhaug, 2012), which enables full compatibility with the many useful political, demographic, and environmental variables of that GIS structure (for an example combining these datasets, see Fjelde & von Uexkull, 2012). The spatially and temporally disaggregated data of UCDP GED – in combination with its compatibility with already existing datasets – contribute to opening up a plethora of possible research agendas concerning the subnational relationships of violence to geography, demography, environmental variables, and political events such as elections and border/boundary shifts, as well as the spatial and temporal links between different types of violence.
Criteria and definitions
Base definitions
The unit of analysis in the UCDP GED is the ‘event’ – an instance of fatal organized violence. Specifically an event is defined as: The incidence of the use of armed force by an organized actor against another organized actor, or against civilians, resulting in at least 1 direct death in either the best, low or high estimate categories at a specific location and for a specific temporal duration.
Each instance of organized violence that meets these criteria is recorded as a single observation in the dataset and constitutes a unit of analysis. It follows from this definition that the dataset contains only events for which it was possible to deduce fatality estimates; incidents where it is unclear how many fatalities there were, or whether there were any, are not included. These criteria are adapted from the UCDP’s base definitions of armed conflict, non-state conflict, and one-sided violence, but with the removal of the 25 deaths criterion (in the calendar year) so as to place the definition on the level of the individual event.
The data contained within the UCDP GED are those events of organized violence that, in their aggregated form, constitute the UCDP’s country-year datasets on (1) state-based armed conflict (Gleditsch et al., 2002), (2) non-state conflict (Sundberg, Eck & Kreutz, 2012), and (3) one-sided violence (Eck & Hultman, 2007). The UCDP GED thus contains information on three types of organized violence: violence between two organized actors of which at least one is the government of a state, violence between actors of which neither party is the government of a state, and lastly, violence against unarmed civilians perpetrated by organized non-state groups or governments. The UCDP GED contains each building block of the corresponding aggregated country-year datasets in the form of its constituent events. In order for an instance of violence to qualify for inclusion, the circumstances must not only live up to the previously listed definition of an event, but must also fulfill the definitional criteria of a specified type of violence (for instance, for state-based conflict two parties must dispute an incompatibility).
These categories of UCDP organized violence are mutually exclusive, and thus a single event cannot be coded as being an instance of, for example, both non-state and state-based conflict. If instances of violence occur on the same date in the same location, but take place between different actors, these instances will be coded as separate events. It is, for example, possible for police to kill unarmed civilians (one-sided violence) in location x on date y, while battles rage in the same x on the same y between rival criminal factions (non-state violence). Since these two instances fulfill different coding criteria they are coded as separate events. These three categories of violence capture a wide spectrum of different forms of organized violence, but of course not all types that are sometimes seen to be related to the general concept of ‘conflict’. The theoretical categories leave out phenomena such as ‘rioting’ or demonstrations that turn violent (since these instances often fail to show a high enough level of organization), as well as clashes between police/army and individuals and/or groups that are armed but not particularly organized.
Dataset structure and inclusion
The present UCDP GED records all events of fatal organized violence in Africa between 1989 and 2010, but will in its end-state be global in scope and updated annually. The specific subset now released for public use includes some 22,000 events of fatal organized violence. The dataset contains events for all dyads and actors that, per calendar year, surpass the 25 deaths threshold for inclusion – the same threshold that is applied across all of the UCDP’s data on organized violence. The biases that this induces in the dataset are discussed further in the section on the dataset’s limitations below, as well as in the online appendix.
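The inclusion rule can be illustrated with a minimal sketch. The event tuples and dyad names below are hypothetical, and the sketch assumes that the threshold is applied to summed best estimates per dyad and calendar year; the codebook remains the authoritative description of the actual procedure.

```python
from collections import defaultdict

# Hypothetical event records: (dyad_id, year, best_estimate_deaths).
events = [
    ("dyad_A", 1995, 12),
    ("dyad_A", 1995, 15),
    ("dyad_A", 1996, 4),
    ("dyad_B", 1995, 30),
    ("dyad_B", 1996, 10),
]

THRESHOLD = 25  # deaths per dyad (or actor) per calendar year


def included_events(events, threshold=THRESHOLD):
    """Keep only events whose dyad-year fatality total reaches the threshold."""
    totals = defaultdict(int)
    for dyad, year, deaths in events:
        totals[(dyad, year)] += deaths
    return [e for e in events if totals[(e[0], e[1])] >= threshold]


kept = included_events(events)
```

In this toy example the two 1995 events of dyad_A are retained because together they reach the threshold, while the 1996 events of both dyads are dropped.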
Each event comes complete with several spatial and temporal locators, such as place name, administrative division, and geographic coordinates, as well as start and end dates, to allow for fine-grained spatial and temporal analysis. Each event is also given a unique ID for easy reference, as well as conflict, dyad, and actor IDs. As was mentioned earlier, these IDs for conflicts, dyads, and actors correspond with identifying information found in other UCDP datasets and allow ample possibilities for dataset integration.
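As a sketch of how such shared identifiers permit integration, the following joins event rows to a dyad-level table on the dyad ID. All field names, ID values, and actor names are invented for illustration and are not the variable names of any UCDP dataset.

```python
# Hypothetical event rows; field names are illustrative only.
events = [
    {"event_id": 1, "dyad_id": 101, "conflict_id": 11, "best": 5},
    {"event_id": 2, "dyad_id": 101, "conflict_id": 11, "best": 2},
    {"event_id": 3, "dyad_id": 202, "conflict_id": 22, "best": 9},
]

# Hypothetical dyad-level table from another dataset, keyed on the same dyad ID.
dyads = {
    101: {"side_a": "Government X", "side_b": "Rebel group Y"},
    202: {"side_a": "Militia P", "side_b": "Militia Q"},
}


def link_events(events, dyads):
    """Attach dyad-level attributes to each event via the shared dyad ID."""
    linked = []
    for event in events:
        extra = dyads.get(event["dyad_id"], {})  # empty if no match is found
        linked.append({**event, **extra})
    return linked


linked = link_events(events, dyads)
```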
The data collection for the dataset follows the standard coding procedures for UCDP data: human coders mine news sources, NGO reports, case studies, truth commission reports, historical archives, and other sources of information. This method has been described in depth in several other publications (Eck & Hultman, 2007; Sundberg, Eck & Kreutz, 2012), so we focus here on explaining the procedures for the novel aspects of this dataset – the temporal and spatial coding. We would, however, like to highlight some new methods for ensuring data accuracy and quality, since the sheer number of events and the many variables that accompany each event have necessitated additional quality-control instruments. Data quality is checked at least three times. Firstly, the coder runs through a checklist of consistency and streamlining tests. Secondly, a project manager performs similar tests, as well as controls of the geocoding through a set routine of visualization. Thirdly, PHP and Python scripts are run on the data to check consistency across IDs, coordinates, fatality counts, and more. While these routines cannot identify all errors in the base coding, they nevertheless ensure high quality and consistency in the final product.
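The kind of automated check described here might look like the following sketch. It is not the project's actual script: the field names are hypothetical, and the ordering rule low ≤ best ≤ high is an assumption made for the example.

```python
def check_events(events):
    """Return (event_id, problem) pairs for rows that fail basic consistency tests."""
    problems = []
    seen_ids = set()
    for e in events:
        eid = e["event_id"]
        if eid in seen_ids:
            problems.append((eid, "duplicate event ID"))
        seen_ids.add(eid)
        # WGS84 decimal degrees must fall inside the valid lat/lon ranges.
        if not (-90 <= e["lat"] <= 90 and -180 <= e["lon"] <= 180):
            problems.append((eid, "coordinates out of range"))
        # Assumed ordering of the three fatality estimates.
        if not (e["low"] <= e["best"] <= e["high"]):
            problems.append((eid, "fatality estimates not ordered"))
    return problems


# Hypothetical rows, three of which contain a deliberate error.
events = [
    {"event_id": 1, "lat": -1.95, "lon": 30.06, "low": 3, "best": 5, "high": 8},
    {"event_id": 1, "lat": 9.03, "lon": 38.74, "low": 1, "best": 1, "high": 1},
    {"event_id": 2, "lat": 95.0, "lon": 10.0, "low": 0, "best": 2, "high": 2},
    {"event_id": 3, "lat": 4.0, "lon": 20.0, "low": 4, "best": 2, "high": 6},
]
problems = check_events(events)
```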
Coding, where and when
Information on where and when a specific event took place is one of the novelties of the UCDP GED compared to earlier UCDP datasets, and is what allows for geospatial analysis and modeling of event sequencing. Disaggregated spatial and temporal information is extracted from the source material for each event, or through additional searches if that information is unavailable in the original material. Coding coordinates has proven to be more difficult than coding dates, especially since many instances of organized violence play out in developing countries that lack good digitized maps.
Several different sources on geographical locations have been used in the spatial coding, which is provided in the WGS84 (World Geodetic System 1984) horizontal datum. One important source has been the GEOnet Names Server database (GNS) of the National Geospatial-Intelligence Agency (NGA), a US federal agency specialized in geospatial intelligence. Despite its global reach the GNS is not detailed enough to cover some of the areas of the world where lethal violence takes place, such as many rural areas. When the GNS cannot supply the coordinates, the coders turn to a variety of additional sources that are of crucial importance for correctly identifying locations of violence. These sources include, but are not limited to, Google Earth, digital maps provided by aid and refugee agencies, and field atlases. The coordinates come with a precision score that states how precise the given coordinates are, that is, whether the event can be matched to a specific location such as a village, or can only be related to some form of administrative division. Researchers can thus make informed decisions on what level of spatial disaggregation or aggregation suits their purposes best. The location of the event is also supplemented with information on what country the event took place in (string and number format), the relevant first- and second-order level of administration, and the corresponding PRIO-GRID cell.
The temporal aspects of each event are classified in a similar manner, where the start and end dates given for each event are paired with variables that denote what type of event the observation constitutes (a single-day clash, a continuous clash, or a summary figure for an extended period of time) and with what certainty the time of the event is known. The lowest level of disaggregation is the single day. Temporal precision codes are again provided to give the analyst the possibility to set reliable cutoff points for the creation of different temporal aggregations.
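A monthly aggregation with a precision cutoff could be sketched as follows. The tuples are hypothetical, and the sketch assumes that a lower precision code denotes a more exactly known date; the actual code values are defined in the codebook. Events spanning two months are assigned to the month of their start date, which is a simplification.

```python
from collections import Counter
from datetime import date

# Hypothetical rows: (start_date, end_date, date_precision_code, best_estimate).
# Assumption for this sketch: lower code = more precisely known date.
events = [
    (date(1999, 5, 12), date(1999, 5, 12), 1, 20),   # single-day clash
    (date(1999, 5, 28), date(1999, 6, 2), 2, 7),     # continuous clash
    (date(1999, 1, 1), date(1999, 12, 31), 5, 100),  # summary figure for the year
]


def monthly_deaths(events, max_precision_code=2):
    """Sum best estimates per (year, month) of the start date,
    skipping events dated less precisely than the cutoff."""
    totals = Counter()
    for start, _end, precision, best in events:
        if precision <= max_precision_code:
            totals[(start.year, start.month)] += best
    return totals


totals = monthly_deaths(events)
```

With the cutoff set to 2, the yearly summary figure is left out of the monthly series rather than being assigned arbitrarily to January.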
Concerning both the temporal and spatial coding there are ample coding and estimation examples in the codebook appendix (Sundberg, Lindgren & Padskocimaite, 2010), as well as a more detailed description of the technical aspects in the online appendix.
Limitations of the dataset
There are, admittedly, some limitations to the dataset. In contrast to many other event datasets the UCDP GED adheres to strict theoretical definitions of what constitutes, for instance, ‘armed conflict’. Strict adherence to the definitions of the three types of violence creates data that depict a specific universe of violence. The upshot of such adherence is theoretical validity; the users can be confident that what is captured in the data are instances of the type of violence they wish to study. The flip-side of the coin is that the universe of violence presented by UCDP GED does not include all the possible instances of violence that may be of interest.
Three limitations are especially worth addressing: (1) the 25 deaths criterion for inclusion, (2) the insistence that all actors be organized and identifiable, and (3) issues of estimating fatalities. All of these issues arise from a search for theoretical validity.
The first version of the UCDP GED adheres to the criterion of an actor or dyad having to produce at least 25 fatalities in a calendar year in order to be included. The reason is theoretical: the UCDP’s definition of, for instance, ‘armed conflict’ demands that the phenomenon has a certain intensity to be so classified. The argument is that larger skirmishes between actors are qualitatively different from smaller ones. The flip-side to this strict adherence is that it obscures very minor potential conflicts, and hampers analyses of conflict escalation from zero to 25 deaths.
The second point stems from the theoretical consideration that organized actors and their activities differ from semi-organized activities such as riots, armed demonstrators being attacked by security forces, and loosely organized ethnic groups battling a government. Through this theoretical perspective an instance of a group of people fighting another group of people cannot automatically be said to constitute an armed conflict. Unorganized or semi-organized activities are thus excluded from the dataset, as they are viewed as being theoretically different from organized violence in which groups are more permanently organized for combat. Anyone wishing to study riots and protests may instead use datasets such as ACLED (Raleigh et al., 2010) or SCAD (Salehyan et al., 2012). The argument for exclusion of unidentified actors is likewise theoretical. We argue that if violent events cannot be attributed to specific actors we are left with stating nothing more than that there is violence in a country. Not identifying organized actors thus makes it impossible to separate the phenomenon of armed conflict from widespread murder. This has meant the exclusion from the dataset of thousands of events that may be of interest to some researchers. Again, alternative datasets are available for those interested also in violence perpetrated by unidentified actors (for instance Raleigh et al., 2010).
The third point is also an important one. As has been pointed out by, among others, Raleigh et al. (2010), the task of estimating fatalities in conflict is difficult and is susceptible to bias. While it is clear that media and urban biases exist to a high degree in the coding of conflicts, we would still argue that including fatality estimates is a sound avenue as well as a theoretical necessity. First of all, UCDP estimates do not rely solely on media reports, but also on NGO reports, case studies, databases, and historical archives, which to a certain degree alleviates the well-known media bias. We would also question whether omitting fatality estimates removes issues of media and urban bias. The main bias appears to stem from how much reporting there is and not necessarily from the contents of that reporting.
Also, relying on fatality coding can to a degree compensate for some media bias. The UCDP GED can track conflict intensity beyond the number of events, as some battles are bigger than others and thus make a stronger imprint in the data, even if there are fewer battles reported in certain areas that are comparatively less interesting for the media.
While the above limitations do exist in the dataset they are induced not by oversight, but by wishing to have a solid theoretical base for the dataset. For a further discussion of these issues, please turn to the online appendix.
Brief descriptive statistics
In this section we present some descriptive statistics in order to provide a snapshot of the different dimensions of the data. The UCDP GED version 1.1 contains approximately 22,000 events, covering the years 1989–2010 for the African continent. By types of violence the dataset contains some 11,000 events of state-based violence, 4,000 events of non-state conflict, and 6,000 events of one-sided violence. The total fatality count for the dataset is around 750,000 deaths in the ‘best estimate’ category (1,136,969 fatalities in the high estimate).
Disaggregating by fatality counts and type of violence yields a different picture with regard to the distribution of violent activity; in terms of state-based violence there are some 372,969–543,507 fatalities (best estimate 383,669), concerning non-state conflict some 59,981–105,357 fatalities (best estimate 68,725), and for one-sided violence 264,122–488,105 fatalities (best estimate 298,757). The dominance of the state-based category is thus mainly in terms of the number of events, as the number of fatalities in the one-sided category is relatively close to the number of fatalities in state-based violence.
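The totals quoted above can be reproduced from the per-category figures. The short check below uses only the numbers reported in the text, reading the lower and upper bounds of each range as the low and high estimates.

```python
# (low, best, high) fatality estimates by type of violence, as reported in the text.
estimates = {
    "state-based": (372_969, 383_669, 543_507),
    "non-state": (59_981, 68_725, 105_357),
    "one-sided": (264_122, 298_757, 488_105),
}

total_low = sum(low for low, _, _ in estimates.values())
total_best = sum(best for _, best, _ in estimates.values())
total_high = sum(high for _, _, high in estimates.values())

# Share of best-estimate fatalities accounted for by each type of violence.
shares = {kind: best / total_best for kind, (_, best, _) in estimates.items()}

print(total_low, total_best, total_high)  # 697072 751151 1136969
```

The best estimates sum to 751,151 and the high estimates to 1,136,969, in line with the overall figures given earlier. In the best estimate, state-based violence accounts for roughly 51% of fatalities, one-sided violence for about 40%, and non-state conflict for about 9%.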
Figure 1 shows the geographical distribution of fatalities from organized violence per 1x1 degree grid square in Africa from 1989 to 2010. The darker the square, the more fatalities occurred within that square. Figure 2 displays the yearly trends of fatalities, from 1989 to 2010, for each category of violence in the dataset.
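An aggregation of this kind can be sketched by flooring each coordinate to the south-west corner of its grid square. The events below are hypothetical, and the sketch illustrates the general technique rather than the exact procedure used to produce Figure 1.

```python
import math
from collections import Counter

# Hypothetical events: (latitude, longitude, best_estimate) in WGS84 decimal degrees.
events = [
    (-1.95, 30.06, 10),
    (-1.20, 30.90, 5),
    (9.03, 38.74, 3),
]


def grid_cell(lat, lon, size=1.0):
    """Return the south-west corner of the size x size degree cell containing the point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def fatalities_per_cell(events, size=1.0):
    """Sum fatalities over all events falling in the same grid cell."""
    totals = Counter()
    for lat, lon, deaths in events:
        totals[grid_cell(lat, lon, size)] += deaths
    return totals


cells = fatalities_per_cell(events)
```

Flooring, rather than truncating toward zero, keeps cells south of the equator and west of the prime meridian correctly aligned. Setting `size=0.5` would yield half-degree cells instead.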
In terms of events per country or conflict, there is great variation in the data. The countries with the highest numbers of individual events are Algeria (3,619 events) and South Africa (2,624) – two countries which have seen relatively high-intensity conflicts but which are also relatively prosperous and thus well covered in the news. This likely contributes to the higher number of events. Although the two countries above have the highest number of registered events they are not the most violent countries. Even if we exclude the two most violent episodes in Africa (the Rwandan genocide and the […]
Figure 1. Geographic distribution of fatalities in Africa, 1989–2010
Figure 2. Yearly trends of organized violence in Africa, 1989–2010

Figure 2 shows that only the non-state category has displayed any form of stability across years in terms of fatalities, while the state-based and one-sided violence categories have seen large fluctuations across years. The larger spikes in one-sided fatalities represent well-known events in African conflict history and depict – among other things – the Rwandan genocide (where the spike is off the chart), violence mainly in the DRC and Burundi (1996), and violence against civilians in Darfur and again in the DRC (2002–04). Concerning state-based conflict, the years 1989–91 saw a high fatality toll caused by simultaneous and deadly conflicts in countries such as Mozambique, Angola, Ethiopia, Sudan, and Liberia. The spikes in 1999–2000 are mainly caused by the Ethiopia–Eritrea border war. Since around 2002, fatality levels for all three types of violence have dropped significantly and remained consistently around 3,000–5,000 annual deaths, clearly depicting a less violent Africa when compared to the ten preceding years. This trend is in line with what has been reported in the aggregated UCDP datasets.
Relating violence to population density
Next we exemplify how the UCDP GED can be used to illustrate the association between cities and organized violence, taking into account population density. Space restrictions prevent us from presenting a full-fledged geo-spatial analysis including a complete literature review and theoretically derived hypotheses.
Table I. All violence related to urban areas
Note: Pearson chi2(1) = 746.3147; Pr = 0.000.
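For readers unfamiliar with the test statistic, a Pearson chi-square with one degree of freedom for a 2x2 cross-tabulation can be computed as in the sketch below. The cell counts are invented for illustration and are not the article's data, so the resulting statistic does not correspond to the value reported above.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of grid-years (NOT the article's data):
# rows = rural / urban, columns = no violence / violence.
chi2 = pearson_chi2_2x2(9000, 500, 400, 100)
```

A statistic above 3.841 rejects independence at the 5% level with one degree of freedom, so in this toy table violence and urban location would be judged to be associated.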
We conjecture that the effect of population density differs depending on whether the area in question contains an urban location or not. For rural areas we expect, in line with previous research, that higher population density is associated with a higher risk of violence. For urban areas, however, we expect the conflict-promoting effect of population density to be weakened or nonexistent. Our rationale for this conjecture is that, all else equal, the relative importance of controlling cities should increase the sparser the population in the area, since cities surrounded by more sparsely populated territory typically represent particular concentrations of valuable assets compared to the desolate countryside. Although cities in the most densely populated areas are likely to be larger, richer in resources, etc., the relative importance of controlling cities in more desolate areas may cancel out the lure of the richest cities. Hence, when comparing only areas with urban locations we do not expect to see a clear conflict-promoting effect of population density. At the same time, regardless of the level of population density, we expect urban locations to be more prone to organized violence than rural areas with the same population density. In other words, we expect an interaction between population density and cities.
Table II. Violence related to population density, by rural/urban location
Note that the indicator of urban location is not simply a derivative of the population density. Most observations in the quintile with the densest population do not contain a population center of at least 100,000 inhabitants, and are hence coded as rural. Conversely, there are quite a few grid cells with low population density that are nevertheless urban. Table II is a trivariate cross-tabulation of the violence indicator and population density (population density data from CIESIN, 2004), controlling for the rural/urban distinction. The population density variable is obtained by sorting all grid-years from lowest to highest population density and dividing them into quintiles.
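The quintile construction can be sketched as follows. The density values are hypothetical, and the treatment of tied values (kept in their original order) is an assumption of this example rather than a documented feature of the replication files.

```python
def assign_quintiles(densities):
    """Return a quintile label (1 = sparsest, 5 = densest) for each value,
    splitting the sorted observations into five groups of near-equal size."""
    n = len(densities)
    # Indices of the observations, ordered from lowest to highest density.
    order = sorted(range(n), key=lambda i: densities[i])
    quintiles = [0] * n
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // n + 1
    return quintiles


# Hypothetical population densities (persons per square km) for ten grid-years.
densities = [0.5, 120.0, 3.2, 45.0, 8.1, 300.0, 1.1, 15.0, 60.0, 22.0]
quintiles = assign_quintiles(densities)
```

Each grid-year then carries a label from 1 to 5 that can be cross-tabulated with the violence indicator separately for rural and urban observations.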
The upper subtable shows that more densely populated areas are increasingly more susceptible to violence when considering rural areas only. However, the relationship between violence and population density is not as clearcut when looking at the urban areas separately as in the lower subtable. Urban areas with medium-level population density are actually more prone to violence than the most densely populated urban areas, and there is no linear relationship between population density and violence. Similar patterns are obtained if the three types of violence are analyzed separately (not shown). In sum, we find preliminary support for our conjectures.
To complete this simple empirical illustration, we note that nearly 50% of all coded deaths occur in rural areas (using the best estimates and combining all three forms of violence). Although urban localities are relatively speaking much more prone to violence, the deaths in the vast rural areas add up to substantial numbers. Interesting differences emerge when we disaggregate the types of violence. In terms of the number of deaths, one-sided violence is the most urbanized type of violence; more than two-thirds of the deaths occur in urban areas. In contrast, the deaths in state-based conflict and non-state conflict happen mostly in rural areas; around 60% of the deaths in these two types of conflict are in rural areas.
Conclusions
This article provides an illustration of the potential uses of the UCDP GED for the study of the subnational causes and processes of organized violence. Not only are the events in the UCDP GED spatially and temporally disaggregated to allow for analyses of the timing of violent action and the possible geographic correlates of conflict (to mention but a few uses), but the dataset also opens up new opportunities for users of UCDP (and PRIO) data, as the dataset is fully compatible and integrated with a wide range of data in other datasets. The dataset in itself, together with the many products with which it is easily integrated, creates new and enticing opportunities for peace research. It is therefore our hope that the UCDP GED will be useful for many students of the subnational determinants of organized violence, and the consequences thereof.
Footnotes
Replication data
The dataset, codebook, and do-files for the empirical analysis in this article can be found at http://www.prio.no/jpr/datasets. The latest version of the UCDP GED dataset, as well as derivative datasets, are available at
.
Acknowledgements
The authors would like to thank the editor and our anonymous reviewers for helpful comments. We also acknowledge assistance from the UCDP team in general and Mihai Croicu in particular. Finally, we would like to thank Håvard Hegre, Halvard Buhaug, and Tomislav Dulic for their input when launching this project.
