Abstract
This article examines what scholars can learn about civilian killings from newswire data in situations of non-random missingness. It contributes to this understanding by offering a unique view of the data-generation process in the South Sudanese civil war. Drawing on 40 hours of interviews with 32 human rights advocates, humanitarian workers, and journalists who produce ACLED and UCDP-GED’s source data, the article illustrates how non-random missingness leads to biases of inconsistent magnitude and direction. The article finds that newswire data for contexts like South Sudan suffer from a self-fulfilling narrative bias, where journalists select stories and human rights investigators target incidents that conform to international views of what a conflict is about. This is compounded by the way agencies allocate resources to monitor specific locations and types of violence to fit strategic priorities. These biases have two implications: first, in the most volatile conflicts, point estimates about violence using newswire data may be impossible, and most claims of precision may be false; secondly, body counts reveal little if divorced from circumstance. The article presents a challenge to political methodologists by asking whether social scientists can build better cross-national fatality measures given the biases inherent in the data-generation process.
Introduction
Civil war datasets offer scholars an ‘international funeral parlour’ to view the dead. This presentation of death obscures a murky reality, fraught with uncertainty about who has died and who is worth counting. Behind each observation in a newswire dataset is the story of a person who falls victim to an atrocity; a killer who disfigures, buries or conceals a corpse; a human rights officer or a journalist who arrives once the killing is done, and decides whom to talk to and what to record; a report that bureaucrats or newspaper editors make public; and a researcher who locates the report, and codes it into a dataset. This article is about what scholars can learn about the causes and consequences of civilian killings from newswire data in contexts wrought with non-random missingness. It asks: How do technical procedures, political agendas and normative judgements alter inferences about violence drawn from the newswire? Can social scientists build better cross-national fatality measures given those biases?
Scholars have long recognized that fatality data are nasty, noisy, and muddled (Williams, 2016; Spagat et al., 2009; Kalyvas, 2006). How researchers code determines what they see (Merry, 2016; Sambanis, 2004). Governments fabricate data or obscure it to produce ‘official ignorance’ (Aronson, 2013: 30; Werth, 1997). Epidemiologists and statisticians must adjudicate among assumptions to model observations missing not-at-random (Manrique-Vallier, Price & Gohdes, 2013; Checchi, 2010; Dulic, 2004). These problems motivate some to reject civil war research because the data preclude design-based inference, while others proceed believing that patterns in messy data reflect stronger trends in reality (Gerber, Green & Kaplan, 2014).
This article charts a course between nihilism and blind faith. It starts from the premise that newswire data remain invaluable to conflict scholarship in spite of non-random missingness. Not all fatality count biases are equal: some distort our inferences about civilian killings in ways that preclude precision, while others prove trivial. I argue that conflict scholars should train their methodological eye on the first type. But to do so, they need to understand what these biases are.
I contribute to this understanding by offering a unique view ‘under the hood’ of newswire data for South Sudan. Most quantitative datasets in the social sciences stand on qualitative foundations, which demand careful source interpretation (Kreuzer, 2010). I begin by demonstrating how this feature of newswire data creates confusions in defining civilians, biases in counting methods, and estimation problems. I then unveil the data-generation process by recording the experiences of people who handle the dead and file newswire reports. Drawing on 40 hours of interviews with 32 human rights advocates, humanitarian workers, and journalists, I investigate the biases we should expect to distort data for South Sudan. I focus subnationally on killings in Jonglei State to assess patterns of non-random missingness. I conclude with lessons about which biases matter most and assess the merits of using categories rather than counts to analyze violent deaths in low-information contexts.
Counting the dead
In war, ‘something is always wrong with the facts one is given’ (Nordstrom, 1997: 43). Scholars, journalists, and human rights advocates disagree about who dies in crises for many reasons. They diverge on who counts as a civilian, and face political and technical choices about how they handle the dead. This leads to variations in how researchers record killings and estimate what they do not see. Newswire datasets offer starting points for navigating these problems by offering systematic records of the dead over time. The problem for civil war researchers is that they can rely neither on a missing-at-random assumption nor on a general pattern of missingness that applies across contexts.
Which civilians?
How researchers define dead civilians determines how they code, and how they code determines what they see (Sambanis, 2004). This confronts conflict scholars with two challenges.
The first is to distinguish civilians from combatants, which is difficult to do in irregular wars without subjective judgements (Gade, 2010). Belligerents may remove uniforms or put them on the dead, and people shape-shift between civilian and combatant categories at unpredictable times and in unpredictable ways. For example, South Sudan’s ‘White Army’ is an ad-hoc mobilization of cattle camp youth without clear central command, which sometimes aligns with insurgents (Thomas, 2015). Whether observers count these youth as civilians or combatants depends on what they use death counts for (Ratnayake, Degomme & Guha-Sapir, 2009). For example, human rights advocates may choose an expansive definition to avoid excluding ‘victims’.
The second challenge is that conflict researchers study ‘direct’ civilian deaths, requiring an ‘event grammar’ that identifies a perpetrator, a victim, and an act (Landman & Gohdes, 2013: 78; Ball, 1996). Even where the act is observable, scholars must disaggregate whether a homicide is conflict-related, criminal or both (Williams, 2016). This is difficult where criminal networks run deep into the battlefield and conflict-related killings shape criminal homicides (Kleinfeld, 2017).
Political and technical bias
Once researchers settle on a civilian measure, they face technical biases, which arise from counting methods, and political biases, which arise when people’s agendas shape what they report (Aronson, 2013). Two issues make this distinction a useful but blurry framing device. The first is that technical choices can motivate political biases that would not otherwise pose a challenge. For example, wire journalists may interview survivors at an atrocity site. Due to political pressure, civilians might misreport what they observed. Thus, the recording technique motivates political bias.
The second issue is that everybody, from survivors to researchers, has agendas that shape what they say, choose methodologically, and judge credible. These agendas may be technical. For example, epidemiologists and forensic statisticians arrive at different counts because they have different objectives that motivate different survey questions (epidemiologists seek data to improve the health of the living; forensic statisticians investigate homicides) (Aronson, 2013). These agendas may deprioritize accuracy to advance a normative cause (De Waal, 2016). Political authorities may also distort official records to propagate a particular social outlook (Merry, 2016; Nelson, 2015; Tishkov, 1999).
Biases in newswire data
These issues magnify up the chain through coding and analysis to shape political discourse. Controversies over death counts in Rwanda and Libya illustrate this problem. In 1994 Rwanda, African Rights provided the first systematically gathered evidence of killings (Omaar, 1994), which later provoked accusations that the investigator was a Rwandan Patriotic Front sympathizer (Van Oijen, 2018). In Libya, Human Rights Watch estimated fewer civilian fatalities than the US government claimed (Kuperman, 2013). In such cases, coders must choose which sources to trust (Weidmann & Rød, 2015).
Political organizing and fluid advocacy agendas add contingency to Table I, altering the direction of biases non-randomly. For example, human rights entities’ growing attention to rape in Bosnia and Herzegovina may explain the apparent increase in reported incidents over time (Cohen, 2013: 466). Interest levels also affect the intensity of human rights investigations into past events. In Guatemala, journalists missed the surge in rural killings of Mayan communities in the 1980s, in a typical case of urban bias (Ball, Kobrak & Spirer, 1999: Chapter 9). Yet, in the 1990s, Mayan communities powerfully mobilized to demand that human rights organizations intensify their investigations into abuses in the countryside (Nelson, 2015).
The bottom line is that newswire data comprise interpretations of observers (Davenport & Ball, 2002). This leads to conflicting accounts open to arbitrary selection.
Estimation options
The scholars who make newswire data reliable, free, and public have long acknowledged these problems. As the Uppsala Conflict Data Program Georeferenced Event Dataset (GED) codebook asserts: ‘The goal […] is not to present the most complete and accurate image of a certain conflict at a certain point in time, but rather be a tool for the global understanding of subnational conflict patterns and trends’ (Högbladh, 2019: 3). Nonetheless, social scientists frequently assume, implicitly or explicitly, that media reports of killings correlate with actual killings (Krauser, 2020; Berman et al., 2017; Fjelde & Hultman, 2014; Wood, 2014; Toft, 2010). This assumption is more credible in some conflicts than others (Sloboda et al., 2013). Fighting forces target individuals and groups for specific reasons, and make differential choices about which atrocities they conceal or put on display (Fujii, 2013). Political authorities also make non-random decisions about where investigators can go after conflict events. Thus, newswire datasets often omit accurate information about atrocities in the most insecure environments, obstructing micro-level analysis (Landman & Gohdes, 2013; Eck, 2012).
Statisticians have advanced promising remedies for missingness. However, these require high-quality data and explicit probability models to yield precise point estimates. For example, the Matching Event Data by Location, Time and Type (MELTT) method assuages missingness in individual datasets by integrating them using an algorithm to identify matching entries. As the developers of MELTT acknowledge, this still requires high quality data and random missingness to support meaningful estimation (Donnay et al., 2018). Methods like multiple imputation also demand sufficient information about the cause of missingness to allow a missing-at-random assumption (Bauer, Ruby & Pape, 2017).
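To show what matching entries across datasets involves, the following is a minimal sketch of a toy rule that pairs events when they share a type and fall close together in space and time. It is not the MELTT implementation: the function names, the 25 km and two-day windows, and the event records are all hypothetical choices made for illustration.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def is_match(e1, e2, max_km=25.0, max_days=2):
    """Hypothetical rule: same event type, close in space and in time."""
    return (
        e1["type"] == e2["type"]
        and abs((e1["date"] - e2["date"]).days) <= max_days
        and km_between(e1["lat"], e1["lon"], e2["lat"], e2["lon"]) <= max_km
    )


# Invented records standing in for two event datasets.
dataset_a = [
    {"date": date(2014, 4, 17), "lat": 6.21, "lon": 31.56,
     "type": "attack on civilians"},
]
dataset_b = [
    {"date": date(2014, 4, 18), "lat": 6.20, "lon": 31.55,
     "type": "attack on civilians"},
    {"date": date(2014, 4, 18), "lat": 9.53, "lon": 31.66,
     "type": "attack on civilians"},
]

# Index pairs (i, j) of records judged to describe the same event.
matches = [
    (i, j)
    for i, a in enumerate(dataset_a)
    for j, b in enumerate(dataset_b)
    if is_match(a, b)
]
```

A rule of this kind can merge duplicate reports of one event, but it cannot recover events that no source recorded, which is why integration alone does not resolve non-random missingness.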
Multiple systems estimation (MSE) offers another promising alternative by using overlaps between kill lists to estimate deaths recorded nowhere (Landman & Gohdes, 2013; Hoover Green, 2013). The challenge is that MSE is unviable in many contexts, and cannot be performed with newswire data alone. The method, like other epidemiological techniques, demands information at the level of individuals rather than events, and ideally requires at least three different kill lists. Thus, MSE offers a powerful method in contexts with some bureaucratic capacity but does not correct newswire deficiencies.
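The logic of using list overlap can be sketched with the simplest two-list capture-recapture estimator, the Chapman-corrected Lincoln-Petersen estimator. This is an illustration only, not the multi-list MSE procedure described above, which ideally draws on three or more lists. The estimator assumes that the two lists are independent and that every victim has the same chance of appearing on a given list; the identifiers and list contents are hypothetical.

```python
def chapman_estimate(list_a, list_b):
    """Estimate the total number of victims from two lists of individual IDs."""
    a = set(list_a)
    b = set(list_b)
    overlap = len(a & b)  # victims named on both lists
    # Chapman's bias-corrected form of the Lincoln-Petersen estimator.
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Hypothetical victim identifiers on two independently compiled kill lists.
ngo_list = ["v01", "v02", "v03", "v04", "v05", "v06"]
community_list = ["v04", "v05", "v06", "v07", "v08"]

estimated_total = chapman_estimate(ngo_list, community_list)
documented = len(set(ngo_list) | set(community_list))
estimated_unrecorded = estimated_total - documented
```

The calculation depends on recognising the same individual on both lists, which is why event-level newswire counts cannot feed it directly.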
Given these problems of non-random missingness, what can civil war scholars learn from newswire data about the most volatile conflicts? To answer this question, I turn to South Sudan.
Counting the dead in South Sudan
‘We’ve lost count’ (Hervé Ladsous, cited in Martell, 2016). As the UN peacekeeping chief confessed, humanitarians have struggled to document killings in South Sudan (Maxwell et al., 2018). From 2014 to 2018, officials cited 50,000 dead (Martell, 2014; Reuters, 2016; Council on Foreign Relations, 2018) – a figure that jumped four-fold overnight when the London School of Hygiene and Tropical Medicine (LSHTM) issued a mortality estimate with transparent confidence intervals (Checchi et al., 2018). This section offers vignettes from South Sudan to illuminate how biases in data generation distort what observers think they know about the civilian dead.
Empirical strategy
This article studies South Sudan for methodological and normative reasons. South Sudan’s newswire data exhibit useful variations in types and degrees of bias, with higher quality data for some areas than others. These challenges lead some scholars not to study contexts like South Sudan. Yet a sole focus on data-rich environments eases causal inference at the expense of understanding whole categories of conflict; empirical bias feeds theoretical bias (Arjona & Castilla, 2020). This produces an imperative to optimize rather than reject newswire data in turbulent, low-information conflicts.
I begin by plotting South Sudan’s Armed Conflict Location and Event Data Project (ACLED) and GED data, from December 2013 to December 2015, disaggregating it to investigate missingness (ACLED, 2019; Raleigh et al., 2010; Croicu & Sundberg, 2017; Sundberg & Melander, 2013).
I then conduct more than 40 hours of interviews with 32 people responsible for monitoring civilian killings in South Sudan and other relevant contexts.
These span 17 UN officials, including
South Sudan, October 2011 (based on UN map)
I asked people to describe what they did when they reported on an event, at the scene and afterwards. I prompted interviewees to reflect on whom they systematically missed and could count well. I also asked people’s views on a range of issues, including the feasibility of distinguishing civilians from combatants, and civil war-related violence from ‘intercommunal’ violence. I then extracted references to incidents, time periods, and locations from the transcripts, examining how different accounts clashed or aligned with newswire data.
I received perspectives that covered South Sudan unevenly. Therefore, I focused subnationally on a location and time period for which I could more feasibly ‘know the unknowns’. The resulting case study of Jonglei State, from December 2013 to December 2014, guides the analysis.
Turbulence
Dataset characteristics
This situation created military and humanitarian turbulence (De Waal, 2015). The UN estimates that the conflict had displaced four million people by 2018, while causing food insecurity and ‘the longest-running cholera outbreak in its history’ (UNOCHA, 2017: 5). By 2016, UNICEF claimed armed groups had recruited more than 17,000 children (UNICEF, 2016). Human rights organizations have accused armed groups of using forced starvation as a tactic, and of perpetrating ‘epic proportions’ of sexual violence (Mednick, 2017; Turse, 2016).
These numbers represent advocacy agendas. But they describe a conflict of chaotic complexity where governing institutions kept few records, and belligerents destroyed what they found.
Who has died: According to the newswire
At first glance, GED and ACLED show large discrepancies (Table II). GED captures less than half of ACLED’s civilian killings, in part because it denotes killings as ‘unknown’ where civilian or combatant identities are unclear. If we assume that most of these ‘unknowns’ are civilians, and GED would code some of ACLED’s killings as ‘unknown’, the gap narrows. However, these counts do not converge across months. Figure 2 presents South Sudan’s aggregate monthly killings; ACLED and GED differ in trend and magnitude. For example, from June to July 2015, ACLED out-counts GED, whereas the opposite is true for the same period in 2014.

Monthly fatalities, South Sudan, Dec 2013–Dec 2016
The divergences also appear subnationally (Figure 3). ACLED captures more observations, but GED’s counts exceed ACLED’s in some periods, notably during an April 2014 government offensive in Unity State.
How counting happens
To understand what these data represent, I interviewed the authors of three types of source dominant in GED and ACLED: South Sudan-focused outlets such as Radio Tamazuj; human rights reports (principally by Human Rights Watch, Amnesty International, and the UN); and civil society and think tank statements, which the newswire reports cross-reference.

State-level fatalities aggregated by month
Newswire journalism
Journalists expressed surprise when I described how scholars use newswire data. Interview respondents did not believe they had a mandate to produce accurate fatality counts. Instead, they sought to publish first-hand observations in a format that allowed readers to interpret reliability (A26, Martell; A31, Van Oudenaren; A29, Gridneff; A28, journalist). Ilya Gridneff, who had reported for the Associated Press and Bloomberg, explained: ‘The nature of wire reporting is that you hedge your bets by quoting someone else, such as “the SPLA spokesman said there were ten dead.” You never say how many yourself, because you are just reporting what someone said.’
Journalists adopt this approach to fit their professional mandate. Peter Martell, who had reported for AFP, BBC World Service, and IRIN, described newswire reporting as a haphazard, hand-to-mouth existence: ‘Stringing meant living from story to story trying to make a living’, where agencies paid USD 50 for a report. In this context, ‘the objective was to get something onto the wire’. That information was intended for others, like Al Jazeera, to verify further. The problem, Martell explained, is that upstream media houses would revert to his original source, making verification circular. This produced a dilemma:
Do you report nothing [if you can’t triple source], or report what you’ve got noting who has said it […] saying here is the information as we know it at the time? In South Sudan, the second happens more often than not. Journalists’ responsibility is to tell stories and not collect data for social scientists. The responsibility is to readers […] It requires case-by-case editorial judgements as to whether we trust a person: Even if he is lying, is the number worth printing? Even if the number is questionable, is it worth printing? We may interview a county commissioner who says there was a cattle raid and 20 people were killed. We don’t verify that but we print that […] We can’t wait around nine months for the UN to produce a human rights report. We are not doing our job as journalists if nobody knows there is a massacre in a city until a year afterwards […] So, let’s transcribe what we know, and who knows what and put it out there.
In spite of these problems, interviewees felt pressured to cite numbers. Copeland spoke of journalists’ desperation for death counts. She recounted how, when she voiced an estimate of 50,000 dead, the information spread like wildfire through media houses. As Martell explained: ‘If you can quote numbers, it changes the situation […] Numbers make reporting more effective’. However, as Copeland commented, it can take a committed journalist several months to arrive at an accurate account of a single incident, if they can arrive at one at all.
Human rights investigations
Human rights investigators face similar dilemmas. In the case of human rights NGOs, small teams deploy from a regional base for advocacy purposes (A8, human rights advocate). This disposes human rights investigators to report the most egregious cases and not others. Interviewees explained how they sought to register the quality of human rights violations, for which a death count can be counterproductive (A8, human rights advocate; A14, human rights advocate). Indeed, one interviewee noted how death counts derailed debates about the killing of healthcare workers in Syria. UN officials distrusted his organization’s reports because of numerical imprecision, although they had access to robust evidence of targeted killings at a health facility. This was sufficient to demonstrate a violation of international humanitarian law, for which it did not matter whether an armed group had targeted one or 100 healthcare workers. In this context, ‘the accuracy of the count became a distraction’ (A4, human rights advocate).
Accordingly, interviewees emphasized that they sought to establish scale and types of violations, rather than comprehensive accounts of events (A14, human rights advocate). For example, on viewing the aftermath of massacres in Bor town, South Sudan, the Human Rights Watch investigator reported that ‘scores died’, rather than estimate what they could not see (Wheeler, 2014). However, numbers do appear in human rights reports, which coders transfer as data points. Like their journalist colleagues, human rights advocates expressed surprise when I described how scholars use their data. One interviewee commented that, like wire reporters, they include numbers through anecdotal quotes, giving sufficient context for the reader to interpret accuracy. For example, they might write, ‘According to John in X village, his two brothers were killed in the attack’, without any intention of conveying that two was a credible count for the event (A14, human rights advocate).
Civil society and think tank statements
Civil society organizations and think tanks also produce numbers that travel through the newswire. These are attractive sources to journalists, because they derive from ‘expert’ opinion and intentional estimation. For example, Copeland produced the 50,000 count through a multimethod strategy, where she interviewed displaced people and traveled to hospitals across South Sudan to check records. However, Copeland conducted the count alone, since international agencies were over-stretched. She urged readers to interpret her estimate as ‘at least 50,000’ at one moment in time (A16, Copeland).
Remembering the Ones we Lost provides another data source journalists have turned to. South Sudanese volunteers launched the initiative to memorialize the deceased or missing. People submit testimonials online or by text message, while the volunteers conduct their own investigations (A15, South Sudanese human rights advocate). The initiative draws substantially on community lists – local authorities’ informal records of the dead (A25, UN official). However, lists often include missing people who arrive alive in internally displaced people (IDP) camps (A13, UN official; A25, UN official; A30, UN official; A32, UN official). Some saw the lists as useful starting points for investigations due to the quality of local knowledge (A13, UN official; A32, UN official), while others judged them limited in reliability and coverage (A14, human rights advocate; A15, South Sudanese human rights advocate).
This discussion of who produces source data illustrates how technical and political constraints create non-random omissions and distortions in the evidence available to coders. I now turn to how this affects the patterns evident in newswire data.
Who is missing?
Interviewees produced common stories about which populations journalists and human rights reporters count, and which they miss. Investigators have access to a subset of IDPs in UN protection of civilian camps (PoCs), which housed 209,000 of 1.9 million displaced across South Sudan in 2017 (UNOCHA, 2017). The PoCs provide concentrated spaces to interview witnesses of attacks. Consequently, investigators have ‘mapped the PoCs left, right and centre’, using the camps to record testimonies (A15, South Sudanese human rights advocate).
By contrast, a human rights investigator estimated that international agencies were unable to account for a quarter of the population due to poor access to 85% of South Sudan’s terrain (A14, human rights advocate); populations beyond airstrips remained a mystery (A30, UN official). The UN Mission in South Sudan (UNMISS) developed a fatality tracking system in 2012 that corroborated this uncertainty, delivering zero counts for places like Upper Nile state in months where insecurity limited surveillance. Officials kept the counts confidential, to obviate external misinterpretation (A18, UN official).
Interviewees also reported limited data on the elderly and people with disabilities, whom fleeing families often left behind during attacks (A14, human rights advocate; A25, UN official). One investigator disproved allegations that an armed group had executed the elderly in Malakal when he discovered combatants had brought them to safety: ‘The soldiers weren’t angels, but treated them decently’ (A14, human rights advocate).
Interviewees also feared that information was more prevalent about some ethnicities than others. For example, an investigator alleged that local authorities suppressed information about the Shilluk dead in Malakal (A8, human rights advocate). Others found it difficult to count the Dinka dead due to cultural taboos around talking about the deceased (A14, human rights advocate). Some predicted undercounts, while others feared Dinka civilians were overcounted due to the (Dinka-dominated) government’s interests in revealing rebel-perpetrated abuses (A16, Copeland; A3, UN official).
I also heard concerns that newswire data undercounts ‘intercommunal’ civilian killings, although ACLED captures more of these incidents than other sources (A5, South Sudanese academic). To compound journalists’ problems with pitching these stories, interviewees disagreed about whether intercommunal killings counted as direct conflict deaths. Some saw intercommunal violence as criminal, unrelated to civil war politics (A12, NGO worker), while others judged it inseparable from the broader conflict narrative. One UN official saw the line between crime and armed conflict as so fluid in South Sudan as to be meaningless (A32, UN official). Others offered examples of politicians manipulating cattle raids as proxies for political retaliation. Such acts bind local disputes to the national politics driving civil war (A15, South Sudanese human rights advocate; A14, human rights advocate; A5, South Sudanese academic; A3, UN official).
Fatalities in Jonglei state
Even within subnational units, sources of missingness vary in ways that preclude capturing them in a probability model. Vignettes from Jonglei state illustrate this. Jonglei sits on the Sudd, the largest marsh on earth (Figure 4): ‘Multilingual; muddy and remote; mutinous and divided against itself: Jonglei exemplifies South Sudan’ (Thomas, 2015: 278). It is an illuminating case due to its political significance, which should draw journalists’ attention.
In December 2013, Jonglei became the frontline between government forces and the SPLM-IO. The violence initially hit the state capital, Bor town, which changed hands four times in four weeks when armed groups killed and evicted civilians. Many took refuge on the UNMISS base, to be terrorized when armed youth attacked the PoC camp on 17 April 2014.
Other areas of Jonglei, previously wrought with violence, experienced calm. Pibor – a year earlier, the center of an insurgency against the government, led by David Yau Yau’s SSDM/A-Cobra Faction (Cobra Faction) – became one of the most stable places in South Sudan in 2014. When the civil war erupted, both the government and SPLM-IO sought to co-opt the Cobra Faction. Yau Yau used this to bargain for special autonomy in Pibor.
These dynamics appear inconsistently in the newswire data. Figure 5 illustrates fatalities aggregated by month; months without data points signal either no killings or no information. Observable data points at ‘zero’ code recorded conflict events without reported civilian deaths. Figure 6 presents these data padded with zero counts in periods without observations, and Figure 7 decomposes the data for Jonglei’s 11 counties.
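The difference between leaving unobserved months as missing and padding them with zeros can be made concrete with a minimal sketch. The monthly counts below are hypothetical and do not come from ACLED or GED; the point is only that padding turns ‘no information’ into ‘no killings’ and lowers any average computed from the series.

```python
from statistics import mean

months = ["2013-12", "2014-01", "2014-02", "2014-03", "2014-04"]

# Hypothetical recorded counts. A month absent from the dict has no
# observation at all; a recorded 0 is an event with no reported deaths.
recorded = {"2013-12": 120, "2014-02": 0, "2014-04": 30}

# Option 1: keep unobserved months as missing (None).
as_missing = [recorded.get(m) for m in months]

# Option 2: pad unobserved months with zero, treating them as peaceful.
zero_padded = [recorded.get(m, 0) for m in months]

# Averages under each treatment of the gaps.
mean_observed = mean(c for c in as_missing if c is not None)
mean_padded = mean(zero_padded)
```

With these invented numbers the padded series reports a lower monthly average than the observed months alone, although nothing more is known about the two unobserved months.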

Jonglei state, December 2016 (based on UN map)

Monthly fatalities, Jonglei, Dec 2013–Dec 2014
If no observations reflect no violence, Fangak, Pibor, Uror, and Nyirol appear peaceful. While this assumption fits Pibor in 2014, it seems suspect for the other counties. I explore these data gaps in vignettes of a correct count, an undercount, an overcount, and missing data.
Pibor
A former UN official (A25) described visiting Pibor in March 2015: ‘It was remarkable – an oasis of peace with sheep grazing by the river. There were Cobra
Monthly fatalities, Jonglei, Dec 2013–Dec 2014 (padded)
Bor town, December 2013–January 2014
In November 2013, Bor was much like Pibor in March 2015. That month, I (the author of this article) had sauntered through a lively town, drinking tea and eating fresh perch from the Nile. On 16 December 2013, I started evacuating colleagues from the spreading violence. My colleague in Bor told me things were calm and to prioritize the team in Juba. He called me the next morning as he fled towards the UN along a road littered with bodies.
‘After this, things got blurry for days’ (A25, UN official). International agencies evacuated. On 26 December the UN redeployed civilians under movement restrictions. In the first month of the conflict, Bor became a ‘black hole’ to international observers while armed groups killed and evicted civilians (A13, UN official).
The Bor town mayor forbade the few remaining residents from moving bodies (A13, UN official). Once government control returned, the mayor brought journalists and human rights advocates to observe the scene:
Fatalities by county in Jonglei state aggregated by month, Dec 2013–Dec 2014 (Nyirol and Pibor excluded due to no recorded events)
Bor town killings
There were bodies everywhere. We would go into houses and find civilians shot […] I will never forget this woman hiding under her bed and seeing her hand coming out. It stank appallingly […] The whole of Bor was a massive crime scene. It was pathetic. (A8, human rights advocate)
There were all these dead bodies, all over town, weeks into the crisis. The authorities asked the UN to help with body removal. The ICRC was not there to do it, and only had a written guide available, which is what we relied on […] We tried to do some mapping of dead bodies by sector, but it was a very rough estimation. A drop in the bucket. (A13, UN official)
The numbers that made it into the UN’s public report were the numbers buried in the collective graves. We are confident of these numbers because the UN military picked up the bodies and buried them. But they were not identified, and we didn’t check if they were civilians. (A13, UN official)
Attack on UNMISS Bor base, 17 April 2014
An attack on the UNMISS Bor base illustrates an overcount in a situation where events unfolded before investigators’ eyes in a confined space (A25, UN official). Officials faced confusion as youths started shooting at civilians on UN premises: ‘There were good intentions that meant bodies were removed and the injured treated. But it was done by [untrained] humanitarians and military personnel’ (A13, UN official). For several days, a security lockdown restricted UNMISS staff from investigating evidence outside the base (A13, UN official). This meant that the UN required months to confirm the attack’s death count: ‘The situation involved a fairly finite physical space […] and yet we still had to contend with controversy over the numbers’ (A13, UN official).
Community leaders immediately called media outlets, claiming hundreds of deaths. They later notified the UN of 146 dead, of which they could identify 55 by name, some of whom appeared alive among the displaced. The UN found forensic evidence of 53 dead (UNMISS, 2015: paras 107–108). GED counts 47 dead, which accurately reflects the UN’s count of the IDP dead (UNMISS, 2015: para 105). However, ACLED counts 60 civilian dead.
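One way to read these competing figures is to ask whether they disagree about the scale of the incident or only about the exact count. The sketch below bins the four figures reported above into ordinal categories. The thresholds and labels are hypothetical choices made for illustration; they are not categories used by ACLED, GED, or the UN.

```python
from bisect import bisect_right

# Hypothetical category boundaries and labels.
THRESHOLDS = [1, 10, 100]
LABELS = ["none", "1-9", "10-99", "100+"]


def fatality_category(count):
    """Map a fatality count to an ordinal category label."""
    return LABELS[bisect_right(THRESHOLDS, count)]


# Figures for the 17 April 2014 attack as given in the text above.
reported = {
    "community leaders": 146,
    "UN forensic evidence": 53,
    "GED": 47,
    "ACLED": 60,
}

categories = {source: fatality_category(n) for source, n in reported.items()}
```

Under these thresholds the UN, GED, and ACLED figures fall in the same category, while the community leaders’ figure falls in the next one up, so the sources agree on scale more closely than on the count itself.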
Northern Jonglei
The most difficult missing observations to interpret relate to places unmonitored by any agency, such as northern Jonglei: ‘We didn’t know what we weren’t seeing’ (A21, UN official); ‘An intelligent person would know the single digits we were getting [from the UN fatality tracking system] were wrong’ (A18, UN official). But the scale of the undercount was unclear: ‘In the early months, we had no idea of what was going on in the Nuer areas of northern Jonglei […] we presumed they [Dinka and other non-Nuer] had to have been killed’ (A25, UN official). This assumption proved wrong when UN officials discovered rebels had imprisoned rather than killed many Dinka (A25, UN official). An aid worker summed up the uncertainty in Canal/Pigi county specifically:
It is a place that’s caught in the middle. We have no sense of how bad things really are. It is clearly not the worst place in Jonglei. At the same time, the security is not good enough for a long-term presence […] The only thing we see is food security data from the World Food Program, which looks bad for the country but is not the worst in Jonglei […] There is not enough evidence of things going really wrong for people to look at it. (A12, NGO worker)
Counting the dead better
The Jonglei vignettes illustrate empirical dilemmas with elusive solutions. Nevertheless, researchers may consider two steps when handling newswire data: (1) elucidate biases; and (2) measure prevalence.
Biases
This inquiry suggests scholars should train their eye on at least seven types of bias.
This feeds
Journalists’
Respondents feared that
The South Sudan case also highlights data distortions from inconsistencies in the
I also heard accounts of
The chaos of the first month or two of the war created a space for well-intentioned people to communicate what was taking place with relatively minimal interference. Once the government was competent again in Juba structures formalized again […] All of a sudden, a national security person was with the people you were interviewing. Either people were brave and would tell the truth, or they would pick selectively […] Space has closed since the beginning of the conflict.
Prevalence
These biases generate non-random missingness, threatening statistical inferences based on point estimates in places like South Sudan. Absent a fix, researchers might consider analyzing the prevalence of violence using categorical measures. This method is neither novel nor perfect. However, in some situations it illuminates magnitudes, patterns, and uncertainties more reliably than counts. To illustrate, I compare a measure of fatality prevalence against numbers for Jonglei.
The following approach adapts a decades-old method. The human rights field developed categorical variables for violence in the 1980s, when the Political Terror Scale project translated a coding scheme from Freedom House to measure violations (Stohl, Carleton & Johnson, 1984).
Monthly fatalities by county in Jonglei (padded), Dec 2013–Dec 2014
This approach makes the most of qualitative information when numbers are uncertain. Consider, for example, what descriptive data reveal, and reported numbers obscure, about an attack on the UN base in Akobo, Jonglei state, on 19 December 2013. ACLED registers a zero count, and GED two ‘unknown’ killings.
Yet, public reports describe numerous fatalities:
Many civilians were killed in the attack. Due to the chaotic circumstances and the inability of survivors to observe the full attack as their movements were restricted, the HRD [Human Rights Division] has not been able to determine an exact figure of fatalities. However, any Dinka civilians not evacuated by UNMISS are presumed dead, and at least 20 civilians, as well as two peacekeepers, were killed. (UNMISS, 2014: para 143)
This report stands in stark contrast to ACLED’s and GED’s counts.
While attractive in these situations, categorical variables blunt rather than resolve the problems this article explores. They also generate their own dilemmas. As Cohen recognizes, report-based prevalence measures face biases similar to those of newswire data (violence that is too dangerous to report will go unreported, descriptively and numerically). Therefore, researchers may only be able to discern patterns at a high level of aggregation – in her case, the country-year (Cohen, 2013: 466–467). Prevalence measures are also normative projects. Categories make populations legible for a political or scholarly purpose (Scott, 1998). They require decisions about who counts as a victim, and what thresholds indicate moderate or extreme abuses (Nelson, 2015).
Nonetheless, prevalence measures reveal fatality patterns that numbers can obscure. I illustrate for county-months in Jonglei, December 2013 to December 2014 (Figures 8 and 9). Table IV describes my coding rules for
Monthly fatality levels by county in Jonglei, Dec 2013–Dec 2014
Figures 8 and 9 show how civilians in two counties that look alike in the newswire may have divergent experiences of killings. For example, Fangak and Pibor both register zero counts in ACLED and GED. Yet, qualitative descriptions indicate that Fangak experienced violence that likely killed civilians, while Pibor remained calm in 2014.
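One way to make this kind of categorical coding concrete is sketched below. It is a minimal illustration under stated assumptions, not the coding rules of Table IV: the level names, the `MANY_THRESHOLD` cut-off, and the `Evidence` and `code_level` names are all hypothetical. The inputs encode only what the text reports: zero counts alongside qualitative evidence of killings for Fangak, zero counts and calm for Pibor, and the Akobo attack of December 2013.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class Level(IntEnum):
    """Hypothetical ordinal fatality levels (not the article's Table IV)."""
    NONE_REPORTED = 0
    LIKELY = 1   # descriptions suggest killings, but no usable number
    SOME = 2     # at least one reported death
    MANY = 3     # at or above the illustrative threshold


MANY_THRESHOLD = 10  # illustrative cut-off, chosen only for this sketch


@dataclass(frozen=True)
class Evidence:
    source: str
    min_deaths: Optional[int] = None   # lower bound stated by the source, if any
    describes_killings: bool = False   # qualitative report of likely killings


def code_level(items: Sequence[Evidence]) -> Level:
    """Return the highest level supported by any single piece of evidence."""
    level = Level.NONE_REPORTED
    for item in items:
        if item.min_deaths is not None and item.min_deaths >= MANY_THRESHOLD:
            candidate = Level.MANY
        elif item.min_deaths is not None and item.min_deaths > 0:
            candidate = Level.SOME
        elif item.describes_killings:
            candidate = Level.LIKELY
        else:
            candidate = Level.NONE_REPORTED
        level = max(level, candidate)
    return level


# Inputs drawn from the figures and descriptions quoted in the text.
akobo_dec_2013 = [
    Evidence("ACLED", min_deaths=0),
    Evidence("GED", min_deaths=2),
    Evidence("UNMISS 2014", min_deaths=20, describes_killings=True),
]
fangak = [
    Evidence("ACLED", min_deaths=0),
    Evidence("GED", min_deaths=0),
    Evidence("qualitative descriptions", describes_killings=True),
]
pibor = [Evidence("ACLED", min_deaths=0), Evidence("GED", min_deaths=0)]

for name, items in [("Akobo", akobo_dec_2013), ("Fangak", fangak), ("Pibor", pibor)]:
    print(name, code_level(items).name)
```

Under these assumed rules, Akobo codes as MANY, Fangak as LIKELY, and Pibor as NONE_REPORTED, even though Fangak and Pibor are indistinguishable on counts alone. Taking the maximum across sources is a design choice that favours the most informative report. It inherits whatever biases that report carries.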
Coding rules
Conclusions
This study has normative and empirical implications. It raises questions about whether counting the dead is analytically useful and politically ethical in contexts like South Sudan. Death counts are prerequisite to memorialization, criminal justice, humanitarian targeting, and meaningful scholarship on political violence. Yet, ‘there is a risk when things are so messy and complicated that throwing something like a death toll out there is a shiny distraction from the issue at hand’ (A16, Copeland). Why, as scholars, human rights advocates, and policymakers, do we obsess about numbers in these situations? Can data that miss so many of the dead advance knowledge?
Maintaining the hope that it can, this study has two implications for social science. First, precise point estimates about the causes and consequences of violence may be impossible using newswire data. Scholars face neither statistical nor qualitative silver bullets to resolve bias problems. Categorical variables can reveal patterns where newswire data are difficult to interpret, but do not solve non-random missingness. Thus, this article poses a challenge to political methodologists: Can social scientists build better cross-national fatality measures, given the biases inherent in the newswire-based data-generation process? While this article offers few answers, it is meant to stimulate this debate.
Secondly, body counts reveal little if divorced from circumstance. It is impossible to construct theories about violence with numbers decoupled from the human experiences that lie beneath. As a former aid worker (A23) shared: ‘It is the nature of killing that matters, not the number.’ He recounted how a man with disabilities was castrated, killed, and left on display in the Malakal PoC camp:
There were riots a month later and the Dinka left […] [because they feared] they would be ethnically cleansed […] If you look at fatality data this man appears as only one death but that body was used to convey something important.
Social scientists owe it to both knowledge advancement and the dignity of the dead to ground statistics in context.
Acknowledgements
I thank Kate Baldwin, Ali Zeynel Gökpınar, Stathis Kalyvas, Jana Krause, Keith Krause, Anne-Kathrin Kreft, Jule Krüger, Roxani Krystalli, Jason Lyall, Ian Shapiro, Jessica Stanton, Jason Stearns, Alex de Waal, Martin Waehlisch, Skye Wheeler, Elisabeth Wood, and participants at the International Studies Association Annual Convention 2019 and Yale African Politics Working Group.
Funding
This article was supported by a US Institute of Peace (USIP) Peace Scholar Award. The views expressed in this article are those of the author and do not necessarily reflect the views of USIP.
