Abstract
Textual Analysis by Augmented Replacement Instructions (TABARI) provides an automated method for coding large amounts of text. Using TABARI to code lead sentences of news stories, the KEDS/Penn State Event Data project has produced event data for several regions. The wide range of events and actors, TABARI’s ability to filter duplicate events and the number of events coded allow users to analyze patterns in conflict and cooperation between state and nonstate actors over time. We evaluate whether coding full stories provides more detailed information on the actors referenced in the lead sentences. Additional actor information would allow researchers interested in the interactions between violent nonstate actors to test hypotheses regarding group cohesiveness and splintering, spoiling behavior, commitment problems between factions and many other issues critical to management of an insurgency. We downloaded Reuters news stories relevant to the Israeli–Palestinian conflict and used TABARI to code the lead sentences. We then analyzed the full text of the coded stories to determine the level of actor detail available. Our findings highlight the dynamic relationship among nonstate and state actors during the Israeli–Palestinian conflict, and we find that, contrary to expectations, hand coding full news stories does not lead to significant improvements in the accuracy or depth of actor information compared with machine coding by TABARI using lead sentences. These findings should bolster the confidence of researchers using TABARI coded data, with the caveat that TABARI’s ability to distinguish between actors is dependent upon the detail available in the actor dictionaries.
As past actions serve as key indicators of future behavior, event data and analyses are a primary means of understanding the dynamics of conflict and cooperation among actors. Event datasets, such as WEIS (World Events/Interactions Survey; McClelland, 1978) and COPDAB (Conflict and Peace Data Bank; Azar, 1980), coded events from major newspapers and other periodicals to determine patterns of behavior. The first generation of event datasets was hand coded and generally focused on state interactions over defined temporal periods. Machine coding has allowed researchers to gather data on a variety of actors (international, state and sub-state) over large spans of time, with continuous updating ability. As the complexity of coding software has increased to match the complexity of international relations, the degree of precision in the coding rules is fundamental to analyzing accurate patterns of interactions.
The Penn State Event Data Project (formerly the Kansas Event Data System or KEDS) uses TABARI (Textual Analysis by Augmented Replacement Instructions), a machine coding software program developed by Philip Schrodt 1 to code cooperative and conflictual events between actors at the sub-state, state and international levels. TABARI codes the lead sentence of stories from newswire services, namely Reuters and Agence France Presse. Utilizing pattern recognition and sparse parsing, TABARI identifies and codes 2 the source, target and action of an event. 3 Coding only the lead sentences improves speed of coding and avoids problems associated with parsing multiple related sentences. 4 This practice, however, risks missing important supplemental information concerning the actors involved in the event coded from the lead. This project evaluates the coding of event observations derived from the lead sentences of newswire stories covering the Israeli–Palestinian conflict to determine whether coding full stories significantly improves the depth and accuracy of actor information in the Penn State Event Data Project. Ultimately, we find that hand coding full stories adds little to the actor information that TABARI collects through lead sentence coding. 5 This finding should increase researcher confidence in the use of event data coded by TABARI.
Accurate and precise actor information is vital for the evaluation of learning, reputation and spoiling behavior theories, among others. For example, event data are often used to examine behaviors of reciprocity or action–reaction patterns over time (Goldstein and Freeman, 1990). In the case of the Israeli–Palestinian conflict, there exist distinct sub-groups, holding diverging preferences over end goals, acceptable thresholds of violence and degrees of cooperation. Coding any distinct actor generically, such as “Palestinian” or “Palestinian rebel”, without identifying the specific faction (should that be possible given the information in the story) categorizes different groups together in ways that may not reflect reality on the ground or changes in the interactions among groups over time. More simply, imprecise coding, specifically of actors, can give the appearance of action–reaction patterns that do not actually exist or may mask such interactions when they do exist on a smaller scale. For an example of this, see Figure 1(A) and (B). Figure 1(A) shows an action–reaction pattern between Israel and Palestinians. In Figure 1(B) the actions are likely to be related, but they are not representative of the typical action–reaction framework. Distinguishing between (A) and (B) could allow researchers to test theories of spoiler violence and reciprocity in more concrete ways than have been possible in the past; if we can distinguish between factions, we can begin to understand how one faction might be responding to interactions in a related dyad. That is, we might be able to observe if a faction like the Palestinian Islamic Jihad responds violently to cooperation between the Palestinian Liberation Organization (PLO) and Israel. Further, as shown in Figure 1(C) and (D), if the Palestinian Islamic Jihad and Fatah are both regularly coded as Palestinians owing to lead sentence coding, then the data will obscure any interaction patterns between the two groups.

Figure 1. Dyadic interactions in the Israeli–Palestinian context. Arrows indicate direction of actions.
Detailed information regarding the identities of actors in event data can allow researchers to use those data to understand how different types of actors are likely to respond to different types of stimuli. Rather than relying solely on the interpretations of those reporting the news events (whose perspectives may be clouded by proximity to the events) or on the statements of political or military figures (who may have incentives to misrepresent) to determine linkages between separate events, researchers can use event data to test for action–reaction patterns suggested by their theories. While the event data are coded from news events, TABARI codes only an individual event, not whether the event was said (by the perpetrator, reporter or experts) to be in retaliation for a previous event. The ability to capture patterns in accurate and precise event data suggests the potential to understand both individual events within a broader context and the long-term effects of individual actions.
Machine coding programs, like TABARI, provide multiple advantages over hand coding to create event data. Machine coding is significantly faster, much more cost effective, allows for reproducibility, lacks human bias and is fairly accurate compared with hand coding. Once actor and verb dictionaries have been created and fine-tuned, the costs incurred involve acquiring machine-readable text (newswire stories). Running the stories through TABARI takes a few seconds to several minutes depending on the size of the original dataset. Additionally, the speed with which TABARI codes and the fact that it is an open source software program allow users to easily experiment with various coding schemes or different source text. Researchers can make changes to the dictionaries and easily re-code the data. 6
The speed of coding also contributes to the cost-effectiveness of machine coding software. Again, after the initial investment in developing the coding dictionaries, there are few costs associated with running the stories through TABARI. In contrast, hand coding would require hiring individuals (generally students) to review and code individual stories. Human coders need to be trained, and the inevitable turnover in staff entails costly retraining.
Machine coding also allows for the reproducibility of results. The same coding scheme and rules used at one institution can be reproduced at another quickly and easily. As Schrodt notes in the TABARI manual, “a set of words describing an activity will receive the same code irrespective of the actors or time period involved” (Schrodt, 2009: 3). Gerner and Schrodt (2001) argue that human coding limits transparency in coding decisions and introduces both “systemic biases because of unconscious assumptions made by the coder” (p. 6) and “hind-sight bias [because] knowing the outcome of a mediation can potentially affect how informed coders assign values to independent variables” (p. 10). While both human and machine coding can result in bias, the sources of bias (bias in source material or in coding dictionaries) are much easier to identify and control for in machine coding than in human coding. 7 Finally, machine coding is fairly accurate. Based on tests conducted on KEDS using the WEIS coding scheme, machine coding applied the same code as a human coder 80–85% of the time (Schrodt and Gerner, 1994). 8
Despite these benefits, machine coding does have two main drawbacks. First, should the user be interested in coding stories for regions not yet addressed by the TABARI team, creating the requisite dictionaries is a long and intense process that requires a significant up-front investment. The existing dictionaries provided by the Penn State Event Data Project represent years of human coding and refining. Second, machine coding often has difficulty addressing complex sentence structure. Although newswire stories generally follow a straightforward subject–verb–object format, this issue can potentially result in miscoded or missed observations. 9 These drawbacks, however, do not outweigh the benefits of machine coding.
Comparing human and machine coding
The current project examines the depth and detail of actor coverage for event observations derived from Israeli–Palestinian interactions coded by TABARI. Schrodt and Gerner (1998) argue that the lead sentence generally summarizes the main event(s) in each story, particularly for regional areas that are well covered by media outlets. Indeed, in comparisons between coding the lead sentences and coding full stories, Schrodt and Gerner found the results to be highly correlated. They write, “in most statistical studies involving linear models, similar results will usually be obtained with either approach” (p. 17). 10 However, Schrodt and Gerner’s primary concern was locating additional events rather than analyzing the detail of the actors coded. This project focuses instead on the quality of sub-state actor coding based on information provided by lead sentences. That is, does the lead sentence provide enough data on the actors involved to produce accurate machine coding? 11
The current coding scheme utilized by TABARI, the Conflict and Mediation Event Observations or CAMEO, uses hierarchical coding to address the prevalence of sub-state actors in contemporary conflicts. The coding strategy uses a system of one to three three-character elements to provide increasingly greater detail about the actors. The first element identifies a country, a religious or ethnic group not associated with a particular state, international actors (e.g. nongovernmental organizations) and finally geographic regions (e.g. Eastern Europe). The second three-character element generally refers to the status or role of domestic actors, such as the government, the opposition, the military or rebels within a given state. The second element also identifies sub-state regions and religious or ethnic identities associated with a particular region or state (e.g. Egyptian Copts). Finally, the third three-character element indicates the role or status of domestic actors that have already been identified by country, region or identity grouping. This element also indicates specific branches of international organizations (e.g. the International Atomic Energy Agency as a branch of the intergovernmental organization, the United Nations). As an example, the Algerian rebel group, the Armed Islamic Group, would be identified first by its country of origin, Algeria (coded DZA), then by its domestic designation as a rebel group (REB) and finally by its specific group name (GIA). References to the Armed Islamic Group as either the source or target of an event would be coded as DZAREBGIA. For additional information on the CAMEO coding scheme, see Schrodt et al. (2008). 12
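The hierarchical structure described above can be illustrated with a minimal sketch that splits an actor code into its three-character elements. The function name `split_actor_code` is hypothetical and is not part of TABARI or the CAMEO documentation; the sketch assumes only what the text states, namely that a code consists of one to three elements of three characters each.

```python
def split_actor_code(code, width=3, max_elements=3):
    """Split a CAMEO-style actor code into fixed-width elements.

    Assumes (per the description in the text) one to three
    three-character elements; anything else is rejected.
    """
    if not code or len(code) % width != 0 or len(code) > width * max_elements:
        raise ValueError(f"not a valid actor code: {code!r}")
    return [code[i:i + width] for i in range(0, len(code), width)]


# The Armed Islamic Group example from the text:
# country (DZA), domestic role (REB), specific group (GIA).
print(split_actor_code("DZAREBGIA"))
# A broader code simply has fewer elements.
print(split_actor_code("PALREB"))
```

Splitting codes this way makes it straightforward to aggregate events at a coarser level, for example by keeping only the first one or two elements.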
The hierarchical coding system allows for the easy capture of distinct sub-state groups, contingent on their inclusion in the coding dictionary. As the Israeli–Palestinian conflict has been a central focus for the TABARI team, these dictionaries include detailed information about Israeli and Palestinian actors. 13 Yet, TABARI’s capacity to identify sub-state actors depends on their inclusion in the lead sentence of the newswire stories used as source text. If a specific group is not listed in the lead, TABARI, coding with CAMEO, will mis-specify the actual source or target of an event. For instance, if a lead identifies the source of a shooting as a Palestinian gunman, TABARI will code the actor as a generic PALREB (Palestinian/Rebel). However, later in the story, the actor may be further identified as a member of a specific organization, such as the Al Aqsa Martyrs Brigades, which would be coded as PSEREBAAM (Palestinian Occupied Territories/Rebel/Al Aqsa Martyrs Brigades).
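The dependence on the lead sentence can be made concrete with a toy sketch. This is not TABARI's parser, which relies on pattern recognition and sparse parsing; it is a plain substring lookup against a hypothetical two-entry dictionary, and both example sentences are invented for illustration. Only the two actor codes come from the text.

```python
# Hypothetical mini-dictionary: phrase -> actor code.
# The codes are those mentioned in the text; the phrases are assumptions.
ACTOR_DICTIONARY = {
    "palestinian gunman": "PALREB",
    "al aqsa martyrs brigades": "PSEREBAAM",
}


def find_actor_codes(text, dictionary=ACTOR_DICTIONARY):
    """Return the codes of all dictionary phrases that occur in the text."""
    lowered = text.lower()
    return [code for phrase, code in dictionary.items() if phrase in lowered]


# Invented example story: the lead names only a generic actor,
# while the second sentence names the specific organization.
lead = "A Palestinian gunman shot at an Israeli vehicle."
full_story = lead + " The Al Aqsa Martyrs Brigades said the gunman was a member."

print(find_actor_codes(lead))        # only the generic code is found
print(find_actor_codes(full_story))  # the specific group is also found
```

In this sketch the specific code is recoverable only when text beyond the lead is scanned, which is the situation the comparison in this article is designed to measure.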
TABARI-coded data can only be as accurate as its source text. Newswire stories vary in the degree of information available in the lead sentence and TABARI will only code the actors captured by the lead. However, even within a region characterized by significant media coverage and a highly developed dictionary, we may find systematic evidence that coding the lead sentence does not provide information on the primary actors of an event. The data generated by TABARI may be mis-specified or misrepresent the actions of a particular sub-group. Such miscoding could be particularly problematic if multiple groups are classified as a single broad category when, in fact, the individual groups have conflicting preferences or internal disputes. For example, imagine that all factions of the PLO are coded only as PALPLO, while some factions oppose a particular resolution brokered by another faction. Should the opposing factions, identified as the PLO, take action against the resolution, the resulting data may appear to suggest that the PLO was insincere in its negotiations and cannot be trusted in future negotiations. This appearance may obscure information regarding the preferences and power relationships of individual PLO factions that, if known, could suggest a route to conflict resolution.
For instance, Table 1 provides an actual sample from a Reuters newswire story as well as the resulting TABARI coding. The lead sentence clearly states “Palestinian gunmen” as the source of the shooting of an Israeli civilian and the TABARI output correctly codes the source as “Palestinian rebel” (PALREB). However, details appearing further in the story (in this instance, the next sentence) identify a distinct faction, Hamas, to which the gunmen belonged. 14 Data on the actions and reactions of Hamas, as separate from Fatah or rogue fighters, allow researchers to accurately test theories addressing sub-group behavior.
Table 1. TABARI coding sample
Actor identification and detail
Event data play a significant role in international relations research (King, 1989; Schrodt et al., 1994). Indeed, the high number of event datasets (COPDAB, WEIS, PANDA and others) indicates the popularity of this type of data among scholars. The majority of these datasets utilize unitary states as the source and target of events; however, the post-Cold War environment is characterized by the increasing influence of sub-state, nonstate and international/transnational entities. The continuing relevance of event data relies on accurate information concerning these multiple types of actors. Indeed, correct actor identification can contribute to multiple research streams, including identifying patterns of conflict and cooperation, enhancing our knowledge of conflict dynamics and, perhaps, providing insight into new means of achieving conflict resolution.
Research on spoiler behavior and fractionalization relies on accurate information concerning intra-group dynamics. Disaggregating broad opposition groupings can reveal interactions within a group that could potentially trigger greater conflict. For instance, works by Kydd and Walter (2002), Bueno de Mesquita (2005) and Best and Bapat (2013) utilize formal models to examine sub-group interactions within dissident movements. Detailed information on which actors support or undermine negotiations and when they do so can strengthen the empirical findings derived from these models.
Beyond research on fractionalization, detailed actor information also supports research on learning and mimicking behaviors. Although it is well known that dissident groups collaborate both directly in operations as well as in joint training exercises (Stern, 2003), this research is anecdotal at best. Event data that accurately identify actors can help determine which groups are working together and, subsequently, if such cooperation systematically influences sub-group operations. For instance, research by Crescenzi et al. (2010) indicates that vicarious learning among sub-state actors can affect the scope and level of violence. Their initial findings show evidence of action–reaction patterns in interactions among Israeli, Hezbollah, religious Palestinians and secular Palestinians, indicating possible learning patterns. Additional data on sub-groups can refine theories of learning and contagion among dissident groups.
Disaggregated actor information also provides valuable information on the counterinsurgency strategies of target states. Jaeger and Paserman (2005a, b) suggest that Israel responds differently to attacks from different dissident groups. Using data covering the Al Aqsa Intifada, the authors find that Israel responds differently to attacks from Fatah than to those from Hamas and Palestinian Islamic Jihad. Specifically, they find that it increases its violence against all Palestinian groups in response to Israeli fatalities caused or claimed by Fatah, but not in response to those caused or claimed by Hamas or the Palestinian Islamic Jihad. Israel’s counterinsurgency strategy relies on its ability to distinguish between violations of agreements made by its Palestinian allies and attacks from other Palestinian factions. Disaggregated actor information allows researchers to test for these sorts of patterns in target state strategies.
Recent work by Best and Bapat (2013) and Cunningham (2011) also points to research that would be supported by accurate and detailed identification of sub-state actors. Their findings suggest that rivalries and power differentials within broader groupings (such as Palestinians) can influence the state’s negotiating strategy. Leveraging differences among sub-state groups can lead to conflict that weakens the opposition overall. This research, and other scholarship that addresses sub-state group interactions, would benefit from data that accurately reflect the behavior of sub-state and nonstate groups.
In addition to enhancing our understanding of conflict dynamics, event data with detailed and accurate sub-group coding can also provide information on cooperative interactions. Indeed, one of the benefits of the CAMEO coding scheme is its inclusion of cooperative and conflictual actions, allowing researchers to examine multiple dimensions of nonstate group behavior. Particularly in situations where sub-state actors are characterized as dissidents or terrorists, event data indicating systematic cooperative behavior can aid scholars and policymakers in understanding which incentives induce cooperation or lead to positive tit-for-tat behavior.
Although not an exhaustive survey, the research outlined above indicates areas that could benefit from event data providing fine-grained actor detail. Disaggregating sub-state groupings by linking specific organizations with conflictual or cooperative behavior enhances the accuracy of our data, contributes to the testing of spoiling and credibility concerns within the opposition, and can provide insight into how groups learn from each other.
Our project serves as a first step in determining if more refined data are currently available for testing these theories. Toward this end, we examine a sample of news stories to determine whether additional information on the actors, as coded by TABARI, is available when coding the full story. Table 1 demonstrates the type of additional actor identification and information that can result when coding and analyzing full stories.
If we find that TABARI is systematically missing actor details (either through miscoding or because the information is not available in the lead), it may be necessary to re-program it to code full stories or be more cautious in the types of questions and models to which event data are applied. 15 However, should coding the full story lend no significant improvement to the detail of identified actors, our findings provide further support for the use of TABARI-generated event data as a reliable source of data.
Research design and results
In order to test the depth and detail of actors coded by TABARI, we compared the source and target produced by TABARI’s coding of lead sentences with the information available through hand coding full stories. Our project concentrates on Palestinian–Israeli interactions and interactions between Palestinians for two reasons. The first is that the Levant dictionaries are better developed than any of the other TABARI dictionaries. Because our focus is only on the utility of coding full stories relative to that of coding only lead sentences, the completeness of the actor dictionaries should have no effect on our results. That said, we expect researchers to be more likely to use TABARI-coded data to study the Levant region since this is the region for which it is optimized. Second, our substantive interest is in the Palestinians and their interactions with Israel. While Palestine is only a small corner of the world, it is a corner of high political salience, so we expect that there are many others who share our substantive interest in Palestine.
Within Israeli–Palestinian dyads, we focus primarily on Palestinian actors owing to the many autonomous sub-state groups operating in the Territories. Major organizations, including Fatah and Hamas, have very different motivations and adopt different strategies. As noted above, classifying both groups under the umbrella term “Palestinian” masks unique interactions between actors. We do not disaggregate Israeli actors. As a state with a cohesive government, Israel behaves as a unitary actor; its military and police do not act independently from government policy. Therefore, it was not necessary to examine the depth and detail of Israeli actor coding. 16
We chose to compare stories from randomly selected days within two specific periods of time, each characterized by high levels of interactions among Israeli and Palestinian actors. 17 Specifically, we randomly selected 25 days within a four-month period, August–November, in both 1993 and 2000. We chose to randomly draw these days from the two periods, rather than to code all events from the two periods, in order to examine a larger time span and thereby minimize the risk of choosing a period with atypical actor coverage. Additionally, randomly selecting days from within these two periods of higher activity gave us a selection of days with different levels of activity, allowing us to control for the effect of action density on the coverage of actors. In each year, these four months surround significant events in the course of the Israeli–Palestinian conflict, namely the signing of the Declaration of Principles in 1993 and the outbreak of the Al Aqsa Intifada in 2000. We anticipated that there would be more relevant events and greater activity by many groups during these periods, thus providing a rich sample from which to test the detail of TABARI coding.
The 1993 Declaration of Principles, better known as the Oslo Accords, was the landmark peace agreement negotiated between Israel and the PLO. While relations between these two actors were marked by cautious optimism, the agreement generated significant opposition from other Palestinian groups, namely Hamas, and nationalist organizations, who formed an alliance against the PLO. The combination of cooperative and conflictual interactions between Israelis and Palestinians, broadly, as well as among Palestinian sub-groups, creates an environment that should feature increased activity by specific organizations. In other words, we expect that, in this context, a group may condition its behavior toward another organization based on a combination of previous interactions with competitors, allies and adversaries.
In September 2000, tensions between Israelis and Palestinians erupted, leading to the Al Aqsa Intifada. The period was marked by significant levels of conflict, primarily between Israeli forces and Palestinian factions. The environment was noisy, as both individuals and groups engaged in wide-scale violence against Israel. However, Mia Bloom’s (2005) analysis of competitive outbidding finds Palestinian sub-groups jockeyed to claim responsibility for violent attacks, namely suicide bombings, as a way to bolster their support base among a population strongly supportive of such attacks. As such, we should expect to see multiple groups engaged in violent attacks and claiming responsibility for violence.
Using a random number generator, we selected 25 days from each of two four-month periods, August–November, in 1993 and 2000. We then downloaded Reuters news stories from Factiva for each day of interest, filtered the text to extract the lead sentences and ran the lead sentences through TABARI to develop our base dataset. 18 The TABARI-coded dataset produced 267 events in 1993 in which Israeli or Palestinian actors were coded as the source or target and 206 events in 2000. 19
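The day-selection step can be sketched as follows. The article does not specify the random number generator or seed that was used, so `sample_days`, its `seed` argument and the choice of sampling without replacement are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def sample_days(year, n_days=25, seed=None):
    """Draw n_days distinct calendar days from 1 August to 30 November.

    Sampling is without replacement (an assumption), so no day is
    selected twice. A seed makes the draw reproducible.
    """
    start = date(year, 8, 1)
    end = date(year, 11, 30)
    span = (end - start).days + 1  # 122 days in August through November
    all_days = [start + timedelta(days=i) for i in range(span)]
    rng = random.Random(seed)
    return sorted(rng.sample(all_days, n_days))


# One reproducible draw per period studied in the article.
for year in (1993, 2000):
    days = sample_days(year, seed=year)
    print(year, len(days), days[0], days[-1])
```

Fixing the seed is a design choice that lets another researcher reproduce the same set of days, in keeping with the reproducibility argument made for machine coding.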
We then independently matched each event observation with its corresponding Reuters story and carefully reviewed each story to determine if additional details on the coded actors were available past the lead sentence. Our primary focus was locating available information on the actors coded by TABARI, rather than the accuracy of the coded event. 20 Thus, we examined whether the source or target of an event was more narrowly identified later in the story. In particular we concentrated on broad actor codings, such as “Palestinian” or “Palestinian rebel” to determine if these groupings were disaggregated in the body of the story. We focused specifically on whether the event verb code could be linked to a specific group or sub-group of actors. For instance, in the story sample featured in Table 1, titled “Israeli wounded by Arab gunmen in Gaza Strip”, from October 1993, the lead sentence identifies the perpetrator of the attack as “Palestinian gunmen”. TABARI coded the source as “PALREB”. However, a review of the full story revealed that the gunmen were associated with Hamas’s armed wing, the Qassam Brigades, allowing us to improve the accuracy and detail of the event information. 21 We ignored quotes or statements by specific groups or references to actions by particular organizations that did not relate to the coded event action. For a complete list of the actors in the TABARI dictionary that are specific to the Middle East, see Table 2.
Table 2. Middle East actors coded by TABARI
a Adapted from Schrodt et al. (2008).
“Date restricted” indicates that actors changed roles over time, such as becoming the ruling party in government.
Our results are shown in Tables 3 and 4. Table 3 displays the percentage change in the source of an event between the TABARI-coded base dataset and the results of our hand coding of full stories. Table 4 displays the percentage change in the target of an event between the two datasets. 22 Hand coding captured additional actors not coded by TABARI, thus the total number of observations between these columns changes slightly.
Table 3. Percentage change between TABARI and hand coding, by source
a 1993 TABARI coded, n = 139; hand coded, n = 140. 2000 TABARI coded, n = 99; hand coded, n = 98.
Change is an artifact of the change in number of observations across coding method and the number of PSEGOV actions.
Table 4. Percentage change between TABARI and hand coding, by target
a 1993 TABARI coded, n = 154; hand coded, n = 156. 2000 TABARI coded, n = 117; hand coded, n = 116.
As Table 3 illustrates, coding full stories did produce some disaggregation of the broadest actor codes. For the Palestinians, these codes are PAL (“Palestinians”), PALPLO (“Palestinian Liberation Organization”) and PSEWBN/WSB (“Palestinian West Bank”). 23 Of particular note is the addition of events attributed to the Popular Front for the Liberation of Palestine (PSEREBPFL), with a change in 1993 from zero events as the source of action to 2.86% of all events, a count increase of three events. Other groups reflected smaller changes in 1993, including a 1.43% increase in actions attributed to the Democratic Front for the Liberation of Palestine (PSEREBDFL), a 0.7% increase in events with Hamas (PSEREBHMS) as the source, and a 0.71% increase in events with Palestinian Islamic Jihad (PSEREBISJ) as the source. In 2000, the largest changes were events attributed to Palestinian civilians (PALCVL), with an increase of 2.08%, most likely reflecting the broad-based violence of the Al Aqsa Intifada. Closely following was a 2.07% increase in events with Palestinian police as the source (PSECOP). Finally, hand coding produced a 1% increase in events attributed to the Gaza Strip (PSEGZS; events would have occurred in the Gaza Strip), the Palestinian military (PSEMIL) and Palestinian nongovernmental organizations (PSENGO). 24 Table 4 illustrates no major additions in detail of targets based on full-story coding. The one exception is the PLO (PALPLO), with a 2.73% increase in 1993. Smaller changes include Fatah (PSEGOVFTA), with a 0.87% increase in 2000 and Hamas (PSEREBHMS) with a 0.62% increase in 1993. The gains, particularly for targets of an event, were relatively small.
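A comparison of this kind can be sketched as a difference in each actor code's share of events across the two codings. The sketch assumes that "percentage change" means the hand-coded share minus the machine-coded share, in percentage points; the article does not spell out the formula. The function name `share_change` is hypothetical and the counts below are invented toy data, not the study's results.

```python
from collections import Counter


def share_change(machine_codes, hand_codes):
    """Percentage-point change in each actor code's share of events.

    Each argument is a list with one actor code per event. Shares are
    computed against each dataset's own total, since the two codings
    need not contain the same number of observations.
    """
    machine = Counter(machine_codes)
    hand = Counter(hand_codes)
    changes = {}
    for code in sorted(set(machine) | set(hand)):
        machine_share = 100 * machine[code] / len(machine_codes)
        hand_share = 100 * hand[code] / len(hand_codes)
        changes[code] = round(hand_share - machine_share, 2)
    return changes


# Invented toy data: hand coding moves some generic events to Hamas.
machine_coded = ["PAL"] * 6 + ["PALREB"] * 4
hand_coded = ["PAL"] * 4 + ["PALREB"] * 3 + ["PSEREBHMS"] * 3

print(share_change(machine_coded, hand_coded))
```

Because each share uses its own denominator, a code's share can shift slightly even when its count is unchanged, which is the kind of artifact noted beneath Table 3.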
The difference in results across time periods could be related to the formalized nature of the conflict in each period. Although the 1993 Oslo Accords are presented generally as an agreement between Israel and a unified Palestinian front, in reality the Accords created a significant schism within the PLO. Several of the PLO’s participating groups rejected the agreement and withdrew or suspended their membership to signal their opposition. Many of these groups, including the Popular Front for the Liberation of Palestine—General Command, the Popular Front for the Liberation of Palestine, the Palestinian Liberation Front and Fatah Uprising were vocal, and sometimes violent, in their rejection of the Accords. Additionally, Hamas and the Palestinian Islamic Jihad, which were not members of the PLO, were also active in the controversy over Oslo. The activity during this period was perpetrated by distinct groups, rather than an amorphous sub-set of the population. Indeed, according to a poll conducted by the Palestinian Center for Policy and Survey Research 25 (PSR) in September 1993, close to 65% of respondents supported the proposed Gaza–Jericho First agreement (the first component of the Accords). 26 Opposition to the agreement was not broad-based.
In contrast, the environment in 2000 was very noisy. Similar to the 1987 Intifada, the 2000 uprising was characterized by activity on the part of both average Palestinians and distinct factions. Indeed, coded stories from the four-month period in 2000 reveal that over 35% of both machine-coded and hand-coded stories featured the generic category of “Palestinian” as the source or target of an event. Palestinians, acting independently of any distinct faction, represented a large segment of the activity during this time. Additionally, the high level and widespread nature of the conflict contributed to difficulties in identifying specific groups as the source or target of an event. In this period, the source material, rather than the coding rules, was responsible for the lack of significant changes between machine and hand coding.
Additionally, in hand coding the 1993 and 2000 samples, we found several instances in which actors appeared in the story but were not included in the Levant dictionary developed by the TABARI team. These groups included the Popular Front for the Liberation of Palestine—General Command, the al-Qassam Brigades (military wing of Hamas), Fatah Uprising and Soldiers of Mohammed Brigade. However, owing to the ease with which researchers can amend the coding dictionaries and re-code the data, incorporating these actors would take minimal effort. These groups are fairly marginal, however, and their inclusion does little to change our results. Event data are beneficial to detecting patterns of behavior over time; small groups that act infrequently will be subsumed in the noise of the data.
As a note to researchers using the TABARI system, we add that the data TABARI codes can only be as good as the sources from which it codes. Where news coverage is sparse or actions (particularly attacks) are more difficult to attribute, the data will be of lower quality than where coverage is good and information about actors and their actions is more readily available. That said, we suspect that, where reliable news coverage is not available, other information sources will be less reliable as well.
Conclusion
Overall, our findings demonstrate that hand coding full stories adds little substantive value to the degree of actor accuracy available from machine coding lead sentences using TABARI and the CAMEO coding scheme. In the instances in which additional actor detail was located in the body of the story, the total change of these additions was too small to be of much consequence. While we saw marked decreases in the number of observations attributed to the largest (and most broadly defined) actor category, PAL, those observations were redistributed across multiple other actor categories such that these changes did not noticeably increase the percentage of observations in any given category.
We would anticipate no significant changes in the results of research addressing sub-state actors in the Levant using a TABARI-coded base dataset vs a hand-coded dataset or a TABARI-produced dataset coded from full stories. Therefore, given the limited gain in actor information and the technical issues associated with running full stories through TABARI or other textual analysis software, we find that the current method of coding lead sentences is appropriate for use in analyzing the actions of smaller groups. Our results support the accuracy and detail of TABARI-coding software, using the CAMEO coding scheme and the highly developed dictionaries covering the Levant. Greater actor detail using event data will require news outlets to be more consistent in identifying the sources and targets of actions.
