Introduction
This manuscript tackles a recurring problem for researchers in the peace science community. True research reproducibility is best achieved by creating data from scratch, although no published guide exists that informs researchers how to do this on their own. Instead, researchers may end up reusing old code generated in past studies, leaving them to spend time and energy adjusting the sample of states and the temporal domain, and doing whatever additional troubleshooting may arise from this practice. Researchers may additionally spend too much time reproducing old code for standard information that goes into any dyadic or monadic analysis (like contiguity relationships and democracy) and have to do additional troubleshooting for how these various data sources treat missing data or treat state codes in a manner that is inconsistent with the more accessible Correlates of War or Gleditsch–Ward state codes. This is all compounded by changes in technology that treat the creation of the data and the analysis of data as a continuous process in which the contemporary quantitative political scientist is increasingly becoming a computer programmer as well. Graduate students and other beginners in the field face unique challenges associated with these developments. Students just learning peace science must learn how scholarship informs data in peace science and how data inform scholarship at the same time that they are learning quantitative methods in a chosen software package.
{peacesciencer} addresses these problems. Built around the free and open source R programming language, {peacesciencer} contains a suite of data and functions for creating data of interest to researchers. Researchers can use {peacesciencer} to create dyad-year, leader-year, leader-dyad-year, and state-year data (among some others) from scratch. Afterwards, they can add a variety of standard information (e.g. contiguity, alliances, major power status, GDP per capita estimates, capability estimates and more) to these data with a simple command. This is a considerable time-saver since, in its absence, researchers would have to more meticulously code and transform the raw data to conform to the kind of data they want. {peacesciencer} comes with some data innovations as well, including a comprehensive dataset on democracy by year, an original dataset on capitals and capital transitions, and a function to create peace years between ongoing conflicts. All are done with the maximum possible transparency. The project is available for public view on GitHub (https://github.com/svmiller/peacesciencer/). The data-raw directory on the project's GitHub contains information and comments about how every dataset was created. The function manuals (http://svmiller.com/peacesciencer/reference) contain additional comments about what each function returns and, in appropriate cases, why it is doing what it is doing. Thus, {peacesciencer} not only assists a peace scientist with their research, but it does so in a manner that best conforms to the Data Access and Research Transparency (DA-RT) initiative across all political science.
This data feature proceeds in the following fashion. The next section expands on the need this package fills for peace scientists. Afterwards, it provides an overview of what is included in {peacesciencer} to help researchers more quickly conduct the kind of quantitative research they want. Thereafter, it provides a tutorial on how to install and best use {peacesciencer} in the R programming language. A more comprehensive tutorial follows, showing how {peacesciencer} already has a suite of data and functions that can allow for effective replications of a “dangerous dyads” type analysis (Bremer, 1992), standard state-year analyses of civil conflict onset (e.g. Fearon and Laitin, 2003), and even leader-year analyses of interstate dispute initiation (e.g. Horowitz and Stam, 2014). This feature concludes with a comparison of {peacesciencer} with other alternatives and a discussion of how {peacesciencer} can inform more reasoned design decisions for researchers in peace science.
Why {peacesciencer}?
{peacesciencer} is motivated by the following observations and ideals that led to its creation. For one, researchers invest too much time in the construction of a dataset that faithfully captures the unit of analysis. Assume that a researcher wants an original dataset on all directed dyad-years for Correlates of War states for an analysis of interstate conflict. How might one do that? The answer has never been immediately obvious. No published guide exists that shows a researcher how to create these data themselves from scratch, which is one reason why software bundles like EUGene and NewGene are attractive to researchers who primarily care about the substance of their research question. After all, EUGene's main value—if not its primary impetus—was allowing researchers to create datasets for replication of previous studies, producing a host of data types (e.g. dyad-year, state-year, dispute-year) along the way that users could amend as they saw fit. {peacesciencer} is primarily born from this question about how to create these data from scratch. The underlying code that produces these data types is available online and {peacesciencer} converts these lines of code into simple functions for the ease of the researcher.
Second, researchers also invest too much time in retracing steps for peace science analyses for new projects. Assume that a researcher finished a state-year analysis on the correlates of civil conflict onset a few years ago and wants to start a new project that analyzes the same outcome from a different angle (or perhaps using newer data). Under these conditions, a researcher will have to find where they stored that replication code and copy-and-paste it into a new directory for the new project. They may then have to change the name of some files, change some code to account for potentially new column names in the newer data, and troubleshoot instances where their old code does not perform as it once did. At its worst, this process may lead to some errors by the researcher. At best, this is tedium that spends the researcher's time that they would rather invest in analyzing the data. The lion's share of {peacesciencer}'s functionality is both creating the units of analysis for the researcher and merging in different forms of data in wide use in the peace science community so that the researcher can spend less of their time on tedium.
Third, the creation of the data and the analysis of the data are increasingly becoming one continuous process. Not too long ago, it used to be the case that researchers had to download a dataset, or create one from scratch (possibly in a spreadsheet or through a program like EUGene). After downloading or constructing the data, the researcher then opened a specialty program for statistical analysis (e.g. SAS, SPSS, Stata) to recode raw data into a form suitable for analysis before running a statistical model that regresses some outcome on a set of covariates. Current research practices still resemble this process, but the steps between them are no longer as large as they were in the past. Software options exist that allow the researcher to load data, create data, clean data, analyze data and present the results of the analysis all within one program. {peacesciencer}, by itself, does not do all of these things, but it seamlessly connects the beginning of the research process to the end of the research process without needing to leave the increasingly popular R programming language and RStudio (its free-for-use integrated development environment, IDE).
Fourth, it is increasingly the case that as the steps between creating data and analyzing data decrease in size, the lines between them blur as well. In other words, to create data is to code data and the contemporary quantitative political scientist is increasingly becoming a computer programmer (cf. Bowers and Voors, 2016). This is happening concurrently with innovations in programming languages for statistical analysis, especially the R programming language that {peacesciencer} uses. There have been significant advances in add-on packages that allow users to do things like get World Bank data from the internet (Arel-Bundock, 2021a) and even format results from a statistical model for presentation in a way that reduces the probability of transcription errors to almost zero (Arel-Bundock, 2021b). {peacesciencer} embraces this. This R package reduces the time required to create peace science data for analysis and also informs the user about the code required to create the kind of data the user wants.
Finally, the creation and presentation of data in peace science should be 100% robust and transparent, which {peacesciencer} takes seriously in the following ways. The website for {peacesciencer} has several vignettes that describe its processes in some detail. These include how it provides reasonable estimates of democracy that may not be available in the Polity data or the Varieties of Democracy data and how a researcher can whittle dyadic dispute-years into true dyad-years through reasonable case exclusions. {peacesciencer} subjects itself to a battery of tests before publishing updates, making sure that new features do not create duplicate entries in the original data (which is the surest sign of a botched merge). The project's GitHub contains a publicly available data-raw directory that shows how every dataset included (and processed) in {peacesciencer} was created. The function manuals included in {peacesciencer} contain ample documentation that clarifies what each function is doing, what it returns to the user and why it is doing it this way. Researchers can also use the project's GitHub to point out bugs, ask for further clarification and propose additions. {peacesciencer} takes seriously the Data Access and Research Transparency (DA-RT) initiative across all political science and endeavors for maximum transparency, leveraging open source and version control software to inform users of what data it uses and how it uses the data.
What is included in {peacesciencer}
{peacesciencer} comes with a fully developed suite of built-in functions for generating some of the most widespread forms of peace science data and populating the data with important variables that recur in many quantitative analyses. The core functionality of {peacesciencer} reduces to two broad categories of functions. These categories are functions that create the base data of interest to a researcher and functions, called after the base data are created, that add variables of interest to the data frame or subset the base data to a handful of rows that the researcher deems appropriate for analysis.
Tables 1 and 2 list these core functions as of version 1.0, the version of this package slated for release alongside this paper.
D, dyad-year; L, leader-year; LD, leader-dyad-year; S, state-year data; C, specialty functions applicable to just the dyadic conflict data; CoW, Correlates of War; G-W, Gleditsch–Ward; MID, Militarized Interstate Dispute; CREG, Composition of Religious and Ethnic Groups.
Table 1 lists the functions that create base data frames for a researcher starting an original project. They serve as functions that communicate the units of analysis supported in this package and that the package is capable of generating for an interested user. For example, create_stateyears() will generate the full universe of state-years from the Correlates of War (Correlates of War, 2011: v. 2016) or Gleditsch–Ward (Gleditsch and Ward, 1999: v. 2017) system, encompassing all state-years to the most recently concluded calendar year, depending on the arguments the user supplies to the function. create_leaderyears() will generate the full universe of leader-years from the Archigos leader data (Goemans et al., 2009: v. 4.1), optionally standardizing leader-years to the Gleditsch–Ward or Correlates of War state system data. As of version 1.0, {peacesciencer} is capable of creating full dyad-year data, leader-day data, leader-dyad-year data, leader-year data, state-day data and state-year data.
A vignette on the package's website shows how users can create other forms of data from these functions as well (e.g. dyadic-dispute-year, leader-months, state-quarters).
Table 2 lists the main functions in {peacesciencer} that add information or subset the number of rows of the data to just those of interest to the user, describing these functions and listing whether they are applicable to dyad-year (D), leader-year (L), leader-dyad-year (LD), state-year (S) data or specialty functions applicable to just the dyadic conflict data (C).
All of these functions use raw or pre-processed data included in the package. For example, add_gml_mids() uses a dyadic dispute-year version of the Militarized Interstate Dispute (MID) data offered by Gibler et al. (2016: v. 2.2.1) and merges in information about whether there was an ongoing MID or a MID onset in a dyad-year, leader-year, leader-dyad-year, or state-year. {peacesciencer} also has some data innovations included in these functions. For example, add_capital_distance() calculates the distance between state capitals in kilometers using the Vincenty method (i.e. “as the crow flies”) based on an original dataset of state capitals that accounts for instances when capitals moved (e.g. Brazil in 1960, Burundi in 2018). add_democracy() does more than just add Polity data to a dataset. The data underpinning the function feature an innovation in democracy data, providing reasonable estimates of democracy using the Marquez (2016) method of extending the Unified Democracy Scores data (Pemstein et al., 2010) in addition to Polity estimates (Marshall et al., 2017: v. 2017) and V-Dem estimates (Coppedge et al., 2020: v. 10).
The coverage of {peacesciencer} focuses mostly on data that are released as standalone datasets for download, especially those in the Correlates of War or Gleditsch–Ward ecosystem of data. Data that can be obtained from a stable application programming interface (API), like that of the World Bank, can be obtained through those other means (e.g. Arel-Bundock, 2021a). Its coverage will assuredly expand with new additions of interest to the peace science community, although the package already offers a lot to meet researcher needs.
How to install {peacesciencer}
{peacesciencer} is a package for the R programming language. This assumes at least some familiarity with the R programming language. Users should have at least version 3.5 of R, which should not be an issue since the most recent version—as of writing—is 4.1.3. {peacesciencer} is designed to be as user-friendly as possible. Those proficient in R, those just learning R and those with no experience in R should all be able to pick up its use fairly quickly.
The functions of {peacesciencer} work out of the box, although users should find their experience augmented by two additional downloads. First, RStudio offers an IDE that serves as a user-friendly graphical user interface over what is, at its core, a programming language with a command-line interface. RStudio's design will make it much easier for users to experiment with the functionality of {peacesciencer} and read its documentation to assist them with the use of these functions. The second additional download is {tidyverse}, itself a suite of packages that share a common form and design (Wickham et al., 2019). {peacesciencer} functions make considerable use of the component packages of {tidyverse}. {peacesciencer} can work without it, but installing and loading {tidyverse} will allow the researcher to make quicker use of the functionality of {peacesciencer}. A user can open an R session by way of RStudio and install both packages as follows.
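A minimal sketch of the installation step, assuming both packages are fetched from CRAN. The repos argument is an addition made here so that the call also runs in a non-interactive session; inside RStudio it can usually be dropped.

```r
# One-time installation of both packages from CRAN.
# `repos` is an assumption added so no mirror prompt is needed;
# any CRAN mirror would work.
install.packages(c("tidyverse", "peacesciencer"),
                 repos = "https://cloud.r-project.org")
```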
R packages once installed need to be loaded with every R session (i.e. every time the user opens RStudio). The user can load both with the library() function in R.
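Loading the two packages with library(), as described above, might look like this at the top of a script or in the RStudio console.

```r
library(tidyverse)      # pipe operator and data-wrangling verbs
library(peacesciencer)  # create_*() and add_*() functions
```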
Thereafter, a researcher can start using {peacesciencer} to create the kind of data they need.
A tutorial on how to use {peacesciencer}
I encourage users who are using {peacesciencer} for the first time, especially those who are learning R for the first time because of their interest in this package, to approach {peacesciencer} with an idea of the kind of data they want to create for the sake of a project. The core functionality of {peacesciencer} begins with the creation of a data frame, which can then be populated with various indicators of interest. To start, assume that a new user without much familiarity with the R programming language installed RStudio, {tidyverse}, and {peacesciencer} with the idea of using {peacesciencer} to help them start a new research project that seeks to explain civil conflict onset across state-years. Toward that end, they have identified their unit of analysis to be state-year, which can be created with the create_stateyears() function. To get started, I encourage entering the following command in the console in RStudio.
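The help lookup can be sketched with R's standard ? operator, assuming {peacesciencer} is installed.

```r
library(peacesciencer)

# Open the documentation page for the state-year "create" function
?create_stateyears
```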
This will open a documentation file for this function, which is itself quite verbose and informative for the user about what the function is doing. In this case, the documentation file will show that there are three arguments in this function, each with built-in defaults. This function will allow the user to choose a state system for which they want state-years (system, which accepts either “cow” or “gw” for Correlates of War (CoW) state system data or Gleditsch–Ward (G-W) state system data and defaults to “cow”), whether they want to extend the state system to the most recently concluded calendar year (mry, which defaults to TRUE), and whether they may want to additionally subset the years to a more narrow temporal domain that they may already have in mind (subset_years, which defaults to no subset of the data, returning all possible state-years). If the user simply ran the function with no overrides, the function would return all Correlates of War state-years from 1816 to the most recent year (2021, as of writing).
A user who approaches {peacesciencer} with a project in mind will see that they can better tailor this function to what they want. Their interest in a state-year analysis of civil conflict will probably draw them toward the Gleditsch–Ward state system data, since that is the state system that serves as the basis of the Uppsala Conflict Data Program (UCDP) armed conflict data. They will also know they have no use for pre-1946 observations since civil conflict data typically have coverage only from 1946 forward. Thus, the user can supply some additional arguments to tailor the creation of data to just what they want (here: all Gleditsch–Ward state-years from 1946 to 2019).
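A sketch of that tailored call, using the system and subset_years arguments described above. The object name gw_sy is a hypothetical label chosen for illustration.

```r
library(peacesciencer)

# All Gleditsch-Ward state-years from 1946 to 2019
gw_sy <- create_stateyears(system = "gw", subset_years = c(1946:2019))

# Inspect the first few rows
head(gw_sy)
```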
Another reader may be interested in using {peacesciencer} for a research project using a leader-year unit of analysis. These data can be created with create_leaderyears(). Users can read more about what this function is doing by consulting the documentation file in R with the following command.
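As with the state-year function, the documentation lookup can be sketched as follows.

```r
library(peacesciencer)

# Open the documentation page for the leader-year "create" function
?create_leaderyears
```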
This function has three arguments: one denoting the leader system that informs the creation of the leader-year data (i.e. Archigos) and two others, standardize and subset_years. The standardize argument, at its default of “none”, returns all leader-years as presented in the raw Archigos data. Archigos’ leader data are nominally denominated in the Gleditsch–Ward state system data, if not necessarily Gleditsch–Ward state system dates (e.g. Archigos has leader entries prior to state system entry in a few cases). Thus, the user can standardize leader-year data to Gleditsch–Ward state system dates or CoW state system dates. Finally, the user can subset the leader-year data to a more narrow temporal domain with the subset_years argument. If a user is interested in creating a dataset of leader-years for an analysis of interstate conflict initiation with the GML MID data, they can create the data that interest them with the following function call. Notice how create_leaderyears() also returns information about the leader, like the leader's approximate age that year, their gender, and information about their tenure.
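One way this call might look is sketched here. The CoW standardization and the 1875 to 2010 window are assumptions borrowed from the leader-year example later in this feature, and ly is a hypothetical object name.

```r
library(peacesciencer)

# Leader-years standardized to CoW state system dates, 1875-2010
ly <- create_leaderyears(standardize = "cow", subset_years = c(1875:2010))

# Inspect the leader attributes returned alongside the leader-years
head(ly)
```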
Researchers using {peacesciencer} to create data for their research project should start with one of these “create” functions, which will create the full universe of cases of interest to a researcher.
Creating dyad-year data and adding to dyad-year data in {peacesciencer}
After creating the base data of interest to their project, researchers can begin to add the information they want with the suite of functions outlined in Table 2. For example, a researcher interested in a dyad-year analysis of interstate disputes can create a non-directed dyad-year dataset from 1816 to 2010 in {peacesciencer} with the create_dyadyears() function, one of the aforementioned “create” functions. The following would create the base data of interest to the user (i.e. all non-directed dyad-years from 1816 to 2010).
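A sketch of the base call described above, with the directed and subset_years arguments as given in the text. The object name ndy is a hypothetical label.

```r
library(peacesciencer)

# All non-directed CoW dyad-years from 1816 to 2010
ndy <- create_dyadyears(directed = FALSE, subset_years = c(1816:2010))

head(ndy)
```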
Adding information to this data frame is a simple matter of joining a series of functions together in a “pipe”. The “pipe”—represented as %>% in the code below—is an operator built into {tidyverse} that allows users to pass forward expressions or functions. These pipes are common in the programming world and, as the code below will show, have the benefit of changing code in a way that is more intuitive and easier to both read and write for the user. While these functions can be modified to work without {tidyverse} installed and loaded into the session, the user will find their experience with {peacesciencer} is only improved by this important package.
For example, assume that the researcher wants only politically relevant, non-directed dyad-years, where political relevance is traditionally understood as a dyadic relationship involving some form of a contiguity relationship or a major power (Lemke and Reed, 2001). create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) created the full universe of non-directed dyad-years from 1816 to 2010, although this full universe includes “irrelevant” dyads like Nigeria–Mongolia and Estonia–Rwanda. Reducing the data to just politically relevant dyads is simple in {peacesciencer} and {tidyverse}. Users first create the data they want (here: create_dyadyears(directed = FALSE, subset_years = c(1816:2010))), follow it with the pipe operator (%>%), and then add another function from {peacesciencer} (here: filter_prd(), which also quietly executes add_contiguity() and add_cow_majors()).
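The two-step pipe described above can be sketched as follows, assuming {tidyverse} is loaded to supply %>%. The object name prd_dy is hypothetical.

```r
library(tidyverse)
library(peacesciencer)

# Create non-directed dyad-years, then keep only politically relevant dyads
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  filter_prd() -> prd_dy

head(prd_dy)
```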
Here, the user has created all non-directed dyad-years from 1816 to 2010 and then subset the data to just those with a major power or with some kind of contiguity relationship.
Users will find that the ease of the “pipe” will allow them greater agency in creating the full dataset that they may want for an analysis. Indeed, the pipe has the effect of forming something analogous to a drop-down menu, in which the user can “select” additional data/commands that they may want simply by specifying the function in {peacesciencer} that does what they want. For example, a researcher can follow filter_prd() with another pipe and communicate they want information about ongoing conflicts and conflict onsets from the Gibler-Miller-Little (GML) dispute dataset (Gibler et al., 2016). Following filter_prd() with add_gml_mids(keep = NULL) will add information about ongoing conflicts and onsets in a given dyad-year.
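Extending the pipe by one step, as described above, might look like this. The object name prd_mids is hypothetical.

```r
library(tidyverse)
library(peacesciencer)

# Politically relevant dyad-years plus GML MID ongoing/onset indicators
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  filter_prd() %>%
  add_gml_mids(keep = NULL) -> prd_mids

head(prd_mids)
```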
Users can also calculate peace-years for these conflicts with add_spells() using the pipe to pass forward the dataset and applying the add_spells() function to it.
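A sketch of the same pipe with add_spells() appended to calculate the peace-years. The object name prd_spells is hypothetical.

```r
library(tidyverse)
library(peacesciencer)

# Add peace-years between GML MIDs after the conflict data are merged in
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  filter_prd() %>%
  add_gml_mids(keep = NULL) %>%
  add_spells() -> prd_spells

head(prd_spells)
```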
Researchers should see that the functionality of {peacesciencer} can scale up nicely from there. For example, the following would round out the kind of information necessary to replicate Bremer's (1992) famous “dangerous dyads” analysis by adding information about national material capabilities (e.g. the composite index of national capabilities [CINC]) for both states in the dyad (add_nmc()), estimates of democracy for both states in the dyad (add_democracy()), information about alliance commitments in the dyad-year (add_cow_alliance(), by way of Gibler (2009)) and finishing with information about estimated population size and (surplus, gross) domestic product based on simulations reported by Anders et al. (2020) (add_sdp_gdp()). Although add_sdp_gdp() is the last command in the pipe-based workflow, the {peacesciencer} call ends by assigning to an object called Data. This type of assignment is done with the “right hand” assignment operator (i.e. ->).
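Putting the steps named above into one pipe gives a sketch like the following. The order of the add_*() calls follows the description in the text.

```r
library(tidyverse)
library(peacesciencer)

# Base data for a "dangerous dyads" style analysis, assigned with `->`
create_dyadyears(directed = FALSE, subset_years = c(1816:2010)) %>%
  filter_prd() %>%
  add_gml_mids(keep = NULL) %>%
  add_spells() %>%
  add_nmc() %>%
  add_democracy() %>%
  add_cow_alliance() %>%
  add_sdp_gdp() -> Data
```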
If the user wants to move these data into Stata for analysis, they can save it to their current working directory with a command like haven::write_dta(Data, "my-data.dta") and import it into Stata when they are done. No matter, {peacesciencer} has pre-processed, cleaned, recoded, and merged the desired data, which greatly reduces the time and energy that a researcher might otherwise spend doing something like hard-coding −9s to be NA in the capabilities data. In this particular application, it has already created the main data required for a replication of Bremer (1992). There is only some slight data work to create the desired indicators for a statistical model of conflict onset, like a dummy variable for land contiguity, the presence of a major power in the dyad and some “weak-link” indicators of militarization, relative power in the dyad, level of democracy in the dyad (using the Marquez (2016) method for extending the Unified Democracy Scores [UDS] data) and the GDP per capita in the dyad. Table 3 is a formatted version of the results of a logistic regression model of conflict onset using these “dangerous dyads” indicators and temporal adjustment variables (t). Users typically do not end their analysis here, often looking for new predictors of conflict onset with these covariates in mind, but {peacesciencer} greatly reduces the time and energy that researchers must invest in cleaning and processing data for analysis.
Creating state-year data and adding to state-year data in {peacesciencer}
{peacesciencer} is capable of generating data for replications of analyses at multiple levels beyond dyad-year. Suppose a researcher wants to create a state-year data frame to conduct an analysis of civil conflict onset analogous to Fearon and Laitin’s (2003) well-cited analysis of civil conflict onset, but using UCDP conflict data and the Gleditsch–Ward state system for creating the appropriate universe of state-years. The pipe-based workflow will start with create_stateyears(system = 'gw', subset_years = c(1946:2019)), creating the full universe of Gleditsch–Ward state years and subsetting them to just 1946–2019 (because the UCDP data included in {peacesciencer} include just the observations in that time frame). Next, we can use the add_ucdp_acd() function to return information about ongoing UCDP conflicts and onsets for these states. add_ucdp_acd() takes three arguments: type, issue and only_wars. type is an optional argument for the type of armed conflicts for which the researcher wants information. Options include “extrasystemic”, “interstate”, “intrastate” and “II” (short for “internationalized intrastate”). If no type is specified, the function returns information about ongoing disputes and onsets for all states for all types of conflict. If the user wants information about multiple types of conflict—say, intrastate wars and internationalized intrastate wars—they can specify that as a character vector (e.g. type = c("intrastate", "II")). issue is another optional argument for what issue types of conflicts the user wants. Options include “territory”, “government” and “both”. If no issue is specified, the function returns information for all conflicts regardless of the particular issue. only_wars is an argument that subsets the data to just those with the intensity levels of “war” when only_wars = TRUE. The argument defaults to FALSE, returning information about conflicts with at least 25 deaths in addition to the conflicts with more than 1000 deaths. 
In this application, add_ucdp_acd(type = "intrastate", only_wars = FALSE) returns state-year information about ongoing intrastate conflicts over any issue and at either of UCDP's severity thresholds.
Finally, we can add some covariates of interest to these data. add_spells() calculates peace spells between ongoing conflicts in the data generated by add_ucdp_acd(). add_democracy() adds information about the level of democracy in the year using three prominent datasets on democracy (Polity, V-Dem and Marquez’s (2016) extension of Pemstein et al.'s (2010) UDS data). add_creg_fractionalization() adds information about the fractionalization and polarization of a state's ethnic and religious groups from the Composition of Religious and Ethnic Groups Project at the University of Illinois. add_sdp_gdp() will add information about a state's estimated gross domestic product (GDP), population and GDP per capita from the Anders et al. (2020) simulations. Finally, add_rugged_terrain() provides two estimates of the ruggedness of a state's terrain. The first is the terrain ruggedness index calculated by Nunn and Puga (2012) and the second is the Gibler and Miller (2014) extension of the natural logged percentage of the state that is mountainous (originally calculated by Fearon and Laitin, 2003). At the end of the pipe, the data returned by {peacesciencer} are assigned to an object simply called Data.
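The state-year workflow described across the last few paragraphs can be sketched as one pipe. The function order follows the text.

```r
library(tidyverse)
library(peacesciencer)

# G-W state-years, 1946-2019, with UCDP intrastate conflicts and covariates
create_stateyears(system = "gw", subset_years = c(1946:2019)) %>%
  add_ucdp_acd(type = "intrastate", only_wars = FALSE) %>%
  add_spells() %>%
  add_democracy() %>%
  add_creg_fractionalization() %>%
  add_sdp_gdp() %>%
  add_rugged_terrain() -> Data
```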
The tight integration of {peacesciencer} with the {tidyverse} permits wide flexibility for the researcher. For example, assume that the researcher wants to discern the estimated effect of the same set of covariates on intrastate conflicts at the threshold of war and those intrastate conflicts at or below the threshold of war. The first call included all conflicts with at least 25 deaths, per the UCDP's inclusion rules, and the peace years were calculated for those as well. If the researcher wants a new set of conflicts with a new set of peace years, it will be a simple matter of repeating the pipe-based workflow, but altering the argument in add_ucdp_acd() to be only_wars = TRUE. {peacesciencer} would then calculate the peace years for those (add_spells()). To avoid confusion with the overlapping column names, the researcher can use some {tidyverse} verbs to rename all of those conflict variables to have a distinct prefix of war_ (i.e. rename_at(vars(ucdpongoing:ucdpspell), ~paste0("war_", .))) before finally joining these data into the master data frame (i.e. left_join(Data, .) -> Data). Table 4 shows the fruits of the data {peacesciencer} generated after some post-processing and lagging important variables.
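A sketch of that second pass, using the rename_at() and left_join() calls quoted above. To keep the example short and self-contained, Data is first rebuilt here with only the conflict and spell columns, which is a simplification of the full workflow.

```r
library(tidyverse)
library(peacesciencer)

# Shortened stand-in for the master data frame (all conflicts, 25+ deaths)
create_stateyears(system = "gw", subset_years = c(1946:2019)) %>%
  add_ucdp_acd(type = "intrastate", only_wars = FALSE) %>%
  add_spells() -> Data

# Repeat for wars only, prefix the conflict columns, and join back to Data
create_stateyears(system = "gw", subset_years = c(1946:2019)) %>%
  add_ucdp_acd(type = "intrastate", only_wars = TRUE) %>%
  add_spells() %>%
  rename_at(vars(ucdpongoing:ucdpspell), ~paste0("war_", .)) %>%
  left_join(Data, .) -> Data
```

The join relies on the columns the two data frames still share after renaming (the state and year identifiers), so the row count of Data should be unchanged.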
Creating leader-year data and adding to leader-year data in {peacesciencer}
{peacesciencer} also has support for newer levels of analysis in the peace science community, most prominently the leader level of analysis. There has been considerable emphasis in peace science research that state leaders, not “states”, make foreign policy decisions that may lead to war. Understanding the attributes of leaders themselves is critical to the core research questions of the community (Goemans et al., 2009; Horowitz and Stam, 2014; Ellis et al., 2015) and {peacesciencer} wants to help researchers toward that end.
Table 2 shows that there are some dedicated functions for populating leader-level data with leader-specific information, in addition to adding state-year-level information (e.g. democracy, capabilities) to leader-level data. Table 1 shows support for creating leader-level data that are standardized to either the CoW or G-W state system data. Suppose a researcher wanted to create leader-year data, standardized to CoW system dates, for an analysis of interstate dispute initiation analogous to what Horowitz and Stam (2014) do. The user would first start by creating the base data (create_leaderyears(standardize = "cow", subset_years=c(1875:2010))), which also generates some leader attributes (e.g. estimated leader age, year in office and leader gender). From there, they would add information about leader conflict behavior in the year with add_gml_mids() and calculate peace-years with the add_spells() function.
add_lead() will add a battery of leader experience and attribute variables from data created by Ellis et al. (2015), including whether the leader had military experience or combat experience, was a rebel fighter, and more. Finally, add_nmc() and add_democracy() will add state-year estimates of national capabilities and democracy for making state-to-state comparisons, even for leader-level analyses. Some light recoding of the data created in {peacesciencer} and some regression modeling yield the results presented in Table 5, itself an approximation of the kind of leader-year analysis exemplified in Horowitz and Stam (2014).
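Chained together, the leader-year steps described above could be sketched as follows. The function calls and arguments are those named in the text; the object name LY is an assumption for illustration, and the recoding and regression modeling behind Table 5 are not shown.

```r
library(peacesciencer)
library(dplyr)  # loaded here for the pipe operator

# Leader-years standardized to CoW system dates, 1875-2010.
LY <- create_leaderyears(standardize = "cow", subset_years = c(1875:2010)) %>%
  add_gml_mids() %>%   # leader conflict behavior in the year
  add_spells() %>%     # peace years between conflicts
  add_lead() %>%       # leader experience and attributes (Ellis et al., 2015)
  add_nmc() %>%        # state-year national capabilities
  add_democracy()      # state-year democracy estimates
```

Each function takes the data created so far and returns it with new columns, which is why the order of the add_*() calls after add_spells() is largely a matter of taste.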
A comparison with other approaches
{peacesciencer} is not the only software available to peace science researchers who want to reduce the time and energy required to faithfully recreate data from scratch. Alternatives exist, some less accessible than others. NewGene, for example, is a stand-alone software program for Microsoft Windows and Mac that can create various types of data of interest to international relations scholars (Bennett et al., 2019). NewGene is itself the evolution of EUGene, which served conflict researchers well for over a decade (Bennett and Stam, 2000). Finally, peace scientists well versed in Structured Query Language (SQL) could use one of several “join” transformations to create dyad-year data from state system data, even if this might amount to a detour in the peace scientist's research agenda for this particular task.
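To give a sense of what the join-based route involves, the following is a small base R analogue of an SQL self-join on year. It is only an illustration under stated assumptions: the three state codes are CoW codes, but the system dates are invented toy values, and none of the object names come from any package.

```r
# Toy state system table. The start and end years are hypothetical.
states <- data.frame(
  ccode   = c(2, 20, 200),
  styear  = c(1990, 1990, 1991),
  endyear = c(1992, 1991, 1992)
)

# Expand each state into one row per year it is in the system.
state_years <- do.call(rbind, lapply(seq_len(nrow(states)), function(i) {
  data.frame(ccode = states$ccode[i],
             year  = states$styear[i]:states$endyear[i])
}))

# Self-join on year, roughly: SELECT ... FROM sy a JOIN sy b ON a.year = b.year
dyad_years <- merge(state_years, state_years, by = "year",
                    suffixes = c("1", "2"))

# Drop self-pairs and tidy up, leaving directed dyad-years.
dyad_years <- dyad_years[dyad_years$ccode1 != dyad_years$ccode2,
                         c("ccode1", "ccode2", "year")]
dyad_years <- dyad_years[order(dyad_years$ccode1, dyad_years$ccode2,
                               dyad_years$year), ]
rownames(dyad_years) <- NULL
```

Even in this toy case the researcher has to handle system entry and exit dates, self-pairs, and directed versus non-directed dyads by hand, which is the kind of detour the text has in mind.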
A comparison of {peacesciencer} with EUGene and NewGene suggests the following benefits of the package while emphasizing areas where other options may have some advantages.
EUGene is the clear inspiration for this package. Although its original impetus was the generation of expected utility data for evaluating Bueno de Mesquita and Lalman (1992) (i.e. the “EU” in “EUGene”), the software became quite popular for peace science scholars in the early 2000s for helping them start new projects from scratch with important data already provided. {peacesciencer} covers all of the same units of analysis that EUGene covers, as of version 3.2.
EUGene has more explicit support for dyadic dispute-year data whereas {peacesciencer} treats dyadic dispute-year data as a derivation of dyad-year data, with respect to functions that populate base data with additional information.
{peacesciencer} may reflect more current research interests in the peace science community. This is why there are leader-level data and support for the G-W (and UCDP) ecosystem of data that EUGene does not have, but there is no function yet for things like calculating expected utility values. Importantly, though, {peacesciencer} mimics the verbosity of EUGene's user manual. EUGene's user manual was amply informative about what it was calculating and why it was doing what it was doing. Likewise, the documentation of {peacesciencer} strives to be as informative as possible as to what it is doing and why it is doing what it is doing. A scholar who remembers EUGene well will ideally think of {peacesciencer} as the most faithful approximation of what that software did for the community at the time. It has the added benefit of being agnostic to the researcher's operating system, having greater flexibility of data types supported, and better reflecting more current frontiers in the community. It does have the drawback of requiring at least some level of comfort with the R programming language.
NewGene is the latest evolution of EUGene, at least as a stand-alone executable program for creating the kind of data of interest to the peace science community. NewGene's greatest strength, relative to EUGene and even {peacesciencer}, is its support for k-adic data (cf. Poast, 2010). k-adic levels of analysis are not yet supported in {peacesciencer}, and users interested in generating k-adic data should consider downloading NewGene. Even so, {peacesciencer} has several advantages over NewGene. It supports leader-year and leader-dyad-year data while NewGene does not. It offers support for the G-W (and UCDP) ecosystem of data, whereas NewGene does not and is mostly aimed at researchers interested in interstate conflict. NewGene deviates a little from EUGene by only indirectly asking users what kind of data they want (e.g. state-year, dyad-year) and by not providing much detail about what it is doing and why it is doing what it is doing. For example, NewGene only indirectly states the unit of analysis of the data to be generated near the top of its interface by asking the user how many country columns they want. This is an indirect way for the user to get data that are state-year, dyad-year, triad-year (etc.), which are then expanded and populated with data at various levels. {peacesciencer}, much like EUGene, is more explicit, encouraging the user to be upfront about what their unit of analysis is and what data can plausibly be merged into the data the user is creating. {peacesciencer}, again, does expect at least some level of comfort with the R programming language, but this comes with greater ease in interpreting the unit of analysis and the primary spatial and temporal units that serve as the basis of the data.
Conclusion
{peacesciencer} is already more than capable of creating the kind of data in high demand in peace science. It can create dyad-year, leader-year, leader-dyad-year and state-year data (among others). It is also generalizable to the dispute data included in the package, allowing for merging into dispute-year data as well. This feature showed how it can effectively approximate three types of analyses in wide use in the peace science community. Surely researchers can and will add more information to these simple analyses after using {peacesciencer}, but the package already does a lot of the tedious work for researchers. It also does this in a maximally transparent way that conforms well to the DA-RT initiative across all political science. This is not to say that {peacesciencer} does everything, but {peacesciencer} can only evolve and expand on what it already does well. Users are free to request new features as “issues” on the project's Github.
Finally, a skeptical reader should not think that making the process as simple as possible necessarily facilitates poor decision-making by the user. In cases where it is evident what the user wants (e.g. an estimate of the level of democracy in the state-year), {peacesciencer} does the necessary work to provide the user that information. However, the package makes sure that it leaves important decision-making to the researcher. For example, add_cow_alliance() returns information about various types of alliance pledges in the dyad-year—should they exist—but leaves it to the researcher to say whether they want to define the presence of an alliance to be just a defense pledge or any type of alliance pledge. add_contiguity() returns information about the type of contiguity relationship in the dyad-year, but leaves it to the researcher whether they want to code a contiguity variable as the presence of a mutual land border or some other type of contiguity relationship. The documentation included in this package, and on the website, is replete with caveats about the underlying data (e.g. the contiguity data are not ordinal and should not be treated as such), how and where data issues arise (e.g. how CoW state system data differ from Gleditsch–Ward data and how one is coerced into the other), and how researchers should consider optimally using its functionality (e.g. add_ucdp_acd() probably should not lump all forms of conflict together; cf. Gibler and Miller, Forthcoming). {peacesciencer} does not endeavor to make researchers lazy or sloppy, and it does not ultimately do this. Instead, {peacesciencer} encourages well-reasoned design decisions by the user up front and reduces the tedium associated with starting quantitative peace science research. It achieves this in a quick, robust and transparent way.
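The alliance and contiguity examples above can be made concrete with a short sketch. It assumes the column names that add_cow_alliance() and add_contiguity() are understood to return (cow_defense, cow_neutral, cow_nonagg, cow_entente and conttype, with conttype equal to 1 for a land border and 0 for no contiguity); readers should confirm these against the function manuals. The 1990-2000 window and the new variable names are chosen only for illustration.

```r
library(peacesciencer)
library(dplyr)

# Assumption: a short window of directed CoW dyad-years keeps the example small.
DY <- create_dyadyears(subset_years = c(1990:2000)) %>%
  add_cow_alliance() %>%
  add_contiguity() %>%
  mutate(
    # Researcher's choice 1: an alliance means a defense pledge only.
    defense_only = ifelse(cow_defense == 1, 1, 0),
    # Researcher's choice 2: an alliance means any type of pledge.
    any_alliance = ifelse(cow_defense == 1 | cow_neutral == 1 |
                            cow_nonagg == 1 | cow_entente == 1, 1, 0),
    # Contiguity is categorical, not ordinal, so it is recoded explicitly.
    land_contig = ifelse(conttype == 1, 1, 0),
    any_contig  = ifelse(conttype > 0, 1, 0)
  )
```

The package supplies the raw categories, and the mutate() call is where the researcher records the coding decision in plain view.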
Supplemental Material
Supplemental material, sj-pdf-1-cmp-10.1177_07388942221077926, for {peacesciencer}: An R package for quantitative peace science research by Steven V Miller in Conflict Management and Peace Science
Supplemental material, sj-pdf-2-cmp-10.1177_07388942221077926, for {peacesciencer}: An R package for quantitative peace science research by Steven V Miller in Conflict Management and Peace Science