Abstract
The Ill-Treatment and Torture (ITT) Data Collection Project uses content analysis to measure a number of variables on more than 15,000 public allegations of government ill-treatment and torture made by Amnesty International (AI) from 1995 to 2005. The ITT specific allegation (SA) event data use the torture allegation as the unit of observation, thus permitting users to manipulate them for a wide variety of purposes. In this article, we introduce the ITT SA data. We first describe the key variables in the SA data and report a number of bivariate descriptive statistics to illustrate some of the research questions that might be usefully investigated with the data. We then discuss how we believe the ITT SA data can be used to study not only AI’s naming and shaming behavior, but also states’ (lack of) compliance with the United Nations Convention Against Torture (CAT). We conclude with an empirical analysis using the SA data that investigates the effect of domestic political institutions on formal complaints, investigations, and adjudication of torture allegations. We show that legislative checks are positively associated with complaints, investigations, and trials; elections and freedom of speech are positively associated with investigations and trials; and powerful judiciaries are associated only with investigations.
Introduction
Amnesty International (AI), an international nongovernmental organization (INGO) working for the protection of human rights, documented allegations of torture and ill-treatment against 98 countries in 2011 (Amnesty International, 2011). Although this figure conveys how widespread government torture is, it masks a remarkable degree of heterogeneity in state torture practices. In January 2012, for example, AI criticized a Migrant Accommodation Center in Western Ukraine for ill-treating a group of Somalis and Eritreans detained at the center and forcing them into isolation (Amnesty International, 2012c). In the same month, security and military forces in Libya were accused of leaving ‘visible marks’, including open wounds on the head and limbs, on Gaddafi loyalists (Amnesty International, 2012a). The scars left by the Libyan military stand in stark contrast to AI allegations of torture against the United States, which focus on waterboarding, sleep deprivation, the use of loud music, and exposure to cold temperatures (Amnesty International, 2012b).
As these vignettes suggest, torture is perpetrated by many government agencies against a variety of victims. Torture techniques also vary widely, and government responses to allegations of torture are similarly heterogeneous across states. Although human rights advocates and scholars acknowledge these distinctions, existing large-N, cross-national data on torture (e.g. Hathaway, 2002; Cingranelli & Richards, 2010b) focus almost exclusively on its incidence at the country-year level of observation: the level of torture reported to have occurred in a given country during a given year. While those data have greatly increased our understanding of the covariates of torture, they cannot support inferences about agents and victims, methods of abuse, or state responses to torture allegations.
In this article, we introduce new cross-national event data on Amnesty International allegations of government torture from 1995 to 2005. The Ill-Treatment and Torture (ITT) Data Collection Project specific allegation (SA) data differ from previous data on state torture in at least two important ways. First, by coding allegations of torture as events, ITT generates information on many characteristics of allegations that are not currently available for large-N, cross-national analysis. In addition to torture incidence, the SA data include information on:

- the duration of a particular allegation;
- the location of the event;
- the type of torture;
- the government agency alleged to have committed a given abuse;
- the economic, social, or political group to which the alleged victim belongs;
- the government response to the torture allegation.

Second, the ITT project explicitly recognizes that actual state torture and ill-treatment are unobservable (e.g. Spirer, 1990; Clark, 2001: 57; Cingranelli & Richards, 2001: 230–231; Goodman & Jinks, 2003). It focuses instead on what can be measured reliably and validly (e.g. Bollen, 1986): AI allegations of such violations. This is a significant conceptual departure from previous data collection efforts, which have implicitly treated the allegations contained in reports by INGOs and governments as representative of states’ (lack of) respect for human rights.
Overview of the ITT specific allegation (SA) data
To generate data on AI allegations of torture and ill-treatment, the ITT Data Collection Project performed content analysis of all Amnesty International documents published between 1995 and 2005. An allegation of state torture or ill-treatment occurs when AI alleges three things: that the perpetrator is an agent of the state, that the victim(s) were detained under the state’s control, and that the alleged abuse meets the definition of torture and/or ill-treatment in the United Nations Convention Against Torture (CAT).
Unlike existing data on human rights violations, the ITT Data Collection Project measures allegations of torture at two units of observation: specific allegations (SA) and country-year (CY) allegations. The two units differ in the breadth of their spatial-temporal domain. The CY data report allegations, directed at a particular government agency, of abuse throughout the country over the course of an entire year. In contrast, the SA data include only precise allegations: abuse in a specific place smaller than the country itself, or abuse that occurred during a period of less than a year. The SA data therefore include AI allegations that the state tortured an individual victim, that ill-treatment was prevalent in a single prison, or that torture occurred during the three weeks following an election.
Take, for example, the following allegation against Uruguay in 2005: ‘Amnesty International has been concerned at reports of torture and ill-treatment, including cases involving minors, in prisons, detention centres, and police stations’ (AMR 52/002/2005). Because this allegation is about country-wide torture and ill-treatment, it is coded in the ITT country-year (CY) data, not the SA data presented here.
Contrast that allegation with one targeted at Azerbaijan in 1995: ‘Rafiq Ismayilov, a barber from the village of Digah, was detained in December by police officers from Masalli district on suspicion of theft and taken to the Regional Police Department, where he later died. Unofficial sources, however, allege that he died as a result of injuries sustained when he was beaten by police officers.’ (AI Annual Report 1996). This allegation constitutes one event in the SA data.
Figure 1. Number of AI allegations per year, 1995–2005
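The scope rule that separates the two units of observation can be written as a small decision function. The sketch below is ours and is not the ITT coding procedure. The scope labels, the function name `unit_of_observation`, and the 365-day cutoff are all illustrative assumptions.

```python
from typing import Optional

# Hypothetical spatial scopes, from narrowest to broadest; these labels are
# illustrative and are not ITT variable names or codes.
SCOPES = ("individual", "facility", "locality", "region", "country")

def unit_of_observation(scope: str, duration_days: Optional[int] = None) -> str:
    """Classify an allegation as 'SA' (specific) or 'CY' (country-year).

    An allegation is SA if it concerns a place smaller than the country or a
    period shorter than a year; otherwise it is CY. duration_days=None means
    AI gave no narrower time frame, which we treat as the whole year.
    """
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    narrower_than_country = scope != "country"
    shorter_than_year = duration_days is not None and duration_days < 365
    return "SA" if narrower_than_country or shorter_than_year else "CY"

if __name__ == "__main__":
    print(unit_of_observation("country"))      # Uruguay 2005, country-wide: CY
    print(unit_of_observation("individual"))   # Rafiq Ismayilov, one victim: SA
    print(unit_of_observation("facility"))     # ill-treatment in a single prison: SA
    print(unit_of_observation("country", 21))  # three weeks after an election: SA
```

Either condition alone is sufficient for SA status. This is why a country-wide allegation confined to a few weeks still falls outside the CY data.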
Some bivariate relationships in the ITT SA data
The ITT SA data include over 15,000 events (allegations of torture and/or ill-treatment). Figure 1 shows that the number of AI allegations per year tended to decline from 1995 to 2005. Even so, AI made over 1,000 allegations of torture and ill-treatment in 2005. Although that represents a decline of more than 50% from 1995, it is still a very large number of alleged violations of the UN Convention.
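Each row of the SA data is one allegation, so annual totals like those in Figure 1 come from counting rows by year. Below is a minimal Python sketch, assuming each allegation has been reduced to its year. The toy list is invented and is not ITT data; the helper names are hypothetical.

```python
from collections import Counter

def annual_counts(event_years):
    """Collapse event-level records (one year value per allegation) into counts per year."""
    return dict(sorted(Counter(event_years).items()))

def percent_change(old, new):
    """Percentage change from old to new; negative values indicate a decline."""
    return 100.0 * (new - old) / old

if __name__ == "__main__":
    # Toy data: one entry per hypothetical allegation, NOT the ITT counts.
    toy_years = [1995, 1995, 1995, 1995, 2000, 2000, 2005, 2005]
    counts = annual_counts(toy_years)
    print(counts)                                      # {1995: 4, 2000: 2, 2005: 2}
    print(percent_change(counts[1995], counts[2005]))  # -50.0
```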
What types of torture?
The ITT SA data distinguish among four ‘torture type’ allegations. Each is a binary variable indicating whether AI alleged a particular type of abuse:

- Scarring Torture is coded when AI alleges torture that leaves marks on the human body (Conrad & Moore, 2011b: 11–12).
- Stealth Torture, or ‘clean’ torture, is coded for allegations of torture that does not leave marks on the body (Rejali, 2007).
- Unstated Torture captures allegations in which AI documents that torture occurred but gives no information about the type of torture.
- Ill-Treatment is coded when AI alleges cruel, inhuman, or degrading treatment or punishment, which the CAT proscribes in addition to torture (Conrad & Moore, 2011b: 11–12).

Between 1995 and 2005, AI made roughly 8,000 allegations of scarring torture, over 3,000 of stealth torture, 6,000 of unstated torture, and over 6,000 of ill-treatment. Because a single allegation can be coded for more than one type, these counts sum to more than the total number of allegations. The distinction between allegations and actual rights violations raises an interesting question. Does AI allege scarring torture more frequently because it is the most commonly used type of torture, or because it is the easiest type to monitor? We do not yet know, but in the following section we discuss potential avenues for distinguishing between these possibilities.
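The four torture types are tallied as independent binary flags, so their totals can exceed the number of allegations. A toy sketch of that tally follows. The field names and the three example allegations are hypothetical and are not ITT codes.

```python
# Hypothetical field names for the four binary torture-type indicators.
TORTURE_TYPES = ("scarring", "stealth", "unstated", "ill_treatment")

def type_counts(allegations):
    """Count allegations flagged (1) for each type; flags are not mutually exclusive."""
    return {t: sum(a.get(t, 0) for a in allegations) for t in TORTURE_TYPES}

if __name__ == "__main__":
    # Three invented allegations; the first is coded for three types at once.
    toy = [
        {"scarring": 1, "stealth": 1, "unstated": 0, "ill_treatment": 1},
        {"scarring": 1, "stealth": 0, "unstated": 0, "ill_treatment": 0},
        {"scarring": 0, "stealth": 0, "unstated": 1, "ill_treatment": 1},
    ]
    counts = type_counts(toy)
    print(counts)  # {'scarring': 2, 'stealth': 1, 'unstated': 1, 'ill_treatment': 2}
    print(sum(counts.values()), "type flags across", len(toy), "allegations")  # 6 type flags across 3 allegations
```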

Figure 2. Number of AI allegations by region and torture type, 1995–2005
In the 11-year period from 1995 to 2005, Amnesty International issued over 2,000 allegations of torture and ill-treatment in each region of the world. We divided the world into six geographic regions:

- Asia;
- Eastern Europe and the former Soviet Republics;
- Latin America;
- the Middle East and North Africa;
- sub-Saharan Africa;
- Western Europe (including Australia, Canada, Japan, New Zealand, and the USA).

Although AI is likely strategic in its decisions to publicize allegations, every region of the world is well represented. Figure 2 displays the distribution of torture types by geographic region. All regions except Latin America and Western Europe were the subject of over 2,000 allegations of torture of any type, and those two regions were each named and shamed by AI more than 1,500 times. With regard to ill-treatment, AI issued between 1,000 and 1,500 allegations against each region except the Middle East and North Africa, which was the subject of more than 500 allegations.
Figure 3 reports the percentage of each type of torture allegation occurring in the presence of several institutional variables: judicial power (Henisz, 2000); legislative checks (Henisz, 2000); freedom of speech (Cingranelli & Richards, 2010a); and competitive elections.
Figure 3. Prevalence of AI allegations by institution and torture type, 1995–2005

Figure 4. Prevalence of AI allegations by institution and agency of control, 1995–2005
Which government agencies?
The SA data also include information on the government agency of control (AoC) alleged to be responsible for a given abuse. AI named police in allegations of torture and ill-treatment considerably more often than any other AoC. AI published roughly 2,800 allegations of prison personnel violating the CAT. By contrast, immigration detention centers, intelligence agencies, and paramilitary groups were each the subject of fewer than 500 allegations.
Figure 4 reports the percentage of alleged responsibility by different AoCs for torture in the presence of the institutional variables described above. It shows the percentage of allegations of torture by police, prison officials, the military, immigration detention, intelligence, or unstated AoCs in states with a powerful judiciary, an elected legislature, free speech protections, or competitive elections. Approximately 50 to 65% of all allegations against police occur in countries with a powerful judiciary. Figure 4 also indicates that there is little correlation between competitive elections and the number of allegations made against prison officials and the military. Free speech protections are associated with relatively fewer allegations against all agencies. Perhaps the boomerang model of NGO influence (Keck & Sikkink, 1998) explains why AI makes relatively more allegations against AoCs in countries with limited or no free speech protections. In such countries, channels between domestic actors and the government are limited. Transnational networks, particularly NGOs such as AI, can then bypass the state and amplify the demands of domestic groups.
The figure also shows that most allegations against immigration detention officials occur in countries where these domestic institutions are present. About 75 to 80% of all immigration detention allegations occur in countries with a powerful judiciary. Around half occur in countries with an elected legislature and free speech protections. There are two plausible explanations for this pattern. First, countries with a powerful judiciary and freedom of speech protections tend to have higher macroeconomic productivity and higher average wages. This makes them more likely both to have large immigrant populations, which generates political demand for immigration restrictions, and to have the state capacity to incarcerate. Second, AI tends to prioritize publishing allegations of graver violations over allegations of less grave ones. Political prisoners may be more likely to be abused in states that lack powerful courts and/or a free press. To that extent, AI may be less likely to report abuse in immigration detention centers in those states.
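The percentages in Figures 3 and 4 are shares of an agency’s (or torture type’s) allegations that occur where an institution is present. Below is a sketch of that calculation for agencies of control. It assumes each allegation record already carries an agency label and a 0/1 institution indicator merged from country-year sources. The field names `agency` and `powerful_judiciary` and the toy records are hypothetical.

```python
from collections import defaultdict

def share_with_institution(allegations, institution):
    """Percentage of each agency's allegations that occur where `institution` == 1.

    The denominator is all allegations against that agency, so the result also
    reflects how common the institution is among the country-years AI writes
    about; it is not the probability of an allegation given the institution.
    """
    totals = defaultdict(int)
    present = defaultdict(int)
    for a in allegations:
        agency = a["agency"]
        totals[agency] += 1
        present[agency] += a[institution]
    return {agency: 100.0 * present[agency] / totals[agency] for agency in totals}

if __name__ == "__main__":
    # Invented records for illustration only.
    toy = ([{"agency": "police", "powerful_judiciary": j} for j in (1, 1, 0, 0)]
           + [{"agency": "immigration", "powerful_judiciary": j} for j in (1, 1, 1, 0)])
    print(share_with_institution(toy, "powerful_judiciary"))  # {'police': 50.0, 'immigration': 75.0}
```

Because the denominator is agency-specific, a high share for immigration detention can arise simply because immigration detention allegations cluster in a few wealthy country-years. This fits the first explanation offered above.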
Statistical models of state practice with ITT
The ITT Project provides data on the population of AI allegations of torture. As such, scholars interested in drawing inferences about the determinants of AI allegations, or naming and shaming, can use the SA data to do just that. But scholars who wish to use the SA data to study the actual rights behavior of states (e.g. the extent to which a state complies with the CAT) must account for the strategic process by which AI generates allegations. Existing quantitative data on allegations of human rights violations are frequently used as measures of violations of international and domestic law. As Bollen (1986), Spirer (1990), Clark (2001: 57), Cingranelli & Richards (2001: 230–231), and Goodman & Jinks (2003) point out, the implicit assumption that such data measure actual human rights practices is rather strong. In what follows, we elaborate on the distinction between allegations and violations. We explain how researchers can use data on allegations to draw inferences about state violations of human rights by controlling for variables they believe affect the likelihood that AI makes an allegation (i.e. learns about a violation and then reports it).
It is standard practice in the quantitative study of human rights abuse to use content-analytic data from AI and/or US State Department annual reports as a measure of states’ performance with respect to their obligations under international treaties. Although allegations in AI documents are credible, there has been little discussion in the literature about whether the resulting data are representative of the actual level of state torture in a given country-year. AI is unlikely to report violations with equal probability across countries, so allegations are not an unbiased undercount of state violations of human rights. States (and their agents) face incentives to hide human rights abuse. INGOs like AI face strategic and budgetary incentives that influence the extent to which they make allegations of human rights violations (Hill, Moore & Mukherjee, 2013). Thus, researchers drawing inferences about human rights violations, rather than about AI allegations or naming and shaming, are typically interested in a latent concept measured using (1) a biased observable indicator produced by (2) a strategic actor.
To draw inferences about actual violations of the CAT, we show that a control variable approach can succeed, with a caveat. Researchers need to include as controls a set of variables that model the process by which AI generates allegations. The caveat is that some variables that influence the likelihood that AI publishes an allegation will also affect the government’s (lack of) respect for the CAT. A variable that the researcher suspects affects both the state’s (lack of) respect and the likelihood that AI learns of and reports an allegation can enter the regression only once. As a result, its coefficient combines the two effects (on AI allegations and on the state’s performance), and that regression offers no way to decompose the estimate into its two parts. More complex statistical models permit one to decompose these effects, but they are not readily available in software packages such as Stata. We illustrate this in the following empirical application and demonstrate why it is so in an Online Appendix. The Appendix also discusses how researchers can directly model both AI’s allegation production and states’ (lack of) respect for the CAT.
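Why a shared variable yields a blended coefficient can be seen in a toy simulation. This is our own illustration, not the model in the Online Appendix. We assume violations are Poisson, that AI reports each violation independently, and that a single binary covariate `x` raises both the violation rate and AI’s reporting probability. Poisson thinning then makes reports Poisson with rate λp. The log rate becomes log λ + log p, so the estimated IRR on `x` is the product of the two effects. All parameter values are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1

def simulate(n_per_group, seed=0):
    """Return the estimated IRR on a binary covariate x that affects both
    the number of violations and AI's probability of reporting each one.

    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    base_rate = 4.0         # expected violations per unit when x = 0
    state_effect = 1.5      # x multiplies expected violations by 1.5
    base_report_prob = 0.3  # chance AI reports a given violation when x = 0
    report_effect = 2.0     # x doubles AI's reporting probability
    mean_reports = {}
    for x in (0, 1):
        rate = base_rate * state_effect ** x
        p_report = base_report_prob * report_effect ** x
        total = 0
        for _ in range(n_per_group):
            violations = poisson(rng, rate)
            # AI reports each violation independently with probability p_report.
            total += sum(1 for _ in range(violations) if rng.random() < p_report)
        mean_reports[x] = total / n_per_group
    # With a single binary regressor, the Poisson maximum-likelihood
    # estimate of exp(beta) is the ratio of the two group means.
    return mean_reports[1] / mean_reports[0]

if __name__ == "__main__":
    print(f"estimated IRR on x: {simulate(20000):.2f}")
    print("state effect 1.5 x reporting effect 2.0 = 3.0")
```

The estimate lands near 3.0, not near the state effect of 1.5. Nothing in the observed reports alone identifies how the 3.0 splits between the state and AI.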
An example: Complaints, investigations, and adjudications
The ITT SA dataset considerably expands the range of research questions about government torture allegations and violations that scholars can test quantitatively. Unlike previous data on state torture, both the ITT SA and CY data include information about the victims alleged to have been abused and the agents alleged to be responsible for a given abuse. Such data allow researchers to investigate questions like: Does international treaty commitment affect torture by police officers and military officials differently? Does judicial effectiveness have the same (decreasing) effect on government torture by prison guards and immigration officers? The SA data also include information about what happens after a torture allegation is made, and this is the focus of our empirical illustration. We offer a preliminary inquiry into the question: What impact do domestic institutions have on formal complaints, investigations, and adjudications of torture allegations?
Data and empirics
Table I. Effect of covariates on complaints, investigations, and international and domestic trials
Incidence rate ratios reported; unadjusted standard errors in parentheses; † p < 0.10; *p < 0.05; **p < 0.01 (two-tailed).
To use the SA data to draw inferences about the conditions under which states violate the CAT, we must include covariates that model the likelihood that AI reports an investigation, adjudication, etc. AI’s ability to generate allegations of human rights violations, and to report on their investigation and adjudication, depends on its ability to work within a given country. Although AI maintains local offices in many countries, some governments prevent NGOs from operating within their borders. In these cases, it is more difficult for AI to gain access to victims and local advocates. It is therefore harder for AI to make allegations and report on investigations and adjudications, even when violations occur. To account for such biases in the production of allegations and reports, we include Restricted Access, a variable from the ITT country-year (CY) data (Conrad & Moore, 2011a). Restricted Access is a binary measure of whether AI published a statement that it, or another INGO, had difficulty gaining access to detainees during a given country-year.
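Combining event-level SA outcomes with a country-year control requires aggregating events to the country-year and joining the two sources. Below is a minimal sketch, assuming the outcomes are counts per country-year; this section does not spell out the unit of analysis. The function name, field names, and toy values are hypothetical. Two choices matter for a count model. Country-years with no events are kept as zeros rather than dropped. Country-years missing from the CY data are left as `None` rather than guessed.

```python
from collections import Counter

def build_panel(events, countries, years, restricted_access):
    """Build a country-year panel of event counts plus a Restricted Access control.

    events: iterable of (country, year) pairs, one per SA event of interest
        (for example, allegations that mention a formal complaint).
    restricted_access: dict mapping (country, year) -> 0 or 1, taken from CY data.
    Country-years with no events get a count of 0, which count models need.
    Country-years missing from restricted_access get None instead of a guessed value.
    """
    counts = Counter(events)
    panel = []
    for country in countries:
        for year in years:
            panel.append({
                "country": country,
                "year": year,
                "count": counts[(country, year)],  # Counter returns 0 for missing keys
                "restricted_access": restricted_access.get((country, year)),
            })
    return panel

if __name__ == "__main__":
    # Invented countries and values for illustration only.
    toy_events = [("A", 1995), ("A", 1995), ("B", 1996)]
    toy_access = {("A", 1995): 1, ("A", 1996): 0, ("B", 1995): 0}
    for row in build_panel(toy_events, ["A", "B"], [1995, 1996], toy_access):
        print(row)
```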
Empirical results
Table I reports the results from four negative binomial regression models. These results are intended to be suggestive and to encourage future work on these topics. We have not conducted a battery of robustness checks or sought to model any potentially conditional relationships, and we do not have a strong theoretical case for the model specification.
To convey the substantive impact of the variables, Table I reports incidence rate ratios (IRRs) rather than raw coefficient estimates. An IRR is the exponentiated coefficient. It is the multiplicative change in the expected count of the dependent variable (complaints, investigations, or trials) associated with a one-unit increase in the covariate, holding the other covariates constant. An IRR of 1 indicates no change. An IRR less than 1 means the expected count is multiplied by the IRR, a decrease of 100 × (1 − IRR) percent. An IRR greater than 1 represents an expected increase of 100 × (IRR − 1) percent.
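The conversion between coefficients, IRRs, and percentage changes is simple arithmetic. The sketch below uses hypothetical helper names (`irr`, `percent_change_in_expected_count`). Its inputs are illustrative values, not estimates from Table I.

```python
import math

def irr(coef):
    """Incidence rate ratio: the exponentiated regression coefficient."""
    return math.exp(coef)

def percent_change_in_expected_count(irr_value):
    """Percent change in the expected count for a one-unit covariate increase
    (for a binary covariate, moving from 0 to 1), holding other covariates fixed."""
    return 100.0 * (irr_value - 1.0)

if __name__ == "__main__":
    print(round(irr(math.log(2.0)), 6))           # 2.0
    print(percent_change_in_expected_count(2.0))  # 100.0
    print(percent_change_in_expected_count(6.0))  # 500.0
    print(percent_change_in_expected_count(0.5))  # -50.0
```

On this scale, an IRR of 2 corresponds to a 100% increase in the expected count. An IRR above 6 corresponds to an increase of more than 500%.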
To begin, note that Restricted Access, the variable we include as a control for AI’s propensity to report, is highly statistically significant in all four regressions. It also has an estimated IRR of at least 2 in each model (an increase of 100% or more). AI reports of restricted access are associated with larger numbers of AI allegations of complaints, investigations, and trials. This result highlights the importance of carefully modeling the process by which AI produces allegations when using the ITT SA data to draw inferences about state behavior.
Turning to the institutional variables, elections are positively associated with the number of investigations and trials. The association is particularly large for international trials: an increase of over 500% relative to countries that do not hold elections. The independence and effectiveness (i.e. power) of courts is associated with more investigations, but not with more complaints or domestic trials. Veto, the extent to which the legislature can check the executive, is positively associated with the number of complaints brought against the state, the number of investigations it conducts, and the number of domestic trials. Given these relationships, it would be interesting to explore whether executives misjudge the extent to which legislatures:

1. might generate information about violations, thereby spurring complaints;
2. pressure the bureaucracy to investigate (or investigate on their own);
3. signal to rights advocates that domestic courts will be receptive to cases brought against the state.

Finally, greater respect for freedom of speech is positively associated with larger numbers of both investigations and international trials. That freedom of speech is positively associated with investigations is unsurprising. It is interesting, however, that it is not similarly associated with complaints against the state. Further, international trials become more likely as respect for freedom of speech rises, but domestic trials are unaffected.
Conclusion
Between 1995 and 2005, Amnesty International made public more than 15,000 allegations against states for violating the United Nations Convention Against Torture (CAT). Those allegations represent a jarring, though uncountable, number of human beings who, while under state control, were subjected to ill-treatment, stealth torture, and/or scarring torture. Yet AI allegations are an undercount of actual violations. Both perpetrators and states generally have an incentive to hide violations of the CAT, so it is not possible to know how many violations actually occurred. As a consequence, the ITT project departs from previous practice: rather than using AI allegations as a direct measure of state violations, it treats them as a measure of AI activity. Nevertheless, we argue that it is possible to use the ITT data to draw inferences about state compliance by controlling for the factors that affect the generation of AI allegations.
We report a number of intriguing patterns that we hope will stimulate interest in the ITT SA data. For example, Figure 1 records a near-monotonic decline in the annual number of allegations made by AI between 1995 and 2005. The two years that deviate from the pattern of decline are 2000 and 2001. One might instead have anticipated 2002 and 2003, in the wake of states’ responses to Al Qaeda’s attacks on 11 September 2001 and the Bush administration’s roll-out of its enhanced interrogation and extraordinary rendition programs. That temporal decline is intriguing and warrants investigation. It might represent a decline in AI’s capacity to observe abuse. It might also reflect a variety of alternative possibilities, including the justice cascade (Meernik, Nichols & King, 2010; Sikkink, 2011). Similarly, we show interesting variation in allegations across regions of the world, and Figure 3 indicates that freedom of speech is associated with the types of torture AI alleges. We also learn that police are the agency of control AI names most often, and that this is true across all regions of the world. We hope that the ITT SA data will encourage researchers to investigate these and other patterns.
We pursued the ITT project to examine research questions that arose out of the research reported in Conrad & Moore (2010b). In particular, we are not satisfied with that study’s inability to distinguish national security torture from criminal and social control torture (Rejali, 2007). The ITT SA data make it possible for us to draw a much more satisfying distinction. In so doing, they will reveal new and interesting questions that would otherwise likely have gone unnoticed. We plan to conduct a number of inquiries that will explore the impact of liberal democratic institutions on states’ compliance with the CAT. Those studies will only scratch the surface of what might be done with the ITT SA data, and we are eager to learn what others are able to illuminate.
Funding
The ITT Data Collection Project has received support from the US National Science Foundation (Grants #0921397 and #1123666), the Department of Political Science at Florida State University, the School of Social Sciences, Humanities, and Arts at the University of California, Merced, and the Department of Political Science and Public Administration at the University of North Carolina at Charlotte.
