Algorithmic risk governance: Big data analytics, race and information activism in criminal justice debates

Abstract

Meanings of risk in criminal justice assessment continue to evolve, making it critical to understand how particular compositions of risk are mediated, resisted and re-configured by experts and practitioners. Criminal justice organizations are working with computer scientists, software engineers and private companies that are skilled in big data analytics to produce new ways of thinking about and managing risk. Little is known, however, about how criminal justice systems, social justice organizations and individuals are shaping, challenging and redefining conventional actuarial risk episteme(s) through the use of big data technologies. The use of such analytics is shifting organizational risk practices, challenging social science methods of assessing risk, producing new knowledge about risk and consequently new forms of algorithmic governance. This article explores how big data reconfigure risk by producing a new form of algorithmic risk—a form of risk which is posited as different from the social science (psychologically) informed risk techniques already in use in many justice sectors. It also shows that new experts are entering the risk game, including technologists who make data public and accessible to a range of stakeholders. Finally, it demonstrates that big data analytics can be used to produce forms of usable knowledge that constitute types of ‘information activism’. This form of activism produces alternative risk narratives, which are focused on ‘criminogenic structures’ or ‘criminogenic policy’.

Keywords

Actuarial risk assessment, big data, predictive police algorithms, race, sentencing

Introduction

Some argue that we are now living in an ‘algorithmic age where mathematics and computer science are coming together in powerful new ways’ that influence individual behaviour and governance (Danaher et al., 2017: 1). Scholars in various fields are examining how big data is being assembled and used. For example, international security agencies access and sort through extensive communications metadata to define, identify and neutralize national security risks, including terrorism (Lyon, 2014). Governments also routinely merge financial transactional data with various forms of client case file data to detect and respond to fraud (Ruppert, 2012), and health data are mined and tracked to predict and monitor disease and prevent outbreaks (Thomas, 2014). Policing agencies are becoming increasingly sophisticated in their uses of biodata, facial recognition software, traffic cameras, body/car cameras, licence plate readers and Global Positioning System (GPS) locators, all of which produce digital data that can be combined to identify and track individuals in the pursuit of safety and security (Brayne, 2017; Gates, 2011; Gitlin, 2012; Kitchin, 2014; Lupton, 2015; Smith and O’Malley, 2017).

Several private companies are marketing and distributing data-driven technologies to criminal justice institutions (Brayne, 2017). One of the best known is PredPol, a predictive policing software package whose developers work with local police agencies to map criminal events in real time, allowing police to efficiently deploy resources (PredPol, 2017).¹ Florida uses machine learning algorithms to set bail amounts (Eckhouse, 2017).² Some US prisons are also adopting new technologies: for example, some provide prisoners with tablets pre-loaded with approved applications to manage their mail and music (Tynan, 2016). While these technological enhancements in prisons can be seen as progressive, they also reflect a trend towards digital monitoring and decision making. These tablets allow for the collection, collation, manipulation and storage of vast amounts of data about individual prisoners and their families, including forms of data that are unrelated to prison (e.g. Facebook, credit histories, internet activity, health records, neighbourhood information). Digital storage and mining of this type of data is quite appealing to penal officials and others.

Although big data is often acclaimed for its ability to enrich our understanding of particular phenomena (Lazer et al., 2009), it is also ‘a political process involving questions of power, transparency and surveillance’ (Tufekci, 2014: 1). Big data are not as ‘objective, neutral, or complete as they are portrayed in mainstream media representations’ (Lupton, 2015: 101). Instead, big data, its incumbent analytics and the knowledges they produce are socio-political and cultural artefacts that are transforming how we live, work and think about social problems (Lupton, 2015). Its recent uptake by criminal justice actors in relation to the production of risk requires analysis. Inasmuch as criminal justice organizations are now embracing ‘big data’ analytics (Brayne, 2017), they are engaged in various new forms of ‘algorithmic governance’ (Danaher et al., 2017). Big data has invigorated the criminal justice system’s (CJS) focus on ‘smart’, ‘data-driven’ and ‘evidence-based’ solutions, allowing individuals and private companies with technical expertise to offer various big data informed options for criminal justice actors and organizations. CJS representatives are working with computer scientists, software engineers and private companies that are skilled in data analytics to produce new ways of thinking about and managing risk. These collaborations are expected to enhance the efficiency of prediction systems, while ensuring that the systems are designed so that they do not operate in a discriminatory manner. However, little is known about how CJS, social justice organizations and individuals are producing, challenging or redefining conventional risk episteme(s) through the use of big data analytics (Kitchin, 2017). As such, the introduction of big data technologies warrants analysis both because the concept of risk is central to our legal and criminal justice culture, and because it represents an epistemic deviation from risk assessments that are grounded in psychological disciplines.

The governance of crime and understandings of risk continue to be fluid and shifting (Maurutto and Hannah-Moffat, 2006). For several decades, CJS have actively embraced a variety of risk logics and technologies with the goal of enhancing efficiency, accountability and equity. Systematic and psychologically informed actuarially based risk assessments are widely accepted as being rooted in an evidence base, and by extension are considered to be more scientifically credible than professional discretion and clinical judgement in assessing risk (Skeem and Lowenkamp, 2016). Criminologists and sociologists have studied how risk logics have become embedded in criminal justice planning and practice, and how psychologically informed actuarial risk analyses are used to predict and prevent crime, enhance security, sentence offenders and manage penal populations (Feeley and Simon, 1992; Hannah-Moffat, 2013; Harcourt, 2007; Kemshall, 2003; O’Malley, 1992, 2004, 2010). This research has helped clarify how knowledge and understandings of crime are framed and addressed through risk logics. Scholars have also argued that risk informed practices of governing crime have led to the over-representation and over-policing of particular segments of the population, mainly racialized individuals (Chouldechova, 2017). Researchers have also turned their attention to the ways in which newer big data technologies aid in predicting risk levels, managing the ‘crime problem’ and solving these inequalities (Brayne, 2017; Eubanks, 2018; Ferguson, 2017; Smith and O’Malley, 2017).

This shift to big data analytics represents a notable departure from other algorithmically influenced risk technologies. Despite the fact that social scientists have long used large official data sets to map crime patterns and track offenders, the advent of big data analytics is considered to be a game changer for criminal justice governance because of its phenomenal speed, breadth and depth capacities. It also entails new forms of data from an assemblage of sources, including but not limited to smart phones, digital cameras, GPS tracking devices, internet searches, consumer databases, social media, open data sources and smart software (Lupton, 2015; Smith and O’Malley, 2017). The term ‘big data’ then generally refers to a wide array of digitally stored information about individuals, organizations, companies and events. The term can also be used to describe the techniques used to efficiently assemble and disassemble this information for a variety of commercial and non-commercial purposes. Still, we have little understanding of where ‘big data’ actually comes from, how it is used or how it lends authority and justifies decisions. As a result, ‘big data has the effect of making-up data and, as such, is powerful in framing our understanding of those data and the possibilities that they afford’ (Beer, 2016: 1). As a socio-technical phenomenon, it can trigger apprehension and hope, depending on how, and by whom, it is used to govern. Thus, there is a need to better understand how algorithmic governance is designed and used (Danaher et al., 2017), how it translates policy problems into computer code (Kitchin, 2017) and how it fundamentally alters risk prediction logics and practices. This article is an effort to respond to these needs.

This article explores how big data is altering organizational risk practices, challenging social science methods of assessing risk and affecting knowledge about risk. It also shows that big data has initiated evidence-based public critiques about ‘data harms’ and empowered critics in ways that can help reshape the CJS. First, I argue that big data reconfigures risk by producing a form of algorithmic risk, which is different from the social science, dominantly psychologically informed actuarial risk techniques already in use in many justice sectors. Second, I show that new players or experts are entering the risk game: technologists (e.g. concerned citizens, computer scientists, software engineers and hackers—usually not trained in social science) who make data public and accessible to a range of stakeholders. Third, I argue that big data analytics can be used to produce forms of usable knowledge that constitute types of ‘networked social action’ and ‘information activism’ (Halupka, 2016; Zavos, 2015). Big data can be used to support powerful racialized social justice critiques of traditional police, court and penal practices (see, for example, Hetey et al., 2016; Mapping police violence, n.d.; Police Union Contract Project, n.d.; Police Use of Force Project, n.d.; ProPublica, 2017). These data-driven critiques can also be used to resist current definitions of ‘risk’ (Angwin et al., 2016).³ As a consequence, public knowledge about risk is shifting and expanding: big data technologies are producing an alternative risk narrative, which focuses critical attention on ‘criminogenic structures’ or ‘criminogenic policy’.

Big data

Chan and Bennett Moses (2016) argue that big data is flexible and subject to different interpretations by those seeking to use it for a variety of scholarly, commercial or government purposes. Data mining has endless possibilities in terms of potential sources and applications in the governance of crime. Vast amounts of data are routinely collected and stored. Most people are indifferent to or unaware of their data traces and willingly use products or services (including fitness trackers, GPS, banking services, online shopping, smartphones, Facebook, Instagram, Snapchat and Twitter) that feed a digital economy and contribute to vast data banks used to create algorithmic identities. Big data analytics are increasingly being used as a new technique of security (Valverde, 2014) with little oversight; they are a critical part of the evidence used to justify many intersecting forms of social and legal governance. Big data analytics are also shifting how we think about research (Chan and Bennett Moses, 2016), as well as influencing criminological practices, as evidenced by the rise of ‘computational criminology’ (Berk, 2008; Williams and Burnap, 2016).

The ‘avalanche of numbers’ (Hacking, 1990: 5) characterizing big data is not particularly new, although the types of data and related analytics have changed (Beer, 2016). Social data have long been collected to govern people and populations, but the technical sophistication of analytics and the speed, volume and variety of available data are unprecedented. Before going further, it is important to define the term ‘big data’ (Lohr, 2013). Most technical definitions of big data emphasize the three Vs: volume (the amount of data); velocity (the speed at which data are added and processed); and variety (data may come from multiple sources using different formats and structures) (Chan and Bennett Moses, 2016). Boyd and Crawford (2012: 663) suggest that big data is a cultural, technological and scholarly phenomenon involving: technologies that maximize computational power and algorithmic accuracy of large amounts of data; analyses that use large data sets to make economic, social, technical and legal claims; and mythology related to the popular belief that large data sets offer insights that were previously impossible, with the ‘aura of truth, objectivity, and accuracy’. Here, I use the term ‘big data’ to refer to a concept for ‘understanding how visions of contemporary data are incorporated into the imagining of life, the production of truths and the liminal work that contains the social world’ (Beer, 2016: 5). The term ‘big data techniques’ refers to the processes by which enormous amounts of often disparate pieces of information are populated into data sets, predictive analytics and algorithms. These data are disconnected from one another and are acquired from multiple and diverse sources; using big data techniques, these data are efficiently assembled into a useable form and drawn upon to inform policy questions and institutional decisions.

Nuances of actuarial risk

The definition of ‘risk’ is highly contingent upon the temporal, socio-political and institutional context; it is a fluid construct with multiple meanings and differential effects (Hannah-Moffat, 2004b; O’Malley, 2004). Risk knowledges are fluid, flexible and capable of supporting a range of culturally contingent penal strategies. The meanings of risk in criminal justice assessment evolve and it is important to understand how particular compositions of risk are mediated, resisted and re-configured by experts and practitioners. For instance, Hannah-Moffat (2004a) has shown how understandings of risk shifted from static to dynamic categorizations, how this development aligned our understanding of risk with need and eventually how responsiveness to treatment—as a consequence of actuarial knowledge about criminogenic risk—became elevated above professional judgement as a reliable method of risk assessment. However, this reality is often forgotten with the result that the term ‘risk’ is often used imprecisely to describe and rationalize a range of governmental actions. As a result, it is important to be precise and distinguish between different forms of risk, in this case psychological and big data informed algorithmic risk.

To better understand the heterogeneity of actuarial risk and its recent evolution, the following outlines how social science (hereafter psychologically) informed risk differs from big data informed risk. The term psychologically informed risk is meant to generically capture the wide range of actuarial technologies used to assess static and/or dynamic risk and criminogenic needs. The emergence of such technologies displaced non-actuarial clinical assessments of risk, which were discredited on the basis that clinical judgement was subjective, unempirical and had poor predictive accuracy. Examples of psychologically informed actuarial tools include the Salient Factor Score and the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS, used in the United States), the Risk of Reconviction (used in the United Kingdom) and the Level of Service Inventory (LSI-R, used internationally).⁴

Although both psychologically based and big data forms of risk construction are actuarial, use statistical calculations to forecast and share similar limitations, they are epistemologically and methodologically distinct. Psychology informed risk and big data informed risk differ in their design, method and use. Psychology informed risk technologies are often designed to predict recidivism, whereas big data informed risk technologies are not designed for this purpose but can be applied in this manner. The technologies also differ in the size of the population on which their results are based: psychology informed risk tools are based on statistical analyses of a sample of a population, whereas big data informed risk algorithms are based on massive and potentially infinite population data. Additionally, psychology informed risk tools are based on data that have been collected and vetted as per discipline-specific research methods. Algorithmic risk, on the other hand, is based on data collected from multiple sources that are assembled and analysed with little attention to social scientific methodological standards, and with limited substantive knowledge of the subject matter (cf. Berk, 2012). In terms of how data are used, psychology informed risk technologies collect and use data for the purpose of the research at hand, and employ methods and approaches that are conceptually grounded in social science disciplines. In contrast, big data informed risk uses data that are assembled and disassembled for a range of purposes and is developed without being linked to a specific discipline. Psychology informed risk technologies are designed for a particular purpose and are static, while big data risk technologies use data that are value-laden, variable and can be ‘black-boxed’ (i.e. new data can be inputted/processed by existing algorithms). Finally, psychology informed risk technologies are designed to be empirically defensible, reliable, valid and clean, while big data risk technologies contain methodological uncertainties.

Psychology informed risk assessments used by the criminal justice system are ‘evidence-based’ instruments that rely on statistical predictions of risk of a particular outcome, normally recidivism (e.g. COMPAS or LSI-R). These instruments are predicated on a discipline-based causal knowledge of criminality, and are designed in such a way as to systematically produce and organize a diverse range of information. An offender’s quantitative risk score is determined by measuring their individual factors (e.g. history of substance abuse, age at first offence). Such factors are measured by the tool because they are statistically linked to the risk of recidivism in correctional populations. The tools are typically based on research involving large aggregate population samples (historically, White male adults). The data sets are manually and deliberately constructed by social science researchers, usually adhere to social scientific methodological guidelines and are situated in theoretical frameworks (often dominated by psychology). For several decades, practitioner-driven classification, assessment and management research has focused on risk and its identification and management. Within the penal field, ‘risk/need assessment’ practices have been informed by a technical, persuasive and now deeply embedded practitioner-driven research agenda (Hannah-Moffat, 2013).

Like psychology informed risk tools, big data technologies rely on large population data sets to produce risk-related information about individuals or events. However, the amount and scale of the information available in a big data set, and the range of sources it draws on, are vastly greater than in the data sets used to produce most conventional risk assessments. Big data mobilizes and exploits computational power and algorithmic accuracy to collect, analyse, link and compare large data sets as well as to ‘identify patterns in order to make economic, social, technical and legal claims’ (Chan and Bennett Moses, 2016: 24). It makes predictions depending on how data are assembled and for what purpose. Unlike psychology informed risk technologies, big data technologies are not implicitly designed to predict the risk of recidivism. Despite this, emerging algorithms designed for sentencing and bail are now being used to predict risk, and to standardize decision making in much the same way as risk assessments. With the advent of artificial intelligence and machine learning, it is believed that big data assessments can function with real-time data and thus have a higher degree of objective predictive accuracy than the psychology informed tools.

Unlike psychology informed risk assessment instruments, however, big data technologies are not constrained by preconceived theoretical or methodological disciplinary norms or necessarily administered and interpreted by certified assessors. Nonetheless, implicit and disconnected assumptions about society, individuals, social institutions and scientific practice are embedded within big data analytics. Some scholars have argued that big data analytics presents a significant challenge to long-established social scientific research methodologies, and that social scientists could lose their place as established experts on ‘the social’ if they do not keep pace with technology (Mayer-Schönberger and Cukier, 2013).

Furthermore, big data is often characterized as value-laden and variable because the meaning of data collected can be constantly fluctuating in relation to the context in which they are generated (McNulty, 2014). It is also often considered to be full of error and uncertainty, or messy and noisy (Marr, 2014). Nonetheless, big data technologies are capable of quickly collating and infinitely expanding the amount and range of data about an individual or a targeted population. All actuarial technologies are susceptible to similar critiques. Intellectual property rights make big data algorithms unavailable for public scrutiny, much in the same way that the underlying algorithms used by risk tools are neither accessible nor public. As with many forms of psychology informed risk assessments, big data can act as a ‘black box’. Tools are rarely transparent and their internal mechanics are not typically shared by companies, agencies or governments that own and develop the algorithms. Additionally, big data informed algorithms are ‘ontogenetic and performative’, which means they are fluid, opaque and often ‘modified and adapted in response to user interaction; they change in uncontrollable and unpredictable ways’ (Danaher et al., 2017: 5) as they learn or acquire new data points. Emergent forms of big data risk algorithms are often devoid of social, political and ethical consciousness surrounding the substantive CJS context as the experts developing these technologies are trained as data scientists, not as social scientists devoted to researching the criminal justice system (discussed below). Notwithstanding the complexity and lack of transparency of these programs, the presumption is that these technological developments can enhance safety, efficiently analyse data and limit error. Thus, big data technologies, like risk instruments, simultaneously appear neutral and authoritative, which can make them powerful tools of governance (Beer, 2016; Hacking, 1990).

As with risk, linking big data discourses and technologies with the legal/criminal justice complex results in big data being configured to meet the needs and logics of those systems. Emergent rationalities and techniques of algorithmic risk governance continue to have an emphasis on probability and use statistical techniques to document patterns, generate evidence and guide policy. Currently, the private sector, advocacy organizations and CJS agencies are exploring how big data informed algorithms can improve the prediction of recidivism, refine sentencing, improve police practices and reveal the difficulties associated with various forms of actuarial data. Big data analytics is reconfiguring how we think and talk about risk. The next section will explore how knowledge about crime and criminal processes is being created by different experts, and how expanded access to this knowledge allows for different exercises of power.

Munging and wrangling: Producing new criminological knowledge

Data scientists organize massive amounts of unstructured data, producing large data sets that can be used for various analytic purposes. Discussions of big data often revolve around the exceptional volume, speed, capacities and efficiencies of computer-assisted technologies. Less attention is devoted to the more mundane task of data ‘munging’ (also known as ‘wrangling’), which loosely refers to the processes involved in producing data sets: cleaning, matching, aggregating and converting data from its raw format into a compatible and useable form (data ‘scraping’). These mundane techniques of data scraping, munging and wrangling are used to assemble and reassemble knowledge about algorithmic risk.
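To make the steps named above more concrete, the following is a minimal Python sketch of cleaning, matching, aggregating and converting a handful of records. It is an illustration only and does not reproduce any tool discussed in this article: the records, the field names (`name`, `court`, `filed`, `charge`) and the helper functions are all hypothetical, invented for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records, invented purely for illustration.
RAW_RECORDS = [
    {"name": " Smith, John ", "court": "Circuit A", "filed": "03/14/2014", "charge": "LARCENY"},
    {"name": "smith, john", "court": "Circuit B", "filed": "2014-07-02", "charge": "larceny "},
    {"name": "Doe, Jane", "court": "Circuit A", "filed": "11/30/2013", "charge": "Trespass"},
    {"name": "Doe, Jane", "court": "Circuit A", "filed": "not recorded", "charge": "Trespass"},
]

# Assumed date formats; real sources may use others.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def parse_date(text):
    """Convert a raw date string to ISO format, or return None if unreadable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(record):
    """Cleaning and converting: normalize whitespace, letter case and dates."""
    return {
        "person_key": " ".join(record["name"].lower().split()),
        "court": record["court"].strip(),
        "filed": parse_date(record["filed"]),
        "charge": record["charge"].strip().lower(),
    }


def munge(raw_records):
    """Clean every record, drop rows with unreadable dates, match rows by person."""
    by_person = defaultdict(list)
    dropped = 0
    for raw in raw_records:
        row = clean(raw)
        if row["filed"] is None:
            dropped += 1  # a judgement call: another analyst might keep these rows
            continue
        by_person[row["person_key"]].append(row)
    return dict(by_person), dropped


def charge_counts(by_person):
    """Aggregating: count retained cases per normalized charge label."""
    counts = defaultdict(int)
    for rows in by_person.values():
        for row in rows:
            counts[row["charge"]] += 1
    return dict(counts)


matched, dropped = munge(RAW_RECORDS)
print(matched.keys(), dropped, charge_counts(matched))
```

Even in this toy case, decisions such as matching people on a lower-cased name or discarding rows with unreadable dates are interpretive choices that shape the resulting data set.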

The production of useable data and the training of statistical models are time-consuming and complex processes that require judgement, interpretation and generally some knowledge about what questions the data will be used to answer. Some software developers who produce data sets about race and crime are making their data sets searchable and publicly available; some also show the processes they use for data munging/wrangling (O’Neil and Schutt, 2013). Others provide open source code and detail the steps they used to build statistical models. However, specific criminal justice jurisdictions that produce risk algorithms usually do not show how data are processed. Independent researchers or commercial risk tool developers can protect their data through copyright and other more nefarious techniques. Additionally, many computer and data scientists, as well as software engineers, are unfamiliar with the normative politics and nuances of criminal justice data—an unfamiliarity that can lead to conceptual and methodological problems.

Big data, and the propagation of useable data and open source code, provide access to data that can be used to reveal patterns of systemic and overt discrimination. These patterns have been easy to conceptualize and demonstrate qualitatively on a case-by-case basis, but extremely difficult to quantify. In the past, access to and analysis of data measuring various forms of vulnerability, racism or discrimination have been hindered by geographical boundaries, institutional policies and organizational priorities, and blocked access to official data. Official data were also unable to capture structural or CJS information (e.g. postal codes, places of arrest, incarceration) or to coordinate different data systems (e.g. court/police/correctional data systems, health information, employment information, gun registries, etc.). In effect, these data were inaccessible to researchers interested in understanding how these systems and practices affect vulnerable individuals and populations, making it difficult to advocate for appropriate systemic changes.

Big data analytics can make significant contributions in terms of making data public and providing accessible information about criminal justice practices that may have been previously unknown. While making these data public raises concerns about privacy, ethics and unintended harm, as well as a host of other issues within commercial and international surveillance spheres, others have turned their attention to ethically evaluating how these data are used and mobilized. Researchers are now engaged in debates about the ethics of data usage, access and storage, commercial uses and the unknown implications of big data analytics (Metcalf and Crawford, 2016). For example, some scholars are using big data analytics to identify data harms such as racialized discrepancies in individuals’ perceived risk of recidivism. Additionally, some data analysts are focusing on questions that previously preoccupied socio-legal scholars and punishment and society scholars. For instance, they are asking how ‘algorithmically generated scores’ (whether generated by risk assessments that are uniform psychological tools or by those developed for a specific jurisdiction) may produce biased outcomes.

Scholars at the New York Data and Society Research Institute⁵ were concerned about the increased reliance on risk assessments (machine-generated scores) in light of pending legislation in the United States Congress. If passed, this legislation (Bill S.2123, the Sentencing Reform and Corrections Act of 2015) would have required the use of risk assessments (Grassley, 2015). The scholars⁶ obtained the COMPAS risk scores assigned to more than 7000 people arrested in 2013 and 2014 in Broward County, Florida, and examined how many had been charged with new crimes over the next two years (the same benchmark used by the creators of the algorithm). They found that COMPAS scores were remarkably unreliable in forecasting violent crime. When considering all types of crime (including misdemeanours such as driving with an expired licence), the algorithm was slightly more accurate than a coin toss. They also identified significant racial disparities. For example, while the number of prediction errors was similar among Black and White defendants, the errors themselves were of very different types. COMPAS scores tended to falsely flag Black defendants as future criminals, incorrectly labelling them this way at almost twice the rate of White defendants, whereas White defendants were more likely to be mislabelled as low risk than Black defendants (Angwin et al., 2016). The authors investigated whether this type of error was the product of differences between criminal records, and found that these disparities persisted despite their having controlled for criminal history (Angwin et al., 2016).
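The distinction between the number of errors and the type of errors can be illustrated with a short Python sketch. It is not the analysis conducted by Angwin et al. (2016) and uses no real data: the records below are invented, the group labels ‘A’ and ‘B’ are placeholders, and the function name `error_rates` is an assumption made for this example. The sketch simply shows how two groups can have the same count of prediction errors while differing in their false positive rate (people flagged as high risk who did not reoffend) and false negative rate (people labelled low risk who did).

```python
from collections import Counter

# Hypothetical toy records, invented for illustration only:
# (group, flagged_high_risk, reoffended)
TOY_RECORDS = [
    ("A", True, False), ("A", True, False), ("A", True, True),
    ("A", False, True), ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", True, True), ("B", False, True),
    ("B", False, True), ("B", False, False), ("B", False, False),
]


def error_rates(records):
    """Return error counts and false positive/negative rates for each group."""
    counts = Counter()
    for group, flagged, reoffended in records:
        if flagged and not reoffended:
            outcome = "fp"  # flagged high risk, did not reoffend
        elif not flagged and reoffended:
            outcome = "fn"  # labelled low risk, did reoffend
        elif flagged and reoffended:
            outcome = "tp"
        else:
            outcome = "tn"
        counts[(group, outcome)] += 1

    rates = {}
    for group in sorted({g for g, _, _ in records}):
        fp, fn = counts[(group, "fp")], counts[(group, "fn")]
        tp, tn = counts[(group, "tp")], counts[(group, "tn")]
        rates[group] = {
            "errors": fp + fn,
            # share of non-reoffenders who were wrongly flagged
            "false_positive_rate": fp / (fp + tn) if fp + tn else None,
            # share of reoffenders who were wrongly labelled low risk
            "false_negative_rate": fn / (fn + tp) if fn + tp else None,
        }
    return rates


for group, summary in error_rates(TOY_RECORDS).items():
    print(group, summary)
```

In this invented example both groups have three errors out of six predictions, yet group A's errors are mostly false positives and group B's are mostly false negatives, which is the kind of pattern an overall accuracy figure would conceal.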

The private company that developed the popular COMPAS instrument disputed their findings, stating, ‘Northpointe does not agree that the results of your analysis, or the claims being made based upon that analysis, are correct or that they accurately reflect the outcomes from the application of the model’ (as cited in Angwin et al., 2016: n.p.). Nonetheless, this example shows how risk assessment can be demystified. By exposing the inner workings of risk instruments, and revealing the tautological nature of risk scales, the research team disproved the claim that risk instruments provide an ‘unbiased’ risk of recidivism score. Interestingly, this research was not conducted by risk scholars or disseminated in an academic forum. Instead, the findings were published on a public website called ProPublica: Journalism in the Public Interest (ProPublica, n.d.). This kind of exposure shifts the logic of responsibilization and risk management. In this instance, the institution is responsibilized for preventing data harms; the evidence supports the delegitimization of actuarial risk-based governance, which appears to produce and reproduce various forms of systemic racism. Because data mining technologies are reconfiguring risk and its management, it is important to document and conceptualize how the governance of risk is simultaneously evolving.

Public aspects of data analytics: Sharing and facilitating knowledge

Advocates of big data stress the importance of open access or open data, whereby data are proactively placed in the public domain without restrictions on their use. Various criminal justice stakeholders and advocacy organizations are embracing big data techniques because they allow them to shift and shape narratives by participating in ‘evidence-based’ dialogue. One example of this form of information activism emerged in response to a February 2016 Virginia Supreme Court decision. The Supreme Court ruled that court offices were not required to provide anyone with bulk aggregate information about lower court cases, despite the fact that all of this information is available on a public website and can be searched on a case-by-case basis. According to the ruling:

[The] Supreme Court does not have to release a bulk collection of lower court case information, even though the collection is maintained to provide access to the records through a public website […] the Supreme Court is not the custodian of the records and does not have authority to release them; instead, the records belong to the individual circuit court clerks who provide them to the Supreme Court.

(VirginiaCourtData.org, 2017)

The court website does not have an integrated interface or centralized database; individuals wanting to retrieve this type of data are required to use a menu to separately pick one of the 118 circuit courts that use the system.

However, the production of bulk data from these 118 circuit courts is possible, and relatively simple. In 2014, at the request of a journalist from the Roanoke Times, volunteer software engineers created free, open-source software to get around the website’s roadblocks and allow anyone to perform a statewide name search. As of 2016, an independent website committed to open data had 2.2 million anonymized court cases available for bulk download (VirginiaCourtData.org, 2017). This kind of software has been adapted for use by journalists and lawyers to produce data sets related to various issues (e.g. low prosecution rates for sexual assault cases, hospitals hiding malpractice lawsuits and criminal defendants being denied their constitutional right to counsel). As these examples demonstrate, various big data analytics are enabling lawyers and data scientists to create tools that expedite research.

Searchable data sets of judicial rulings, precedents and legislative interpretations, as well as various witness or victim statements, court logs and insights are now being created and accessed by data scientists, advocates and the public. These new computational and analytic tools are enabling law firms to analyse documents in a fraction of the time, and at a fraction of the cost, required just a few years ago (Markoff, 2011). Data users often publish their analyses on websites that outline their methodologies, sharing open access data sets and open source code. These websites may also include brief articles with memorable facts and images that draw attention to broader systemic patterns and issues of public concern and interest, such as racism within the criminal justice system. For example, David Colarusso, formerly a lawyer and now a legal hacker and software engineer, posted a detailed account of how he used the VirginiaCourtData.org website to show how race affected sentencing. He wrote, ‘[f]or a black man in Virginia to get the same treatment as his Caucasian peer, he must earn an additional $90,000 a year’ (Colarusso, 2016: n.p.). Although he found that race only explained 6 per cent of the variance, he concluded:

What we see here is the aggregate effect of many interlocking parts. Reality is complex. Good people can find themselves unwitting cogs in the machinery of institutional racism, and a system doesn’t have to have racist intentions to behave in a racist way.

(Colarusso, 2016: n.p.)

His findings support those of risk scholars, who have demonstrated that many routine risk assessment instruments fail to adequately capture the complexities of gender, race and ethnicity. Instead, as with specialized gender-based tools, factors historically characterized as ‘needs’ are reframed as ‘risks’ (Hannah-Moffat, 2005, 2009, 2016a).

Overall, this type of big data analytics challenges current institutional forms of data management and its availability. Researchers are devising alternative forms of predictive analytics and moving data into the public domain, where they can be freely accessed and manipulated. Some researchers have argued that access to big data has altered how people view and engage with democratic processes and social institutions, giving rise to a new form of political participation: ‘information activism’ (Halupka, 2016). Similarly, the new trend of ‘clicktivism’ allows individuals to engage in small political actions such as reacting to a post on Facebook, posting on LinkedIn or tweeting or retweeting images of police violence (Halupka, 2016). This kind of ‘networked social activism’ has been observed in various social movements, such as Idle No More, Arab Spring and Occupy Wall Street (Castells, 2012), all of which relied on forms of ‘fast, autonomous, interactive and reprogrammable communication’ (Halupka, 2016: 1488).

We the Protesters is another networked social movement engaging in information activism. It is producing algorithmic risk narratives that critique racialized police practices, especially lethal police violence (Mapping police violence, n.d.; Police Union Contract Project, n.d.; Police Use of Force Project, n.d.). It uses big data technologies to challenge the logic of evidence-based policing by focusing instead on identifying police practices that are risky and those that reproduce the systemic barriers that become measured as individual risk factors (Hannah-Moffat, 2016a, 2016b). In this way, the We the Protesters movement is using big data analytics to reveal how particular institutional techniques of governing individuals and populations can be adapted and used by different actors and organizations to different ends. Together, these examples reveal how networked social movements and information activism are challenging hegemonic forms of psychologically based actuarial risk assessment, police practices and surveillance, in ways that affect policy.

Big data technologies and critically informed risk algorithms can act as catalysts for the development of responsive social policy changes and organizational alternatives (Bennett and Segerberg, 2012; Bennett et al., 2014). Their swift diffusion of ideas can also allow ‘citizens who have become uncoupled from their political authorities […] to take political change into their own hands, engaging in their own terms in their own ways’ (Halupka, 2016: 1491). The diffusion of ‘evidence-based’ risk knowledge into the hands of activists and the public at large allows for the production of counter-narratives that challenge the efficacy and neutrality of actuarial risk and expose various forms of systemic racism. This approach appears to be more effective than academic discourse about the harms and effects of actuarial risk techniques. By producing and publicly disseminating information about how CJS agencies assess risk, it is possible to expose and contest these forms of power, knowledge, related techniques and accompanying experts.

These examples also illustrate how risk crosses institutional contexts and is shaped by knowledge that may be independent of traditional social science and can be applied for various purposes. The rise of ‘algorithmic risk’, and the new knowledge brokers who are using these data to frame policy and institutional practices as ‘risky’, are now moving the focus away from individual criminals or specific criminal events, to particular governmental practices and policies that contribute to crime and thus recidivism. Put another way, big data is enabling new techniques of social critique that disrupt the traditional institutional reliance on ‘evidence’—it is reconfiguring power and knowledge assemblages.

Finally, the examples reveal public aggregation, consumption and dispersal of risk information. Big data analytics can influence public opinion, assist protesters and ultimately lead to change. The point here is not whether algorithms are accurate, neutral, reliable or valid, but that algorithmic risk is itself being used as ‘evidence’. The traditional CJS rarely considers the epistemic origins or wider jurisprudential impact of risk evidence. In contrast, big data analytics is shifting and popularizing access to various forms of criminal justice and legal data. This has forced the CJS to be more publicly accountable and responsive to value-laden empirical questions. In responding to critiques made possible by big data analytics, criminal justice organizations can reduce the ‘data harms’ resulting from their traditionally discriminatory practices.

Opening a debate about social structure, policy and jurisdictional risk

Big data technologies allow opportunities to critique and rethink what evidence-based policing, smart sentencing and risk informed decision making can mean. They can help situate risk as external to the penal subject, and to responsibilize different actors. Recently, a special issue of Psychology, Crime and Law (Gannon, 2016) was devoted to dynamic risk factors: it interrogated current assumptions underpinning the concept of dynamic risk. The omission of socio-structural risks in most dialogues of dynamic risk can overestimate the predictive power of assessments and obscure important structural features of recidivism. Additionally, the current focus on the individual when assessing dynamic risk ignores the social and jurisdictional practices and processes that are themselves criminogenic, dynamic or changeable. Moving the focus to these socio-structural factors can expand the idea of dynamic risk (and risk in general), as well as redistribute responsibility and allow for different kinds of interventions.

Thus, big data analytics have contributed to theoretical and practical dialogues on risk assessment. Specifically, this form of analysis has allowed for the possibility of analysing structural risk assessments to reveal how particular laws, institutional practices and social/welfare policies actually generate and produce risk. This kind of analysis explores how conditions that are beyond an individual’s control, but are still alterable (e.g. jurisdiction-specific policies, police, bail and parole practices), can increase the risk of recidivism and hamper efforts to desist from crime. Big data technologies can challenge and shift debates about ‘criminogenic risk’, thereby allowing for different lines of risk inquiry, encouraging the re-examination of existing risk instruments and complementing socio-structural analyses of dynamic risk.

A cautionary caveat

Big data and its various technologies are reframing criminal justice debates and producing different forms of advocacy and mobilization. Advocacy organizations and individuals are using predictive analytics to track social problems and propose specific, data-informed alternatives. However, this trend should also be met with caution and concern. Big data technologies can be used for high-tech profiling of people, events and places; they can violate constitutional protections, produce a false sense of security and be exploited for commercial gain. Data are valuable to criminal justice agencies and private companies, and scholars are demonstrating how seemingly innocuous data can be accessed and assembled in ways that compromise civil liberties, and feed into discriminatory policies and forms of surveillance. Technical and civil libertarian concerns about big data include: increased capacity for high-tech profiling; inability to ensure fairness in automated decision-making processes (e.g. by obscuring systemic discrimination/racism in complex algorithms); loss of the protection of privacy (e.g. erosion of the ‘right to be forgotten’); information about people being used for unintended purposes; inaccurate data; misrepresented populations; and commercial exploitation (boyd and Crawford, 2012; McDermott, 2017; Metcalf and Crawford, 2016). All technologies, regardless of how well intentioned, are vulnerable to error, misapplication and manipulation. What remains unclear is how big data technologies that purport to facilitate equality, open access, transparency and free information exchange may be used to further a range of commercial, regulatory and punitive agendas. There can and will be unintended effects, and more research will be needed to explore unresolved questions about privacy, data ownership, implementation capacities, fairness, ethics, applications of algorithms, values determination, transparency, constitutionality and interpretation. The hasty embrace of big data analytics and risk algorithms will lead to a host of jurisprudential, methodological, interpretive and data quality issues.

Conclusion

Although it is difficult at this time to ascertain with precision how or if big data informed risk algorithms will supplement or supplant alternative forms of psychologically informed risk assessments, it is important to understand the fluidity of risk and how various forms of algorithmic governance are intersecting with risk in the management of ‘crime’. In assessing how actuarially based risk techniques affect governance, O’Malley (1992: 268) has argued that ‘the direction of development, the form in which they are put into effect in specific policies, their scope vis-a-vis that of other technologies, and the nature of their social impact are all quite plastic’. He also noted that ‘one of the emergent outcomes of new social technologies is opposition, formed in important ways by the form and anticipated impact of the technology itself. Such opposition may never turn back the clock’ (1992: 269). It is likely that big data technologies will function similarly to other forms of risk prediction, evaluation and prevention technologies, in that they will be fluid and used for diverse and sometimes contradictory ends, with both meaningful and problematic outcomes. Big data technologies will produce new ways of framing, predicting and managing risk that challenge, embrace and modify current forms of actuarial and non-actuarial risk prediction and management.

There is a need for criminological and socio-legal scholars to consider how this technology may affect governance, shift debates and produce new ways of thinking about expertise, crime and related social issues. Data-driven solutions may be developed and applied by diverse stakeholders. While data collection is advancing governmental security capacities, and is differentially rationalizing the continued and insidious governance of multiple marginalized populations, big data can also be used for other purposes. For example, individuals and groups are using big data to challenge governmental actions and reveal how risk and other forms of crime control and punishment have produced a variety of ‘data harms’, such as individual and systemic discrimination and the rationalized targeting of marginal and excluded groups.

Finally, the capacities of big data have empowered new actors and opened new spaces for identifying and contesting the problems of crime and responses to crime. These capacities have also made possible strong analyses that focus on the social, economic, and political antecedents to crime and the structural factors that shape regulation, policing and punishment. As boyd and Crawford (2012: 668) have astutely noted, big data analytics often enables ‘the practice of apophenia: seeing patterns where none actually exist, simply because enormous quantities of data can offer connections that radiate in all directions’. Big data allows us to merge data sets that were not previously amalgamated, and consequently opens new lines of inquiry. More research will be needed to explore the assemblages created by big data, their outcomes and any subsequent reshaping of the criminological and penal field, and of the social sciences more generally. Interpretation remains at the heart of the big data debate because data taken out of context, while capable of making powerful statements, can also lose meaning and value (boyd and Crawford, 2012).

Some scholars have suggested that big data analytics is a ‘game changer’ for criminology (Chan and Bennett Moses, 2016). Clearly, it complicates the traditional role of academics, who were once considered responsible for calling attention to how things are constituted. Some scholars have even claimed that reliance on big data analytics will render social science powerless (Mayer-Schönberger and Cukier, 2013). If so, and given the rapidity of big data tracking and analysis, academics may have a new role: to try to slow the speed at which data are collected and processed. Finally, there is a continued need for research that enhances the transparency of algorithms and data, and that carefully assesses the socio-political, legal and institutional contexts in which they are deployed.

Acknowledgements

The author would like to thank Linn Clark, Paula Maurutto, Kelly Struthers-Munford, Pat O’Malley and the anonymous reviewers for their careful readings and helpful advice.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Author biography

Kelly Hannah-Moffat is a professor in the Department of Criminology & Sociolegal Studies and Vice President, Human Resources & Equity, at the University of Toronto. She is the co-editor of the international journal Punishment and Society and has published several articles and books on risk, punishment, human rights and detention, parole, gender and diversity, specialized courts, and criminal justice decision-making.

References

Andrews and Bonta (1998) Psychology of Criminal Conduct. Cincinnati, OH: Andersen Publishing.

Angwin, Larson, Mattu, et al. (2016) Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, 23 May. Available at: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

Beer (2016) How should we do the history of big data? Big Data & Society 3: 205395171664613. https://doi.org/10.1177/2053951716646135.

Bennett and Segerberg (2012) The logic of connective action. Information, Communication & Society 15(5): 739–768.

Bennett, Segerberg and Walker (2014) Organization in the crowd: Peer production in large-scale networked protests. Information, Communication & Society 17: 232–260.

Berk (2008) How you can tell if the simulations in computational criminology are any good. Journal of Experimental Criminology 4: 289–308.

Berk (2012) Criminal Justice Forecasts of Risk: A Machine Learning Approach. New York: Springer-Verlag.

boyd and Crawford (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society 15: 662–679.

Brayne (2017) Big data surveillance: The case of policing. American Sociological Review 82: 977–1008.

Castells (2012) Networks of Outrage and Hope: Social Movements in the Internet Age. Cambridge: Polity.

Chan and Bennett Moses (2016) Is big data challenging criminology? Theoretical Criminology 20(1): 21–39.

Chouldechova (2017) Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. arXiv:1703.00056 [cs, stat].

Colarusso (2016) Uncovering big bias with big data. Lawyerist.com, 31 May. Available at: https://lawyerist.com/big-bias-big-data/.

Danaher, Hogan, Noone, et al. (2017) Algorithmic governance: Developing a research agenda through the power of collective intelligence. Big Data & Society 4: 205395171772655. https://doi.org/10.1177/2053951717726554.

Eckhouse (2017) Big data may be reinforcing racial bias in the criminal justice system. Washington Post, 10 February. Available at: https://www.washingtonpost.com/opinions/big-data-may-be-reinforcing-racial-bias-in-the-criminal-justice-system/2017/02/10/d63de518-ee3a-11e6-9973-c5efb7ccfb0d_story.html?utm_term=.8941ea2df048.

Eubanks (2018) Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. New York: St Martin’s Press.

Feeley and Simon (1992) The new penology: Notes on the emerging strategy of corrections and its implications. Criminology 30: 449–474.

Ferguson (2017) The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement. New York: New York University Press.

Gannon (2016) Editor’s introduction to the special issue. Psychology, Crime and Law 22: 1.

Gates (2011) Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance. New York: NYU Press.

Gitlin (2012) Occupy Nation: The Roots, the Spirit, and the Promise of Occupy Wall Street. New York: HarperCollins.

Grassley (2015) S.2123 - 114th Congress (2015–2016): Sentencing Reform and Corrections Act of 2015.

Hacking (1990) The Taming of Chance. Cambridge: Cambridge University Press.

Halupka (2016) The rise of information activism: How to bridge dualisms and reconceptualise political participation. Information, Communication & Society 19: 1487–1503.

Hannah-Moffat (2004a) Losing ground: Gendered knowledges, parole risk, and responsibility. Social Politics 11: 363–385.

Hannah-Moffat (2004b) V. Gendering risk at what cost: Negotiations of gender and risk in Canadian women’s prisons. Feminism & Psychology 14: 243–249.

Hannah-Moffat (2005) Criminogenic needs and the transformative risk subject: Hybridizations of risk/need in penality. Punishment & Society 7: 29–51.

Hannah-Moffat (2009) Gridlock or mutability: Reconsidering ‘gender’ and risk assessment. Criminology & Public Policy 8: 209–219.

Hannah-Moffat (2013) Actuarial sentencing: An ‘unsettled’ proposition. Justice Quarterly 30: 270–296.

Hannah-Moffat (2016a) Risk knowledge(s), crime and law. In: Burgess, Alemanno and Zinn (eds) Routledge Handbook of Risk Studies. Abingdon: Routledge, 241–251.

Hannah-Moffat (2016b) A conceptual kaleidoscope: Contemplating ‘dynamic structural risk’ and an uncoupling of risk from need. Psychology, Crime and Law 22: 33–46.

Harcourt (2007) Against Prediction. Chicago, IL: University of Chicago Press.

Hetey, Monin, Maitreyi, et al. (2016) Data for change: A statistical analysis of police stops, searches, handcuffings, and arrests in Oakland, Calif., 2013–2014. Stanford | Social Psychological Answers to Real-world Questions.

Kemshall (2003) Understanding Risk in Criminal Justice. Berkshire: McGraw-Hill Education (UK).

Kitchin (2014) Ethical, political, social and legal concerns. In: The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London: SAGE, 165–183.

Kitchin (2017) Thinking critically about and researching algorithms. Information, Communication & Society 20: 14–29.

Lazer, Pentland, Adamic, et al. (2009) Computational social science. Science 323: 721–723.

Lohr (2013) Sizing up big data, broadening beyond the internet. New York Times, 19 June, F1.

Lupton (2015) Digital Sociology. Abingdon: Routledge.

Lyon (2014) Surveillance, Snowden, and big data: Capacities, consequences, critique. Big Data & Society 1: 205395171454186. https://doi.org/10.1177/2053951714541861.

McDermott (2017) Conceptualising the right to data protection in an era of Big Data. Big Data & Society 4: 2053951716686994. https://doi.org/10.1177/2053951716686994.

McNulty (2014) Understanding big data: The seven V’s. Dataconomy, 22 May. Available at: http://dataconomy.com/2014/05/seven-vs-big-data/.

Mapping police violence (n.d.) Available at: https://mappingpoliceviolence.org/ (accessed 1 May 2017).

Markoff (2011) Armies of expensive lawyers, replaced by cheaper software. New York Times, 4 March. Available at: http://www.nytimes.com/2011/03/05/science/05legal.html.

Marr (2014) Big data: The 5 Vs everyone must know. LinkedIn Pulse. Available at: https://www.linkedin.com/pulse/20140306073407-64875646-big-data-the-5-vs-everyone-must-know (accessed 5 May 2017).

Maurutto and Hannah-Moffat (2006) Assembling risk and the restructuring of penal control. British Journal of Criminology 46: 438–454.

Mayer-Schönberger and Cukier (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston, MA: Houghton Mifflin Harcourt.

Metcalf and Crawford (2016) Where are human subjects in big data research? The emerging ethics divide. Big Data & Society 3: 205395171665021. https://doi.org/10.1177/2053951716650211.

O’Malley (1992) Risk, power and crime prevention. Economy and Society 21: 252–275.

O’Malley (2004) Risk, Uncertainty and Government. London: The GlassHouse Press.

O’Malley (2010) Crime and Risk. London: SAGE.

O’Neil and Schutt (2013) Doing Data Science: Straight Talk from the Frontline. Beijing: O’Reilly Media.

Police Union Contract Project (n.d.) Check police. Available at: http://www.checkthepolice.org/ (accessed 1 May 2017).

Police Use of Force Project (n.d.) Police use force project. Available at: http://useofforceproject.org/ (accessed 1 May 2017).

PredPol (2017) How PredPol works: Predictive policing. PredPol. Available at: http://www.predpol.com/how-predictive-policing-works/ (accessed 1 May 2017).

ProPublica (2017) COMPAS recidivism risk score data and analysis: ProPublica Data Store. Available at: https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis (accessed 1 May 2017).

ProPublica (n.d.) ProPublica. Available at: https://www.propublica.org/ (accessed 5 May 2017).

Ruppert (2012) The governmental topologies of database devices. Theory, Culture & Society 29: 116–136.

Skeem and Lowenkamp (2016) Risk, race, and recidivism: Predictive bias and disparate impact. Criminology 54(4): 680–712. Available at: https://doi.org/10.1111/1745-9125.12123.

Smith GJD and O’Malley (2017) Driving politics: Data-driven governance and resistance. British Journal of Criminology 57(2): 275–298.

Thomas (2014) Pandemics of the future: Disease surveillance in real time. Surveillance & Society 12: 287–300.

Tufekci (2014) Engineering the public: Big data, surveillance and computational politics. First Monday 19. https://doi.org/10.5210/fm.v19i7.4901.

Tynan (2016) Online behind bars: If internet access is a human right, should prisoners have it? Guardian, 3 October. Available at: https://www.theguardian.com/us-news/2016/oct/03/prison-internet-access-tablets-edovo-jpay.

Valverde (2014) Studying the governance of crime and security: Space, time and jurisdiction. Criminology & Criminal Justice 14: 379–391.

VirginiaCourtData.org (2017) Why it’s difficult to get court data in Virginia. Available at: VirginiaCourtData.org.

Williams and Burnap (2016) Cyberhate on social media in the aftermath of Woolwich: A case study in computational criminology and big data. British Journal of Criminology 56: 211–238.

Zavos (2015) Digital media and networks of Hindu activism in the UK. Culture and Religion 16: 17–34.