Abstract
How do algorithms shape the imaginary and practice of security? Does their proliferation point to a shift in the political rationality of security? If so, what is the nature and extent of that shift? This article argues that efforts to strengthen global health security are major drivers in the development and proliferation of new algorithmic security technologies. In response to a seeming epidemic of potentially lethal infectious disease outbreaks – including HIV/AIDS, Severe Acute Respiratory Syndrome (SARS), pandemic flu, Middle East Respiratory Syndrome (MERS), Ebola and Zika – governments and international organizations are now using several next-generation syndromic surveillance systems to rapidly detect new outbreaks globally. This article analyses the origins, design and function of three such internet-based surveillance systems: (1) the Program for Monitoring Emerging Diseases, (2) the Global Public Health Intelligence Network and (3) HealthMap. The article shows how each newly introduced system became progressively more reliant upon algorithms to mine an ever-growing volume of indirect data sources for the earliest signs of a possible new outbreak – gradually propelling algorithms into the heart of global outbreak detection. That turn to the algorithm marks a significant shift in the underlying problem, nature and role of knowledge in contemporary security policy.
Introduction
Rapid advancements in automated and digital interconnectivity during the late 20th and early 21st centuries have spawned new ways of imagining and practising security. In particular, a wide array of algorithm-driven technologies now seek to harness the data-based, predictive capacities imbued in these developments in order to inform governments about the probability of threats that are as yet unforeseen. Such algorithmic technologies represent new strategic instruments of security and preemption – especially through their seeming capacity to infinitely amass, aggregate and transcribe unintelligible mass datasets, thus rendering visible and intelligible future-facing knowledge for the rapid identification of insecurity. The increasing integration and centrality of such automated and algorithmic technologies within a broad spectrum of contemporary security practice has already been documented – for example, in relation to the proliferation of digital algorithms for human iris detection at airports and national borders (Daugman, 2004), in relation to automated identification capacities for enhanced surveillance within human biometrics and facial recognition programmes (Introna and Wood, 2004), and – more generally – within the context of homeland securitization in the ‘war on terror’ in the United States and elsewhere (Amoore, 2009). In fact, digital algorithmic processes appear increasingly ubiquitous within the security and surveillance strategies of the 21st century.
Efforts to strengthen global health security are a major, if frequently overlooked, driver for the development and international proliferation of such new algorithmic security technologies. The experience of a seeming epidemic of new epidemics over the past two decades – from HIV/AIDS, SARS and pandemic flu, through to MERS, Ebola and Zika – has begun to recontour perspectives of insecurity and accorded greater international political centrality to the threat that international public health emergencies can pose for populations, economies, states and infrastructures (Elbe, 2006, 2007; Lakoff, 2008, 2015; Samimian-Darash, 2011). Already, these repeated outbreak experiences have compelled many governments to make the protection of their populations and economies against the emergence of new, lethal infectious diseases an explicit part of national security strategies (Elbe, 2009, 2010, 2014). Policy discourses and practice within global health security now resonate with notions of ‘early detection’, ‘preemption’ and ‘response’.
Algorithms are central to these developments because they have enabled the creation of several next-generation syndromic surveillance systems now routinely used by governments and international organizations to rapidly detect new infectious disease outbreaks occurring around the world. In the United Kingdom, Public Health England (2015) defines syndromic surveillance as ‘the process of collecting, analysing and interpreting health-related data to provide an early warning of human or veterinary public health threats, which require public health action’. The precise definition and application of syndromic surveillance systems still vary considerably in practice and remain subject to scholarly debate – with some scholars even noting underlying ‘confusion about the terminology’ (Morse, 2012: 9). Indeed, Henning (2004: 1–2) observes, ‘specific definitions for syndromic surveillance are lacking and the name itself remains imprecise’ – with the term presently being used to cover a wide array of early warning and outbreak detection systems. Notwithstanding those competing definitions, Morse (2012: 9) rightly suggests that most definitions of syndromic surveillance ‘highlight the use of “non-diagnostic” data – that is, information on possible health events before, or without, definite laboratory confirmation’.
Syndromic surveillance thus marks a departure from more traditional forms of public health surveillance, which tended to rely upon the reporting of official scientific and statistical health information to guide responses to emergent health emergencies. By contrast, digital syndromic surveillance functions through the constant, omnipresent and (near) real-time monitoring, collection and reporting of a range of non-diagnostic (and often open-source) data to detect early signals of a new infectious disease outbreak. Rather than waiting for the older, and usually lengthier, process of direct clinical and laboratory confirmation of a new infectious disease outbreak, syndromic surveillance systems continuously monitor a wide range of more indirect data – such as reports from hospital emergency departments, hospital admissions, sales of medicines from pharmacies, telephone calls to health advice providers, levels of absenteeism at school and/or workplaces, etc. – for early indications that a new outbreak or even a bioterrorist attack may have occurred. 1 The main idea behind the use of such ‘proxy’ data is that if a novel infectious disease breaks out, then informal signals of unusual clusters of illnesses may surface before any official clinical and laboratory analysis can be undertaken. For example, people suddenly becoming ill may start to search for unusual symptoms on internet search engines, may begin to purchase over-the-counter remedies, or may decide to stay home from work or school. Through close and continuous monitoring of these early, preclinical signals, it may become possible to considerably speed up the process of outbreak detection – thus gaining vital time for preparing a government response.
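The logic of watching a proxy signal for unusual clusters can be illustrated with a minimal sketch. It assumes a simple moving-baseline rule (flag a day whose count exceeds the mean of the preceding days by several standard deviations); this rule, the function name, the parameters and the sales figures are all hypothetical illustrations, not the method of any system discussed in this article.

```python
from statistics import mean, stdev

def flag_unusual_days(daily_counts, baseline_days=7, threshold_sd=3.0):
    """Return indices of days whose count exceeds the baseline mean
    plus threshold_sd standard deviations of the preceding window."""
    flagged = []
    for day in range(baseline_days, len(daily_counts)):
        window = daily_counts[day - baseline_days:day]
        limit = mean(window) + threshold_sd * stdev(window)
        if daily_counts[day] > limit:
            flagged.append(day)
    return flagged

# Hypothetical daily sales of over-the-counter fever remedies at one pharmacy.
sales = [20, 22, 19, 21, 20, 23, 21, 22, 48, 51]
unusual = flag_unusual_days(sales)
```

In this invented series only the first day of the jump is flagged: once the spike enters the baseline window it inflates the threshold, which hints at why such rules need careful tuning before they can support early warning.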
Over the years, the type (and spectrum) of data that such systems can draw upon has gradually expanded. This has given rise to a new set of syndromic surveillance systems analysing a much wider variety of non-traditional data sources for such early signs of an outbreak (Morse, 2012: 9). In particular, the past two decades have witnessed the creation of several new syndromic surveillance systems harnessing the growing availability of open-source public health news and information that is now widely available and/or exchanged over the internet (Zhang et al., 2009: 509). The wager behind these internet- and news-based syndromic surveillance systems is that new cases of unusual symptoms, illness or death occurring anywhere around the world might well become the subject of local media reports, or online discussion, long before a government can officially confirm a new outbreak and report it to the relevant international health organizations. Such syndromic surveillance systems are now extensively relied upon by the World Health Organization (WHO) for its international outbreak detection activities (Wilson and Brownstein, 2009; Zhang et al., 2009: 509).
Yet processing the immense volume of indirect and mass datasets in near real time is a substantial challenge. That is why automated algorithmic technologies are increasingly relied upon to report and transcribe the potentially dangerous geographies of global health. Within this era of ‘Big Data’, in which continually generating and widely accessible open-source data streams provide ‘the ability … to harness information in novel ways to produce useful insights’ (Mayer-Schönberger and Cukier, 2013: 2), the algorithm has emerged as a creditable knowledge logic for managing the means through which information is produced, circulated and consumed in a complex informational society (Gillespie, 2013: 191). This article traces the origins, design and function of three novel online syndromic surveillance systems for strengthening global health security: (1) the Program for Monitoring Emerging Diseases, (2) the Global Public Health Intelligence Network and (3) HealthMap. The article shows how each new system has become progressively more reliant upon algorithms to continuously mine an ever-growing volume of indirect data sources for the earliest signs of a possible new outbreak – propelling algorithms into the heart of global outbreak detection. That turn towards the algorithm signals a significant shift in terms of how knowledge is problematized, the kind of knowledge that is produced and the underlying role that knowledge performs in contemporary security practices.
First steps: The Program for Monitoring Emerging Diseases (ProMed-mail)
Governments wishing to protect their populations and economies against new infectious disease outbreaks want to be alerted to a new outbreak as quickly as possible. Time is considered critical for acquiring a better understanding of the nature, epidemiology and likely spread of the new disease. It is also crucial for preparing government responses and – where necessary – readying healthcare systems for a surge in patients. However, governments have historically encountered at least two obstacles in achieving rapid international outbreak detection.
First, because new outbreaks can occur anywhere in the world, governments are effectively reliant upon the honesty (and capability) of other governments to rapidly and accurately report new infectious disease outbreaks to the rest of the international community where and when they occur. In the past, many governments have been loath to be candid and forthcoming about such new outbreaks, owing to fears about the possible impact of such news on trade, tourism and so forth. At the apex of the SARS epidemic’s emergence in southern China in 2002–03, for example, international public health experts had to confront the spectre of Chinese state sovereignty and significant government stonewalling. Yet international health organizations such as the WHO could not officially take action until China had come forward to formally notify them of the outbreak. Such ‘political’ challenges in attaining timely and verified outbreak data in the case of SARS highlighted significant discrepancies between the monitoring and reporting of infectious disease outbreaks, as well as an overreliance on nationally curated health information, within global health surveillance initiatives. The political sovereignty of other countries thus represents one pertinent obstacle to a more rapid and reliable system of global outbreak detection.
In addition to potential ‘political’ interference with rapid infectious disease outbreak reporting, a second challenge stems from the traditionally lengthy process of scientifically confirming and notifying the presence of a new infectious disease. The process of safely collecting samples of new microbes, conducting complex laboratory analyses and feeding the information through the relevant government channels can prove a lengthy one and be prone to bureaucratic delays and/or errors. In reflecting upon an outbreak of poliomyelitis in Guinea in 1967–68, for example, Weir and Mykhalovskiy (2010: 118) argue that within older governmental systems of health surveillance, ‘outbreak and epidemic control occurred in local time and place, were weakly articulated to both official and unofficial knowledge in the first weeks of onset, and were incorporated in national and global time after the fact’. In a global context where people (and therefore also microbes) could move rapidly across the world through the conduit of international airline infrastructures, but where the reporting and surveillance mechanisms of traditional health infrastructures generally functioned within the office hours of national disease institutes and scientific laboratories, interest emerged in developing faster detection systems that would move infectious disease outbreak surveillance closer to ‘real’ time temporalities (Lakoff, 2015).
One of the first attempts to develop such a novel outbreak detection system was the Program for Monitoring Emerging Diseases (ProMed-mail). Tracing its origins to the fledgling days of the World Wide Web and emerging online interconnectivity in 1994, ProMed-mail was designed as an effort by the Federation of American Scientists to streamline emerging practices of global health surveillance. The central objective of this first-of-its-kind online surveillance system was
to promote communication amongst the international infectious disease community, including scientists, physicians, epidemiologists, public health professionals and others interested in infectious diseases on a global scale … [and] to participate in discussions on infectious disease concerns, to respond to requests for information, and to collaborate together in outbreak investigations and prevention efforts. 2
In retrospect, ProMed-mail is now regarded as the original prototype of open-source syndromic surveillance systems for global health security (Zhang et al., 2009).
From its inception, ProMed-mail differed from previous systems of health surveillance in terms of both design and purpose. As Lawrence Madoff (2004: 227) points out, a founding principle of the new online health technology was that it would remain open to all global sources and not just official health data produced by government health authorities. The new system would function by harnessing the power of personal computers, email and the internet to connect subscribers working in different countries. Those subscribers could then use the networked platform to exchange reports and information about new infectious disease outbreaks that they became aware of – irrespective of where they were geographically located in the world. Critically, this networked platform meant that the ProMed-mail system could both draw upon official data and report other types of more ‘informal’ material submitted by its subscriber base. As it evolved, the system thus gathered information from a diversity of outlets, including official government health and departmental reports but also press briefings, bulletins from international organizations, as well as professional or personal observations and other open-source material of subscribed ProMed-mail members, media sources and even local rumours (Cowan et al., 2006: 1091).
Those incoming reports from an array of sources are subsequently analysed by expert subject-area moderators within the ProMed-mail team. Those experts possess wide-ranging thematic and scientific knowledge about bacterial, viral and parasitic diseases, as well as animal and zoonotic diseases, plant disease, epidemiology and entomology (Cowan et al., 2006: 1091). Through the use of human analytic expertise, as well as the media of the personal computer and the World Wide Web, the ProMed-mail system thus seeks to either verify or discredit submitted global health data and to scan the submitted material for accuracy and relevancy. Relevant incoming health reports are also further assessed by the ‘top’ moderator on duty within the system for additional review and consideration, and with a view to assigning reports an urgency rating. An estimated 30 daily disease notification reports are received from ProMed-mail subscribers, with an estimated 20 of these reports being forwarded to the top moderator on duty for further verification (Madoff, 2004: 229).
In the next step, moderators can then assign an urgency rating to the incoming reports. A red rating indicates a public health emergency of highest concern. In this event, reports are immediately uploaded onto the system’s public website, with further notification emails being sent to ProMed-mail subscribers on the system listserv. Reports listed as yellow may or may not be uploaded to the website, with discretion being left to the top moderator. Items with low or no urgency are labelled green and sent on for further review and finalization. In terms of its overarching time horizon, this system of outbreak news classification and dissemination thus enables infectious disease outbreak reporting to become a daily activity, with the ProMed-mail system publishing and uploading 7 daily reports, 365 days per year, on infectious disease trends.
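The colour-coded routing described above can be summarized in a short sketch. In ProMed-mail these decisions are made by human moderators, so the code only restates the rules as described; the action labels and the `moderator_approves` flag are hypothetical names introduced for illustration.

```python
from enum import Enum

class Urgency(Enum):
    RED = "red"        # public health emergency of highest concern
    YELLOW = "yellow"  # posting left to the top moderator's discretion
    GREEN = "green"    # low or no urgency

def route_report(urgency, moderator_approves=False):
    """Return the (hypothetically named) actions that follow a rating."""
    if urgency is Urgency.RED:
        # Red reports go straight to the website and the listserv.
        return ["post_to_website", "email_subscribers"]
    if urgency is Urgency.YELLOW:
        # Yellow reports may or may not be uploaded.
        return ["post_to_website"] if moderator_approves else ["not_posted"]
    # Green reports are sent on for further review and finalization.
    return ["further_review"]

red_actions = route_report(Urgency.RED)
yellow_actions = route_report(Urgency.YELLOW, moderator_approves=False)
```

The sketch makes visible that only the red pathway is fully determined by the rating; the yellow pathway still depends on a human judgement.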
ProMed-mail has been incredibly successful at attracting subscribers from around the world. The new system may have only counted 40 members at its launch in 1994; however, by 1997 interest in novel online open-source reporting had already witnessed the growth of the subscriber base to 11,000 members spread across 135 countries. As the initiative approached its tenth anniversary in 2004, ProMed-mail membership had nearly tripled to more than 32,000 individuals based in 150 countries. By 2014 – 20 years after the novel launch of ProMed-mail – the subscribership consisted of more than 60,000 members in at least 185 countries. 3 The extensive uptake of the new system is remarkable in and of itself, but it also further extends the system’s geographic coverage and penetration because the system allows members to submit information that they encounter locally. The more people around the world who use the system, the more information on infectious disease outbreaks can be garnered and exchanged.
With the benefit of hindsight, we can see that ProMed-mail marked a pivotal development in the history of infectious disease surveillance as a pioneering effort to harness emergent technologies for faster and more extensive surveillance of infectious disease outbreaks across the world (Weir and Mykhalovskiy, 2010: 88). In terms of its underlying temporalities, ProMed-mail was able to move closer to the goal of attaining real-time reporting by introducing a cycle of daily reporting. Politically, it was far less constrained by traditional demarcations of the state and national sovereignty because it was located online and within the emergent world of the World Wide Web. Epistemically, it utilized a greater range and volume of sources by combining both official and unofficial information sources. Geographically, it achieved ever-greater coverage as its membership began to emanate from more and more countries around the world. All of those aspects combined to make ProMed-mail quite a revolutionary system of surveillance capable of creating a regime of visibility around infectious diseases not previously achievable or available within official government programmes of health surveillance. And yet ProMed-mail also remained a surveillance system still extensively reliant upon the capacity of human analysts to analyse, translate and upload all this relevant reporting of potential infectious disease outbreaks. As innovative a system as it was, ProMed-mail did not yet make extensive use of algorithmic technologies.
Harnessing the algorithm: The Global Public Health Intelligence Network (GPHIN)
Only three years after the inception of ProMed-mail, another new syndromic surveillance system called the Global Public Health Intelligence Network (GPHIN) was developed in partnership between the WHO and Health Canada. Focusing explicitly on open-source media reports, the operational objective of GPHIN was ‘to determine the feasibility and effectiveness of using news media sources to continuously gather information about possible disease outbreaks worldwide, and to rapidly alert international bodies of such events’ (Keller et al., 2009: 690). GPHIN would essentially aim to constantly monitor online news and media reports from around the world (and in multiple languages) with the aim of detecting the earliest signs of a possible new infectious disease outbreak.
The idea for the GPHIN initiative can be traced back to the 1994 outbreak of pneumonic plague in India. On 23 September 1994, reports of human infections with pneumonic plague began to surface in Surat, a city located within the extremely densely populated state of Gujarat. The outbreak resulted, albeit briefly, in an unprecedented panic with serious international ramifications. The perceived rapid spread of the Surat plague in the following days provoked an outward mass flight of an estimated quarter of the 1.5 million city-dwellers from Surat across India. The Surat plague thus seemed to capture all the fears of how a potentially lethal infectious disease could suddenly break out and then rapidly spread to other geographic areas – posing a risk to an ever-growing number of people within and beyond the borders of India.
Surat was certainly not the first outbreak of plague, but it was nevertheless striking because the outbreak was exposed, tracked and widely reported upon by an emergent, 24-hour global news media (Dutt et al., 2006; Pallipparambil, 2014). The emergence of increased global interconnectivity and digital media at the end of the 20th century meant that media organizations such as the BBC and CNN would come to report upon the plague situation in Surat and would widely disseminate such unofficial health knowledge to their viewers (Weir and Mykhalovskiy, 2010). Crucially, those media reports appeared to come in more quickly than many of the traditional channels of infectious disease outbreak reporting (Pallipparambil, 2014). In many respects, the media effectively began to constitute an alternative, widely accessible – albeit more informal – disease surveillance network beyond the control of national jurisdictions.
For some, the experience of the Surat plague exposed the ways in which official systems of infectious disease outbreak reporting were clearly lagging behind the pace of events and media coverage. For others, however, such rapid media reporting also represented a critical opportunity to further speed up the process of international outbreak detection. If there could be an effective way of continuously monitoring such media reports in near real time, this could lead to much earlier warnings about new outbreaks than having to wait for the official diplomatic channels of disease notification to run their course. The aim of the new GPHIN system would thus be to analyse and report upon emergent and probable health data trends by processing mass collated data in the form of international media reports. The underlying data would be harnessed and filtered through novel information-aggregating technologies. Unlike the earlier ProMed-mail system, the operation of GPHIN thus relied heavily on media aggregation systems – notably two models called Factiva and Al Bawaba. Factiva is an online multilingual, media content service owned by Dow Jones & Company, whereas the Al Bawaba service amasses and makes available media reporting sources in Arabic and English. In employing such automated surveillance capacities, the GPHIN platform automatically scans these sources every 15 minutes for input data to be processed further by a combination of human analytics and automated and algorithmic programming (Mawudeku and Blench, 2006: 9).
Indeed, a crucial difference in GPHIN relates to the growing role that automated processes play in analysing the data streams. Corresponding to the rise in computing capacities and further refinement of the internet, formidable portions of the GPHIN platform became reliant upon automatic computing and digital operation processes to supplement human infectious disease observation, classification and reporting. In addition to the design and application of web-crawling programs to continuously extract relevant open-source health data from selected websites, GPHIN thus became the first online health syndromic surveillance model to integrate an information-retrieval algorithm to further observe, classify and filter aggregated incoming data sources. GPHIN had effectively begun to harness the power of the algorithm for speeding up the analysis of incoming reports in a way that ProMed-mail had not.
In fact, the turn towards algorithmic programming within GPHIN was the key to managing the growing volume of continually generating mass online datasets being fed into the system. Through automated computing facilities, the GPHIN operating platform effectively scans and ‘pulls’ pertinent global health media reports at recurrent 15-minute intervals, 24 hours a day, 7 days a week, from the news aggregator systems. It then locates and retrieves from this data specific words, expressions, phrases and syntax relevant to designated health categories within the GPHIN system – including human, animal and plant diseases, biological, chemical and radiological risks, and natural disasters (Keller et al., 2009: 690). It does all of this, moreover, with sources in a number of different languages. GPHIN even provides timely and routinized automated machine translation of curated English articles into Arabic, Chinese (simplified and traditional), Farsi, French, Portuguese, Russian and Spanish report formats.
The incorporation of such algorithmic technologies within the GPHIN platform meant that even the initial classification of the significance of a new report could now become automated. The information-retrieval algorithm, based upon predefined classification by taxonomy, thus automatically processes and assigns an initial ‘relevancy score’ by category. This score is ‘derived from the proprietary algorithm utilizing the values attributed to the keywords and terms within the taxonomies or taxonomy it has been assigned to’ (Blench, 2008: 300–302). Through this process of automated observation and calculation, incoming health data that receive a ‘high’ relevancy rating are automatically published onto the GPHIN database. Articles flagged as immediate concerns are forwarded via e-mail health reports to the GPHIN subscriber base (whose members access information on a fee basis) and users, while articles deemed redundant or irrelevant are ‘trashed’ and not considered for further risk analysis (Blench, 2008: 300–302).
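Because the GPHIN algorithm is proprietary, its details are not available; the following is only a minimal sketch of keyword-weighted relevancy scoring of the general kind described above. The taxonomy entries, weights, thresholds and the intermediate `analyst_review` outcome are all assumptions made for illustration.

```python
import re

# Hypothetical taxonomy: category -> {keyword: weight}.
TAXONOMY = {
    "human_disease": {"pneumonia": 3.0, "outbreak": 2.0, "fever": 1.0},
    "animal_disease": {"avian": 2.0, "livestock": 1.0, "outbreak": 2.0},
}
PUBLISH_THRESHOLD = 5.0  # assumed cut-off for a 'high' relevancy score
TRASH_THRESHOLD = 1.0    # assumed cut-off below which a report is discarded

def relevancy_scores(text):
    """Sum the weights of taxonomy keywords found in the text, per category."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        category: sum(weights.get(word, 0.0) for word in words)
        for category, weights in TAXONOMY.items()
    }

def triage(text):
    """Map the best category score to publish, trash or analyst review."""
    best = max(relevancy_scores(text).values())
    if best >= PUBLISH_THRESHOLD:
        return "publish"
    if best < TRASH_THRESHOLD:
        return "trash"
    return "analyst_review"  # assumed middle tier, not taken from the source

decision = triage("Atypical pneumonia outbreak reported in province")
```

Even this toy version shows where judgement enters the automation: the outcome depends entirely on which keywords are listed, how they are weighted and where the thresholds are set.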
Overall, GPHIN thus signals a move towards greater reliance on algorithmic technologies for the purposes of international outbreak detection. Its use of algorithms was critical in enabling GPHIN to achieve a speed and volume of information retrieval, processing and filtering not feasible through the use of human analytics alone. GPHIN thus became one of the first news-based syndromic surveillance systems to replace traditional human labour and analytics with algorithmic processing to construct a globalized view of infectious disease risk. The power of this new, algorithmically-based system would become evident during the SARS outbreak in late 2002 and early 2003, when GPHIN was able to identify signals within the online ‘noise’ of Chinese media sources pointing to the emergence of an unusual strain of flu an estimated 11 days after the first outbreak occurred in late 2002. Crucially, GPHIN’s detection preceded the Chinese Ministry of Health’s official notification protocol to the WHO (which only occurred on 7 February 2003), as well as the WHO’s first official report on the occurrence, which was released to the global public on 25 February 2003 and detailed the progression of an ‘atypical pneumonia outbreak’ (Blench, 2008: 300). GPHIN had powerfully demonstrated during the SARS outbreak how – through the careful design of new algorithms – mass sets of unofficial, open-source news data could be translated into actionable indicators of a possible emerging infectious disease threat.
Mapping and visualizing the outbreaks: The birth of HealthMap
The year 2006 witnessed the launch of a further automated, internet-based syndromic surveillance system called HealthMap. Whereas both ProMed-mail and GPHIN focused on generating and disseminating information about possible infectious disease outbreaks (albeit in very different ways), HealthMap wished to go one step further by visualizing and aggregating the data in a much more easily accessible and user-friendly way. HealthMap would do this through use of an online, digital and global mapping system – supplemented by a Google Maps plug-in – to neatly visualize the information flows and data streams that could otherwise be overwhelming to the user or that could even ‘obscure important elements of a disease outbreak’ (Keller et al., 2009: 691) in the era of continually generating mass unintelligible online data. The advent of these next-generation syndromic surveillance mechanisms like ProMed-mail and GPHIN (as well as other systems) meant that more and more information about emergent infectious diseases was becoming available – so much so that there was a risk that this mass of online raw data would become overwhelming and meaningless without the ability to transcribe and translate their signals accurately, reliably and accessibly (Rouvroy, 2015). HealthMap was launched in response to this new challenge of ‘taming the chaos’ and connecting the predictive dots within continually generating sets of mass health data.
The HealthMap system was originally designed as a multi-stream, real-time syndromic surveillance platform that monitors and continually aggregates electronic health data on new, ongoing and emerging infectious disease outbreaks (Nelson, 2008: 596). As Keller et al. (2009: 691) note, the system integrates online outbreak data from multiple electronic and digital sources, including online news wires (e.g. Google News), Really Simple Syndication (RSS) feeds, expert-curated accounts (e.g. ProMed-mail), multinational surveillance reports (e.g. from Eurosurveillance) and validated official alerts (e.g. from the WHO). Like ProMed-mail and GPHIN, HealthMap thus also includes an array of unofficial sources of data, ‘including online news aggregators, eyewitness reports, expert-curated discussions and validated official reports, to achieve a unified and comprehensive view of the current global state of infectious diseases and their effects on human and animal health’. 4 Yet in building upon the premise that such ‘raw’ electronic data sources were not yet well organized or integrated (Freifeld et al., 2008: 150), the HealthMap platform utilized automated web-crawling and web-scraping programming to scan the internet on a continual, 24-hours-a-day, 365-days-a-year basis, collecting up to 20,000 online sources per hour – all with the view of better organizing and displaying this data in a more readily accessible manner.
A further innovation of the HealthMap system thus pertains to what happens next. All of these relevant aggregated raw data are subsequently loaded onto the HealthMap operating platform, where incoming disease outbreak reports are converted into standard ‘alert’ formats containing the fields of ‘headline’, ‘date’, ‘description’ and ‘info text’. This information is then forwarded into an automated classification system that strips down and simplifies datasets in order to determine the precise geographic and epidemiological information associated with each alert. Text-processing algorithms further juxtapose and link incoming data alongside an integrated in-system dictionary, which contains approximately 1,800 disease conditions along with 5,000 global geographic patterns that include countries, capital and major cities, regions, etc. The incoming text data are further stripped of any non-alphanumeric content and are then automatically matched alongside the HealthMap dictionary for correlation and classification between the simplified datasets and the listed geographic locations and disease conditions automatically generated by the HealthMap system.
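The dictionary-matching step can be sketched as follows. This is a simplified illustration under stated assumptions, not HealthMap's actual code: the two dictionaries are tiny hypothetical stand-ins for the roughly 1,800 disease terms and 5,000 place names, the alert is invented, and non-alphanumeric characters are replaced with spaces (rather than deleted) so that word boundaries survive.

```python
import re

# Hypothetical miniature dictionaries.
DISEASES = {"ebola", "cholera", "avian influenza"}
LOCATIONS = {"guinea", "conakry", "india", "surat"}

def normalise(text):
    """Lower-case the text and replace each run of characters outside
    a-z and 0-9 with a single space (ASCII only in this sketch)."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def match_terms(text, dictionary):
    """Return the dictionary terms that occur as whole words or phrases."""
    padded = f" {normalise(text)} "
    return sorted(term for term in dictionary if f" {term} " in padded)

def classify_alert(alert):
    """Attach matched diseases and locations to a standard-format alert."""
    text = f"{alert['headline']} {alert['description']}"
    return {
        "diseases": match_terms(text, DISEASES),
        "locations": match_terms(text, LOCATIONS),
    }

# Invented alert using the four fields named in the text.
alert = {
    "headline": "Suspected cholera cases in Surat, India!",
    "date": "(hypothetical)",
    "description": "Clinics report patients with severe dehydration.",
    "info text": "",
}
result = classify_alert(alert)
```

Exact matching of this kind is fast but brittle: a misspelling, a local place name or a disease absent from the dictionary would pass through unclassified.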
Following this initial classification and geocoding performed through the use of algorithms, a further automated process within HealthMap then collects and tallies the number of alerts, feeds and updates obtained online about the specific occurrence of an outbreak of infectious disease within a single geographic location. The system also uses a programmed sorting algorithm in order to determine the ‘heat’ or relevance based upon a 1–10 integer scale for each disease in each location. Finally, this classified, translated and ranked global health reporting is then overlaid on an interactive (online) map for user-friendly access to the original report, enabling the easy viewing of the present, real-time state of global infectious disease outbreaks occurring anywhere around the globe (Keller et al., 2009: 692). Like GPHIN before it, HealthMap thus also replaces the previous surveillant procedures of human observation with the more automated, continually generating capacities of digital technology and algorithmic logics. Indeed, the core objective of HealthMap was to ‘allow for greater possibilities in extracting structure algorithmically from a variety of disparate data sources’ (Freifeld et al., 2008: 151).
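The tallying and ranking step can be sketched in the same hedged way. The article states only that alerts are counted per disease and location and that relevance is expressed on a 1–10 integer scale; the linear scaling rule in `heat_score` below, like the function names and the sample records, is a hypothetical stand-in rather than the rule HealthMap uses.

```python
from collections import Counter


def tally_alerts(classified_alerts):
    """Count alerts for each (disease, place) pair."""
    counts = Counter()
    for alert in classified_alerts:
        for disease in alert["diseases"]:
            for place in alert["places"]:
                counts[(disease, place)] += 1
    return counts


def heat_score(count, max_count):
    """Map a count onto a 1-10 integer scale (assumed linear rule, for illustration)."""
    if max_count <= 0:
        return 1
    return max(1, min(10, 1 + (9 * count) // max_count))


def rank(classified_alerts):
    """Return (pair, count, heat) rows, hottest first, ties broken alphabetically."""
    counts = tally_alerts(classified_alerts)
    top = max(counts.values(), default=0)
    scored = [(pair, n, heat_score(n, top)) for pair, n in counts.items()]
    return sorted(scored, key=lambda row: (-row[2], row[0]))


# Invented records standing in for the output of the classification step.
sample = [{"diseases": ["Ebola"], "places": ["Guinea"]}] * 4
sample.append({"diseases": ["Cholera"], "places": ["Haiti"]})
for row in rank(sample):
    print(row)
```

A real system would presumably also weight source reliability and recency; those factors are omitted here because the article does not describe them.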
HealthMap’s ability to do so successfully was powerfully demonstrated when it detected signs of a strange fever occurring in Macenta, Guinea on 14 March 2014. That crucial signal was detected one week before the Guinean Ministry of Health officially confirmed an outbreak of the Ebola virus that would go on to become the largest outbreak of Ebola in human history. The Ebola outbreak in West Africa showed how HealthMap’s largely automated surveillance platform, powered by innovative algorithmic technologies, was able to capture and identify specific signals, signs or aberrations of a new outbreak within a massive dataset. It also showed how the use of carefully designed algorithms could help reconcile the double-pronged problematique of Big Data – which is both potentially telling and yet seemingly infinite and unintelligible at the same time.
More broadly, the launch of the HealthMap system represented yet another step along the continuum towards real-time identification of potential or emergent infectious disease outbreaks. ProMed-mail had sought to create an online, real-time forum for the exchange of open-source health information, but the system remained contingent upon the temporalities of the early and emergent World Wide Web. GPHIN had subsequently sought to expand the dimensions of near real-time temporality through the utilization of then cutting-edge innovations in web-crawling programming, retrieval algorithms and simultaneous computer-translating capacities. HealthMap, in turn, marked the first online syndromic surveillance system of the Web 2.0 era and represented a further step in the move towards capturing the ‘real time’ in health surveillance reporting. It used algorithmic technologies to further accelerate real-time temporalities of health surveillance through the online visualization, chronological mapping and open-access presentations of geographies of risk derived from data. Functioning largely through automated computing, supported to an extent by human curation to correct misclassifications and examine geographic coverage (Freifeld et al., 2008: 151), and fully supplemented by algorithmic programming for precise information classification and geocoding, the HealthMap project reflects the latest trajectory in the growing integration of algorithmic programming into processes for global outbreak detection.
Syndromic surveillance and algorithmic governmentality
The evolutionary arc of these novel syndromic surveillance systems over the past two decades has moved algorithms to the core of global outbreak detection. As a pioneering online forum of infectious disease information exchange, ProMed-mail represented a novel response to the problem of outbreak detection utilizing the emerging power of the personal computer and the internet, but it still fell short of harnessing the power of the automated algorithm. GPHIN, with its subsequent integration of information-retrieval algorithms, signalled the transition to a more specifically algorithmic tactic of surveillance by automatically processing a vast volume of news data becoming openly available through online media outlets. HealthMap, in yet a further transformation, marked the rise of wholesale implementation of automated computing processes and algorithmic programming within a single operating platform. Such growing reliance on algorithms signals a significant shift in terms of how knowledge is problematized, the kinds of knowledge that are produced and also the underlying role of knowledge in contemporary security practices.
The problem of knowledge: From scarcity to excess
The prominent role that algorithmic technologies now play in these syndromic surveillance systems points to a pivotal change in the underlying problem of knowledge. Traditional systems of health surveillance were moored within national health institutes and scientific laboratories with specialist infectious disease knowledge – frequently taking the form of population records, health and laboratory reports, and so forth. These products of health knowledge tended to be time-consuming and costly to produce, as well as being almost entirely reliant on human observation and analytics for curation. However, they were essentially geared towards creating new knowledge for the art of governing the health of populations in a context where there was previously very little or even no such knowledge available.
The rapid rise of the algorithm within syndromic surveillance systems, by contrast, responds not so much to a problem of information scarcity, but – on the contrary – to an underlying problem of data excess. The core challenge for syndromic surveillance is how to process an ever-increasing volume of more indirect and informal information indicating that an outbreak may have occurred. ProMed-mail became the first syndromic surveillance system to seek to address this changing problematization of knowledge through its novel use of a transnational, digital platform for information exchange. The system recognized that there had been a proliferation of multiple new data sources, including many informal ones that could be better harnessed for the purposes of rapid outbreak detection. ProMed-mail was able to successfully connect these different data sources (principally via electronic mail and the internet), but the scale of the information flow was such that it could still be managed largely through human analysis. The turn towards monitoring the much vaster information ecology of online news media in the search for possible outbreaks through the GPHIN system, however, posed an additional challenge. There were so many media reports being constantly generated that they could no longer be processed in a speedy and cost-effective way through the use of human analytics alone – necessitating the introduction of automated sorting and retrieval algorithms. The subsequent co-existence of multiple such surveillance systems, together with a proliferation of other data sources, would see a further intensification in the recourse to algorithms through the launch of the HealthMap system, which was largely sustained and regulated by algorithmic processing and automated computing. 
The turn to algorithms in these surveillance systems thus represents an attempt to tame the ‘chaos’ of excessive and constantly generating data flows that are impossible to process through human analytics alone. In the move from traditional public health surveillance to syndromic surveillance, the underlying problem of knowledge has, in many ways, become exactly reversed.
Proxy data – or ‘knowledge without truth’
The extensive reliance on algorithms also generates a new type of knowledge for governing global health security. Much of the knowledge produced by these next-generation syndromic surveillance systems is, strictly speaking, more of a ‘surface’ knowledge of indirect signs and proxy signals. In his lectures on governmentality, Foucault famously discussed the rise of statistics as a new form of knowledge for governing populations. The lectures even illustrated the importance of such statistical knowledge specifically in relation to the problem of governing infectious diseases – focusing on the example of smallpox and inoculation practices of 18th-century Europe. In that case, statistical science – applied to the living, biological world – sought to render visible and intelligible ‘how many people are infected with smallpox, at what age, with what effects, with what mortality rate’ (Foucault, 2007: 10).
The new surveillance systems explored above, however, have used a combination of algorithms and open-source data to generate a different kind of knowledge. They are part of a knowledge regime of electronic and digitized signs, codes and signals. They produce, in the words of Antoinette Rouvroy, a kind of ‘knowledge without truth’. By this, she means first, that knowledge is no longer produced – as within previous truth regimes – through the collection, quantification and utilization of human logic, but by ‘building upon the factual availability of enormous amounts of raw digital data’; and second, that knowledge generated by algorithmic processes increasingly escapes the previous systems of verification, including the kinds of trials, tests, examinations and experiments that were viewed as essential to ‘attest to the robustness, truth, validity and legitimacy of claims and hypothesis about reality’, within previous governmental regimes of truth (Rouvroy, 2013: 151).
The emergence of this new kind of ‘surface’ knowledge in the area of global health security became evident in 1994 when the new ProMed-mail system began to depart from previous health surveillance systems by also processing and curating non-official, non-clinical open-source information. The system thus marked a considerable ‘loosening’ of the inclusion criteria for suitable outbreak data that could be considered. GPHIN subsequently utilized the emergent World Wide Web as a forum in which to communicate and disseminate an unofficial and largely digitized form of ‘surface’ knowledge gained from online media sources. The launch of HealthMap in 2006 represented yet a further step in this direction through the introduction of a largely automated surveillance platform. Indeed, HealthMap deployed algorithmic processing to determine correlations between captured raw data and to visualize a truly networked, digitized view of present global health trends informed entirely by patterns and signals within datasets, in contrast to the official statistical verification and scientific confirmation of previous systems of knowledge. Overall, these novel syndromic surveillance systems are premised less on transcribing actual cases of illness and/or outbreaks than on intercepting unmediated data flows – always seeking to harness data points and correlations towards greater intelligibility and visibility of the probable or potential. In the end, however, all the indirect and non-clinical data utilized by these syndromic surveillance systems to generate early warnings may – or may not – be indicative of an actual disease outbreak.
From the normal to the exceptional
The rise of the algorithm in these syndromic surveillance systems also signals a shift in the underlying role that knowledge performs in contemporary security practices. In presenting the epistemic transformation from a disciplinary economy of power towards that of a governmental economy of power in his Security, Territory, Population lecture series, Foucault (2007: 63) sought to make distinct the particular ways in which populations are ‘normalized’ through the identification and modification of risk groups. Unlike in a disciplinary economy of power, whereby ‘an ideal norm is imposed from the outside, and the distinction between the normal and the abnormal is undertaken according to this ‘artificial’ norm’ (Foucault, 2007: 63), in this emergent economy of security, which is increasingly powered by algorithmic logic and foresight, there is a shift away from processes of normalization (originally discussed by Foucault) towards the detection of the exceptional. Normalization – as described within Foucault’s understanding of governmentality – constituted ‘the plotting of the normal and the abnormal, of different curves of normality, and the operation of normalization consists in establishing an interplay between these different distributions of normality’ (Foucault, 2007: 63). Fearnley (2008) has shown that such processes of normalization were central within previous governmental strategies of public health, especially through the establishment of regulatory mechanisms aimed at the development of an average or an equilibrium within the overarching governance and optimization of population health. However, those past systems of public health and the governance of risk were also frequently limited by costs, external conditions and, most significantly, limitations within existent possibilities of knowledge (Fearnley, 2008: 1617).
The syndromic surveillance systems analysed here depart from the centrality of the norm and from the optimal statistical distribution for governing the health of populations. Rather than seeking to plot the normal versus the abnormal to determine the norm through interplay of facts, statistics and figures, algorithmically driven syndromic surveillance systems constantly survey in order to detect, determine and report upon that which constitutes the exception within the governance of infectious diseases. This divergence from traditional public health surveillance was already evident in the rationale for designing the Program for Monitoring Emerging Diseases (ProMed-mail), which – from its inception – sought to locate and report upon emergent infectious disease outbreaks, as opposed to determining averages and relative norms of existing disease trends within populations. GPHIN, too, was orientated not towards measuring and assessing the global burden of infectious disease among populations, but specifically towards capturing the emergence of new infectious disease outbreaks. The same holds true for the algorithmic processes that capture, match and upload online mass datasets to HealthMap. HealthMap, too, seeks to visualize aberrations not averages, as illustrated by the syndromic surveillance system’s initial detection of a ‘mysterious fever’ occurring in Macenta, Guinea on 14 March 2014.
Collectively, these syndromic surveillance systems are thus beginning to recontour understandings of the norm and the average through what Rouvroy (2015) highlights as the non-selectivity and exhaustive nature of ‘Big Data’ and algorithmic processing. Where the cultivation of the norm within governmentality involved the plotting of statistical points and seeking to bring what was considered abnormal into the fold of the ‘norm’, the idea of the average within algorithmic governmentality disappears completely with the capacity for automated computing and algorithmic logic to effectively capture and present all relevant points of data, no matter how distant. Unlike the previous generation of statistical knowledge, whereby points or data that deviated too far from a central or common finding were disregarded, the infinite scope and capacity of algorithmic programming means that even the most isolated or singular of points can now be taken into greater analysis, specification and account (Rouvroy, 2015). In the end, the rise of the algorithm within the context of these new syndromic surveillance systems signals a change not only in the nature and problem of knowledge within contemporary security practices, but also in its underlying role and purpose.
Conclusion
The three syndromic surveillance systems analysed in this article differ considerably. Yet, viewed collectively, they also show that algorithmic programming has become progressively more central to the practice of global health security. As the earliest of the three outbreak detection systems, ProMed-mail was the only system not to explicitly utilize algorithmic programming. Nonetheless, it represented an early attempt to harness and curate the knowledge available from freely available and exchangeable, open-source health data – thus marking a significant deviation from the more traditional utilization of statistical practices and official health knowledge in health surveillance. ProMed-mail may have still relied on human analytics rather than algorithmic programming, but it laid the operational groundwork from which a more advanced integration of algorithmic technologies would soon become possible.
Such integration of algorithmic programming into syndromic surveillance was subsequently achieved through the development of the Global Public Health Intelligence Network. As one of the first digitally interconnected, semi-automated health surveillance systems, GPHIN sought to integrate leading machine translation and automated/digital processes to develop an online health system with a ‘real-time’, 24-hours-a-day, 7-days-a-week surveillance capacity (Mawudeku and Blench, 2006: 7–11; Weir and Mykhalovskiy, 2010: 84–85). It explicitly introduced an information-sorting-and-retrieval algorithm in response to the growing volume and complexity of information available on the internet during the early days of GPHIN. Here, the launch of the GPHIN initiative also coincided historically with the turn of the millennium – a time during which personal-computer usage, consumption of online information and expanding access to the World Wide Web increased considerably.
HealthMap signalled yet a further step towards a fully automated surveillance platform. Sustained and informed even more extensively by algorithmic programming, HealthMap requires only minimal human analytics for precise geocoding, infectious disease classification, outbreak alert report curation and interactive visualization of ongoing infectious disease trends globally. Indeed, of the three systems to emerge between 1994 and 2006, HealthMap represents the most extensive and sustained effort yet to harness the power of the algorithm for the purposes of strengthening global health security. The iterative evolution of these new syndromic surveillance systems – from ProMed-mail, via GPHIN, through to HealthMap – has thus pushed algorithmic technologies to the centre of international outbreak detection.
The turn to algorithms for strengthening global health security signals a considerable shift in the underlying problem, nature and role of knowledge within contemporary security practices. The initial introduction of the algorithm responds to an underlying problem of knowledge no longer characterized by a scarcity of data relating to probable outbreaks of infectious disease, but rather by a system of constantly expanding and generating mass datasets that may contain patterns, associations and correlations indicative of as yet unforeseen public health emergencies. Indeed, the very development of these new syndromic surveillance systems is predicated upon the conviction that existent forms of knowledge generation within health security are insufficient for addressing contemporary concerns regarding the threat of globalized pandemic and infectious disease outbreaks (Lakoff, 2015). Through the introduction of algorithmic technologies, moreover, syndromic surveillance systems then generate a new and quite different kind of knowledge – more of a ‘surface’ knowledge of indirect digital signs and signals that may enable faster responses and earlier interventions. Nor is the function of this new knowledge even still that of normalization as described in Foucault’s influential account of governmentality. Whereas normalization sought to establish a particular norm through ‘the plotting of the normal and the abnormal, of different curves of normality, and the operation of normalization consists in establishing an interplay between these different distributions of normality’ (Foucault, 2007: 63), syndromic surveillance systems move towards the detection of the exceptional or the aberrant within mass datasets.
Do all of these changes also point to a deeper transformation in the political rationality of security? On the one hand, all three systems considered here still remain continuous with the central aspects of governmentality described by Michel Foucault. Foucault famously referred to governmentality as ‘the ensemble formed by institutions, procedures, analyses and reflections, calculations, and tactics that allow the exercise of this very specific, albeit very complex, power that has the population as its target, political economy as its major form of knowledge, and apparatuses of security as its essential technical instrument’ (Foucault, 2007: 108; see also Collier and Lakoff, 2015: 26). The surveillance systems analysed in this article remain continuous with that notion of governmentality in that they too are all geared towards securing the welfare of populations – in this case against the death and illnesses that could be caused by new infectious diseases (Elbe, 2009; Elbe et al., 2014). It would thus be premature to speak of a complete transformation, or even break, in the underlying rationality of security.
At the same time, we have seen that the close integration of algorithmic technologies into processes of global outbreak detection does signal a distinct shift or inflection within Foucault’s influential understanding of governmentality – especially in terms of how security knowledge is today problematized and understood, as well as the wider role it performs. Antoinette Rouvroy (2011) has usefully coined the term ‘algorithmic governmentality’ to capture the emergence of a new governmental rationality made possible by the growing complexity of information technology – one that ‘utilises algorithmic processes of data-aggregation, translation and prediction’ and that is ‘chiefly oriented towards the neutralization of potentiality’. 5 The syndromic surveillance systems analysed here constitute evidence that such a specifically algorithmic governmentality is also beginning to manifest itself more fully in the area of global health security. More than that, they show that attempts to strengthen global health security – with its dual concern about the threat posed by pandemics and bioterrorist attacks – have actually been a key driver behind the international development, enhancement and proliferation of new algorithmic security systems.
As with debates about the utility of ‘Big Data’ more generally, pertinent questions still remain about how effective, accurate and reliable these new syndromic surveillance systems are overall. This issue emerges particularly clearly in relation to Google Flu Trends. This system sought to improve early detection through the monitoring of ‘health-seeking behaviour in the form of queries to online search engines, which are submitted by millions of users around the world each day’ (Ginsberg et al., 2009: 1012). The idea behind the system was that as people began to suffer from the flu, they would be likely to search the internet for information on flu symptoms and remedies. It might thus be possible to trace the patterns of such search activities, and to aggregate this data into ‘historical logs of online web search queries’, in order to predict the onset and scale of the flu outbreak (Ginsberg et al., 2009: 1012). Initially, the system attracted a lot of positive attention and acclaim after reports that it was considerably quicker than the US Centers for Disease Control and Prevention in identifying the onset and scale of seasonal flu in the United States (Ginsberg et al., 2009). However, it later emerged that the system was not nearly as powerful or accurate as initially thought, when in 2012 the system ‘identified a sudden surge of flu cases but overstated the amount’ (Mayer-Schönberger and Cukier, 2013: 2–3). More recently, there has been prominent coverage of the ‘failure’ of Google Flu Trends to accurately predict oncoming seasonal influenza arcs (Butler, 2013; Lazer et al., 2014). The uneven experience with Google Flu Trends thus serves as a pertinent reminder that the accuracy and effectiveness of these new surveillance systems remain an open question.
In the end, however, it is not just the effectiveness of such systems that matters. Beyond their accuracy, the very existence and introduction of such new systems can also have wider political ramifications. In the case of global health security, there is evidence that the introduction of new syndromic surveillance systems is already changing the way states are identifying outbreaks and exercising their sovereignty. The revised International Health Regulations (WHO, 2005), for example, officially authorized the utilization of such unofficial health data generated by these surveillance models to further inform and guide responses to emergent and probable public health emergencies. Indeed, as early as 2001, Health Canada and the WHO entered into an agreement that GPHIN would supply the WHO with monitoring data while the WHO would engage in verification processes with its member-state contacts, officially consolidating the GPHIN surveillance system within the operational frameworks of the WHO’s Global Outbreak Alert and Response Network (Heymann and Rodier, 2001, 2004). In that sense, syndromic surveillance systems have formed a significant part of the global outbreak surveillance landscape for more than a decade now.
The existence of these new systems will likely also bear on the deliberations of governments regarding their decisions about whether and how quickly to report new infectious outbreaks to the rest of the international community. David Fidler (2004: 144) argues that such ‘unofficial health information and global alerts now act as a global health governance pincer that squeezes the state’s sovereign decision of whether to report outbreak information and to cooperate with the WHO and other countries’. Governments considering whether or not to rapidly declare a new outbreak to the international community will now need to weigh up the chances that others will become aware of the outbreak through such syndromic surveillance systems, or that they may even be aware of an outbreak already. Although it is difficult to determine the precise motivations and actions of governments, evidence is emerging that the introduction of such syndromic surveillance systems appears to be having an effect. For example, a quantitative assessment of the timeliness of infectious disease outbreak reporting (Chan et al., 2010) points to a shortening of the interval between detection and public communication of outbreaks since the introduction of these new systems. However, the progressive integration of algorithms into global outbreak detection is already giving rise to a more nuanced and diversified global health surveillance regime in which, as Weir and Mykhalovskiy (2010: 104) argue, ‘sovereign states did not disappear at the turn of the 21st century, but they are now reactive to the presence of a transnationalized system of outbreak communication that they themselves have politically and legally authorized’.
Footnotes
Funding
The research leading to this article has received funding through the Chancellor’s International Research Scholarship (2013–2016) from the University of Sussex, UK.
