Mapping the Cyberthreat Landscape in Healthcare Using GDELT: A Multimethod Approach

Abstract

Cyberattacks that target critical national infrastructure, such as hospitals, pose a significant threat to the safety and wellbeing of individuals, as evidenced by incidents like the WannaCry worldwide ransomware attack. To better understand vulnerabilities within the healthcare sector and develop preventive measures, it is crucial to examine the evolving nature of cyberthreats and the types of attacks occurring. In this article, we describe a multimethod approach comprising social network analysis, natural language processing, and machine learning, using data from GDELT (Global Database of Events, Language, and Tone), to identify the prevalence of attacks on hospitals while considering the type of attack and its date. Through this approach, meaningful patterns in the evolution of cyberattacks are revealed by analyzing the relationships between emerging cyberattacks mentioned in news reports. Findings show that the number of attacks from 2017 to 2023 increased substantially, with hospitals being more prone to critical attacks such as cyberterrorism/state actor-sponsored criminal activities, advanced persistent threats, and distributed denial of service. Mapping real-time data from diverse sources using a multimethod approach, such as the framework proposed in this article, can lead to a better understanding of the threat landscape. This is a crucial step in determining necessary cyberdefenses and informing the development of policy interventions to ensure the cybersecurity of critical national infrastructure.

Introduction

Critical national infrastructure (CNI) refers to the pivotal systems in a country that enable the effective functioning of society, including systems that provide electricity, healthcare services, water, or telecommunication services.¹ Historically, the focus has been largely on physical CNI assets and their protection; however, this view has changed with the digital transformation and increasing dependence on digital infrastructure. These systems are evolving at a rapid pace, and hence, a holistic view of the threat landscape is required to manage and reduce the risks associated with CNI.² The healthcare sector in particular has been targeted by attackers. Since the WannaCry worldwide ransomware attack in 2017, there has been a huge increase in the number of cyberattacks on hospitals. Indeed, the European Union Agency for Cybersecurity (ENISA) has suggested that nearly 53% of the cyberattacks in the European Union targeted healthcare organizations.³

Hospitals are complex organizations⁴ that have a unique mission and operate in a context where allocation of resources is crucial to the delivery of clinical care, clinical research, and education. In this context, mitigating cybersecurity vulnerabilities represents a challenge for hospitals that do not have adequate resources, such as budgets,⁵ technical know-how,⁶ and consequently, an understanding of the threats,⁶ thereby causing hospitals to lag behind other industries.⁷ The impact of cyberattacks on hospitals is profound, and its ramifications can go beyond monetary or reputational damage. These attacks also harm patient outcomes, as evidenced by the 2022 attack on the UK National Health Service (NHS) 111, a free service for urgent healthcare and advice, which was rendered inoperable, preventing thousands from receiving care during the pandemic.⁸ In another case, a ransomware attack in Germany delayed a patient’s medical care and led to the patient’s death.⁹

To effectively counter cyberthreats and prepare organizational and industrywide strategies, up-to-date empirical data are required to map the types of attacks that are prevalent for specific sectors.¹⁰ Making informed decisions about cybersecurity requires collecting accurate data from various sources, as well as aggregating and enriching the information presented. Such decisions are typically based on organizational-level data using conventional threat signals received through intrusion detection systems for comparing traffic patterns with baseline data. These data do not usually include unconventional signals through media sources and news outlets that are not linked specifically to known vulnerabilities in a targeted organization. While not all unconventional signals are useful or effective, and collating such data can be challenging and expensive, incorporating such data is crucial for advancing scientific understanding and mapping the threat landscape. As such, several researchers argue that dependable data and trustworthy data sources are lacking, especially those that show the evolution of threats mapping both regional and industry-specific contexts.^11-13

To bridge this gap, we developed and implemented a multimethod framework to explore the trends of cyberattacks on hospitals worldwide from 2017 to 2023, with a particular focus on identifying patterns and relationships between different types of attacks and hospitals. We used unconventional data sources obtained through the Global Database of Events, Language, and Tone (GDELT). GDELT is an open platform from which organizations, news, locations, themes, people, quotes, or images can be extracted and identified. GDELT has been used by several researchers to map and predict events such as social unrest,¹⁴ future violence levels in specific regions,¹⁵ and stock market prices.¹⁶ Our study used social network analysis (SNA), machine learning (ML), and natural language processing (NLP) methods to collect real-time data from several sources, obtain data on past cyberattacks, classify these attacks based on their severity, and analyze the evolving threat landscape.

Our research contributes to the health cybersecurity research domain in 2 distinct yet interconnected ways. First, it provides a robust methodological framework that enables enhanced threat detection and response through real-time data collection and pattern identification. Second, the framework enables practitioners and organizations to make better risk assessments and data-driven decisions, helping to overcome uncertainties in predicting cyberattack trends and be more proactive in reducing failure rates.¹⁷ Healthcare organizations are thus able to identify potential vulnerabilities, anticipate threats, and implement proactive measures while effectively prioritizing their resources. This systematic approach to understanding the threat landscape is fundamental for safeguarding critical infrastructure and ensuring the continuum of care for patients.

Cyberattacks on Hospitals

Cyberattacks involving hospitals have become an alarming global issue, with a growing number of incidents resulting in significant consequences for healthcare institutions, their patients, and staff.¹⁸ Protecting sensitive data has become especially concerning considering the emergence of new technologies such as robotics, artificial intelligence (AI), and wearable devices that introduce risks related to data privacy and security.¹⁹

The integration of new technologies with legacy infrastructure creates a complex ecosystem vulnerable to attacks.^20-22 These attacks can be classified based on their severity as outlined in the Cyber Incident Response Plan (CIRP) framework.²³ A CIRP outlines how an organization can effectively respond to cybersecurity incidents and is written to ensure that the incidents cause minimal damage to the organization. Creating a CIRP that clearly defines processes, roles, and responsibilities will enable organizations to understand communication plans and standard operating procedures, identify the source of the incident, and ensure that steps are taken to contain the threat and isolate the enterprise from the attacker.

The CIRP classifies attacks based on the impact to the organization, with 4 levels of attacks.²⁴ Critical (Level 4) incidents are defined as those with catastrophic impacts that can pose a threat to the life of individuals (patients in our case), and can cause substantial destruction to hospital capabilities and their IT applications or systems with major reputational and financial losses.²³ High severity (Level 3) incidents are defined as those where the impacts are “substantial to the proper conduct of” hospitals and can cause “impactful destruction” to the hospital capabilities and their IT applications or systems with a substantial reputational and financial loss. In our context, critical and high severity incidents are primary infiltration attacks and include those where patients are directly impacted, such as disruption of critical medical equipment or compromised patient records.^23,24 The classification of an incident as critical or high severity depends on its degree of impact. Moderate severity (Level 2) incidents are defined as those where the impacts are “moderate to the proper conduct of” hospitals and cause moderate disruptions over a period or affect several wards. This results in limited damage in terms of financial and reputational loss.²³ Moderate severity incidents are secondary infiltration attacks that affect patients indirectly and have implications for the hospital such as data exposures, breaches to billing systems, or attacks on the supply chain. Low severity (Level 1) incidents are those where the impacts are “generally limited to the proper conduct of” hospitals and do not usually disrupt organizational processes, although in some cases a single ward may be impacted. They do not cause any direct patient harm and are usually tertiary infiltration attacks that target the broader hospital infrastructure.²⁴

Incidents can be categorized into these 4 levels based on documented sources for the attack types and severity. However, the framework does not provide specific categorization for hospitals/CNI. Using guidance from the CIRP framework, we have listed attacks on hospitals and their impact. Table 1 summarizes the types of attacks on hospitals from 2015 to 2023, including details on their effects, severity level, and real-world examples. For instance, distributed denial of service (DDOS) attacks are classified as high severity (Level 3) because they aim to disrupt critical services and impact patient care. A notable example occurred in 2020 at the University of Vermont Medical Center, where a DDOS attack disrupted patient appointments and delayed elective procedures, resulting in financial losses of US$1 million.⁵

Table 1.

Types of Cyberattacks on Hospitals, 2015-2023

Type of Attack	Effect	Severity	Example	Impact
AI-based attacks	Misdiagnoses, compromised treatment recommendations	Moderate (Level 2)	AI-driven diagnostic tools may be intentionally manipulated, leading to incorrect diagnoses. These could be due to the limitations of the AI tool that exacerbate existing disparities and provide biased results.²⁷	Patient care: Delayed or incorrect treatment
DDOS	Loss of critical services, impact on patient care	High (Level 3)	In 2020, the University of Vermont Medical Center suffered a DDOS attack, affecting patient appointments and delaying elective procedures.	Financial loss: US$1 million⁵
Insider threats	Unauthorized data access, sabotage	Moderate (Level 2)	In 2022, a previous staff member of BayCare Health System in Florida illicitly accessed patient records of 193,947 patients, resulting in the possible exposure of protected health information. The breach was attributed to tracking pixels used by Advocate Aurora Health, a partnering company. These tracking pixels, typically used for targeted marketing and monitoring visitor activity, inadvertently revealed information about patient engagements with BayCare Clinic’s patient portal.	Reputation damage: Loss of trust and further scrutiny from public, regulatory authorities Financial loss: Class action against the company²⁵ Patient impact: Loss of privacy
Internet of Things vulnerabilities	Patient safety risks, data exposure	Moderate (Level 2)	In 2023, Medtronic’s insulin pumps were found vulnerable to remote attacks whereby the attacker can alter insulin dosage to the patients from an adjacent network.	Patient safety: Risk of insulin overdose²⁶
Phishing	Unauthorized access to sensitive data	Moderate (Level 2)	In 2021, Finnish psychotherapy centre Vastaamo faced a phishing breach, exposing patient therapy records. Patients received ransom emails demanding²⁴³ €200 in Bitcoin payment to not publicly expose discussions with therapists.	Reputation damage: Loss of trust Patient impact: Mental distress for patients who sought victim support services
Ransomware	Data loss, operational disruption	Critical (Level 4)	In 2017, the WannaCry attack paralyzed the UK NHS, delaying treatment plans and rerouting ambulances.	Financial impact: Cost the NHS over £92 million²² Patient impact: Over 19,000 appointments were canceled²² and 34% of NHS trusts were disrupted²³
Robotic surgery vulnerabilities	Surgical errors, patient harm	High (Level 3)	In 2022, a group of researchers simulated cybersecurity attacks that could potentially disrupt a robotic-assisted surgery, resulting in unintended incisions.	Patient safety: Surgical errors can result in bleeding, infection, and other adverse outcomes²⁸
Wearable device exploits	Unauthorized data access, privacy breaches	Moderate (Level 2)	In 2021, 61 million records of individuals containing sensitive health data were inadvertently leaked from an unsecured database from the company GetHealth.	Patient safety: Exposure of personal health information breaching privacy and potential harm if data is tampered with²⁹
Man in the middle	Intercepting sensitive data and/or compromise patient care	High (Level 3)	In 2015, UCLA Health experienced a breach from a MITM attack, which resulted in the theft of patient data and the compromise of 4.5 million patients.³⁰	Loss of reputation: Apology from the hospital to patients Potential identity theft Financial loss: Class action lawsuit by patients affected with over $7.5 million paid in settlement

Abbreviations: AI, artificial intelligence; DDOS, distributed denial of service; MITM, man in the middle; NHS, National Health Service; UCLA, University of California Los Angeles.

The impact of cyberattacks on hospitals extends beyond financial losses. Other consequences include operational disruptions, compromised patient records, and erosion of trust between healthcare providers and patients. For example, a ransomware attack on the Hollywood Presbyterian Medical Center in 2016 had a substantial impact, not only in terms of financial and reputational damage but also disruption of workflows and continuum of acute care for patients.²⁵ An article by Neprash et al²⁶ describes how emergency room nurses were unable to handle patient care after a ransomware attack at St. Michael Medical Center in Silverdale, Washington, and the nurses requested emergency services to help them handle redirecting patients. Patient safety can be threatened due to cyberattacks, especially when they involve the use of electronic medical devices.²⁷ Beyond the continuum of care, cyberattacks have also led to deaths of patients. Between 2016 and 2021, an estimated 42 to 67 deaths of Medicare patients occurred due to system issues caused by ransomware attacks.²⁸ It is also estimated that the total number of deaths would surpass this approximation because it does not include patients with other types of health insurance. Other examples include a cyberattack on the Anna Jaques Hospital in Massachusetts on December 24, 2023, resulting in the rerouting of ambulances and transfer of patients to other facilities, with substantial delays in patient care.²⁹ Similarly, the UK NHS trusts’ pathology system provider suffered a ransomware cyberattack on June 3, 2024. This caused severe disruptions with major trusts such as King’s College Hospital, Guy’s and St. Thomas’ NHS Foundation Trust, and the transplant centers at Harefield Hospital, Royal Brompton Hospital, and Evelina London Children’s Hospital. Damages included the cancellation of several thousand appointments and surgeries as well as disruptions to general practitioner services.³⁰

Independent of the perpetrator’s motive or the nature of the cyberattack, the purpose of cybersecurity revolves around safeguarding the digital assets of the organization to forestall and mitigate the repercussions from cyberattacks. To do this, it is necessary to have an in-depth understanding of how malicious attacks evolve in conjunction with emergent technologies.

Methods

Overview

In this study, we propose a multimethod framework for collecting a comprehensive dataset of news articles pertaining to cyberattacks on hospitals using GDELT. GDELT is a database that tracks global news from a wide range of sources, including broadcast, print, and online platforms, spanning over 100 languages. It offers the possibility to collect information on key individuals, locations, organizations, themes, and events that influence global society. The use of extensive data sources, such as GDELT, is becoming increasingly common for analyzing shifts in news coverage regarding actors, events, and sentiments.³¹

We used NLP, ML, and SNA to map interdependencies between the reported events and the entities (cyberattacks and hospitals, respectively). Extending the data-driven cybersecurity approach,³² a 5-step process was adopted to include data extraction, data scraping, data preprocessing, feature extraction, and SNA (Figure 1).

Figure 1.

Five-step methodological framework for collecting a comprehensive dataset of news articles using GDELT. Elongated rectangles indicate the beginning (data extraction) and end of the process (social network analysis), smaller rectangles show the actions, and diamonds show decisions that must be adopted. Arrows indicate the directional flow of the process. Abbreviations: CSV, comma-separated values; DDOS, distributed denial of service; DOS, denial of service; GDELT, Global Database of Events, Language, and Tone; NAN, not a number; NLP, natural language processing; URL, uniform resource locator.

While the steps themselves are not new, the combination of steps as a methodological framework is novel because it enables healthcare organizations to have a broader picture of the threat landscape. The methodological framework was applied to cybersecurity in this study, but it can be replicated in other contexts by modifying the search term selection, time period, and relevant features for further analysis.

Five-Step Process

Step 1: Data Extraction

This is the first step in the process and comprises 4 substeps. The first substep is source identification, whereby the selection criteria and GDELT were identified after careful consideration to ensure comprehensiveness and relevance. From GDELT, the Uniform Resource Locator (URL) for each of the articles had to be extracted. This was done in the second substep using a web crawler or bot that extracted the information. This step included identification of the correct keywords and time period for which the search had to be done. Keywords included “cyber attack,” “information attack,” “data attack,” “DDOSs,” “DOS,” “ransomware,” “information security attack,” “cyber breach,” “information security breach,” “data breach,” “hospital,” and “healthcare.” The time period was restricted from 2017 to 2023 to see how the trends in attacks had changed before, during, and after the COVID-19 pandemic. This substep also included pagination to ensure the web crawler could navigate several pages in a systematic manner. The Beautiful Soup Python library ensured the web crawler iterated through all pages and extracted dates from each of the web pages, ultimately creating a date list, navigating the events index page, and downloading the data by day.
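The keyword and date-list logic described above can be sketched as follows. This is a minimal illustration rather than the crawler used in the study: the split of the keywords into attack terms and sector terms, the rule that a title must contain one of each, the exact start and end dates, and the names daily_dates and is_candidate are all assumptions made for the example.

```python
import re
from datetime import date, timedelta

# Search terms quoted in the text; splitting them into two groups is an assumption.
ATTACK_TERMS = [
    "cyber attack", "information attack", "data attack", "DDOS", "DOS",
    "ransomware", "information security attack", "cyber breach",
    "information security breach", "data breach",
]
SECTOR_TERMS = ["hospital", "healthcare"]


def daily_dates(start, end):
    """Return every day from start to end (inclusive) as YYYYMMDD strings."""
    span = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(span + 1)]


def _compile(terms):
    # Whole-word, case-insensitive match with an optional plural "s",
    # so "DOS" does not fire on unrelated words such as "doses".
    body = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + body + r")s?\b", re.IGNORECASE)


ATTACK_RE = _compile(ATTACK_TERMS)
SECTOR_RE = _compile(SECTOR_TERMS)


def is_candidate(title):
    """Assumed rule: at least one attack term and at least one sector term."""
    return bool(ATTACK_RE.search(title) and SECTOR_RE.search(title))


# One entry per daily download across the study period (dates are assumed).
STUDY_DAYS = daily_dates(date(2017, 1, 1), date(2023, 12, 31))
```

Each entry in STUDY_DAYS would correspond to one daily file to locate on the events index page, and is_candidate gives a first-pass relevance check before the validation described next.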

The articles were then grouped by year for longitudinal analysis. Once the GDELT data files were downloaded, the comma-separated values (CSV) files were read using the Pandas Python library. Initially, 1,045,076 articles were downloaded, but several were false positives with varied other types of attacks on hospitals or generic cyberattacks on other sectors. Once the web crawler was configured, the next step was to validate the initial data with checks to ensure that only relevant data were included. Articles that did not pertain to cyberattacks on hospitals were removed in this substep, resulting in a total of 39,796 articles included. Further actions were taken to include only English-language articles, which reduced the total articles to 27,789. The final substep was metadata collection in which a CSV file with 3 columns—URL, date, and title—was extracted and stored.
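As an illustration of the metadata collection and yearly grouping, the sketch below writes a 3-column CSV (URL, date, title) and counts articles per year. It uses only the Python standard library instead of Pandas so that it runs without extra dependencies; the sample rows, the example.org URLs, and the ISO date format are hypothetical.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical records standing in for validated GDELT results.
SAMPLE_ROWS = [
    {"URL": "https://example.org/a", "date": "2017-05-13", "title": "Ransomware hits hospital"},
    {"URL": "https://example.org/b", "date": "2020-10-29", "title": "Hospital data breach reported"},
    {"URL": "https://example.org/c", "date": "2020-11-02", "title": "DDOS attack on healthcare provider"},
]


def write_metadata(rows, handle):
    """Write the URL, date, and title of each article as a 3-column CSV."""
    writer = csv.DictWriter(handle, fieldnames=["URL", "date", "title"])
    writer.writeheader()
    writer.writerows(rows)


def articles_per_year(handle, date_format="%Y-%m-%d"):
    """Count articles by publication year for longitudinal analysis."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[datetime.strptime(row["date"], date_format).year] += 1
    return dict(sorted(counts.items()))
```

In practice the handle would be a file on disk; an in-memory io.StringIO buffer is enough to try the two functions.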

Step 2: Data Scraping

In the second step, we extracted information from the different web pages identified in step 1 using the Beautiful Soup Python library. Extracted information included the content where the keywords (used in step 1) appear in the document (indicating relevant and necessary content for the analysis), organizational names, document title, and publication date. Each page was retrieved with the Python requests library, which returned the content in HTML (hypertext markup language) format. A Beautiful Soup object in Python was then created that enabled us to parse the HTML structure. Specific information was extracted through the use of functions such as “find” and “find_all,” especially for scanning the text within the HTML data and filtering the lines that contained any or all of the keywords. This approach enabled us to obtain and organize all necessary data to be used in the next steps for analysis and to capture all relevant information from the articles extracted. A new column “Context” was added to the existing CSV file to store the news content. To avoid issues with scraping, a 10-second timeout was established and sites that were nonresponsive returned “not a number” (NAN) in the context column. This served as a robust error handling mechanism to manage issues like CAPTCHA (completely automated public Turing test to tell computers and humans apart), dynamic content loading, and access restrictions.
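The scraping logic can be approximated as shown below. To stay dependency-free, this sketch substitutes the standard library (urllib and html.parser) for the requests and Beautiful Soup libraries named above, so it illustrates the idea rather than the authors’ code. The function names and the choice to treat any network or URL error as missing are assumptions; the 10-second timeout and the NAN fallback come from the text.

```python
import math
import re
import urllib.request
from html.parser import HTMLParser

TIMEOUT_SECONDS = 10  # timeout stated in the text


class _TextCollector(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def fetch_html(url):
    """Download a page, giving up after TIMEOUT_SECONDS."""
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_context(html, keywords):
    """Keep only the text chunks that mention at least one keyword."""
    parser = _TextCollector()
    parser.feed(html)
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return " ".join(chunk for chunk in parser.chunks if pattern.search(chunk))


def scrape_context(url, keywords, fetch=fetch_html):
    """Return keyword-bearing text, or NaN when the site cannot be read."""
    try:
        return extract_context(fetch(url), keywords)
    except (OSError, ValueError):
        # Timeouts, refused connections, and malformed URLs all become NaN.
        return math.nan
```

Passing the downloader in as the fetch argument keeps the error handling testable without network access.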

Step 3: Data Preprocessing

In the third step, we checked whether the data obtained from the first 2 steps were consistent and could be used. This included data cleaning whereby inconsistencies and issues with missing data were handled. For example, in some cases, the context column included records that were not consistent for several reasons, such as unresponsive websites, broken links, issues with CAPTCHA, dynamic content, or access-restricted sites that returned NAN in this column. The next substep included identifying non-English character sets and unintelligible data, which were removed. Duplicated records were also removed, leaving 18,009 articles that mentioned 18,009 different attacks on hospitals.
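A minimal sketch of these cleaning rules is given below. The text does not specify how non-English or duplicated records were detected, so the ASCII-share threshold and the use of whitespace-normalized, case-insensitive context text as the duplicate key are assumptions, as are the function names.

```python
import math


def is_missing(value):
    """True for None, NaN, or empty context (failed or blocked scrape)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return not str(value).strip()


def mostly_ascii(text, threshold=0.9):
    """Crude proxy for English, intelligible text: share of ASCII characters."""
    return sum(ch.isascii() for ch in text) / len(text) >= threshold


def clean(records):
    """Drop missing, non-English-looking, and duplicated records, keeping order."""
    seen = set()
    kept = []
    for record in records:
        context = record.get("Context")
        if is_missing(context) or not mostly_ascii(context):
            continue
        key = " ".join(context.split()).casefold()  # assumed duplicate key
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

A language-identification library would be a more reliable filter than the ASCII heuristic; the heuristic is used here only to keep the example self-contained.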

Step 4: Feature Extraction

In the fourth step, the following 5 attributes were obtained from the news articles: type of cyberattack, severity level, hospital name, hospital type, and whether the hospital was private or public. Several NLP techniques, including tokenization, lowercase conversion, stop word removal, and lemmatization were used³³ to extract cyberattack keywords from the articles. First, the tokenization process was used to separate sentences in an article, which is crucial because it splits the article into smaller tokens that can be used later in the analysis. Next, we converted all tokens to lowercase to ensure accuracy in any comparisons of the data, and we removed frequently occurring text such as prepositions and articles using the stop-word removal process. Finally, we applied the lemmatization process, using morphological analysis and vocabulary to identify the base form of words. For example, words such as “studies” or “studying” would have the common base word “study.”
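The 4 preprocessing operations can be illustrated with a small, dependency-free sketch. The stop-word set and lemma dictionary below are deliberately tiny, hypothetical stand-ins for the resources that an NLP library would provide, and the regular-expression tokenizer is an assumption rather than the method used in the study.

```python
import re

# Toy resources for illustration only; real pipelines use much larger lists.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "at", "to", "by", "for",
              "and", "is", "are", "was", "were"}
LEMMAS = {"studies": "study", "studying": "study", "attacks": "attack",
          "attacked": "attack", "hospitals": "hospital"}


def split_sentences(text):
    """Split text into sentences at ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into word tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def preprocess(text):
    """Tokenize, lowercase, remove stop words, and lemmatize each sentence."""
    processed = []
    for sentence in split_sentences(text):
        tokens = [token.lower() for token in tokenize(sentence)]
        tokens = [token for token in tokens if token not in STOP_WORDS]
        processed.append([LEMMAS.get(token, token) for token in tokens])
    return processed
```

For example, preprocess("The hospitals were attacked. Researchers are studying the attacks.") reduces the first sentence to the tokens "hospital" and "attack".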

Once these techniques were completed, we used the spaCy Python library to conduct Named Entity Recognition, which involved parsing the text to extract relevant entities and categorize them appropriately. For example, we identified and extracted hospital names, types of cyberattacks, and severity levels. The types and severity levels of cyberattacks were based on the literature described in the previous section. All names of hospitals were extracted and, finally, hospital types were defined based on a list of predefined keywords from the literature to include rural hospitals, teaching hospitals, and clinics.³⁴ This step resulted in a list of 18,073 hospitals that suffered a cyberattack over the study period (2017 to 2023).
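The keyword-driven part of this step can be sketched with simple rules. The example below does not reproduce the spaCy Named Entity Recognition used for hospital names; it only shows how attack types, severity levels, and hospital types might be assigned from predefined keywords. The severity mapping follows the classification reported in the Results section, whereas the regular expressions and the rule of taking the highest severity when several attack types appear are assumptions.

```python
import re

# Severity levels by attack type, following the classification in the Results.
SEVERITY_BY_ATTACK = {
    "ransomware": 4, "APT": 4,
    "DDOS": 3, "MITM": 3,
    "DOS": 2, "phishing": 2, "password attack": 2, "SQL injection": 2,
    "DNS attack": 1, "URL interpretation": 1,
}

# Hypothetical keyword patterns for each attack type.
_RAW_ATTACK_PATTERNS = {
    "ransomware": r"\bransomware\b",
    "APT": r"\badvanced persistent threats?\b",
    "DDOS": r"\bddos\b|\bdistributed denial[- ]of[- ]service\b",
    "MITM": r"\bmitm\b|\bman[- ]in[- ]the[- ]middle\b",
    # The lookbehind keeps DOS from matching inside "distributed denial of service".
    "DOS": r"\bdos\b|(?<!distributed )\bdenial[- ]of[- ]service\b",
    "phishing": r"\bphishing\b",
    "password attack": r"\bpassword attacks?\b",
    "SQL injection": r"\bsql injections?\b",
    "DNS attack": r"\bdns attacks?\b",
    "URL interpretation": r"\burl interpretation\b",
}
_RAW_HOSPITAL_PATTERNS = {
    "rural hospital": r"\brural hospitals?\b",
    "teaching hospital": r"\bteaching hospitals?\b",
    "clinic": r"\bclinics?\b",
}

ATTACK_PATTERNS = {k: re.compile(v, re.IGNORECASE) for k, v in _RAW_ATTACK_PATTERNS.items()}
HOSPITAL_PATTERNS = {k: re.compile(v, re.IGNORECASE) for k, v in _RAW_HOSPITAL_PATTERNS.items()}


def extract_features(text):
    """Return attack types, the highest matching severity, and hospital types."""
    attacks = [name for name, pattern in ATTACK_PATTERNS.items() if pattern.search(text)]
    severity = max((SEVERITY_BY_ATTACK[name] for name in attacks), default=None)
    hospital_types = [name for name, pattern in HOSPITAL_PATTERNS.items() if pattern.search(text)]
    return {"attack_types": attacks, "severity_level": severity, "hospital_types": hospital_types}
```

Rule-based matching of this kind is transparent and easy to audit, but it misses paraphrases that a trained entity recognizer could capture.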

Table 2 shows a sample of the final dataset that we collected including (1) the link to the news article discussing the cyberattack; (2) the date the news article was published; (3) the title of the news article; (4) a brief summary of the incident; (5) the type of cyberattack; (6) the severity level of the attack; (7) the targeted hospital; (8) the type of hospital; and (9) whether the hospital is public or private. This systematic approach ensures that data are accurately categorized and prepared for identifying trends and patterns in cyberattacks on hospitals.

Table 2.

Sample Line of the Final Dataset

Link to News Article	Date of Publication	Title of News Article	Brief Summary	Type of Attack	Targeted Hospital	Hospital Type	Public/ Private	Severity Level
https://www.digitalhealth.net/2017/01/update-trojan-malware-blamed-for-barts-cyber-attack-2/	12/12/2017	NHS trusts vulnerable to cyberattacks due to irregular app testing	England’s biggest NHS trust says malware was behind a cyberattack that forced the trust to shut down some IT systems for 4 days.	Ransomware	[‘Newham University Hospital’]	[‘Teaching Hospital’]	[‘Private Hospitals’]	Level 4

Abbreviations: IT, information technology; NHS, National Health Service.

Step 5: Social Network Analysis

The final step in the process is SNA, a research method involving tools and techniques used to explain how social entities are connected and interact with one another. It focuses on relationships and enables us to understand how patterns can be used to map the evolving cyberthreat landscape.

We created a bipartite network containing 2 different sets of nodes.³⁵ One set represents hospitals, and the other set represents types of cyberattacks. The edges represent the relationships that exist between the sets of nodes. In this context, an edge connects a hospital to a cyberattack if, for example, a particular hospital has experienced a specific type of cyberattack. In other words, the edge between the nodes represents the occurrence of a particular type of cyberattack at a specific hospital by year. Using this structure is beneficial because it provides insights into the relationships between hospitals and cyberattacks. Attributes such as the types of hospitals, severity levels, and countries are added to the data.

To identify the most important nodes within the network, we used the degree centrality measure,³⁵ which for a hospital is the number of types of cyberattacks it experienced and, for a type of cyberattack, the number of hospitals it targeted. A higher degree centrality for a cyberattack indicates that it is more widespread and has affected a larger number of hospitals. This can help identify the most common or pervasive types of cyberattacks in the healthcare sector. We also conducted temporal analysis using the NetworkX Python library to observe how the network evolves over time, providing insights into trends and shifts in the cyberthreat landscape.
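The bipartite structure and the degree centrality calculation can be sketched without external libraries, as below. The normalization shown divides each node’s degree by the number of nodes in the opposite set, which is the usual bipartite convention; the normalization applied in this study is not specified in the text, so reported values need not match this scheme. The record fields and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_bipartite(records):
    """Group hospital-to-attack edges by year; repeated pairs collapse to one edge."""
    edges_by_year = defaultdict(set)
    for record in records:
        edges_by_year[record["year"]].add((record["hospital"], record["attack_type"]))
    return dict(edges_by_year)


def degree_centrality(edges):
    """Return (hospital_centrality, attack_centrality) for one year's edges."""
    hospitals = {hospital for hospital, _ in edges}
    attacks = {attack for _, attack in edges}
    hospital_degree = Counter(hospital for hospital, _ in edges)
    attack_degree = Counter(attack for _, attack in edges)
    # Assumed normalization: degree divided by the size of the opposite node set.
    hospital_centrality = {h: d / len(attacks) for h, d in hospital_degree.items()}
    attack_centrality = {a: d / len(hospitals) for a, d in attack_degree.items()}
    return hospital_centrality, attack_centrality


def top_attacks(edges, n=5):
    """Rank attack types by degree centrality, breaking ties alphabetically."""
    _, attack_centrality = degree_centrality(edges)
    ranked = sorted(attack_centrality.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

Calling top_attacks on each year’s edge set in turn gives a simple form of temporal analysis, showing how the ranking of attack types shifts from one year to the next.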

Results

The trend in the number of articles about cyberattacks on hospitals by year between 2017 and 2023—of the 18,009 attacks reported in the literature—is shown in Figure 2.

Figure 2.

Upward trend in the number of articles about cyberattacks on hospitals by year, 2017 to 2023.

Table 3 provides the number of cyberattacks on hospitals by year from 2017 to 2023, categorized by their severity levels. It indicates that compared to the 1,987 attacks reported in 2017, 3,416 attacks were reported in 2023, a 72% increase. Classifying these attacks based on severity level also shows a substantial increase in the number of critical (Level 4) attacks over the years. A total of 1,116 critical attacks were reported in 2017, while 2023 had 3,123 critical attacks, an increase of approximately 180%. This indicates an overall increase in the number of cyberattacks over the years, with a notable rise in critical severity level attacks. Ransomware and advanced persistent threats (APTs) were classified as critical; DDOS and man-in-the-middle (MITM) attacks were classified as high; denial of service (DOS), phishing, password attacks, and SQL injections were classified as moderate; and attacks such as DNS (domain name system) attacks or URL interpretation were classified as low, based on the literature.²⁴
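The percentage changes follow from the standard relative-change formula, (new − old) / old × 100. The short sketch below applies it to the 2017 and 2023 counts quoted above; the function and variable names are illustrative.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


# Counts for 2017 and 2023 as quoted in the text.
total_change = percent_change(1987, 3416)      # all attacks
critical_change = percent_change(1116, 3123)   # critical (Level 4) attacks

print(f"All attacks: {total_change:.1f}% increase")
print(f"Critical attacks: {critical_change:.1f}% increase")
```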

Table 3.

Number and Severity Level of Cyberattacks Against Hospitals by Year, 2017 to 2023 (N=18,009)

Severity Level	2017	2018	2019	2020	2021	2022	2023
Critical (Level 4)	1,116	1,252	1,256	1,988	2,583	2,609	3,123
High (Level 3)	738	750	757	432	98	116	209
Moderate (Level 2)	79	89	121	188	63	98	101
Low (Level 1)	54	52	55	80	10	9	17
Total	1,987	2,143	2,189	2,688	2,754	2,832	3,416

Figure 3 shows the distribution of cyberattacks across 2,898 hospitals during the COVID-19 pandemic in 2020. Ransomware, cyberespionage, cyberterrorism, DDOS, and MITM were the most frequent types of cyberattacks targeting hospitals.

Figure 3.

Temporal bipartite network showing the distribution of cyberattacks across 2,898 hospitals during the COVID-19 pandemic in 2020. The blue triangles on the right represent the different types of cyberattacks. Ransomware, cyberespionage, cyberterrorism, DDOS, and MITM were the most frequent types of cyberattacks targeting hospitals. The red dots on the left represent the hospitals. The gray lines represent the relationships between hospitals and types of cyberattacks, indicating which types of attacks target specific hospitals. Abbreviations: DDOS, distributed denial of service; MITM, man in the middle.

Finally, to understand how the threat landscape has evolved, the degree centrality measure was calculated to identify the types of attacks that most hospitals suffered over time. Table 4 provides an overview of the top 5 types of cyberattacks by year and degree centrality, indicating the prominence of each type of attack within the data. In 2017, phishing had the highest degree centrality value at 4.690, indicating that it was the most prominent attack type that year. By 2023, the situation had changed considerably, with APTs having the highest degree centrality value at 6.873. Ransomware also became much more common in 2023, with a degree centrality value of 6.452, compared to 0.188 in 2017. DDOS attacks were also prominent, with a value of 3.976 in 2023. Interestingly, phishing had a much lower degree centrality value of 0.645 in 2023, indicating a relative decrease in its prominence compared to other attack types. Overall, these data indicate a shift away from phishing and toward APTs and ransomware as the most critical threats in 2023. This shift highlights the increasing complexity and sophistication of cyberattacks over the years. An increase in the prominence of APTs and ransomware suggests that attackers are focusing more on persistent and highly damaging attacks, which reflects the evolving strategies in the cyberthreat landscape.

Table 4.

Top 5 Types of Attacks by Year and Degree Centrality, 2017–2023

Year	Type of Attack	Degree Centrality Value (Normalized)
2017	Phishing	4.690
	Password attack	1.250
	SQL injections	0.563
	DNS	0.438
	Ransomware	0.188
2018	Phishing	4.592
	DOS	2.095
	MITM	0.905
	SQL injections	0.619
	Ransomware	0.381
2019	Phishing	4.312
	Social engineering	2.988
	DDOS	1.876
	Ransomware	1.451
	Password attack	0.976
2020	Ransomware	3.928
	Cyber espionage	3.703
	Cyber terrorism/state actors	3.561
	DDOS	2.175
	MITM	1.295
2021	Cyber terrorism/state actors	4.373
	Ransomware	4.194
	APT	3.200
	Phishing	1.329
	DOS	1.311
2022	Ransomware	5.386
	APT	3.876
	DDOS	2.762
	DOS	1.843
	MITM	0.990
2023	APT	6.873
	Ransomware	6.452
	DDOS	3.976
	MITM	0.972
	Phishing	0.645

Abbreviations: APT, advanced persistent threat; DDOS, distributed denial of service; DNS, domain name system; DOS, denial of service; MITM, man in the middle; SQL, structured query language.
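
As an illustration of how a ranking like the one in Table 4 could be derived, the sketch below computes degree centrality for attack types in a two-mode (hospital-by-attack) network, grouped by year. It is a minimal sketch and not the authors' implementation: the edge records, hospital identifiers, and the function name attack_degree_centrality are hypothetical. The normalization assumed here (number of distinct hospitals linked to an attack type, divided by the number of distinct hospitals observed that year) is a common two-mode convention that yields values between 0 and 1, so it will not reproduce the scale of the values reported in Table 4.

```python
from collections import defaultdict

# Hypothetical (year, hospital, attack type) records; not data from the study.
EDGES = [
    (2017, "hospital_a", "Phishing"),
    (2017, "hospital_b", "Phishing"),
    (2017, "hospital_c", "Phishing"),
    (2017, "hospital_a", "Password attack"),
    (2017, "hospital_a", "Phishing"),  # repeated report, counted once
    (2023, "hospital_a", "APT"),
    (2023, "hospital_b", "APT"),
    (2023, "hospital_b", "Ransomware"),
    (2023, "hospital_c", "Ransomware"),
    (2023, "hospital_d", "APT"),
    (2023, "hospital_d", "Phishing"),
]


def attack_degree_centrality(edges):
    """Return {year: [(attack, centrality), ...]}, highest centrality first.

    Assumed normalization: distinct hospitals linked to the attack type,
    divided by the distinct hospitals seen in that year.
    """
    hospitals_by_year = defaultdict(set)
    neighbors = defaultdict(lambda: defaultdict(set))
    for year, hospital, attack in edges:
        hospitals_by_year[year].add(hospital)
        neighbors[year][attack].add(hospital)

    ranking = {}
    for year, attacks in neighbors.items():
        n_hospitals = len(hospitals_by_year[year])
        scores = [(a, len(h) / n_hospitals) for a, h in attacks.items()]
        # Sort by score (descending), then by name so ties are deterministic.
        ranking[year] = sorted(scores, key=lambda s: (-s[1], s[0]))
    return ranking


if __name__ == "__main__":
    for year, scores in sorted(attack_degree_centrality(EDGES).items()):
        print(year, [(a, round(c, 3)) for a, c in scores])
```

Using sets for the neighbor lists means that repeated news reports of the same hospital and attack pair count as a single tie, which is one possible design choice; a weighted variant would count each report instead.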

An examination of the attributes obtained from the feature extraction in step 4 shows that teaching hospitals, public hospitals, and academic medical centers are more prone to critical (Level 4) and high (Level 3) severity attacks including cyberterrorism/state actor-sponsored criminal activities, APTs, and DDOS, whereas community hospitals, children’s hospitals, and clinics are more prone to low (Level 1) and moderate (Level 2) severity attacks.
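
The comparison of hospital types by severity level can be pictured as a simple cross-tabulation. The sketch below is illustrative only: the records are hypothetical (they are not data from the study), the function names are invented for this example, and the choice of Level 3 as the cutoff for "high or critical" follows the severity labels used above.

```python
from collections import Counter, defaultdict

# Hypothetical (hospital type, severity level) records; not data from the study.
RECORDS = [
    ("Teaching hospital", 4),
    ("Teaching hospital", 3),
    ("Teaching hospital", 1),
    ("Public hospital", 4),
    ("Public hospital", 4),
    ("Community hospital", 1),
    ("Community hospital", 2),
    ("Community hospital", 3),
    ("Clinic", 1),
    ("Clinic", 2),
]


def severity_crosstab(records):
    """Count records per hospital type and severity level."""
    table = defaultdict(Counter)
    for hospital_type, level in records:
        table[hospital_type][level] += 1
    return table


def high_severity_share(records, threshold=3):
    """Share of each hospital type's records at or above the threshold level."""
    shares = {}
    for hospital_type, counts in severity_crosstab(records).items():
        high = sum(n for level, n in counts.items() if level >= threshold)
        shares[hospital_type] = high / sum(counts.values())
    return shares


if __name__ == "__main__":
    for hospital_type, share in sorted(high_severity_share(RECORDS).items()):
        print(f"{hospital_type}: {share:.2f}")
```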

Discussion

This study offers a novel methodological approach for collecting and analyzing different types of cyberattacks on hospitals. It maps the relationship between attacks and hospitals, identifies various types of attacks, and distinguishes the most prominent attacks affecting hospitals over time. As our findings show, the healthcare sector has experienced an increasing number of attacks. Hospitals serve as vital components of a country’s healthcare infrastructure, managing sensitive patient data, medical records, and essential services. Consequently, they become prime targets for cybercriminals, state-sponsored actors, and hacktivists due to the valuable information they hold, including patient health records, research data, and intellectual property.³⁶

In analyzing attack patterns, we found 2 distinct trends. First, low severity (Level 1) cyberattacks such as phishing attacks, DOS attacks, and SQL injections have decreased due to improved preparedness, including implementation of multifactor authentication and regular audits.³ This aligns with recent literature showing that hospitals have increased their cybersecurity capabilities; the decrease in the number of phishing attacks can be attributed to organizations’ growing familiarity with such attacks and to constant threat monitoring for phishing tactics.³⁷ Another study⁶ shows that hospitals’ preparedness has increased, as most hospital organizations now implement preventive measures such as multifactor authentication and conduct regular audits, which effectively deter low level attacks such as SQL injections or DOS attacks, although these remain a threat. Second, we observed that critical (Level 4) and high (Level 3) severity attacks are increasing, with public hospitals and academic medical centers being particularly vulnerable to ransomware and cyberterrorism. This aligns with a recent study emphasizing an increase in ransomware attacks as hospitals lack adequate training in cybersecurity awareness and best practices, creating vulnerabilities that attackers can exploit.³⁸ Indeed, since 2020, there has been a substantial increase in critical attacks such as cyberespionage and state-sponsored attacks, which were not as prevalent before the COVID-19 pandemic. Specifically, these 2 hidden threats were more frequent because they offered further benefits to the threat actor, such as intellectual property theft and gaining oversight of intelligence operations.³⁹ Similarly, state-sponsored attackers, especially hacker groups in China, North Korea, and Russia, operated covertly in cyberspace to evade political responsibility.⁴⁰,⁴¹ Moreover, nation states may target healthcare institutions for geopolitical reasons, leading to cyberterrorism, espionage, and other state-sponsored criminal activities that can compromise national security.⁴²,⁴³

Despite regulatory requirements like medical device regulations, good clinical practice, and the General Data Protection Regulation (GDPR), many healthcare organizations remain reluctant to disclose incidents, leading to substantial fines. For example, within the European Union, 163 fines have been issued to healthcare organizations due to noncompliance with the GDPR, totaling €16 million.³ Indeed, as pointed out in the ENISA report,³ most healthcare organizations rely heavily on incidents that are publicly disclosed by victim organizations and are thus dependent on deliberate disclosures or, sometimes, unintentional data leaks.

To this end, our multimethod framework enables better pattern identification, which cyber policymakers at the organizational level can use to distill huge volumes of information into actionable intelligence. It also helps organizations automate security data gathering, thereby minimizing manual effort and enabling cybersecurity professionals to focus more on strategic tasks. This is especially important as a recent study⁴⁴ shows that many hospital organizations, especially in times of economic downturn, have reduced their investments in the health workforce and technologies, instead implementing absorptive capacity policies that focus only on critical hospital functions. Hence, cybersecurity professionals will not have the luxury of large teams and will need to prioritize tasks effectively. They must identify and address the most critical threats first, ensuring that limited resources are used efficiently to maintain a robust security posture.

Integration of advanced ML, NLP, and SNA methods for data collection will provide a better understanding of the dynamic threat landscape, which can enhance policy development and cybersecurity standards such as the ISO 27000 series and IEC (International Electrotechnical Commission) 62443. Indeed, the need to use and integrate advanced methodology is also emphasized by a recent study⁴⁵ that shows the importance of supporting real-time data analysis, risk assessment, and decisionmaking processes. More specifically, a robust methodological framework, such as the one we propose, will enable enhanced threat detection and response, as well as better risk assessments, and will pave the way for data-driven decisionmaking. Our multimethod framework has the capability to analyze large volumes of real-time data and identify patterns to detect anomalies that could indicate emerging threats. For instance, while phishing attacks are still common, our real-time data show that the number of successful attacks reported, based on impact, has declined. At the same time, APTs and critical attacks have increased. Understanding these trends is vital because it enables the development of robust cybersecurity policies and standards that also allow for adaptive and proactive threat detection and cyber response.⁴⁶ Similarly, using the multimethod framework and adding predictive capabilities can enable policymakers to accurately predict potential threats and refine risk assessment methodologies and standards to ensure they are effective and relevant in the evolving threat landscape. Cybersecurity professionals can then better prepare for a response by dynamically allocating resources to counter the risks.
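
To make the kind of trend identification described above concrete, the sketch below labels attack types as rising, declining, or stable from the degree centrality values reported in Table 4. It is a minimal illustration under stated assumptions, not part of the study's pipeline: the function name classify_trends and the min_change threshold of 1.0 are hypothetical choices, and years in which an attack type falls outside the top 5 are treated as unknown rather than zero.

```python
# Degree centrality values copied from Table 4. Years in which an attack type
# is outside the top 5 are absent (treated as unknown, not as zero).
TABLE4 = {
    "Phishing": {2017: 4.690, 2018: 4.592, 2019: 4.312, 2021: 1.329, 2023: 0.645},
    "Ransomware": {2017: 0.188, 2018: 0.381, 2019: 1.451, 2020: 3.928,
                   2021: 4.194, 2022: 5.386, 2023: 6.452},
    "APT": {2021: 3.200, 2022: 3.876, 2023: 6.873},
    "DDOS": {2019: 1.876, 2020: 2.175, 2022: 2.762, 2023: 3.976},
}


def classify_trends(series_by_attack, min_change=1.0):
    """Label each attack type by the change between its first and last listed year.

    min_change is an arbitrary illustrative threshold, not a value from the study.
    """
    trends = {}
    for attack, series in series_by_attack.items():
        years = sorted(series)
        change = series[years[-1]] - series[years[0]]
        if change >= min_change:
            label = "rising"
        elif change <= -min_change:
            label = "declining"
        else:
            label = "stable"
        trends[attack] = (label, round(change, 3))
    return trends


if __name__ == "__main__":
    for attack, (label, change) in classify_trends(TABLE4).items():
        print(f"{attack}: {label} ({change:+.3f})")
```

A first-to-last comparison is deliberately crude; with complete yearly series, a slope or change-point method would give a more robust signal.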

Limitations and Future Work

This study provides a methodological framework for extracting real-time data to map the cyberthreat landscape in the healthcare sector. While some studies have quantitatively specified the types of attacks,¹¹,¹²,²² they relied on voluntarily disclosed data. Few studies have used a multimethod approach including ML, NLP, and SNA, together with real data from conventional and unconventional signals such as GDELT, to investigate and map the current types of attacks on the healthcare sector. There are some limitations to this study. First, we applied SNA by using degree centrality to evaluate the most prevalent attacks; however, future research could combine network measures with hospital attributes (eg, countries, performance indicators, size) to analyze whether hospitals that occupy a central position in the network are more prone to experiencing state-sponsored attacks. Along the same lines, network models can be applied to observe how the network evolves over time. Second, we used longitudinal data to showcase how the severity of threats has changed over time, but we did not use any predictive models. Future researchers could incorporate predictive modeling using GCN (graph convolutional network) or node2vec methods based on the historic data available. Being able to predict future events with high precision will certainly help healthcare organizations to not only understand the current landscape but also improve their preparedness for such events.

Conclusion

This study developed a framework for implementing a multimethod approach to collect real-time data from several sources, identify patterns and relationships between attacks and hospitals, and explore cyberattack trends in the healthcare sector. The results of our analysis show marked growth in critical (Level 4) and high (Level 3) severity attacks, while low (Level 1) severity attacks have diminished over time. The approach presented in this study allows healthcare institutions to identify potential vulnerabilities, anticipate threats, and implement proactive measures to mitigate risks. Furthermore, it enables them to prioritize resources effectively, focusing on areas with the highest risk or potential impact. As the frequency and severity of attacks tend to increase, understanding the threat landscape is a fundamental step in safeguarding this critical infrastructure.

Acknowledgments

The authors wish to acknowledge funding received by Networks and Urban Systems Centre (NUSC) from the University of Greenwich (2022-2023).

References

1. National Cyber Security Centre (NCSC). NCSC Annual Review: case study: securing the UK’s critical national infrastructure. Published November 14, 2023. Accessed February 12, 2024. https://www.ncsc.gov.uk/collection/annual-review-2023/resilience/case-study-securing-cni

2. National Cyber Security Centre (NCSC). NCSC warns of enduring and significant threat to UK’s critical infrastructure. Published November 14, 2023. Accessed February 12, 2024. https://www.ncsc.gov.uk/news/ncsc-warns-enduring-significant-threat-to-uks-critical-infrastructure

3. European Union Agency for Cybersecurity (ENISA). ENISA Threat Landscape: Health Sector. Athens: ENISA; 2023. Accessed February 26, 2025. https://www.enisa.europa.eu/sites/default/files/publications/Health%20Threat%20Landscape.pdf

4. Jalali, Kaiser. Cybersecurity in hospitals: a systematic, organizational perspective. J Med Internet Res. 2018; 20(5):e10059.

5. Argaw, Troncoso-Pastoriza, Lacey, et al. Cybersecurity of hospitals: discussing the challenges and working towards mitigating the risks. BMC Med Inform Decis Mak. 2020; 20(1):146.

6. Wasserman, Wasserman. Hospital cybersecurity risks and gaps: review (for the non-cyber professional). Front Digit Health. 2022; 4:862221.

7. Kruse, Frederick, Jacobson, Monticone. Cybersecurity in healthcare: a systematic review of modern threats and trends. Technol Health Care. 2017; 25(1):1-10.

8. British Computer Society. Biggest healthcare cyber attacks this decade. Published January 9, 2023. Accessed February 12, 2024. https://www.bcs.org/articles-opinion-and-research/biggest-healthcare-cyber-attacks-this-decade/

9. Eddy, Perlroth. Cyber attack suspected in German woman’s death. New York Times. September 18, 2020. Accessed February 24, 2025. https://www.nytimes.com/2020/09/18/world/europe/cyber-attack-germany-ransomeware-death.html

10. Science of Security Virtual Organization. Accessed February 25, 2025. https://www.sos-vo.org/

11. Bakdash, Hutchinson, Zaroukian, et al. Malware in the future? Forecasting of analyst detection of cyber events. J Cybersecur. 2018; 4(1):tyy007.

12. Bobowska, Choras, Wozniak. Advanced analysis of data streams for critical infrastructures protection and cybersecurity. J Univers Comput Sci. 2018; 24(5):622-633.

13. Sun, Zhang, Rimba, Gao, Zhang, Xiang. Data-driven cybersecurity incident prediction: a survey. IEEE Commun Surv Tutor. 2019; 21(2):1744-1772.

14. Galla, Burke. Predicting social unrest using GDELT. In: Perner, ed. Machine Learning and Data Mining in Pattern Recognition. International Conference, MLDM 2018, New York, NY, USA, July 15-19, 2018, Proceedings, Part II. Cham, Switzerland: Springer International Publishing; 2018:103-116. Accessed February 24, 2025. https://doi.org/10.1007/978-3-319-96133-0_8

15. Yonamine. Predicting Future Levels of Violence in Afghanistan Districts Using GDELT. Dissertation. Pennsylvania State University; 2013. Accessed February 25, 2025. http://data.gdeltproject.org/documentation/Predicting-Future-Levels-of-Violence-in-Afghanistan-Districts-using-GDELT.pdf

16. Jakel. Using Sentiment Data from the Global Database of Events, Language and Tone (GDELT) to Predict Short-Term Stock Price Developments. Bachelor’s thesis. University of Twente; 2019. Accessed February 25, 2025. https://essay.utwente.nl/78614/

17. Kwon, Johnson. Proactive versus reactive security investments in the healthcare sector. MIS Q. 2014; 38(2):451-472, A1-A3.

18. Ayala. Medical facility cyber-physical attacks. In: Ayala L. Cybersecurity for Hospitals and Healthcare Facilities: A Guide to Detection and Prevention. Berkeley, CA: Apress; 2016:39-45. Accessed February 25, 2025. https://doi.org/10.1007/978-1-4842-2155-6_4

19. Radanliev. Dance as a mental health therapy in the Metaverse: exploring the therapeutic potential of Dance Movement Therapy as a non-pharmacological treatment in the Metaverse. Front Comput Sci. 2024; 6:1334027.

20. Lee, Sokolsky, Chen, et al. Challenges and research directions in medical cyber–physical systems. Proc IEEE. 2012; 100(1):75-90.

21. Coventry, Branley. Cybersecurity in healthcare: a narrative review of trends, threats and ways forward. Maturitas. 2018; 113:48-52.

22. […], Aliyu, Evans, Luo. Health care cybersecurity challenges and solutions under the climate of COVID-19: scoping review. J Med Internet Res. 2021; 23(4):e21747.

23. Scottish Government. Scottish Public Sector Cyber Incident Response Plan. Version v1.4. Edinburgh: Scottish Government; 2024. Accessed February 25, 2025. https://www.gov.scot/binaries/content/documents/govscot/publications/advice-and-guidance/2019/10/cyber-resilience-incident-management/documents/cyber-incident-response-toolkit-scottish-public-sector-incident-response-plan-cirp-v1-4/cyber-incident-response-toolkit-scottish-public-sector-incident-response-plan-cirp-v1-4/govscot%3Adocument/Cyber%2Bincident%2Bresponse%2Btoolkit%2B-%2BScottish%2Bpublic%2Bsector%2Bincident%2Bresponse%2Bplan%2B%2528%2BCIRP%2529%2Bv1.4.docx

24. Argaw, Bempong, Eshaya-Chauvin, Flahault. The state of research on cyberattacks against hospitals and available best practice recommendations: a scoping review. BMC Med Inform Decis Mak. 2019; 19(1):10.

25. van Boven, Kusters RWJ, Tin, et al. Hacking acute care: a qualitative study on the health care impacts of ransomware attacks against hospitals. Ann Emerg Med. 2024; 83(1):46-56.

26. Neprash, McGlave, Rydberg, Henning‐Smith. What happens to rural hospitals during a ransomware attack? Evidence from Medicare data. J Rural Health. 2024; 40(4):728-737.

27. Alhammad, Yusof, Jambari. A review of cyber threats to medical devices integration with electronic medical records. Paper presented at: 2022 International Conference on Cyber Resilience (ICCR) 2022; October 6-7, 2022; Dubai, United Arab Emirates. Accessed February 25, 2025. https://dx-doi-org.web.bisu.edu.cn/10.1109/ICCR56254.2022.9995984

28. McGlave, Neprash, Nikpay. Hacked to pieces? The effects of ransomware attacks on hospitals and patients. Preprint. SSRN. Posted October 4, 2023. Accessed February 25, 2025. https://dx-doi-org.web.bisu.edu.cn/10.2139/ssrn.4579292

29. Warminsky. Cyberattack on Massachusetts hospital disrupted records system, emergency services. The Record. December 29, 2023. Accessed June 6, 2024. https://therecord.media/cyberattack-on-massachusetts-hospital-disrupted-health-record-system

30. Lovell. NHS issues urgent call for O-type blood donors following London cyber attack. Digital Health. June 10, 2024. Accessed June 20, 2024. https://www.digitalhealth.net/2024/06/nhs-issues-urgent-call-for-o-type-blood-donors-following-london-cyber-attack/

31. Buckingham, Brandt, Anderson, LF do Amaral, Singh. The untapped potential of mining news media events for understanding environmental change. Curr Opin Environ Sustain. 2020; 45:92-99.

32. Coulter, Han, Pan, Zhang, Xiang. Data-driven cyber security in perspective-intelligent traffic analysis. IEEE Trans Cybern. 2020; 50(7):3081-3093.

33. Pant, Sharma, Kundu. An overview of stemming and lemmatization techniques. In: Kumar, Sharma, Chopra, Rattan, eds. Advances in Networks, Intelligence and Computing. London: CRC Press; 2024:308-321.

34. Gallagher Healthcare. What are the different types of hospitals? Published March 22, 2018. Accessed February 12, 2024. https://www.gallaghermalpractice.com/blog/post/what-are-the-different-types-of-hospitals

35. Everett, Borgatti. Extending centrality. In: Carrington, Scott, Wasserman, eds. Models and Methods in Social Network Analysis. Cambridge University Press; 2005:57-76. Accessed February 25, 2025. https://library.uc.edu.kh/userfiles/pdf/18.Models%20and%20Methods%20in%20Social%20Network%20Analysis.pdf

36. Ahmed, Daclin, Olivaux, Dusserre. Cybersecurity challenges for field hospitals: impacts of emergency cyber threats during emergency situations. Int J Emerg Manag. 2023; 18(3):274-292.

37. Vanderbilt University Medical Center. Phishing attacks are targeting the health care industry; some tactics to familiarize yourself with to stay safe. VUMC News. June 5, 2024. Accessed December 18, 2024. https://news.vumc.org/2024/06/05/phishing-attacks-are-targeting-the-health-care-industry-some-tactic-to-familiarize-yourself-with/

38. Dameff C, Tully J, Chan TC, et al. Ransomware attack associated with disruptions at adjacent emergency departments in the US. JAMA Netw Open. 2023; 6(5):e2312270.

39. Tokat. Cyber threats to hospitals and critical infrastructure in times of COVID-19 pandemic. Preprint. SSRN. Posted August 18, 2023. Accessed February 25, 2025. https://dx-doi-org-s.web.bisu.edu.cn/10.2139/ssrn.4539458

40. Wiggen. The impact of COVID-19 on cyber crime and state-sponsored cyber activities. Konrad Adenauer Foundation. Published June 6, 2020. Accessed February 25, 2025. https://www.kas.de/en/analysen-und-argumente/detail/-/content/die-auswirkungen-von-covid-19-auf-cyberkriminalitaet-und-staatliche-cyberaktivitaeten

41. Wilner, Luce, Ouellet, Williams, Costa. From public health to cyber hygiene: Cybersecurity and Canada’s healthcare sector. Int J: Canada J Glob Pol Anal. 2021; 76(4):522-543.

42. UK Government. UK coronavirus (COVID-19) alert level increased from Level 3 to Level 4. Published December 12, 2021. Accessed February 12, 2024. https://www.gov.uk/government/news/uk-coronavirus-covid-19-alert-level-increased-from-level-3-to-level-4

43. UK Government Cabinet Office. National Cyber Security Strategy 2022 (HTML). Updated December 15, 2022. Accessed February 12, 2024. https://www.gov.uk/government/publications/national-cyber-strategy-2022/national-cyber-security-strategy-2022

44. Foroughi, Ebrahimi, Aryankhesal, Maleki, Yazdani. Hospitals during economic crisis: a systematic review based on resilience system capacities framework. BMC Health Serv Res. 2022; 22(1):977.

45. Radanliev, De Roure. Disease X vaccine production and supply chains: risk assessing healthcare systems operating with artificial intelligence and industry 4.0. Health Technol (Berl). 2023; 13(1):11-15.

46. Liu. Closing the gaps in security through AI and ML. Forbes. February 11, 2022. Accessed December 18, 2024. https://www.forbes.com/councils/forbestechcouncil/2022/02/11/closing-the-gaps-in-security-through-ai-and-ml/