Abstract
Cyberattacks that target critical national infrastructure, such as hospitals, pose a significant threat to the safety and wellbeing of individuals, as evidenced by incidents like the WannaCry worldwide ransomware attack. To better understand vulnerabilities within the healthcare sector and develop preventive measures, it is crucial to examine the evolving nature of cyberthreats and the types of attacks occurring. In this article, we describe a multimethod approach comprising social network analysis, natural language processing, and machine learning, using data from GDELT (Global Database of Events, Language, and Tone), to identify the prevalence of attacks on hospitals while considering the type of attack and its date. Through this approach, meaningful patterns in the evolution of cyberattacks are revealed by analyzing the relationships between emerging cyberattacks mentioned in news reports. Findings show that the number of attacks from 2017 to 2023 increased substantially, with hospitals being more prone to critical attacks such as cyberterrorism/state actor-sponsored criminal activities, advanced persistent threats, and distributed denial of service. Mapping real-time data from diverse sources using a multimethod approach, such as the framework proposed in this article, can lead to better understanding of the threat landscape. This is a crucial step in determining necessary cyberdefenses and informing the development of policy interventions to ensure the cybersecurity of critical national infrastructure.
Introduction
Hospitals are complex organizations 4 that have a unique mission and operate in a context where allocation of resources is crucial to the delivery of clinical care, clinical research, and education. In this context, mitigating cybersecurity vulnerabilities represents a challenge for hospitals that do not have adequate resources, such as budgets, 5 technical know-how, 6 and consequently, an understanding of the threats, 6 thereby causing hospitals to lag behind other industries. 7 The impact of cyberattacks on hospitals is profound and its ramifications can go beyond monetary or reputational damage. There are negative impacts of these attacks on patient outcomes as evidenced by the 2022 attack on the UK National Health Service (NHS) 111, a free service for urgent healthcare and advice, which was rendered inoperable and inhibited thousands from receiving care during the pandemic. 8 For instance, delay in receiving medical care led to the death of a patient in Germany due to a ransomware attack. 9
To effectively counter cyberthreats and prepare organizational and industrywide strategies, up-to-date empirical data are required to map the types of attacks that are prevalent for specific sectors. 10 Making informed decisions about cybersecurity requires collecting accurate data from various sources, as well as aggregating and enriching the information presented. Such decisions are typically based on organizational-level data using conventional threat signals received through intrusion detection systems for comparing traffic patterns with baseline data. These data do not usually include unconventional signals through media sources and news outlets that are not linked specifically to known vulnerabilities in a targeted organization. While not all unconventional signals are useful or effective, and collating such data can be challenging and expensive, incorporating such data is crucial for advancing scientific understanding and mapping the threat landscape. As such, several researchers argue that dependable data and trustworthy data sources are lacking, especially those that show the evolution of threats mapping both regional and industry-specific contexts.11-13
To bridge this gap, we developed and implemented a multimethod framework to explore the trends of cyberattacks on hospitals worldwide from 2017 to 2023, with a particular focus on identifying patterns and relationships between different types of attacks and hospitals. We used unconventional data sources obtained through the Global Database of Events, Language, and Tone (GDELT). GDELT is an open platform from which organizations, news, locations, themes, people, quotes, or images can be extracted and identified. GDELT has been used by several researchers to map and predict events such as social unrest, 14 future violence levels in specific regions, 15 and stock market prices. 16 Our study used social network analysis (SNA), machine learning (ML), and natural language processing (NLP) methods to collect real-time data from several sources, obtain data on past cyberattacks, classify these attacks based on their severity, and analyze the evolving threat landscape.
Our research contributes to the health cybersecurity research domain in 2 distinct yet interconnected ways. First, it provides a robust methodological framework that enables enhanced threat detection and response through real-time data collection and pattern identification. Second, the framework enables practitioners and organizations to make better risk assessments and data-driven decisions, helping to overcome uncertainties in predicting cyberattack trends and be more proactive in reducing failure rates. 17 Healthcare organizations are thus able to identify potential vulnerabilities, anticipate threats, and implement proactive measures while effectively prioritizing their resources. This systematic approach to understanding the threat landscape is fundamental for safeguarding critical infrastructure and ensuring the continuum of care for patients.
Cyberattacks on Hospitals
Cyberattacks involving hospitals have become an alarming global issue, with a growing number of incidents resulting in significant consequences for healthcare institutions, their patients, and staff. 18 Protecting sensitive data has become especially concerning considering the emergence of new technologies such as robotics, artificial intelligence (AI), and wearable devices that introduce risks related to data privacy and security. 19
The integration of new technologies with legacy infrastructure creates a complex ecosystem vulnerable to attacks.20-22 These attacks can be classified based on their severity as outlined in the Cyber Incident Response Plan (CIRP) framework. 23 A CIRP outlines how an organization can effectively respond to cybersecurity incidents and is written to ensure that the incidents cause minimal damage to the organization. Creating a CIRP that clearly defines processes, roles, and responsibilities will enable organizations to understand communication plans and standard operating procedures, identify the source of the incident, and ensure that steps are taken to contain the threat and isolate the enterprise from the attacker.
The CIRP classifies attacks based on the impact to the organization, with 4 levels of attacks. 24 Critical (Level 4) incidents are defined as those with catastrophic impacts that can pose a threat to the life of individuals (patients in our case), and can cause substantial destruction to hospital capabilities and their IT applications or systems with major reputational and financial losses. 23 High severity (Level 3) incidents are defined as those where the impacts are “substantial to the proper conduct of” hospitals and can cause “impactful destruction” to the hospital capabilities and their IT applications or systems with a substantial reputational and financial loss. In our context, critical and high severity incidents are primary infiltration attacks and include those where patients are directly impacted, such as disruption of critical medical equipment or compromised patient records.23,24 The classification of an incident as critical or high severity depends on its degree of impact. Moderate severity (Level 2) incidents are defined as those where the impacts are “moderate to the proper conduct of” hospitals and cause moderate disruptions over a period or affect several wards. This results in limited damage in terms of financial and reputational loss. 23 Moderate severity incidents are secondary infiltration attacks that affect patients indirectly and have implications for the hospital such as data exposures, breaches to billing systems, or attacks on the supply chain. Low severity (Level 1) incidents are those where the impacts are “generally limited to the proper conduct of” hospitals and do not usually disrupt organizational processes or, in some cases, affect only a single ward. They do not cause any direct patient harm and are usually tertiary infiltration attacks that target the broader hospital infrastructure. 24
Incidents can be categorized into these 4 levels based on documented sources for the attack types and severity. However, the framework does not provide specific categorization for hospitals or other critical national infrastructure (CNI). Using guidance from the CIRP framework, we have listed attacks on hospitals and their impact. Table 1 summarizes the types of attacks on hospitals from 2015 to 2023, including details on their effects, severity level, and real-world examples. For instance, distributed denial of service (DDOS) attacks are high severity attacks that aim to disrupt critical services and affect patient care. A notable example occurred in 2020 at the University of Vermont Medical Center, where a DDOS attack disrupted patient appointments and delayed elective procedures, resulting in financial losses of US$1 million. 5
Types of Cyberattacks on Hospitals, 2015-2023
Abbreviations: AI, artificial intelligence; DDOS, distributed denial of service; MITM, man in the middle; NHS, National Health Service; UCLA, University of California Los Angeles.
The impact of cyberattacks on hospitals extends beyond financial losses. Other consequences include operational disruptions, compromised patient records, and erosion of trust between healthcare providers and patients. For example, a ransomware attack on the Hollywood Presbyterian Medical Center in 2016 had a substantial impact, not only in terms of financial and reputational damage but also disruption of workflows and continuum of acute care for patients. 25 An article by Neprash et al 26 describes how emergency room nurses were unable to handle patient care after a ransomware attack at St. Michael Medical Center in Silverdale, Washington, and the nurses requested emergency services to help them handle redirecting patients. Patient safety can be threatened due to cyberattacks, especially when they involve the use of electronic medical devices. 27 Beyond the continuum of care, cyberattacks have also led to deaths of patients. Between 2016 and 2021, an estimated 42 to 67 deaths of Medicare patients occurred due to system issues caused by ransomware attacks. 28 It is also estimated that the total number of deaths would surpass this approximation because it does not include patients with other types of health insurance. Other examples include a cyberattack on the Anna Jaques Hospital in Massachusetts on December 24, 2023, resulting in the rerouting of ambulances and transfer of patients to other facilities, with substantial delays in patient care. 29 Similarly, the UK NHS trusts’ pathology system provider suffered a ransomware cyberattack on June 3, 2024. This caused severe disruptions with major trusts such as King’s College Hospital, Guy’s and St. Thomas’ NHS Foundation Trust, and the transplant centers at Harefield Hospital, Royal Brompton Hospital, and Evelina London Children’s Hospital. Damages included the cancellation of several thousand appointments and surgeries as well as disruptions to general practitioner services. 30
Independent of the perpetrator’s motive or the nature of the cyberattack, the purpose of cybersecurity revolves around safeguarding the digital assets of the organization to forestall and mitigate the repercussions of cyberattacks. To do this, it is necessary to have an in-depth understanding of how malicious attacks evolve in conjunction with emerging technologies.
Methods
Overview
In this study, we propose a multimethod framework for collecting a comprehensive dataset of news articles pertaining to cyberattacks on hospitals using GDELT. GDELT is a database that tracks global news from a wide range of sources, including broadcast, print, and online platforms, spanning over 100 languages. It offers the possibility to collect information on key individuals, locations, organizations, themes, and events that influence global society. The use of extensive data sources, such as GDELT, is becoming increasingly common for analyzing shifts in news coverage regarding actors, events, and sentiments. 31
We used NLP, ML, and SNA to map interdependencies between the reported events and the entities—in this case, cyberattacks, and hospitals respectively. Extending the data-driven cybersecurity approach, 32 a 5-step process was adopted to include data extraction, data scraping, data preprocessing, feature extraction, and SNA (Figure 1).

Five-step methodological framework for collecting a comprehensive dataset of news articles using GDELT. Elongated rectangles indicate the beginning (data extraction) and end of the process (social network analysis), smaller rectangles show the actions, and diamonds show decisions that must be adopted. Arrows indicate the directional flow of the process. Abbreviations: CSV, comma-separated values; DDOS, distributed denial of service; DOS, denial of service; GDELT, Global Database of Events, Language, and Tone; NAN, not a number; NLP, natural language processing; URL, uniform resource locator.
While the steps themselves are not new, the combination of steps as a methodological framework is novel because it enables healthcare organizations to have a broader picture of the threat landscape. The methodological framework was applied to cybersecurity in this study, but it can be replicated in other contexts by modifying the search term selection, time period, and relevant features for further analysis.
Five-Step Process
Step 1: Data Extraction
This is the first step in the process and comprises 4 substeps. The first substep is source identification, whereby the selection criteria and GDELT were identified after careful consideration to ensure comprehensiveness and relevance. From GDELT, the Uniform Resource Locator (URL) for each of the articles had to be extracted. This was done in the second substep using a web crawler or bot that extracted the information. This step included identification of the correct keywords and time period for which the search had to be done. Keywords included “cyber attack,” “information attack,” “data attack,” “DDOSs,” “DOS,” “ransomware,” “information security attack,” “cyber breach,” “information security breach,” “data breach,” “hospital,” and “healthcare.” The time period was restricted from 2017 to 2023 to see how the trends in attacks had changed before, during, and after the COVID-19 pandemic. This substep also included pagination to ensure the web crawler could navigate several pages in a systematic manner. The Beautiful Soup Python library ensured the web crawler iterated through all pages and extracted dates from each of the web pages, ultimately creating a date list, navigating the events index page, and downloading the data by day.
The articles were then grouped by year for longitudinal analysis. Once the GDELT data files were downloaded, the comma-separated values (CSV) files were read using the Pandas Python library. Initially, 1,045,076 articles were downloaded, but several were false positives with varied other types of attacks on hospitals or generic cyberattacks on other sectors. Once the web crawler was configured, the next step was to validate the initial data with checks to ensure that only relevant data were included. Articles that did not pertain to cyberattacks on hospitals were removed in this substep, resulting in a total of 39,796 articles included. Further actions were taken to include only English-language articles, which reduced the total articles to 27,789. The final substep was metadata collection in which a CSV file with 3 columns—URL, date, and title—was extracted and stored.
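The filtering and metadata substeps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' crawler: the function names (`is_relevant`, `in_study_period`, `write_metadata`), the sample records, and the rule that a title must match at least one attack term and at least one sector term are all assumptions, since the article does not state how the keywords were combined. The date format (YYYYMMDD) is likewise assumed.

```python
import csv
import io
import re

# Keywords taken from the article; how they were combined is an assumption.
ATTACK_TERMS = (
    "cyber attack", "information attack", "data attack", "ddos", "dos",
    "ransomware", "information security attack", "cyber breach",
    "information security breach", "data breach",
)
SECTOR_TERMS = ("hospital", "healthcare")


def _pattern(terms):
    """Whole-word, case-insensitive pattern that also accepts a plural 's'."""
    body = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + body + r")s?\b", re.IGNORECASE)


ATTACK_RE = _pattern(ATTACK_TERMS)
SECTOR_RE = _pattern(SECTOR_TERMS)


def is_relevant(title):
    """Assumed rule: one attack term AND one sector term must appear."""
    return bool(ATTACK_RE.search(title) and SECTOR_RE.search(title))


def in_study_period(date_str, first=2017, last=2023):
    """Assumes dates are strings that start with a 4-digit year."""
    return first <= int(date_str[:4]) <= last


def write_metadata(rows, handle):
    """Store the 3 metadata columns named in the article."""
    writer = csv.DictWriter(handle, fieldnames=["URL", "date", "title"])
    writer.writeheader()
    for row in rows:
        if in_study_period(row["date"]) and is_relevant(row["title"]):
            writer.writerow(row)


# Hypothetical sample records for illustration only.
sample = [
    {"URL": "https://example.com/a", "date": "20200512",
     "title": "Ransomware attack hits hospitals"},
    {"URL": "https://example.com/b", "date": "20200513",
     "title": "Ransomware hits retailer"},
    {"URL": "https://example.com/c", "date": "20161231",
     "title": "Data breach at hospital"},
]
buffer = io.StringIO()
write_metadata(sample, buffer)
```

Whole-word matching is used here so that a short keyword such as "DOS" does not match unrelated words like "dosage", which is one plausible source of the false positives mentioned above.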
Step 2: Data Scraping
In the second step, we extracted information from the different web pages identified in step 1 using the Beautiful Soup Python library. Extracted information included the content where the keywords (used in step 1) appear in the document (indicating relevant and necessary content for the analysis), organizational names, document title, and publication date. Each page was retrieved through the Python requests library, which returned the content in HTML (hypertext markup language) format. A Beautiful Soup object in Python was then created that enabled us to parse the HTML structure. Specific information was extracted through the use of functions such as “find” and “find_all,” especially for scanning the text within the HTML data and filtering the lines that contained any or all of the keywords. This approach enabled us to obtain and organize all necessary data to be used in the next steps for analysis and to capture all relevant information from the articles extracted. A new column “Context” was added to the existing CSV file to store the news content. To avoid issues with scraping, a 10-second timeout was established and sites that were nonresponsive returned “not a number” (NAN) in the context column. This was particularly useful as a robust error handling mechanism to manage issues like CAPTCHA (completely automated public Turing test to tell computers and humans apart), dynamic content loading, and access restrictions.
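The scraping logic can be illustrated with a short sketch. The study used the requests and Beautiful Soup libraries; to stay dependency-free, the version below swaps in the standard library's `urllib` and `html.parser`, so it should be read as an analogue of the described step rather than the authors' code. The class and function names, the shortened keyword list, and the choice to keep whole paragraphs that mention a keyword are assumptions.

```python
import math
import urllib.request
from html.parser import HTMLParser

# Shortened, illustrative keyword list (assumption).
KEYWORDS = ("ransomware", "cyber attack", "data breach", "ddos")


class ParagraphText(HTMLParser):
    """Collect the visible text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep text from an unclosed final paragraph


def extract_context(html, keywords=KEYWORDS):
    """Return the paragraphs that mention at least one keyword."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    hits = [p for p in parser.paragraphs
            if any(k in p.lower() for k in keywords)]
    return " ".join(hits)


def fetch_context(url, timeout=10.0):
    """Fetch a page with a 10-second timeout; return NaN on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            html = response.read().decode("utf-8", errors="replace")
    except (ValueError, OSError):  # malformed URL, timeout, refused, etc.
        return math.nan
    return extract_context(html)
```

Returning NaN rather than raising keeps a long crawl running when individual sites fail, and it leaves a marker in the Context column that the preprocessing step can filter on.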
Step 3: Data Preprocessing
In the third step, we checked whether the data obtained from the first 2 steps were consistent and could be used. This included data cleaning whereby inconsistencies and issues with missing data were handled. For example, in some cases, the context column included records that were not consistent due to several reasons such as unresponsive websites, broken links, issues with CAPTCHA, dynamic content, or access-restricted sites that returned NAN in this column. The next substep included identifying non-English character sets and unintelligible data, which were removed. Duplicated records were also removed, resulting in a total of 18,009 articles included that mentioned 18,009 different attacks on hospitals.
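A minimal sketch of this cleaning pass is shown below. The article does not specify how non-English text or duplicates were detected, so both rules here are assumptions: a record is treated as non-English when fewer than 90% of its characters are ASCII, and two records are treated as duplicates when their context text matches after lowercasing and whitespace normalization. The helper names are hypothetical.

```python
import math


def is_missing(value):
    """True for None, NaN, or empty/whitespace-only text."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return not str(value).strip()


def mostly_ascii(text, threshold=0.9):
    """Crude stand-in for language detection (assumption)."""
    ascii_chars = sum(1 for ch in text if ch.isascii())
    return ascii_chars / len(text) >= threshold


def clean(rows):
    """Drop rows with missing, non-English, or duplicated context."""
    seen = set()
    kept = []
    for row in rows:
        context = row.get("Context")
        if is_missing(context):
            continue
        if not mostly_ascii(context):
            continue
        key = " ".join(context.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical records for illustration only.
rows = [
    {"URL": "u1", "Context": "Ransomware hit the hospital."},
    {"URL": "u2", "Context": float("nan")},
    {"URL": "u3", "Context": "ransomware  hit the hospital."},
    {"URL": "u4", "Context": "病院がランサムウェア攻撃を受けた"},
]
cleaned = clean(rows)
```

In practice a dedicated language-detection library would be more reliable than the ASCII ratio, which is used here only to keep the example self-contained.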
Step 4: Feature Extraction
In the fourth step, the following 5 attributes were obtained from the news articles: type of cyberattack, severity level, hospital name, hospital type, and whether the hospital was private or public. Several NLP techniques, including tokenization, lowercase conversion, stop word removal, and lemmatization were used 33 to extract cyberattack keywords from the articles. First, the tokenization process was used to separate sentences in an article, which is crucial because it splits the article into smaller tokens that can be used later in the analysis. Next, we converted all tokens to lowercase to ensure accuracy in any comparisons of the data, and we removed frequently occurring text such as prepositions and articles using the stop-word removal process. Finally, we applied the lemmatization process, using morphological analysis and vocabulary to identify the base form of words. For example, words such as “studies” or “studying” would have the common base word “study.”
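The four techniques can be shown on a toy example. The sketch below uses a regular-expression tokenizer, a very small stop-word list, and a lookup table in place of a real lemmatizer; all three are illustrative assumptions, as the article does not name the specific tokenizer, stop-word list, or lemmatizer used.

```python
import re

# Tiny illustrative stop-word list (assumption, not the one used in the study).
STOP_WORDS = frozenset({
    "the", "a", "an", "of", "in", "on", "at", "to",
    "and", "was", "were", "by", "for",
})

# Lookup table standing in for a morphological lemmatizer (assumption).
LEMMAS = {
    "studies": "study",
    "studying": "study",
    "attacks": "attack",
    "attacked": "attack",
    "hospitals": "hospital",
}


def preprocess(text):
    """Tokenize, lowercase, remove stop words, then lemmatize."""
    tokens = re.findall(r"[A-Za-z]+", text)               # tokenization
    tokens = [t.lower() for t in tokens]                  # lowercase conversion
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [LEMMAS.get(t, t) for t in tokens]             # lemmatization


tokens = preprocess("The hospitals were attacked by ransomware in 2021")
```

The order matters: lowercasing before stop-word removal ensures that "The" and "the" are treated identically, and lemmatizing last means the lookup only has to cover content words.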
Once these techniques were completed, we used the spaCy Python library to conduct Named Entity Recognition, which involved parsing the text to extract relevant entities and categorize them appropriately. For example, we identified and extracted hospital names, types of cyberattacks, and severity levels. The types and severity levels of cyberattacks were based on the literature described in the previous section. All names of hospitals were extracted and, finally, hospital types were defined based on a list of predefined keywords from the literature to include rural hospitals, teaching hospitals, and clinics. 34 This step resulted in a list of 18,073 hospitals over the 6-year period that have suffered a cyberattack.
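To make the output of this step concrete, the sketch below extracts a hospital name, attack types, and a hospital type from a sentence. It is a rule-based stand-in built on regular expressions, not spaCy's statistical Named Entity Recognition, and it covers only a few attack types. The patterns, the function name `extract_features`, and the fictional "Example General Hospital" are assumptions for illustration.

```python
import re

# A few illustrative attack patterns (assumption; the study covered more types).
ATTACK_PATTERNS = {
    "ransomware": re.compile(r"\bransomware\b", re.IGNORECASE),
    "phishing": re.compile(r"\bphishing\b", re.IGNORECASE),
    "DDOS": re.compile(
        r"\b(?:ddos|distributed denial[- ]of[- ]service)\b", re.IGNORECASE
    ),
}

# Hospital-type keywords named in the article.
HOSPITAL_TYPES = ("teaching hospital", "rural hospital", "clinic")

# Capitalized words followed by "Hospital" or "Medical Center" (assumption).
HOSPITAL_NAME = re.compile(r"(?:[A-Z][a-z]+\s+)+(?:Hospital|Medical Center)\b")


def extract_features(text):
    """Return the hospital name, attack types, and hospital type found in text."""
    name = HOSPITAL_NAME.search(text)
    lowered = text.lower()
    return {
        "hospital_name": name.group(0) if name else None,
        "attack_types": [
            label for label, pattern in ATTACK_PATTERNS.items()
            if pattern.search(text)
        ],
        "hospital_type": next(
            (t for t in HOSPITAL_TYPES if t in lowered), "unspecified"
        ),
    }


features = extract_features(
    "A ransomware attack forced Example General Hospital, "
    "a rural hospital, to divert ambulances."
)
```

A trained entity recognizer generalizes to names that do not follow this capitalization pattern, which is the main reason to prefer it over fixed rules on real news text.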
Table 2 shows a sample of the final dataset that we collected including (1) the link to the news article discussing the cyberattack; (2) the date the news article was published; (3) the title of the news article; (4) a brief summary of the incident; (5) the type of cyberattack; (6) the severity level of the attack; (7) the targeted hospital; (8) the type of hospital; and (9) whether the hospital is public or private. This systematic approach ensures that data are accurately categorized and prepared for identifying trends and patterns in cyberattacks on hospitals.
Sample Line of the Final Dataset
Abbreviations: IT, information technology; NHS, National Health Service.
Step 5: Social Network Analysis
The final step in the process is SNA, a research method involving tools and techniques used to explain how social entities are connected and interact with one another. It focuses on relationships and enables us to understand how patterns can be used to map the evolving cyberthreat landscape.
We created a bipartite network containing 2 different sets of nodes. 35 One set represents hospitals, and the other set represents types of cyberattacks. The edges represent the relationships that exist between the sets of nodes. In this context, an edge connects a hospital to a cyberattack if, for example, a particular hospital has experienced a specific type of cyberattack. In other words, the edge between the nodes represents the occurrence of a particular type of cyberattack at a specific hospital by year. Using this structure is beneficial because it provides insights into the relationships between hospitals and cyberattacks. Attributes such as the types of hospitals, severity levels, and countries are added to the data.
To identify the most important nodes within the network, we used the degree centrality measure, 35 which counts, for each hospital, the number of cyberattack types it experienced and, for each cyberattack type, the number of hospitals it targeted. A higher degree centrality for a cyberattack indicates that it is more widespread and has affected a larger number of hospitals. This can help identify the most common or pervasive types of cyberattacks in the healthcare sector. We also conducted temporal analysis using the NetworkX Python library to observe how the network evolves over time, providing insights into trends and shifts in the cyberthreat landscape.
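The bipartite structure and the per-year degree calculation can be sketched in plain Python. The study used NetworkX; the version below avoids that dependency and normalizes each node's degree by the size of the opposite node set, which is one common convention for bipartite graphs. That normalization is an assumption and need not match the scaling of the values reported in this article. The records and the function name are hypothetical.

```python
from collections import defaultdict


def degree_centrality_by_year(records):
    """records: iterable of (year, hospital, attack_type) tuples.

    Returns {year: {(node_set, node): centrality}}, where a node's degree
    is divided by the number of nodes in the opposite set for that year.
    """
    edges = defaultdict(set)
    for year, hospital, attack in records:
        edges[year].add((hospital, attack))  # a set ignores repeated reports

    result = {}
    for year, pairs in edges.items():
        hospitals = {h for h, _ in pairs}
        attacks = {a for _, a in pairs}
        degree = defaultdict(int)
        for hospital, attack in pairs:
            degree[("hospital", hospital)] += 1
            degree[("attack", attack)] += 1
        result[year] = {
            node: count / (len(attacks) if node[0] == "hospital"
                           else len(hospitals))
            for node, count in degree.items()
        }
    return result


# Hypothetical records for illustration only.
records = [
    (2017, "H1", "phishing"),
    (2017, "H2", "phishing"),
    (2017, "H2", "ransomware"),
    (2023, "H1", "ransomware"),
]
centrality = degree_centrality_by_year(records)
```

Building one edge set per year gives a simple form of temporal analysis: comparing an attack type's value across years shows whether it reached a larger or smaller share of the hospitals observed in each year.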
Results
The trend in the number of articles about cyberattacks on hospitals by year between 2017 and 2023—of the 18,009 attacks reported in the literature—is shown in Figure 2.

Upward trend in the number of articles about cyberattacks on hospitals by year, 2017 to 2023.
Table 3 provides the number of cyberattacks on hospitals by year from 2017 to 2023, categorized by their severity levels. It indicates that compared to the 1,987 attacks reported in 2017, 3,416 attacks were reported in 2023, a 72% increase. Classifying these attacks based on severity level also shows a substantial increase in the number of critical (Level 4) attacks over the years. A total of 1,116 critical attacks were reported in 2017, while 2023 had 3,123 critical attacks, an increase of approximately 180%. This indicates an overall increase in the number of cyberattacks over the years, with a notable rise in critical severity level attacks. Ransomware and advanced persistent threats (APTs) were classified as critical; DDOS and man-in-the-middle (MITM) attacks were classified as high; denial of service (DOS), phishing, password attacks, and SQL injections were classified as moderate; and attacks such as DNS (domain name system) attacks or URL interpretation were classified as low, based on the literature. 24
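The severity assignment and the year-over-year comparison above can be expressed compactly. In the sketch below, the mapping follows the classification stated in this paragraph, while the dictionary keys, function names, and the choice to return `None` for unlisted attack types are illustrative assumptions.

```python
# Severity levels as stated in the text: 4 critical, 3 high, 2 moderate, 1 low.
SEVERITY_BY_ATTACK = {
    "ransomware": 4,
    "apt": 4,
    "ddos": 3,
    "mitm": 3,
    "dos": 2,
    "phishing": 2,
    "password attack": 2,
    "sql injection": 2,
    "dns attack": 1,
    "url interpretation": 1,
}


def severity_for(attack_type):
    """Look up the severity level; None if the type is not in the table."""
    return SEVERITY_BY_ATTACK.get(attack_type.strip().lower())


def percent_increase(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


all_attacks = percent_increase(1987, 3416)   # all attacks, 2017 vs 2023
critical = percent_increase(1116, 3123)      # critical attacks, 2017 vs 2023
```

With the counts reported above, `all_attacks` evaluates to about 71.9 and `critical` to about 179.8.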
Number and Severity Level of Cyberattacks Against Hospitals by Year, 2017 to 2023 (N=18,009)
Figure 3 shows the distribution of cyberattacks across 2,898 hospitals during the COVID-19 pandemic in 2020. Cyberespionage, cyberterrorism, DDOS, MITM, and Ransomware were the most frequent types of cyberattacks targeting hospitals.

Temporal bipartite network showing the distribution of cyberattacks across 2,898 hospitals during the COVID-19 pandemic in 2020. The blue triangles on the right represent the different types of cyberattacks. Ransomware, cyberespionage, cyberterrorism, DDOS, and MITM were the most frequent types of cyberattacks targeting hospitals. The red dots on the left represent the hospitals. The gray lines represent the relationships between hospitals and types of cyberattacks, indicating which types of attacks target specific hospitals. Abbreviations: DDOS, distributed denial of service; MITM, man in the middle.
Finally, to understand how the threat landscape has evolved, degree centrality was calculated to identify the types of attacks that most hospitals suffered over time. Table 4 provides an overview of the top 5 types of cyberattacks by year and degree centrality, indicating the prominence of each type of attack within the data. In 2017, phishing had the highest degree centrality value at 4.690, indicating that it was the most prominent attack type that year. By 2023, the situation had changed considerably, with APTs having the highest degree centrality value at 6.873. Ransomware also became much more common, with a degree centrality value of 6.452 in 2023 compared to 0.188 in 2017. DDOS attacks were also prominent, with a value of 3.976 in 2023. In contrast, phishing had a much lower degree centrality value of 0.645 in 2023, indicating a relative decrease in its prominence compared to other attack types. Overall, these data indicate a shift away from phishing and toward APTs and ransomware as the most critical threats in 2023. This shift highlights the increasing complexity and sophistication of cyberattacks over the years. An increase in the prominence of APTs and ransomware suggests that attackers are focusing more on persistent and highly damaging attacks, which reflects the evolving strategies in the cyberthreat landscape.
Top 5 Types of Attacks by Year and Degree Centrality, 2017–2023
Abbreviations: APT, advanced persistent threat; DOS, denial of service; DDOS, distributed denial of service; MITM, man in the middle; SQL, structured query language.
An examination of the attributes obtained from the feature extraction in step 4 shows that teaching hospitals, public hospitals, and academic medical centers are more prone to critical (Level 4) and high (Level 3) severity attacks including cyberterrorism/state actor-sponsored criminal activities, APTs, and DDOS, whereas community hospitals, children’s hospitals, and clinics are more prone to low (Level 1) and moderate (Level 2) severity attacks.
Discussion
This study offers a novel methodological approach for collecting and analyzing different types of cyberattacks on hospitals. It maps the relationship between attacks and hospitals, identifies various types of attacks, and distinguishes the most prominent attacks affecting hospitals over time. The healthcare sector has experienced an increasing number of attacks as shown in our findings. This is because hospitals serve as vital components of a country’s healthcare infrastructure, managing sensitive patient data, medical records, and essential services. Consequently, they become prime targets for cybercriminals, state-sponsored actors, and hacktivists due to the valuable information they hold including patient health records, research data, and intellectual property. 36
In analyzing attack patterns, we found 2 distinct trends. First, lower severity cyberattacks such as phishing attacks, DOS attacks, or SQL injections have decreased due to improved preparedness, including implementation of multifactor authentication and regular audits. 3 This aligns with recent literature showing that hospitals have increased their cybersecurity capabilities; the decrease in the number of phishing attacks can be attributed to organizations’ growing familiarity with such attacks and to constant threat monitoring for phishing tactics. 37 Another study 6 shows that hospitals’ preparedness has increased, as most hospital organizations now implement preventive measures such as multifactor authentication and conduct regular audits, which effectively deter lower level attacks such as SQL injections or DOS attacks, although these remain a threat. Second, we observed that critical (Level 4) and high (Level 3) severity attacks are increasing, with public hospitals and academic medical centers being particularly vulnerable to ransomware and cyberterrorism. This aligns with a recent study emphasizing an increase in ransomware attacks as hospitals lack adequate training in cybersecurity awareness and best practices, creating vulnerabilities that attackers can exploit. 38 Indeed, since 2020, there has been a substantial increase in critical attacks such as cyberespionage and state-sponsored attacks, which were not as prevalent before the COVID-19 pandemic. Specifically, these 2 hidden threats were more frequent because they offer further benefits to the threat actor, such as intellectual property theft and insight into intelligence operations. 39
Similarly, state-sponsored attacks, especially from hacker groups in China, North Korea, and Russia, operated covertly in cyberspace to evade political responsibility.40,41 Moreover, nation states may target healthcare institutions for geopolitical reasons, leading to cyberterrorism, espionage, and other state-sponsored criminal activities that can compromise national security.42,43
Despite regulatory requirements like medical device regulations, good clinical practice, and the General Data Protection Regulation (GDPR), many healthcare organizations remain reluctant to disclose incidents, leading to substantial fines. For example, within the European Union, 163 fines have been issued to healthcare organizations due to noncompliance with the GDPR, accounting for €16 million. 3 Indeed, as pointed out in the ENISA report, 3 most healthcare organizations rely heavily on incidents that are publicly disclosed by the victim organizations and are thus dependent on deliberate disclosures or, sometimes, unintentional data leaks.
To this end, our multimethod framework enables better pattern identification, which cyber policymakers at the organizational level can use to distill huge volumes of information into actionable intelligence. It also helps organizations automate security data gathering, thereby minimizing manual effort and enabling cybersecurity professionals to focus more on strategic tasks. This is especially important as a recent study 44 shows that many hospital organizations, especially in times of economic downturns, have reduced their investments in the health workforce and technologies, instead implementing absorptive capacity policies that focus only on critical hospital functions. Hence, cybersecurity professionals will not have the luxury of large teams and will need to prioritize tasks effectively. They must identify and address the most critical threats first, ensuring that limited resources are used efficiently to maintain robust security postures.
Integration of advanced ML, NLP, and SNA methods for data collection will provide a better understanding of the dynamic threat landscape, which can enhance policy development and cybersecurity standards such as the ISO 27000 series and IEC (International Electrotechnical Commission) 62443. Indeed, the need to use and integrate advanced methodology is also emphasized by a recent study 45 that shows the importance of supporting real-time data analysis, risk assessment, and decisionmaking processes. More specifically, a robust methodological framework, such as the one we propose, will enable enhanced threat detection and response, as well as better risk assessments, and will pave the way for data-driven decisionmaking. Our multimethod framework has the capability to analyze large volumes of real-time data and identify patterns to detect anomalies that could indicate emerging threats. For instance, while phishing attacks are still common, our real-time data show that the number of successful attacks reported based on the impact has declined. At the same time, APTs and other critical attacks have increased. Understanding these trends is vital because this capability enables the development of robust cybersecurity policies and standards that also allow for adaptive and proactive threat detection and cyber response. 46 Similarly, using the multimethod framework and adding predictive capabilities can enable policymakers to accurately predict potential threats and refine risk assessment methodologies and standards to ensure they are effective and relevant in the evolving threat landscape. Cybersecurity professionals can then better prepare for a response by dynamically allocating resources to counter the risks.
Limitations and Future Work
This study provides a methodological framework for extracting real-time data to map the cyberthreat landscape in the healthcare sector. While some studies have quantitatively specified the types of attacks,11,12,22 they relied on voluntarily disclosed data. Few studies have used a multimethod approach including ML, NLP, and SNA, as well as real data from conventional and unconventional signals, such as GDELT, to investigate and map the current types of attacks on the healthcare sector. There are some limitations to this study. First, we applied SNA by using degree centrality to evaluate the most prevalent attacks; however, future research could combine network measures with hospital attributes (eg, countries, performance indicators, size) to analyze whether hospitals that occupy a central position in the network are more prone to experiencing state-sponsored attacks. Along the same lines, network models can be applied to observe how the network evolves over time. Second, we used longitudinal data to showcase how the severity of threats has changed over time, but we did not use any predictive models. Future researchers could incorporate predictive modeling using GCN (graph convolutional network) or node2vec methods based on the historic data available. Being able to predict future events with high precision will certainly help healthcare organizations to not only understand the current landscape but also improve their preparedness for such events.
Conclusion
This study developed a framework for implementing a multimethod approach to collect real-time data from several sources, identify patterns and relationships between attacks and hospitals, and explore cyberattack trends in the healthcare sector. The results of our analysis show substantial growth in critical (Level 4) and high (Level 3) severity attacks, while lower severity attacks have diminished over time. The approach presented in this study allows healthcare institutions to identify potential vulnerabilities, anticipate threats, and implement proactive measures to mitigate risks. Furthermore, it enables them to prioritize resources effectively, focusing on areas with the highest risk or potential impact. As the frequency and severity of attacks tend to increase, understanding the threat landscape is a fundamental step in safeguarding this critical infrastructure.
Footnotes
Acknowledgments
The authors wish to acknowledge funding received by Networks and Urban Systems Centre (NUSC) from the
