Abstract
As cybersecurity attacks become more sophisticated and frequent, classic protection techniques such as rule-based firewalls and signature-based detection systems are no longer sufficient. Modern cyberattacks require solutions that can adapt and react in real time. Deep reinforcement learning (DRL) is a subfield of artificial intelligence that has proved effective for complex decision-making problems in numerous domains, including cybersecurity. This research aims to improve cybersecurity by developing DRL algorithms for automated threat detection and response in dynamic network environments. To this end, the study proposes a novel Improved Mayfly Optimized Deep Deterministic Policy Gradient (IMO-DDPG) method. Publicly available real-time network intrusion datasets were collected to train the DRL model. The data were preprocessed using normalization, and Principal Component Analysis (PCA) was used for feature extraction. To improve performance in a continually evolving environment, the study customizes DRL algorithms with an emphasis on continuous action spaces, adversarial training, and customized reward structures. The results indicate that IMO-DDPG outperforms existing algorithms, achieving an accuracy of 94.5%, a recall of 94.0%, a precision of 92.5%, and an F1-score of 93.5%. Its high success rate, highest average reward, and superior efficiency significantly improve threat detection and response capabilities. The results show that the proposed algorithm for automated detection and response can significantly enhance cybersecurity defences.
Introduction
In the rapidly changing digital landscape, cybersecurity attacks are occurring with increasing frequency and complexity, often on an hourly basis. Modern threats frequently circumvent traditional security measures such as rule-based firewalls and signature-based detection systems. 1 To address the growing sophistication of cyberattacks, new solutions are required that can respond in real time with the appropriate action at the optimal moment; such solutions have not yet been fully investigated or developed. 2 This paper focuses on automated threat detection and response mechanisms that can operate successfully in dynamic network environments to enhance cybersecurity. 3 As organizations increasingly rely on digital infrastructure, the need to identify and mitigate cyber threats swiftly becomes paramount. Systems built with artificial intelligence (AI) to detect security threats in networks with numerous dynamic connections can not only identify potential risks but also learn from experience and improve their capabilities. Such dynamic systems must remain relevant by continuously adapting to emerging threats. 4
A significant portion of this research is dedicated to leveraging real-time network intrusion data to construct more sophisticated models for threat detection. 5 These models are designed to improve performance in increasingly complex and evolving environments through continual learning and adaptation. 6 A high level of automation in decision-making is critical to significantly enhancing cybersecurity capabilities. The results demonstrate a marked improvement in the effectiveness of automated threat detection and response techniques, with better success rates and greater efficiency than existing solutions. 7 The findings highlight the extent to which advanced AI solutions can redefine cybersecurity methodologies, enabling organizations to counter rapidly evolving and highly sophisticated cyber threats. 8 Ultimately, this paper contributes to the ongoing discussion on cybersecurity by providing evidence that proactive and adaptive strategies must accompany the adoption of such technologies. 9 As the cyber threat landscape continues to evolve, incorporating advanced technologies within cybersecurity frameworks will be essential for protecting digital assets and maintaining organizational resilience, representing a crucial step toward a secure digital future. An automated threat detection and response system embedded within dynamic network environments greatly reduces exposure to evolving cyber threats. 10
This research aims to develop Deep Reinforcement Learning (DRL) algorithms for automated threat identification and response in dynamic network environments, with the ultimate goal of enhancing cybersecurity. The main contributions of this work are as follows: (1) The collection of relevant data: Real-time network intrusion datasets are gathered. (2) Pre-processing through the application of the Z-score, with Principal Component Analysis (PCA) employed to extract the most important features from the pre-processed dataset. (3) The introduction of a novel method called the IMO-DDPG, which aims to provide enhanced detection and improve cybersecurity posture in complex networks automatically.
Related works
Shah et al. quantitatively assessed the application of machine learning (ML) algorithms in cybersecurity and concluded that these algorithms are beneficial for detecting and preventing various types of threats. 11 A substantial amount of data has been processed through ML algorithms and data-oriented approaches, enabling the identification of potentially suspicious activities. These algorithms continuously improve and learn from updates, acquiring knowledge from new input data and enhancing cybersecurity measures simultaneously. The general defense algorithms range from recognizing signs of well-known malware to protecting against previously unknown risks through anomaly detection.
Naseer et al. explored how businesses implement Real-Time Analysis (RTA) in the Incident Response (IR) process to enhance their cybersecurity performance. 12 By engaging a group of cybersecurity experts through interviews and applying the contingent resource-based view (RBV), the study yielded a framework illustrating how IR agility serves as a mediator between RTAC indirect effects and business cybersecurity outcomes.
Yungaicela-Naula et al. assessed recent studies on the automation of security within Software-Defined Networking (SDN) systems. 13 The study identified and ranked various security measures based on differing levels of automation and complexity. The degree of automation was determined using four established quantitative parameters: adaptive behavior, self-healing, self-configuration, and automatic optimization. Complexity was defined by factors such as storage capacity, processing power, and implementation requirements.
Nikoloudakis et al. proposed a machine learning-based contextual awareness framework that leverages the real-time awareness feature of the SDN paradigm to identify both new and existing entities enabled by networks. 14 The framework evaluated recognized threats and allocated suitable network resources for their connectivity. A dataset was utilized to train a machine learning-based intrusion detection system that continuously monitors the entities under evaluation.
Gudala et al. examined the potential for threat detection using artificial intelligence (AI) within Zero Trust Architecture (ZTA) frameworks. 15 The authors discussed the foundational principles of ZTA and the inherent limitations of conventional security approaches. They emphasized the advantages of employing AI for anomaly detection, highlighting the technology’s capacity to identify minor deviations from established norms in user behavior, network traffic, and system configurations. This allows security personnel to take proactive measures to neutralize potential threats before they escalate into security incidents.
Repetto et al. presented a new methodology for managing the cybersecurity of web-based service chains. 16 This innovative paradigm examines evolving norms for Information and Communication Technology (ICT) services while extending beyond traditional security perimeter models. The study emphasized the need for contemporary Security Information and Event Management (SIEM) designs to be reconfigured to accommodate dynamic topologies, multi-tenancy, and diverse administrative domains.
Mironeanu et al. described an approach for developing cybersecurity frameworks through the Experimental Cyber Attack Detection (ECAD) system, integrating various sources of information for security alerts. 17 The system coordinated machine learning elements and security solutions to identify and mitigate cyberattacks. It offered an architectural model and a potential implementation strategy utilizing cutting-edge methods such as SDN and Infrastructure as Code (IaC).
Guo and Guo investigated the features of cloud computing technologies and proposed a cloud-based real-time risk assessment and mitigation strategy for intelligent ship network security. 18 They developed a self-executing defense mechanism designed to spawn modules that prevent and reject attacks. Multi-sensor modules were primarily used to analyze data containing potentially harmful information.
Bringhenti et al. reviewed current methods for automating network security service configuration. 19 The study identified the optimal approach for coordinating an entire service in a virtualized environment, concentrating on two distinct areas: service architecture and the actual deployment of composed functions. It evaluated prior works in each category, considering various criteria such as formal verification and the satisfaction of optimality requirements.
Rose et al. investigated the application of game theory, machine learning, and network characterization to protect the Internet of Things (IoT) against cyberattacks. 20 Their proposed anomaly-based intrusion detection system dynamically and actively monitored all networked devices to detect any attempts at IoT device manipulation and suspicious network transactions. Any deviation from the specified profile was treated as a threat and scrutinized further. The machine learning classifier also processed raw traffic to identify potential attacks.
Kandhro et al. introduced a novel deep learning-based method for identifying cybersecurity vulnerabilities and breaches in cyber-physical systems. 21 The proposed framework contrasted unsupervised learning-based discriminative techniques and neural networks to detect cyber threats within IoT-driven Industrial Internet of Things (IIoT) networks, employing generative adversarial networks for enhanced detection capabilities.
Apruzzese et al. clarified the role of machine learning in cybersecurity by offering a comprehensive overview of its benefits, challenges, and future obstacles within the field. 22 The focus was on the broader cybersecurity landscape, minimizing technical jargon to ensure accessibility to a wider audience. They provided a succinct description of ML applications in identifying three categories of online threats: malware, phishing, and network intrusions. The authors also highlighted additional domains within cybersecurity, including threat intelligence, alert management, cyber risk assessment, and raw data analysis, that could benefit from machine learning’s autonomy.
Gong and Lee presented a framework for analyzing threat indicators that could be generated from advanced metering infrastructure. 23 They proposed a strategy for producing cyber threat intelligence targeting the energy cloud. The research further suggested a mechanism for sharing and exchanging cyber threat intelligence between the Advanced Metering Infrastructure (AMI) and the cloud layer, facilitating the swift implementation of a security structure for large-scale energy cloud architectures.
Furdek et al. introduced a new functional block known as the Security Operation Center (SOC). 24 The paper detailed its architecture, defined essential specifications for its capabilities, and provided guidelines for integration with the optical layer controller. Additionally, the study incorporated unsupervised and semi-supervised neural networks, enhancing the effectiveness of machine learning-based safety diagnostic approaches to address users’ unfamiliarity with high-dimensional optical tracking data in the context of previously unidentified physical layer threats. It utilized three forms of dimensionality reduction and compared their performance against execution time complexity and machine learning accuracy.
Nespoli et al. proposed an innovative dynamic rule management system to adapt to the evolving conditions of IoT environments. Their experiments demonstrated that the system’s CPU and Random Access Memory (RAM) usage were significantly lower than those observed when employing traditional techniques. 25 Notably, there was a marked increase in security levels, as evidenced by a substantial rise in the number of packets processed per second.
Methodology
In this study, real-time network intrusion datasets covering various network attack scenarios are collected from publicly available sources to train and evaluate the deep reinforcement learning (DRL) algorithms. The data are preprocessed using Z-score normalization, and features are extracted using Principal Component Analysis (PCA). An Improved Mayfly Optimized Deep Deterministic Policy Gradient (IMO-DDPG) method is proposed, which integrates improved optimization techniques, continuous action spaces, and tailored reward structures for automated threat detection and response in dynamic network environments, yielding cybersecurity performance superior to that of other algorithms. Figure 1 presents the methodology structure.
Data collection
Data has been collected from the Kaggle source: https://www.kaggle.com/datasets/sampadab17/network-intrusion-detection. The network intrusion detection dataset on Kaggle is used to develop models for predicting network intrusions. However, the reliance on a single dataset has inherent limitations in diversity and may not fully represent the complexities of real-world scenarios. Future work should explore multiple datasets to capture a broader range of attack patterns and ensure a more comprehensive evaluation of the proposed methods.
Data preprocessing
Z-score normalization is applied to the network intrusion data: the mean and variance of each feature are computed and used to rescale it to unit scale, which makes heterogeneous features easier to compare and process. Because this standardization works across different types of intrusion data, the models can learn from a wider range of threat patterns and thereby improve their detection of, and response to, threats under different network conditions.
Z-score normalization
Z-score normalization is among the most widely applied normalization techniques. It converts the scores into a distribution with a mean of 0 and a standard deviation of 1, as given by equation (1). A firm grasp of the score distribution is needed before employing this method.
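As an illustration of the standardization described above, the following is a minimal sketch that subtracts each feature's mean and divides by its standard deviation. The function name `zscore_normalize`, the toy feature matrix, and the handling of constant columns are assumptions made for this example and are not taken from the study's implementation.

```python
import numpy as np

def zscore_normalize(X, eps=1e-12):
    """Standardize each column of X to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Constant columns have zero spread; divide by 1 so they map to 0 instead of NaN.
    safe_std = np.where(std < eps, 1.0, std)
    return (X - mean) / safe_std, mean, std

# Hypothetical toy features (two varying columns and one constant column).
features = np.array([[0.0, 100.0, 1.0],
                     [2.0, 300.0, 1.0],
                     [4.0, 500.0, 1.0]])
Z, mu, sigma = zscore_normalize(features)
```

The mean and standard deviation are returned so that the same statistics, computed on the training data, can be reused to transform new traffic at detection time.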
Feature extraction using PCA
When detecting threats in new and dynamically changing data and network environments, several techniques are used to improve the model's efficiency, including Principal Component Analysis (PCA). In security systems, PCA reduces the complexity of network intrusion data so that the data needed to identify threats can be processed and analyzed more easily.
Principal component analysis
The PCA algorithm’s basic concept is as follows: In addition, it is composed of the centralized data sample
Using equation (3), PCA transforms the input vector into a new vector.
The dataset covariance matrix
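The PCA steps named in this subsection (centering the data, forming the covariance matrix, and projecting onto the leading directions) can be sketched as follows. This is a generic textbook formulation based on an eigendecomposition of the sample covariance matrix; the function name `pca_fit_transform`, the random toy data, and the choice of two components are assumptions for illustration and may differ from the paper's equations.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centralized data sample
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]               # columns are principal directions
    return Xc @ components, components, eigvals[order]

# Hypothetical toy data: 200 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
projected, components, variances = pca_fit_transform(X, n_components=2)
```

Each retained eigenvalue equals the variance of the data along the corresponding component, which is the usual basis for deciding how many components to keep.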
Improved Mayfly Optimized Deep Deterministic Policy Gradient (IMO-DDPG)
DDPG is adopted for automated threat recognition and response in dynamic network environments because it is a reinforcement learning technique well suited to continuous action spaces. The design employs an actor-critic framework in which the actor is responsible for identifying optimal actions to mitigate cybersecurity threats, while the critic evaluates each action taken by providing feedback on its predicted benefit. The improvement upon existing DDPG methods is achieved by integrating the Improved Mayfly Optimization, which enhances exploration and balances it against the exploitation of known strategies. This dual-pronged enhancement enables the algorithm to navigate the complex decision-making landscape of cybersecurity with greater efficacy.
Mayflies employ a method of cooperation in foraging, and Mayfly Optimization leverages this behavior to enhance DDPG, in line with the relatively new field of swarm intelligence. This nature-inspired swarm intelligence method explores the search space by letting the mayflies shift their positions according to the performance of both the individual and the group. Together, these models offer a strong base for advancing automated cybersecurity.
Deep deterministic policy gradient (DDPG)
In Reinforcement Learning (RL), an agent learns to interact with a new environment so as to maximize the expected cumulative value of a given reward function. The environment is typically treated as a Partially Observable Markov Decision Process (PO-MDP); the agent must choose an action based on the observations received from the environment at each time step. In theory, the observation and the actual state of the system may differ. Since
Traditional tabular RL approaches consider discrete action and observation spaces. The term tabular refers to the fact that the agent in such systems usually maintains a table with the expected cumulative value of rewards
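To make the tabular setting concrete, here is a minimal sketch of a single Q-learning update on a small table of expected cumulative rewards. Q-learning is used only as a familiar example of a tabular method; the table size, learning rate `alpha`, discount `gamma`, and the two transitions are hypothetical values chosen for illustration.

```python
import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))  # one entry per (state, action) pair

def q_update(Q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Move Q[s, a] toward the one-step bootstrapped target."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Two hypothetical transitions: a terminal step, then a step that bootstraps from it.
q_update(Q, s=2, a=1, r=1.0, s_next=3, done=True)
q_update(Q, s=1, a=0, r=0.0, s_next=2, done=False)
```

The table needs one entry for every state-action pair, which is why this representation does not scale to continuous spaces.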
However, tabular approaches are ineffective for continuous and high-dimensional spaces because they handle only discrete observation and action spaces. To overcome this limitation, several modifications have been proposed in the literature, primarily by exploiting the ability of neural networks to serve as general function approximators. Deep Reinforcement Learning (DRL) is the term used to describe the fusion of deep learning (DL) methods and RL algorithms. Specifically, actor-critic approaches split the RL problem into two separate problems.
• Critic: The critic determines an acceptable approximation of the action-value function.
• Actor: Using a different approximator
The DDPG algorithm is an actor-critic technique that extends the DPG by utilizing deep neural networks in a model-free, off-policy manner. A critic network
If the following state is terminal,
Target networks are used to increase learning stability because the
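Two ingredients mentioned in this subsection, the critic's learning target (which reduces to the immediate reward when the next state is terminal) and the slowly updated target networks, can be sketched as follows. This follows the standard DDPG formulation rather than the paper's own equations; the function names, the discount `gamma`, the mixing rate `tau`, and the toy arrays standing in for network outputs and weights are assumptions for illustration.

```python
import numpy as np

def td_target(reward, next_q, done, gamma=0.99):
    """Critic target: r if the next state is terminal, else r + gamma * Q'(s', mu'(s'))."""
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]

# Hypothetical mini-batch of two transitions; the second one is terminal.
rewards = np.array([1.0, -1.0])
next_q = np.array([2.0, 5.0])    # stand-in for target-critic values at the next state
done = np.array([0.0, 1.0])
y = td_target(rewards, next_q, done, gamma=0.9)

# Hypothetical weights standing in for an online network and its target copy.
online = [np.ones((2, 2))]
target = [np.zeros((2, 2))]
target = soft_update(target, online, tau=0.1)
```

Because the target weights move only a fraction `tau` toward the online weights at each step, the regression target for the critic changes slowly, which is the usual motivation for target networks.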
Improved Mayfly Optimized (IMO)
The Mayfly algorithm draws inspiration from the social behavior of mayflies, particularly their mating habits. It assumes that mayflies are already adults as soon as they hatch from the egg and that the fittest mayflies survive regardless of how long they live. Each mayfly in the search space occupies a position that corresponds to a candidate solution to the problem. The classic mayfly method generates new candidates that lead toward the optimal solution of a region by using random functions. Researchers have coupled Lévy flight with the mayfly algorithm to enhance its search ability and obtain better results. Lévy flight supports probabilistic random search because of its rapid convergence and its independence from derivative information. It is largely responsible for strengthening the local search and for reducing the risk of becoming trapped in a local optimum. The proposed mayfly optimization technique proceeds through the following steps. Randomly generate two mayfly populations, one male and one female. Next, each mayfly, which is denoted by a
When a mayfly’s position changes, its velocity is initialized,
The male mayfly is thought to be a few meters above the water’s surface, with
In this instance, the Mayfly’s speed
The equation for the top position globally
The initial velocities of the female can vary depending on species
The attracting method used in this instance is unable to be randomized due to
This phase gives the velocity of a mayfly candidate solution, which is determined using the Lévy flight technique. Equation (17) is used to compute the candidate solution's velocity.
Moreover, the calculation of
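A Lévy-flight step of the kind referred to above is commonly generated with Mantegna's algorithm, sketched below. This is a generic construction and not necessarily the form of equation (17): the stability index `beta = 1.5`, the step `scale`, and the `levy_velocity` update rule that scales the step by the distance to a best-known position are all assumptions made for illustration.

```python
import math
import numpy as np

def levy_step(size, beta=1.5, rng=None):
    """Draw heavy-tailed step lengths using Mantegna's algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def levy_velocity(position, best_position, scale=0.01, rng=None):
    """Hypothetical velocity rule: a Levy step scaled by the distance to the best position."""
    return scale * levy_step(position.shape, rng=rng) * (position - best_position)

# Hypothetical three-dimensional candidate and best-known position.
rng = np.random.default_rng(42)
position = np.array([1.0, -2.0, 0.5])
best = np.zeros(3)
velocity = levy_velocity(position, best, rng=rng)
```

Most draws are small, which supports local refinement, while occasional large jumps help a candidate leave a local optimum.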
The crossover operator describes the mating procedure for mayflies as follows. One parent is picked from the male population and one from the female population, using the same selection process by which females are attracted to males. More precisely, parents may be selected at random or according to their fitness. When selection is based on fitness, the best male mates with the best female, the second-best male with the second-best female, and so on. Equation (22) gives the formula for the two offspring produced by this crossover.
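The rank-based pairing and crossover described above can be sketched as follows. The offspring rule used here is the arithmetic blend from the standard mayfly algorithm and may differ in detail from equation (22); the mixing coefficient drawn uniformly from [0, 1), the minimization convention (lower fitness is better), and the function names are assumptions for illustration.

```python
import numpy as np

def pair_by_fitness(males, females, fitness):
    """Pair the best male with the best female, the second best with the second best, etc."""
    m_order = np.argsort([fitness(m) for m in males])    # assumes lower is better
    f_order = np.argsort([fitness(f) for f in females])
    return [(males[i], females[j]) for i, j in zip(m_order, f_order)]

def mayfly_crossover(male, female, rng=None):
    """Produce two offspring as complementary blends of the two parents."""
    rng = np.random.default_rng() if rng is None else rng
    L = rng.uniform(0.0, 1.0)  # assumed range for the mixing coefficient
    offspring1 = L * male + (1.0 - L) * female
    offspring2 = L * female + (1.0 - L) * male
    return offspring1, offspring2

# Hypothetical populations and a toy fitness function (squared distance to the origin).
males = np.array([[3.0, 3.0], [1.0, 1.0]])
females = np.array([[0.5, 0.5], [2.0, 2.0]])
sphere = lambda x: float(np.sum(x ** 2))
pairs = pair_by_fitness(males, females, sphere)
child1, child2 = mayfly_crossover(*pairs[0], rng=np.random.default_rng(0))
```

Because the two offspring use complementary weights, their sum equals the sum of the parents, so the pair stays centered on the parents' midpoint.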
Result
The experiments were run with Python 3.12 on Windows 11, on a machine with 32 GB of RAM and a 12th-generation Intel Core i7 processor. This is a modern laptop configuration suited to demanding multitasking and development workloads. The proposed approach uses Deep Reinforcement Learning algorithms to detect threats automatically in a dynamic network environment and achieves higher accuracy than traditional approaches such as Random Forest (RF) 26 and convolutional neural network (CNN). 27 The evaluation reports accuracy (%), recall (%), precision (%), and F1-score (%), the key metrics for assessing threat detection.
Performances of threat detection
In terms of duration, the high condition reached 40% efficiency with a 70% success rate and an average reward of 60%, while the medium condition performed well, with an 85% success rate and a 100% average reward at 60% efficiency. Among protocol types, TCP was the most effective, with 85% efficiency, an 80% average reward, and a 90% success rate. By comparison, UDP reached 50% efficiency with a 75% success rate and an average reward of 53.33%. In terms of service, HTTP stood out, processing requests at 90% efficiency with a 93.33% average reward and a 95% success rate. FTP, on the other hand, performed poorly, with 25% efficiency, an average reward of 46.67%, and a 60% success rate. Figure 2 shows the performance metrics of threat detection: (a) duration, (b) success rates, (c) service efficiency.
Accuracy
To assess how well the proposed IMO-DDPG model detects and mitigates network threats in dynamic environments, it is compared with other approaches. IMO-DDPG achieves an accuracy of 94.5%, which is higher than that of both CNN (75.0%) and RF (92.5%). These results indicate that the IMO-DDPG model offers a practical approach to enhancing cybersecurity and strengthening system protection by automating threat detection and response in dynamic network environments. The accuracy results are presented in Figure 3 (accuracy result) and Table 1 (overall performance).
Recall
Recall is the system's capacity to identify potential network threats, that is, the proportion of actual threats that it detects. The proposed IMO-DDPG for automatic threat identification and response achieved a recall of 94.0%, whereas the existing approaches, CNN and RF, achieved 60.0% and 93.0%, respectively. This shows that the proposed strategy improves on existing techniques for uncovering cybersecurity risks in a dynamic network environment, increasing the system's ability to identify and counteract attacks. The recall results are shown in Figure 4 (recall result) and Table 1.
Precision
Precision measures how reliably the system's detections correspond to genuine threat signals across a dynamic network, that is, the proportion of flagged events that are actual threats. The existing techniques, RF and CNN, achieve precision values of 91.8% and 80.0%, respectively. With a precision of 92.5%, the proposed IMO-DDPG algorithm performs better than either existing method at correctly detecting cybersecurity threats. The precision results are presented in Figure 5 (precision result) and Table 1.
F1-score
The F1-score, the harmonic mean of precision and recall, provides a balanced measure of the system's ability to automatically identify and respond to threats in dynamic network environments. The existing methods, CNN and RF, yielded F1-scores of 68.0% and 92.4%, respectively. Neither outperformed the proposed IMO-DDPG strategy, which achieved an F1-score of 93.5% in recognizing potential threats. This indicates that deep reinforcement learning is effective compared with traditional approaches to cybersecurity. The F1-score results are presented in Figure 6 (F1-score result) and Table 1.
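For reference, the four reported metrics can be computed from the counts of a binary confusion matrix as in the sketch below. The counts passed in are hypothetical and are not the study's data; the function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # flagged events that are real threats
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # real threats that were flagged
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Since F1 is the harmonic mean of precision and recall, it always lies between the two values and is pulled toward the lower one.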
Discussion
Random Forest (RF) and Convolutional Neural Networks (CNNs) are limited in automatic threat detection and response in dynamic network environments because they require high-quality labeled training data, which is hard to obtain when threats are inherently dynamic. RF may generalize poorly on high-dimensional data, while CNNs suffer from large model sizes and slow training. Furthermore, CNNs can be less interpretable, which makes it harder for cybersecurity analysts to understand the factors that influence their decisions. In addition, both methods may struggle to respond to new threats that differ substantially from previously encountered patterns. To overcome these shortcomings of RF and CNNs in automatic threat detection, an Improved Mayfly Optimized Deep Deterministic Policy Gradient (IMO-DDPG) approach is presented. This approach uses the strengths of reinforcement learning for adaptive online optimization of decision-making. IMO-DDPG integrates an improved mayfly optimization algorithm to strengthen exploration and exploitation, which helps it cope with high-dimensional data and dynamic threat landscapes. Additionally, the approach contributes to better interpretability and adaptability to new threats, resulting in more robust and efficient cybersecurity responses.
Conclusion
This study applies deep reinforcement learning (DRL) to automated threat detection and response in dynamic network environments. It proposes the Improved Mayfly Optimized Deep Deterministic Policy Gradient (IMO-DDPG) algorithm to address the shortcomings of traditional cybersecurity methods. The IMO-DDPG model is trained on real-time network intrusion datasets and enhances performance by focusing on continuous action spaces, adversarial training, and customized reward structures. The results show an accuracy of 94.5%, a recall of 94.0%, a precision of 92.5%, and an F1-score of 93.5%, indicating that IMO-DDPG outperforms existing algorithms. Based on these results, we conclude that the proposed approach provides a strong defence against cybersecurity vulnerabilities through real-time, adaptive decision-making. Despite the promising results, the IMO-DDPG algorithm has several limitations. It may be vulnerable to advanced adversarial attacks, highlighting the need for robust testing against such threats. The model also exhibits high computational complexity, which could hinder real-time application in resource-constrained environments. Furthermore, its generalizability to unseen environments is limited, as the training relies heavily on specific datasets. To address these challenges, future research could explore hybrid models, enhance adversarial training techniques, and incorporate adaptive learning strategies that allow for efficient scaling in larger networks.
Statements and declarations
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
