Abstract
Cloud computing and the Internet of Things (IoT) have become two key technologies for meeting future business requirements. However, massive-scale Distributed Denial-of-Service (DDoS) attacks have been widely used to congest critical network links and to paralyze cloud and IoT services. This is mainly because DDoS attacks are easily implemented, obfuscated, and concealed by launching large-scale, legitimate low-speed flows and rolling the target links to paralyze target network areas. Many metrics and risk assessment frameworks have been proposed to evaluate the impact of DDoS. However, they all lack the time granularity needed to evaluate the cost of attacks of different scales in IoT or large-scale network structures. This study proposes an AI Driven Evaluation framework, called ADE, that applies Convolutional Neural Networks to statistically evaluate the network status through end-to-end functionality (Input: network status; Output: DDoS detected or not) without any manual intervention. ADE provides quantitative security risk analysis by using DDoS detection as the control variable, DDoS patterns and network topology as the independent variables, and the AI learning time to detect a DDoS event as the dependent variable. The learning time to detect a DDoS event and recover the system is then used to evaluate the scale of the DDoS attack, the reasonability of the regulated Recovery Time Objective (RTO), the vulnerability of the current network topology, and the improvement brought by a new security solution. The experimental results demonstrate that the contributions of ADE are (1) providing an objective and quantitative security risk assessment indicator, (2) providing an autonomic DDoS defense framework without any manual intervention, which allows cloud computing and IoT companies to focus on their services and leave security defense to ADE, and (3) demonstrating the possibility of AI-assisted risk assessment, which enables buyers of security defense solutions with less security domain expertise to evaluate suitable network defense strategies.
Keywords
Introduction
Cloud computing and the Internet of Things (IoT) have become two key technologies for meeting 5G ecosystem requirements. However, several well-known cloud services have encountered Tbps-class (terabit-per-second-class) cyberattacks launched from IoT devices. For example, in 2016, the hosting company OVH was subjected to a DDoS attack that peaked at over 1 Tbps of traffic from 145,607 hacked CCTV cameras and personal video recorders [1]. In 2018, 1.35 Tbps of traffic hit the developer platform GitHub, as shown in Fig. 1 [2]. GitHub's security mitigation service then rerouted all traffic coming into and out of GitHub to weed out and block malicious packets until the assault subsided. These attacks immediately drew attention to how to defend against Tbps-class DDoS attacks, especially when the traffic is largely generated by IoT devices [1-2].

Real-time traffic of GitHub in the 2018 DDoS event [2].
Two types of solutions have therefore been proposed to mitigate DDoS attacks. The first type quickly isolates infected devices and reroutes the malicious traffic [3–5]. Once the IDS (Intrusion-Detection System) detects an abnormality, the network traffic controller updates the definitions of the malicious flows in the switches and routers to isolate the infected devices. The second type dynamically changes the service's external IP address [6–12]. This strategy coordinates the DNS and routers to convert the real IP address of the corresponding packets into a virtual IP address within a short period of time, so that DDoS attackers cannot determine the cloud service's real network location. Although these solutions provide sufficient protection in different ways, it is difficult to objectively evaluate their effectiveness and the resilience of the underlying network under DDoS attacks. Moreover, buyers of security defense solutions do not seek a permanent solution, because the life cycle of their cloud services will not last more than three years. Most of these buyers only care which scale of DDoS attack they should respond to, given the Recovery Time Objective (RTO) regulated by the government; otherwise, they take no action and accept the risk. For example, is a Tbps-class DDoS attack more harmful than a Gbps-class one? If both attacks require the same recovery time, why should a buyer purchase a new solution to cope with Tbps-class attacks?
Many metrics and risk assessment frameworks have been proposed to evaluate the impact of DDoS attacks. The National Infrastructure Advisory Council (NIAC) developed the Common Vulnerability Scoring System (CVSS) standard for assessing the severity of computer system security vulnerabilities [13]. CVSS adopts a questionnaire-style scoring system to evaluate vulnerabilities. Annualized Loss Expectancy (ALE), on the other hand, is widely adopted, e.g., by the International Information System Security Certification Consortium ((ISC)²) [14]. ALE is the monetary loss that can be expected for an asset due to a risk over a one-year period. However, CVSS and ALE both lack the time granularity needed to evaluate the cost of attacks of different scales in IoT or large-scale network structures, which is the question raised above [15]. Taking Fig. 1 as an example, CVSS and ALE provide no methodology for exploring the relationship among the scale of the DDoS attack (1.3 Tbps), the corresponding system recovery time (10 minutes), and the recovery cost of each cyberattack event, since they only consider the number of cyberattacks and the corresponding asset value during a fiscal year [13–15].
This study therefore proposes an AI Driven Evaluation framework (called ADE) to assess DDoS cyberattacks of different scales. The basic idea of ADE is to use DDoS detection as the control variable, DDoS patterns and network topology as the independent variables, and the AI learning time to detect a DDoS event as the dependent variable. The learning time to detect a DDoS event and recover the system is then used to evaluate (1) the scale of the DDoS attack, (2) the reasonability of the regulated RTO, and (3) the vulnerability of the current network topology and the improvement brought by a new security solution. The evaluation procedure is as follows. First, ADE emulates real DDoS behaviours by generating intentionally malicious traffic. Then, ADE applies a Deep Convolutional Neural Network (DNN) model to learn to detect this DDoS attack through the DNN's end-to-end functionality (Input: network traffic screenshot; Output: detection result), without any manual feature extraction or intervention. Finally, buyers of security defense solutions can use the learning time acquired in the second step to (1) quantitatively evaluate the impact of DDoS attacks of different scales on the current network environment and (2) ensure the effectiveness of a new security solution with respect to the regulated RTO.
Furthermore, ADE enables sellers of security defense solutions to use the learning time as an objective indicator for addressing rent-seeking and survivorship bias when the buyer and seller debate which features to include in the solution design and which to leave for later. Rent-seeking refers to extracting uncompensated value from others, without contributing to productivity, by manipulating the social or political environment, e.g., by invoking so-called “national security”. Survivorship bias is the logical error of concentrating on the buyers that made it past some selection process while overlooking those that did not, which leads to false conclusions. As stated above, the contributions of ADE are (1) using learning time as an objective and quantitative metric to evaluate the vulnerability of security defense solutions when confronting DDoS attacks of different scales and (2) demonstrating the possibility of an AI-assisted risk analysis procedure that enables buyers of security defense solutions with less security domain expertise to evaluate suitable network defense strategies.
This study is organized as follows: Section 2 reviews risk analysis and risk assessment management. Section 3 provides the details of ADE. Section 4 presents the experimental results and their implications. Finally, Section 5 presents the conclusions.
Risk assessment is a methodology for identifying vulnerabilities and threats and assessing their possible impacts to determine where to implement security controls. The information and communication technology (ICT) industry has several risk assessment procedures that define a standardized methodology for carrying out risk assessments, such as OCTAVE (Operationally Critical Threat, Asset, and Vulnerability Evaluation), FRAP (Facilitated Risk Analysis Process), ISO 31000, the ISO/IEC 27000 series, FMEA (Failure Modes and Effects Analysis), and CRAMM (Central Computing and Telecommunications Agency Risk Analysis and Management Method) [16]. These standards share the same basic core components, summarized as the nine-step procedure shown in Figure 2.

Risk Analysis Procedure.
The first step is system characterization, which includes (1) establishing the scope of risk management and (2) identifying the information essential to assessing risk, e.g., hardware, software, and the responsible division.
The second step is to identify threats. A threat is the potential to exercise a specific vulnerability, either accidentally or intentionally. Threats commonly come from natural sources (e.g., earthquakes and avalanches), humans (e.g., intentional cyberattacks or unintentional acts), or the environment (e.g., power failure).
The third step, vulnerability identification, develops a list of the system flaws or weaknesses that could be exploited by potential threats.
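The fourth step is to evaluate the effectiveness of the controls that have been implemented in the current system. The fifth step, likelihood determination, rates how likely it is that a potential vulnerability will be exercised by a threat.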
The sixth step is to analyze the impact of a successful cyberattack. The impact may include loss of integrity, availability, or confidentiality. The expected loss should be expressed as a monetary value to support future budget allocation. The most common measures are the single loss expectancy (SLE) and the annual loss expectancy (ALE). The SLE is a dollar amount representing the potential loss if a specific threat were to occur. The ALE represents the estimated loss from a specific threat occurring within a fiscal year.
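As a minimal sketch of the sixth step, the two measures can be computed with the formulas commonly used in (ISC)²-style risk analysis: SLE = asset value (AV) × exposure factor (EF), and ALE = SLE × annualized rate of occurrence (ARO). The asset value, exposure factor, and ARO below are hypothetical inputs, not figures from this study.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = asset value (AV) x exposure factor (EF, fraction of the asset lost per event)."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    return asset_value * exposure_factor


def annual_loss_expectancy(sle: float, annual_rate_of_occurrence: float) -> float:
    """ALE = SLE x annualized rate of occurrence (ARO, expected events per year)."""
    return sle * annual_rate_of_occurrence


# Hypothetical example: a $200,000 service loses 25% of its value per DDoS event,
# and such an event is expected twice a year.
sle = single_loss_expectancy(200_000, 0.25)
ale = annual_loss_expectancy(sle, 2)
print(f"SLE = ${sle:,.0f}, ALE = ${ale:,.0f}")  # SLE = $50,000, ALE = $100,000
```

Neither formula contains a variable for how long each event lasts, which is the lack of time granularity that this study targets.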
The seventh step is to determine the risk level. The risk level is commonly expressed on a scale of High, Medium, and Low, which represents the degree of risk to which an IT system, facility, or procedure might be exposed.
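The seventh step is often implemented as a likelihood-by-impact matrix. The sketch below assumes the illustrative weights and thresholds of the risk-level matrix example in NIST SP 800-30 (likelihood 1.0, 0.5, 0.1; impact 100, 50, 10; High above 50, Medium above 10, Low otherwise); an organization may choose different values.

```python
# Illustrative weights modeled on the NIST SP 800-30 risk-level matrix example.
LIKELIHOOD = {"High": 1.0, "Medium": 0.5, "Low": 0.1}
IMPACT = {"High": 100, "Medium": 50, "Low": 10}


def risk_level(likelihood: str, impact: str) -> str:
    """Map a (threat likelihood, impact) pair to a High/Medium/Low risk level."""
    score = LIKELIHOOD[likelihood] * IMPACT[impact]
    if score > 50:
        return "High"
    if score > 10:
        return "Medium"
    return "Low"


print(risk_level("High", "High"))    # High   (score 100)
print(risk_level("Medium", "High"))  # Medium (score 50)
print(risk_level("Low", "High"))     # Low    (score 10)
```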
The eighth step is to recommend controls that mitigate or eliminate the identified risk. However, a security control must be cost-effective given the business's budget constraints. This means that a cost/benefit analysis must show that the benefit of a recommended control strategy outweighs its cost. If the cost of the control strategy exceeds its benefit, it is better to do nothing and accept the risk.
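The cost/benefit test in the eighth step is commonly expressed as: value of a control = ALE before the control minus ALE after the control minus the annual cost of the control. A positive value favours implementing the control; otherwise, the risk is accepted. The sketch below assumes this common formulation, and all dollar figures are hypothetical.

```python
def control_value(ale_before: float, ale_after: float, annual_cost: float) -> float:
    """Net annual benefit of a security control (positive means it pays for itself)."""
    return ale_before - ale_after - annual_cost


def recommend(ale_before: float, ale_after: float, annual_cost: float) -> str:
    """Return the eighth-step decision for a candidate control."""
    if control_value(ale_before, ale_after, annual_cost) > 0:
        return "implement control"
    return "accept risk"


# Hypothetical figures: a DDoS mitigation service cuts ALE from $100,000 to $20,000.
print(recommend(100_000, 20_000, 50_000))  # implement control (benefit 80,000 > cost 50,000)
print(recommend(100_000, 20_000, 90_000))  # accept risk (benefit 80,000 < cost 90,000)
```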
The ninth step is to document the analysis results and send them to the senior officer for a decision.
This study focuses on the problem of quantitative risk analysis in the sixth step, because ALE lacks the time granularity needed to evaluate the cost of attacks of different scales in IoT or large-scale network structures. More specifically, first, buyers of security defense solutions do not seek a permanent solution, since the life cycle of their services will not last more than one to three years. They only care which scale of DDoS attack they should respond to, given the Recovery Time Objective (RTO) regulated by the government. Second, ALE only considers, for example, the loss incurred when a server is attacked within a fiscal year. However, an IoT or large-scale network structure is more complex than a single server. Therefore, a time-based quantitative risk analysis is required to complement the current ALE methodology.
This study attempts to statistically determine the recovery time under DDoS cyberattacks of different scales through end-to-end functionality (Input: network status; Output: system recovery time) without any manual intervention.
The problem investigated in this study is formally described as follows. First, because network traffic changes continuously, the network status $NS_t$ of a network containing $n$ nodes at time $t$ can be represented as:
The recovery time $RT$ consists of the time $T_d$ to detect a DDoS cyberattack and the system recovery time $T_s$ once the system is under attack:

$$RT = T_d + T_s$$
The detection time $T_d$ is the time at which the IDS detects a DDoS cyberattack with a statistical accuracy $Pr_{ddos}$, where $Pr_{ddos} > 90\%$. $T_s$ is the time needed to recover the system according to the standard operating procedure defined by the system administrator. Note that $T_s$ is set to a fixed value in this study to simplify the problem.
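A minimal sketch of how $RT$ could be read off a learning run, assuming detection accuracy is logged at fixed intervals, $T_d$ is taken as the first logged time at which accuracy exceeds the 90% threshold, and $T_s$ is a fixed constant. The accuracy log, the 10 s interval, and the 30 s value of $T_s$ are hypothetical.

```python
from typing import Optional, Sequence


def detection_time(times_s: Sequence[float], accuracies: Sequence[float],
                   threshold: float = 0.9) -> Optional[float]:
    """Return T_d: the first logged time whose detection accuracy exceeds the threshold."""
    for t, acc in zip(times_s, accuracies):
        if acc > threshold:
            return t
    return None  # the threshold was never reached in this run


def recovery_time(t_d: float, t_s: float) -> float:
    """RT = T_d + T_s, with T_s held fixed as in this study."""
    return t_d + t_s


# Hypothetical log: detection accuracy sampled every 10 s during learning.
times = [10, 20, 30, 40, 50, 60]
accs = [0.42, 0.61, 0.78, 0.88, 0.93, 0.95]
t_d = detection_time(times, accs)
rt = recovery_time(t_d, t_s=30)
print(f"T_d = {t_d} s, RT = {rt} s")  # T_d = 50 s, RT = 80 s
```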
Thus, the problem is formally stated as follows:
This study proposes an AI Driven Evaluation framework (ADE) to objectively and quantitatively estimate the system recovery time under different DDoS cyberattacks, as shown in Fig. 3. ADE can self-learn and self-organize without manual supervision to emulate real DDoS behaviours and statistically evaluate DDoS cyberattacks without any manual feature extraction or intervention. Because ADE is based on convolutional neural networks, it needs no pre-processing or handcrafted feature extraction based on heuristic rules. Therefore, ADE can reduce the labour cost of inefficient and subjective manual DDoS defense. The network manager simply evaluates a DDoS cyberattack by using DDoS detection as the control variable, DDoS patterns and network topology as the independent variables, and the AI learning time to detect a DDoS event as the dependent variable.

AI Driven Evaluation framework.
The architecture of ADE shown in Fig. 3 consists of several layers, including an input layer, a convolutional layer, a subsampling layer, and an output layer. The input layer is the first layer and receives the input to ADE. The convolutional layer performs a convolution operation with filters on the network status screenshot: each filter is convolved across the input data, computing the dot product between the entries of the filter and the input to produce a new feature map. The subsampling layer performs downsampling to lower the computational complexity. The output layer presents the probability of each DDoS cyberattack action. The number and type of layers can be adjusted according to the complexity of the network status. A deeper network generally achieves better performance, but its training and computation times are also higher, so the network administrator has to weigh the trade-off between performance and computation cost.
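The convolution and subsampling operations described above can be sketched in plain Python. As in common CNN libraries, the "convolution" here is computed as a sliding dot product without flipping the filter (cross-correlation), and subsampling is shown as 2 × 2 max pooling. The choice of max pooling, the 5 × 5 toy input, and the 2 × 2 filter are assumptions for illustration, not ADE's actual settings.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding) and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Downsample by keeping the maximum of each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


image = [[5 * r + c for c in range(5)] for r in range(5)]  # toy 5 x 5 "screenshot"
kernel = [[1, 0], [0, 1]]                                   # toy 2 x 2 filter
feature_map = conv2d_valid(image, kernel)                   # 4 x 4 feature map
pooled = max_pool2d(feature_map)                            # 2 x 2 after subsampling
print(pooled)  # [[18, 22], [38, 42]]
```

Each subsampling stage shrinks the feature map by the pooling factor, which is where the reduction in computational complexity comes from.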
Figure 3 also shows the 10-step workflow of ADE. First, ADE captures a screenshot of each network status as input to the input layer. Second, the input layer combines several consecutive network status screenshots into one sample representing a combined DDoS cyberattack. Third, ADE normalizes this sample into a standard input for the convolution operation in the 4th step and the subsampling in the 5th step. The 6th step flattens the sample into a 1D array as the input of the 7th step, Adaptive Moment Estimation (Adam). Adam is a gradient-based method for optimizing stochastic objective functions; here it is used to train the network that estimates the probability that the current network is under DDoS attack [10]. In the 8th step, ADE selects the action with the highest probability for the current network status, and in the 9th step it executes the corresponding action. Finally, in the 10th step, the reward or penalty of the selected action is fed back to ADE.
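The 7th step relies on Adaptive Moment Estimation. A minimal sketch of the standard Adam update for a single scalar parameter follows, with the usual default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8). The toy objective (x − 3)² stands in for the network's training loss and is purely illustrative; the learning rate and step count are assumptions.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], x0: float, lr: float = 0.05,
                  steps: int = 2000, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Minimize a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)          # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy loss (x - 3)^2 with gradient 2(x - 3); Adam drives x toward 3.
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_star, 3))
```

In a CNN the same update is applied element-wise to every weight, using the gradient of the classification loss instead of this toy gradient.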
The detailed environment settings and experimental results are presented below.
Simulation environment and scenario setting
The topology of this experiment contains 65 routers and various devices, including bots, as shown in Fig. 4. The red area in the graph indicates devices that shall be protected according to the risk management policy. The yellow area is the decoy area, which is used to detect and defend against DDoS cyberattacks. The white area, called the risk field, is where ADE can monitor but cannot control the DDoS traffic. The DDoS bots are dispersed outside the white area, and each bot generates 100 kbps of DDoS traffic. In each DDoS scenario, traffic is randomly generated from these bots and flows to the decoy and core areas to paralyze the service.

Experiment Topology.
The ADE architecture designed for this experiment is presented in Fig. 5, and the detailed settings of each layer are given in Table 1. In this case study, ADE first captures a screenshot of each network status as input and uses 4 consecutive screenshots as one sample of a combined cyberattack action. The standard input is a 3D array of size 80 × 80 × 4 for the following steps. In this experiment, ADE adopts two convolution-and-subsampling stages for feature extraction.
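The input stage described above can be sketched as follows: the four most recent screenshots are stacked into one 80 × 80 × 4 sample and normalized. The sketch assumes each screenshot has already been converted to an 80 × 80 grayscale image with 8-bit pixel values (0 to 255) and that normalization means scaling to [0, 1]; the class name FrameStacker is hypothetical.

```python
from collections import deque
from typing import Deque, List

Frame = List[List[int]]             # one screenshot, assumed 80 x 80 grayscale, values 0..255
Sample = List[List[List[float]]]    # indexed as sample[y][x][k]

HEIGHT, WIDTH, STACK = 80, 80, 4


class FrameStacker:
    """Hypothetical helper: keeps the latest STACK screenshots and builds one sample."""

    def __init__(self) -> None:
        self.frames: Deque[Frame] = deque(maxlen=STACK)  # oldest frame drops out automatically

    def push(self, frame: Frame) -> None:
        if len(frame) != HEIGHT or any(len(row) != WIDTH for row in frame):
            raise ValueError("expected an 80 x 80 screenshot")
        self.frames.append(frame)

    def sample(self) -> Sample:
        """Stack the 4 most recent screenshots into 80 x 80 x 4, normalized to [0, 1]."""
        if len(self.frames) < STACK:
            raise RuntimeError("need 4 consecutive screenshots")
        return [
            [[self.frames[k][y][x] / 255.0 for k in range(STACK)] for x in range(WIDTH)]
            for y in range(HEIGHT)
        ]


stacker = FrameStacker()
for level in (0, 51, 102, 153, 255):  # five synthetic uniform screenshots
    stacker.push([[level] * WIDTH for _ in range(HEIGHT)])
sample = stacker.sample()             # built from the last four screenshots only
shape = (len(sample), len(sample[0]), len(sample[0][0]))
print(shape)  # (80, 80, 4)
```

Using a bounded deque means each new screenshot yields a fresh sample that overlaps the previous one by three frames, so consecutive samples preserve short-term traffic dynamics.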

The flow chart of ADE in this experiment.
The settings of ADE in this experiment
To evaluate the effect of DDoS cyberattacks of different scales, four scales are emulated in a random fashion: 0.4 Tbps, 0.8 Tbps, 1.2 Tbps, and 1.6 Tbps. The DDoS traffic is labeled so that ADE can learn to identify it.
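The link between attack scale and the number of bots follows from the experiment setup. Assuming every bot contributes the stated 100 kbps and all attack traffic comes from bots, the number of bots needed for each emulated scale is the scale divided by 100 kbps. Integer bit rates are used to keep the arithmetic exact.

```python
BOT_RATE_BPS = 100_000  # 100 kbps per bot, as stated in the experiment setup

# Emulated attack scales in bits per second (0.4, 0.8, 1.2, 1.6 Tbps).
SCALES_BPS = {"0.4 Tbps": 400 * 10**9, "0.8 Tbps": 800 * 10**9,
              "1.2 Tbps": 1200 * 10**9, "1.6 Tbps": 1600 * 10**9}

bots_needed = {name: bps // BOT_RATE_BPS for name, bps in SCALES_BPS.items()}
for name, bots in bots_needed.items():
    print(f"{name}: {bots:,} bots")  # first line: 0.4 Tbps: 4,000,000 bots
```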
Figure 6 shows the effects of DDoS attacks of different scales on system recovery time. The 0.8 Tbps DDoS attack had the quickest recovery time (100 s). Figure 6 also shows that recovery time increased as the scale of the DDoS attack increased. The reason is that a larger attack involves more bots, so ADE required more time to track and detect the malicious traffic.

Effects of Different Scales of DDoS Attack.
However, the detection and recovery times of the 1.2 Tbps and 1.6 Tbps DDoS attacks are similar, at 140 s. This is because the network topology that ADE can monitor is limited to the risk field of 65 routers, so DDoS traffic originating outside the risk field aggregates within it. Since the aggregated DDoS traffic may exceed the network capacity of the risk field, the detection and recovery times were the same regardless of the number of bots and the scale of the DDoS attack.
Based on the above analysis, the risk assessment results of ADE can be summarized as follows. Given a risk field of 65 routers, the Recovery Time Objective (RTO) should be set to at least 140 seconds, and the scale of the most suitable DDoS defense solution should be no more than 1.2 Tbps.
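The reasoning behind these two recommendations can be made explicit with a small helper. It uses only the recovery times reported above (100 s at 0.8 Tbps and 140 s at both 1.2 and 1.6 Tbps; the 0.4 Tbps value is not stated in the text and is omitted). The minimum RTO is taken as the largest observed recovery time, and the saturation scale as the smallest scale beyond which recovery time stops increasing; both rules are one reading of the analysis, not a procedure defined by ADE, and the function name assess is hypothetical.

```python
from typing import Dict, Optional, Tuple


def assess(recovery_s_by_gbps: Dict[int, float]) -> Tuple[float, Optional[int]]:
    """Return (minimum RTO in seconds, saturation scale in Gbps or None)."""
    scales = sorted(recovery_s_by_gbps)
    min_rto = max(recovery_s_by_gbps.values())
    saturation = None
    for lower, higher in zip(scales, scales[1:]):
        # Recovery time no longer grows: the monitored risk field is saturated.
        if recovery_s_by_gbps[higher] <= recovery_s_by_gbps[lower]:
            saturation = lower
            break
    return min_rto, saturation


# Recovery times reported for Fig. 6, keyed by attack scale in Gbps.
reported = {800: 100, 1200: 140, 1600: 140}
rto, saturation_gbps = assess(reported)
print(f"RTO >= {rto} s; saturation at {saturation_gbps / 1000} Tbps")
# RTO >= 140 s; saturation at 1.2 Tbps
```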
ADE can also be applied as an Intrusion-Detection System (IDS). To evaluate its effectiveness in detecting DDoS cyberattacks, we first train ADE on a 0.8 Tbps DDoS cyberattack and then test it on a different, randomly generated 0.8 Tbps DDoS cyberattack.
Figure 7 shows that ADE can detect a DDoS attack with 90% accuracy after sufficient training. This indicates that ADE can automatically learn the rules for detecting DDoS attacks without manual operation.

Results of training and testing of ADE as an IDS.
As demonstrated above, ADE uses AI learning time as an indicator to measure the effects of different DDoS attacks without relying on manual methods. Business owners or network managers can evaluate their information assurance by observing the AI learning time. They only need to provide the topology and network flows to ADE, and ADE then provides the corresponding learning time for DDoS attacks of different scales, as shown in Figure 8. For example, to evaluate the improvement brought by a new security method, the network manager simply compares the learning times of the new method and the current one.
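The comparison described above can be sketched as a per-scale relative reduction in learning time. The function name improvement and all learning times below are hypothetical placeholders, not measurements from this study.

```python
from typing import Dict


def improvement(current_s: Dict[str, float], new_s: Dict[str, float]) -> Dict[str, float]:
    """Relative reduction in learning time per DDoS scale (positive = new method is faster)."""
    return {
        scale: (current_s[scale] - new_s[scale]) / current_s[scale]
        for scale in current_s
        if scale in new_s
    }


# Hypothetical learning times (seconds) for the current and a candidate defense method.
current = {"0.8 Tbps": 120.0, "1.2 Tbps": 150.0}
candidate = {"0.8 Tbps": 96.0, "1.2 Tbps": 135.0}
gains = improvement(current, candidate)
for scale, gain in gains.items():
    print(f"{scale}: {gain:.0%} faster")  # 0.8 Tbps: 20% faster, 1.2 Tbps: 10% faster
```

Reporting the reduction per scale, rather than a single average, keeps visible whether a new method only helps below the saturation point of the risk field.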

The usage flowchart of ADE.
The proposed ADE can objectively and quantitatively evaluate the impact of DDoS cyberattacks of different scales. ADE provides quantitative security risk analysis by using DDoS detection as the control variable, DDoS patterns and network topology as the independent variables, and the AI learning time to detect a DDoS event as the dependent variable. The learning time to detect a DDoS event and recover the system is then used to evaluate the scale of the DDoS attack, the reasonability of the regulated RTO, the vulnerability of the current network topology, and the improvement brought by a new security solution. The experimental results demonstrate that the contributions of ADE are (1) providing an objective and quantitative security risk assessment indicator and (2) providing an autonomic DDoS defense framework without any manual intervention, which allows cloud computing and IoT companies to focus on their services and leave security defense to ADE.
Currently, ADE is only applied to evaluate (1) the scale of a DDoS attack, (2) the reasonability of the regulated RTO, and (3) the vulnerability of the current network topology and the improvement brought by a new security solution. In future work, we will extend ADE to load balancing in ISP-level networks [17, 18].
Funding
The author would like to thank the Ministry of Science and Technology of the R.O.C. for financially supporting this research under Contract No. MOST 108-2221-E-227-002-, MOST 109-2221-E-227-001-, and MOST 109-2218-E-011-007-.
