Abstract
Advanced Persistent Threat (APT) is a dangerous network attack method that is widely used by attackers today. During an APT attack, attackers often use advanced techniques and tools, which causes many difficulties for information security systems. In practice, to detect APT attacks, intrusion detection systems cannot rely on a single technique or method and must often combine several. In addition, APT attack detection based on behavior analysis and evaluation faces many difficulties due to the lack of characteristic data on attack campaigns. For these reasons, in this paper we propose a method for APT attack detection based on multi-layer analysis. The multi-layer analysis technique in our proposal computes and analyzes various events in Network Traffic to detect and synthesize abnormal signs and behaviors in order to draw conclusions about the existence of APT in the system. Specifically, our proposal uses three main layers in series for the APT attack detection process: i) detecting APT attacks by analyzing abnormal connections; ii) detecting APT attacks by analyzing and evaluating the Suricata log; iii) detecting APT attacks by analyzing behavior profiles that are compiled from layers (i) and (ii). To achieve these goals, the multi-layer analysis technique performs two main tasks: i) analyzing and evaluating components of Network Traffic based on abnormal signs and behaviors; ii) building and classifying a behavior profile based on each component of Network Traffic. In the experimental section, we compare and evaluate the effectiveness of the APT attack detection process of each layer in the multi-layer analysis model using machine learning. Experimental results show that the APT attack detection method based on analyzing behavior profiles yields better results than the individual detection methods on all metrics.
The research results presented in this paper not only demonstrate the effectiveness of the multi-layer analysis model for APT attack detection but also provide a novel approach for detecting several other cyber-attack techniques.
Keywords
Introduction
Problem
The APT attack is a dangerous network attack technique that creates challenges for security systems. Eric [1] defined the terms “Advanced”, “Persistent”, and “Threat” that make up the name of this attack method: an APT is an advanced, persistent attack with clear purposes and objectives. This attack method is often used against important government agencies, organizations, and enterprises. The life cycle of an APT attack often includes spying, attacking, escalating privileges, stealing information, and clearing traces [1, 2]. In [2, 3], the authors listed a number of methods and techniques that attackers often use during the life cycle of APT attacks, including malware, spear-phishing, exploitation of known vulnerabilities, zero-day vulnerabilities, etc.
The publications [2–4] presented the differences between APT attacks and other attack techniques. The differences in how the attack is organized, how data is stolen, and how traces are cleared make APT attacks much more difficult to detect than other network attacks.
The problem of detecting APT attacks
The two main methods of detecting APT attacks that are commonly studied and applied are the signature-based method, which relies on a ruleset, and the anomaly-based method, which relies on behavior analysis to find anomalies [4]. Many researchers now propose methods of detecting APT attacks based on behavior analysis techniques using machine learning or deep learning. However, we observe that these approaches have two problems [2, 50]:
Selecting and extracting features: To detect APT attacks, research directions often focus on analyzing and finding features based on collected datasets [40, 50]. However, in the studies [29, 72], Cho et al. pointed out that features proposed and extracted from datasets of a few APT attack campaigns can give good results in experiments but are not highly effective when applied to real monitoring data. Moreover, research [1] pointed out that APT is an attack technique designed for specific objects and targets. For each target, attackers use different methods and tools to attack and steal data. Therefore, features selected from only a few APT attack campaigns cannot express all the abnormal features of APT.
Lack of data correlation: The reports [1, 2] pointed out that, in order to effectively detect APT attacks, it is necessary to focus on detecting their signs and behaviors in each stage of the life cycle of the attack campaign. In addition, because attackers often use advanced technology to steal information and remove traces, APT behavior differs little from normal behavior. Therefore, it is necessary to find ways to calculate the correlation between behaviors in the system and link them in order to find anomalies in the data.
Our proposal
From the problems presented in section 1.2, in this paper we propose a new approach to APT attack detection using a multi-layer analysis technique based on Network Traffic. The multi-layer analysis method in our proposal is the parallel combination of 2 detection methods with 3 different layers:
Layer 1: Detecting the APT attack using the signature set. This is a simple detection technique because it only uses the signature database of APT attacks that have actually occurred. However, the Network Traffic components extracted in this layer are used as the main input for layer 3.
Layer 2: Detecting the APT attack based on flows. In this layer, the Network Traffic data is split into network flows, and features are then extracted from the flows and analyzed to find signs of APT attacks. The result of layer 2 can be used as a conclusion about an APT attack at this level and is also used as an input for the prediction in the next layer (layer 3).
Layer 3: Detecting the APT attack based on behavioral profile analysis. Based on the outputs of layer 1 and layer 2, a set of features is built that serves as a behavioral profile of the Network Traffic. A machine learning technique is then applied to classify the profiles and determine whether an APT attack exists in the Network Traffic. To perform this task, layer 3 consists of the following phases:
Phase 1: Analyzing and extracting features from Network Traffic components, which are the outputs of layer 1. Based on Network Traffic components such as Domain Name System (DNS), HyperText Transfer Protocol (HTTP), and Transport Layer Security (TLS), we analyze and extract the behaviors of Network Traffic. Previous studies on APT attack detection often tried to find and extract the typical behaviors of APT attacks from Network Traffic. However, these approaches require collecting a large dataset over a long time, which leads to difficulties in data storage and management. Therefore, in this paper, we improve on the old approaches by splitting the Network Traffic into components and then processing and extracting features based on these components. In this approach, we combine the behaviors of different events to draw a conclusion about the existence of APT attacks.
Phase 2: Building the behavioral profiles. Based on the behaviors and features of Network Traffic components extracted in phase 1 and in layer 2, the APT attack detection system synthesizes these signs and behaviors to build a behavioral profile for each event in Network Traffic.
Phase 3: Classifying the behavioral profiles. After the behavioral profiles of each event have been collected and built in phase 2, the APT attack detection system classifies them. The classification result determines whether a behavioral profile is similar to the behavioral profile of an APT attack campaign.
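The profile-building step of layer 3 can be illustrated with a minimal sketch. It assumes that a profile is keyed by an IP address and that the layer 1 components have already been reduced to numeric features; the feature names (`dns.num_queries`, `http.num_requests`, `tls.num_sessions`, `flow.malicious_ratio`) and the function `build_profile` are hypothetical and are not taken from the paper.

```python
def build_profile(ip, component_features, malicious_flow_ratio, feature_order):
    """Merge layer 1 component features and the layer 2 score into one vector.

    component_features: e.g. {"dns": {"num_queries": 12}, "http": {...}}
    malicious_flow_ratio: layer 2 output for this IP, a value in [0, 1]
    feature_order: fixed list of feature names (hypothetical naming scheme)
    """
    merged = {}
    for component, features in component_features.items():
        for name, value in features.items():
            merged[f"{component}.{name}"] = float(value)
    merged["flow.malicious_ratio"] = float(malicious_flow_ratio)
    # Features that were not observed default to 0.0, so that every
    # profile has the same length and can be fed to a classifier.
    return {"ip": ip, "vector": [merged.get(name, 0.0) for name in feature_order]}


order = ["dns.num_queries", "http.num_requests", "tls.num_sessions",
         "flow.malicious_ratio"]
profile = build_profile(
    "10.0.0.5",
    {"dns": {"num_queries": 12}, "http": {"num_requests": 3}},
    0.4,
    order,
)
print(profile["vector"])
```

The resulting fixed-length vectors are the kind of input that a supervised classifier could label in phase 3; the classifier itself is omitted here.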
Thus, our multi-layer analysis method for APT attack detection is a combination of the rule-based detection method using the Suricata tool and the anomaly detection method using machine learning. The parallel combination of the two detection methods and the division into different phases make our proposal capable of real-time detection and monitoring. In addition, our method addresses the two problems identified above as follows:
Regarding the problem of “Selecting and extracting features”: Instead of focusing on extracting the typical features and behaviors of APT attacks, as traditional studies do, we focus on analyzing and evaluating the components of network traffic and then extracting abnormal behaviors from these components. In this way, we collect as many different signs and behaviors as possible as the basis for assessing the behaviors of APT attacks.
Regarding the problem of “Lack of data correlation”: In this proposal, we use several layers to analyze and evaluate the data components in Network Traffic and then synthesize abnormal signs and behaviors into behavior profiles. Each behavior profile contains values that represent the correlation and influence of different components in network traffic. These behavior profiles give the APT attack detection system a broader basis for calculating, evaluating, and classifying APT attacks.
Contributions
The contributions of this paper are as follows:
Proposing an APT attack detection method based on multi-layer analysis. This is a novel model for APT attack detection that has not been studied or proposed before. In our model, we combine APT detection techniques and methods based on abnormal signs and behaviors in order to detect as many APT attacks in the system as possible. The structure of the model is presented in section 3.2 of the paper. In the proposed model, detection is performed by multiple layers that build on one another: the detection results of one layer serve as the input of the next. Together they form a complete, correlated system capable of quickly and accurately detecting signs and behaviors of APT attacks.
Proposing a new method for APT attack detection based on behavior profiles using machine learning algorithms. We propose a method to build APT attack behavior profiles by analyzing and synthesizing different components of network traffic. In particular, in this paper, the behavior profiles of APT attacks are built by aggregating, calculating, and extracting information from layers 1 and 2. Our proposal not only partly solves the problem of the lack of data about APT attack behavior but also captures many abnormal behaviors of different components of network traffic. This gives the detection system a basis to calculate the correlation and effects of abnormal behaviors in order to improve the efficiency of the detection process. The experimental results in this paper show that the APT attack detection method based on analyzing behavior profiles is more effective than the individual detection methods.
Related work
Detection of APT attacks based on Network Traffic component analysis
APT attack detection based on abnormal behaviors of DNS
In [7, 8], the authors listed the feature sets which can be used for malicious DNS detection. The selected feature sets include lexical, link popularity, webpage content, DNS answers, DNS fluxiness, Network features, etc. Weina Niu et al. [9] introduced a method for APT attack detection based on Mobile DNS Logging using 4 sets of features, which are DNS request, answer-based features; Domain-based features, Time-based features and Whois-based features. With the selected feature sets, the authors used a number of machine learning algorithms such as Global Abnormal Forest, k-Nearest Neighbor (KNN) to detect APT Malware. G. Zhao et al. [10] used 5 main groups of features (Domain-based features, Time-based features, whois-based features, DNS answer-based features; Active probing features) and used J48 Decision Tree algorithm to detect APT malware command and control domains (C&C Domain). In [11], the authors used 3 main groups of features to detect the domain APT, which are Domain name lexical features; Ranking features; DNS query features and Random Forest (RF) algorithm.
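To make the notion of domain name lexical features more concrete, the following minimal sketch computes a few commonly used ones. The particular features (length, number of labels, digit ratio, character entropy) and the function name `lexical_features` are illustrative assumptions; they are not the exact feature sets of the cited studies.

```python
import math
from collections import Counter


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features of a domain name."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    chars = name.replace(".", "")
    total = len(chars)
    counts = Counter(chars)
    # Shannon entropy of the character distribution, in bits per character.
    entropy = 0.0
    if total:
        entropy = -sum((c / total) * math.log2(c / total)
                       for c in counts.values())
    return {
        "length": len(name),
        "num_labels": len(labels),
        "digit_ratio": sum(ch.isdigit() for ch in chars) / total if total else 0.0,
        "entropy": entropy,
    }


print(lexical_features("mail.example.com"))
```

Algorithmically generated domains tend to have higher entropy and digit ratios than human-chosen names, which is why such features are often passed to a classifier together with DNS query and ranking features.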
Besides, Yan et al. [30] proposed the method of using the Convolutional Neural Networks (CNN) deep learning algorithm to detect APT attacks based on DNS Activities. Accordingly, the authors extracted two main groups of features: Domain Name-based Features; Feature of the Relationship between DNS Request Behavior and Response Behavior on a dataset of 4,907,147,146 pieces of initial data of 47 days DNS request records of Jilin University Education Network combined with CNN algorithm to detect APT attack behavior.
Zongyuan et al [51] proposed a method to detect APT attacks on mobile devices based on DNS log analysis using the machine learning algorithm. The authors indicate that there is a big difference between the DNS of APT attack malware on mobile devices and computers. Therefore, the authors proposed a process of detecting DNS APT including checking the difference between the DNS of the mobile device and the DNS of the computer; selecting and extracting following features: Total Number of Visits, Number of Accessing Hosts; Domain Length; Solitariness of Access; Repeated Request; Time of Connection; Domain Structure; Access Regularity; and Independent Access. In addition, the data used in the experimental section collected from the university network consisting of 174,011 mobile access points and 49,717 host access points. These access points generated approximately 26 million DNS request records every 24 h. The raw data contained only a few malicious samples. To enhance the reliability of the results, three types of malicious samples were added to the raw data. The samples included 121 samples with APT features, 94 samples with MAPT features, and 107 samples with mixed features.
There are also some other approaches for detecting malicious domains that support APT attack detection: Vinayakumara et al. [41] used deep learning algorithms, and Nguyen [42] proposed using neutrosophic sets.
Ru Zhang et al. [63] proposed an APT attack detection method based on analyzing DNS and TCP traffic. Specifically, the authors proposed two new features: C2Load_fluct (response packet load fluctuation) and Bad_rate (bad packet rate). For DNS, in addition to the C2Load_fluct feature, the authors used 10 previously published features. For TCP, the authors added 4 features: Bad_rate, Upload_numRate, Upload_loadRate, and Port_abnormal. In the experimental section, the authors' proposal achieved F1-scores of 0.98 and 0.94 on 2 datasets, respectively.
APT attack detection based on other components of Network Traffic
G. Zhao et al. [10] used a method based on DNS log and Network Traffic to detect the computers which were attacked by APT in the system. After detecting the APT attack on both DNS log and Network Traffic, the authors used the correlation analysis technique to detect the addresses of computers infected by APT malware. However, in the paper, the authors did not explain the correlation method for determining the addresses of infected computers. Jiazhong Lu et al. [12] proposed a correlation analysis model using machine learning to detect the APT attacks based on Network Traffic. However, the proposed correlation model was mainly based on the calculation and normalization of flow features in Network Traffic. Ru Zhang et al. [13] also used a correlation analysis technique to analyze the correlation of events in the life cycle of APT attacks. Event groups detected by IDS then would be classified by using Intrusion Kill Chain and Fuzzy Clustering. However, this method requires a huge volume of data and a long time to collect enough data. Ivo Friedberg et al. [14] used a technique that analyzes the correlation between the signs of the Search-Pattern, Event Classes, Hypothesis and Rules of Network Traffic to detect the signs of APT attacks. In [4], the authors introduced a method to detect the APT attacks based on the correlation analysis of components of flow. In [11, 73], the authors used 3 sets of features to detect the domain APT, which are Domain name lexical features; Ranking features; DNS query features and RF machine learning. To calculate the correlation between malicious domains and APT domains, the authors used Suricata tool to monitor malicious domains. Wim Mees et al. [15] proposed a multi-agent anomaly-based APT detection method using the abnormal signs and behaviors set to detect the suspicious HTTP transactions, suspicious DNS requests.
Sharma et al. [16] proposed a distributed framework architecture for the detection of APT based on multiple classifiers. The classifiers performed the classifications on the events in distributed environments and the event correlation among those events. At the final step, the system applied a voting scheme to provide the result of APT attack prediction. The experiment results showed that the proposed approach achieved greater effectiveness. Wen-Lin Chu et al. [18] proposed a method for APT attack detection, experimented on the NSL-KDD data set, using the Support Vector Machine (SVM) algorithm. The authors also used the principal component analysis to optimize the experimental data set. However, the use of NSL-KDD to detect the APT attack is not appropriate, because this data set is only used for abnormal detection in networks [19]. The APT attack’s characteristics are much different from NSL-KDD [2, 21]. In the study [31], Nkiruka Eke et al. proposed an APT attack detection method based on KDD 99 data set and deep learning algorithms such as Long short term memory (LSTM), Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU). The experimental results show that the deep learning algorithm has higher results than other machine learning algorithms such as SVM, KNN, RF Classifier, and Logistic Regression.
Some other approaches to anomaly detection in Network traffic used for general network attack detection and APT attack include: a network anomaly detection algorithm using the Mahout Classifier proposed by Peng et al. [43]; a clustering algorithm to optimize the network anomaly detection process proposed by Huang [44]; using Convolutional Neural Network deep learning algorithms for anomaly detection based on NSL-KDD dataset proposed by Wang et al. [45].
In study [66], Joloudari et al. proposed the Multi-layered neural network model for APT attack detection based on the NSL-KDD dataset. In the experimental section, the authors compared the Multi-layered neural network model with some machine learning algorithms such as C5.0 Decision Tree, Bayesian network. Experimental results showed that Multi-layered neural network model gave better results than the remaining algorithms. Specifically, Multi-layered neural network achieved the accuracy of 98.85%, and the false positive rate of 1.13%.
Several approaches of detecting APT attacks based on life cycle
Kexin et al. [32] proposed the HERCULE model for APT attack detection based on analysis and evaluation of automatically collected logs. Accordingly, the HERCULE system builds multidimensional weighted graphs by comparing the collected logs with the available ones. In the experimental section, the authors used a graph algorithm for monitoring and detecting 15 known attack campaigns.
Yonghwi et al. [33] proposed a model of detecting anomalies and APT attacks based on the Causality Inference method and results in the monitoring log. Accordingly, the MCI model proposed by the authors gave good results for analysis, evaluation, and detection. The experimental results showed that the generated models can recover causality with 0% false-positives (FP) and false-negatives (FN) for most programs and only 8.3% FP and 5.2% FN in the worst cases.
Ibrahim et al. [47] proposed the APT attack detection method based on multi-layer analysis technique using Hidden Markov Models. In that study, the authors used Hidden Markov Models to analyze and evaluate the correlation between warnings to draw a conclusion about APT attacks. The experimental results in the study show that the accuracy of the detection model is at least 91.80%. In addition, it predicts the next step of the APT campaign with an accuracy of 66.50%, 92.70%, and 100% based on two, three, and four correlated alerts, respectively.
Zimbra [48] proposed a model to detect multi-stage APT attacks based on semi-supervised learning. In this study, the author used data from a large-scale enterprise network consisting of 17,684 hosts from the Los Alamos security lab. The final result is a ranked list of suspicious hosts participating in APT attack activities. The average detection precision over the three APT stages is 90.5%. Lajevardi et al. [49] proposed an approach that uses low-level interception and correlates operating system events with network events based on the semantic relationships defined between the entities in a system ontology.
In the publication [53], Ghafir et al. proposed the MLAPT model for APT detection using machine learning. This model includes 3 main stages: threat detection, alert correlation, and attack prediction. In the experiments, using algorithms such as Decision Tree, KNN, SVM, and Ensemble on a network traffic dataset collected at a university, the authors reported that the accuracy of MLAPT is 84.8%. Another solution for APT attack detection, proposed by Alshamrani [55], aggregates multi-source data to find the abnormal behavior of suspicious users and to select an appropriate countermeasure.
Besides, in the studies [34–36], the authors proposed a model of detecting and tracking APT attacks based on the process of tracking and monitoring different components in the access log. In addition, Jethva et al. [46] proposed a multi-layer analysis model for detecting malware in the network.
Xueyuan Han et al. [56] proposed the UNICORN model for APT attack detection using graph analysis technique with time. Specifically, the authors used graph analysis techniques to analyze and evaluate in order to find out the abnormalities in data collected over a long period of time. These abnormalities are a basis for conclusions about APT attacks. In the experimental section, the authors experimented and evaluated the UNICORN model for APT attack detection based on scenarios of this attack technique.
In the study [58], Bodström et al. proposed an APT attack detection model based on a combination of deep learning networks. In this proposal, the authors used several deep learning layers, each with its own functions and tasks. Specifically, the model proposed by the authors includes 5 layers. At the 4th layer, the authors proposed using deep learning algorithms such as Recurrent Neural Networks, LSTM, Growing Hierarchical Self-Organising Map, Graph-based Neural Network, and Graph Database. However, the authors did not conduct experiments to evaluate the effectiveness of the model or of each algorithm in the proposed model.
Chadza [64] proposed using Hidden Markov Models to detect multi-stage network attacks and APT attacks. In experiments on the DARPA dataset, the authors showed the effectiveness of Hidden Markov Models combined with Snort rules: the combination improved the accuracy by 44.95%. In the study [65], the authors proposed an APT attack detection model based on observations and learning between two players across different stages and layers. In addition, the authors proposed an iterative algorithm to compute the perfect Bayesian Nash equilibrium and used the Tennessee Eastman process in their evaluation.
Bhatt [66] proposed a framework for detecting and preventing APT attacks based on three main layers: Multi-stage Attack Model; Layered Security Architecture; and Security Event Collection and Analysis System, using the Intrusion Kill-Chain and Apache Hadoop. In addition, the proposal [70] presented a method of detecting APT attacks based on analyzing alert correlation and causality. For the alert correlation and causality analysis, the authors proposed 4 layers. For the detection process, the authors used real-time analysis methods. In the experimental section, the authors showed that their proposal performs better than the MLAPT model [53].
APT attack detection based on flow analysis
In paper [3], the authors proposed a method for APT attack detection based on the analysis of abnormal flow behaviors in Network Traffic. This method includes the extraction, normalization, and analysis of abnormal values of three groups of flow indicators: numbytes, numflows, and numdst. Şerif Bahtiyar [5] proposed an APT attack detection method based on the process and life cycle of this attack. In this method, the author used a technique that analyzes the correlation of abnormal flow behaviors in each period of an APT attack campaign. Also in [5], the author proposed the FD-APT model, which uses 5 levels to analyze abnormal flow behaviors in order to detect signs of APT attacks. In addition, Andrew Vance et al. [6] used non-signature-based, flow-based traffic measurements and applied statistical analysis to detect APT attacks.
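As an illustration of the three flow indicators named above, the following minimal sketch aggregates them per source host. The definitions used here are assumptions made for the example (numflows as the number of flows, numbytes as the total bytes sent, numdst as the number of distinct destinations); the exact definitions and the normalization step of [3] are not reproduced.

```python
from collections import defaultdict


def host_flow_stats(flows):
    """Aggregate per-source statistics from (src_ip, dst_ip, nbytes) records."""
    stats = defaultdict(lambda: {"numflows": 0, "numbytes": 0, "dsts": set()})
    for src, dst, nbytes in flows:
        entry = stats[src]
        entry["numflows"] += 1
        entry["numbytes"] += nbytes
        entry["dsts"].add(dst)
    # Replace the set of destinations by its size.
    return {
        src: {
            "numflows": e["numflows"],
            "numbytes": e["numbytes"],
            "numdst": len(e["dsts"]),
        }
        for src, e in stats.items()
    }


flows = [
    ("10.0.0.1", "1.1.1.1", 100),
    ("10.0.0.1", "1.1.1.1", 50),
    ("10.0.0.1", "8.8.8.8", 10),
    ("10.0.0.2", "1.1.1.1", 5),
]
print(host_flow_stats(flows))
```

A host whose values deviate strongly from those of its peers, or from its own history, would then be a candidate for closer inspection.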
Cho et al. [29] proposed an APT attack detection method based on network flows using deep learning. In their research, the authors used deep learning algorithms such as Multi-layer Perceptron (MLP), Graph Convolutional Network (GCN), and a BiLSTM-GCN model that combines Bidirectional Long Short-Term Memory (BiLSTM) with GCN in order to analyze and re-represent information about APT attack IPs based on network flows. The experimental results in the paper show that the combined BiLSTM-GCN deep learning model gives the best results on all metrics. We note that the approach of Cho et al. is sound and reasonable, but it requires a large and cumbersome computational system to implement.
Some related works about APT attack detection technology
In the study [56], Ankang Ju et al. proposed an APT attack detection system for enterprises based on big data technology platforms. The authors' method analyzes, evaluates, and standardizes data and then uses several analysis layers to draw conclusions about attacks in the system. In particular, the research team used the following layers: Sensing, Event, Alert, Context, and Scenario. For the correlation analysis process, the authors used 4 main analysis methods: Event-Event Correlation; Alert-Alert Correlation; Pattern-Knowledge Correlation; Alert-Context Correlation. However, the authors did not evaluate the proposed model on an experimental dataset. Phuong Cao [59] proposed the PULSAR framework for APT attack detection based on a probabilistic graphical model. Specifically, the PULSAR framework makes decisions about signs of APT attacks by collecting statistics of event patterns from past attacks and then aggregating these patterns into graphs. In experiments, the PULSAR framework achieved an accuracy of 91.7% on a dataset consisting of 120 APT attacks that took place over 10 years. In the study [60], Zhang et al. proposed a deep learning model combining a Multiscale Convolutional Neural Network with Long Short-Term Memory (MSCNN-LSTM) to detect network anomalies and APT attacks on the UNSW-NB15 dataset. In particular, the authors analyzed the UNSW-NB15 dataset by integrating spatial-temporal features in order to find traces of APT attacks.
Cosimo [61] proposed an Autoencoder network for detecting cyber-attacks based on the NSL-KDD dataset. To evaluate the efficiency of the Autoencoder network, the author compared it with other studies using algorithms such as LSTM, MLP, linear SVM, quadratic SVM, and Discriminant Analysis with linear and quadratic discrimination functions. Based on experimental results, the author reported that the Autoencoder network gives better results than the other methods. Likewise, Alsaheel et al. [62] proposed the ATLAS model, which combines causality analysis, natural language processing, and machine learning techniques to build a sequence-based model that establishes key patterns of attack and non-attack behaviors from a causal graph. In the experimental section, the ATLAS model achieved an average precision of 91.06%, recall of 97.29%, and F1-score of 93.76% when detecting 10 APT attack campaigns in a virtual environment. In addition, Hana [68] proposed the APTMalInsight framework for APT attack detection based on calculating and evaluating system call information of APT malware. Specifically, the authors focused on comparing the system calls of APT malware and ordinary malware in order to compute and present the context of the APT attack. In the experimental section, APTMalInsight was highly effective for detection and clustering, with accuracy reaching up to 99.28% and 98.85%, respectively. In addition, the MAAC model [69] combines the computation of alarm correlation with multi-layer analysis across the different stages of the attack scenario.
The document [71] surveyed a number of hardware and software products and solutions from security companies for APT attack detection, organized by the phases of this attack. Specifically, according to the statistics of 2020, the APT attack detection solutions of companies such as Symantec, Cisco, Kaspersky, and Eset are used by many organizations. The document [71] also described the properties of each product and assessed the advantages and disadvantages of each solution. We note that most current security solutions try to detect and prevent APT attacks in each phase of the life cycle of the attack campaign.
Our approach
In this paper, we propose to use the RF supervised machine learning algorithm to detect abnormal behavior of APT attacks in layers 2 and 3. The RF algorithm is chosen in this study because it has been shown to be one of the best supervised classification algorithms currently available [2]. In addition, we also use two other classification algorithms and models, namely SVM and MLP. The SVM algorithm is one of the most widely used classification algorithms, and the MLP is one of the earliest deep learning models studied and applied to classification problems. Both have been proposed for classifying and detecting APT attacks in other studies. We include these two additional algorithms (SVM and MLP) because we want to compare their effectiveness with that of the proposed algorithm. In section 3.3, we describe in detail the mathematical basis and operating principles of the SVM, RF, and MLP models.
Multi-layer analysis model for APT attack detection
Figure 1 shows the multi-layer model for APT attack detection based on Network Traffic.

Multi-layer analysis model for APT attack detection.
The main components of our proposed model are as follows (Fig. 1):
Based on the analysis and evaluation presented in the multi-layer analysis model for APT attack detection using Network Traffic in section 3.1, we designed the workflow of APT attack detection system as illustrated in Fig. 2.

Workflow of the multi-layer model for APT attack detection based on Network Traffic.
Figure 2 describes in detail the elements of the APT attack detection model based on multi-layer analysis techniques. In Section 3.2, the components and principles of detecting APT attacks in each layer are discussed in detail.
Suricata is a security tool that provides strong support for detecting network attacks, including APT attacks [22]. In this paper, we use Suricata together with a previously collected APT attack database as the basis for the detection system at layer 1. In this layer, the Network Traffic data is monitored and analyzed by Suricata, which decomposes the Network Traffic into components and compares them with the pre-built signature database. In [11], the authors set up and configured Suricata to monitor Network Traffic for signs of APT attacks, including malicious domains, IPs, and URLs. If an APT attack is detected, the detection system sends an alert to the administrator. Furthermore, the collected Network Traffic components are sent to the next layer to be analyzed for signs of APT attacks. The layer conducting this next analysis is layer 3 in the multi-layer model.
Layer 2 - APT attack detection based on flow analysis
As illustrated in Fig. 2, detecting abnormal connections involves two main phases.
Phase 1: Analyzing and extracting the features of flows. The Network Traffic data is split into flows by IP pair (source and destination IP addresses) using CICFlowMeter, and the flow features are then extracted from these flows. CICFlowMeter extracts 78 features from each flow [23, 29]. The features include FlowID, SourceIP, DestinationIP, SourcePort, DestinationPort, Protocol, Total Packet Length/Max Length/Min Length/Average Length, etc.
Phase 2: Detecting the abnormal flows. Based on the features extracted from the flows and the labels derived from the IP addresses, each flow is classified as malicious or not using machine learning methods. From the flow classification result, the malicious flow ratio is calculated for each IP; if it exceeds a threshold, the IP is flagged as an APT attack IP and the detection system sends an alert to the administrator. The malicious flow ratio is also used as an important feature for the prediction process in layer 3. The machine learning methods tested in this task are RF, SVM, and MLP. Although there are many machine learning algorithms, and each may perform better in specific cases, these three methods have been tested and shown to produce good results in the same field [7–11, 29]. Therefore, we decided to use them in our paper. The details of these methods are explained in Section 3.3.
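The thresholding step of Phase 2 can be illustrated with a minimal sketch. It assumes that the flow classifier has already produced one (IP, is_malicious) pair per flow; the function names, the sample IP addresses, and the choice of attributing each flow to a single IP are illustrative assumptions, and the default threshold of 0.8 is simply the value used later in the experiments.

```python
from collections import defaultdict


def malicious_flow_ratio(flow_predictions):
    """Return {ip: malicious flows / total flows}.

    flow_predictions: iterable of (ip, is_malicious) pairs, one per
    classified flow (hypothetical input format).
    """
    totals = defaultdict(int)
    malicious = defaultdict(int)
    for ip, is_malicious in flow_predictions:
        totals[ip] += 1
        if is_malicious:
            malicious[ip] += 1
    return {ip: malicious[ip] / totals[ip] for ip in totals}


def flag_apt_ips(ratios, threshold=0.8):
    """Return the IPs whose malicious flow ratio exceeds the threshold."""
    return sorted(ip for ip, ratio in ratios.items() if ratio > threshold)


# Hypothetical classifier output: 5 of 6 flows malicious for the first IP,
# 1 of 3 for the second.
predictions = [
    ("10.0.0.5", True), ("10.0.0.5", True), ("10.0.0.5", True),
    ("10.0.0.5", True), ("10.0.0.5", True), ("10.0.0.5", False),
    ("10.0.0.9", False), ("10.0.0.9", True), ("10.0.0.9", False),
]
ratios = malicious_flow_ratio(predictions)
flagged = flag_apt_ips(ratios)
```

The ratio itself, not only the flag, is kept because it is reused as a feature of the behavioral profile in layer 3.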
Layer 3 - APT attack detection based on behavioral profiles analysis technique
As illustrated in Fig. 2, the Suricata log consists of components such as the DNS log, HTTP log, TLS log, File log, and Alert log. These components are processed to extract features.
a) DNS log
The DNS log extracted from the Suricata log contains the DNS query information that Suricata collected from Network Traffic. DNS features are extracted from this log. The typical feature groups are domain name lexical features, ranking features, DNS query features, etc. In this paper, we propose 25 features to detect APT domains based on the DNS log in Suricata [11]. The study [11] presents details about these features and how to extract them.
b) HTTP log
The HTTP log extracted from the Suricata log contains data about HTTP accesses in the network. Some signs of an APT attack can be revealed in the HTTP log data, such as C&C server communication or file download behavior. We used 8 features of this type, including features related to protocol mismatch, strange HTTP user agent, max/min/average number of requests per day, and max/min/average number of failed requests per day. Table 1 below lists the HTTP features that we recommend using.
The list of features of HTTP log in Suricata
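As an illustration of the per-day statistics mentioned above, the following sketch computes the max/min/average number of requests per day from a list of request timestamps. It assumes ISO 8601 timestamp strings whose first ten characters are the calendar date; the function name and the sample timestamps are hypothetical, and days with no requests are not counted.

```python
from collections import Counter


def per_day_stats(timestamps):
    """Return (max, min, average) number of events per calendar day.

    timestamps: ISO 8601 strings such as '2019-07-27T10:00:00'; only the
    'YYYY-MM-DD' prefix is used, so days without events are ignored.
    """
    per_day = Counter(ts[:10] for ts in timestamps)
    counts = list(per_day.values())
    return max(counts), min(counts), sum(counts) / len(counts)


# Hypothetical HTTP request times observed for one IP.
requests = [
    "2019-07-27T10:00:00",
    "2019-07-27T11:00:00",
    "2019-07-27T12:30:00",
    "2019-07-28T09:00:00",
]
stats = per_day_stats(requests)
```

The same helper could be applied to failed requests or to TLS handshakes by passing only the timestamps of those events.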
c) TLS log
The TLS log contains data about the TLS/SSL communication of the HTTPS protocol. Features of this type help to detect unusual HTTPS connections. There are 4 TLS log features: whether TLS is self-signed, max/min/average number of TLS heartbleed malformed records, max/min/average number of TLS handshakes per day, and number of failed TLS handshakes per day. Table 2 below lists the TLS features in the Suricata log that we suggest extracting and using.
The list of features of TLS log in Suricata
d) File log
The File log in the Suricata log records data about the files that are sent or received in the network. The information about each file includes hash, size, format, etc. Three features help to reveal unusual file sending or receiving behavior: whether the file extension is mismatched, whether the file is executable, and whether the file is in a blacklist. Table 3 below lists the file features extracted from the Suricata log.
The list of features of FILE log in Suricata
e) Alert log
The Alert log in Suricata records the warnings that Suricata raised. These warnings indicate abnormalities in Network Traffic. Based on these data, we extracted 27 features from the alert log, covering warning types such as potentially bad traffic, information leak, and denial of service. Details of the Alert feature groups are described in Table 4.
The list of features of Alert log in Suricata
Comments: Based on the features extracted from the above Network Traffic components and the feature obtained from layer 2, we build a feature set containing all of these features for each IP (corresponding to a local or external computer), which can be considered the behavioral profile of that IP (or computer). Table 5 summarizes the features used in layer 3.
Features summary
The final task in this layer is to classify the behavioral profiles using machine learning methods. The methods used here are the same as those used in layer 2 and are explained in Section 3.3.
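The construction of a behavioral profile per IP can be sketched as follows. This is only an illustration under stated assumptions: the feature names are hypothetical placeholders (the actual features are those summarized in Table 5), and filling a missing feature with 0.0 is an assumption, not a rule stated in this paper.

```python
def build_profiles(flow_ratio_by_ip, log_features_by_ip, feature_names):
    """Merge the layer-2 output and the Suricata-log features per IP.

    Each profile is a list: [malicious flow ratio, feature_1, ..., feature_k].
    Missing values default to 0.0 (an illustrative assumption).
    """
    profiles = {}
    for ip in sorted(set(flow_ratio_by_ip) | set(log_features_by_ip)):
        log_features = log_features_by_ip.get(ip, {})
        vector = [flow_ratio_by_ip.get(ip, 0.0)]
        vector += [float(log_features.get(name, 0.0)) for name in feature_names]
        profiles[ip] = vector
    return profiles


# Hypothetical feature names and values.
names = ["http_strange_user_agent", "tls_self_signed", "file_extension_mismatch"]
flow_ratio = {"10.0.0.5": 0.9, "10.0.0.9": 0.1}
log_features = {"10.0.0.5": {"tls_self_signed": 1, "file_extension_mismatch": 1}}
profiles = build_profiles(flow_ratio, log_features, names)
```

Each resulting vector is what the classifier in layer 3 would receive as one sample.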
SVM
SVM is a supervised machine learning method introduced by Vapnik [25]. For simplicity, we consider binary classification first and then extend to the multi-class classification problem. The basic idea of SVM is to construct a boundary that separates the data samples into two parts, corresponding to two classes, so that the distance from the training samples to the boundary is as large as possible [25, 26].
A linear function that discriminates between the two classes has the following form:
$$y(x) = w^{T}\varphi(x) + b$$
where $w \in \mathbb{R}^{m}$ is the weight vector (the normal vector of the hyperplane), $b \in \mathbb{R}$ is the bias, $\varphi(x) \in \mathbb{R}^{m}$ is the feature vector, and $\varphi$ is the mapping function from the input space to the feature space.
Suppose the input data set includes N samples $\{x_1, x_2, \ldots, x_N\}$ with the label vector $\{t_1, \ldots, t_N\}$, in which $t_n \in \{-1, 1\}$.
The SVM approach to this problem is based on the concept of margin. The margin is the distance from the hyperplane to the nearest data point, and the best hyperplane is the one with the maximum margin.
The distance from a data point $x$ to the hyperplane is:
$$\frac{|y(x)|}{\lVert w \rVert}$$
Suppose that the hyperplane divides the training data set into two separate classes, so that $t_n y(x_n) > 0$ for every $n$. Then the distance from $x_n$ to the hyperplane can be rewritten as:
$$\frac{t_n\, y(x_n)}{\lVert w \rVert} = \frac{t_n \left( w^{T}\varphi(x_n) + b \right)}{\lVert w \rVert}$$
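The margin is the distance to the nearest point $x_n$ in the data set, and we want to find the optimal values of $w$ and $b$ by maximizing this distance. This problem can be written as:
$$\underset{w,\,b}{\arg\max} \left\{ \frac{1}{\lVert w \rVert} \min_{n} \left[ t_n \left( w^{T}\varphi(x_n) + b \right) \right] \right\}$$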
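Maximizing $\lVert w \rVert^{-1}$ is equivalent to minimizing $\lVert w \rVert^{2}$. After rescaling $w$ and $b$ so that $t_n \left( w^{T}\varphi(x_n) + b \right) \geq 1$ for every $n$, and introducing Lagrange multipliers, the problem becomes:
$$L(w, b, a) = \frac{1}{2}\lVert w \rVert^{2} - \sum_{n=1}^{N} a_n \left\{ t_n \left( w^{T}\varphi(x_n) + b \right) - 1 \right\}$$
in which $a = (a_1, \ldots, a_N)^{T}$ are the Lagrange multipliers, with $a_n \geq 0$.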
After a number of transformations (setting the derivatives with respect to $w$ and $b$ to zero and substituting the results back), we obtain the following optimization problem over the Lagrange multipliers:
$$\max_{a}\; \tilde{L}(a) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m\, k(x_n, x_m) \quad \text{subject to} \quad a_n \geq 0, \;\; \sum_{n=1}^{N} a_n t_n = 0$$
In the above formula, the kernel function is defined by $k(x_n, x_m) = \varphi(x_n)^{T}\varphi(x_m)$. Note that the points that are not on the margin do not affect the value of the objective function, because for them $a_n = 0$. The remaining data points ($a_n \neq 0$), called the support vectors, are the ones that matter in SVM training. The classification of a new data point depends only on the support vectors.
We can determine the parameter $b$ based on the support vectors. Although $b$ can be found from a single support vector $x_n$, for numerical stability it is calculated by averaging over all the support vectors:
$$b = \frac{1}{N_S} \sum_{n \in S} \left( t_n - \sum_{m \in S} a_m t_m\, k(x_n, x_m) \right)$$
where $S$ is the set of indices of the support vectors and $N_S$ is the total number of support vectors.
In the case of multi-class classification, we can build the classifier from a number of binary classifiers or build k linear functions $y_k(x)$ similar to the above function.
The main advantage of SVM is that it can handle a very large number of features without needing to reduce them to avoid over-fitting. This characteristic is very useful for problems with a large number of dimensions.
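To make the role of the support vectors concrete, the sketch below evaluates a kernel SVM on a toy problem: it computes b by averaging over the support vectors and classifies a new point from the sign of the sum of a_n t_n k(x, x_n) plus b. The two-point data set, the linear kernel, and the multiplier values (worked out by hand for this toy case) are illustrative assumptions; in practice the multipliers come from solving the dual optimization problem.

```python
def linear_kernel(x, z):
    """k(x, z) = x^T z, i.e. the identity feature mapping."""
    return sum(xi * zi for xi, zi in zip(x, z))


def bias_from_support_vectors(support, kernel):
    """Average t_n - sum_m a_m t_m k(x_n, x_m) over all support vectors.

    support: list of (x_n, t_n, a_n) tuples with a_n > 0.
    """
    total = 0.0
    for x_n, t_n, _ in support:
        total += t_n - sum(a_m * t_m * kernel(x_n, x_m)
                           for x_m, t_m, a_m in support)
    return total / len(support)


def decision(x, support, b, kernel):
    """y(x) = sum_n a_n t_n k(x, x_n) + b; the sign gives the class."""
    return sum(a_n * t_n * kernel(x, x_n) for x_n, t_n, a_n in support) + b


# Toy problem: one support vector per class, multipliers a_n = 0.25
# (hand-derived for these two points; they satisfy sum_n a_n t_n = 0).
support = [((1.0, 1.0), +1, 0.25), ((-1.0, -1.0), -1, 0.25)]
b = bias_from_support_vectors(support, linear_kernel)
score_pos = decision((2.0, 0.0), support, b, linear_kernel)
score_neg = decision((-3.0, 1.0), support, b, linear_kernel)
```

Only the support vectors appear in `decision`, which mirrors the remark that points with a_n = 0 do not influence the classification of new data.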
RF
RF is an ensemble learning algorithm that uses subsets of the data and subsets of the features to build decision trees. RF builds multiple trees and combines them to produce a final result with better accuracy.
RF is derived from the tree bagging algorithm, which builds trees on random subsets of the data, and extends it by also using random subsets of the features [27].
For example, suppose we have a training data set of N sample vectors $\{x_1, x_2, \ldots, x_N\}$ with the corresponding labels $\{t_1, \ldots, t_N\}$. The tree bagging algorithm repeatedly builds a tree from samples randomly selected from the training data set. After the trees have been built, the prediction for a new data sample is calculated as the average of the predictions of the built trees (or by a voting mechanism) [26, 27]:
$$\hat{f}(x') = \frac{1}{B}\sum_{b=1}^{B} f_b(x')$$
in which $B$ is the number of repetitions, $f_b$ is the tree built at step $b$, and $x'$ is the new data sample.
RF improves the above algorithm as follows: each time a tree is built on a random subset of the data, RF also selects a random subset of features from the original feature set. The number of selected features is a tunable parameter. A typical choice for a classification problem with p original features is √p, and for a regression problem it is p/3.
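The two sampling steps and the voting mechanism can be sketched in a few lines. This is a simplified illustration, not the implementation used in the experiments: each "tree" is a one-level decision stump, the feature subset is drawn once per tree as described above (common implementations re-draw it at every split), and the toy data set is hypothetical.

```python
import math
import random
from collections import Counter


def fit_stump(X, t, features):
    """Pick the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for j in features:
        for thr in sorted({row[j] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum((left if row[j] <= thr else right) != label
                             for row, label in zip(X, t))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, left, right)
    _, j, thr, left, right = best
    return lambda x: left if x[j] <= thr else right


def fit_forest(X, t, n_trees=50, seed=0):
    """Bagging plus a random subset of about sqrt(p) features per tree."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, round(math.sqrt(p)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feats = rng.sample(range(p), k)             # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [t[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Majority vote over the predictions of all trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]


# Hypothetical, clearly separable data: label 1 = malicious, 0 = normal.
X = [[0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.1, 0.1], [0.1, 0.1, 0.0, 0.0],
     [0.9, 1.0, 0.9, 1.0], [1.0, 0.9, 1.0, 0.9],
     [0.9, 0.9, 1.0, 1.0], [1.0, 1.0, 0.9, 0.9]]
t = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, t, n_trees=25)
```

With p = 4 features, each stump here sees √p = 2 randomly chosen features, and an odd number of trees avoids ties in the vote.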
MLP
The studies [18, 29] presented the idea of applying the MLP algorithm to APT attack detection. In this paper, we use the MLP algorithm to analyze and evaluate the signs and behaviors of the components of Network Traffic. MLP is one of the artificial neural networks most frequently used in feature extraction and classification. Daniel Svozil et al. [37] detailed the architecture of an MLP network built to simulate how neurons work in the human brain. MLP networks usually have 3 or more layers, with 1 input layer, 1 output layer, and one or more hidden layers. Accordingly, the formula for the hidden layer is defined as formula (8) [37].
Where
In addition, Hassan Ramchoun et al. [38] analyzed methods for optimizing the structure and training process of the MLP network. In this paper, we use an MLP model with multiple hidden layers, vary the number of units, and apply regularization techniques such as dropout with different rates to find the best architecture. The output of the MLP network is a two-dimensional vector corresponding to the 2 classes of flow (APT and Normal). This vector is passed through a Softmax layer. In its standard form, the Softmax function [39], referred to as formula (10), is:
$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$
where $z_i$ is the $i$-th component of the output vector $z$.
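As a small illustration of this last step, the sketch below applies the standard Softmax function to a two-dimensional output vector and picks the more probable class. The raw scores and the ordering of the classes (APT first, Normal second) are assumptions made for the example; subtracting the maximum score is a common numerical-stability trick that does not change the result.

```python
import math


def softmax(z):
    """Standard softmax: exp(z_i) / sum_j exp(z_j), computed stably."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw MLP outputs for the classes (APT, Normal).
scores = [2.0, 0.0]
probs = softmax(scores)
label = ("APT", "Normal")[probs.index(max(probs))]
```

The two outputs sum to 1 and can therefore be read as class probabilities for the flow.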
Data
The positive experiment data (attack data) was collected from 29 Network Traffic files in the Malware Capture CTU-13 data set which contains 6 types of malware from the APT attacks, including Andromeda, Colbalt, Cridex, Dridex, Emotet, and Gh0stRAT [28].
The negative experiment data (normal data) was collected from E-Government server of Quang Nam province [17] according to the scientific research project N° KC.01.05/16-20 of the Ministry of Science and Technology of Vietnam. This dataset was collected on July 27, 2019.
Table 6 shows the statistics of the experiment data that we collected and used in this paper:
Statistic of experiment data
Evaluation method
In this paper, we use 5-fold cross-validation to assess the machine learning models. With this method, the data is divided into k subsets (k = 5 in our case). The process is repeated k times; each time, one of the k subsets is used as the test set and the other k-1 subsets are combined to form the training set. The final result is the average over all k runs. This method is generally better than a single train/test split because it reduces bias and variance, as most of the data is used for both training and validation.
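The splitting and averaging procedure can be sketched as follows. This is a minimal illustration with hypothetical function names: it splits sample indices into contiguous folds without shuffling or stratification, whereas a practical setup would normally shuffle (and often stratify) the data first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(n_samples, evaluate, k=5):
    """Average the score returned by evaluate(train, test) over k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(12, k=5))
fold_sizes = [len(test) for _, test in folds]
```

Here `evaluate` stands for any function that trains a model on the training indices and returns a measure such as accuracy on the test indices.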
Classification methods
As mentioned in Section 3.2.3, in this task we used three machine learning methods for the experiments: RF, SVM, and MLP. To obtain the best results from these methods, we ran the experiments with several parameter settings: RF with 10, 50, and 100 trees, and the SVM and MLP algorithms with parameters configured as in [18].
Experiment scenarios
In this paper, we set up experiments based on three scenarios:
Scenario 1: APT attack detection based on flow. In this scenario, we use machine learning methods to classify the flows based on the extracted flow features, as described in Section 3.2.2. From the results, we calculate the malicious flow ratio for each IP; if it exceeds a threshold, we conclude that the IP belongs to an APT attack.
Scenario 2: APT attack detection based on the Network Traffic components collected from the Suricata log. Suricata itself uses a rule-based method to detect APT attacks from its log. In this scenario, however, we apply machine learning methods to the features extracted from the Network Traffic components in the Suricata log (DNS log, TLS log, HTTP log, File log, and Alert log), as described in Section 3.2.3.
Scenario 3: APT attack detection based on the behavioral profiles. In this scenario, we use machine learning methods to classify the behavioral profiles, which combine all the features used in Scenario 1 and Scenario 2, as listed in Table 5.
Installation requirements and classification measures
Installation requirements
Software requirements: Python version 3.6; Spark version 2.3.0; Hadoop version 2.7; Java (JDK) 8; Spark-Suricata version 4.1.0 beta 1; Weka version 3.8; Ubuntu 16.04.4. Hardware requirements: RAM 8GB; CPU Intel Core i5 3.50 GHz.
Classification measures:
The following measures will be used in this paper to evaluate the accuracy of models:
TP - True positive: The number of malicious samples classified correctly.
FN - False negative: The number of malicious samples classified as normal.
TN - True negative: The number of normal samples classified correctly.
FP - False positive: The number of normal samples classified as malicious.
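From these four counts, the measures discussed in the results (Accuracy, Recall, and the false positive rate, FPR) follow from their standard definitions. The sketch below uses those standard definitions; the function name and the sample counts are hypothetical and are not taken from the experiments.

```python
def classification_measures(tp, fn, tn, fp):
    """Standard definitions of Accuracy, Recall and FPR from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # all correct / all samples
        "recall": tp / (tp + fn),                     # detected malicious / all malicious
        "fpr": fp / (fp + tn),                        # false alarms / all normal
    }


# Hypothetical counts, for illustration only.
measures = classification_measures(tp=90, fn=10, tn=880, fp=20)
```

A high Recall means few APT samples are missed, while a low FPR means few normal samples raise false alarms.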
Experiment results for malicious flow prediction
From the flow features, we used machine learning to predict the malicious flows. Next, we set a threshold: if the malicious flow ratio of an IP (number of malicious flows/total number of flows) exceeds this threshold, that IP is considered a malicious IP. Table 7 below shows the experimental results of detecting APT attacks based on network flow using machine learning.
Experiment results for malicious flow classification
The results in Table 7 show that the RF algorithm, which we propose to use, gives better results than the SVM [18] and MLP [18] algorithms on all measures. RF with 50 trees provides the best results: the accuracy of the model is 94.97%, the proportion of malicious flows predicted correctly is 88.21%, and the false alarm rate is 4.0%. With 10 and 100 trees, RF still gives better results than the other two algorithms. For the SVM algorithm with c = 1.0 and gamma = 0.01, the classification results with the Poly and Sigmoid kernels are quite low (accuracy of 92.41% and 91.22%, respectively). With the Linear and RBF kernels, the accuracy increases (93.40% and 93.01%) but is still significantly lower than that of the RF algorithm. The accuracy of the MLP model ranges from 92.76% to 93.14%. We noticed that for the MLP algorithm, the more complex the network architecture (more hidden layers and more nodes per layer), the better the model learns and the more accurate the test results are. However, these results are still lower than those of RF.
However, judged only by the Accuracy and Recall measures, the result of the RF algorithm is not good enough on these datasets. The reason is that the APT attack dataset has many IP pairs that contain either very few flows or a very large number of flows, which leads to an imbalance in the dataset. Next, we use the RF model to assign labels to IPs. With a threshold of 0.8, we obtained the APT IP prediction results presented in Table 8. All of the measures remain good when we convert to APT IP classification based on thresholds. The experimental results in Table 8 show that the RF algorithm classifies IPs well into APT attack IPs and clean IPs, although the dataset is highly imbalanced in the number of IPs (the number of clean IPs is nearly 10 times the number of APT IPs). Given the IP classification results on this dataset, we think that it is possible to combine the RF algorithm with behavioral features of the flow for use in APT attack detection and monitoring systems.
Results for APT IP prediction based on malicious flow ratio with threshold 0.8
Table 9 shows the results of APT attack detection based on analyzing the Suricata log. From Table 9, it can be seen that the MLP [18] and SVM [18] algorithms still give the lowest results on all measures. Specifically, the model has the lowest accuracy when using SVM with the Sigmoid kernel (89.60%). The MLP algorithm generally gives better results than the SVM algorithm but is still significantly lower than the RF algorithm. The RF algorithm with 50 trees still gives the best results. These results are consistent with recent surveys and evaluations of the application of RF algorithms to classification problems. On the other hand, the results in this experiment have better accuracy than those obtained with the flow features. However, the Recall measure is lower, showing that the ability to detect malicious IPs is worse. In contrast, the low FPR means that this model has a lower false alarm rate. This suggests that a combination of the two types of features can produce better results.
Results for APT IP classification based on Suricata log
The experimental results in Table 10 show that the RF algorithm gives better results than the SVM [18] and MLP [18] algorithms on all measures. At the same time, comparing Tables 8–10, it is clear that the classification results when using the behavioral profiles, which combine the flow features with the features of the Network Traffic components in the Suricata log, are better on most measures, including the overall accuracy (Accuracy: 96.70%) and the ability to detect the APT IP (Recall: 89.29%). The false alarm rate is at an acceptable level (FPR: 2.17%).
Results for APT IP classification based on the behavioral profiles
The experiment results with three algorithms and three scenarios showed that our approach for the problem of APT attack detection is correct and appropriate for the data set. Our proposed multi-layer model, in which each layer can produce the prediction result and also provide input for the final layer, can provide a reliable solution for the APT attack detection problem.
From the experiment results, we noticed that using a separate solution at each layer (based on flow or based on the Suricata log) has both advantages and disadvantages. The solution based on flow has advantages in the ability to detect the APT IP (Recall measure) but has disadvantages on false alarm rate (FPR measure). Meanwhile, the solution based on the Suricata log is not good at the ability to detect the APT IP (Recall) but better at false alarm rate (FPR). The combined solution can improve the weak points of individual solutions and produce better results (increase the ability to detect the APT IPs and keep the false alarm rate at an acceptable level).
Conclusion and future direction
In this study, we have achieved our initial objectives. The proposed APT attack detection model combines two detection methods: signature-based detection using the rule set of the Suricata tool, and behavior-based detection using machine learning algorithms. With this flexible combination, the model detects attacks across multiple layers that build on one another: the detection results of one layer serve as input to the next, forming a complete, correlated system. The APT attack detection results of the model show that combining rule-based detection with behavior analysis brings good results. For the proposal of analyzing behavior profiles for APT detection based on the multi-layer analysis technique, we have successfully built behavior profiles from Network Traffic. These behavioral profiles help to improve the ability to detect signs of APT attacks early. In particular, the experiment results in Table 10 show that the behavioral profile solution outperforms the individual detection methods. With this result, our research provides a novel approach to selecting and extracting features and abnormal behaviors of APT attacks for APT attack detection. The approach is that, instead of trying to find typical abnormal behaviors of APT in the system, we should analyze, link, combine, and evaluate as much as possible of what is in the system, and then synthesize those discrete factors into behavior vectors. On this basis, we believe that the APT attack detection model proposed in this paper can be studied and applied to detecting other cyber-attacks and unauthorized intrusions such as DDoS, botnets, and spear phishing.
In the future, to improve the effectiveness of APT attack detection in this system, we will continue to research and apply deep learning algorithms for analyzing and evaluating behavior profiles. In addition to analysis and detection based on network traffic, we will seek ways to incorporate other data collected from processes in the operating system kernel of users' computers. This will help the system obtain a clearer and more detailed correlation.
Acknowledgements
This work has been sponsored by the Posts and Telecommunications Institute of Technology, Viet Nam.
