Abstract
To improve the accuracy of hierarchical network security situational awareness data fusion and shorten the fusion time, this paper proposes a data fusion method for hierarchical network security situational awareness in a cloud computing environment. A hierarchical model is established to obtain the hierarchical structure of data fusion. Hierarchical network security situational awareness data are collected and processed in parallel using cloud computing technology. Based on the collected data, the similarity between security events is calculated using a clustering approach, and similar security events are merged to remove redundant events. The data are then fused by grey relational analysis. Simulation results show that the proposed method reaches a data fusion accuracy of up to 98% and a fusion time of at most 13 s. It outperforms the comparison methods on both measures, indicating that it is suitable for data fusion in hierarchical network security situational awareness.
Keywords
Introduction
While the network brings convenience to people, it also brings many security problems: the number of intrusion events keeps rising, network attack methods are constantly updated and increasingly complex, and a large number of targeted network attacks are emerging [1]. According to the “Research Report on Information Security of Chinese Internet Users in 2013” released by CNNIC, as of December 31, 2013, 438 million Internet users had encountered security incidents in the past year [2]. Network security situational awareness emerged against this background. It grasps the current network operation status as a whole, senses the threats faced by the current network environment in real time, and predicts the future security trend [3].
As network information systems grow larger and larger, decision makers need situational awareness tools to display the continuous changes of the current environment [4]. Data fusion in hierarchical network security situational awareness aims to eliminate the redundancy of large-scale network data and reduce the dimension of large-scale data sets, so that data can be classified and integrated faster and better and network managers can quickly achieve situational awareness. Therefore, how to establish an appropriate data fusion algorithm is an important topic in network security situational awareness. Data fusion algorithms mainly perform data dimensionality reduction, for which feature extraction and feature selection are two commonly used approaches [5].
Chen and Teng [6] proposed a wireless sensor network data fusion (LTDA) method based on hierarchical topology. The simulation data show that the proposed LTDA method reduces the average energy consumption, packet loss rate and end-to-end transmission delay. Wen and Li [7] proposed an information fusion method combining compressed sensing and global information data forwarding. It is found that the proposed method reduces the transmission volume of network nodes, the data recovery rate after fusion is high, and the network delay is low. However, the accuracy of the above two methods for network data fusion is low, resulting in poor fusion effect. Ge [8] proposed a grid system-based multi-channel information fusion method for wireless networks, which uses the correlation function in fuzzy theory to calculate the support of wireless network sensors. Wu et al. [9] proposed a BP neural network data fusion method based on heuristic firefly. However, the above two methods take a long time for network data fusion, resulting in low fusion efficiency.
Based on the above research results and their respective limitations, and in order to improve the accuracy of hierarchical network security situational awareness data fusion and shorten the fusion time, this paper proposes a hierarchical network security situational awareness data fusion method in a cloud computing environment and verifies its feasibility through simulation experiments. The paper first analyzes the hierarchical structure of data fusion and designs a hierarchical model that runs stably under the management and coordination of the cloud computing platform. Cloud computing technology is then used to collect network traffic: distributed traffic collectors are managed in parallel to complete the collection of network information resources. Using the idea of clustering, the similarity of security events is calculated, similar security events are merged, and redundant events are eliminated. Specifically, the dissimilarity between events is calculated to obtain their similarity and to mine the logical relationship between events. Finally, the hierarchical network security situational awareness data are fused by the grey correlation analysis method.
Method design
The overall design flow of the method
In order to improve the accuracy of hierarchical network security situational awareness data fusion and shorten the fusion time, this paper proposes a hierarchical network security situational awareness data fusion method in cloud computing environment [10]. The specific design process of this method is shown in Fig. 1.
Flowchart of method design in this paper.
As shown in Fig. 1, this paper first analyzes the hierarchical structure of data fusion and designs a hierarchical model that runs stably under the management and coordination of the cloud computing platform. Cloud computing technology is then used to collect network traffic: distributed traffic collectors are managed in parallel to complete the collection of network information resources. Using the idea of clustering, the similarity of security events is calculated, similar security events are merged, and redundant events are eliminated. Specifically, the dissimilarity between events is calculated to obtain their similarity and to mine the logical relationship between events. Finally, the hierarchical network security situational awareness data are fused by the grey correlation analysis method.
According to the three levels of data fusion, a hierarchical model is designed that consists of a network traffic data acquisition layer, a network traffic data feature fusion layer and a network traffic decision fusion layer. These three layers work under the management and coordination of the cloud computing platform. The network traffic data acquisition layer is the basic layer of the model. It collects massive numbers of network data packets through multiple network node sensors distributed over large-scale complex networks, aggregates the packets into network flows according to their five-tuples, and calculates the statistical characteristics of each flow [11, 12]. The network traffic data feature fusion layer corresponds to feature-level fusion in the three-tier structure of data fusion. It reduces the dimension of the massive data collected by the acquisition layer, eliminates the redundancy of large-scale network traffic data, reduces the complexity of the classification model, and improves classification efficiency [13]. The network traffic decision fusion layer performs decision-level fusion. First, abnormal traffic is judged, and the attributes of each feature are extracted, identified or determined. After preliminary conclusions are obtained, they are associated and fused, and finally the attribute description of the target or environment is obtained, which serves as the basis for decision-making. After the decision is made in the network traffic decision fusion layer, the data are passed to the application layer, where they are analyzed for situation assessment, danger alarm, malicious flow detection, traffic engineering, and so on. The hierarchical model is shown in Fig. 2.
Hierarchical model.
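To make the role of the acquisition layer more concrete, the following minimal sketch groups packets into flows by five-tuple and computes a few per-flow statistics. The packet field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `size`, `ts`) and the chosen statistics are assumptions made for illustration; the paper does not specify which statistical characteristics are calculated.

```python
from collections import defaultdict
from statistics import mean


def aggregate_flows(packets):
    """Group packets by five-tuple and compute simple per-flow statistics.

    Each packet is a dict with hypothetical keys: src_ip, dst_ip,
    src_port, dst_port, proto, size (bytes) and ts (seconds).
    """
    flows = defaultdict(list)
    for p in packets:
        key = (p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])
        flows[key].append(p)

    stats = {}
    for key, pkts in flows.items():
        sizes = [p["size"] for p in pkts]
        times = [p["ts"] for p in pkts]
        stats[key] = {
            "packets": len(pkts),                 # number of packets in the flow
            "bytes": sum(sizes),                  # total volume
            "mean_size": mean(sizes),             # average packet size
            "duration": max(times) - min(times),  # first-to-last packet time
        }
    return stats
```

Calling `aggregate_flows` on the packets captured by one sensor returns a dictionary keyed by five-tuple, which could then be handed to the feature fusion layer for dimensionality reduction.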
In order to better show the interaction process between various components in the hierarchical model, the corresponding sequence diagram is summarized, as shown in Fig. 3.
Hierarchical model component interaction sequence diagram.
Data layer fusion is carried out at the lowest layer. It operates directly on the original data collected by each sensor, before those data are processed. The fused data are required to come from homogeneous sensors. Data layer fusion keeps as much of the original data as possible and provides the most accurate results. Its limitations are also obvious: because of the large amount of data to be processed, processing takes a long time, real-time performance is poor, and the processing cost is high. Data layer fusion is usually used for image analysis and understanding, multi-source image composition, multi-sensor remote sensing information fusion, homogeneous radar waveform synthesis, etc. [14]. The data layer fusion process is shown in Fig. 4.
Data layer fusion process.
Feature level fusion is the fusion at the middle level. It first extracts the features of the original data from the sensor (the feature information can be the target speed, direction, region and distance, edge, etc.), then associates, classifies and synthesizes the multi-sensor data according to the information features, and finally performs the fusion processing. Feature level fusion can be divided into two categories: target state fusion and target feature fusion. For target state fusion, the fusion system first completes the calibration of data through sensor data preprocessing, and then carries out fusion processing, mainly to complete the state estimation related to parameters [15]. For target feature fusion, the specific fusion technology is still the corresponding method of pattern recognition, but before fusion, the feature association processing must be realized to divide the feature vector into meaningful combinations.
Feature layer fusion process.
Decision level fusion gives the optimal decision according to certain criteria and decision reliability [16]. It first extracts, recognizes or determines the attributes of each feature to obtain a preliminary conclusion, then associates and fuses the results, and finally obtains the attribute description of the target or environment. The results are used as the basis for decision-making.
Decision level fusion process.
Because collecting massive network data on large-scale complex networks requires the use of scattered multiple network node sensors, and a variety of network traffic collection technologies are adopted, in order to effectively collect the required network information resources, cloud computing technology needs to be introduced to manage the distributed traffic collector in parallel [17]. The general process of parallel network information collection is to use the MapReduce parallelization architecture to make a single map function in the MapReduce programming model correspond to a single network information collection node, and call the traffic collection tool to complete the collection process of network information resources [18].
Cloud computing platform provides powerful and scalable computing power and is the supporting platform of the whole model [19]. As a parallel processing technology for massive data, MapReduce parallel computing framework can make efficient use of platform resources and realize automatic scheduling and allocation of tasks through mapping and reduction methods. Therefore, based on cloud computing technology, this paper uses bp-mi-mp algorithm to collect and process hierarchical network security situational awareness data in parallel, so as to realize the integration and consolidation of hierarchical network security situational awareness data with different data characteristics, different application purposes and different systems, so as to make further use of these information data [20, 21].
The specific steps of parallel collection and processing of hierarchical network security situational awareness data through cloud computing technology are as follows:
Data discretization. The discretization method used is the equal-distance discretization method. First, clusters are divided according to the actual distribution characteristics of proximity and eigenvalues; then clusters are split based on the top-down splitting principle and consolidated based on the bottom-up merging principle until the specified number of clusters is reached. Enumerations and switching variables do not need to be discretized because they are already discrete variables [22].
Matrix processing of monitoring data. If the acquisition and monitoring matrix of a sensor at a certain time is set as a vector, the expression of the matrix is:
In Eq. (1), Set the hierarchical network security situational awareness data monitoring matrix collected by a sensor at a certain time as
In Eq. (2), Set the value returned by multiple devices in a certain period of time to
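As an illustration of the discretization step, the sketch below implements plain equal-width (equal-distance) binning of one continuous monitoring variable. It is a simplified stand-in: the function name `equal_width_bins` and the parameter `n_bins` are assumptions, and the cluster splitting and merging refinements mentioned in the text are not modelled.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to an integer bin index in [0, n_bins - 1].

    The range [min, max] is cut into n_bins intervals of equal width.
    The maximum value is clamped into the last bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Enumerated and switching variables would bypass this function, since they are already discrete.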
Execution flow of map function and reduce function. The classification features are extracted by variable selection algorithm, and the input values are determined. The algorithm consists of following steps:
Select relevant variables. First, the mutual information between each candidate variable and the output is calculated. Then the variables with weak correlation with the output are removed from the relevant variable set through the backward algorithm [23]. The training prediction is implemented through the parallelization algorithm as follows: all the samples to be trained are divided into blocks, and the calculation nodes receive the resulting data blocks. The map function is used to batch train the hierarchical network security situational awareness data samples and to output the actual change of each network weight. Then the reduce function sums the overall change of each network weight, the weights are adjusted, and it is determined whether to continue the iteration [24].

The execution flow of map function and reduce function is shown in Fig. 7.
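The map and reduce roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single linear unit trained by batch gradient descent stands in for the BP network, the blocks are processed sequentially rather than on cluster nodes, and the names `map_gradient`, `reduce_gradients` and `train` as well as the learning rate and stopping tolerance are hypothetical.

```python
from functools import reduce


def map_gradient(block, weights):
    """Map step: batch-train on one data block and emit its weight change.

    block is a list of (features, target) pairs. Returns the summed
    squared-error gradient for the block and the block size.
    """
    grad = [0.0] * len(weights)
    for x, y in block:
        pred = sum(w * xi for w, xi in zip(weights, x))
        err = pred - y
        for j, xi in enumerate(x):
            grad[j] += err * xi
    return grad, len(block)


def reduce_gradients(a, b):
    """Reduce step: add up the weight changes and sample counts of two blocks."""
    return [ga + gb for ga, gb in zip(a[0], b[0])], a[1] + b[1]


def train(blocks, n_features, lr=0.1, max_iter=5000, tol=1e-10):
    """Iterate map -> reduce -> weight update until the update is negligible."""
    weights = [0.0] * n_features
    for _ in range(max_iter):
        # On a real cluster each block would be handled by its own map task.
        mapped = [map_gradient(block, weights) for block in blocks]
        total, n = reduce(reduce_gradients, mapped)
        step = [lr * g / n for g in total]
        weights = [w - s for w, s in zip(weights, step)]
        if max(abs(s) for s in step) < tol:  # decide whether to keep iterating
            break
    return weights
```

The key design point is that each map task only needs the current weights and its own data block, so the per-block weight changes can be computed independently and combined by a simple sum in the reduce step.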
In order to reduce the time complexity of the collected hierarchical network security situation awareness data, the collected data can be preprocessed, the similarity of security events can be calculated by using the clustering idea, and similar security events can be merged to achieve the purpose of eliminating redundant events. The specific solution process is described in the following.
Using the idea of clustering, security events are merged by calculating the similarity between them, so as to remove redundant events. The security events in a network do not exist in isolation but are connected to each other through logical relationships. For example, in a sustained DDoS attack there is a sequential logical relationship between the penetration stage and the attack stage. Therefore, the key to data fusion is to find the relationships between security events.
Select source IP address, destination IP address, source port number, destination port number, event type, occurrence time and other attributes. The similarity of events is obtained by calculating the dissimilarity of events. In the calculation process, each attribute selects the corresponding attribute dissimilarity calculation formula, then calculates the dissimilarity between two security events through the security event dissimilarity function, and finally compares it with the set threshold to determine whether to generate a new event or merge with an existing event. The value range of dissimilarity is [0, 1]. Before streamlining events, a multivariate group needs to be defined to represent security events [25].
(1) Security event representation
Security events mainly include alarm events caused by intrusion or illegal operation of various security devices. It can be represented by a multivariate group:
where
(2) Calculation method of dissimilarity of different types of attributes
Arbitrarily select two security events
1) Enumerative variable
Select the
where
2) Boolean variable
Select the
where
3) Continuous variable
Let events
(3) Calculation method of dissimilarity of two events
Assuming that events
where
By calculating the dissimilarity between events, the similarity of events is obtained and the logical relationship between events is determined. The closer the dissimilarity is to 0, the more similar the events are; the closer it is to 1, the less similar they are. When a new event arrives, the correlation module calculates the dissimilarity between the new event and each event in the security event database to mine potential event relationships. If the dissimilarity falls within the set threshold, the correlation module merges the new event into the existing event. Otherwise, a new event is generated.
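The merge-or-create logic can be illustrated with the following sketch. It assumes, for illustration only, that enumerative attributes contribute a dissimilarity of 0 when equal and 1 otherwise, that the occurrence time is a continuous attribute normalised by a hypothetical 60 s window, and that the event dissimilarity is an equally weighted average of the attribute dissimilarities. The names `Event`, `WEIGHTS`, `TIME_WINDOW` and the threshold of 0.2 are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Event:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    event_type: str
    time: float      # occurrence time in seconds
    count: int = 1   # how many raw events were merged into this one


TIME_WINDOW = 60.0   # assumed normalisation range for the time attribute
WEIGHTS = {          # assumed equal attribute weights
    "src_ip": 1.0, "dst_ip": 1.0, "src_port": 1.0,
    "dst_port": 1.0, "event_type": 1.0, "time": 1.0,
}


def attr_dissimilarity(a, b, name):
    """Dissimilarity of one attribute, always in [0, 1]."""
    if name == "time":  # continuous attribute
        return min(abs(a.time - b.time) / TIME_WINDOW, 1.0)
    # enumerative attribute: identical values are fully similar
    return 0.0 if getattr(a, name) == getattr(b, name) else 1.0


def event_dissimilarity(a, b):
    """Weighted average of the attribute dissimilarities, in [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(w * attr_dissimilarity(a, b, n) for n, w in WEIGHTS.items()) / total


def merge_or_add(events, new, threshold=0.2):
    """Merge `new` into the first sufficiently similar event, else store it."""
    for e in events:
        if event_dissimilarity(e, new) <= threshold:
            e.count += 1
            e.time = max(e.time, new.time)
            return e
    events.append(new)
    return new
```

With these settings, two alarms that share all addressing attributes and occur a few seconds apart collapse into one stored event whose `count` is incremented, while an alarm with a different source and type is kept as a separate event.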
The grey correlation analysis method judges whether the relationship between the two series curves is close according to the similarity of the geometry of the series curves. Therefore, the grey correlation analysis method is adopted to fuse the hierarchical network security situational awareness data.
The specific calculation includes the following steps:
(1) Determine the evaluation index system. Collect the evaluation data and determine the reference sequence
Then determine the comparison sequence
(2) Dimensionless processing of reference sequence and comparison sequence.
To eliminate the influence of magnitude and facilitate calculation and comparative analysis, the indexes need to be dimensionless before using the grey correlation method.
(3) Find the correlation coefficient between reference sequence and comparison sequence.
The difference between curves can be used as a measure of the degree of correlation.
where
(4) Find the correlation degree
The correlation coefficient is the correlation degree value of the reference sequence and the comparison sequence at each time, so its number is more than one. Therefore, it needs to combine the correlation coefficients at each time into one value, and the correlation degree
where
(5) Relevance ranking.
The degree of correlation between factors can be expressed by the order of correlation degree, not just the value of correlation degree. The correlation degree values of M comparison sequences to the same reference sequence are arranged in order of magnitude to form the correlation sequence, which is recorded as
The grey correlation degree value obtained by the grey correlation model is the importance ranking of the influencing factors in the system, and the correlation degree value is normalized as the weight value of the influencing factors. According to the importance of attributes, the fusion processing of hierarchical network security situational awareness data is completed.
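The five steps can be sketched with the textbook form of grey relational analysis. The sketch makes several assumptions that the text does not confirm: mean-value normalisation is used for the dimensionless processing (so every sequence needs a non-zero mean), the resolution coefficient `rho` is set to the commonly used value 0.5, and the correlation degree is the plain average of the correlation coefficients. The function names are hypothetical.

```python
def normalise(seq):
    """Dimensionless processing: divide a sequence by its own mean (must be non-zero)."""
    m = sum(seq) / len(seq)
    return [v / m for v in seq]


def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Return one correlation degree per comparison sequence."""
    x0 = normalise(reference)
    xs = [normalise(c) for c in comparisons]
    # Absolute differences between the reference and each comparison sequence.
    deltas = [[abs(a - b) for a, b in zip(x0, x)] for x in xs]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every comparison sequence coincides with the reference.
        return [1.0] * len(comparisons)
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))  # average over all time points
    return degrees


def weights_from_degrees(degrees):
    """Normalise the correlation degrees so that they sum to 1."""
    total = sum(degrees)
    return [d / total for d in degrees]


def rank_by_degree(degrees):
    """Indices of the comparison sequences, most strongly correlated first."""
    return sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
```

The weights returned by `weights_from_degrees` can then be used to combine the corresponding indicators, for example as a weighted sum, so that factors more closely related to the reference sequence contribute more to the fused result.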
Simulation is conducted to validate the feasibility of the proposed hierarchical network security situational awareness data fusion method.
The simulation environment parameter settings are shown in Tables 1 and 2.
Hardware parameter settings
Software parameter setting
Simulation experiment environment.
Under the above simulation environment parameter settings, obtain hierarchical network security situational awareness data samples, including security events and redundant events, as shown in Fig. 9.
Data sample.
The proposed fusion method, the method of Literature [6] and the method of Literature [7] are used to cluster the data samples. The results are shown in Fig. 10.
Sample clustering effect of three methods.
From the clustering results of the three methods in Fig. 10 and the data samples in Fig. 9, it can be seen that the proposed method can effectively separate the security events and redundant events in the sample data, while the security events obtained in Literature [6] method and Literature [7] method are still mixed with some redundant events. It shows that the hierarchical network security situational awareness data fusion method proposed in this paper has the best effect on sample clustering in cloud computing environment. The reason for this result is that the proposed method adopts the idea of clustering to realize the merging of security events by calculating the similarity between security events, so as to eliminate redundant events.
The proposed fusion method, Literature [6] method and Literature [7] method are comparatively analyzed in terms of fusion accuracy.
Comparison results of data fusion accuracy of hierarchical network security situational awareness.
As can be seen from Fig. 11, the data fusion accuracy of the three methods is not directly related to the number of experiments. Overall, however, the accuracy of the proposed method is always higher than 90%, far above the roughly 70% achieved by the methods of Literature [6] and Literature [7], so the proposed method fuses hierarchical network security situational awareness data more accurately. Looking at each method in detail, the proposed method reaches its highest data fusion accuracy, 98%, when the number of experiments is 80. The method of Literature [6] also peaks when the number of experiments is 80, at 71%, and the method of Literature [7] peaks when the number of experiments is 40, at 70%. This further indicates that the proposed method has high data fusion accuracy.
The proposed hierarchical network security situational awareness data fusion method, Literature [6] method and Literature [7] method were compared in terms of data fusion time.
Comparison results of data fusion time of hierarchical network security situational awareness.
It can be seen from Fig. 12 that the data fusion time of all three methods increases with the number of experiments. The time consumed by the proposed method is always lower than that of the methods of Literature [6] and Literature [7], and the method of Literature [7] always takes the longest. When the number of experiments is 20, the proposed method takes 6 s, while the methods of Literature [6] and Literature [7] take 14 s and 17 s, respectively. When the number of experiments is 100, the proposed method takes 13 s, while the methods of Literature [6] and Literature [7] take 37 s and 42 s, respectively. These data show that the proposed hierarchical network security situational awareness data fusion method has a shorter fusion time and therefore a higher fusion efficiency than the comparison methods.
The data of network security situational awareness come from various types of security systems. These systems provide different types of security events and data, and this massive information contains much redundancy. If used directly for security assessment and prediction, it complicates data processing unnecessarily and degrades the real-time performance of the results. To improve the accuracy of hierarchical network security situational awareness data fusion and shorten the fusion time, this paper proposes a hierarchical network security situational awareness data fusion method in a cloud computing environment. The hierarchical structure of data fusion is analyzed and a hierarchical model is designed that runs stably under the management and coordination of the cloud computing platform. Cloud computing technology is used to collect network traffic, managing distributed traffic collectors in parallel to complete the collection of network information resources. Using the clustering idea, the similarity of security events is calculated, similar security events are merged, and redundant events are eliminated. The dissimilarity between events is calculated to obtain their similarity and to mine the logical relationship between events, and the hierarchical network security situational awareness data are fused by the grey correlation analysis method. Finally, the feasibility of the proposed method is verified by simulation.
The final simulation results show that:
(1) With the proposed method, security events and redundant events in the sample data can be effectively separated. This is because the method adopts the idea of clustering and merges security events by calculating the similarity between them, thereby eliminating redundant events.
(2) The fusion accuracy of the proposed method for hierarchical network security situational awareness data reaches 98%, which is much higher than that of the selected comparison methods.
(3) The proposed method takes at most 13 s for data fusion in the simulation, while the two comparison methods take up to 37 s and 42 s, respectively. The fusion time of the proposed method is therefore generally shorter and its fusion efficiency higher.
However, this research still has some limitations. For example, it does not specifically analyze the subtle differences in the results that the same method obtains for different numbers of experiments when fusing hierarchical network security situational awareness data. In subsequent research, we will take this as a starting point to explore the reasons for these differences in depth.
