Abstract
Existing intelligent detection methods for abnormal big data suffer from low detection accuracy and poor convergence because they do not classify the abnormal big data. To overcome these problems, this paper proposes a new intelligent detection method for abnormal big data in heterogeneous networks based on Bayesian classification. An intelligent detection architecture for abnormal big data is designed. The TcpDump collection tool is used to collect and process heterogeneous network traffic data, and the relationship between bottleneck traffic and abnormal big data is built from the processed data. Frequency information is then obtained through the Fourier transform, and the Bayesian network classification method is used to detect abnormal big data in heterogeneous networks. Experimental results show that, compared with traditional methods, the proposed method greatly improves detection accuracy, convergence and anti-interference, demonstrating a better detection effect.
Introduction
Heterogeneous networking originated as a project initiated by the University of California, Berkeley, USA. It overlaps and integrates different types of networks to form heterogeneous networks that meet the needs of diversified terminal services. To let mobile terminals connect to multiple networks, they are equipped with multiple network interfaces; such terminals are called multi-mode mobile terminals. This type of network is called isomorphic networks. In recent years, with the rapid development of network technology, heterogeneous networks have been widely used in wireless communication and are regarded as the future direction of network development. Heterogeneous networks can provide users with a variety of services, which means that they span many frequencies and generate a large amount of data. Owing to limited channel bandwidth and environmental factors, the traffic also contains interference data. At the same time, because the network is open, it is often attacked by abnormal data, which seriously threatens the safe operation of heterogeneous networks. To ensure the normal operation of heterogeneous networks, abnormal big data must be detected intelligently [17].
As far as existing research is concerned, although domestic research on intelligent detection methods for abnormal big data started relatively late, many mature results have been achieved. Reference [9] proposes a deep-learning-based method for big data fusion and anomaly detection. It uses a deep restricted Boltzmann machine to map heterogeneous data of different formats into a unified embedding vector space, thereby fusing the heterogeneous data, and then uses a recurrent neural network to build a profile of the embedded vectors in order to detect abnormal events in big data. However, its detection accuracy needs further improvement. Reference [10] proposes a detection method for abnormal big data based on multi-evidence fusion decision. By introducing evidence theory, it uses the main focal element to identify false evidence and recalculates the evidence weights. This improves the handling of conflicting evidence, reduces the influence of conflicting evidence on the fusion decision, and improves the accuracy of anomaly detection for measurement data in batch processes; on this basis, an anomaly detection model for measurement data based on multi-evidence fusion is constructed. However, the anti-interference performance of this method is poor. Reference [14] proposes a detection method for abnormal big data based on intrusion feature selection. It extracts data from the network transmission channel by solving for the correlation dimension, optimizes the extracted correlation-dimension features to recognize and classify intrusion information, and combines fuzzy C-means clustering to mine and detect abnormal network data. However, this method considers only convergence, which leads to inaccurate detection results.
In order to solve the problems of the above methods, an intelligent detection method for abnormal big data in heterogeneous networks based on Bayesian classification is proposed. The overall scheme of the detection method is as follows:
(1) An intelligent detection framework for abnormal big data is designed to determine the detection steps. (2) TcpDump is used to collect different kinds of network traffic data, and the characteristics of heterogeneous networks are analyzed; this improves detection efficiency and prepares for intelligent detection. (3) The relationship between bottleneck traffic and abnormal big data is built to improve detection accuracy. (4) The Fourier transform is used to extract frequency information from network traffic data to make the detection comprehensive. (5) The Bayesian classification method is used to classify and judge abnormal big data in heterogeneous networks, achieving accurate detection. (6) The experimental results are analyzed and discussed.
Following this scheme, abnormal big data in heterogeneous networks can be detected accurately.
Intelligent detection method for abnormal big data in heterogeneous networks based on Bayesian classification
Architecture design of intelligent detection for abnormal big data
To solve the problems of existing intelligent detection methods for abnormal big data, an intelligent detection method for abnormal big data in heterogeneous networks based on Bayesian classification is proposed. The Bayesian classification algorithm used in this paper is a statistics-based classification algorithm that performs classification using probability and statistics. It is highly applicable in practice and is simple, fast and accurate when operating on large-scale data.
Exploiting the multiple frequencies of heterogeneous networks, the detection method first collects and processes heterogeneous network traffic data. An acquisition tool records the acquisition time of each data packet, the packets are processed accordingly, the characteristics of abnormal big data are collected, and the packet acquisition times are converted into a unified time series to reduce the number of data packets to be handled. Next, the relationship between bottleneck traffic and abnormal big data is built. The network traffic data are then converted into a frequency representation by the Fourier transform. Finally, the network traffic is classified by the Bayesian classification method to separate abnormal big data from bottleneck traffic. The architecture of the intelligent detection method for abnormal big data is shown in Fig. 1.

Architecture of intelligent detection method for abnormal big data.
Conditions to be satisfied by the research objects of heterogeneous networks
Through the above process, the architecture of intelligent detection method for abnormal big data is built to prepare for the intelligent detection of abnormal big data.
Based on the framework of intelligent detection of abnormal big data, TcpDump is used to collect and process traffic data of heterogeneous networks. The specific process is as follows.
To design a rigorous intelligent detection method for abnormal big data, the preconditions for traffic data collection in heterogeneous networks are first formulated. Let A denote an available resource node of the heterogeneous network; the size of the network is given by the number of such nodes, denoted m. Let B denote the heterogeneous network objects studied; the conditions these objects must meet are shown in Table 1.
Under the precondition of heterogeneous network traffic data acquisition, TcpDump is used to collect different kinds of network traffic data.
First, the TcpDump acquisition tool is briefly introduced. TcpDump collects traffic data in heterogeneous networks mainly on Linux, where the Linux host essentially acts as a network server and sits in the heterogeneous network as a gateway. TcpDump is mainly used to intercept the data packets transmitted in the network. It can filter by port, network, host, network layer, protocol layer and so on, and it can discard useless information through filter expressions built with logical operators such as ! and &&. TcpDump is open source with a public interface, and it is convenient to use and highly extensible [2]. The interface schematic of the TcpDump acquisition tool is shown in Fig. 2.
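As a hedged illustration of the filtering described above, the sketch below assembles a tcpdump capture command in Python. The interface name `eth0`, the file name `traffic.pcap` and the filter expression are placeholders chosen for illustration, not settings from this paper; `-i`, `-n`, `-w` and `-c` are standard tcpdump options, and the filter uses tcpdump's `&&` and `!` operators. The function name `build_tcpdump_command` is hypothetical.

```python
import shlex


def build_tcpdump_command(interface, dump_file, filter_expr=None, packet_count=None):
    """Return an argv list for a tcpdump capture written to a binary dump file.

    -i selects the capture interface, -n disables name resolution,
    -w writes raw packets to a dump (pcap) file, -c stops after N packets.
    """
    cmd = ["tcpdump", "-i", interface, "-n", "-w", dump_file]
    if packet_count is not None:
        cmd += ["-c", str(packet_count)]
    if filter_expr:
        # tcpdump joins its trailing arguments into one filter expression,
        # so passing the whole expression as a single argument is valid.
        cmd.append(filter_expr)
    return cmd


if __name__ == "__main__":
    # Keep TCP traffic but drop SSH sessions (placeholder filter).
    cmd = build_tcpdump_command("eth0", "traffic.pcap", "tcp && !port 22", 1000)
    print(shlex.join(cmd))
```

Running the command usually requires root privileges, for example through `subprocess.run(cmd, check=True)`; returning an argument list rather than a shell string avoids quoting problems with `!` and `&&`.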

Interface schematic of TcpDump acquisition tool.
The heterogeneous network traffic data collected by TcpDump are stored as a dump file. According to the distribution characteristics of heterogeneous network data and the data requirements, the parameters of the dump file are listed in Table 2.
Parameters of the dump file
Steps of reading the dump file
The dump file stores network traffic data in binary format, so it must be read in binary mode. The steps for reading the dump file are shown in Table 3.
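Table 3 gives the paper's own reading steps. As a sketch under the assumption that the dump file uses the classic libpcap layout (a 24-byte global header followed by a 16-byte header before each captured packet), binary reading can look like the Python below. The function name `read_pcap` is hypothetical, and the newer pcapng format and nanosecond-resolution files are not handled.

```python
import struct


def read_pcap(data: bytes):
    """Parse a classic libpcap dump file held in memory.

    Returns (linktype, records), where each record is
    (timestamp_in_seconds, captured_bytes).
    """
    if len(data) < 24:
        raise ValueError("file too short for a pcap global header")
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written little-endian, microsecond timestamps
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written big-endian, microsecond timestamps
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    # Global header: magic, version major, version minor, thiszone,
    # sigfigs, snaplen, link-layer type (24 bytes in total).
    linktype = struct.unpack(endian + "IHHiIII", data[:24])[6]
    offset = 24
    records = []
    while offset + 16 <= len(data):
        # Per-packet header: seconds, microseconds, captured length, original length.
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16
        payload = data[offset:offset + incl_len]
        if len(payload) < incl_len:
            break  # truncated final record
        offset += incl_len
        records.append((ts_sec + ts_usec / 1e6, payload))
    return linktype, records
```

A file on disk can be parsed by reading it in binary mode (`with open(path, "rb") as f: linktype, records = read_pcap(f.read())`); the returned timestamps are the packet acquisition times that later form the unified time series.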
Reading the dump file completes the traffic data collection for the heterogeneous network. When abnormal big data appear in a heterogeneous network, the characteristics of the network change correspondingly in four main aspects:
First, network traffic increases dramatically [5]. Whatever the rate of the abnormal big data, traffic rises sharply within a specific interval. If the rate is low, traffic rises sharply only for a short time while the average traffic stays close to that of normal data, so low-rate abnormal big data are harder to detect.
Second, traffic becomes asymmetric [13]. When abnormal big data occur in a heterogeneous network, the mapping between source and destination IP addresses becomes asymmetric, and many-to-one mappings appear.
Third, source IP addresses become scattered [16]. To evade network detection and tracing, abnormal big data forge their source IP addresses, which makes the source IP addresses in the heterogeneous network more dispersed.
Fourth, the set of destination IP addresses becomes centralized [3]. Because abnormal big data make the mapping from source to destination IP addresses many-to-one, the destination IP addresses become more concentrated.
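The four characteristics can be quantified in many ways. A minimal sketch, assuming Shannon entropy as the dispersion measure plus two hypothetical indicators (`top_dst_share`, the fraction of packets sent to the most frequent destination, and `fan_in`, distinct sources per distinct destination), computes them from one window of (source IP, destination IP) pairs. These measures and names are illustrative choices, not the paper's definitions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ip_features(flows):
    """flows: non-empty list of (src_ip, dst_ip) pairs seen in one time window."""
    srcs = [s for s, _ in flows]
    dsts = [d for _, d in flows]
    # Share of packets that go to the single most frequent destination.
    top_dst_share = Counter(dsts).most_common(1)[0][1] / len(dsts)
    # Many-to-one asymmetry: distinct sources per distinct destination.
    fan_in = len(set(srcs)) / len(set(dsts))
    return {
        "src_entropy": entropy(srcs),  # grows as source IPs scatter
        "dst_entropy": entropy(dsts),  # shrinks as destinations centralize
        "top_dst_share": top_dst_share,
        "fan_in": fan_in,
    }
```

For a window in which eight forged sources all target one host, the source entropy is 3 bits, the destination entropy is 0 and `fan_in` is 8, matching the scattered-source, centralized-destination and many-to-one patterns described above.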
Based on these characteristics, the collected heterogeneous network traffic data must first be preprocessed. The network traffic data packet composed of the source IP address

Flow chart of traffic data processing of heterogeneous network.
Through the above process, the traffic data processing of heterogeneous network is completed, which greatly reduces the amount of data needed to be detected and narrows the scope of abnormal big data, thus improving the efficiency of abnormal big data detection, and preparing for intelligent detection of abnormal big data.
Based on the traffic data of heterogeneous network processed above, the bottleneck traffic is defined, and the relationship between the bottleneck traffic and abnormal big data is built to prepare for the intelligent detection of abnormal big data [18]. The specific process is shown below.
In a heterogeneous network, a saturated connection means that the traffic on a link exceeds its capacity. If this situation persists, the link is abnormal, and the traffic on it is called bottleneck traffic. Bottleneck traffic is caused by abnormal big data. With the continuous development of network technology in China, bottleneck traffic severely restricts network development. Therefore, this paper detects abnormal big data in the network through bottleneck traffic [19].
The methods most widely used for bottleneck traffic detection are the threshold method and principal component analysis. Our investigation found that the threshold method is better suited to detecting bottleneck traffic in heterogeneous networks, with high detection efficiency and accuracy. The process of establishing the relationship between bottleneck traffic and abnormal big data by the threshold method is shown in Fig. 4.

Flow chart of relationship building.
When abnormal big data occur in a heterogeneous network, a large number of connection queues are established regardless of the volume of abnormal data, which consumes substantial network resources and saturates links. Abnormal big data can therefore be represented by bottleneck traffic, and detecting bottleneck traffic alone is sufficient to detect abnormal big data [8].
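The threshold idea can be sketched as follows, assuming per-interval traffic measurements for one link. The threshold `ratio` and the `persistence` count (how many consecutive saturated intervals make a run count as bottleneck traffic) are hypothetical parameters, since the paper's threshold values are not given here; `detect_bottleneck` is likewise a hypothetical name.

```python
def detect_bottleneck(throughput, capacity, ratio=1.0, persistence=3):
    """Flag intervals that belong to a persistent saturation run.

    throughput:  per-interval traffic on one link (e.g. bits per second)
    capacity:    link capacity in the same unit
    ratio:       hypothetical threshold as a fraction of capacity
    persistence: hypothetical minimum number of consecutive saturated
                 intervals before the run is treated as bottleneck traffic
    Returns one boolean per interval.
    """
    threshold = ratio * capacity
    flags = [False] * len(throughput)
    run_start = None
    # A trailing -inf sentinel closes any saturated run that reaches the end.
    for i, value in enumerate(list(throughput) + [float("-inf")]):
        if value > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= persistence:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags
```

With a capacity of 100, the series `[50, 120, 130, 125, 60, 110, 40]` flags only the three consecutive saturated intervals; the isolated spike at 110 is ignored because it does not persist.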
Through the above process, the relationship between bottleneck traffic and abnormal big data is built, which facilitates the intelligent detection of abnormal big data.

Flow chart of network traffic data partition.
Based on the relationship between bottleneck traffic and abnormal big data, the Fourier transform is used to convert network traffic data into a frequency representation [11]. The specific process is as follows.
Assuming that all network traffic packets have the same size, the packet sending time is
where T denotes the sending time of data packets.
Since frequency is the reciprocal of the period, the frequency is calculated as $f = 1/T$.
In heterogeneous networks, packets of normal size make up the bulk of network traffic. If bottleneck traffic is assumed to be caused by packets of a certain size, detecting packet size alone can reveal abnormal big data. However, the accuracy of this approach is too low to meet current requirements [6]. To increase detection accuracy, the network traffic data are first partitioned, as shown in Fig. 5.
According to this process, partitioning is most efficient when the partition time is 1 s and the interval length is 30 s.
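One plausible reading of these settings, assumed here rather than taken from Fig. 5, is that packet timestamps are counted in 1 s bins and grouped into 30 s windows, giving a 30-sample count series per window. The sketch below implements that reading; `partition_traffic` is a hypothetical helper.

```python
def partition_traffic(timestamps, bin_width=1.0, window_length=30.0):
    """Turn packet timestamps (seconds) into per-window packet-count series.

    Windows start at the first packet and cover window_length seconds each;
    every window is split into bins of bin_width seconds, and each bin holds
    the number of packets whose timestamp falls inside it.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    bins_per_window = int(round(window_length / bin_width))
    windows = {}
    for t in timestamps:
        k = int((t - start) // bin_width)  # global bin index
        w, b = divmod(k, bins_per_window)  # window index, bin within window
        windows.setdefault(w, [0] * bins_per_window)[b] += 1
    # Windows with no packets are returned as all-zero series.
    return [windows.get(w, [0] * bins_per_window) for w in range(max(windows) + 1)]
```

Each returned list is one window's traffic series, which is the kind of input a Fourier transform step can operate on.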
Next, the Fourier transform is used to convert the network traffic data into a frequency representation according to the following conversion formula:
Finally, the frequency information is extracted from the conversion result.
Through the above process, the frequency information of network traffic data is extracted, which provides accurate data support for intelligent detection of abnormal big data [7].
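Because the paper's conversion and extraction formulas are not reproduced here, the sketch below substitutes the standard discrete Fourier transform, X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N), computed naively for clarity. It assumes that the "maximum frequency" used later is the frequency of the strongest non-DC spectral component of one window's packet-count series; that interpretation and the helper names `dft` and `dominant_frequency` are assumptions.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]


def dominant_frequency(counts, bin_width=1.0):
    """Frequency (Hz) of the largest non-DC spectral peak; needs at least 2 samples."""
    n_samples = len(counts)
    mean = sum(counts) / n_samples
    # Subtract the mean so the average packet rate (the DC term) does not dominate.
    spectrum = dft([c - mean for c in counts])
    half = n_samples // 2
    # For a real-valued series only bins 1..N/2 carry independent information.
    k_peak = max(range(1, half + 1), key=lambda k: abs(spectrum[k]))
    return k_peak / (n_samples * bin_width)
```

A 30-sample series repeating every 6 s, such as `[1, 1, 1, 0, 0, 0] * 5`, yields a dominant frequency of 1/6 Hz. For long series, an FFT routine would replace the O(N^2) loop.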
Intelligent detection of abnormal big data based on Bayesian classification
Based on the frequency information of network traffic data obtained above, the Bayesian network classification method is used to detect the abnormal big data of heterogeneous networks intelligently [15].
The Bayesian network classification method treats the logarithm of the frequency information of network traffic data as a random variable and performs Bayesian network classification on it [12].
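For reference, the standard two-class Bayes decision rule that such a classifier relies on can be written as follows, with x the logarithm of the maximum frequency and c_1, c_2 the normal and abnormal classes (two classes, as the method assumes). This is textbook Bayes' theorem, not a reconstruction of the paper's own formulas.

$$P(C=c_k \mid x)=\frac{p(x\mid C=c_k)\,P(C=c_k)}{\sum_{j=1}^{2}p(x\mid C=c_j)\,P(C=c_j)},\qquad \sum_{k=1}^{2}P(C=c_k)=1$$

$$\hat{c}=\arg\max_{k\in\{1,2\}}\; p(x\mid C=c_k)\,P(C=c_k)$$

Because the denominator is the same for both classes, the decision only needs the two numerators.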
The detection rules of the Bayesian network classification method are as follows. Assume that the class node of the heterogeneous network is C, whose set of values is
At the same time, these probabilities must satisfy the following formula.
To simplify the computation of abnormal big data detection, i is set to 2, meaning that network data fall into two categories (normal and abnormal).
Where, the calculation formulas of
Where,
Where, and the calculation formula is:
The Bayesian classification and detection method consists of two stages, a training stage and a detection stage. In the training stage, the value of

Schematic diagram of the detection method based on Bayesian classification.

Flow chart of detection phase for Bayesian classification method.
The training stage of the Bayesian classification and detection method aims to obtain the prior probability values of the relevant variables. It consists of five steps:
Step 1: Select two training sets. Training set 1 contains no abnormal big data, and training set 2 contains abnormal big data.
Step 2: Extract frequency information from the training set data according to formulas (3) and (4).
Step 3: Taking the logarithm of the maximum frequency as the random variable, compute and record the logarithm of the maximum frequency for the training set.
Step 4: Repeat Steps 1 to 3 for the remaining training sets, recording the results, until no training sets remain.
Step 5: Compute the maximum-frequency logarithms of the two groups to obtain
In the detection stage of the Bayesian classification method, the presence of abnormal big data in network traffic is determined mainly from the prior probability values of the relevant variables. This stage consists of four steps, shown in Fig. 7.
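The two stages can be sketched end to end as below. The paper does not state the class-conditional distribution here, so this sketch assumes a Gaussian density for the log maximum frequency of each class (a one-feature Gaussian naive Bayes, swapped in for illustration). `train` and `detect` are hypothetical names, and the frequencies in the tests are made-up illustration values, not experimental data.

```python
import math


def train(max_freqs_normal, max_freqs_abnormal):
    """Training stage: class priors plus Gaussian parameters of log(max frequency).

    Both arguments are non-empty lists of positive maximum frequencies (Hz),
    one per training window. Returns {label: (prior, mean, variance)}.
    """
    total = len(max_freqs_normal) + len(max_freqs_abnormal)
    model = {}
    for label, freqs in (("normal", max_freqs_normal), ("abnormal", max_freqs_abnormal)):
        logs = [math.log(f) for f in freqs]  # the random variable of Step 3
        mean = sum(logs) / len(logs)
        var = sum((v - mean) ** 2 for v in logs) / len(logs)
        model[label] = (len(logs) / total, mean, max(var, 1e-9))  # floor avoids zero variance
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def detect(model, max_frequency):
    """Detection stage: return (most probable label, its posterior probability)."""
    x = math.log(max_frequency)
    scores = {
        label: math.log(prior) + log_gaussian(x, mean, var)
        for label, (prior, mean, var) in model.items()
    }
    # Normalize in log space so the posteriors sum to 1 without underflow.
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    label = max(scores, key=scores.get)
    return label, math.exp(scores[label] - top) / z
```

Working in log space and normalizing against the largest score keeps the posterior well defined even when the two class densities differ by many orders of magnitude.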
Through the above process, the intelligent detection of abnormal big data in heterogeneous networks is realized, which provides more effective guarantee for the safe operation of heterogeneous networks.
To verify the overall performance of the proposed intelligent detection method for abnormal big data in heterogeneous networks, a verification experiment is carried out. The proposed method is compared with the methods of references [9], [10] and [14] in terms of detection accuracy, detection convergence and anti-interference. The original experimental data are sample data from a heterogeneous network on the Internet, with a total size of 100 TB; the starting coordinate of data detection is set to
The sample time must be determined before the experiment because it directly affects detection accuracy. The relationship between sample time and detection accuracy is shown in Fig. 8.

Diagram of the relationship between sample time and detection accuracy.
As shown in Fig. 8, detection accuracy is highest when the sample time is 1.0 s. The sample time is therefore set to 1.0 s, and the detection effect is evaluated by detection accuracy, convergence and anti-interference. The experimental results are analyzed below.
The detection accuracy obtained in the experiments is compared in Table 4.
Comparison of detection accuracy
As shown in Table 4, the proposed method clearly outperforms the three existing methods. In the 100th experiment, the detection accuracy of the proposed method is 90%, whereas those of references [9], [10] and [14] are 59%, 59% and 45%, respectively, far lower than that of the proposed method. This high accuracy is attributed to the use of Bayesian classification to classify data in heterogeneous networks, which greatly improves the detection accuracy for abnormal data.
Convergence denotes the degree of data concentration in the detection method; in general, the larger the convergence parameter, the better the detection method. Experiments are carried out with network bandwidths of 100 bits and 200 bits. The resulting convergence parameters are compared in Fig. 9 and Fig. 10.

Comparison of convergence parameters in the case of network bandwidth of 100 bits.

Comparison of convergence parameters in the case of network bandwidth of 200 bits.
As shown in Fig. 9 and Fig. 10, under both bandwidth conditions the convergence parameters of the proposed method are much higher than those of the three comparison methods. The maximum convergence parameter of the proposed method is 9.2, whereas the maxima for references [9], [10] and [14] are 7.9, 7 and 6, respectively.
Comparison of anti-interference parameters
Anti-interference refers to the degree to which a detection method still obtains correct results in the presence of interference; in general, the larger the anti-interference parameter, the better the detection effect. The anti-interference parameters obtained in the experiments are compared in Table 5.
As Table 5 shows, the anti-interference parameters of the proposed method are far higher than those of the three existing methods: its maximum reaches 9.23, whereas the maxima for references [9], [10] and [14] are 7.12, 6.01 and 6.59, respectively. The proposed method therefore has strong anti-interference ability, which is attributed to the use of the Fourier transform to convert network traffic data into a frequency representation.
The experimental results show that the proposed intelligent detection method for abnormal big data in heterogeneous networks greatly improves detection accuracy, convergence and anti-interference, demonstrating that it achieves a better detection effect.
Conclusions
In the big data era, abnormal big data in heterogeneous networks continue to grow, and the detection accuracy of traditional methods needs further improvement. This paper therefore proposes an intelligent detection method for abnormal big data in heterogeneous networks based on Bayesian classification. Theoretical analysis and experiments show that the method achieves high accuracy and strong anti-interference when detecting abnormal big data. Specifically, compared with the intrusion feature selection method, detection accuracy is greatly improved, reaching a maximum of 95%; compared with the multi-evidence fusion decision method, anti-interference performance is significantly improved, with a maximum anti-interference coefficient of 9.23. The proposed method can therefore better meet the needs of intelligent detection of abnormal big data. Future work will focus on further improving detection accuracy to enhance detection performance.
