Abstract
The 21st century is an era of rapid Internet development, and Internet technology is now widely used in many fields. As networks grow, network information security becomes increasingly important, and traditional security technology can no longer guarantee the security of network information. The high dimensionality, noise and redundant features of massive network traffic data seriously affect the accuracy and timeliness of detection in situational awareness systems. This paper therefore studies the application of machine learning feature extraction in situational awareness systems. Based on an analysis of the network information security background, the current state of research at home and abroad, and the trend of Internet development, a feature selection method based on machine learning is proposed to extract situational features, and its practical application in a situational awareness system is explored. Further study of network intrusion detection technology on the basis of machine learning is therefore of great value.
Keywords
Introduction
With the continuous progress of modern society and the rapid development of information technology, the rise of the Internet terminal era has brought great convenience to people’s lives [1]. Extensive global connectivity is a basic feature of Internet terminals [2, 3]. This connectivity lets people exploit information resources to the fullest. At the same time, its wide application has let in many unsafe factors [4]. In a 2019 cyber security incident involving the world’s largest online social networking site, leaked data was viewed more than 9 million times, which put the accounts of the affected users at risk [5, 6]. In another incident, a country’s power company was blackmailed by hackers: its databases, power system programs and other assets were encrypted by malicious software, so that the power system could not provide external services and was paralyzed as a whole. In summary, network information security is a very broad problem that mainly covers five properties: confidentiality, integrity, controllability, availability and non-repudiation [7–11]. The Internet is also an open system, and openness and information security are both opposed and unified: the more open the Internet, the greater its security risks [12]. Intentional attacks, malicious theft of confidential documents, dissemination of computer viruses and illegal access are the most common security risks of network information systems. Although the Internet is increasingly common in production and daily life, a large number of serious network security problems remain. Attacks affect fields such as finance, industry, transportation, energy and government agencies, and cyber attacks are becoming more serious and more frequent.
To address these increasingly serious network security problems, researchers at home and abroad have proposed a range of security technologies, including security situational awareness [13–15]. Network intrusion detection technology analyzes network traffic, detects whether abnormal data is present in the network, and identifies and analyzes that abnormal data. Encryption technology encrypts information or file data, turning plaintext into ciphertext [16]. Firewall technology mainly protects the internal network of a host, so that the computer is not attacked by external malicious programs. Network security situational awareness is a modern intelligent network security monitoring technology that can comprehensively detect the current network security situation. Situation features are extracted from multi-source data and fed into a machine learning model, and the trained model is used to evaluate the security situation of the current network. Situational awareness technology not only reflects the current network security situation in real time, but also combines historical data to predict the security situation at the next stage. It also provides a visual interface that gives network administrators a full view of the current network environment, so that they can formulate preventive measures in time [17–19].
Related work
Drawing on knowledge and technologies such as malicious intrusion detection and terminal system logs, one study analyzed customer requirements, communicated with customers, and presented a design for a network information security monitoring system [20]. This paper mainly introduces the application of machine learning feature extraction in situational awareness systems and a hybrid model for extracting features from a network traffic data set. Other work introduces the research background and significance of network intrusion detection systems, describes the current state of research at home and abroad, summarizes the theory of intrusion detection and machine learning along with common intrusion detection techniques, and analyzes the machine learning algorithms commonly used for intrusion detection [21, 22]. A new network intrusion detection method based on the attention mechanism has also been proposed. Another study introduces a PAB-based security detection method for information terminal systems built on an improved BI neural network, which analyzes existing abnormal data and improves its handling. In information security defense, this method quickly and efficiently overcomes the inability of traditional abnormal data recognition methods to defend proactively. A Web attack detection model based on hidden Markov models has also been presented, built on the characteristics of Web attacks.
Test results show that this method improves the recognition speed and accuracy of attack behavior compared with current mainstream methods. Intrusion detection systems have also been described in detail: such a system monitors hosts and networks over long periods, determines on that basis whether suspicious events exist, and responds quickly when they are found. From the perspective of architecture, a terminal system model for mobile banking software has been designed on the basis of security and reliability, which effectively protects information inside the network [23, 24]. Through the SAL protocol, data encryption and verification are secured, the precautions against man-in-the-middle attacks and collision attacks are analyzed in detail, and two-way authentication of SAL and RAD is realized. [Ref.9] Taking the practical features of the tax system as the starting point and combining them with the day-to-day information technology work of the tax terminal system, another author argues that, given network transmission and the terminal-facing infrastructure, the external boundary should be well protected and information should also be supervised and controlled internally. Through in-depth research on firewall technology, information detection technology, and system vulnerability detection and repair technology, a network terminal security system suited to the present stage was designed.
Research on information security technology of computer network
Research status and trends
According to the “China’s Internet Development Statistics” released by China’s Internet Center in June 2019, the number of Internet users in China continues to grow at an accelerating pace, as shown in Fig. 1. In today’s era of diversified information networks, the continuous development of network technology has changed traditional modes of production and management, has become a new point of economic growth, and plays an irreplaceable role in politics, economy, transportation, commerce, telecommunications, culture and education. Society depends more and more on network information terminal systems, and network information security has become a focus of attention. To develop effectively and efficiently, the first task is to ensure the security of network information systems, which is also an important safeguard against information hegemony. Network information security capability is likewise an important embodiment of comprehensive national strength and a commanding height that countries around the world strive to reach.

Size of internet users and internet penetration.
The security connotation of the information terminal system
Since W. Diffie and M. Hellman proposed the public key cryptosystem and David E. Bell and Leonard J. LaPadula proposed their computer security model in the 1970s, the security connotation of the information terminal system has been continuously extended. It has developed from the initial focus on information confidentiality to information integrity, availability, controllability and non-repudiation, and then into theory and practical technology covering attack, prevention, detection, evaluation, control and management. Hacker prevention systems, information camouflage technology, information monitoring and control, intrusion detection principles and technology, emergency response systems, computer viruses, and artificial immune systems for anti-virus and anti-intrusion have become international research hotspots. In the late 20th century, to meet the need for secrecy of military computers, the U.S. Department of Defense took the lead in formulating the “Trusted Computer System Evaluation Criteria (TCSEC)”, which was followed by interpretations for network systems and database systems and formed the “Rainbow Series” of specifications.
To meet current needs, the main research topics and application hot spots of network information security include the following:
(1) Network security defence technology and antivirus software
(2) Key technologies of information content supervision based on network media
(3) Emerging cryptography and storage technologies
(4) Security performance test and evaluation technology for network information terminal systems
(5) Identity authentication and identification techniques based on mathematical and non-mathematical features
(6) Security technologies for wireless communication networks
(7) Information security theory and models for the next generation Internet
At present, network information security technology is developing rapidly and products are numerous. They fall into four main classes: leakage-blocking protection products, layered protection products, active protection products, and self-controlled protection products. A comparison with the basic requirements for information security level protection is shown in Table 1.
Comparison with basic requirements for information security level protection
Continued research on applications of network information security shows that single-purpose security products can no longer meet users’ needs. Customers now require products that are not only safe and reliable but also integrate well with one another. If security products can cooperate and communicate, the user’s network protection system becomes more solid, which further supports the diversity, aggregation and interaction of network security technologies and ultimately protects the computer more completely.
Network security situational awareness and technical summary
In recent years, with the rise of artificial intelligence technology, machine learning algorithms have been favored by many researchers working on intrusion detection systems, and their applications have attracted wide attention across industries. Research on situational awareness originated in cognitive research in the aerospace field, and Endsley first put forward a definition of situational awareness in 1988.
Situational awareness concerns the current network environment and how it is likely to change at the next point in time. There is no single agreed definition in the existing academic literature. Endsley defined situational awareness (SA) as “perceiving the surrounding environmental factors in a certain time and space, understanding their meanings, and using these factors to predict the situation in the future”. Endsley’s formulation has become the main cognitive model of situational awareness.
Building on these three elements of situational awareness, the concept of cybersecurity situation awareness (CSA) was first put forward by Tim Bass at an annual meeting, based on the theory of security posture. Tim Bass pointed out that the next generation of network intrusion detection systems should fuse data from a large number of sources, classify and filter the data, and select appropriate value ranges, and thereby realize “cyberspace situational awareness”. The main function of network security situational awareness technology is to monitor the security of the current network environment in real time and to prevent threats. A security system monitors network traffic in a specific network environment, extracts from the huge volume of traffic the key features that can affect changes in the situation, fuses the features using data fusion technology, extracts the important features of the fused data to build a situation prediction model, and finally produces a visual real-time network security monitoring interface. A network security situational awareness system must monitor and predict the current network security situation in real time and report to network monitors promptly and accurately. At the same time, it monitors the current network data flow, uses data mining, data fusion and other technologies to extract real-time traffic features, continuously re-evaluates and re-models, and uses data visualization to present the monitored traffic to network security monitors. Through network security situational awareness, security monitors can not only see the security situation of the current network environment in real time, but also see which host ports and devices are vulnerable to attack, which packets come from suspicious source IP addresses, and the details of attacker IP addresses.
This allows network security monitors to understand the state of the entire network and make timely decisions to reduce network risk.
Network security situational awareness system model
The network security situational awareness system is a comprehensive network security analysis system composed of a network firewall, a network traffic security monitoring system, a security audit system and other components. The system extracts key features from the current network environment to form an effective evaluation model, and analyzes the security status of the current network environment. Figure 2 summarizes the conceptual model of the whole operation process of the system and covers only its main components. The most important technical step in network security situational awareness is situational feature extraction, because the quality of the extracted key features directly affects the performance of the next three stages. In this stage, existing data mining and feature extraction techniques are used to extract the key features of network traffic that influence the situation.
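As a rough illustration of what removing redundant features from traffic data can look like, the sketch below applies a simple filter: it drops near-constant features and features that are almost perfectly correlated with one already kept. This is a stand-in technique chosen for illustration, not the selection method proposed in this paper; the feature names, sample values and both thresholds are hypothetical.

```python
import math
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, min_variance=1e-6, max_corr=0.95):
    """Return the names of the features that survive the filter.

    columns maps a feature name to its list of values (one per flow).
    Both thresholds are illustrative assumptions, not tuned values.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # near-constant feature carries no information
        if any(abs(pearson(values, columns[k])) >= max_corr for k in kept):
            continue  # redundant with a feature that is already kept
        kept.append(name)
    return kept


# Hypothetical traffic features for five flows.
traffic = {
    "duration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "duration_ms": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],  # same signal, other unit
    "protocol_flag": [1.0, 1.0, 1.0, 1.0, 1.0],               # constant
    "bytes": [500.0, 120.0, 900.0, 300.0, 310.0],
}
selected = select_features(traffic)
print(selected)
```

A filter of this kind only removes obvious redundancy; it does not rank the remaining features by how much they influence the situation.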

Stages of situational awareness.
The conceptual model shows that the core of situational awareness is to fuse, analyze, extract, model and evaluate the security traffic data obtained from the vulnerability scanning system, traffic security monitoring system, security audit system and other sources. The model is then used to monitor the security status of the current network in real time. From this analysis of the conceptual model, the general structure of the security situational awareness system is established, as shown in Fig. 3. The system is composed of a multi-source data module, a feature extraction module, an event association module, a security assessment module, a topology module, and an early warning module. The modules cooperate with one another, each completing its own function, and together form a large, comprehensive security situational awareness system.

Overall architecture model of the network security situational awareness system.
Intrusion detection concept
Intrusion detection examines a computer terminal system to find out whether any behavior attempts to compromise the confidentiality and integrity of the system. It collects information from key nodes in the computer network, analyzes and processes it, and checks for violations of the security policy or other signs of attack. It works by inspecting and analyzing the raw data transmitted over the shared network, raising an intrusion warning when the data matches known intrusion signatures or shows abnormal behavior, and recording the event. Information statistics of some websites in recent years are shown in Table 2.
Information statistics of some websites in recent years
Relying on the integrity of the computer network and the soundness of the internal network system, the intrusion detection system detects, in a broad sense, whether the information system is under illegal attack. This covers both probing and malicious attacks by outside intruders and misuse by internal users. As an important part of the network terminal system, the main functions of an intrusion detection system include monitoring and analyzing the activity and security configuration of the system, assessing the vulnerability of the system’s main resources and data, identifying known attack behavior, and identifying user activities that violate the security policy. It not only supplements older security products, but also helps system managers improve their security monitoring, identification and response capabilities, and thereby improves the integrity of the information security infrastructure.
Figure 4 is a schematic diagram of the whole framework of intrusion detection system, which mainly includes five aspects: knowledge base, data screening, data analysis and preprocessing, intrusion object source detection and analysis, and timely and effective response processing.

A schematic diagram of the overall framework of the intrusion detection system.
As information technology develops, the attack techniques used by hackers are becoming more complex, and many different kinds of intrusion detection systems have appeared. They can be classified in several ways, and each type has its own characteristics.
Intrusion system monitoring and evaluation
Intrusion detection is essentially a classification problem. Classification problems have four basic evaluation counts: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). For intrusion detection, their specific meanings are as follows:
TP: the intrusion detection system correctly detects an attack.
TN: the intrusion detection system does not misidentify benign traffic as an attack.
FP: the intrusion detection system incorrectly judges benign traffic to be an attack.
FN: the intrusion detection system fails to detect an intrusion when an attack occurs.
Different combinations of these four counts give different evaluation metrics. Precision is calculated as shown in formula 1:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1) \]
Here FP is the number of benign flows wrongly judged to be intrusions, and TP is the number of real intrusions correctly detected. Precision reflects the detection performance of an intrusion detection system very directly. The higher the value, the more of the reported intrusions are real, and the more reliable the result.
Recall, the percentage of all intrusion flows that are correctly detected, is calculated as shown in formula 2:
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2) \]
Here FN is the number of intrusions that the intrusion detection system fails to detect, and TP is the number of real intrusions correctly detected. The higher the recall, the more completely the system detects intrusion behavior and the better its performance. The types of intrusion detection are shown in Table 3.
Types of different intrusion detection
In practice, precision and recall often conflict: ensuring high precision tends to reduce recall, and high recall tends to reduce precision. They therefore need to be considered together, which motivates the F value, a weighted harmonic mean of precision and recall, calculated as shown in formula 3:
\[ F_{\beta} = \frac{(1 + \beta^{2}) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}} \qquad (3) \]
where β weights recall relative to precision and is usually set to 1.
The above three criteria favor a separate analysis of each attack class. To evaluate intrusion detection systems as a whole and compare them more comprehensively, we introduce accuracy as a key evaluation index, shown in formula 4:
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4) \]
It represents the ratio of all correctly judged flows to the total number of flows, and measures the detection performance of an intrusion detection system at the macro level.
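The four counts and the metrics built from them can be sketched in a few lines of Python. This is a minimal illustration, assuming binary labels in which 1 marks an attack and 0 marks benign traffic; the function names and the label vectors are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, attack=1):
    """Count TP, TN, FP and FN, treating `attack` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == attack:
            counts["TP" if truth == attack else "FP"] += 1
        else:
            counts["FN" if truth == attack else "TN"] += 1
    return counts["TP"], counts["TN"], counts["FP"], counts["FN"]


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r."""
    denom = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denom if denom else 0.0


def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical ground truth and detector output for eight flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
p, r = precision(tp, fp), recall(tp, fn)
print(tp, tn, fp, fn)                      # 3 3 1 1
print(p, r, f_beta(p, r), accuracy(tp, tn, fp, fn))
```

With these made-up labels the detector misses one attack and raises one false alarm, so precision, recall, F1 and accuracy all come out at 0.75.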
Overview of machine learning
Machine learning is a product of the big data era. It gives a system the ability to simulate the human learning process and, through continuous updating and improvement, to make sound judgments. Machine learning is mainly divided into two forms: supervised learning and unsupervised learning.
(1) Supervised learning uses manually labeled targets: the training set is labeled, the labels serve as the expected results, and the model is repeatedly corrected to improve the accuracy of its predictions.
(2) Unsupervised learning uses some notion of similarity to group data without labels. Its training data is unlabeled, and its algorithms are mainly used for dimensionality reduction and clustering. The machine learning process is shown in Fig. 5. An example of the decision tree classification process is shown in Fig. 6.

Machine learning process.

Example diagram of the decision tree classification process.
KNN algorithm
KNN (K-Nearest Neighbor) is a very simple classification algorithm. Given a sample to classify, the K nearest samples in the training set are found using some distance measure. For classification, KNN generally uses the “voting method”: the label that occurs most often among the K nearest training samples is the prediction. For regression tasks the “average method” is commonly used, in which the mean of the output values of the K nearest training samples is the prediction. Distance-weighted voting or distance-weighted averaging can also be used, so that closer neighbours count more. In KNN, the Euclidean distance is usually used to measure the similarity between two vectors (points). For two samples $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$:
\[ d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^{2}} \qquad (5) \]
K nearest neighbor classification works best when the training samples of each class are distributed in hyperspherical or hyperellipsoidal clusters. If the boundary between classes is strongly nonlinear, classification performance is likely to degrade. Performance can be improved by mapping samples that are not linearly separable in the low-dimensional feature space into a higher-dimensional space.
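The voting and averaging procedures described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the two-dimensional feature vectors, the labels and the choice of K = 3 are hypothetical.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest(train, query, k):
    """Return the k training items closest to `query`."""
    return sorted(train, key=lambda item: euclidean(item[0], query))[:k]


def knn_classify(train, query, k=3):
    """Voting method: most frequent label among the k nearest samples."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Average method: mean output value of the k nearest samples."""
    neighbours = nearest(train, query, k)
    return sum(value for _, value in neighbours) / len(neighbours)


# Hypothetical normalized flow features: (duration, byte count).
train = [
    ((0.10, 0.20), "benign"), ((0.20, 0.10), "benign"), ((0.15, 0.25), "benign"),
    ((0.90, 0.80), "attack"), ((0.80, 0.90), "attack"), ((0.85, 0.95), "attack"),
]
print(knn_classify(train, (0.82, 0.88)))  # attack
print(knn_classify(train, (0.12, 0.18)))  # benign
```

Sorting the whole training set for every query is the simplest approach and is fine for small data; large traffic data sets would normally need an index structure or approximate search instead.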
Naive Bayes (NB) classification has become one of the classical classification models because of its simplicity and high computational performance. It is based on Bayes’ theorem and assumes that the influence of each feature on a given class is independent of the others; the class is determined from known prior and conditional probabilities. Suppose a sample $x = (a_1, a_2, \dots, a_n)$, where each $a_j$ is a feature of $x$ and the features are independent of one another, and a set of categories $C = \{y_1, y_2, \dots, y_m\}$. According to Bayes’ theorem, the probability that $x$ belongs to class $y_i$ is
\[ P(y_i \mid x) = \frac{P(x \mid y_i)\, P(y_i)}{P(x)} \qquad (6) \]
Because the features are independent of one another, the numerator factorizes over the features:
\[ P(x \mid y_i)\, P(y_i) = P(y_i) \prod_{j=1}^{n} P(a_j \mid y_i) \qquad (7) \]
This quantity is computed for every category, and the category with the largest probability is the predicted class:
\[ \hat{y} = \arg\max_{y_i \in C} \; P(y_i) \prod_{j=1}^{n} P(a_j \mid y_i) \qquad (8) \]
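A small sketch can make the prior-times-likelihood computation concrete. It assumes categorical features and adds Laplace (add-one) smoothing so that an unseen feature value does not force a probability of zero; the smoothing, the function names and the toy samples are assumptions made for illustration and are not taken from this paper.

```python
from collections import Counter, defaultdict


def train_nb(samples, labels):
    """Collect the counts needed for the priors and conditional probabilities."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for x, y in zip(samples, labels):
        for j, a in enumerate(x):
            feature_counts[(j, y)][a] += 1
            values[j].add(a)
    return class_counts, feature_counts, values


def predict_nb(model, x):
    """Return the class with the largest prior * product of conditionals."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for y, n_y in class_counts.items():
        score = n_y / total  # prior probability of class y
        for j, a in enumerate(x):
            # Laplace-smoothed estimate of P(a_j | y)
            score *= (feature_counts[(j, y)][a] + 1) / (n_y + len(values[j]))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical flows described by (protocol, traffic volume).
samples = [("tcp", "high"), ("tcp", "high"), ("udp", "low"),
           ("tcp", "low"), ("udp", "low"), ("udp", "high")]
labels = ["attack", "attack", "benign", "benign", "benign", "attack"]

model = train_nb(samples, labels)
print(predict_nb(model, ("tcp", "high")))  # attack
print(predict_nb(model, ("udp", "low")))   # benign
```

The denominator P(x) is the same for every class, so the sketch compares the unnormalized scores directly.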
The support vector machine (SVM) rests on solid mathematical and statistical principles. Its overall goal is to minimize the risk of classification error, and it is a widely used two-class classifier. It applies not only when the feature vectors are linearly separable, but also when they are not: a nonlinear transformation maps the input feature vectors into a higher-dimensional vector space in which they become linearly separable.
Let the sample set be $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{d}$, $d$ is the dimension of the sample space and $y_i \in \{+1, -1\}$ is the class label. The linear discriminant function and the classification surface equation in $d$-dimensional space are
\[ g(x) = w \cdot x + b \qquad (9) \]
\[ w \cdot x + b = 0 \qquad (10) \]
From the geometry of the distance between a point and the classification surface, the margin between the two classes of feature vectors is $2 / \lVert w \rVert$. The risk of classification error decreases as this margin grows, so $\lVert w \rVert$ should be minimized, and the objective function is
\[ \min_{w, b} \; \frac{1}{2} \lVert w \rVert^{2} \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \; i = 1, \dots, n \qquad (11) \]
Solving this optimization problem gives the following classification function:
\[ f(x) = \operatorname{sign}\Big( \sum_{i} a_i y_i (x_i \cdot x) + b \Big) \qquad (12) \]
where $\operatorname{sign}(\cdot)$ is the sign function, $x_i$ is a support vector, $a_i$ is the corresponding Lagrangian coefficient, $x$ is the sample to classify, and $b$ is the classification threshold. For cases that are not linearly separable, SVM adds two parameters to the formulation: the slack variables $\delta_i$ and the penalty factor $C$. Under the constraint of formula (13), minimizing formula (14) gives the optimal classification surface in the linearly inseparable case:
\[ y_i (w \cdot x_i + b) \ge 1 - \delta_i, \quad \delta_i \ge 0, \; i = 1, \dots, n \qquad (13) \]
\[ \min_{w, b, \delta} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{n} \delta_i \qquad (14) \]
To handle both the linearly separable and the linearly inseparable case, a support vector machine needs dot products between training feature vectors in the high-dimensional space. To reduce the computation, a kernel function is used in place of the explicit dot product in that space. Commonly used kernel functions include the linear, quadratic, polynomial and radial basis kernel functions. In their standard forms they are:
(1) Linear kernel function: $K(x, y) = x \cdot y$
(2) Quadratic kernel function: $K(x, y) = (x \cdot y + 1)^{2}$
(3) Polynomial kernel function: $K(x, y) = (x \cdot y + 1)^{p}$
(4) Radial basis kernel function: $K(x, y) = \exp\big( -\lvert x - y \rvert^{2} / (2\delta^{2}) \big)$
where $\lvert x - y \rvert$ is the distance between the two vectors, $p$ is the degree of the polynomial, and $\delta$ is a constant.
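The kernel functions named above, and the way a kernel enters the SVM classification function, can be sketched as follows. The kernels are written in their common textbook forms, which is an assumption since the exact forms used in this paper are not recoverable; the support vectors, coefficients and threshold in the example are hand-picked hypothetical values, not the result of training.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, c=1.0):
    return (dot(x, y) + c) ** degree


def quadratic_kernel(x, y):
    # Treated here as the degree-2 polynomial kernel (an assumption).
    return polynomial_kernel(x, y, degree=2)


def rbf_kernel(x, y, delta=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * delta ** 2))


def svm_decision(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """sign(sum_i a_i * y_i * K(x_i, x) + b), returned as +1 or -1."""
    total = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if total >= 0 else -1


# Hand-picked toy model: two support vectors, one per class.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]
b = 0.0

print(svm_decision((2.0, 0.5), support_vectors, labels, alphas, b))    # 1
print(svm_decision((-0.5, -2.0), support_vectors, labels, alphas, b))  # -1
```

Swapping `kernel=rbf_kernel` into `svm_decision` changes the decision boundary without changing the rest of the code, which is the practical appeal of the kernel formulation.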
Although SVM is very effective for two-class problems, it cannot be applied directly to multi-class problems.
The decision tree is one of the most classical supervised classification algorithms in machine learning. Because it is easy to understand and classifies well, it is commonly used for data classification in many fields. The decision tree algorithm is shown in Table 4.
Decision tree algorithm
A decision tree is a classification model with a tree structure, formed from a root node, intermediate nodes (also called decision nodes) and leaf nodes. A decision tree has only one root node, but it can contain many intermediate nodes and leaf nodes. Each leaf node represents a classification result, and each intermediate node represents a decision.
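The node structure just described can be illustrated with a hand-built tree and a short traversal routine. The sketch only shows how a finished tree classifies a sample; it does not show how a tree is learned from data. The feature names, thresholds and labels are hypothetical.

```python
# Root node and intermediate nodes hold a test; leaf nodes hold a label.
tree = {
    "feature": "failed_logins", "threshold": 3,       # root node
    "left": {"label": "benign"},                      # leaf: failed_logins <= 3
    "right": {                                        # intermediate (decision) node
        "feature": "bytes_sent", "threshold": 10000,
        "left": {"label": "benign"},                  # leaf: bytes_sent <= 10000
        "right": {"label": "attack"},                 # leaf: bytes_sent > 10000
    },
}


def classify(node, sample):
    """Walk from the root to a leaf, taking one branch per decision node."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(classify(tree, {"failed_logins": 1, "bytes_sent": 50000}))  # benign
print(classify(tree, {"failed_logins": 8, "bytes_sent": 50000}))  # attack
```

Each prediction follows exactly one path from the root to a leaf, which is why the reasoning behind a decision tree's output is easy to read off.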
(1) Indicators of classification algorithms.
(a) Accuracy and coverage. Coverage is the proportion of samples that are identified out of the total number of samples. Accuracy is the proportion of correctly classified samples out of the total number of samples. Assuming $y_i$ is the true category of the $i$-th sample, $y'_i$ is its predicted value and $n$ is the number of samples:
\[ \mathrm{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(y'_i = y_i) \]
(b) Recall and precision. Recall measures the ratio of correctly identified positive samples to all positive samples. Precision measures the proportion of correctly identified positive samples among all samples predicted to be positive. Suppose A is the positive sample set and B is the negative sample set, and let TP be the number of samples in A predicted positive, FN the number of samples in A predicted negative, and FP the number of samples in B predicted positive. Recall and precision are then calculated as follows:
\[ \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP} \]
(c) AUC and ROC curves. The ROC curve describes the overall performance of a two-class classifier. It plots the proportion of positive samples that the classifier correctly identifies out of all positive samples against the proportion of negative samples wrongly identified as positive, as the decision threshold varies. The larger the area under the ROC curve, the better the classification. AUC is the area under the ROC curve.
(d) Confidence and support. Support measures how often two events occur together. If the support is large, the correlation between the two events is strong; otherwise it is weak. Confidence measures the probability that event B occurs given that event A has occurred. High confidence indicates that the occurrence of event B is closely related to the occurrence of event A; otherwise, the relationship between the two is not significant.
(e) F1-Score. The F1-Score is the harmonic mean of recall and precision:
\[ F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]
Because it is a harmonic mean, it leans toward the smaller of the two values.
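Two of the indicators above are easy to misread from prose alone, so a short sketch may help. The AUC is computed here through its pairwise interpretation, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve. Support and confidence are computed over a list of event sets. All scores, labels and event names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def support(records, a, b):
    """Share of records in which events a and b occur together."""
    return sum(1 for r in records if a in r and b in r) / len(records)


def confidence(records, a, b):
    """Estimated probability of event b given that event a occurred."""
    with_a = [r for r in records if a in r]
    return sum(1 for r in with_a if b in r) / len(with_a) if with_a else 0.0


# Hypothetical detector scores (higher means more likely an attack).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))

# Hypothetical alert types observed per host.
records = [{"scan", "bruteforce"}, {"scan", "bruteforce"}, {"scan"},
           {"bruteforce"}, {"scan", "exfil"}]
print(support(records, "scan", "bruteforce"))     # 0.4
print(confidence(records, "scan", "bruteforce"))  # 0.5
```

The pairwise AUC is quadratic in the number of samples, so it suits small examples; rank-based formulas give the same value more cheaply on large data sets.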
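(2) Indicators of regression algorithms.
(a) Mean square error. The mean square error (MSE) is the expected value of the squared difference between the estimated value and the true value. With $y_i$ the true value, $\hat{y}_i$ the estimated value and $n$ the number of samples:
\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2} \]
(b) Mean absolute error. The mean absolute error (Mean Absolute Error, MAE) is the average of the absolute differences between the estimated and true values:
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \]
An MAE of 0 means the predictions match the true values exactly; the smaller the MAE, the more accurate the result.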
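Both regression indicators reduce to a one-line computation each. The sketch below is a minimal illustration with made-up true and predicted values.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true values and model predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print(mse(y_true, y_pred))  # 1.5
print(mae(y_true, y_pred))  # 1.0
```

MSE penalizes the single large error (2.0 versus 4.0) more heavily than MAE does, which is the usual reason for reporting both.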
The network security problems faced by computer network terminal systems are becoming more and more obvious. People lack awareness of network security protection, the ability to monitor, warn of and handle security risks is limited, and cross-layer cooperative prevention mechanisms are incomplete. Management information systems therefore easily become targets of network hackers. Security protection inside computer systems focuses on perimeter defense but neglects defense in depth, which leaves computers exposed to virus and Trojan horse infection. A security management system contains different security zones, and many hidden dangers exist within them. Cross-use of mobile storage media, the wide variety of internal application systems, and inadequate security supervision during software development add to the problem. The relevant departments also cannot quickly and effectively eliminate the security weaknesses of new systems, which increases the likelihood of operating system security risks. In maintaining network information, staff should consider problems from both the application and the terminal side, protecting against the outside world and also controlling internal information. At the same time, existing platform technology should be fully used, capabilities such as firewalls, vulnerability detection and repair, and information detection should be strengthened, and a network security management system suited to the current situation should be designed.
In view of the development of mobile Internet office work, staff should also set out development requirements in terms of security, covering component security, Windows security and payment security, so as to form a more comprehensive and effective solution. The aim is to build a Great Wall of steel for computer network information security, and to fully guarantee the information security and property of the state, its organs and the people.
Summary
With the spread of computers and Internet applications, the number of network users worldwide keeps growing, and network security problems emerge day by day. In the face of diverse, complex and distributed network attacks, effective defense technology is urgently needed. In view of this, by analyzing the current network environment, this paper first introduces the architecture model and related technical principles of the security situation awareness system, then studies existing intrusion detection technology and introduces machine learning algorithms into the field of intrusion detection. In the future, this technology will provide a more novel and effective method for maintaining computer network information security.
