Abstract
To balance energy consumption among wireless sensor nodes and reduce the energy that nodes spend on data fusion, a machine-learning-based data fusion method for wireless sensor networks (WSNs) is proposed. A WSN model is established and trained, the compressed sensing method is used to collect WSN data, and a multi-dimensional deduplication clustering analysis algorithm is used to deduplicate the collected data. Exploiting the spatial correlation between the data collected by multiple sensor nodes, the distributed compressed sensing (DCS) method is used to process abnormal WSN data. To eliminate the influence of measurement error on fusion accuracy, the WSN data are preliminarily fused by combining adaptive theory with the batch estimation fusion algorithm. Based on the preliminary fusion results, the Bayesian inference method from machine learning is used to further fuse the WSN data. The experimental results show that, with this method, more nodes survive and node energy consumption is lower than with the compared methods, and that energy consumption is relatively balanced across sensor nodes, which indicates a good data fusion effect.
Keywords
Introduction
A wireless sensor network (WSN) is characterized by small node size, low cost, multi-hop ad hoc networking, and a large sensing area [1]. WSNs can effectively monitor environmental information and issue warnings, and they have been widely used in intelligent agriculture, environmental monitoring, and national defense and military fields. A WSN is a resource-constrained network whose performance is restricted by factors such as node energy, computing power, and storage space [2]. Usually, a large number of homogeneous sensor nodes are deployed in the sensing area to collect and transmit information periodically, so a WSN produces a large amount of redundant data while monitoring. On the one hand, processing and transmitting redundant data wastes limited energy and network bandwidth. On the other hand, interference and limited sensor measurement accuracy introduce errors into the monitoring results, and randomly failing nodes further reduce the reliability of the system. How to reduce redundant data, reduce node energy consumption, improve network reliability, and prolong the effective network lifetime has therefore become a very important issue in WSN research [3]. Multi-sensor data fusion technology fuses data that contain a certain amount of redundancy, which effectively reduces data transmission, reduces node energy consumption, and improves the accuracy of monitoring results. WSN data fusion technology therefore deserves in-depth study [4].
To make full use of heterogeneous data on edge devices and to overcome the “data communication barrier” caused by data privacy in edge computing, Mo et al. [5] proposed a multi-source heterogeneous data (MHD) fusion algorithm based on Tucker decomposition in federated learning. To fuse heterogeneous data without interaction, the algorithm introduces tensor Tucker decomposition theory and constructs a high-order tensor with heterogeneous spatial dimension characteristics to capture the high-dimensional characteristics of heterogeneous data, thereby realizing MHD fusion in federated learning. The effectiveness of the algorithm was verified on the MOSI data set, but few wireless sensor network nodes survive when it is used. Li et al. [6] proposed a multi-attribute fusion method for heterogeneous data based on intuitionistic fuzzy sets. First, a new discount operator and combination rule for intuitionistic fuzzy numbers are defined, and their basic properties are proved. Second, the distance measure between heterogeneous data is defined, and the coefficient matrix for decision-making is obtained with the gray correlation method. Then, intuitionistic fuzzy numbers are generated through the construction of alternative evidence, and a method for calculating the attribute weights based on intuitionistic fuzzy entropy is given. Finally, the result is judged according to the synthesis rule and the comparison rule for intuitionistic fuzzy numbers. This method effectively reduces computing time and improves the efficiency of data fusion; however, node energy consumption during data fusion is high. To realize multi-source data fusion while ensuring the privacy and security of the fused data, Long et al. [7] proposed a data fusion method based on privacy protection in crowd sensing networks. Users sign their perceptual data with a key and add noise to the data through differential privacy. The ECS combines the BGN encryption system and Shamir secret sharing to fuse the encrypted, perturbed data and send it to the perceptual platform. The security and privacy analysis shows that this method can resist differential attacks from ECs and malicious users and ensure the security of users and ECs. However, energy consumption is poorly balanced across wireless sensor nodes, so few nodes survive, which degrades the data fusion effect.
To solve the above problems, this paper proposes a WSN data fusion method based on machine learning. The main line of this paper is as follows:
A machine learning algorithm is used to establish and train the wireless sensor network model, the compressed sensing method is used to collect the WSN data, the multi-dimensional deduplication clustering analysis algorithm is used to delete duplicate data, and the abnormal data of the WSN are processed. The data are preliminarily fused by adaptive estimation weighted fusion, and the Bayesian inference method is then used to further fuse the WSN data and enhance the fusion effect. Taking the number of surviving nodes, node energy consumption, and energy consumption balance as test indicators, experiments verify the effectiveness of the proposed method and show that it performs well in data fusion.
Wireless sensor network model construction and training
A WSN model is constructed, as shown in Fig. 1. It is composed of a monitoring center PC, a gateway, and multiple wireless sensor nodes. The monitoring system takes the first-order radio model as the energy consumption model of the WSN and adopts the energy balanced adaptive clustering algorithm (EBACA) to divide the network into several clusters; each cluster contains a cluster head and several sensor nodes [8]. Sensor nodes are responsible for data collection, detection, and transmission, and for monitoring node status and external events; the cluster head node collects the data of the sensor nodes in its cluster, runs the fusion algorithm, and uploads the fusion results to the gateway; the gateway forwards the fusion results of each cluster to the monitoring center PC. If the user issues a data query command, the gateway and the cluster head receive it and forward it to each sensor node in turn.
WSN model.
The WSN model shown in Fig. 1 is then trained. Because the deep autoencoder model and the WSN model have similar structures, the two can be combined naturally. First, the deep autoencoder is trained at the sink node. The training algorithm is as follows:
Step 1: The sink node uses the data in the sample library as the original input data;
Step 2: Construct a deep autoencoder network, take the original data as the initial input, and train the network weights in turn, layer by layer;
Step 3: Use the greedy algorithm to train the weights to obtain the input of the Softmax classifier, and train the classifier in a supervised manner with the labeled samples;
Step 4: Use the combined weights as the initial weights to fine-tune the whole network;
Step 5: Test the network with the labeled samples. If the classification accuracy meets the requirements, go to Step 6; otherwise, readjust the parameters and go to Step 2;
Step 6: The sink node sends the trained deep autoencoder network parameters to the corresponding nodes.
After the training is completed and the network parameters are obtained, the deep autoencoder model is layered and deployed in the corresponding layer of the WSN, thereby realizing the training of the WSN model.
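The layer-by-layer (greedy) pre-training in the steps above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation used in this paper: the synthetic sample library `X`, the layer sizes, the learning rate, and the function names `train_autoencoder_layer` and `greedy_pretrain` are all hypothetical, and the Softmax classifier and the fine-tuning of Steps 3 to 5 are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one sigmoid autoencoder layer with batch gradient descent
    on the squared reconstruction error. Returns encoder weights, encoder
    bias, and the reconstruction loss recorded at every epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # encoder output
        X_hat = sigmoid(H @ W2 + b2)        # reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate through the output and hidden sigmoids.
        d_out = err * X_hat * (1.0 - X_hat)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out) / n
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid) / n
        b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, losses

def greedy_pretrain(X, layer_sizes):
    """Train each layer on the output of the previous one (greedy stacking)."""
    params, losses, inp = [], [], X
    for i, n_hidden in enumerate(layer_sizes):
        W, b, layer_losses = train_autoencoder_layer(inp, n_hidden, seed=i)
        params.append((W, b))
        losses.append(layer_losses)
        inp = sigmoid(inp @ W + b)          # features passed to the next layer
    return params, inp, losses

# Hypothetical sample library: 64 samples of 8 correlated readings in [0, 1).
rng = np.random.default_rng(42)
X = rng.uniform(size=(64, 2)) @ rng.uniform(size=(2, 8)) / 2.0
params, features, losses = greedy_pretrain(X, layer_sizes=[4, 2])
```

In this sketch, `params` plays the role of the network parameters that the sink node would send to the corresponding nodes in Step 6, and `features` would be the input of the classifier in Step 3.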
In practical applications, considering that WSN consists of a great quantity of sensor nodes, an “information-aware cluster” can be formed through network protocols, and the cluster head distributes the sequence contained in the
Schematic diagram of WSN data collection.
As can be seen from Fig. 2, the WSN data collection method based on compressed sensing is based on a regionalized form [9]. The basic idea is to perform regionalized compressed sensing in different regions to disperse the load in the central region and save the energy consumption of the entire network transmission. For example, suppose a wireless sensor network consists of 3 areas
When collecting WSN data, the conventional node
A region measurement matrix
In the formula,
Obviously,
According to the above process, the number of data transmissions in the area is obtained as:
In the formula,
According to Eq. (2), the data transmission times of the entire WSN network are:
In the formula,
The above analysis shows that the regionalized compressed sensing method incorporates the advantages of the direct transmission method, so that the transmission energy consumption in each individual area is lower than when the CS method is used alone. In other words, the regionalized method reduces the transmission energy consumption of the boundary nodes and makes up for this deficiency of the CS method. Note that the boundary nodes here are the boundary nodes of each area. Because this broadens the notion of a boundary node, the WSN does not form a large central area during data transmission, and the method therefore outperforms the CS method overall.
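The saving at the boundary nodes can be illustrated with a small counting sketch. It does not reproduce the transmission-count equations of this paper; it assumes a hypothetical chain of `n_nodes` nodes leading to the sink, in which node i must forward the data of i nodes (itself and its upstream nodes), and a hypothetical number `m` of compressed measurements per area. The function name `chain_transmissions` is likewise an assumption.

```python
def chain_transmissions(n_nodes, m):
    """Count packet transmissions on a chain of n_nodes nodes.

    direct:   node i relays all i raw readings (no compression).
    plain_cs: every node sends m compressed measurements.
    combined: node i sends raw readings while i < m and switches to
              m measurements afterwards, i.e. min(i, m) packets.
    """
    direct = sum(range(1, n_nodes + 1))
    plain_cs = n_nodes * m
    combined = sum(min(i, m) for i in range(1, n_nodes + 1))
    return direct, plain_cs, combined

direct, plain_cs, combined = chain_transmissions(n_nodes=10, m=4)
print(direct, plain_cs, combined)  # 55 40 34
```

Under these assumptions, the nodes closest to the boundary send fewer than `m` packets, which is why the combined scheme needs fewer transmissions than CS alone.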
Data fusion refers to the distributed aggregation and fusion processing of data in the process of data transmission, removing redundant information, and combining into more effective, concise and accurate data. Therefore, on the basis of data collection, a multi-dimensional deduplication clustering analysis algorithm is used to deduplicate the data in the WSN network. The multi-dimensional deduplication clustering analysis algorithm studied in this paper uses the Bayesian network model to reveal the latent structure, and establishes the logical relationship between explicit and latent variables. The multi-dimensional deduplication clustering analysis algorithm can analyze and process unstructured data according to the relationship between probabilities [10]. The processing steps are as follows:
Step 1: Data preprocessing. Data cleaning reduces data noise and addresses the problem of data loss. Discretization is performed first, followed by data conversion;
Step 2: Training set. Use probability evaluation to preprocess the data and divide the resulting data set into different sets. Then use the classification algorithm to build a classifier, evaluate it on the prepared test set, and retain it for subsequent use;
Step 3: Vectorization. When consecutive sequences are combined into a sequence set, each sequence must be converted into a format that the computer can recognize and then quantified; the processed sequence is used as a feature vector for the final data processing;
Step 4: Multidimensional clustering. Sequence features make the feature space of the sequence vectors sparse, so a feature selection method can be used to reduce the dimension and improve the processing efficiency of the classifier.
The data deduplication steps of the multi-dimensional deduplication clustering analysis model are shown in Fig. 3.
WSN data deduplication steps.
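The effect of discretization followed by vectorization on duplicate removal can be sketched as follows. This is a deliberately simplified stand-in that treats readings falling into the same quantization cell as duplicates; it does not implement the Bayesian network model or the classifier described above. The function name `deduplicate`, the quantization step `step`, and the sample readings are hypothetical.

```python
import numpy as np

def deduplicate(readings, step):
    """Drop multi-dimensional readings that fall into the same quantization cell.

    readings: array of shape (n_samples, n_dims)
    step:     quantization step per dimension (assumed, chosen by the user)
    Returns the first reading seen in each cell, in the original order.
    """
    readings = np.asarray(readings, dtype=float)
    codes = np.round(readings / np.asarray(step, dtype=float)).astype(int)
    # Unique rows of the discretized feature vectors; keep first occurrences.
    _, first_idx = np.unique(codes, axis=0, return_index=True)
    return readings[np.sort(first_idx)]

# Hypothetical (temperature, humidity) readings from neighbouring nodes.
samples = [[20.01, 45.2], [20.02, 45.1], [23.50, 40.0], [20.04, 45.3]]
unique_samples = deduplicate(samples, step=[0.1, 1.0])
```

Here the first, second, and fourth readings share one cell, so only the first and third readings remain.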
Using the spatial correlation between the data collected by multiple sensor nodes, joint sparse reconstruction of multiple sets of measurement values can further reduce the amount of data required for data fusion while preserving fusion performance. The abnormal data processing method for wireless sensor networks based on DCS is introduced below.
For the first type of abnormal data model, the signal contains the same normal data component and different abnormal data components, and the raw data
Since the normal data components in the data recorded by each sensor are exactly the same, namely
In the formula,
In the formula,
The raw high-dimensional data recorded by multiple sensors is compressed and measured using a Gaussian random measurement matrix to obtain multiple sets of compressed measurement values. This process can be expressed as:
In the formula,
For the second type of abnormal data model, the signal contains normal data components and abnormal data components, wherein the normal data components are further divided into the same part and different parts. Similarly, a sparse representation of the original data
Of which:
In the formula,
Similarly, the Gaussian random measurement matrix is used to compress the original high-dimensional data recorded by multiple sensors to obtain multiple sets of compressed measurement values. The process can be expressed as:
Equation (12) is the DCS processing method of the second type of abnormal data model.
Combining Eqs. (8) and (12) enables effective processing of abnormal data in the WSN, which improves the accuracy of WSN data fusion and reduces energy consumption.
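The compressed measurement step for the first type of abnormal data model can be sketched as follows. The sketch assumes that every sensor records the same normal component plus a few sensor-specific abnormal samples that are sparse in the time domain; the signal length `n`, the number of measurements `m`, the number of sensors, and the synthetic signals are hypothetical, and joint sparse reconstruction is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, n_sensors = 128, 32, 4          # assumed sizes, with m much smaller than n

# Common (normal) component shared by all sensors.
common = np.sin(np.linspace(0.0, 4.0 * np.pi, n))

# Sensor-specific abnormal components: two nonzero samples per sensor.
anomalies = np.zeros((n_sensors, n))
for j in range(n_sensors):
    idx = rng.choice(n, size=2, replace=False)
    anomalies[j, idx] = rng.normal(0.0, 5.0, size=2)

signals = common + anomalies           # raw data, shape (n_sensors, n)

# One Gaussian random measurement matrix per sensor.
Phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(n_sensors, m, n))
Y = np.einsum("jmn,jn->jm", Phi, signals)   # compressed measurements

# With the common component known, the residual measurements depend only
# on the sparse abnormal component of each sensor.
residual = Y - Phi @ common
```

Each sensor transmits `m` values instead of `n`, and the residual equals the measurement of the sparse abnormal part alone, which is what a sparse reconstruction algorithm would operate on.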
Adaptive estimation weighted fusion algorithm
To eliminate the impact of measurement error on fusion accuracy, the adaptive theory and batch estimation fusion algorithm are combined to calculate the relative variance of node data and batch estimation fusion results, adjust the weight of node data according to the relative variance, and carry out adaptive estimation weighted data fusion [11]. Figure 4 shows the flow of fusion algorithm.
Adaptive estimation weighted fusion algorithm flow.
According to Fig. 4, the algorithm steps are briefly described as follows:
Step 1: Group the nodes and calculate the mean;
Step 2: Calculate the relative variance of the node data;
Step 3: Perform secondary weighted fusion on the data with the revised weights.
It can be seen from the multi-sensor weighted data fusion algorithm that in order to eliminate measurement errors, the fusion weights should be inversely proportional to the estimated variance, and the sum of the weights is always 1. Therefore,
Based on the above analysis, the correction factor
According to the adjusted weights
The adaptive estimation weighted fusion algorithm thus realizes the preliminary fusion of the WSN data. To improve the fusion result, the Bayesian inference algorithm from machine learning is then used to further fuse the WSN data.
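The idea of weighting node data inversely to its variance, with weights that sum to 1, can be sketched as follows. The sketch is one possible reading of the steps above, not the exact formulas of this paper: it assumes that the nodes are split into two groups, that the batch estimate combines the two group means inversely to the group variances, and that the relative variance of a node is its squared deviation from the batch estimate. The function names, the constant `eps`, and the readings are hypothetical.

```python
import numpy as np

def batch_estimate(readings):
    """Split the readings into two groups and combine the group means,
    each weighted inversely to its group variance.
    Assumes the two group variances are not both zero."""
    g1, g2 = readings[0::2], readings[1::2]
    m1, m2 = g1.mean(), g2.mean()
    v1, v2 = g1.var(ddof=1), g2.var(ddof=1)
    return (v2 * m1 + v1 * m2) / (v1 + v2)

def adaptive_weighted_fusion(readings, eps=1e-6):
    """Second-stage fusion with weights inversely proportional to each
    node's relative variance around the batch estimate."""
    readings = np.asarray(readings, dtype=float)
    x_batch = batch_estimate(readings)
    rel_var = (readings - x_batch) ** 2
    inv = 1.0 / (rel_var + eps)        # eps avoids division by zero
    weights = inv / inv.sum()          # weights sum to 1
    return float(weights @ readings), weights

# Hypothetical cluster readings; the fifth node reports an erroneous value.
node_data = [20.1, 19.9, 20.0, 20.2, 25.0, 19.8]
fused, weights = adaptive_weighted_fusion(node_data)
```

In this example the erroneous reading receives the smallest weight, so the fused value stays close to 20 while the plain average is pulled above 20.8.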
Based on the preliminary fusion and combination of the WSN network data obtained in Section 3.1, the Bayesian inference method is used to further fuse the WSN network data. The Bayesian method is an inductive reasoning method based on probability and statistics, and its conclusions depend not only on present measurements, but also on past experience and knowledge [12]. The basic principle is that given a previous prior probability, when there is a new measurement (sample information), the assumed prior probability is updated to the posterior probability [13].
Bayes’ theorem usually considers the conditional probability
In order to apply Bayes’ theorem for reasoning, the underlying probability needs to be ignored, that is, the event set is not considered, but the assumed probability of an event
The main idea of designing a multi-sensor data fusion method based on Bayesian reasoning is to set
Finally, the Bayesian formula is used to calculate the posterior density of various hypotheses
Then the maximum a posteriori density
In summary, the data fusion algorithm based on Bayesian reasoning infers a result from the judgments of the individual sensors and the prior probabilities of the different attributes of the observed object, and it continuously updates the prior probabilities with actual measurement information to improve the reliability of the inference [16].
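The update from prior to posterior and the selection of the maximum a posteriori hypothesis can be sketched as follows. The sketch assumes that the sensor judgments are conditionally independent given the hypothesis; the function name `bayes_fuse` and the numerical probabilities are hypothetical.

```python
import numpy as np

def bayes_fuse(prior, likelihoods):
    """Fuse sensor judgments with Bayes' theorem.

    prior:       P(H_i) for each hypothesis, shape (n_hyp,)
    likelihoods: P(observation of sensor k | H_i), shape (n_sensors, n_hyp)
    Returns the posterior over hypotheses and the index of the MAP hypothesis.
    """
    prior = np.asarray(prior, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    unnormalized = prior * likelihoods.prod(axis=0)
    posterior = unnormalized / unnormalized.sum()
    return posterior, int(np.argmax(posterior))

# Two hypotheses (e.g. "event" / "no event") and two sensors.
prior = [0.5, 0.5]
likelihoods = [[0.8, 0.2],   # sensor 1
               [0.6, 0.4]]   # sensor 2
posterior, decision = bayes_fuse(prior, likelihoods)

# The posterior becomes the prior for the next round of measurements.
prior = posterior
```

With these numbers the posterior is (6/7, 1/7), so the first hypothesis is selected.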
The machine-learning-based WSN data fusion method is simulated on the MATLAB simulation platform. The proposed method is compared with the MHD fusion algorithm based on Tucker decomposition and the data fusion method based on privacy protection. The data fusion algorithms are evaluated in terms of the number of surviving nodes, node energy consumption, and energy consumption balance.
Setting of experimental environment
The simulation environment uses NS2, the MAC protocol is 802.15.4, the routing protocol is DSDV, and the application layer protocol is ping. The nodes are evenly distributed in a 200 m
Experimental results
Comparison of the number of surviving nodes
Taking the number of surviving nodes as the experimental index, the application effects of MHD fusion algorithm based on Tucker decomposition, data fusion method based on privacy protection and this method are compared. Figure 5 reflects the comparison of the number of surviving nodes.
Comparison of the number of surviving nodes.
As can be seen from Fig. 5, the number of surviving nodes of all three methods decreases as the number of rounds increases. The number of surviving nodes of the MHD fusion algorithm based on Tucker decomposition and of the data fusion method based on privacy protection drops to 0 in the 275th and 300th rounds, respectively, whereas that of the proposed method does not drop to 0 until the 500th round, at the end of the experiment. Throughout the data fusion process, the proposed method keeps considerably more nodes alive than the other two methods, which improves the data fusion effect.
A simple energy consumption model is adopted in the WSN, ignoring the energy consumed by nodes in the process of calculation and storage, and only calculating the communication energy consumption of nodes, that is, the energy consumption of nodes transmitting data in the network with the communication time. The transmission distance between the source node and the sink node in the free communication space is used to judge the trend of energy consumption by different methods under the same conditions. Assuming that the distance traveled by l bit data transmission is d (20
In the formula,
Change trend of node energy consumption.
It can be seen from Fig. 6 that the node energy consumption of all three methods increases over time. The node energy consumption of the MHD fusion algorithm based on Tucker decomposition and of the data fusion method based on privacy protection is always higher than 0.2 J, while that of the proposed method never exceeds 0.25 J, which shows that the proposed method consumes little node energy during WSN data fusion and achieves a good data fusion effect.
This is because the proposed method collects WSN data with the regionalized compressed sensing method, which incorporates the advantages of the direct transmission method. The transmission energy consumption of each region is therefore lower than with the CS method alone; that is, the transmission energy consumption of the boundary nodes is reduced.
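A communication-only energy model of the kind used here can be sketched with the first-order radio model in its free-space form. The constants `E_ELEC` and `EPS_FS` below are values commonly used in the WSN literature and are assumptions, not parameters taken from this experiment; the packet length and distance in the example are also hypothetical.

```python
E_ELEC = 50e-9    # J/bit, electronics energy per bit (assumed typical value)
EPS_FS = 10e-12   # J/bit/m^2, free-space amplifier energy (assumed typical value)

def tx_energy(l_bits, d_m):
    """Energy to transmit l_bits over d_m metres in free space."""
    return l_bits * E_ELEC + l_bits * EPS_FS * d_m ** 2

def rx_energy(l_bits):
    """Energy to receive l_bits."""
    return l_bits * E_ELEC

# Example: a 4000-bit packet sent over 50 m.
e_tx = tx_energy(4000, 50.0)   # 2e-4 J electronics + 1e-4 J amplifier
e_rx = rx_energy(4000)
```

Because the amplifier term grows with the square of the distance and every term grows linearly with the number of bits, sending fewer packets over shorter hops directly lowers node energy consumption.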
Taking the energy consumption balance as the experimental index, the application effects of MHD fusion algorithm based on Tucker decomposition, data fusion method based on privacy protection and this method are compared. Figure 7 reflects the comparison of energy consumption balance.
Comparison of energy consumption balance.
In Fig. 7, the smaller the coefficient of variation, the more balanced the energy consumption between wireless sensor nodes. The results in Fig. 7 show that the coefficient of variation of the MHD fusion algorithm based on Tucker decomposition lies between 0.3 and 0.7 and increases substantially, indicating that energy consumption is poorly balanced across its sensor nodes. The coefficient of variation of the data fusion method based on privacy protection lies between 0.1 and 0.3, which is lower than that of the MHD fusion algorithm based on Tucker decomposition. The data fusion algorithm proposed in this paper keeps the coefficient of variation between 0.02 and 0.1, which shows that the energy consumption of the wireless sensor nodes is relatively balanced under this method.
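For reference, the coefficient of variation can be computed as the ratio of the standard deviation to the mean of the per-node energy consumption. This is the standard definition and is assumed here, since the text does not state the formula used for Fig. 7; the energy values and the function name `coefficient_of_variation` are hypothetical.

```python
import numpy as np

def coefficient_of_variation(node_energy):
    """Population standard deviation divided by the mean of the node energies."""
    node_energy = np.asarray(node_energy, dtype=float)
    return float(node_energy.std() / node_energy.mean())

# Hypothetical per-node energy consumption in arbitrary units.
unbalanced = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])   # 0.4
balanced = coefficient_of_variation([5, 5, 5, 5, 5, 5, 5, 5])     # 0.0
```

A value near 0 means that all nodes consume nearly the same energy, which corresponds to the balanced case in Fig. 7.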
The proposed data fusion algorithm can effectively reduce the energy consumption of the network. Therefore, it is suitable for data fusion of long-distance large-scale WSN on the premise of known large data sets.
A WSN data fusion method based on machine learning is proposed. The wireless sensor network model is established, the WSN data are collected with the compressed sensing method and deduplicated, the DCS method is used to process abnormal WSN data, adaptive theory and the batch estimation fusion algorithm are combined to preliminarily fuse the WSN data, and finally the Bayesian inference method from machine learning is used to further fuse the WSN data. In the experiments, the number of surviving nodes is large, node energy consumption never exceeds 0.25 J, and the coefficient of variation remains between 0.02 and 0.1. The method therefore improves the balance of energy consumption between wireless sensor nodes, reduces the energy consumption of data fusion, and has practical application value.
However, due to time constraints, this paper still has several deficiencies. The method was tested only in a simulation environment; the simulations show reduced node energy consumption but do not reveal whether data loss would occur in practical applications. In future research, the method will therefore be applied in a real network environment to verify its data fusion effect in practice and to further ensure the security of the WSN data transmission environment.
