Data fusion method for wireless sensor network based on machine learning

Abstract

To improve the energy consumption balance between wireless sensor nodes and reduce the energy consumed by nodes during data fusion, a machine-learning-based data fusion method for wireless sensor networks (WSN) is proposed. A wireless sensor network model is established and trained, the compressed sensing method is used to collect WSN data, and a multi-dimensional deduplication clustering analysis algorithm is used to deduplicate the collected data. Exploiting the spatial correlation between the data collected by multiple sensor nodes, the DCS method is used to process abnormal WSN data. To eliminate the influence of measurement error on fusion accuracy, the WSN data is preliminarily fused by combining adaptive theory with the batch estimation fusion algorithm. Based on the preliminary fusion results, the Bayesian inference method from machine learning is used to further fuse the WSN data. The experimental results show that, with this method, the number of surviving nodes is large, node energy consumption is low, and energy consumption is well balanced across wireless sensor nodes, which indicates that the method achieves a good data fusion effect.

Keywords

Machine learning, wireless sensor network, data fusion, compressed sensing, DCS method, Bayesian reasoning

1. Introduction

A wireless sensor network (WSN) is characterized by small node size, low cost, multi-hop ad hoc networking and a large sensing area [1]. WSNs can effectively monitor environmental information and issue warnings, and have been widely used in intelligent agriculture, environmental monitoring, and national defense and military applications. A WSN is a resource-constrained network whose performance is restricted by factors such as node energy, computing power, and storage space [2]. Usually, a large number of homogeneous sensor nodes are deployed in the sensing area to collect and transmit information periodically, so a WSN produces a large amount of redundant data while monitoring. On the one hand, processing and transmitting redundant data wastes limited energy and network bandwidth. On the other hand, interference and limited sensor measurement accuracy introduce errors into the monitoring results, and randomly failing nodes further reduce the reliability of the system. How to reduce redundant data, reduce node energy consumption, improve network reliability and prolong the effective network lifetime has therefore become an important issue in WSN research [3]. Multisensor data fusion technology fuses data that contains a certain amount of redundancy, which effectively reduces data transmission, reduces node energy consumption and improves the accuracy of monitoring results. WSN data fusion technology therefore deserves in-depth study [4].

To make full use of heterogeneous data on edge devices and to solve the “data communication barrier” caused by data privacy in edge computing, Mo et al. [5] proposed a multi-source heterogeneous data (MHD) fusion algorithm based on Tucker decomposition in federated learning. Aiming at the fusion of heterogeneous data without interaction, the algorithm introduces tensor Tucker decomposition theory and constructs a high-order tensor with heterogeneous spatial dimension characteristics to capture the high-dimensional characteristics of heterogeneous data, so as to realize the fusion of MHD in federated learning. The effectiveness of the algorithm is verified on the MOSI data set, but with this algorithm only a small number of wireless sensor network nodes survive. Li et al. [6] proposed a multi-attribute fusion method for heterogeneous data based on intuitionistic fuzzy sets. First, a new discount operator and combination rule for intuitionistic fuzzy numbers are defined, and their basic properties are proved. Second, the distance measure between heterogeneous data is defined, and the coefficient matrix for decision-making is obtained based on the gray correlation method. Intuitionistic fuzzy numbers are generated through the construction of the alternative evidence, and a calculation method for the attribute weights is given based on the intuitionistic fuzzy entropy. Finally, the result is judged according to the intuitionistic fuzzy number synthesis rule and the comparison rule. This method effectively reduces computing time and improves the efficiency of data fusion; however, node energy consumption during data fusion is high. To realize multi-source data fusion and ensure the privacy and security of the fused data, Long et al. [7] proposed a data fusion method based on privacy protection in crowd sensing networks. Users sign their perceptual data with a key and add noise to the data through differential privacy. The ECS combines the BGN encryption system and Shamir secret sharing to fuse the encrypted, perturbed data and send it to the perceptual platform. The results of the security and privacy analysis show that this method can resist differential attacks from ECs and malicious users and ensure the security of users and ECs. However, energy consumption is poorly balanced across wireless sensor nodes, so fewer nodes survive, which degrades the effect of data fusion.

To solve the above problems, this paper proposes a WSN data fusion method based on machine learning. The main contributions of this paper are as follows:

(1) A machine learning algorithm is used to establish and train the wireless sensor network model, the compressed sensing method is used to collect the wireless sensor network data, the multi-dimensional deduplication clustering analysis algorithm is used to delete duplicate data, and the abnormal data of the wireless sensor network is processed.
(2) The data is preliminarily fused by adaptive estimation weighted fusion, and then the Bayesian inference method is used to further fuse the WSN data to enhance the effect of data fusion.
(3) Taking the number of surviving nodes, node energy consumption and energy consumption balance as test indicators, the effectiveness of the proposed method is verified by experiments, which show that it performs well in data fusion.

2. Wireless sensor network data preprocessing

2.1 Wireless sensor network model construction and training

A WSN model is constructed, as shown in Fig. 1. It is composed of a monitoring center PC, a gateway and multiple wireless sensor nodes. The monitoring system takes the first-order radio model as the energy consumption model of the WSN and adopts the energy balanced adaptive clustering algorithm (EBACA) to divide the network into several clusters; each cluster contains a cluster head and several sensor nodes [8]. Sensor nodes are responsible for data collection, detection and transmission, and for monitoring node status and external events; the cluster head collects the data of the sensor nodes in its cluster, runs the fusion algorithm, and uploads the fusion results to the gateway; the gateway forwards the fusion results of each cluster to the monitoring center PC. If the user issues a data query command, the gateway and the cluster head receive it and forward it to each sensor node in turn.

Figure 1.

WSN model.

The WSN model shown in Fig. 1 is then trained. Since the deep autoencoder model and the WSN model have similar structures, the two can be combined naturally. First, the deep autoencoder is trained at the sink node. The training algorithm is as follows:

Step 1: The sink node uses the data in the sample library as the original input data;

Step 2: Construct a deep autoencoder network, take the original data as the initial input data, and train the network weights layer by layer;

Step 3: Use the greedy algorithm to train the weights to obtain the input of the Softmax classifier, and use the labeled samples to train the classifier in a supervised manner;

Step 4: Use the combined weights as the initial weights to fine-tune the overall network;

Step 5: Use the labeled samples to test the network. If the classification accuracy meets the requirements, go to Step 6; otherwise, readjust the parameters and go to Step 2;

Step 6: The sink node sends the trained deep autoencoder network parameters to the corresponding nodes.

After the training is completed and the network parameters are obtained, the deep autoencoder model is layered and deployed in the corresponding layer of the WSN, thereby realizing the training of the WSN model.

2.2 WSN data collection based on compressed sensing

In practical applications, since a WSN consists of a large number of sensor nodes, an “information-aware cluster” can be formed through network protocols, and the cluster head distributes the sequences contained in the $D$ row vectors of the observation matrix to each cluster member, that is, to each acquisition node. Each acquisition node samples its signal at a low rate and then performs a simple operation with the received sequence to obtain a compressed observation value. Each acquisition node only needs to send one compressed observation to the cluster head; the cluster head forwards the compressed observations it receives to the sink node, and the sink node completes the decompression and extraction of the information. The entire WSN data collection process is shown in Fig. 2.

Figure 2.

Schematic diagram of WSN data collection.

As can be seen from Fig. 2, the WSN data collection method based on compressed sensing is regionalized [9]. The basic idea is to perform compressed sensing separately in different regions, which disperses the load of the central region and saves transmission energy across the whole network. For example, suppose a wireless sensor network consists of three areas $R=\left\{{r_{i}}\right\}$, where $i=1,2,3$. Each area has several sensor nodes. There is a sink node in the network, which collects the sensing data of the nodes in each area. In each area $r_{i}$, a node is selected as the area center node $u_{i}$, which is responsible for compressing and forwarding the data. The regional center node is generally selected by random scheduling from a candidate set, and the rotation criteria can be based on time, energy, real-time performance, or quality-of-service guarantees. The candidate set is composed of the boundary nodes near the sink end in each region; the other nodes in the region are called regular nodes $N_{k}^{i}$, $N_{k}^{i}\in r_{i}$.

When collecting WSN data, each regular node $N_{k}^{i}$ sends the collected raw signal $x\left(n\right)$ directly to the corresponding regional center node $u_{i}$, and $u_{i}$ then performs the regionalized compressed sensing measurement. The specific measurement steps are as follows:

(1)

A region measurement matrix $\phi_{i}$ of $Y\times r_{i}$ is generated by $u_{i}$ , where $Y$ is the sampling amount required for accurate reconstruction of the entire WSN;

(2)

$u_{i}$ uses the CS method to fuse the received data values in area $r_{i}$ to generate $Y$ area measurement values, namely:

$\displaystyle Y_{h}^{i}=\sum_{i=1}^{N}{\alpha_{h}\times N_{k}^{i}}$ (1)

In the formula, $h=1,2,\ldots,Y$ .

Obviously, $Y_{h}^{i}$ is only a part of the overall measurement value of the network. When the sink node has received the regional measurement values of all areas, the complete measurement is obtained.
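The regional measurement of Eq. (1) and its aggregation at the sink can be illustrated with a short Python sketch. It is only a sketch under assumptions that the text does not state: the weighting coefficients are taken to be the entries of a Gaussian random matrix (a common choice in compressed sensing), the sink is assumed to add the regional values component by component, and the readings, the value of `Y` and all function names are hypothetical.

```python
import random


def gaussian_matrix(rows, cols, rng):
    """Hypothetical Y x n_i region measurement matrix with Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def region_measurements(phi_i, readings):
    """Y weighted sums of the raw readings of one region (cf. Eq. (1))."""
    return [sum(a * x for a, x in zip(row, readings)) for row in phi_i]


rng = random.Random(0)
Y = 4                                   # measurements per region (assumed)
regions = [[20.1, 20.3, 19.8],          # raw readings of regular nodes in r_1
           [21.0, 21.2],                # r_2
           [18.5, 18.7, 18.6, 18.9]]    # r_3

phis = [gaussian_matrix(Y, len(r), rng) for r in regions]
partials = [region_measurements(p, r) for p, r in zip(phis, regions)]

# Each centre node u_i forwards Y values; the sink adds them per component.
sink = [sum(part[h] for part in partials) for h in range(Y)]

# Reference: the concatenated matrix [phi_1 phi_2 phi_3] applied to all readings.
full_phi = [sum((p[h] for p in phis), []) for h in range(Y)]
direct = region_measurements(full_phi, sum(regions, []))
```

Under these assumptions `sink` and `direct` agree up to rounding, which is the sense in which each regional value is one part of the overall measurement.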

According to the above process, the number of data transmissions in the area is obtained as:

$\displaystyle z_{i}\left(k\right)=\frac{\upsilon_{i}\left(k\right)}{\sqrt{x_{ti}\left(k\right)+\left(y_{ti}\left(k\right)-y_{gi}\left(k\right)\right)^{2}+\upsilon_{i}\left(k\right)}}$ (2)

In the formula, $\upsilon_{i}\left(k\right)$ represents the set of regional center nodes of the network; $x_{ti}\left(k\right)$ and $y_{ti}\left(k\right)$ both represent the number of hops from common nodes to the center node; $y_{gi}\left(k\right)$ represents the transmission times of the regional center node.

According to Eq. (2), the number of data transmissions of the entire WSN is:

$\displaystyle Z_{i}\left(k\right)=f_{i}\left({x_{ti}\left(k\right)+y_{ti}\left(k\right)}\right)\times\upsilon_{i}\left(k\right)$ (3)

In the formula, $f_{i}$ represents the number of hops from the regional center node to the sink node.

The above analysis shows that the regionalized compressed sensing method incorporates the advantages of the direct transmission method, so that the transmission energy consumption in each individual area is lower than when the CS method is used alone. In other words, it reduces the transmission energy consumption of the boundary nodes and makes up for this deficiency of the CS method. Note that a boundary node here refers to a boundary node within an area. Because this broadens the notion of a boundary node, the WSN does not form one large central area during data transmission, so the method outperforms the plain CS method overall.

2.3 WSN data deduplication

Data fusion refers to the distributed aggregation and fusion processing of data in the process of data transmission, removing redundant information, and combining into more effective, concise and accurate data. Therefore, on the basis of data collection, a multi-dimensional deduplication clustering analysis algorithm is used to deduplicate the data in the WSN network. The multi-dimensional deduplication clustering analysis algorithm studied in this paper uses the Bayesian network model to reveal the latent structure, and establishes the logical relationship between explicit and latent variables. The multi-dimensional deduplication clustering analysis algorithm can analyze and process unstructured data according to the relationship between probabilities [10]. The processing steps are as follows:

(1)
Data preprocessing. Data noise is reduced by data cleaning, thereby solving the problem of data loss. In data processing, discretization is performed first and then data conversion is performed;
(2)
Training set. Probability evaluation is used to preprocess the data, and the processed data set is divided into different subsets; a classification algorithm is then used to build a classifier, which is evaluated on the prepared test set and retained for subsequent use;
(3)
Vectorization. When consecutive sequences are combined into a sequence set, each sequence must be converted into a machine-readable format and quantified; the processed sequence is then used as a feature vector for the final data processing;
(4)
Multidimensional clustering. Sequence features will cause a certain sparsity in the feature space of sequence vectors. A feature selection method can be used to reduce the dimension to improve the processing efficiency of the classifier. The data deduplication processing steps of the multi-dimensional deduplication clustering algorithm analysis model are shown in Fig. 3.

Figure 3.
WSN data deduplication steps.
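The discretization and deduplication idea in the steps above can be illustrated with a deliberately simplified Python sketch. It is not the multi-dimensional deduplication clustering analysis algorithm itself (no Bayesian network or classifier is involved): it only assumes that readings falling into the same discretization cell count as duplicates. The function name, the cell size `step` and the sample records are hypothetical.

```python
def deduplicate(records, step=0.5):
    """Keep the first record of every discretization cell and drop the rest."""
    seen = set()
    kept = []
    for rec in records:
        # Discretization: map each dimension to the index of its cell.
        key = tuple(round(value / step) for value in rec)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Hypothetical (temperature, humidity) records from neighbouring nodes.
records = [(20.1, 55.0), (20.2, 55.1), (23.4, 60.2), (20.1, 55.0), (23.5, 60.1)]
unique = deduplicate(records, step=0.5)
# unique == [(20.1, 55.0), (23.4, 60.2)]
```

A coarser `step` merges more readings and removes more data, so in practice the cell size would have to be matched to the sensor accuracy.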

2.4 WSN network abnormal data processing

By exploiting the spatial correlation between the data collected by multiple sensor nodes, the joint sparse reconstruction of multiple sets of measurement values can further reduce the amount of data required for data fusion while preserving fusion performance. The abnormal data processing method for wireless sensor networks based on distributed compressed sensing (DCS) is introduced below.

For the first type of abnormal data model, the signal contains the same normal data component and different abnormal data components, and the raw data $G$ recorded by a total of $x$ sensors can be written as:

$\displaystyle G=\left[{g_{1},g_{2},\ldots,g_{x}}\right]^{T}$ (4)

Since the normal data components in the data recorded by each sensor are exactly the same, namely $g_{1}^{n}=g_{2}^{n}=\ldots=g_{x}^{n}$ , the sparse representation of $G$ can be written as:

$\displaystyle G=\tilde{\Phi}s^{\prime}$ (5)

In the formula, $s^{\prime}$ is the joint sparse coefficient vector and $\tilde{\Phi}$ is the corresponding joint sparse basis. The expressions of the two are:

$\displaystyle s^{\prime}=\left[{s_{1}^{n},s_{2}^{n},\ldots,s_{X}^{n}}\right]^{T}$ (6)

$\displaystyle\tilde{\Phi}=\left[{{\begin{array}{cccc}V&C&\ldots&0\\ V&0&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ V&0&0&C\\ \end{array}}}\right]$ (7)

In the formula, $V$ denotes the $N\times N$-dimensional Fourier transform matrix; $C$ represents the $N\times N$-dimensional identity matrix; $s_{i}^{n}$ represents the frequency-domain sparse representation of the normal data components; $s_{X}^{n}$ represents the time-domain sparse representation of the abnormal data components recorded by the sensor.

The raw high-dimensional data recorded by multiple sensors is compressed and measured using a Gaussian random measurement matrix to obtain multiple sets of compressed measurement values. This process can be expressed as:

$\displaystyle G=\tilde{\Phi}s=\tilde{\Phi}\tilde{\psi}s^{\prime}$ (8)

In the formula, $\tilde{\psi}$ is the compression measurement of the sensor. Equation (8) is the DCS processing method of the first type of abnormal data model.

For the second type of abnormal data model, the signal contains normal data components and abnormal data components, and the normal data components are further divided into a part that is the same for all sensors and a part that differs between sensors. Similarly, the sparse representation of the original data $G$ can be written as:

$\displaystyle G=\tilde{\Phi}s^{\prime\prime}$ (9)

Of which:

$\displaystyle s^{\prime\prime}=\left[{s_{1}^{N},s_{2}^{N},\ldots,s_{X}^{N}}\right]^{T^{\prime}}$ (10)

$\displaystyle\tilde{\Phi}=\left[{{\begin{array}{cccc}V&V&\ldots&0\\ V&C&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ V&0&V&C\\ \end{array}}}\right]$ (11)

In the formula, $s_{i}^{N}$ represents the frequency domain sparse representation of the same part of the normal data component; $s_{X}^{N}$ represents the frequency domain sparse representation of different parts of the normal data component recorded by the sensor.

Similarly, the Gaussian random measurement matrix is used to compress the original high-dimensional data recorded by multiple sensors to obtain multiple sets of compressed measurement values. The process can be expressed as:

$\displaystyle G=\tilde{\Phi}s=\tilde{\Phi}\tilde{\psi}s^{\prime\prime}$ (12)

Equation (12) is the DCS processing method of the second type of abnormal data model.

Combining Eqs (8) and (12) can realize the effective processing of abnormal data in WSN network, improve the accuracy of WSN network data fusion, and reduce energy consumption.
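To make the structure of the first abnormal data model concrete, the following Python sketch builds a small joint dictionary in which every sensor shares one block $V$ and owns one identity block $C$, and then synthesizes the stacked signals from a sparse coefficient vector. It is a minimal sketch under assumptions: $V$ is taken to be the inverse DFT matrix so that frequency coefficients map to time samples, the sizes `N` and `X` and all names are hypothetical, and the compressed measurement and reconstruction stages are not shown.

```python
import cmath


def idft_matrix(n):
    """N x N inverse DFT matrix (assumed role of the block V)."""
    return [[cmath.exp(2j * cmath.pi * r * c / n) / n for c in range(n)]
            for r in range(n)]


def joint_dictionary(n, sensors):
    """Row block j is [V, 0, ..., C, ..., 0] with C in the slot of sensor j."""
    v = idft_matrix(n)
    rows = []
    for j in range(sensors):
        for r in range(n):
            row = list(v[r])                       # shared block V
            for k in range(sensors):               # identity block C for sensor j only
                row += [1.0 if (k == j and c == r) else 0.0 for c in range(n)]
            rows.append(row)
    return rows


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


N, X = 8, 3                                        # samples per sensor, sensors
common_freq = [0.0] * N
common_freq[1] = common_freq[N - 1] = N / 2        # one cosine: sparse in frequency
anomalies = [[0.0] * N for _ in range(X)]
anomalies[1][5] = 4.0                              # one spike at the second sensor

s_prime = common_freq + [value for z in anomalies for value in z]
G = matvec(joint_dictionary(N, X), s_prime)        # stacked signals g_1 ... g_X
signals = [[G[j * N + t].real for t in range(N)] for j in range(X)]
```

Here only 3 of the 32 coefficients are nonzero, yet they describe all three sensor signals: the first and third sensors record the same cosine, and the second records the cosine plus the spike.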

3. Wireless sensor network data fusion methods

3.1 Adaptive estimation weighted fusion algorithm

To eliminate the impact of measurement error on fusion accuracy, the adaptive theory and batch estimation fusion algorithm are combined to calculate the relative variance of node data and batch estimation fusion results, adjust the weight of node data according to the relative variance, and carry out adaptive estimation weighted data fusion [11]. Figure 4 shows the flow of fusion algorithm.

Figure 4.

Adaptive estimation weighted fusion algorithm flow.

According to Fig. 4, the algorithm steps are briefly described as:

(1)

Group the nodes, calculate the mean $\overline{f}_{w}$ , standard deviation $\widehat{\varepsilon}_{w}$ and fusion weight $a_{p}$ of each group of data, and further solve the batch estimation fusion result $f^{\prime}$ ;

(2)

Calculate the relative variance of node data $f_{wi}$ and $f^{\prime}$, and calculate the correction factor $t_{p}$ of each group of data according to $\overline{f}_{w}$, $\widehat{\varepsilon}_{w}$ and $\overline{f}_{w}^{2}$;

(3)

Perform secondary weighted fusion on the data with the revised weights $a_{p}$ and $t_{p}$ to obtain the final fusion result $f$. The data fusion weight $a_{p}$ of the $p$-th group of nodes, the batch estimation fusion result $f^{\prime}$, and the relative variance $\overline{f}_{w}^{2}$ are given by Eqs (13) to (15):

$\displaystyle a_{p}=\frac{1}{\widehat{\varepsilon}_{w}}\left[{\sum\limits_{w=1}^{N}{\frac{1}{\widehat{\varepsilon}_{w}}}}\right]^{2}$ (13)

$\displaystyle f^{\prime}=\sqrt{\left({\frac{1}{\widehat{\varepsilon}_{w}}}\right)^{2}}\times\sum\limits_{p=1}^{N}{a_{p}t_{p}}$ (14)

$\displaystyle\overline{f}_{w}^{2}=\frac{\widehat{\varepsilon}_{w}}{\left({\overline{f}_{w}+\widehat{\varepsilon}_{w}}\right)\times a_{p}t_{p}}$ (15)

It can be seen from the multi-sensor weighted data fusion algorithm that in order to eliminate measurement errors, the fusion weights should be inversely proportional to the estimated variance, and the sum of the weights is always 1. Therefore, $a_{p}$ and $t_{p}$ satisfy the following constraints:

$\displaystyle\left\{{\begin{array}{l}\sum\limits_{p=1}^{N}{a_{p}}=1\\ \sum\limits_{p=1}^{N}{t_{p}}=1\\ \sum\limits_{p=1}^{N}{a_{p}t_{p}}=1\\ \end{array}}\right.$ (16)

Based on the above analysis, the correction factor $t_{p}$ is shown in Eq. (17):

$\displaystyle t_{p}=\sum\limits_{p=1}^{N}{\left({\left\|{\widehat{p}_{g}-p_{s}^{i}}\right\|-t_{i}}\right)}^{2}$ (17)

According to the adjusted weights $a_{p}$ and $t_{p}$ , the adaptive estimation weighted fusion result can be obtained as shown in Eq. (18):

$\displaystyle f^{\prime}=\sum\limits_{p=1}^{N}{a_{p}t_{p}}$ (18)
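The two-stage idea of this subsection (weights inversely proportional to the estimated variance and summing to 1, followed by a correction based on the deviation from the batch estimate) can be sketched in Python. The sketch makes assumptions of its own: it does not reproduce Eqs (13) to (18) term by term, the second-stage weights are simply the normalized inverse of each group's mean squared deviation from the batch estimate, every group is assumed to have nonzero variance, and the function names and readings are hypothetical.

```python
from statistics import mean, pvariance


def normalized_inverse(values):
    """Weights inversely proportional to `values`, scaled so they sum to 1."""
    inverse = [1.0 / v for v in values]
    total = sum(inverse)
    return [w / total for w in inverse]


def adaptive_weighted_fusion(groups):
    """Return first-stage weights, batch estimate, corrected weights, result."""
    means = [mean(g) for g in groups]
    variances = [pvariance(g) for g in groups]
    a = normalized_inverse(variances)                  # first-stage weights
    batch = sum(w * m for w, m in zip(a, means))       # batch estimation fusion
    # Assumed correction: spread of each group around the batch estimate.
    relative = [mean([(x - batch) ** 2 for x in g]) for g in groups]
    corrected = normalized_inverse(relative)
    fused = sum(w * m for w, m in zip(corrected, means))
    return a, batch, corrected, fused


# Hypothetical temperature readings from three groups of nodes.
groups = [[20.0, 20.2, 19.8, 20.1],
          [20.3, 19.9, 20.1, 19.7],
          [21.5, 18.9, 22.0, 18.4]]    # noisy group
a, batch, corrected, fused = adaptive_weighted_fusion(groups)
```

With these readings the noisy third group receives the smallest weight in both stages, so the fused value stays close to the two consistent groups.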

According to the adaptive estimation weighted fusion algorithm, the initial fusion of WSN network data is realized. To facilitate data fusion, the Bayesian inference algorithm in the machine learning algorithm is used to further fuse the WSN network data.

3.2 WSN data fusion algorithm based on Bayesian inference

Based on the preliminary fusion and combination of the WSN network data obtained in Section 3.1, the Bayesian inference method is used to further fuse the WSN network data. The Bayesian method is an inductive reasoning method based on probability and statistics, and its conclusions depend not only on present measurements, but also on past experience and knowledge [12]. The basic principle is that given a previous prior probability, when there is a new measurement (sample information), the assumed prior probability is updated to the posterior probability [13].

Bayes’ theorem considers the conditional probability $P\left({L|Q_{i}}\right)$ of event $L$ given that an event $Q_{i}$ from a known event set $Q_{i}\left({i=1,2,\ldots,n}\right)$ has occurred, and from it obtains the probability of $Q_{i}$ given that $L$ has been observed:

$\displaystyle P\left({Q_{i}|L}\right)=\frac{P\left({Q_{i}}\right)P\left({L|Q_{i}}\right)}{P\left(L\right)}$ (19)

To apply Bayes’ theorem for reasoning, the underlying event set is set aside, and the probability of an event $Q$ is instead evaluated under $M$ conditional hypotheses $M_{1},M_{2},\ldots,M_{m}$. $P\left({Q_{i}}\right)$ represents the prior probability of hypothesis $Q_{i}$, and $P\left({L|Q_{i}}\right)$ is the probability that $L$ occurs when hypothesis $Q_{i}$ holds, that is, the likelihood of $L$ under hypothesis $Q_{i}$; then:

$\displaystyle P\left({Q_{i}|L}\right)=\frac{P\left({Q_{i}}\right)P\left({L|Q_{i}}\right)}{\sum\limits_{j=1}^{n}{P\left({Q_{j}}\right)P\left({L|Q_{j}}\right)}}$ (20)

The main idea of the multi-sensor data fusion method based on Bayesian reasoning is to use $N$ sensors of different types to identify $k$ attributes of the same target. First, the data observed by the $N$ sensors is classified with respect to the $M$ hypotheses to obtain a set of target descriptions for the different attributes [14]; second, the conditional probability (likelihood function) of each description is calculated under each hypothesis, and then the joint likelihood function of the $W$ target descriptions under the $M$ conditional hypotheses is calculated:

$\displaystyle P\left({Q_{1},Q_{2},\ldots,Q_{n}|L}\right)=P\left({Q_{1}|L_{i}}\right)P\left({Q_{2}|L_{i}}\right)\ldots P\left({Q_{n}|L_{i}}\right)$ (21)

Finally, the Bayesian formula is used to calculate the posterior density of various hypotheses $Q_{i}$ under the description of $W$ target:

$\displaystyle P\left({Q_{i}}\right)^{2}=\left({Q_{i}\left(\kappa\right),Q_{j}\left(\kappa\right)}\right)^{2}\times P\left({Q_{i}|L}\right)$ (22)

Then the maximum a posteriori density $\left[{P\left({Q_{i}}\right)^{2}}\right]_{\max}$ can be obtained, which can be used as the judgment basis to decide whether to accept or reject the $Q_{i}$ hypothesis, so as to discard redundant information and complete data fusion [15].

To sum up, the data fusion algorithm based on Bayesian reasoning infers a result from the judgments obtained by each sensor and the prior probabilities of the different attributes of the observed object, and continuously updates the prior probabilities with the actual measurement information to improve the reliability of the reasoning [16].
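The reasoning chain of this subsection (prior, per-sensor likelihoods, joint likelihood as a product, normalized posterior, maximum a posteriori decision) can be illustrated with a short Python sketch. It relies on the standard form of Bayes’ theorem and assumes that the sensor descriptions are conditionally independent given the hypothesis; the function name and all probability values are hypothetical.

```python
def bayes_fuse(prior, likelihoods):
    """Posterior over hypotheses after fusing independent sensor likelihoods.

    prior       -- P(Q_i) for each hypothesis
    likelihoods -- one list per sensor holding P(description | Q_i)
    """
    joint = list(prior)
    for sensor in likelihoods:
        # The joint likelihood is the product of the per-sensor likelihoods.
        joint = [j * l for j, l in zip(joint, sensor)]
    evidence = sum(joint)                  # normalizing constant
    return [j / evidence for j in joint]


prior = [0.5, 0.3, 0.2]                    # three hypotheses
likelihoods = [[0.7, 0.2, 0.1],            # sensor 1
               [0.6, 0.3, 0.1],            # sensor 2
               [0.2, 0.5, 0.3]]            # sensor 3

posterior = bayes_fuse(prior, likelihoods)
map_index = max(range(len(posterior)), key=posterior.__getitem__)
```

In this example the first hypothesis has the largest posterior even though the third sensor favours the second hypothesis, which shows how the fused decision can override a single dissenting sensor.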

4. Experimental research

The machine-learning-based WSN data fusion method is simulated on the MATLAB simulation platform. The proposed method is compared with the MHD fusion algorithm based on Tucker decomposition and the data fusion method based on privacy protection. The data fusion algorithms are evaluated in terms of the number of surviving nodes, node energy consumption and energy consumption balance.

4.1 Setting of experimental environment

The simulation environment uses NS2; the MAC protocol is 802.15.4, the routing protocol is DSDV, and the application layer protocol is ping. The nodes are evenly distributed in a 200 m $\times$ 200 m rectangular area in the form of a grid. The total number of nodes N varies from 100 to 400; each node starts with 1 unit of energy, and the data packet transmission rate is 100 kb/s.

4.2 Experimental results

4.2.1 Comparison of the number of surviving nodes

Taking the number of surviving nodes as the experimental index, the application effects of MHD fusion algorithm based on Tucker decomposition, data fusion method based on privacy protection and this method are compared. Figure 5 reflects the comparison of the number of surviving nodes.

Figure 5.

Comparison of the number of surviving nodes.

As can be seen from Fig. 5, the number of surviving nodes of all three methods decreases as the number of rounds increases. The number of surviving nodes of the MHD fusion algorithm based on Tucker decomposition and of the data fusion method based on privacy protection drops to 0 in the 275th and 300th rounds, respectively, whereas that of the proposed method drops to 0 only at the end of the experiment, in the 500th round. Throughout the data fusion process, the proposed method keeps considerably more nodes alive than the other two methods, which improves the effect of data fusion.

4.2.2 Node energy consumption

A simple energy consumption model is adopted in the WSN, ignoring the energy consumed by nodes in the process of calculation and storage, and only calculating the communication energy consumption of nodes, that is, the energy consumption of nodes transmitting data in the network with the communication time. The transmission distance between the source node and the sink node in the free communication space is used to judge the trend of energy consumption by different methods under the same conditions. Assuming that the distance traveled by l bit data transmission is d (20 $\leqslant$ d $\leqslant$ 25), the energy consumption formula for sensor nodes to complete data fusion processing is:

$\displaystyle E\left({d,d_{0}}\right)=E_{\tau}+E_{\vartheta}=\left\{{\begin{array}{ll}E_{\tau}+E_{\vartheta}d^{2}&\quad d<d_{0}\\ E_{\tau}+E_{\vartheta}d^{4}&\quad d\geqslant d_{0}\\ \end{array}}\right.$ (23)

In the formula, $d_{0}$ represents the distance threshold, which is assumed to be 25 m. The free space communication model is adopted: when the sender-receiver distance is less than the threshold, the energy consumption of the sending node is proportional to the square of the distance; otherwise, it is proportional to the fourth power of the distance. $E_{\tau}$ represents the energy consumed by the node to send or receive each bit of data; $E_{\vartheta}d^{2}$ and $E_{\vartheta}d^{4}$ represent the energy consumed by the signal amplifier when sending each bit of data. According to the above formula, the trend of the energy consumption of the sensor nodes participating in the fusion is calculated. Taking the energy consumption of a single node as an example, the resulting curve is shown in Fig. 6.

Figure 6.
Change trend of node energy consumption.
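The piecewise energy model of Eq. (23) can be written as a small Python function. This is a sketch only: the paper does not report numerical values for $E_{\tau}$ and $E_{\vartheta}$, so the defaults below are hypothetical placeholders, and the same amplifier coefficient is used in both branches, as in Eq. (23).

```python
def energy_per_bit(d, d0=25.0, e_tau=50e-9, e_theta=10e-12):
    """Energy in joules to transmit one bit over distance d (metres).

    e_tau and e_theta are hypothetical values, not taken from the paper.
    """
    # Square law below the threshold d0, fourth-power law at or above it.
    amplifier = e_theta * (d ** 2 if d < d0 else d ** 4)
    return e_tau + amplifier


def transmission_energy(bits, d, **params):
    """Energy in joules to transmit `bits` bits over distance d."""
    return bits * energy_per_bit(d, **params)


short_hop = transmission_energy(1000, 20.0)    # below the threshold
long_hop = transmission_energy(1000, 25.0)     # at the threshold: d**4 branch
```

Because a single coefficient is shared by both branches, the cost jumps sharply at $d=d_{0}$, which is one reason why keeping hops shorter than the threshold matters in this model.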

It can be seen from Fig. 6 that the node energy consumption of all three methods increases over time. The node energy consumption of the MHD fusion algorithm based on Tucker decomposition and of the data fusion method based on privacy protection is always higher than 0.2 J, while that of the proposed method never exceeds 0.25 J, which shows that node energy consumption is low and the data fusion effect is good when this method is used for wireless sensor network data fusion.

This is because the proposed method uses the regionalized compressed sensing method to collect WSN data. This method incorporates the advantages of the direct transmission method, making the transmission energy consumption of each region lower than with the CS method alone; that is, it reduces the transmission energy consumption of the boundary nodes.

4.2.3 Energy consumption balance

Taking the energy consumption balance as the experimental index, the application effects of MHD fusion algorithm based on Tucker decomposition, data fusion method based on privacy protection and this method are compared. Figure 7 reflects the comparison of energy consumption balance.

Figure 7.

Comparison of energy consumption balance.

In Fig. 7, the smaller the coefficient of variation, the more balanced the energy consumption between wireless sensor nodes. The results in Fig. 7 show that the coefficient of variation of the MHD fusion algorithm based on Tucker decomposition lies between 0.3 and 0.7 and increases considerably, indicating that its energy consumption is poorly balanced across wireless sensor nodes. The coefficient of variation of the data fusion method based on privacy protection lies between 0.1 and 0.3, which is lower than that of the MHD fusion algorithm based on Tucker decomposition. The data fusion algorithm proposed in this paper keeps the coefficient of variation between 0.02 and 0.1, which shows that the energy consumption of wireless sensor nodes under this method is relatively balanced.
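For reference, the balance metric can be computed as follows. The sketch assumes the usual definition of the coefficient of variation (standard deviation divided by mean) applied to per-node energy consumption; the paper does not spell out the definition, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical per-node energy consumption in joules.
balanced = [0.20, 0.21, 0.19, 0.20]
unbalanced = [0.05, 0.40, 0.10, 0.25]

cv_balanced = coefficient_of_variation(balanced)
cv_unbalanced = coefficient_of_variation(unbalanced)
```

The metric is scale-free, so networks with different total energy budgets can be compared on the same axis.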

The proposed data fusion algorithm can effectively reduce the energy consumption of the network. Therefore, it is suitable for data fusion of long-distance large-scale WSN on the premise of known large data sets.

5. Conclusion

A WSN data fusion method based on machine learning is proposed. The method establishes the wireless sensor network model, collects the WSN data through the compressed sensing method and deduplicates it, uses the DCS method to process abnormal WSN data, combines adaptive theory with the batch estimation fusion algorithm to preliminarily fuse the WSN data, and finally uses the Bayesian inference method from machine learning to further fuse the WSN data. In the experiment, the number of surviving nodes is large, the energy consumption of nodes never exceeds 0.25 J, and the coefficient of variation remains between 0.02 and 0.1. The method thus improves the energy consumption balance between wireless sensor nodes, reduces the energy consumption of data fusion, and has practical application value.

However, due to time constraints, this paper still has deficiencies. The method was tested only in a simulation environment, which shows that node energy consumption can be reduced but does not reveal whether data loss occurs in practical applications. Therefore, in future research the method will be applied in a real network environment to verify its data fusion effect and to further ensure the security of the wireless sensor network data transmission environment.

References

1. Gandhi, Vikas, Ratnam, Babu. Grid clustering and fuzzy reinforcement-learning based energy-efficient data aggregation scheme for distributed WSN. IET Commun. 2020; 14(16): 2840-2848.
2. Kumar, Lal, Chaurasiya. An energy efficient IPv6 packet delivery scheme for industrial IoT over G.9959 protocol based wireless sensor network (WSN). Comput Netw. 2019; 154(8): 79-87.
3. Souissi, Nadia, Lamjed. A multi-level study of information trust models in WSN-assisted IoT. Comput Netw. 2019; 151(14): 12-30.
4. Kumar, Chaurasiya. A strategy for elimination of data redundancy in internet of things (IoT) based wireless sensor network (WSN). IEEE Syst J. 2019; 13(2): 1650-1657.
5. Zheng, Gao, Feng. Multi-source heterogeneous data fusion based on federated learning. J Comput Res Develop. 2022; 59(2): 478-487.
6. Guan, Liu. Heterogeneous data fusion method based on intuitionistic fuzzy discount operator. Syst Eng Electron. 2021; 43(2): 311-317.
7. Long, Zhang. Data fusion method based on privacy preserving in crowd sensing network. Comput Eng Design. 2020; 41(12): 3346-3352.
8. Chakraborty, Goyal, Mahapatra, Soh. Minimal path-based reliability model for wireless sensor networks with multistate nodes. IEEE Trans Reliab. 2020; 69(1): 382-400.
9. Shi. Singular value watermarking algorithm in color image based on compressed sensing. Comput Simul. 2019; 36(10): 238-242.
10. Meng, Jiang. Secure data deduplication with reliable data deletion in cloud. Int J Found Comput Sci. 2019; 30(4): 551-570.
11. Lamti, Khelifa, Hugel. Cerebral and gaze data fusion for wheelchair navigation enhancement: Case of distracted users. Robotica. 2019; 37(2): 246-263.
12. Zarei, Khakzad, Cozzani, Reniers. Safety analysis of process systems using Fuzzy Bayesian Network (FBN). J Loss Prevent Proc Ind. 2019; 57(1): 7-16.
13. Masmoudi. A new class of continuous Bayesian networks. Int J Approx Reasoning. 2019; 109(1): 125-138.
14. Guo, Zhan. Influence of different data fusion methods on the accuracy of three-dimensional displacements field. Adv Space Res. 2020; 65(6): 1580-1590.
15. Poggi, Agresti, Tosi, Zanuttigh, Mattoccia. Confidence estimation for TOF and stereo sensors and its application to depth data fusion. IEEE Sens J. 2020; 20(3): 1411-1421.
16. Meng, Mao, Wei, Zhang. Probabilistic water body mapping of GF-3 images based on prior probability estimation. Acta Geod Cartographica Sin. 2019; 48(4): 439-447.