Abstract
In order to improve the classification accuracy and shorten the classification time of mass data, a fast classification method for mass data in the Internet of Things (IoT) based on the fuzzy clustering maximum tree algorithm is proposed. The method first reduces the dimensionality of the mass IoT data and establishes its time series, which completes the preprocessing stage. It then extracts feature vectors from the data and applies the fuzzy clustering maximum tree algorithm to perform fuzzy clustering analysis, thereby classifying the mass IoT data. Experimental results show that the proposed method achieves a recall of 97.5%, a root mean square error of only 0.030, and a classification time of only 12.3 ms.
Keywords
Introduction
In recent years, the rapid development of the Internet of Things and the mobile Internet has injected new vitality into many industries, and the era of intelligence driven by big data has arrived. In this era of interconnected things, data volumes grow exponentially, data are generated ever faster, and the total amount of data is increasing explosively [5,11,13]. In IoT environments in particular, mass data are complex and their transmission is dynamic and unstable, so large numbers of RFID readers and sensors must be deployed to monitor the generated data in real time. Mass IoT data contain enormous value, and mining that value has become a new focus of industry [1,3,20]. Faced with tens of thousands of mass data items, effectively classifying and extracting IoT data is an important research topic, and fast classification of mass IoT data is therefore of great significance.
At present, researchers in related fields have studied the classification of mass IoT data. Reference [10] proposed a classification method for noisy and incomplete IoT data sets based on topological data analysis of time series. It applied topological data analysis to a data set of hundreds of interacting IoT devices located in multiple residential environments and captured over 9 months. The data set is noisy, incomplete, and subject to fluctuations caused by multiple life patterns. The authors treated it as a collection of multi-attribute time series, conducted several types of IoT classification experiments, and compared the results with other single-attribute and multi-attribute time series techniques, finding that topological data analysis performs particularly well on incomplete, noisy, and life-pattern-related IoT data. Reference [6] proposed a temporal distribution feature learning method for IoT network traffic classification. Using a new representation of network data, traffic is treated as a sequence of images, so that the network data form a video stream suitable for temporal distribution feature learning. Temporal information in network statistics is learned with convolutional neural networks and long short-term memory, and pseudo-temporal features between streams are learned with temporally distributed multilayer perceptrons. Experiments on a larger data set with more classes show that temporal distribution feature learning improves network classification performance by 10%. Reference [8] proposed an energy- and trust-aware big data classification and secure routing algorithm for the IoT based on the MapReduce framework. The model builds on the traditional energy-harvesting trust-aware routing algorithm and incorporates a link lifetime model.
It uses a cost metric that jointly considers delay, link lifetime, energy, and trust to select the optimal secure routing path. Big data classification is performed at the base station using the MapReduce framework, specifically with stacked autoencoders trained by the adaptive E-Bat algorithm, which combines the adaptive concept with the Bat algorithm and an exponentially weighted moving average. The resulting energy-harvesting trust-aware routing model achieves a maximum energy of 0.9855. However, these methods still suffer from limited classification effectiveness, low accuracy, and long classification time.
To solve these problems, this paper proposes a fast classification method for mass IoT data based on the fuzzy clustering maximum tree algorithm. The technical route is as follows:
Step 1: Use principal component analysis to reduce the dimensionality of the mass IoT data, and establish the time series of the data based on interval number theory.
Step 2: Select the feature items of the mass IoT data and build a feature vector space model to form the feature vectors of the data.
Step 3: Use the fuzzy clustering maximum tree algorithm to perform fuzzy clustering analysis on the mass IoT data set and thereby classify the data.
Step 4: Experimental analysis.
Step 5: Conclusion.
Design of fast classification method for mass data in Internet of things
Mass data preprocessing of Internet of things
To speed up the classification of mass IoT data and improve its accuracy, unsupervised principal component analysis [4,12,16] is used to map the mass IoT data onto the optimal subspace of their distribution, which completes the dimensionality reduction.
Assume that the mass IoT data set W, consisting of q IoT mass data features w, is
From this, the attribute covariance matrix [2] of the mass IoT data set W is derived as follows:
The covariance matrix is then orthogonally decomposed:
In formula (3), R is an orthogonal matrix whose columns are the eigenvectors of the covariance matrix, and
The eigenvalues corresponding to these eigenvectors are sorted in descending order, and the eigenvectors with the largest eigenvalues are retained. These eigenvectors determine the optimal subspace of the IoT mass data distribution; projecting the original data onto this subspace yields the new, dimensionality-reduced IoT mass data.
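The descending-eigenvalue projection described above can be sketched with NumPy, the package named in the experimental setup. This is a minimal illustration, not the authors' code: it assumes the data are arranged as a samples-by-features matrix and that the number of retained components k is chosen by the user, since the text does not state how many eigenvectors are kept. The function name pca_reduce is hypothetical.

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: project rows of X (samples x features) onto the top-k principal axes."""
    # Center each attribute so the covariance reflects variation around the mean.
    Xc = X - X.mean(axis=0)
    # Attribute covariance matrix (features x features).
    C = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices: C = R diag(eigvals) R^T with R orthogonal.
    eigvals, R = np.linalg.eigh(C)
    # eigh returns eigenvalues in ascending order; reorder them to descending.
    order = np.argsort(eigvals)[::-1]
    R_k = R[:, order[:k]]
    # Project the centered data onto the k-dimensional subspace.
    return Xc @ R_k
```

For example, `pca_reduce(X, 3)` returns a matrix with the same number of rows as X and three uncorrelated columns whose variances equal the three largest eigenvalues. `np.linalg.eigh` is used instead of `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.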
Because the feature values of mass IoT data change continuously, a time series of the data is established based on interval number theory [7,18] to better suppress the fuzzy uncertainty in the data. The fuzzy attributes of mass IoT data are proportional to their state of change, so the absolute change rate is used to describe them.
Assume that the mass IoT data at time t is
If the first-order difference of IoT mass data
In formula (5), A represents the total number of time series.
From this, the interval intuitionistic fuzzy number used to describe the
In the same way, the interval intuitionistic fuzzy numbers of all IoT mass data are obtained, and a time series
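Only the first step of the time-series construction, the first-order difference and the absolute change rate, is stated explicitly in the text, so the sketch below is limited to it. As an additional assumption, it groups consecutive change rates into [min, max] interval numbers over a sliding window. It does not build the interval intuitionistic fuzzy numbers themselves, because the formulas defining them are missing here; the function name and the window parameter are hypothetical.

```python
import numpy as np

def change_features(x, window: int = 3):
    """Hypothetical helper: first-order difference, absolute change rate,
    and [min, max] interval numbers over sliding windows of a 1-D series."""
    x = np.asarray(x, dtype=float)
    # First-order difference x_t - x_{t-1}.
    diff = np.diff(x)
    # Absolute change rate |x_t - x_{t-1}| / |x_{t-1}|; assumes no zero values in x.
    rate = np.abs(diff) / np.abs(x[:-1])
    # Interval number [lower, upper] for each window of consecutive change rates.
    intervals = [(rate[i:i + window].min(), rate[i:i + window].max())
                 for i in range(len(rate) - window + 1)]
    return diff, rate, intervals
```

For the series [10, 12, 9, 9, 18], the change rates are 0.2, 0.25, 0, and 1, and a window of 2 gives the intervals [0.2, 0.25], [0, 0.25], and [0, 1].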
Construction of feature vector space model for mass data of Internet of things
After preprocessing, feature items of the mass IoT data are selected to build the feature vector space model, from which the feature vectors of the data are formed.
A bit is the smallest unit of data in a computer, and each bit is treated as a feature item of the mass IoT data. After feature statistics are computed over the data set, any feature item that does not appear in any sample is removed from the feature vector, which reduces its dimensionality.
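The bit-level feature items and the removal of absent features can be illustrated with NumPy. This is a minimal sketch assuming each data record is a fixed-length row of bytes; the helper names are hypothetical, and `np.unpackbits` is used simply to expose one 0/1 feature per bit.

```python
import numpy as np

def bytes_to_bit_features(records: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one 0/1 feature column per bit of each byte record."""
    # Each row is one record of uint8 bytes; unpack most-significant bit first.
    return np.unpackbits(records.astype(np.uint8), axis=1)

def drop_absent_features(X: np.ndarray):
    """Hypothetical helper: remove feature columns that are zero in every sample."""
    # True where the feature appears in at least one sample.
    present = (X != 0).any(axis=0)
    return X[:, present], np.flatnonzero(present)
```

For two one-byte records 3 (00000011) and 5 (00000101), only bit positions 5, 6, and 7 ever appear, so the eight bit features shrink to three.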
Assume that the set of z IoT mass data to be classified is
The weight of each feature item in the mass IoT data is calculated with the TF-IDF scheme:
In formula (8),
In formula (9),
Based on the above process, the feature vector of the mass data of the Internet of Things is formed.
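Because formulas (8) and (9) are missing, the sketch below assumes the common TF-IDF variant w = tf × log(z / df). Here tf is the count of a feature item normalised by the total count in its data item, and df is the number of the z data items that contain it. Other normalisations are also common, and the paper's exact choice may differ; the function name is hypothetical.

```python
import numpy as np

def tfidf_weights(counts: np.ndarray) -> np.ndarray:
    """Hypothetical helper. counts[d, t]: occurrences of feature item t in data item d (z rows)."""
    z = counts.shape[0]
    # Term frequency: counts normalised by the total count of each data item.
    row_totals = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, row_totals,
                   out=np.zeros(counts.shape, dtype=float), where=row_totals > 0)
    # Document frequency: number of data items in which each feature item appears.
    df = (counts > 0).sum(axis=0)
    # Inverse document frequency; max(df, 1) guards against features that never occur.
    idf = np.log(z / np.maximum(df, 1))
    return tf * idf
```

A feature item present in every data item receives weight 0, because log(z / z) = 0; such items carry no information for separating classes.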
Classification of Internet of things mass data based on fuzzy clustering maximum tree algorithm
After the feature vector space model of IoT mass data is constructed, z IoT mass data sets
The fuzzy similarity matrix for establishing the mass data set
This step is called calibration; its key task is to determine the similarity coefficient that represents the similarity between the IoT mass data set
For the determination of
To improve the accuracy of the similarity measurement, the geometric-arithmetic mean method is used:
The connection between any two nodes
And the weight of the edge
The resulting fuzzy graph is expressed as:
The construction steps of the fuzzy clustering maximum tree algorithm are as follows:
Step 1: Initialize the weight set ϵ, the node set θ, and the set ϑ of edges connecting two nodes in the IoT mass data as empty sets;
Step 2: Find the edge
Step 3: Put the weight
Step 4: Check the weights of all edges that connect a node in θ to a node outside θ, and find the edge with the largest weight
Step 5: End. At this point, the edges in ϑ connecting pairs of nodes in the IoT mass data constitute the maximum fuzzy clustering tree of M.
A threshold μ is selected to take a horizontal cut of the maximum fuzzy clustering tree: every tree edge whose weight is less than μ is disconnected. This yields a forest of subtrees in which the nodes of each subtree form one class, so the number of subtrees equals the number of classes. Through these steps, the mass IoT data are classified.
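The calibration, the maximum tree construction of Steps 1 to 5, and the threshold cut can be sketched end to end as follows. This is a hedged illustration, not the authors' code. The exact similarity formula is missing from the text, so the sketch assumes the geometric-arithmetic mean method means the ratio Σₖ √(xᵢₖ xⱼₖ) / Σₖ (xᵢₖ + xⱼₖ)/2 for non-negative features. The tree is grown Prim-style from node 0, which is what Steps 1 to 5 describe, and the cut uses a union-find structure to collect the subtrees. All function names are hypothetical.

```python
import numpy as np

def similarity_matrix(X) -> np.ndarray:
    """Assumed geometric/arithmetic-mean similarity for non-negative rows of X."""
    X = np.asarray(X, dtype=float)
    # Sum over features k of the geometric means sqrt(x_ik * x_jk).
    geo = np.sqrt(X[:, None, :] * X[None, :, :]).sum(axis=2)
    # Sum over features k of the arithmetic means (x_ik + x_jk) / 2.
    ari = 0.5 * (X[:, None, :] + X[None, :, :]).sum(axis=2)
    # AM-GM keeps every ratio in [0, 1]; each item is fully similar to itself.
    R = np.divide(geo, ari, out=np.zeros_like(geo), where=ari > 0)
    np.fill_diagonal(R, 1.0)
    return R

def maximum_tree(R: np.ndarray):
    """Prim-style maximum spanning tree of the fuzzy graph with edge weights R[i, j]."""
    n = R.shape[0]
    in_tree = {0}   # node set (theta in the text), seeded with node 0
    edges = []      # tree edges (vartheta in the text) as (i, j, weight)
    while len(in_tree) < n:
        best = None
        # Heaviest edge joining a tree node i to a node j outside the tree.
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or R[i, j] > best[2]):
                    best = (i, j, float(R[i, j]))
        edges.append(best)
        in_tree.add(best[1])
    return edges

def cut_tree(n: int, edges, mu: float):
    """Horizontal cut: drop tree edges with weight < mu; each subtree is one class."""
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j, w in edges:
        if w >= mu:
            parent[find(i)] = find(j)
    # Relabel subtree roots as consecutive class ids 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(a), len(roots)) for a in range(n)]
```

For example, `cut_tree(len(X), maximum_tree(similarity_matrix(X)), mu)` returns one class label per data item, and raising μ splits the forest into more classes. The quadratic-memory similarity computation and the O(n³) tree loop are chosen for readability and would need a more scalable formulation for genuinely large data sets.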
Experimental analysis
To verify the effectiveness of the fast classification method for IoT mass data based on the fuzzy clustering maximum tree algorithm, the experiments were run on the Linux CentOS operating system in a Python IDE, with code written using the NumPy package, on hardware with an Intel i5 2.8 GHz CPU and 8 GB of memory.
Experimental data
The experimental data come from the Function data set, a simulated data set whose attributes include age, salary, vocation, and elevel. The mass IoT data volume is set to 5000 GB, and the mass IoT data are then classified.
Experimental protocols and indicators
Classification recall, classification root mean square error, and classification time were used as performance indicators. The proposed method was compared with the methods of reference [10] and reference [6] to verify its effectiveness.
Classification recall: the proportion of actual positive IoT mass data samples that are correctly classified as positive. The higher the recall, the better the classification effect of the method. It is calculated as:
In formula (18),
Root mean square error of classification: the smaller the root mean square error, the higher the classification accuracy of the method on mass IoT data. It is calculated as:
In formula (19), ρ is the number of classified IoT mass data items, and
Classification time: classification time refers to the time consumed by different methods to complete the classification of mass data. The shorter the classification time, the higher the classification efficiency of the method.
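The three indicators can be computed as below. Formulas (18) and (19) are missing from this text, so the sketch assumes the standard definitions: recall = TP / (TP + FN) for a chosen positive class, and RMSE = √((1/ρ) Σ (yᵢ - ŷᵢ)²) over the ρ classified items, with yᵢ the true and ŷᵢ the predicted value. Classification time is measured as wall-clock milliseconds around a call. All function names are hypothetical.

```python
import time
import numpy as np

def recall(y_true, y_pred, positive=1) -> float:
    """Share of actual positives that were classified as positive: TP / (TP + FN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    actual_pos = y_true == positive
    tp = np.sum(actual_pos & (y_pred == positive))
    return float(tp / actual_pos.sum()) if actual_pos.any() else 0.0

def rmse(y_true, y_pred) -> float:
    """Root mean square error over the rho classified items."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

With true labels [1, 1, 0, 1, 0] and predictions [1, 0, 0, 1, 1], two of the three actual positives are recovered, so the recall is 2/3. For multi-class output such as the clusters produced by the maximum tree cut, recall would be computed per class and then averaged.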
Mass data classification recall rate of Internet of things
To verify the classification effect of the proposed method on mass IoT data, classification recall is used as the evaluation index. The proposed method is compared with the methods of reference [10] and reference [6], and the recall of each method on mass IoT data is shown in Fig. 1.

Comparison of recall rates of IoT mass data classification by different methods.
Fig. 1 shows that when the volume of IoT mass data is 5000 GB, the average classification recall is 88.3% for the method of reference [10], 79.2% for the method of reference [6], and 97.5% for the proposed method. The proposed method therefore achieves the highest recall of the three, indicating a good classification effect on mass IoT data.
On this basis, the accuracy of the proposed method is verified using the classification root mean square error as the evaluation index. The root mean square errors of the methods of reference [10], reference [6], and the proposed method on mass IoT data are compared in Table 1.
Comparison results of the root mean square error value of IoT mass data classification by different methods
According to Table 1, when the volume of IoT mass data reaches 5000 GB, the average classification root mean square error is 0.054 for the method of reference [10], 0.069 for the method of reference [6], and only 0.030 for the proposed method. Compared with the two reference methods, the proposed method thus has a smaller root mean square error, indicating higher classification accuracy on mass IoT data.
The classification time of the proposed method on mass IoT data is then examined. The classification times of the methods of reference [10], reference [6], and the proposed method are compared in Table 2.
Comparison results of IoT mass data classification time by different methods
According to Table 2, the classification time of every method increases with the volume of IoT mass data. At a volume of 5000 GB, the classification time is 19.5 ms for the method of reference [10], 24.7 ms for the method of reference [6], and only 12.3 ms for the proposed method. The proposed method therefore shortens the classification time of mass IoT data.
Conclusion
This paper proposes a fast classification method for mass IoT data based on the fuzzy clustering maximum tree algorithm. The mass IoT data are first preprocessed, feature items are then selected to construct the feature vector space model, and the fuzzy clustering maximum tree algorithm is finally used to perform fuzzy clustering analysis on the data set, thereby classifying the mass IoT data. The experiments lead to the following conclusions:
(1) The average classification recall of the proposed method on mass IoT data reaches 97.5%, indicating a good classification effect.
(2) The classification root mean square error of the proposed method is only 0.030, indicating that it effectively improves the classification accuracy of mass IoT data.
(3) The classification time of the proposed method is only 12.3 ms, indicating that it shortens the classification time of mass IoT data.
Acknowledgements
This paper is supported by the Soft Science Research Project of Henan Province, 'Research on the implementation of the "three evaluations" reform in Henan Province under the background of scientific and technological system reform' (Grant No. 222400410238).
