Abstract
To overcome the low mining integrity and long mining time of traditional power big data mining methods, this paper realizes multi-dimensional power big data mining by improving the grey clustering algorithm. Firstly, a relay multi-hop network is established and power big data is collected through the collector. Secondly, the Lagrange interpolation method is used to fill the missing power data, and the power consumption data is standardized. Finally, multi-dimensional power big data mining is realized according to grey theory and the FCM clustering algorithm. The experimental results show that the integrity of power big data mining with this method reaches up to 0.996, the mining time does not exceed 3.05 s, and the mining completeness reaches up to 0.992, which indicates that this method can effectively improve the effect of power big data mining.
Keywords
Introduction
As intelligent technology is increasingly applied to power grid construction, four types of data center platforms have been built: massive historical and quasi-real-time data, unstructured data, structured data, and grid spatial data. These platforms have accumulated rich big data resources and can effectively support data analysis and the construction of decision-making systems. Multi-dimensional power big data mining extracts the required power data from the central platform and uses the mined data as reference data to improve the stability of power grid operation [8]. Traditional data management methods analyze the acquired data only superficially and cannot help power enterprises obtain valuable and meaningful information from it, which wastes a great deal of power grid resources [1,14]. It is therefore urgent to use data mining technology to mine and analyze this huge volume of power data in depth, and to obtain power consumption information that can guide the operation and management of power enterprises.
Reference [6] proposes a power big data mining method based on GRU-MMD. The feature difference of power data is obtained through the maximum mean difference method, and the GRU-MMD method is then used to realize power data mining. This method can improve the integrity of data mining, but its mining efficiency is poor. Reference [15] proposes a power data mining algorithm based on clustering analysis. It obtains the attribute characteristics of power data through data clustering, determines the differences between these attribute characteristics according to the dissimilarity matrix, divides the attribute characteristics through the analytic hierarchy process, calculates their confidence, and obtains the final power data mining result. This method can effectively improve the integrity of data mining, but the mining algorithm is complex and the mining process is long. Reference [4] proposes an intelligent mining method for power big data based on web crawler technology. It uses web crawler technology to crawl power grid big data, obtains feature clusters of the data according to web crawler clustering, sorts the data according to the decision tree method, and controls the transmission of mined data according to mining permission matching. This method can effectively improve the integrity of data mining, but the efficiency of power big data mining is low.
In view of the above problems, this paper proposes a multi-dimensional power big data mining method based on an improved grey clustering algorithm. The specific research ideas of this paper are as follows:
First, power big data acquisition. A relay multi-hop network is established, the power data read by the meter nodes is collected through the collector, and the data is sent to the server through 4G/5G, Ethernet and other remote communication networks.
Secondly, multi-dimensional power big data feature processing. The locations of the missing power data are obtained, the missing multi-dimensional power consumption data is filled according to the Lagrange interpolation method, and the interpolated power consumption data is normalized through a linear transformation, which completes the missing value processing of the multi-dimensional power big data features.
Then, multi-dimensional power big data mining. The multi-dimensional power big data features are mined according to the FCM clustering algorithm, the multi-dimensional power big data mining function is built, the relevant parameters are calculated by the least square method, and grey theory is used to improve the mining results, which completes the multi-dimensional power big data mining.
Finally, the effect of multi-dimensional power big data mining is verified using three indicators (data mining integrity, data mining time and data mining completeness), and a conclusion is drawn.
Multi-dimensional power big data processing
Multi-dimensional power big data collection
As the central link of power data collection, the collector node needs to establish a relay multi-hop network to communicate with all the meter nodes in its network and obtain the power data they read. It also needs to communicate with the server through 4G/5G and other remote communication networks to receive the instructions issued by the server or to forward the power data to the server [7]. Its hardware structure is shown in Fig. 1.

Fig. 1. Hardware block diagram of collector node.
The data obtained from the LoRa communication network is transmitted to the collector base table for storage and processing. The MCU module also uses an STM32F103 series chip as the main control chip to support and control the data processing, sending, receiving and other functions of the whole LoRa module. The RF module uses the SX1278 chip, which mainly supports the sending and receiving functions of LoRa wireless communication. It is the core part of LoRa wireless packet sending and receiving.
The main workflow of the collector is as follows: with the support of the RF module and the MCU module, it establishes a relay multi-hop network, collects the power data read by all meter nodes in the network, processes and saves the data, and sends it to the server through 4G/5G, Ethernet and other remote communication networks [5]. The collector therefore differs somewhat from the meter node in terms of workflow: it needs to communicate not only with the meter nodes under it through the LoRa communication network, but also with the server through an uplink. The uplink communication module is pluggable. The shell holds communication module components, which can be 4G, WiFi, etc., but only one can be used at a time. Because the working voltage of these modules may differ, a dedicated power chip on the circuit supplies power to the module, and its output level can be adjusted.
When user electricity data is extracted from the power user electricity information collection system, human or natural factors can leave missing values in the collected data. If these missing data are deleted directly, the power consumption information of users becomes incomplete. This directly affects how the anomaly detection model judges the user's power consumption behavior, and can even lead to missed judgments and misjudgments. Therefore, the missing data in the collected data must be handled [10]. Because the Lagrange interpolation method has a compact structure and is convenient to calculate, this paper uses it to interpolate and fill the missing user electricity data. The specific steps are as follows:
Firstly, the location of each missing value in the collected multi-dimensional power big data is extracted, and the five data points before and the five data points after that location are taken to form a multi-dimensional power big data group of 10 data points. The missing value is then interpolated and filled according to the Lagrange interpolation method. The specific formulas are shown in formula (1) and formula (2):
Among them, x represents the serial number of missing load data of multi-dimensional power data,
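The filling step can be illustrated with a minimal Python sketch. It assumes the standard Lagrange form, in which the interpolated value at position x is the sum of each known value weighted by the product of (x - x_j)/(x_i - x_j) over the other known positions. The function names `lagrange_value` and `fill_missing`, the use of NaN to mark a missing reading, and the sample series are illustrative assumptions, not taken from the paper.

```python
import math


def lagrange_value(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def fill_missing(series, k=5):
    """Fill NaN entries using up to k valid neighbours on each side of the gap."""
    filled = list(series)
    for idx, value in enumerate(series):
        if not math.isnan(value):
            continue
        # Collect up to k valid positions before and after the gap (original values only).
        before = [i for i in range(idx - 1, -1, -1) if not math.isnan(series[i])][:k]
        after = [i for i in range(idx + 1, len(series)) if not math.isnan(series[i])][:k]
        xs = sorted(before + after)
        if not xs:
            continue  # nothing to interpolate from, leave the gap as it is
        ys = [series[i] for i in xs]
        filled[idx] = lagrange_value(xs, ys, idx)
    return filled


# Hypothetical load readings following y = 2x + 1, with position 5 missing.
readings = [2.0 * i + 1.0 for i in range(11)]
readings[5] = float("nan")
print(fill_missing(readings)[5])
```

With five neighbours on each side the polynomial has degree nine, so the sketch reproduces smooth trends well but can oscillate on noisy data; that trade-off is a property of Lagrange interpolation in general rather than of this particular code.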
Different power consumption indicators usually have different dimensions and orders of magnitude. When the level difference between indicators is great, directly using the original data for power consumption anomaly detection and analysis distorts the results, so the data must be standardized first [9,11,16].
There are three common methods for data standardization: Z-score standardization, log function standardization, and min-max standardization. In this paper, the min-max standardization method is used to normalize the collected power consumption data. That is, the power consumption data obtained after missing data interpolation is linearly transformed so that the processed power consumption data lies between 0 and 1. The specific calculation formula is as follows:
Among them,
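A minimal sketch of min-max standardization, assuming the usual linear transformation (x - min) / (max - min). The function name `min_max_normalize`, the handling of a constant series and the sample values are illustrative assumptions.

```python
def min_max_normalize(values):
    """Linearly rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All samples are equal, so the range is zero; mapping them to 0.0
        # is an assumed convention, not something the text specifies.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily consumption values in kWh.
consumption = [12.0, 18.0, 15.0, 30.0]
print(min_max_normalize(consumption))
```

Because the minimum and maximum are taken from the data itself, a single extreme reading compresses all other values toward 0, which is one reason the missing values are filled before this step.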
In summary, to obtain a large amount of multi-dimensional power big data, the collection, storage and transmission of power data are completed through data collectors and communication networks. To improve the mining accuracy of multi-dimensional power big data, the Lagrange interpolation method is used to fill the missing power data, and the multi-dimensional power data feature processing is completed through a linear transformation.
Multi-dimensional power big data improved grey clustering mining
Multi-dimensional power big data clustering mining
The FCM clustering algorithm is a flexible fuzzy partitioning method. Building on the hard partitioning of C-means (HCM), the objective function of multi-dimensional power big data clustering mining is:
Where,
Membership
Calculation formula of cluster center
The steps of FCM clustering algorithm are:
Input the multi-dimensional power big data clustering number c and the iteration termination condition; initialize the cluster centers; calculate the membership matrix; if
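The iteration can be sketched in plain Python as follows. The sketch assumes the standard FCM update rules: memberships proportional to the inverse squared distance raised to the power 1/(m - 1), and centers computed as membership-weighted means. The farthest-first initialisation, the fuzzifier m = 2, the stopping threshold and the sample points are all assumptions made for illustration, since the text does not specify them.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, c):
    """Farthest-first initialisation (an assumed choice, not from the text)."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(sq_dist(p, q) for q in centers)))
    return centers


def memberships(points, centers, m):
    """u[i][j] is the degree to which point j belongs to cluster i."""
    power = 1.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for j, p in enumerate(points):
        d2 = [sq_dist(p, c) for c in centers]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:
            # The point coincides with a center: share full membership there.
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        inv = [(1.0 / d) ** power for d in d2]
        total = sum(inv)
        for i in range(len(centers)):
            u[i][j] = inv[i] / total
    return u


def update_centers(points, u, m):
    """Each center is the mean of all points weighted by u ** m."""
    dim = len(points[0])
    centers = []
    for row in u:
        w = [x ** m for x in row]
        total = sum(w)
        centers.append(tuple(sum(wj * p[d] for wj, p in zip(w, points)) / total
                             for d in range(dim)))
    return centers


def fcm(points, c, m=2.0, eps=1e-10, max_iter=200):
    """Alternate membership and center updates until the centers stop moving."""
    centers = init_centers(points, c)
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:  # termination condition on squared center movement
            break
    return centers, memberships(points, centers, m)


# Hypothetical normalized feature vectors forming two groups.
samples = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
centers, u = fcm(samples, 2)
print([tuple(round(v, 2) for v in c) for c in centers])
```

Unlike hard C-means, every point keeps a nonzero membership in every cluster, so points between groups influence several centers at once.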
Grey theory improvement of multi-dimensional power big data mining
A grey system is a system with grey properties, that is, one whose data indicators cannot fully reflect the underlying data information or carry uncertainty. By incorporating grey theory, the FCM clustering algorithm is improved so that power data can be mined effectively [13]. The improvement process of the power data mining function is as follows:
The power equipment collects
In the above formula,
The original form of
Where:
Build an approximate differential equation to solve the original form of multi-dimensional power big data gray mining function:
In the above formula, a represents the development coefficient of the grey mining function [12], that is, the development trend of the multi-dimensional power big data sequence; b represents the grey action of the grey mining function, that is, the relationship between changes in the multi-dimensional power big data; t represents the time utilization factor
According to equations (11) and (12), the time response function
The time response sequence of
At this time, the final mining result
The solution of equation (15) is the final mining result of multi-dimensional power big data. This completes the multi-dimensional power big data mining based on the improved grey clustering algorithm.
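The terms used above (development coefficient a, grey action b, time response function, time response sequence) match the standard GM(1,1) grey model. Assuming that is the form intended, a minimal sketch is given below: the series is accumulated, a and b are estimated by least squares, the time response is evaluated, and the result is restored by differencing. The function names `gm11` and `solve_2x2` and the sample series are illustrative, and the coupling with the FCM result is not reproduced here.

```python
import math


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def gm11(x0, horizon=0):
    """Fit GM(1,1) to x0 and return (a, b, fitted values plus forecasts)."""
    n = len(x0)
    # Accumulated generating sequence x1 (1-AGO).
    x1, running = [], 0.0
    for v in x0:
        running += v
        x1.append(running)
    # Background values: mean of consecutive accumulated terms.
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0(k) = p * z1(k) + b with p = -a, via normal equations.
    s_zz = sum(z * z for z in z1)
    s_z = sum(z1)
    s_zy = sum(z * v for z, v in zip(z1, y))
    s_y = sum(y)
    p, b = solve_2x2(s_zz, s_z, s_z, n - 1, s_zy, s_y)
    a = -p  # the sketch assumes a != 0, i.e. the series is not constant

    def x1_hat(k):
        # Time response of the whitening equation dx1/dt + a * x1 = b.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Restore the original scale by first-order differencing.
    x0_hat = [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + horizon)]
    return a, b, x0_hat


# Hypothetical positive series growing by 10% per step.
series = [100.0 * 1.1 ** k for k in range(6)]
a, b, fitted = gm11(series, horizon=1)
print(round(a, 4), round(b, 2), [round(v, 1) for v in fitted])
```

A negative a corresponds to a growing sequence and a positive a to a decaying one, which is why a is read as the development trend of the data.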
Experiment
The above process completes the theoretical research of multi-dimensional power big data mining. In order to verify the practical application performance of this method, the performance test of the method is carried out.
Experimental data
To meet the experimental requirements of multi-dimensional power big data mining, the multi-dimensional power big data used in this experiment comes from the publicly available data of a city's power grid, including power generation data, power consumption data, power loss data, electricity price data and many other kinds of power big data. The overall data volume reaches 250 GB.
This section uses the MATLAB platform to verify the feasibility and effectiveness of the multi-dimensional power big data mining method described above. The YALMIP toolbox is used to solve the multi-dimensional power big data mining problem. The improved grey clustering algorithm is implemented under the TensorFlow 2.0 framework in a Python 3.7 environment.
Experimental scheme
To fully verify the mining performance of the proposed method, the experimental scheme was fixed before the experiment started. The method in this paper is compared with the methods in reference [6] and reference [15] on three indicators: data mining integrity, data mining time and data mining completeness.
Experimental result
Integrity of power big data mining
To verify the mining effect of this method on power big data, the GRU-MMD method (reference [6] method), the cluster analysis method (reference [15] method) and this method are used to obtain the integrity of power big data mining. The results are shown in Fig. 2.

Fig. 2. Integrity of power big data mining.
According to the results in Fig. 2, when the amount of power data to be mined is 50 GB, the mining integrity is 0.265 for the reference [6] method, 0.200 for the reference [15] method, and 0.950 for this method. When the amount is 100 GB, the integrity is 0.432, 0.351 and 0.953, respectively. When the amount is 300 GB, the integrity is 0.682, 0.478 and 0.982, respectively. At each of these data volumes this method achieves the highest integrity, which shows that it can effectively improve the integrity and the effect of power big data mining.
To verify the mining efficiency of this method for power big data, the reference [6] method, the reference [15] method and this method are used to obtain the time consumption of power big data mining. The results are shown in Table 1.
Table 1. Time consumption of power big data mining with different methods (s)
According to Table 1, when the amount of power data to be mined is 50 GB, the mining time is 16.65 s for the reference [6] method, 8.92 s for the reference [15] method, and 0.62 s for this method. When the amount increases to 200 GB, the mining time is 88.25 s, 69.25 s and 2.32 s, respectively. When the amount is 250 GB, the mining time is 129.92 s, 99.63 s and 3.05 s, respectively. This method always takes the least time of the three, which shows that the efficiency of power big data mining is effectively improved by the improved grey clustering algorithm.
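The size of the gap can be made concrete by dividing each comparison method's time by the time of this method. The short sketch below uses only the values quoted in the text; the dictionary keys `ref6`, `ref15` and `proposed` are illustrative labels.

```python
# Mining times (s) quoted in the text for three data volumes (GB).
times = {
    50: {"ref6": 16.65, "ref15": 8.92, "proposed": 0.62},
    200: {"ref6": 88.25, "ref15": 69.25, "proposed": 2.32},
    250: {"ref6": 129.92, "ref15": 99.63, "proposed": 3.05},
}


def speedups(row):
    """Ratio of each comparison method's time to the proposed method's time."""
    return {name: t / row["proposed"] for name, t in row.items() if name != "proposed"}


for volume, row in sorted(times.items()):
    ratios = speedups(row)
    print(volume, {name: round(r, 1) for name, r in ratios.items()})
```

For the quoted values every ratio is above 14, and the ratios are larger at 200 GB and 250 GB than at 50 GB.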
The completeness of power big data mining is compared for this method, the reference [6] method and the reference [15] method. The comparison results are shown in Table 2.
Table 2. Comparison of completeness of power big data mining with different methods
The results in Table 2 show that the power big data mining completeness of the reference [6] method is between 0.701 and 0.826, that of the reference [15] method is between 0.688 and 0.806, and that of this method is between 0.969 and 0.992. Compared with the two comparison methods, the completeness of this method is closer to 1, which shows that it analyzes the collected power big data more completely. This is because this method uses Lagrange interpolation to fill the missing power consumption data, which effectively avoids the problem of missing information in power big data mining and improves the completeness of power big data mining.
This paper presents a multi-dimensional power big data mining method based on an improved grey clustering algorithm. The method collects power big data through the collector, fills in the missing power consumption data by interpolation, mines the characteristics of multi-dimensional power big data according to the FCM clustering algorithm, constructs the objective function of power big data mining, and obtains the final mining results of multi-dimensional power big data according to the improved grey clustering algorithm. The following conclusions are drawn from the experiments:
The integrity of power big data mining with this method reaches up to 0.996, which shows that this method can effectively improve the effect of power big data mining.
The power big data mining time of this method never exceeds 3.05 s, which shows that the efficiency of power big data mining is effectively improved by the improved grey clustering algorithm.
The completeness of power big data mining with this method reaches up to 0.992, which shows that this method effectively avoids the loss of information in power big data mining and improves the effect of power big data mining.
