Application of cloud computing in power security monitoring

Abstract

This study analyzes the application of cloud computing to power security monitoring. First, cloud computing technology, the open source Hadoop platform, and the HDFS distributed file system are introduced. Second, building on the Gap Statistic Algorithm (GSA) for adverse data identification in power systems, a cloud-based optimization scheme for GSA is designed. Finally, the cloud-based implementation process and the map and reduce processes of GSA are described. The results show that applying cloud computing to the power system can improve the computing and storage capacity available for power grid data and improve state estimation of the power system. Studying cloud-based adverse data identification algorithms for power systems therefore has practical engineering significance and conforms to the development demands of power system informatization.

Keywords

Cloud computing, power system, GSA, Hadoop platform, HDFS

1. Introduction

The operating state of power equipment is classified as normal, abnormal, or failed [1]. State monitoring of power equipment uses sensor technology and computer technology to monitor the running state of equipment in real time. By collecting the operating state parameters of the equipment, it judges the equipment state so that the equipment can be diagnosed and maintained before an abnormal state or failure occurs, ensuring the safe and stable operation of the power grid [2]. State monitoring of power equipment is mainly divided into online monitoring and offline monitoring [3, 4]. Online monitoring uses monitoring equipment to record the running state of the equipment in real time. The online data collected have obvious dynamic time sequence characteristics and mainly include pulsed quantities such as temperature, voltage, and current [5, 6]. Offline monitoring is an important supplement to online monitoring. Offline data are stored statically in the database of the power company and remain unchanged during equipment status monitoring. They mainly include configuration data from substation installation, equipment specifications, and fault diagnosis and maintenance records [7, 8].

Cloud computing is an Internet-based technology that virtualizes resources to provide dynamically scalable storage and computing services. It results from the combined development of virtualization technology, cluster technology, distributed computing, and parallel computing [9, 10, 11]. Cloud computing virtualizes distributed computing and storage resources and provides users with very large-scale data processing and storage services through unified management and scheduling [12, 13]. It is a development and extension of distributed computing, parallel computing, and grid computing, and its rapid development provides new solutions for data sharing and processing in the power system [14, 15]. Introducing cloud computing into the power system can integrate the data and computing resources distributed across different locations without changing the infrastructure, so that they work together and greatly improve the data analysis capability of the power grid. This has important research value for information integration in the smart grid environment.

2. Methodology

2.1 Cloud computing technology

Cloud computing is an emerging business computing model. It distributes computing tasks across a resource pool composed of a large number of computers, so that application systems can obtain computing power, storage space, and software services on demand. Because the location where a computation runs is uncertain, much like the position of an electron, this resource pool is called a “cloud”. Cloud computing centralizes all computing resources under automatic management and hides its operational details from users. This enables application providers to focus on their own business, which is conducive to innovation and cost reduction. Cloud computing can be roughly divided into three types of services: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS), as shown in Fig. 1.

Figure 1. Service types of cloud computing.

2.2 Open source Hadoop platform

Hadoop is a distributed computing platform developed by the Apache Software Foundation and modeled on Google's cloud computing platform. It is a distributed system focused on the storage and processing of massive data. The Hadoop framework is shown in Fig. 2; its core components are MapReduce and the Hadoop Distributed File System (HDFS). HDFS provides the underlying support for distributed computing and storage.

Figure 2. Technical architecture of Hadoop.

2.3 HDFS distributed file system

HDFS is an open source implementation of the Google File System (GFS), a file system that reliably stores large data sets on large clusters. HDFS is highly fault tolerant and can be deployed on inexpensive machines to support large-scale data set applications, providing high-throughput data access while reducing maintenance costs. HDFS adopts a master/slave architecture with three main components: NameNode, DataNode, and Client.

An HDFS file system consists of one NameNode and multiple DataNodes. The NameNode is a scheduling hub that manages the namespace of the file system, cluster configuration information, replication of storage blocks, and client access to files. HDFS divides a file into one or more data blocks that are stored on a set of DataNodes. The NameNode performs namespace operations such as opening, closing, and renaming files and directories. It is also responsible for determining the mapping of each data block to specific DataNodes.

The DataNode handles read and write requests from file system clients. It creates, deletes, and replicates data blocks under the unified scheduling of the NameNode. The Client is an application that accesses files on HDFS. HDFS is developed in Java, so any Java-enabled machine can run a NameNode or DataNode. Both are designed to run on ordinary commodity machines, and the single-NameNode structure simplifies the system architecture.
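The division of labor between the NameNode (metadata and block-to-node mapping) and the DataNodes (block storage) can be illustrated with a toy model. The following is a minimal sketch only: the class `ToyNameNode`, the 4-byte block size, the file path, and the round-robin placement are all assumptions made for illustration and do not reproduce the actual HDFS placement policy or API.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; tiny hypothetical value so the example stays readable


@dataclass
class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""
    datanodes: list
    replication: int = 2
    block_map: dict = field(default_factory=dict)  # (path, block index) -> node names

    def add_file(self, path: str, data: bytes) -> dict:
        """Split data into fixed-size blocks and assign each block to DataNodes."""
        stored = {name: {} for name in self.datanodes}
        n_blocks = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
        for b in range(n_blocks):
            chunk = data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
            # Round-robin placement: an illustrative policy, not the real HDFS one.
            targets = [self.datanodes[(b + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            self.block_map[(path, b)] = targets
            for name in targets:
                stored[name][(path, b)] = chunk
        return stored

    def locate(self, path: str) -> list:
        """Return the DataNodes holding each block of a file, in block order."""
        keys = sorted(k for k in self.block_map if k[0] == path)
        return [self.block_map[k] for k in keys]


if __name__ == "__main__":
    nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3"])
    nn.add_file("/grid/measurements.csv", b"0123456789")
    print(nn.locate("/grid/measurements.csv"))
```

In this sketch a client would ask the NameNode only for block locations and then read the block contents from the listed DataNodes, which mirrors the separation of metadata and data described above.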

3. Results and discussion

3.1 GSA optimization scheme based on cloud computing

GSA is an algorithm used to determine the optimal number of clusters $k$. It compares the clustering dispersion of the data set under test with the expected clustering dispersion of a reference data set. The cloud-based strategy of the GSA algorithm is shown in Fig. 3.
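For reference, the comparison that GSA performs can be written out in the standard formulation of the gap statistic. This formulation is not restated in the present text, so the symbols $C_r$ (the $r$th cluster), $n_r$ (its size), $B$ (the number of reference data sets), $W_{kb}^{*}$ (the dispersion of the $b$th reference data set), and $\mathrm{sd}_k$ (the standard deviation of $\ln W_{kb}^{*}$ over the reference sets) are assumptions introduced here.

$$
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{x,\, x' \in C_r} \lVert x - x' \rVert^{2}
$$

$$
\text{gap}(k) = E(\ln W_k) - \ln W_k \approx \frac{1}{B} \sum_{b=1}^{B} \ln W_{kb}^{*} - \ln W_k
$$

$$
s(k) = \mathrm{sd}_k \sqrt{1 + \frac{1}{B}}
$$

Under this formulation, the estimated number of clusters is the smallest $k$ for which $\text{gap}(k) \geqslant \text{gap}(k+1) - s(k+1)$.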

Figure 3. Cloud computing implementation idea of GSA.

Referring to Fig. 3, the parallel nature of the maximum and minimum distance method is used to design and optimize the cloud-based GSA scheme as follows.

Firstly, the data to be detected are divided once; the data blocks are denoted $\{\text{split}_{i}\}$, where $i$ is the number of data blocks.

Secondly, the GSA algorithm is executed on the $i$ compute nodes for each value of $k$ ($k = 1, 2, \ldots, n$). When $k \leqslant 2$, the reference data file is called and the sum of squares of Euclidean distances in the reference data set at $k = 1$ and $k = 2$ is calculated to obtain $E(\ln W)$. When $k > 2$, the reference data file is no longer called.

Thirdly, each compute node selects the initial cluster centers. The first cluster center $Z_{1}$ is randomly selected and stored. The data are divided into blocks a second time, denoted $\{\text{split}_{ij}\}$, and assigned to $j$ nodes for comparison; each node outputs the minimum distance between each data object and the cluster center. The results of the $j$ nodes are merged, and the data object with the maximum output distance is taken as the second cluster center $Z_{2}$ and stored in the cluster center file together with $Z_{1}$. If $k > 2$, the comparison continues on the nodes: each node outputs the maximum of the minimum distances between its data objects and the existing cluster centers, the results of the $j$ nodes are merged, and the data object corresponding to the maximum of the $j$ results is selected as the next cluster center and stored; otherwise, selection ends. This step is repeated until the number of cluster centers is $k$. The initial cluster centers are stored for use by the iterative process.

Fourthly, when the compute node reaches the iterative part of the GSA algorithm and $k \geqslant 2$ (if $k = 1$, there is no iterative process), the data on each of the $i$ nodes are divided into $j$ blocks $\{\text{split}_{ij}\}$, which are allocated to $j$ compute nodes to perform the classification algorithm. After each iteration, the cluster centers are recalculated from the classification results. The classification results and the new cluster centers are stored on the distributed file system for use in the next iteration, until the cluster centers no longer change.

Fifthly, the clustering result of $\{\text{split}_{ij}\}$ from the previous step is returned to the $i$th node to calculate $\ln W_{k}$ of the data under test for the current value of $k$.

Sixthly, on the $i$ nodes, when $k = 1$, it is judged whether $\text{gap}(1) \geqslant \text{gap}(2) - s(2)$ holds. If it does, the optimal number of clusters is 1, there is no adverse data, and the result is output. If not, $k = k + 1$ and the third, fourth, and fifth steps are executed cyclically until the minimum $k$ satisfying $\theta(k) < \theta(k+1)$ is found; the loop then terminates and the clustering result is output.

Finally, the output results of the $i$ nodes are combined in the format $\langle$cluster number, data object attribute value$\rangle$. The cluster number is 0 for normal data and 1 for adverse data.
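The selection of initial cluster centers in the third step can be sketched in a few lines. This is a minimal single-process sketch, assuming Euclidean distance and non-empty splits; the function names `local_candidate` and `max_min_centers` and the `seed` parameter are hypothetical. Each call to `local_candidate` stands in for the work of one of the $j$ nodes, and the outer `max` stands in for the merge of their results.

```python
import math
import random


def local_candidate(split, centers):
    """What one node reports: (distance, point) for its point farthest
    from the nearest already-chosen center."""
    return max((min(math.dist(p, c) for c in centers), p) for p in split)


def max_min_centers(splits, k, seed=0):
    """Choose k initial centers with the maximum and minimum distance method."""
    rng = random.Random(seed)
    all_points = [p for split in splits for p in split]
    centers = [rng.choice(all_points)]  # first center is picked at random
    while len(centers) < k:
        # Merge step: keep the candidate with the largest minimum distance.
        _, farthest = max(local_candidate(s, centers) for s in splits)
        centers.append(farthest)
    return centers


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(max_min_centers(splits, k=2))
```

Because every new center is the point farthest from all centers chosen so far, only the first center depends on a random choice, which is the property the scheme relies on to reduce sensitivity to initialization.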

Figure 4. Flow chart of the Map function.

Figure 5. Flow chart of the Reduce function.

3.2 Implementation process of cloud computing algorithm of GSA

Firstly, the data are divided. For a one-dimensional data set, the data file is divided into multiple split blocks, and each split block is then parsed into key-value pairs $\langle \text{k1}, \text{v1} \rangle$, where k1 is the index of each data item (denoted num) and v1 is the data value (denoted val). A multidimensional data set can first be separated into multiple files, with each attribute corresponding to one file, and each data file can then be partitioned in the same way.

Secondly, a compute node is assigned to each split, and a Map function is executed on the corresponding node. The Map function takes a key-value pair $\langle \text{k1}, \text{v1} \rangle$ as input and outputs a key-value pair $\langle \text{k2}, \text{list(num, val)} \rangle$, where k2 is the cluster number (Cid), with Cid $= 0$ or $1$, and list(num, val) is the set of data items assigned to that cluster.

Thirdly, the Reduce function is executed: the results of the Map process are merged by their k2 value and the merged result is output.
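The key-value flow of these three steps can be simulated in plain Python without a Hadoop cluster. This is a sketch under stated assumptions: `classify_split` is a hypothetical placeholder that flags values far from the split median and merely stands in for the GSA clustering, and the `threshold` value, the function names, and the striding used to form splits are illustrative choices only.

```python
from collections import defaultdict


def classify_split(split, threshold=3.0):
    """Placeholder for GSA (NOT the real algorithm): values far from the
    split median get Cid 1 (adverse), all others get Cid 0 (normal)."""
    if not split:
        return {}
    values = sorted(val for _, val in split)
    median = values[len(values) // 2]
    return {num: int(abs(val - median) > threshold) for num, val in split}


def map_fn(split):
    """Map: take the <num, val> records of one split, emit <Cid, (num, val)>."""
    cids = classify_split(split)
    return [(cids[num], (num, val)) for num, val in split]


def reduce_fn(mapped_pairs):
    """Reduce: merge the map outputs that share the same Cid key."""
    merged = defaultdict(list)
    for cid, record in mapped_pairs:
        merged[cid].append(record)
    return dict(merged)


def run_job(data, n_splits):
    """Divide the records into splits, run map on each split, then reduce."""
    splits = [data[i::n_splits] for i in range(n_splits)]
    mapped = [pair for split in splits for pair in map_fn(split)]
    return reduce_fn(mapped)


if __name__ == "__main__":
    data = [(0, 1.0), (1, 1.1), (2, 0.9), (3, 9.0), (4, 1.05), (5, 1.2)]
    print(run_job(data, n_splits=2))
```

On a real cluster each call to `map_fn` would run on the node assigned to that split; here the calls run sequentially so the data flow can be inspected.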

3.3 Map process

The Map process is the main part of the implementation of the GSA cloud computing model. Its role is to apply the GSA algorithm to the data set $\langle \text{num}, \text{val} \rangle$, determine the optimal number of clusters, and output the clustering results. The flow chart of the Map function is shown in Fig. 4.
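The core decision inside the Map function, choosing the number of clusters from the gap values, can be sketched as follows. This is a minimal sketch, assuming one-dimensional data, precomputed gap and $s$ values supplied as dictionaries keyed by $k$, and the stopping test $\text{gap}(k) \geqslant \text{gap}(k+1) - s(k+1)$ applied at every $k$; the function names `log_dispersion` and `choose_k` are hypothetical.

```python
import math


def log_dispersion(clusters):
    """ln W_k for 1-D data, taking W_k as the pooled within-cluster
    sum of squared deviations from each cluster mean."""
    w = 0.0
    for values in clusters:
        mean = sum(values) / len(values)
        w += sum((v - mean) ** 2 for v in values)
    return math.log(w)  # raises ValueError if every cluster has zero spread


def choose_k(gap, s):
    """Return the smallest k with gap[k] >= gap[k + 1] - s[k + 1].

    gap and s are dicts keyed by consecutive integers k = 1, 2, ...
    If no k passes the test, the largest k tried is returned.
    """
    ks = sorted(gap)
    for k in ks[:-1]:
        if gap[k] >= gap[k + 1] - s[k + 1]:
            return k
    return ks[-1]


if __name__ == "__main__":
    gap = {1: 0.2, 2: 0.9, 3: 0.95}
    s = {1: 0.05, 2: 0.1, 3: 0.1}
    print(choose_k(gap, s))
```

When `choose_k` returns 1, the data form a single cluster and, in the terms used here, contain no adverse data.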

3.4 Reduce process

During the Reduce process, the nodes assigned the Reduce task process the Map output records that share the same key. The default HashPartitioner class in the MapReduce framework can be used to send key-value pairs with the same key (Cid) to the corresponding Reduce node for processing. The Reduce task flow of the GSA cloud computing model is shown in Fig. 5.
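Hadoop's default HashPartitioner assigns a key to a reduce task as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below mimics that rule in Python for integer Cid keys, assuming the key's hash code equals its integer value (as it does for a Java `Integer`); the function names `hash_partition` and `shuffle` are hypothetical and are not part of any Hadoop API.

```python
def hash_partition(cid: int, num_reduce_tasks: int) -> int:
    """Mimic (hash & Integer.MAX_VALUE) % numReduceTasks for an integer key."""
    return (cid & 0x7FFFFFFF) % num_reduce_tasks


def shuffle(mapped_pairs, num_reduce_tasks):
    """Group <Cid, record> pairs into one bucket per reduce task."""
    buckets = [[] for _ in range(num_reduce_tasks)]
    for cid, record in mapped_pairs:
        buckets[hash_partition(cid, num_reduce_tasks)].append((cid, record))
    return buckets


if __name__ == "__main__":
    pairs = [(0, (0, 1.0)), (1, (3, 9.0)), (0, (2, 0.9))]
    for task, bucket in enumerate(shuffle(pairs, num_reduce_tasks=2)):
        print(task, bucket)
```

With only two distinct keys (Cid 0 and 1), at most two reduce tasks receive data under this rule, however many are configured.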

4. Conclusion

In view of the similarity between cloud computing and the power system in operating mechanism, cloud computing can be applied to the security monitoring of the power system to improve the information processing and data storage capacity of the power grid. In this research, the implementation process of existing cloud computing technology and the computing model of the open source Hadoop platform are studied on the basis of the Hadoop cloud computing platform and the MapReduce software framework. Combined with the GSA-based adverse data identification algorithm for power systems, a cloud computing method for adverse data identification in power systems is proposed. In this method, the maximum and minimum distance method is used to optimize the GSA algorithm, in order to reduce the influence of the random selection of the reference distribution and of the initial cluster centers on its accuracy. The research results provide a basis for the practical application of cloud computing in the power system and demonstrate the feasibility of applying cloud computing technology to it. However, this study has not completely removed the dependence of the GSA algorithm on the reference distribution, which still has to be calculated when judging whether adverse data are present. In addition, no real large database environment was available to verify the performance of the new GSA algorithm on large data sets. Future studies should therefore analyze the dependence of the GSA algorithm on the reference distribution and verify the performance of the optimized GSA algorithm in a big data environment.

References

1. Silva L.V., Barbosa, Marinho and Brito, Security and privacy aware data aggregation on cloud computing, Journal of Internet Services & Applications 9(1) (2018), 6.

2. Hairong, Yahui, Xiaochen and Yongbo, Research on monitoring and early-warning system of marine organisms for the intake of nuclear power plants, Animal Husbandry and Feed Science 10(4) (2018), 26–30.

3. Lee, Park, Kim and Jin, De-identification of metering data for smart grid personal security in intelligent CCTV-based P2P cloud computing environment, Peer-to-Peer Networking and Applications 11(1) (2018), 1–10.

4. Reddy K.S. and Balaraju, Comparative study on trustee of third party auditor to provide integrity and security in cloud computing, Materials Today Proceedings 5(1) (2018), 557–564.

5. Jaatun M.G., Lambrinoudakis and Rong, Special issue on security in cloud computing, Journal of Network & Computer Applications 1(1) (2018), 1–2.

6. Kamil S.N.S. and Thomas, Investigating the cost of transfer delay on the performance of security in cloud computing, Electronic Notes in Theoretical Computer Science 337 (2018), 105–117.

7. Hussin, Salleh N.A., Suhaimi M.A., Rahman M.M. and Ali A.M., A model to assess the impacts of cloud computing use on SME performance: a resource-based view, Advanced Science Letters 24(3) (2018), 1800–1804.

8. Venkatesan, Karthigaikumar and Satheeskumaran, Mobile cloud computing for ECG telemonitoring and real-time coronary heart disease risk detection, Biomedical Signal Processing & Control 44 (2018), 138–145.

9. Alamer, Yong, Wei and Lin, Collaborative security in vehicular cloud computing: a game theoretic view, IEEE Network 32(3) (2018), 72–77.

10. Rahimi M.R., Venkatasubramanian, Mehrotra and Vasilakos A.V., On optimal and fair service allocation in mobile cloud computing, IEEE Transactions on Cloud Computing 6(3) (2018), 815–828.

11. Vostokin, Artamonov and Tsarev, Templet web: the use of volunteer computing approach in PaaS-style cloud, Open Engineering 8(1) (2018), 50–56.

12. Zhang, Chang and Townend, Guest editor’s introduction: special section on virtualization and services for cloud-based application systems, IEEE Transactions on Services Computing 12(1) (2019), 88–90.

13. Guo, Liu, Yang, Xiao and Li, Energy-efficient dynamic computation offloading and cooperative task scheduling in mobile cloud computing, IEEE Transactions on Mobile Computing 18(2) (2019), 319–333.

14. Misic and Mahmoud, Security and privacy of connected vehicular cloud computing, IEEE Network 32(3) (2018), 4–6.

15. B.W., J.H., Khan, Kim J.H. and Yun S.L., Development of a cloud computing-based pier type port structure stability evaluation platform using fiber Bragg grating sensors, Sensors 18(6) (2018), 1681.