Abstract
Smart grid is proposed as a solution to the problems of production, distribution, monitoring, and control of electricity in traditional power grids. Smart grid networks place IoT sensor nodes at various grid lines and collect large volumes of data about power flow, usage, etc. The collected data are analyzed for applications such as demand forecasting, fault diagnosis, and fault prediction. The sensor nodes and the communication links can be compromised, affecting the privacy of consumers, and false data can be propagated with malicious intent. This work proposes a secure and privacy preserving framework for smart grid IoT networks that secures the data and decisions at sensor nodes and over communication links. The work proposes a novel Data and Decision rules Secure Efficient Smart Grid (DDSESG) framework that integrates a secure compressive sensing technique with blockchain and the InterPlanetary File System (IPFS) to secure both data and decisions. Through experimental analysis, the proposed solution is found to provide higher resiliency against data security attacks at comparatively lower cost: 12.4% lower computation cost, 15% lower communication cost, and 19.9% lower storage cost. Forecasting on transformed data in the proposed solution showed only a marginal 1.08% difference in accuracy compared to forecasting on the original data.
Introduction
The power grid network transmits alternating current electrical power from the generation center to the distribution center [1]. A lot of energy is wasted over this transmission. This wastage is caused mainly by ineffective routing performed without knowledge of load and power generation capacities. In addition, factors like faults and electricity theft make the traditional power grid network inefficient [2]. Due to this inefficiency, traditional power grids are not able to meet growing energy demands, and consumers experience frequent power outages. Smart grid provides a promising solution to these problems through the use of information and communication technologies [3].
Smart grid is poised to become the largest application of IoT. It is predicted that almost all electrical devices will be connected to the internet via IoT [4]. Smart grid is the way forward, and electrical lines in most countries are shifting towards it. IoT is the enabling technology for smart grid networks. Even though smart grids have various advantages, they are insecure against attacks. Attackers can exploit loopholes in the IoT-integrated smart grid to disrupt the entire distribution of electricity and even affect billing cycles, causing huge financial losses [5].
IoT communication takes place over public networks, and sensor communication towards the sink also uses open wireless channels. Open communication channels carry security threats, and this makes the smart grid network insecure [6]. The data at the sensor nodes and the decision rules can be compromised with various malicious intentions. It is therefore necessary to protect the privacy and security of data and decision rules in smart grid networks. Many works have been proposed on cryptography and authentication for securing smart grid communication. These approaches have high communication overhead and remain open to security compromise at sensor nodes. Their key management complexity also grows as smart grid networks scale up, and the decision logic at sensor nodes can still be compromised with malicious intent.
This work proposes a blockchain and IPFS assisted secure privacy preserving framework for smart grid networks, addressing the problems of high communication overhead, security compromise of data and decision rules at sensor nodes, and key management complexity in existing works. Though securing sensor data with the additional requirement of energy optimization was considered in [27–30], these works did not consider key management complexity or the securing of decision rules. Motivated by this observation, a novel secure compressive sensing assisted data transfer scheme is proposed for secure data exchange from the sensor nodes to the central station. The decision rules in sensor nodes are privacy preserved and are executed on release of keys from IPFS after blockchain based authentication. Key management for sensor nodes is simplified with a smart contract solution on the blockchain. Following are the novel contributions of this work.
(i) A novel secure compressive sensing data exchange scheme is proposed for transferring data from sensor node to sink at comparatively lower communication overhead.
(ii) An effective key management scheme assisted by blockchain with IPFS is proposed, which reduces key management complexity and scales well for large networks like smart grid networks.
(iii) A novel blockchain and IPFS assisted decision making mechanism is facilitated at sensor nodes, which is secure against decision rule compromise at sensor nodes and resilient against false decision rule injection by attackers.
(iv) A proof of concept of the proposed secure and privacy preserving framework is implemented for a fault diagnosis application in smart grid networks, and the results are compared.
The rest of the paper is organized as follows. Section II presents a survey of existing solutions for privacy preservation and security in smart grid networks. The proposed blockchain and IPFS assisted secure privacy preserving framework and a proof of concept of the proposed solution for the case of fault diagnosis in smart grid networks are detailed in Section III. The results of the proposed solution and a comparison to existing works are presented in Section IV. Section V presents the concluding remarks and future scope of work.
Related work
Ali et al. [7] combined homomorphic Paillier encryption (PE), the Chinese remainder theorem, and a one-way hash function to ensure privacy in smart grid networks. Every IoT device node is associated with a key and uses the one-way hash function to update the key. The IoT device node encrypts the data using AES encryption and forwards the data to the sink. At the sink, the sensor node is first authenticated and the data is decrypted using AES. The sink node aggregates the data and encrypts it using PE before passing it to the central base station for analysis. The computational complexity and network overhead are high in this approach. Guan et al. [8] proposed a flexible privacy preserving aggregation scheme for smart grid. The aggregation scheme is dynamic and is adjusted based on energy consumption information. Homomorphic encryption is used to encrypt the aggregated information. The aggregation scheme proposed in this approach is not suitable for large volumes of continuous information. Abdallah et al. [9] proposed a lightweight lattice-based homomorphic cryptosystem for ensuring privacy in smart grid. The scheme is not suitable for large volumes of continuous information. Akila et al. [10] used a tree topology along with homomorphic encryption based aggregation to secure the data in smart grid networks. Due to multiple levels of encryption, the computational complexity is high and the scheme is not suitable for large volumes of continuous information. Liu et al. [11] proposed an aggregation scheme for privacy preservation of energy consumption data from user devices. The user devices select the area of aggregation in a distributed manner without the need for a trusted third party. The aggregated data is further encrypted using the Lifted EC-ElGamal encryption algorithm. The communication overhead is high in this approach. Kong et al. [12] used RSA for security and privacy of data from smart meter to collection center in smart grid. Authentication of smart meters is done using blind signatures.
The security scheme is complex and not suitable for resource constrained IoT devices. Wagh et al. [13] addressed the single point of failure problem of data aggregation based privacy preservation schemes in smart grid networks. The privacy preserving framework proposed in this work used the Shamir secret sharing scheme. The data is split into shares and propagated over multiple paths. The Shamir shares provide data confidentiality, but the communication overhead is high in this approach. Moghadam et al. [14] proposed a blockchain based privacy preservation scheme for smart grid networks. The technique secures the demand response data communicated between the smart meter and the service provider. The solution involves frequent interaction with the blockchain and is not suitable for the continuous communication needed by applications like power fluctuation monitoring. Singh et al. [15] proposed a blockchain based privacy preservation framework for smart grids. The data from smart meters are encrypted with symmetric homomorphic encryption keys. The encrypted data is sent to an aggregation center, where aggregation is done and the aggregated data is uploaded to the blockchain as transactions. Due to the large volume of transactions, the solution creates huge storage overhead on the blockchain. Chen et al. [16] combined blockchain with group signatures for privacy preservation of data in smart grids. A group signature is distributed to smart meter nodes periodically. Smart meters encrypt the data and send it to the control center, which uploads the data to the blockchain. The complexity is high for large volumes of continuous sensor data. Sikeridis et al. [17] proposed a blockchain based distributed architecture for secure data exchange in smart grids. The authors used a tiered blockchain to keep latency low as the smart grid network scales. The data from a sensor node are encrypted with the sink's public key. The digital signature for the message is computed using the secure hash algorithm (SHA).
The encrypted message along with the signature is sent to the sink via multi-hop routing. At the sink, the signature is verified and encrypted messages are added to a transaction block. Transactions are added to the blockchain. The communication overhead on the smart grid network and the storage overhead on the blockchain are very high due to the use of raw data without compression.
The survey shows that most solutions proposed for secure data exchange did not consider the high volume of power quality data that needs to be transferred from the sensor node to the collection center. Although the existing works considered security, they were not scalable to large networks. The existing works used cryptographic algorithms to secure the data, but these increased the data volume. This large volume of data creates higher communication overhead as well as storage overhead in the cloud. Key management complexity is also higher in these approaches, and the compromise of key stores puts the security of the data at higher risk. Many works used homomorphic techniques to make the transformed data suitable for data mining operations, but they were not secure against data inference and analysis attacks. Most solutions stored the data in the blockchain, adding to the blockchain retrieval complexity.
Proposed solution
The proposed DDSESG framework addresses the problems of high communication overhead, storage overhead, and blockchain complexity using blockchain and the IPFS storage system. The network layout for the solution is given in Fig. 1.

Fig. 1. Network layout.
Sensor nodes are attached to each of the substations. Sensors have a dual interface: 802.11 for wireless multi-hop communication to the sink and an LTE internet interface for connecting to the blockchain. The sensors keep the bulk of the load on the wireless multi-hop communication and send only minimal, very sensitive traffic over the LTE interface. The sensor nodes collect power quality data at a configured sampling rate and send the data via the wireless multi-hop interface to the sink node using secure compressive sensing coding. The sink node authenticates the packet and verifies its integrity, reconstructs the compressive sensing coded packet, and uploads it to IPFS for data analytics. Data analytics processes the data and mines knowledge using machine learning algorithms such as support vector machine classifiers. The classifier output is transformed into decision rules. These decision rules are encrypted with the public key of the sensor node, stored in IPFS, and notified to the sensor node. The sensor node can download the rules from IPFS and decrypt them using its private key. It then applies the decision rules on the data and reports failures over the internet interface to the fault monitoring stations. Fault monitoring stations authenticate the sensor node and process the faults. If false events are reported by a sensor node, fault monitoring stations consider the sensor node compromised and revoke its keys from the blockchain.
Every sensor node creates a public key (PU_{SID_a}) and a private key (PR_{SID_a}) and executes the smart contract updateK to store the public key of SID_a in the blockchain. The updateK function, which updates the public key information table (PKIT) in the smart contract, is given below.
Input: PU_{SID_a}, SID_a
PKIT[i].PK ← PU_{SID_a}
PKIT[i].SID ← SID_a
Tk ← generate random key
Add (SID_a, Tk) to IPFS
ETk ← E(Tk, PU_{SID_a})
Return ETk
The sensor node keeps the private key with itself. Execution of the smart contract happens through the LTE internet interface. The smart contract generates a random key Tk and stores it in IPFS, indexed by the sensor node ID (SID_a). Tk is encrypted with the public key of the sensor node and the result is sent back to the sensor node, which can decrypt it to obtain Tk.
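The key registration step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blockchain table and IPFS are replaced by in-memory dictionaries, the names `KeyRegistry`, `update_k`, and `make_toy_keypair` are hypothetical, and the public key encryption E is a toy textbook RSA with tiny primes and no padding, which is not secure and stands in for whatever scheme a deployment would use.

```python
import secrets


def make_toy_keypair(p=61, q=53, e=17):
    """Toy textbook RSA key pair (tiny primes, no padding): illustration only."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return (e, n), (d, n)  # (public key, private key)


def encrypt(value, public_key):
    e, n = public_key
    return pow(value, e, n)


def decrypt(cipher, private_key):
    d, n = private_key
    return pow(cipher, d, n)


class KeyRegistry:
    """In-memory stand-in for the smart contract state and the IPFS store."""

    def __init__(self):
        self.pkit = {}  # SID -> public key (plays the role of the PKIT)
        self.ipfs = {}  # SID -> Tk (plays the role of IPFS)

    def update_k(self, sid, public_key):
        """Register the public key, create Tk, and return ETk = E(Tk, public key)."""
        self.pkit[sid] = public_key
        _, n = public_key
        tk = secrets.randbelow(n - 2) + 2  # random seed key with 2 <= tk < n
        self.ipfs[sid] = tk
        return encrypt(tk, public_key)


# Usage: the sensor registers and recovers Tk with its private key.
public_key, private_key = make_toy_keypair()
registry = KeyRegistry()
etk = registry.update_k("SID_a", public_key)
tk_at_sensor = decrypt(etk, private_key)
```

In this sketch only the holder of the private key can recover Tk from ETk, which mirrors the role the sensor node's key pair plays in the scheme.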
The sensor node collects power quality measurements at the sampling rate, and this data must be sent securely to the sink over the wireless interface. This work proposes a secure compressive sensing scheme for the secure exchange of data over the communication channel between the source and the sink node. The process flow of secure compressive sensing at the sensor node is given in Fig. 2.

Fig. 2. Process flow at sensing node.
Compressive Sensing (CS) and transform coding (TC) are important components of secure compressive sensing.
CS acquires a continuous analog signal with fewer observations than the Nyquist rate requires and reconstructs it from them. The deficiency of sampling with few observations is compensated by effective reconstruction techniques. The advantage of undersampling is that the information is represented with a reduced number of bits, which reduces the communication overhead when they are sent across the network. It also helps to achieve a higher data rate. CS uses non-adaptive, linear projection techniques for sampling. Reconstruction is done using regression based optimization techniques. CS achieves maximal performance gain only when the signal to be sampled is sparse.
Compression effectiveness increases with the sparsity of the signal. Existing works on compressive sensing have shown that CS achieves maximal performance gain, in terms of reduced reconstruction loss, when the random measurements on the signal x (when it is already sparse) or on its sparse representation ∅ (when the original signal is not sparse) are drawn from a Gaussian distribution.
When the original signal x is not sparse, it can be made sparse by transforming it to the Fourier domain using the discrete Fourier transform and thresholding the coefficients based on their average value. Sampling (random measurement) is then performed over the sparse representation, which is given as
$$ y = \varphi \, \emptyset $$
where y is the vector of M measurements, φ is the measurement matrix of size M × N, M is the number of measurements for a signal of length N, and ∅ is the sparse representation of the signal x. The values of the measurement matrix are usually Gaussian random values. This matrix is known at both the transmitter and the receiver end. The same matrix is needed at the receiver end to recover the original signal after reconstruction of the sparse coefficients using regression based optimization methods. In the absence of the measurement matrix at the receiver end, it is not possible to retrieve the original signal. This work uses this property of the measurement matrix to provide security to the data.
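The sparsification and measurement steps can be illustrated with a small, dependency-free Python sketch. It assumes one particular reading of the thresholding rule, namely that a DFT coefficient is kept only if its magnitude is at least the mean magnitude of all coefficients; the function names `dft`, `sparsify`, and `measure` and the small example matrix are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def sparsify(x):
    """Zero every DFT coefficient whose magnitude is below the mean magnitude
    (assumed interpretation of thresholding on the average coefficient value)."""
    coeffs = dft(x)
    mean_magnitude = sum(abs(c) for c in coeffs) / len(coeffs)
    return [c if abs(c) >= mean_magnitude else 0 for c in coeffs]


def measure(phi, sparse):
    """Compute the M measurements y = phi * sparse for an M x N matrix phi."""
    return [sum(p * v for p, v in zip(row, sparse)) for row in phi]


# Usage: a single-tone signal of length 8 has two dominant DFT coefficients.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
sparse = sparsify(signal)
phi = [[1, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 0, 1, 0, 0, 0]]  # toy 2 x 8 binary measurement matrix
y = measure(phi, sparse)
```

Here eight samples are reduced to two measurements; a receiver that does not hold `phi` cannot map `y` back to the coefficients.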
In this work, the measurement matrix is generated based on a seed secret key and known only between sensor node and sink. The seed secret key is generated and issued to sensor node on execution of smart contract (updateK).
Measurement matrices based on Gaussian random values are dense and have higher computational and storage overhead. Multiplication of the sparse data representation with the measurement matrix is computationally more efficient if the measurement matrix is binary. ArunSankar et al. [18] proposed an algorithm to generate a binary measurement matrix for compressive sensing applications. The binary measurement matrix generated in [18] has a small number of non-zero values, so matrix multiplication with it has reduced complexity. The problem with this method is that the reconstruction error was higher and the original data could not be reconstructed accurately. To solve this problem, this work modifies the measurement matrix generation algorithm proposed in [18] with two major changes. (i) The values are generated with the seed key as input, such that for the same seed key the measurement matrix generated is the same. (ii) The measurement matrix generated is constrained in terms of the minimum number of 1's in a row, so that sampling does not lose significant values and, as a result, the reconstruction error is minimized. The algorithm is given as Algorithm 2: measurement matrix generation.
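The two modifications, seed-determined values and a minimum number of 1's per row, can be sketched as follows. This is not the authors' Algorithm 2 or the generator of [18]; it is a minimal illustration in which the function name, the `density` parameter, and the default `min_ones` value are assumptions. Python's `random.Random` is a Mersenne Twister and is not cryptographically secure, so a real deployment would need a keyed cryptographic generator in its place.

```python
import random


def binary_measurement_matrix(seed, m, n, min_ones=2, density=0.1):
    """Generate an m x n binary matrix that is fully determined by the seed
    and has at least min_ones ones in every row."""
    if min_ones > n:
        raise ValueError("min_ones cannot exceed the row length n")
    rng = random.Random(seed)  # same seed -> same sequence -> same matrix
    matrix = []
    for _ in range(m):
        # Draw a sparse binary row.
        row = [1 if rng.random() < density else 0 for _ in range(n)]
        # Enforce the row constraint by switching on extra positions if needed.
        missing = min_ones - sum(row)
        if missing > 0:
            zero_positions = [j for j, v in enumerate(row) if v == 0]
            for j in rng.sample(zero_positions, missing):
                row[j] = 1
        matrix.append(row)
    return matrix


# Usage: sensor and sink derive the same matrix from the shared seed key.
matrix_at_sensor = binary_measurement_matrix(seed=1234, m=4, n=16)
matrix_at_sink = binary_measurement_matrix(seed=1234, m=4, n=16)
```

Because the matrix is never transmitted, only a party holding the seed key can regenerate it, which is the property the scheme relies on for confidentiality.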
The overall flow of secure compressive sensing is detailed in Fig. 3.

Fig. 3. Secure compressive sensing flow.
The compressed data is reconstructed at sink end to get the original data. The reconstruction flow at receiver end is shown in Fig. 4.

Fig. 4. Reconstruction flow of secure compressive sensing.
Reconstruction of the compressive sensed signal can be done using many optimization methods, such as Orthogonal Matching Pursuit [19], Compressive Sampling Matching Pursuit [20], the Fast Iterative Shrinkage-Thresholding algorithm [21], Iterative Hard Thresholding algorithms for compressive sensing [22], Iteratively Reweighted Least Squares [23], the Iterative Shrinkage-Thresholding algorithm, and the L1 minimization algorithm. The reconstruction error and the reconstruction time are measured for sample data, and the results are given in Figs. 5 and 6.

Fig. 5. RMSE across solutions.
Among all the solutions, the reconstruction error in terms of RMSE is lowest for L1 minimization.
The reconstruction time is also lower for L1 minimization than for the other reconstruction techniques, as shown in Fig. 6. Due to its lower reconstruction time and RMSE, the L1 minimization algorithm is used in this work.
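To make the reconstruction step concrete, the following sketch implements the Iterative Shrinkage-Thresholding algorithm (ISTA), one of the methods listed above, for the L1-regularized least squares problem min 0.5·||φx − y||² + λ·||x||₁. It is not the solver used in the experiments; the function names, the value of λ, and the iteration count are illustrative assumptions, and the step size uses the squared Frobenius norm of φ as a safe upper bound on the Lipschitz constant.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def objective(phi, y, x, lam):
    """Return 0.5 * ||phi x - y||^2 + lam * ||x||_1."""
    residual = [sum(p * v for p, v in zip(row, x)) - yi for row, yi in zip(phi, y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def ista(phi, y, lam=0.01, iterations=200):
    """Estimate a sparse x from measurements y = phi x using ISTA."""
    n = len(phi[0])
    # Squared Frobenius norm bounds the largest eigenvalue of phi^T phi.
    lipschitz = sum(v * v for row in phi for v in row)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [sum(p * v for p, v in zip(row, x)) - yi
                    for row, yi in zip(phi, y)]
        gradient = [sum(row[j] * r for row, r in zip(phi, residual))
                    for j in range(n)]
        x = [soft_threshold(x[j] - gradient[j] / lipschitz, lam / lipschitz)
             for j in range(n)]
    return x


# Usage: three measurements of a length-4 signal with one non-zero entry.
phi = [[1, 0, 1, 0],
       [0, 1, 1, 0],
       [1, 1, 0, 1]]
y = [2.0, 2.0, 0.0]  # produced by the sparse vector [0, 0, 2, 0]
x_hat = ista(phi, y)
```

With this conservative step size each iteration cannot increase the objective, at the price of slower convergence than accelerated variants such as FISTA.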

Fig. 6. Reconstruction time.
The reconstruction process happens at sink. Sink provides the sensor node id to a smart contract FetchK to get the encrypted Tk value.
Input: SID_a
Tk ← get(SID_a) from IPFS
ETk ← E(Tk, PU_{SID_sink})
Return ETk
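The ETk returned by FetchK is decrypted by the sink with its private key to recover Tk, from which the measurement matrix needed for reconstruction is regenerated. The reconstruction process at the sink is shown in Fig. 7.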

Fig. 7. Reconstruction process at sink.
Data analytics applications fetch data from IPFS and execute machine learning algorithms to classify the data into various classes. A sample fault diagnosis application over power grids based on sensor measurements is illustrated below.
The sensor nodes observe the current signals at a sampling rate and send them via multi-hop transmission to the centralized station. Sending the current signals continuously consumes high bandwidth and can create congestion in the network. To reduce this and to send the signals securely, the secure compressive sensing proposed in this work is applied to the current signals. The resulting CS signal is sent via multi-hop forwarding to the sink. At the sink, the reconstruction process is carried out to recover the original current signal.
The decision of fault or no fault is made at the data analytics module. The data analytics module produces the spectrogram plot of the data in each window interval. The spectrogram plot shows the difference between the signals during impending faults (Fig. 8) and in the no-fault scenario (Fig. 9). The spectrogram is obtained by applying the continuous wavelet transform to the current signal.

Fig. 8. Current signal spectrogram during impending fault.

Fig. 9. Current signal spectrogram during no fault.
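The continuous wavelet transform of the current signal x(t) for a mother wavelet φ(t) with scale a and translation b is given as
$$ W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t) \, \varphi^{*}\!\left(\frac{t - b}{a}\right) dt $$
where φ* denotes the complex conjugate of φ.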
Clearly there is a large difference between the images for the impending fault and no-fault cases. The image is split into a 5 × 5 grid of equal-size cells. The average intensity of the pixels in each cell is found. A feature vector of 25 (5 × 5) elements, each being the average pixel intensity of a cell, is formed.
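The grid-based feature extraction can be sketched as below. The sketch assumes the spectrogram is available as a 2D list of grayscale intensities whose height and width are multiples of 5 (any trailing rows or columns would be ignored); the function name `grid_features` is hypothetical.

```python
def grid_features(image, grid=5):
    """Split a 2D intensity image into grid x grid cells and return the
    average intensity of each cell as a flat feature vector (row-major)."""
    rows, cols = len(image), len(image[0])
    cell_h, cell_w = rows // grid, cols // grid
    features = []
    for gr in range(grid):
        for gc in range(grid):
            cell = [image[r][c]
                    for r in range(gr * cell_h, (gr + 1) * cell_h)
                    for c in range(gc * cell_w, (gc + 1) * cell_w)]
            features.append(sum(cell) / len(cell))
    return features


# Usage: a 10 x 10 test image in which every 2 x 2 cell holds its own index.
image = [[(r // 2) * 5 + (c // 2) for c in range(10)] for r in range(10)]
features = grid_features(image)
```

The resulting 25-element vector is the kind of input the classifier described next would consume.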
A support vector machine (SVM) classifier is constructed that takes the 25-dimensional feature vector as input and provides the output class: faulty or not faulty.
The SVM classifier is constructed from the training dataset of faulty and non-faulty scenarios. An SVM classifier constructs a hyperplane separating the fault and non-fault data points. The hyperplane is converted to decision rules using the pedagogical rule extraction algorithm [24].
The decision rules are encrypted with the public key of the sensor node and sent to the sink node for diffusion to the sensor node. The rules can be tampered with for malicious purposes. To avoid this, mutual authentication between the sensor rule diffusion (SRD) module and the sink node is first initiated, and after successful authentication the rules are exchanged by encrypting them with a session key.
The flow of mutual authentication is as follows. Mutual authentication is initiated by the SRD module. It calculates two numbers (S1 and S2), with the ID of the source node embedded into S2.
In the above equations, a and b are two random numbers, G is the prime number generator, h1 is the hash function, and P is the public key of the sink. The hash function and the prime number generator are preinstalled in the SRD and the sink.
The sink deciphers the ID hidden in S2 as below
After deciphering the ID, the sink echoes it back for verification at the SRD by embedding the ID into a number R2, and the random number b generated by the SRD is echoed back in a number R1. The numbers are calculated as
On receiving R1 and R2, the SRD deciphers the ID and b and verifies that they are the same as the values it sent. If the values match, the authentication of the sink to the SRD succeeds. The authentication process then proceeds as follows
From S3, the sink authenticates the SRD by validating the relation below
Once authenticated, the encrypted rules are sent to the sink. The sink forwards the rules of each sensor node to it via multi-hop forwarding. At the sensor node, the rules are decrypted using its private key and applied to the current signal data to recognize faults. Once faults are recognized, they are reported by encrypting them with the public key of the sink and forwarding them to the sink. The sink decrypts the fault information and uploads it to IPFS. The fault information is analyzed from IPFS by the fault monitoring station (FMS). When the faults diagnosed by a sensor deviate by a large factor from real fault observations, the FMS executes the smart contract to remove the sensor node's certification from the blockchain.
Input: SID_a
PKIT[SID_a].PK ← null
Since the attacking node's key is removed from the blockchain, sink verification will fail on reception of its data. The sink drops the messages from any sensor node whose verification fails. In this way, malicious attackers sending false messages in the network are barred.
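The effect of revocation on the sink can be sketched with an in-memory stand-in for the blockchain key table. The names `KeyTable`, `revoke`, and `sink_filter` are hypothetical, and the sketch only checks whether a key is still registered; the cryptographic verification of each message is deliberately left out.

```python
class KeyTable:
    """In-memory stand-in for the public key information table on the blockchain."""

    def __init__(self):
        self.public_keys = {}  # SID -> public key, or None once revoked

    def register(self, sid, public_key):
        self.public_keys[sid] = public_key

    def revoke(self, sid):
        # Mirrors the revocation contract: the stored public key is set to null.
        self.public_keys[sid] = None

    def is_valid(self, sid):
        return self.public_keys.get(sid) is not None


def sink_filter(table, messages):
    """Keep only messages whose sender still has a registered key.
    Each message is a (sid, payload) pair; signature checks are omitted here."""
    return [(sid, payload) for sid, payload in messages if table.is_valid(sid)]


# Usage: node B is flagged by the FMS and revoked; its messages are dropped.
table = KeyTable()
table.register("SID_a", "public-key-a")
table.register("SID_b", "public-key-b")
table.revoke("SID_b")
accepted = sink_filter(table, [("SID_a", "fault report"), ("SID_b", "false report")])
```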
Results
The performance of the proposed DDSESG is tested by varying the sampling rate. The performance parameters of computation cost, communication overhead, and storage overhead are measured. The performance is compared against the blockchain based homomorphic scheme proposed by Singh et al. [15], the blockchain based secure data exchange scheme proposed by Sikeridis et al. [17], and the blockchain based data privacy preservation framework by Liang et al. [25].
The computation cost is measured for various sampling rates in terms of the time taken for data transformation (in milliseconds), and the result is given in Table 1.
Table 1. Computation cost
The average computation cost in the proposed solution (Fig. 10) is 12.4% lower compared to Singh et al., 39.8% lower compared to Sikeridis et al., and 31.8% lower compared to Liang et al. The computation cost is lower in the proposed solution due to the use of compressive sensing based data transformation. Singh et al. used homomorphic encryption and aggregation together, so the computational cost increased compared to the proposed solution. Sikeridis et al. and Liang et al. used public key encryption systems, whose computational cost was higher.

Fig. 10. Average computation cost.
The communication cost is measured in terms of number of bytes sent out by the sensor node by varying the sensing rate and result is given below in Table 2.
Table 2. Communication cost
The average communication cost in the proposed solution (Fig. 11) is 15% lower compared to Singh et al., 20% lower compared to Sikeridis et al., and 18% lower compared to Liang et al. The communication cost is lower in the proposed solution due to the use of compressive sensing, which reduces the size of the data; the other solutions could not achieve a comparable compression ratio.

Fig. 11. Average communication cost.
The storage overhead is measured in terms of storage utilization (Kb) at blockchain or IPFS by varying the sampling rate and the result is given below in Table 3.
Table 3. Storage overhead
The storage utilization in proposed solution (Fig. 12) is on average 19.9% lower compared to Singh et al, 23.43% lower compared to Sikeridis et al. and 27.76% lower compared to Liang et al. The storage utilization has reduced in proposed solution due to higher compression ratio. But the existing works did not attempt any compression of data and stored the data after encryption and aggregation.

Fig. 12. Storage utilization.
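The difficulty of breaching confidentiality is measured in terms of the variance of difference (VOD) between the original data and the data predicted by an attacker who has compromised the communication links. Let X_i be a random variable representing the data from the sensor at time i, and let Y_i be the attacker's prediction of it; the privacy measure (pm) is then the variance of the difference between X_i and Y_i.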
A guessing attack is launched for 10 minutes to predict the sensor data, and the privacy measure (pm) is measured for every 1-minute interval and plotted in Fig. 13.
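A variance-of-difference measure of this kind can be computed as in the sketch below. The exact normalization used in the experiments is not stated, so the use of the population variance, the function name `privacy_measure`, and the sample values are assumptions made purely for illustration.

```python
from statistics import pvariance


def privacy_measure(original, predicted):
    """Variance of the element-wise difference between the original sensor
    data and the attacker's predictions (population variance assumed)."""
    if len(original) != len(predicted):
        raise ValueError("original and predicted must have the same length")
    differences = [o - p for o, p in zip(original, predicted)]
    return pvariance(differences)


# Usage: made-up readings for one interval and two hypothetical attackers.
original = [1, 2, 3, 4]
perfect_guess = [1, 2, 3, 4]
poor_guess = [0, 3, 2, 5]
pm_perfect = privacy_measure(original, perfect_guess)
pm_poor = privacy_measure(original, poor_guess)
```

A larger value means the attacker's predictions track the true readings less consistently, which is why a higher pm is read as better privacy.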

Fig. 13. Privacy measure comparisons.
The average privacy measure in the proposed solution is 53% higher compared to Singh et al., 64% higher compared to Sikeridis et al., and 66% higher compared to Liang et al. The use of transform coding has increased the privacy in the proposed solution. Singh et al. used homomorphic encryption, while Sikeridis et al. and Liang et al. used homomorphic encryption along with aggregation. Both of these have a higher probability of being compromised compared to transform coding combined with compressive sensing.
The suitability of the data for forecasting faults, even after compressive sensing and reconstruction, is tested on a power quality classification dataset [26]. The dataset has continuous recordings of voltage/current levels, and the recordings are classified into five types of faults and one normal case. The accuracy was measured on the original data and the reconstructed data, and the result is given in Table 4.
Table 4. Data accuracy
The proposed compressive sensing scheme reduced the accuracy by only a marginal value of 1.08 % compared to classification with original data.
The original data and the reconstructed data are analyzed at the fast Fourier transform (FFT) level and the normalized FFT level, and the output is given in Fig. 14 for the original data and Fig. 15 for the reconstructed data. Comparing Figs. 14 and 15, there is a small difference in raw form, but at the normalized FFT level there is not much difference. Hence the features extracted from the normalized FFT provide better accuracy. The confusion matrices of the classifier on the original data (Fig. 16) and the reconstructed data (Fig. 17) show very few differences, demonstrating that the proposed compressive sensing scheme introduces little distortion to the data.

Fig. 14. Original data.

Fig. 15. Reconstructed data.

Fig. 16. Confusion matrix with original data.

Fig. 17. Confusion matrix with reconstructed data.
Though the proposed solution addresses the problem of securing the data with lower communication and storage overhead and reduced key management complexity, it transfers the computational complexity from the sensor nodes to the sink. This complexity at the sink can be managed effectively with parallel servers and a load balancing architecture. Exploring a load balancing architecture for reducing the computational overhead at the sink is within the scope of future work.
Conclusion
A secure privacy preserving framework using blockchain and IPFS, referred to as DDSESG, is proposed in this work. As part of the work, a secure compressive sensing scheme is proposed for the secure transfer of data from nodes to sink at comparatively lower communication and storage cost. The proposed work uses smart contracts for authentication and key management. In addition to data, the proposed work also provides a mechanism to secure the decision rules mined from data for fault diagnosis. The proposed solution has at least 12.4% lower computation cost, 15% lower communication cost, 19.9% lower storage cost, and at least 53% higher privacy compared to existing works.
