Composite load modeling by spatial-temporal deep attention network based on wide-area monitoring systems

Abstract

With the development of the wide-area monitoring system (WAMS), power system operators can obtain accurate and fast estimates of time-varying load parameters. This study proposes a spatial-temporal deep attention network that captures the dynamic and static patterns of electrical load consumption by modeling the complicated and non-stationary interdependencies between time sequences. The designed network uses a long short-term memory (LSTM) based encoder-decoder recurrent neural network to learn temporal features in the time and frequency domains. Furthermore, a convolutional neural network (CNN) based attention mechanism is developed to learn spatial features. In addition, a loss function based on the pseudo-Huber concept is developed to enhance the robustness of the proposed network in noisy conditions and to improve the training performance. Simulation results on the IEEE 68-bus system demonstrate the effectiveness and superiority of the proposed network in comparison with several previously presented and state-of-the-art methods.

Keywords

Composite load modeling; deep attention neural network; encoder-decoder; long short-term memory; convolutional neural network; wide-area monitoring system

1 Introduction

Load modeling is a crucial task in power system studies, e.g., voltage stability analysis [1, 2], planning programs [3], and power quality studies [4]. A load model represents consumption patterns by a mathematical expression over a specific time interval. The emergence of smart grid technologies and renewable energies as a potential solution to the overexploitation of fossil fuel-based energies has led to new challenges in restructured power systems, such as the complex behavior of electricity consumption patterns [5].

Accurately modeling the load consumption profile requires a two-stage process: 1) selecting a comprehensive and practical model, and 2) designing a fast and accurate parameter identification method. Load models fall into two main categories: physical and measurement-based models. A physical load model is a detailed model that consists of a large number of components to provide a comprehensive description of the physical behavior of electrical loads. It is built by aggregating several individual load models or is obtained from experimental studies. However, it is almost impossible to provide accurate and comprehensive data for modeling and aggregating individual loads [6]. Moreover, wide-area load modeling based on physical models is too complex, and characterizing both time-variant and wide-area relationships in a physical expression is too difficult.

For this reason, measurement-based load models are preferred over physical load models. Measurement-based load models are divided into three main subcategories: static, dynamic, and composite load models. Static load models express the measured active and reactive power of a connected load as a function of the bus voltage and frequency. The impedance-current-power (ZIP) model, the exponential model, the frequency-dependent model, and LOADSYN (presented by the Electric Power Research Institute (EPRI)) are the most conventional static load models [5]. Dynamic load models represent only the dynamic behavior of electrical loads connected to a specific bus over a specific time interval. The most common dynamic load models are the induction motor (IM) and exponential recovery load models. However, a single static or dynamic model cannot represent a realistic load. Thus, composite load models (CLM) are preferred because they model the dynamic and static behavior of load consumption patterns simultaneously, and the CLM is therefore selected in this paper. As reported in [7], the CLM composed of the IM and ZIP models is the most suitable load model due to its ability to model different conditions, locations, and compositions. Thus, this paper investigates IM + ZIP as a composite time-varying load model.

In terms of parameter identification, we can divide the CLM parameter identification methods into state-space models, optimization-based, and artificial intelligence (AI)-based methods.

The state-space-based methods formulate the state-space equations as a least-squares problem [8, 9] or use Kalman filter methods, such as the extended Kalman filter (EKF) [10] and the unscented Kalman filter (UKF) [11], to identify the parameters of electrical loads. Although these methods are fast and can be easily implemented, they cannot capture the inherent correlation between loads at different locations and are only useful for modeling the load on the corresponding bus, whereas considering the correlations between loads at different locations can enhance the modeling accuracy.

In optimization-based load parameter identification, an error-based objective function is defined from the difference between the actual and estimated values. Several optimization algorithms have been studied in previous investigations to find the optimal solution of this objective function. For instance, particle swarm optimization (PSO) has been presented in [12] to identify the parameters of a dynamic load model (IM model). In [13], a Lagrangian coefficient-based optimization algorithm has been presented to estimate the CLM parameters. In [14], four different optimization algorithms, namely the differential evolution algorithm (DEA), grid search algorithm (GSA), interior-point algorithm (IPA), and active-set algorithm (ASA), are used to solve the error-based optimization problem and identify the CLM parameters. However, the optimization-based methods suffer from two major disadvantages: i) high computational cost, and ii) ignoring the dependency of the load consumption time series on the previous time steps.

AI-based methods are fast, are easily implemented, and can capture the correlation of electrical loads at different buses in the power system. They rely on historical data, which can be provided by systems such as WAMS in modern power systems. An AI-based method is trained on historical data and then tested on measurement data and the parameters estimated in previous time steps. AI-based methods are generally categorized into shallow and deep methods [15]. Shallow methods such as artificial neural networks (ANN) [16], fuzzy logic combined with ANN [17], and the support vector machine (SVM) [18] have been presented to identify load parameters in power systems. The shallow methods perform poorly in capturing complicated time-varying signals because they cannot characterize raw power system measurement data. Moreover, they are highly sensitive to measurement noise and suffer from a lack of generality due to their small hypothesis space [19, 20]. The deep neural network is a revolutionary concept in machine learning and data science and is widely used in different areas. Power system studies are no exception, and deep learning has already shown great performance in short-term load forecasting [19, 21], renewable energy forecasting [22, 23], electricity price forecasting [24, 25], electrical machine fault detection [26], power transformer protection [27], power quality assessment [28, 29], load flow [30], wind turbine monitoring [31], etc. The long short-term memory (LSTM), a deep gated recurrent neural network structure, has been presented for load parameter identification in [6]. However, the LSTM, like recurrent neural networks in general, is unable to learn spatial features. Moreover, a large share of the previous investigations focus only on the electrical load model connected to a specific bus. Hence, these methods cannot perform when a local measurement device is interrupted.

Despite all the efforts of researchers in electrical load parameter identification in recent years, several challenges remain: 1) it is still challenging to design a method that characterizes load parameters with complex spatial and temporal features; 2) measurement and process noises are unavoidable in the power system, so it is essential to design a structure that performs robustly in noisy conditions; 3) learning and taking into account spatial features plays a pivotal role in accurately characterizing load parameters; and 4) a multi-variant structure is required to handle a large number of unknown parameters.

To address these challenges, this study proposes a spatial-temporal deep attention network. The deep attention concept is developed in this paper to capture nonlinear, nonstationary, and complicated interdependencies between previous time steps and to extract robust spatial features. In the proposed method, an LSTM with an attention mechanism is used to learn dependencies and temporal features in raw time-domain signals and to capture frequency-domain features without any additional feature extraction technique. Then, a convolutional neural network (CNN) based attention network is added to learn the spatial features between measurement signals at different locations. Furthermore, the CNN-based attention network can learn time interdependencies between multiple parameters with different dynamic behaviors. Finally, a pseudo-Huber loss function is developed to enhance the robustness of the designed deep attention network and to improve the training performance. The efficiency and superiority of the proposed method are verified by a numerical study on the IEEE 68-bus system and a comparison with several previously presented and state-of-the-art methods.

Thus, the key contributions of this paper can be summarized as follows:

A new deep attention-based structure is designed to identify CLM parameters by learning spatio-temporal features.

The designed deep network can capture slow and fast dynamic behaviors of the CLM parameters and the interdependencies between multi-variant signals throughout a time interval.

A pseudo-Huber loss function is developed to enhance the robustness of the proposed deep network against noisy conditions and improve the training performance.

The proposed deep network considers the correlation between the loads at different locations through a CNN-based attention mechanism.

The remainder of the paper is organized as follows: Section 2 describes the wide-area load modeling. The structure of the designed deep network is described in detail in Section 3. Section 4 discusses the numerical results of the proposed deep learning-based CLM parameter identification.

2 Wide-area load model

The time-varying load model consists of two main components: a static component (ZIP model) and a dynamic component (IM model). The ZIP model is a conventional static load model with three components: constant impedance (Z), constant current (I), and constant power (P). In the ZIP model, the active and reactive power are quadratic functions of the voltage. To identify the ZIP model, a set of time-varying coefficients should be estimated. The ZIP model is described as follows: $P_{t}^{ZIP} = α_{t}^{P} {(\frac{V_{t}}{V_{b}})}^{2} + β_{t}^{P} (\frac{V_{t}}{V_{b}}) + γ_{t}^{P}$ (1) $Q_{t}^{ZIP} = α_{t}^{Q} {(\frac{V_{t}}{V_{b}})}^{2} + β_{t}^{Q} (\frac{V_{t}}{V_{b}}) + γ_{t}^{Q}$ (2)

Note that in ZIP load parameter identification, the coefficients must satisfy: $α_{t}^{P / Q} + β_{t}^{P / Q} + γ_{t}^{P / Q} = 1$ (3)
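As a minimal sketch of Eqs. (1)-(3), the ZIP polynomial can be evaluated for one time step as shown below. The function name `zip_power` and the coefficient values are hypothetical and chosen only for illustration; the same function applies to the active or the reactive coefficients.

```python
import numpy as np

def zip_power(v, v_base, alpha, beta, gamma):
    """Evaluate the ZIP polynomial of Eq. (1) or Eq. (2) for one time step."""
    coeffs = np.array([alpha, beta, gamma], dtype=float)
    # Eq. (3): the three coefficients must sum to one.
    if not np.isclose(coeffs.sum(), 1.0):
        raise ValueError("ZIP coefficients must sum to 1 (Eq. 3)")
    ratio = v / v_base
    return alpha * ratio**2 + beta * ratio + gamma

# Hypothetical coefficients, used only to illustrate the formula.
p_zip_nominal = zip_power(v=1.0, v_base=1.0, alpha=0.3, beta=0.3, gamma=0.4)
p_zip_sag = zip_power(v=0.9, v_base=1.0, alpha=0.3, beta=0.3, gamma=0.4)
```

At nominal voltage the polynomial evaluates to the sum of the coefficients, so constraint (3) fixes the output at 1 there; a voltage sag lowers the constant-impedance and constant-current shares only.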

To address the dynamic model, a third-order IM model is utilized considering meteorological impacts and load consumption patterns. The third-order state-space model is: ${\dot{v}}_{d}^{t} = \frac{- r_{R}^{t}}{x_{R}^{t} + x_{m}^{t}} (v_{d}^{t} + \frac{{(x_{m}^{t})}^{2}}{x_{m}^{t} + x_{R}^{t}} i_{q}^{t}) + s^{t} v_{q}^{t}$ (4) ${\dot{v}}_{q}^{t} = \frac{- r_{R}^{t}}{x_{R}^{t} + x_{m}^{t}} (v_{q}^{t} - \frac{{(x_{m}^{t})}^{2}}{x_{m}^{t} + x_{R}^{t}} i_{d}^{t}) - s^{t} v_{d}^{t}$ (5) ${\dot{s}}^{t} = \frac{1}{2 H^{t}} (T_{m} {(1 - s^{t})}^{2} - v_{d}^{t} i_{d}^{t} - v_{q}^{t} i_{q}^{t})$ (6)

The d/q-axis stator currents $i_{d / q}^{t}$ are obtained as follows: $i_{d}^{t} = \frac{r_{S}^{t} (u_{d}^{t} - v_{d}^{t}) + x_{sh}^{t} (u_{q}^{t} - v_{q}^{t})}{{(r_{S}^{t})}^{2} + {(x_{sh}^{t})}^{2}}$ (7) $i_{q}^{t} = \frac{r_{S}^{t} (u_{q}^{t} - v_{q}^{t}) - x_{sh}^{t} (u_{d}^{t} - v_{d}^{t})}{{(r_{S}^{t})}^{2} + {(x_{sh}^{t})}^{2}}$ (8)

The d- and q-axis components of the bus voltage are related to the measured bus voltage by: ${(V_{b}^{t})}^{2} = {(u_{d}^{t})}^{2} + {(u_{q}^{t})}^{2}$ (9)

In addition, $x_{sh}^{t}$ is computed as: $x_{sh}^{t} = x_{S}^{t} + \frac{x_{m}^{t} x_{R}^{t}}{x_{m}^{t} + x_{R}^{t}}$ (10)
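The dynamic part of the model, Eqs. (4)-(8) and (10), can be sketched as a function that returns the state derivatives for one time step. This is an illustrative sketch rather than the simulation code used in the paper: it assumes that the reactance appearing in (7) and (8) is the $x_{sh}^{t}$ of (10), and the parameter values, the dictionary keys, and the function names are hypothetical.

```python
import numpy as np

def transient_reactance(x_s, x_m, x_r):
    """Eq. (10): x_sh = x_S + x_m * x_R / (x_m + x_R)."""
    return x_s + x_m * x_r / (x_m + x_r)

def stator_currents(u_d, u_q, v_d, v_q, r_s, x_sh):
    """Eqs. (7)-(8), assuming the reactance in both is x_sh from Eq. (10)."""
    den = r_s**2 + x_sh**2
    i_d = (r_s * (u_d - v_d) + x_sh * (u_q - v_q)) / den
    i_q = (r_s * (u_q - v_q) - x_sh * (u_d - v_d)) / den
    return i_d, i_q

def im_derivatives(state, u_d, u_q, p, t_m):
    """Right-hand sides of Eqs. (4)-(6) for state = (v_d, v_q, s)."""
    v_d, v_q, s = state
    x_sh = transient_reactance(p["x_s"], p["x_m"], p["x_r"])
    i_d, i_q = stator_currents(u_d, u_q, v_d, v_q, p["r_s"], x_sh)
    k = -p["r_r"] / (p["x_r"] + p["x_m"])
    c = p["x_m"]**2 / (p["x_m"] + p["x_r"])
    dv_d = k * (v_d + c * i_q) + s * v_q                                # Eq. (4)
    dv_q = k * (v_q - c * i_d) - s * v_d                                # Eq. (5)
    ds = (t_m * (1.0 - s)**2 - v_d * i_d - v_q * i_q) / (2.0 * p["h"])  # Eq. (6)
    return np.array([dv_d, dv_q, ds])

# Hypothetical per-unit parameters, for illustration only.
params = {"r_s": 0.01, "x_s": 0.1, "x_m": 3.0, "x_r": 0.1, "r_r": 0.02, "h": 0.5}
x_sh = transient_reactance(params["x_s"], params["x_m"], params["x_r"])
i_d0, i_q0 = stator_currents(1.0, 0.0, 1.0, 0.0, params["r_s"], x_sh)
rates = im_derivatives((0.9, 0.1, 0.02), u_d=1.0, u_q=0.0, p=params, t_m=0.8)
```

The returned vector can be passed to any ODE integrator to propagate $v_{d}^{t}$, $v_{q}^{t}$, and $s^{t}$; when the internal voltage equals the bus voltage, both stator currents vanish.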

The active and reactive powers of the IM are:

$P_{IM}^{t} = \frac{r_{S}^{t} [{(u_{d}^{t})}^{2} + {(u_{q}^{t})}^{2} - u_{d}^{t} v_{d}^{t} - u_{q}^{t} v_{q}^{t}] - x_{sh}^{t} (u_{d}^{t} v_{q}^{t} - u_{q}^{t} v_{d}^{t})}{{(r_{S}^{t})}^{2} + {(x_{sh}^{t})}^{2}}$ (11)

$Q_{IM}^{t} = \frac{x_{sh}^{t} [{(u_{d}^{t})}^{2} + {(u_{q}^{t})}^{2} - u_{d}^{t} v_{d}^{t} - u_{q}^{t} v_{q}^{t}] - r_{S}^{t} (u_{d}^{t} v_{q}^{t} - u_{q}^{t} v_{d}^{t})}{{(r_{S}^{t})}^{2} + {(x_{sh}^{t})}^{2}}$ (12)

Thus, consumed active and reactive power based on CLM are: $P_{com}^{t} = P_{ZIP}^{t} + P_{IM}^{t}$ (13) $Q_{com}^{t} = Q_{ZIP}^{t} + Q_{IM}^{t}$ (14)

To model a load connected to a single bus, let $Θ_{d^{Θ}}^{t}$ and $Υ_{d^{Υ}}^{t}$ denote the set of CLM parameters to be estimated and the set of measurable variables, respectively: $Θ_{d^{Θ}}^{t} = [r_{S}^{t}, x_{S}^{t}, x_{m}^{t}, x_{R}^{t}, r_{R}^{t}, H^{t}, α_{t}^{P}, α_{t}^{Q}, β_{t}^{P}, β_{t}^{Q}, γ_{t}^{P}, γ_{t}^{Q}]$ and $Υ_{d^{Υ}}^{t} = [P_{com}^{t}, Q_{com}^{t}, V_{b}^{t}]$, where $d^{Θ}$ and $d^{Υ}$ are the dimensions of the CLM parameter set and the measurement set, respectively.

Generally, the CLM parameters are defined as a function of the measurements and the corresponding noise, $Θ_{d^{Θ}}^{t} = f (Υ_{d^{Υ}}^{t}) + e_{Υ}$. The load parameters at successive time steps follow a simple relationship:

$\begin{matrix} Θ_{d^{Θ}}^{t} & = Θ_{d^{Θ}}^{t - 1} + e_{Θ^{1}} = Θ_{d^{Θ}}^{t - 2} + e_{Θ^{2}} \\ & = \dots = Θ_{d^{Θ}}^{t - k_{Θ}} + e_{Θ^{k_{Θ}}} \end{matrix}$ (15) where $k_{Θ}$ is the window length for parameter estimation. Similarly, the relationship between the time-varying parameters of the CLM and the measurements is as follows:

$\begin{matrix} Θ_{d^{Θ}}^{t} & = f (Υ_{d^{Υ}}^{t}) + e_{Υ} = f_{1} (Υ_{d^{Υ}}^{t - 1}) + e_{Υ^{1}} \\ & = \dots = f_{k_{Υ}} (Υ_{d^{Υ}}^{t - k_{Υ}}) + e_{Υ^{k_{Υ}}} \end{matrix}$ (16)

In conventional load modeling, the aim is to identify the parameters of a single load connected to a specific bus. However, the measurement devices of a single bus may be interrupted or their data may be missing. Therefore, wide-area load modeling is preferred. In wide-area load modeling, a centralized controller can simultaneously estimate the parameters of multiple loads. Besides, the analysis of real case scenarios indicates that electrical loads at the various buses of a power network are dependent variables. Wide-area load modeling incorporates these correlations into the load model. Thus, a wide-area measurement load model is expressed by a nonlinear function, $F_{d_{Υ}}^{t}$ , as: $χ_{d_{χ}, i}^{t} = F_{d_{Υ}}^{t} (χ_{d_{χ}, 1}^{t}, \dots χ_{d_{χ}, N}^{t})$ (17) where the subscripts i and N denote the bus number and the total number of buses in the power system, respectively.

By substituting the wide-area measurement model in (17) into the conventional load model in (15) and (16), the wide-area load model is written as (22).

where $Θ_{d^{Θ} \times k_{Θ}}$, $χ_{d_{χ} \times (k_{Υ} + 1)}$, and $χ_{d_{χ} \times (k_{Υ} + 1), i}$ are: $Θ_{d^{Θ} \times k_{Θ}} = (Θ_{d^{Θ}}^{t - 1}, Θ_{d^{Θ}}^{t - 2}, \dots Θ_{d^{Θ}}^{t - k_{Θ}})$ (18) $χ_{d_{χ} \times (k_{Υ} + 1)} = (\dots, χ_{d_{χ} \times (k_{Υ} + 1), i}, \dots, χ_{d_{χ} \times (k_{Υ} + 1), N})$ (19) $χ_{d_{χ} \times (k_{Υ} + 1), i} = (χ_{d_{χ}, i}^{t}, χ_{d_{χ}, i}^{t - 1}, \dots, χ_{d_{χ}, i}^{t - k_{Υ}})$ (20)
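The windows in (18)-(20) can be assembled from stored time series as in the following sketch. The array layout (time on the first axis), the function name `build_windows`, and the random placeholder data are assumptions made for illustration only.

```python
import numpy as np

def build_windows(theta, chi, t, k_theta, k_ups):
    """Collect the histories of Eqs. (18)-(20) at time index t.

    theta: array (T, d_theta), parameter history of one load.
    chi:   array (T, N, d_chi), measurements of all N buses.
    """
    if t < max(k_theta, k_ups):
        raise ValueError("not enough history before index t")
    # Eq. (18): Theta^{t-1}, ..., Theta^{t-k_theta} (most recent first)
    theta_win = theta[t - k_theta:t][::-1]
    # Eq. (20) for every bus: chi^{t}, chi^{t-1}, ..., chi^{t-k_ups}
    chi_win = chi[t - k_ups:t + 1][::-1]
    # Eq. (19): put the bus axis first -> shape (N, k_ups + 1, d_chi)
    return theta_win, np.transpose(chi_win, (1, 0, 2))

rng = np.random.default_rng(0)
theta_hist = rng.random((50, 12))     # 12 CLM parameters (placeholder values)
chi_hist = rng.random((50, 68, 3))    # 68 buses, 3 measurements each (P, Q, V)
theta_win, chi_win = build_windows(theta_hist, chi_hist, t=20, k_theta=10, k_ups=10)
```

Note that the measurement window has $k_{Υ} + 1$ entries because it includes the current time step, whereas the parameter window contains only the $k_{Θ}$ previous steps.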

An analytical model cannot estimate the parameters in (22); therefore, this paper proposes a data-driven method to identify the time-varying parameters.

3 Proposed spatial-temporal deep attention network for composite load parameter identification

A historical dataset consisting of $Θ_{d^{Θ} \times k_{Θ}, i} \in ℝ^{d^{Θ} \times k_{Θ}}$ and $χ_{d_{χ} \times (k_{Υ} + 1)} \in ℝ^{d_{χ} \times (k_{Υ} + 1)}$ is used to construct a model that projects a set ${Θ^{'}}_{d^{Θ} \times k_{Θ}, i} \in ℝ^{d^{Θ} \times k_{Θ}}$ with minimum difference from $Θ_{d^{Θ} \times k_{Θ}, i}$ using the measurement set $χ_{d_{χ} \times (k_{Υ} + 1)}$. In the data-driven load parameter estimation, the input dataset is defined as $X = {(Θ_{d^{Θ}}^{t - k_{Θ}}, \dots, Θ_{d^{Θ}}^{t - 1}), (χ_{d_{χ}, i}^{t - k_{Υ}}, \dots, χ_{d_{χ}, i}^{t - 1})}$ and the output is $Y = {({Θ^{'}}_{d^{Θ}}^{t})}$.

This section presents the background and the proposed deep attention network for wide-area CLM parameter identification. First, the background of the deep attention network is provided; then, the proposed network is described.

3.1 Background of deep attention network

The attention mechanism was originally integrated with deep neural networks in [32] for multi-variant time series forecasting. To identify the CLM parameters with a conventional deep attention network, a recurrent neural network (RNN) constructs a time-varying vector $v^{t}$ using a set of hidden states $H^{t} = \{h^{1}, h^{2}, \dots, h^{t - 1}\}$. The output vector $v^{t}$ is produced from $h^{i}$, $\forall i \in \{1, \dots, t - 1\}$, and each of these output vectors includes features associated with time interval t. To estimate the parameters of a time-varying model, $v^{t}$ is integrated into the hidden state at time interval t, $h^{t}$. The output vectors are obtained from the scoring function $f^{sc} : ℝ^{m} \times ℝ^{m} \to ℝ$ by learning the inherent features among all inputs, i.e., the measurement values at the current and previous time steps and the parameter values at previous time steps. Overall, $v^{t}$ is computed as:

$v^{t} = \sum_{i = 1}^{t - 1} h^{i} \underbrace{\frac{\exp [f^{sc} (h^{i}, h^{t})]}{\sum_{j = 1}^{t - 1} \exp [f^{sc} (h^{j}, h^{t})]}}_{e^{i}}$ (21) where $e^{1}, \dots, e^{t - 1}$ are the attention weights corresponding to each hidden state.

$Θ_{d^{Θ}}^{t} = F_{χ} (\underbrace{Θ_{d^{Θ}}^{t - 1}, \dots, Θ_{d^{Θ}}^{t - k_{Θ}}}_{Θ_{d^{Θ} \times k_{Θ}}}, \underbrace{\underbrace{χ_{d_{χ}, 1}^{t}, \dots, χ_{d_{χ}, 1}^{t - k_{Υ}}}_{χ_{d_{χ} \times (k_{Υ} + 1), 1}}, \underbrace{χ_{d_{χ}, 2}^{t}, \dots, χ_{d_{χ}, 2}^{t - k_{Υ}}}_{χ_{d_{χ} \times (k_{Υ} + 1), 2}}, \dots, \underbrace{χ_{d_{χ}, N}^{t}, \dots, χ_{d_{χ}, N}^{t - k_{Υ}}}_{χ_{d_{χ} \times (k_{Υ} + 1), N}}}_{χ_{d_{χ} \times (k_{Υ} + 1)}})$ (22)

Deep attention networks rely on an encoder-decoder process. To estimate time-varying parameters, the following steps are required:

Calculation of score of each encoder hidden state.

Calculation of the attention weights.

Calculation of the output vectors based on (21).

Concatenate output vectors at the current time with outputs in the previous time steps.

Decoding the final outputs.
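The first four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the bilinear form of the scoring function, the identity score matrix, and the concatenation of $v^{t}$ with the current hidden state $h^{t}$ are assumptions, and the decoder of the last step is omitted.

```python
import numpy as np

def conventional_attention(h_prev, h_t, w_score):
    """h_prev: (t-1, m) encoder hidden states; h_t: (m,) current state.
    w_score: (m, m) matrix of an assumed bilinear scoring function."""
    scores = h_prev @ w_score @ h_t                    # step 1: one score per hidden state
    scores = scores - scores.max()                     # numerical stability only
    weights = np.exp(scores) / np.exp(scores).sum()    # step 2: softmax attention weights
    v_t = weights @ h_prev                             # step 3: weighted sum, Eq. (21)
    combined = np.concatenate([v_t, h_t])              # step 4: concatenation (assumed form)
    return weights, v_t, combined

rng = np.random.default_rng(1)
h_prev = rng.standard_normal((9, 4))   # nine previous hidden states of size m = 4
h_t = rng.standard_normal(4)
weights, v_t, combined = conventional_attention(h_prev, h_t, np.eye(4))
```

Because the weights come from a softmax, they are non-negative and sum to one, so $v^{t}$ is a convex combination of the previous hidden states.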

The main principle of the conventional deep attention network for CLM parameter estimation is illustrated in Fig. 1, which shows all five steps in detail.

Fig. 1

Overall procedure of conventional deep attention network.

Although the deep attention network allows power system operators to treat CLM parameter identification as a multi-variant identification problem, the conventional multi-variant deep attention network cannot fully capture the spatial-temporal features of the measurement and parameter data. To this end, an LSTM is integrated to learn temporal interdependencies and a CNN is integrated to learn spatial features, starting from the conventional output vector: $v^{t} = \sum_{i = 1}^{t - 1} h^{i} \underbrace{\frac{\exp [f^{sc} (h^{i}, h^{t})]}{\sum_{j = 1}^{t - 1} \exp [f^{sc} (h^{j}, h^{t})]}}_{e^{i}}$ (23)

3.2 Proposed deep attention network

When a deep attention network is applied to the CLM parameter identification problem, the typical attention mechanism cannot fully capture temporal and spatial features because the output vector depends on the hidden states of the previous time steps. Thus, in the CLM parameter identification task, in which each time step corresponds to multiple parameters and measurement values, it fails to estimate parameters when the measurements are noisy. Besides, the conventional deep attention network can only track the average of the parameters across a long sequence; therefore, it is vulnerable in noisy conditions and cannot capture the full set of features in the learning process. To address these problems, a modified deep attention network is first proposed in this paper. Then, a pseudo-Huber loss function is developed to enhance the robustness in noisy conditions.

The design of the proposed deep attention network is illustrated in Fig. 2. As can be seen from this figure, in the proposed structure the scoring function is not fed directly from the hidden states; instead, the hidden states are transposed and their rows are separated. Then, the attention weights are calculated from the transposed hidden states and the scoring function. The output vectors are the summation of the row vectors and include temporal features from the current and previous time steps.

Fig. 2

Structure of the developed attention mechanism.

As mentioned before, the CNN is implemented in the designed network to capture spatial features. In wide-area load monitoring, it is essential to understand the spatial features between measurements and load model parameters at different locations. To this end, CNN filters are applied to the row vectors of the hidden states to generate $h_{c}^{C, k}$ . Consider filters $C \in ℝ^{1 \times T}$ , where T is the maximum length considered for the attention mechanism. The convolutional layers convert the hidden states to $h_{c}^{C, k}$ as: $h_{m, n}^{C} = \sum_{le = 1}^{L} h^{n, (t - L - 1 + le)} \times c^{m, L - le}$ (24)

Thus, the scoring function $f^{sc} (\cdot)$ of the designed deep attention network is defined as: $f^{sc} (h^{c, i}, h^{t}) = {(h^{c, i})}^{T} ω^{AT} h^{t}$ (25)

In the conventional deep attention network, the softmax function, $\frac{\exp [f^{sc} (h^{i}, h^{t})]}{\sum_{j = 1}^{t - 1} \exp [f^{sc} (h^{j}, h^{t})]}$ , is used to generate the attention weights. However, because softmax weights must sum to one, softmax cannot model the case in which several inputs are important at the same time. Therefore, the softmax function is replaced with the sigmoid function, and the attention weights in the designed deep attention network are: ${\overset{´}{e}}^{i} = sigmoid (f^{sc} (h^{c, i}, h^{t}))$ (26)

The output vector is determined as: $v^{t} = \sum_{i = 1}^{n} {\overset{´}{e}}^{i} h^{c, i}$ (27)

The new hidden states (shown in Fig. 2) are obtained as: ${\overset{´}{h}}^{t} = ω^{h} h^{t} + ω^{v} v^{t}$ (28) where $h^{t}, {\overset{´}{h}}^{t} \in ℝ^{m}$ , $ω^{h} \in ℝ^{m \times m}$ , and $ω^{v} \in ℝ^{m \times k}$ .

Consequently, the parameters of the CLM model are computed as the output of the designed deep attention network: $y^{t} = ω^{\overset{´}{h}} {\overset{´}{h}}^{t}$ (29) where $ω^{\overset{´}{h}} \in ℝ^{n \times m}$ and $y^{t} \in ℝ^{n}$ .
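The chain (24)-(29) can be sketched in NumPy as below. This is an illustrative reading rather than the authors' implementation: it assumes each filter spans the whole attention length T, so the row-wise convolution with "valid" padding reduces to a dot product, it simplifies the index convention of (24), and it uses random weights in place of trained ones. All variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def proposed_attention(hidden, h_t, filters, w_a, w_h, w_v, w_out):
    """hidden: (m, T) hidden states, one row per hidden dimension.
    filters: (k, T) 1-D CNN filters spanning the attention length T."""
    h_c = hidden @ filters.T          # Eq. (24): row-wise convolution -> (m, k)
    scores = h_c @ w_a @ h_t          # Eq. (25): one score per row -> (m,)
    e = sigmoid(scores)               # Eq. (26): sigmoid replaces softmax
    v_t = e @ h_c                     # Eq. (27): weighted sum of rows -> (k,)
    h_new = w_h @ h_t + w_v @ v_t     # Eq. (28) -> (m,)
    y_t = w_out @ h_new               # Eq. (29) -> (n,)
    return e, v_t, h_new, y_t

rng = np.random.default_rng(3)
m, T, k, n = 8, 10, 6, 10             # hypothetical sizes; T = attention length
e, v_t, h_new, y_t = proposed_attention(
    hidden=rng.standard_normal((m, T)),
    h_t=rng.standard_normal(m),
    filters=rng.standard_normal((k, T)),
    w_a=rng.standard_normal((k, m)),
    w_h=rng.standard_normal((m, m)),
    w_v=rng.standard_normal((m, k)),
    w_out=rng.standard_normal((n, m)),
)
```

Unlike softmax weights, the sigmoid weights are not forced to sum to one, so several rows can receive a weight close to one simultaneously.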

3.3 Developed loss function

The squared error loss function is usually used when designing a structure to estimate CLM parameters. However, squared error loss functions lead to a mean-unbiased and minimum-variance estimator during the training process, and in noisy conditions this property can produce inaccurate results in CLM parameter identification. To tackle this problem, a modified loss function is formulated in this paper. To this end, a pseudo-Huber loss function is adopted from [33] to form the following loss function: $f_{loss}^{pH} (Y) = \sum_{t = 1}^{T} [φ^{2} \sqrt{\frac{φ^{2} + {(y_{t})}^{2}}{φ^{2}}} - φ^{2}]$ (30)
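A minimal NumPy sketch of (30) is given below. It assumes that $y_{t}$ denotes the residual between the estimated and the actual parameter value at step t, and the value φ = 1 is chosen only for illustration.

```python
import numpy as np

def pseudo_huber_loss(residuals, phi=1.0):
    """Eq. (30): sum over t of phi^2 * sqrt((phi^2 + y_t^2) / phi^2) - phi^2."""
    r = np.asarray(residuals, dtype=float)
    return np.sum(phi**2 * np.sqrt((phi**2 + r**2) / phi**2) - phi**2)

small = pseudo_huber_loss([0.01], phi=1.0)    # close to 0.01**2 / 2
large = pseudo_huber_loss([100.0], phi=1.0)   # grows roughly like phi * |y|
squared = 0.5 * 100.0**2                      # squared-error value for the same residual
```

For the large residual, the pseudo-Huber value stays near 100 while the squared error is 5000, which illustrates why outliers caused by noise dominate the training far less under this loss.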

In CLM parameter identification, the pseudo-Huber loss function approximates $\frac{{(y)}^{2}}{2}$ for small errors during the minimization, whereas for large errors it approximates a straight line with slope φ. Thereby, the pseudo-Huber loss function restricts the influence of the errors caused by noise. Therefore, the modified loss function limits the impact of measurement and process noises in wide-area measurement systems, reduces mean bias, and enhances the robustness of the proposed data-driven method for highly variable time series in power systems. To train a network for the CLM, the learning weights $θ_{l}$ of the $l$th layer are: $θ_{l}^{pH} (X) = \frac{θ_{l} (X)}{\sqrt{1 + {(\frac{{\tilde{Y}}_{pH} - Y_{l}}{C})}^{2}}}$ (31)

Also, ${\tilde{Y}}_{pH}$ as the pseudo-Huber estimator is: ${\tilde{Y}}_{pH} = \frac{\sum_{t = 1}^{T} θ_{l}^{pH} (X) y_{l}}{\sum_{t = 1}^{T} θ_{l}^{pH} (X)}$ (32)
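One possible reading of (31) and (32) is an iteratively reweighted average, in which samples far from the current estimate receive smaller weights. The sketch below follows that reading under stated assumptions: $θ_{l}$ is treated as a vector of base weights (all ones here), C as a scale constant, and the iteration count is arbitrary. It is an illustration of the weighting idea, not the training procedure of the paper.

```python
import numpy as np

def pseudo_huber_estimate(y, base_weights=None, c=1.0, iterations=50):
    y = np.asarray(y, dtype=float)
    w0 = np.ones_like(y) if base_weights is None else np.asarray(base_weights, dtype=float)
    estimate = np.average(y, weights=w0)                     # start from the weighted mean
    for _ in range(iterations):
        w = w0 / np.sqrt(1.0 + ((estimate - y) / c) ** 2)    # Eq. (31)
        estimate = np.sum(w * y) / np.sum(w)                 # Eq. (32)
    return estimate

samples = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 25.0])   # last value is an outlier
robust = pseudo_huber_estimate(samples, c=0.5)
plain = samples.mean()
```

In this example the plain mean is pulled to 5.0 by the single outlier, while the reweighted estimate stays close to the cluster of values around 1.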

To minimize the loss function in (30) and obtain the learning weights, the Adam optimization algorithm [34] is applied (more information is provided in [27]). Besides, the presented pseudo-Huber loss function enhances the robustness against measurement and process noises in the CLM.

3.4 Input dataset preparation

The measurement and load parameters in the generated dataset, $X = {(Θ_{d^{Θ}}^{t - k_{Θ}}, \dots, Θ_{d^{Θ}}^{t - 1}), (χ_{d_{χ}, i}^{t - k_{Υ}}, \dots, χ_{d_{χ}, i}^{t - 1})}$ , are normalized based on: $X^{i} = \frac{X^{i} - X_{\min}^{i}}{X_{\max}^{i} - X_{\min}^{i}}$ (33) where $X_{\max}^{i}$ and $X_{\min}^{i}$ are the maximum and minimum input values, respectively. The input fed to the designed network in the CLM parameter identification is formed as input = (samples, 10, (measurement parameters + P/Q of the bus connected to the load + voltage of all buses in the network)). The number of samples per batch is set to batch size = 32. Note that the output is formed as output = (samples, 10).
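The normalization in (33) and the input shape described above can be sketched as follows. The sketch assumes that the minimum and maximum are taken per feature over all samples and time steps, and that the last axis holds 12 load-related features plus the voltages of the 68 buses, following Table 1; the random batch is a placeholder for real data.

```python
import numpy as np

def min_max_normalize(x):
    """Eq. (33), applied per feature (last axis) over all samples and time steps."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=(0, 1), keepdims=True)
    x_max = x.max(axis=(0, 1), keepdims=True)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)   # guard against constant features
    return (x - x_min) / span

rng = np.random.default_rng(2)
n_features = 12 + 68                       # assumed feature count for the 68-bus system
batch = rng.normal(loc=5.0, scale=2.0, size=(32, 10, n_features))
scaled = min_max_normalize(batch)          # shape (samples, attention length, features)
```

In practice the minimum and maximum would be computed on the training set only and reused for validation and test data, so that no information leaks from the held-out sets.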

3.5 Overall performance of the designed network

The proposed method is a hybrid model that combines the CNN, LSTM, encoder-decoder, and attention mechanism. In this network, the CNN-based attention mechanism is responsible for interpreting the output sequence of the LSTM encoder and for capturing the spatial features of the different loads at different locations in the power system.

In the proposed network, the following procedure should be carried out to identify the CLM parameters:

First, the input dataset is fed into the encoder layer, which is an LSTM-based encoder block.

Then, the outputs of the encoder block are used as the input of the proposed attention mechanism (shown in Fig. 2) to construct ${\overset{´}{h}}^{t}$ .

The output of the attention mechanism is used as the input of the LSTM-based decoder block.

Finally, the decoder layer outputs are the estimated parameters of the CLM.

The procedure of the proposed deep attention network is shown in Fig. 3. The parameters of the designed network are given in Table 1.

Fig. 3

Designed deep attention network structure for CLM parameter identification.

Table 1

Parameters of the Designed Deep Attention Network for CLM Parameter Identification

Input	(sample, attention length, 12 + voltage of all buses in the network), attention length = 10
Encoding	LSTM 1: #Filter = 128, dropout = 0.5, activation function: ReLU
	LSTM 2: #Filter = 128, dropout = 0.5, activation function: ReLU
	LSTM 3: #Filter = 128, dropout = 0.5, activation function: ReLU
Attention Block	Convolutional Layer: 2D, (10, 1, 32), padding = "valid"
	Dense 32
	Attention length = 10
Decoding	LSTM 1: #Filter = 128, dropout = 0.5, activation function: ReLU
	LSTM 2: #Filter = 128, dropout = 0.5, activation function: ReLU
	LSTM 3: #Filter = 128, dropout = 0.5, activation function: ReLU
Output	(sample, number of CLM parameters to identify = 10)
Optimizer	Adam, learning rate = 10^–5, max-gradient-norm = 5

4 Numerical results

This section validates the proposed data-driven parameter identification method for the CLM through numerical experiments. The robustness and effectiveness of the proposed deep attention network are verified by a large-scale case study in the presence of noise.

For comparison, different shallow and deep structures are considered. The previously presented SVM [18], LSTM [6], and multi-modal LSTM (MLSTM) [6] methods are used to verify the superiority of the proposed method.

The dataset is generated in MATLAB and is further processed with the TensorFlow package on a computer with an Intel Core i7-5960X CPU @ 3.00 GHz and 32 GB of RAM.

4.1 Dataset generation

To evaluate the proposed method, the IEEE 68-bus system, which includes 86 lines and 16 synchronous generators, is simulated in the power system toolbox (PST) [35] with 0.01 samples per second. To evaluate the robustness of the proposed approach, the dataset is generated in two different ways. First, 85 fault events are considered for each line, and in each event a line is disconnected. A similar procedure is applied to the loads: at 34 different buses, the electrical load is disconnected from the network. Overall, 59500 different samples are generated, and 70%, 15%, and 15% of this dataset are devoted to training, validation, and testing, respectively. Furthermore, Gaussian noise with mean values equal to the original test case data and a standard deviation of 10% of the mean values is added to the data to verify the robustness of the proposed method.
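The 70%/15%/15% split and the noise injection described above can be sketched as follows. The random shuffling, the seeds, and the function names are assumptions; the text does not state how the samples are assigned to the three subsets.

```python
import numpy as np

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def add_gaussian_noise(x, rel_std=0.10, seed=0):
    """Gaussian noise centred on each value with std = rel_std * |value|."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    return rng.normal(loc=x, scale=rel_std * np.abs(x))

train_idx, val_idx, test_idx = split_indices(59500)
noisy = add_gaussian_noise(np.full(10000, 2.0))   # placeholder signal of constant value 2
```

With 59500 samples, this yields 41650 training samples and 8925 samples each for validation and testing.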

4.2 Accuracy Indices

To evaluate the performance of the proposed deep structure and compare it with other methods, four indices are used: root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), defined as: $RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i}^{'} - y_{i})}^{2}}$ (34) $NRMSE = \frac{1}{y_{i}^{\max}} \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i}^{'} - y_{i})}^{2}}$ (35) $MAE = \frac{1}{N} \sum_{i = 1}^{N} | y_{i}^{'} - y_{i} |$ (36) $MAPE = \frac{1}{N} \sum_{i = 1}^{N} | \frac{y_{i}^{'} - y_{i}}{y_{i}} |$ (37)

In these metrics, the actual values of the load parameters are denoted by $y_{i}$ and the estimated values by $y_{i}^{'}$ .
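The four indices in (34)-(37) can be computed with NumPy as in the sketch below. The MAPE is returned as a fraction, as written in (37), and must be multiplied by 100 to obtain a percentage; the sample values are hypothetical.

```python
import numpy as np

def accuracy_indices(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err**2))            # Eq. (34)
    nrmse = rmse / np.max(y_true)              # Eq. (35)
    mae = np.mean(np.abs(err))                 # Eq. (36)
    mape = np.mean(np.abs(err / y_true))       # Eq. (37), as a fraction
    return {"RMSE": rmse, "NRMSE": nrmse, "MAE": mae, "MAPE": mape}

indices = accuracy_indices([1.0, 2.0, 4.0], [1.1, 1.8, 4.0])
```

The MAPE is undefined when an actual value is zero, so parameters that can vanish would need a different relative measure.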

4.3 Discussion on results

Figures 4 and 5 compare the estimated values of the parameters $H^{t}$ and $X_{s}^{t}$ with the actual values and with the results obtained by MLSTM. As can be seen from these figures, the results obtained by the proposed deep attention structure are close to the actual values, and the accuracy of the proposed method is significantly better than that of MLSTM.

Fig. 4

Comparison of the actual values with the estimated parameter $H^{t}$ based on the designed deep attention network and MLSTM.

Fig. 5

Comparison of the actual values with the estimated parameter $X_{s}^{t}$ based on the designed deep attention network and MLSTM.

Table 2 shows the accuracy of the proposed deep attention network in terms of four different metrics. The low values of all metrics can validate the accurate performance of the proposed deep attention network.

Table 2

Performance of the Proposed Method

	Metrics
	MAPE (%)	RMSE (×10^–4)	NRMSE	MAE (×10^–4)
$α_{P}^{t}$	0.0223	0.9807	0.0345	0.7791
$β_{P}^{t}$	0.0392	2.1809	0.0913	1.7622
$α_{Q}^{t}$	0.0619	2.0437	0.0827	1.5479
$β_{Q}^{t}$	0.0365	2.0412	0.1031	1.6442
H^t	0.0302	4.0537	0.0801	3.0161
$R_{S}^{t}$	0.0231	0.0936	0.0703	0.0757
$X_{S}^{t}$	0.0211	0.2218	0.0614	0.1799
$R_{r}^{t}$	0.0288	0.2159	0.0899	0.1773
$X_{r}^{t}$	0.0333	0.2454	0.1163	0.1998
$X_{m}^{t}$	0.0287	12.4828	0.0583	10.8410

For comparison, the results of the different methods in the estimation of H^t and $X_{s}^{t}$ are given in Tables 3 and 4, respectively. The proposed method clearly outperforms the other methods. For instance, in the identification of H^t, the proposed method improves on the accuracy of MLSTM, LSTM, and SVM by approximately 66.05%, 67.34%, and 82.39%, respectively, in terms of MAPE. In terms of MAE, it improves the estimation accuracy of the parameter $X_{s}^{t}$ by about 68.87%, 74.99%, and 86.72% relative to the same three methods, respectively.

Table 3

Comparison of the Proposed Deep Attention Network for Parameter Identification of CLM with MLSTM, LSTM, and SVM in Estimation of H^t

Methods	Metrics
	MAPE	RMSE	NRMSE	MAE
	(%)	(×10^–⁴)		(×10^–⁴)
The Proposed	0.024056	1.3253	0.066935	1.0824
MLSTM	0.07087	3.8505	0.19447	3.189
LSTM	0.073644	4.0986	0.207	3.314
SVM	0.13659	7.4343	0.37569	6.0506

Table 4

Comparison of the Proposed Deep Attention Network for Parameter Identification of CLM with MLSTM, LSTM, and SVM in Estimation of $X_{s}^{t}$

Methods	Metrics
	MAPE	RMSE	NRMSE	MAE
	(%)	(×10^–⁴)		(×10^–⁴)
The proposed	0.0211124	0.22177	0.061432	0.17989
MLSTM	0.067872	0.6603	0.18291	0.57803
LSTM	0.08447	0.82577	0.22875	0.71944
SVM	0.15725	1.5255	0.41979	1.3548
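The improvement percentages quoted for Tables 3 and 4 can be reproduced with a short calculation. The sketch below assumes that "improvement" means the relative reduction of an error metric with respect to a baseline, a definition the text does not state explicitly; the function name `improvement` is hypothetical, and the numbers are copied from Tables 3 and 4.

```python
def improvement(baseline, proposed):
    """Relative error reduction of the proposed method over a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# MAPE (%) for H^t, copied from Table 3
mape_h = {"MLSTM": 0.07087, "LSTM": 0.073644, "SVM": 0.13659}
proposed_mape_h = 0.024056

# MAE (x10^-4) for X_s^t, copied from Table 4
mae_xs = {"MLSTM": 0.57803, "LSTM": 0.71944, "SVM": 1.3548}
proposed_mae_xs = 0.17989

for name, value in mape_h.items():
    print(f"H^t MAPE vs {name}: {improvement(value, proposed_mape_h):.2f} %")
for name, value in mae_xs.items():
    print(f"X_s^t MAE vs {name}: {improvement(value, proposed_mae_xs):.2f} %")
```

Under this definition the computed values agree with the percentages quoted in the text to within about 0.01 percentage points, the small differences being attributable to rounding.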

4.4 Sensitivity analysis in noise conditions

To test the robustness of the proposed method, this subsection discusses its performance under different noisy conditions. Comparative results are also given to demonstrate the superiority of the proposed deep attention network. Six Gaussian noise signals with the same mean value (zero) and six different standard deviations, i.e., 0.005, 0.01, 0.015, 0.02, 0.025, and 0.3, are considered. Figure 6 compares the results obtained by the proposed and other methods in the identification of H^t based on the MAPE metric, while Fig. 7 shows the values obtained by the proposed deep attention network, MLSTM, LSTM, and SVM in estimating $X_{s}^{t}$ in terms of RMSE. The proposed method is highly robust in noisy conditions owing to the proposed loss function and is significantly more accurate than MLSTM, LSTM, and SVM.
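A noise sweep of this kind can be sketched as below. The snippet adds zero-mean Gaussian noise at each listed standard deviation to a placeholder signal; the sinusoid, the fixed seed, and the helper name `add_gaussian_noise` are assumptions for illustration, and whether the noise in the study is applied to raw or normalized measurements is not specified in the text.

```python
import numpy as np

# Standard deviations listed in the text
noise_levels = [0.005, 0.01, 0.015, 0.02, 0.025, 0.3]

def add_gaussian_noise(signal, std, rng):
    """Return a copy of `signal` corrupted by zero-mean Gaussian noise."""
    signal = np.asarray(signal, dtype=float)
    return signal + rng.normal(loc=0.0, scale=std, size=signal.shape)

rng = np.random.default_rng(seed=0)   # fixed seed, for repeatability only
# Placeholder signal standing in for a measured quantity (not WAMS data)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 1000))

# One noisy copy of the signal per noise level
noisy = {std: add_gaussian_noise(clean, std, rng) for std in noise_levels}
for std, x in noisy.items():
    print(f"std={std}: empirical std of added noise = {np.std(x - clean):.4f}")
```

Each noisy copy would then be passed to the trained estimators, and the metrics of Section 4.2 computed per noise level.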

Fig. 6

Comparison of different methods in H^t estimation under different noise values (α₁) in terms of MAPE.

Fig. 7

Comparison of different methods in $X_{s}^{t}$ estimation under different noise values (α₁) in terms of MAPE.

4.5 Attention mechanism role analysis

In this subsection, the effectiveness of the proposed attention mechanism is verified through comparison with the deep attention networks proposed in [32] and [33]. Both of these deep attention structures have been developed for the CLM parameter identification. Furthermore, to show the effect of the attention mechanism on the CLM parameter identification, the proposed method is also compared with the same structure without the attention mechanism. The results obtained by the proposed method, the two other deep attention mechanisms, and the proposed network without the attention mechanism are given in Tables 5 and 6 for the estimation of H^t and $X_{s}^{t}$, respectively. As can be seen, the proposed method is superior to the two other deep attention mechanisms. Besides, the proposed method improves on the accuracy of the designed network without the attention mechanism by about 75.36% and 82.43% in the estimation of H^t and $X_{s}^{t}$, respectively, in terms of MAE.

Table 5

Comparison of the Proposed Deep Attention Network with Two Other Deep Attention Networks and the Proposed Network Without Attention in Estimation of H^t

Methods	Metrics
	MAPE	RMSE	NRMSE	MAE
	(%)	(×10^–⁴)		(×10^–⁴)
The Proposed	0.03016	4.0537	0.080112	3.0161
Deep attention [33]	0.06803	7.8384	0.15557	6.7226
Deep attention [34]	0.1018	11.58	0.22885	10.509
Without attention	0.11781	13.437	0.27129	12.239

Table 6

Comparison of the Proposed Deep Attention Network with Two Other Deep Attention Networks and the Proposed Network Without Attention in Estimation of $X_{s}^{t}$

Methods	Metrics
	MAPE	RMSE	NRMSE	MAE
	(%)	(×10^–⁴)		(×10^–⁴)
The proposed	0.02111	0.22177	0.061432	0.17989
Deep attention [33]	0.07110	0.66848	0.19074	0.58771
Deep attention [34]	0.10522	0.96903	0.26341	0.87405
Without attention	0.11889	1.144	0.30979	1.0239

5 Conclusion

Load parameter identification is an essential task for power system operators in short- and long-term power system analysis and decision-making. To this end, the CLM, which combines ZIP and IM models to represent the static and dynamic behavior of electrical loads, has attracted the attention of many investigators as a way to mimic actual load behavior. However, a fast, accurate, and robust method is needed to identify its time-varying parameters. This paper therefore develops a deep neural network for CLM parameter identification based on wide-area measurements. A multi-variant deep attention network is designed to capture spatial features as well as time-varying characteristics of the raw data. A CNN-based attention mechanism is proposed to capture spatial features, and LSTM-based encoder-decoder structures are developed to capture the temporal pattern of the CLM parameters. Furthermore, to enhance the robustness of the proposed method, a pseudo-Huber loss function has been developed. The numerical experiment on the IEEE 68-bus system illustrates the effectiveness and superiority of the proposed method in comparison with other methods. The proposed deep attention network shows at least 60% accuracy improvement in comparison with MLSTM and LSTM as deep neural structures and SVM as a shallow structure. The robust performance of the proposed spatial-temporal deep network has also been verified through a noise sensitivity analysis. Furthermore, to assess the impact of the proposed attention mechanism, it has been compared with two state-of-the-art deep attention networks developed for CLM parameter identification. Finally, the designed network without the attention mechanism has also been compared with the proposed network, and the proposed network reduces its estimation errors by more than 65%.

Investigations of composite load modelling approaches suggest that further work on designing new deep structures that estimate the probability density function (PDF) of the time-varying load parameters in large-scale power systems, in order to provide full statistical information, would be worthwhile.

References

1. Zheng, Wang and Zhu, A novel real-time load modeling method for fast large-disturbance and short-term voltage stability analysis, International Transactions on Electrical Energy Systems 23(8) (2013), 1373–1395.

2. Prada R.B. and Santos J.O.R., Load modelling in static voltage stability indices calculation, European Transactions on Electrical Power 9(5) (1999), 305–308.

3. Knak Neto, et al., Load modeling of active low-voltage consumers and comparative analysis of their impact on distribution system expansion planning, International Transactions on Electrical Energy Systems 29(8) (2019), e12038.

4. Mombauer and Week K.-H., Load modelling for harmonic flow calculations, European Transactions on Electrical Power 3(6) (1993), 453–460.

5. Arif, Wang, Mather, Bashualdo and Zhao, Load Modeling—A Review, IEEE Transactions on Smart Grid 9(6) (2018), 5986–5999.

6. Cui, Khodayar, Chen, Wang, Zhang and Khodayar M.E., Deep Learning-Based Time-Varying Parameter Identification for System-Wide Load Modeling, IEEE Transactions on Smart Grid 10(6) (2019), 6102–6114.

7. Milanovic J.V., Yamashita, Villanueva S.M., Djokic S.Ž. and Korunović L.M., International Industry Practice on Power System Load Modeling, IEEE Transactions on Power Systems 28(3) (2013), 3038–3046.

8. Milanovic J.V., Yamashita, Villanueva S.M., Djokic S.Ž. and Korunović L.M., International Industry Practice on Power System Load Modeling, IEEE Transactions on Power Systems 28(3) (2013), 3038–3046.

9. […], Zhang and Zhang, Two-step method for the online parameter identification of a new simplified composite load model, IET Generation Transmission & Distribution 10(16) (2016), 4048–4056.

10. Najafabadi A.M. and Alouani A.T., Real time parameter identification of composite load model, in 2013 IEEE Power & Energy Society General Meeting (2013), 1–5.

11. Rouhani and Abur, Real-Time Dynamic Parameter Estimation for an Exponential Dynamic Load Model, IEEE Transactions on Smart Grid 7(3) (2016), 1530–1536.

12. Regulski, Vilchis-Rodriguez D.S., Djurović and Terzija, Estimation of Composite Load Model Parameters Using an Improved Particle Swarm Optimization Method, IEEE Transactions on Power Delivery 30(2) (2015), 553–560.

13. Regulski, Vilchis-Rodriguez D.S., Djurović and Terzija, Estimation of Composite Load Model Parameters Using an Improved Particle Swarm Optimization Method, IEEE Transactions on Power Delivery 30(2) (2015), 553–560.

14. Wang, Lu and Zhang, Applicability comparison of different algorithms for ambient signal based load model parameter identification, International Journal of Electrical Power & Energy Systems 111 (2019), 382–389.

15. Afrasiabi, Afrasiabi, Parang and Mohammadi, Integration of Accelerated Deep Neural Network Into Power Transformer Differential Protection, IEEE Transactions on Industrial Informatics 16(2) (2020), 865–876.

16. Keyhani, Lu and Heydt G.T., Composite neural network load models for power system stability analysis, in IEEE PES Power Systems Conference and Exposition 2 (2004), 1159–1163.

17. Keyhani, Lu and Heydt G.T., Composite neural network load models for power system stability analysis, in IEEE PES Power Systems Conference and Exposition 2 (2004), 1159–1163.

18. Wang, Wang and Zhao, SVM-Based Parameter Identification for Composite ZIP and Electronic Load Modeling, IEEE Transactions on Power Systems 34(1) (2019), 182–193.

19. Afrasiabi, Mohammadi, Rastegar, Stankovic, Afrasiabi and Khazaei, Deep-Based Conditional Probability Density Function Forecasting of Residential Loads, IEEE Transactions on Smart Grid 11(4) (2020), 3646–3657.

20. Afrasiabi, Mohammadi, Rastegar, Stankovic, Afrasiabi and Khazaei, Deep-Based Conditional Probability Density Function Forecasting of Residential Loads, IEEE Transactions on Smart Grid 11(4) (2020), 3646–3657.

21. Cai, et al., Short-term load forecasting method based on deep neural network with sample weights, International Transactions on Electrical Energy Systems 30(5) (2020), e12340.

22. Afrasiabi, Mohammadi, Rastegar and Afrasiabi, Advanced Deep Learning Approach for Probabilistic Wind Speed Forecasting, IEEE Transactions on Industrial Informatics (2020), 1–1.

23. Afrasiabi, Mohammadi, Rastegar and Afrasiabi, Deep learning architecture for direct probability density prediction of small-scale solar generation, IET Generation Transmission & Distribution 14(11) (2020), 2017–2025.

24. Afrasiabi, Mohammadi, Rastegar and Kargarian, Multi-agent microgrid energy management based on deep learning forecaster, Energy 186 (2019), 115873.

25. Afrasiabi, Mohammadi, Rastegar and Kargarian, Probabilistic deep neural network price forecasting based on residential load and wind speed predictions, IET Renewable Power Generation 13(11) (2019), 1840–1848.

26. Afrasiabi, Afrasiabi, Parang and Mohammadi, Real-Time Bearing Fault Diagnosis of Induction Motors with Accelerated Deep Learning Approach, in 2019 10th International Power Electronics, Drive Systems and Technologies Conference (PEDSTC) (2019), 155–159.

27. Afrasiabi, Afrasiabi, Parang and Mohammadi, Designing a composite deep learning based differential protection scheme of power transformers, Applied Soft Computing 87 (2020), 105975.

28. Mohammadi, Afrasiabi and Parang, Detection and Classification of Multiple Power Quality Disturbances based on Temporal Deep Learning, in 2019 IEEE International Conference on Environment and Electrical Engineering and 2019 IEEE Industrial and Commercial Power Systems Europe (EEEIC / I&CPS Europe) (2019), 1–5.

29. Liu, Hussain, Yue, Yildirim and Yawar S.J., Classification of multiple power quality events via compressed deep learning, International Transactions on Electrical Energy Systems 29(6) (2019), e12010.

30. Wang, Zheng, Chen, Zhang and Luo, A data-driven probabilistic power flow method based on convolutional neural networks, International Transactions on Electrical Energy Systems 30(7) (2020), e12367.

31. Afrasiabi, Afrasiabi, Parang, Mohammadi, Arefi M.M. and Rastegar, Wind Turbine Fault Diagnosis with Generative-Temporal Convolutional Neural Network, in 2019 IEEE International Conference on Environment and Electrical Engineering and 2019 IEEE Industrial and Commercial Power Systems Europe (EEEIC / I&CPS Europe) (2019), 1–5.

32. Luong M.-T., Pham and Manning C.D., Effective approaches to attention-based neural machine translation, arXiv preprint arXiv:1508.04025 (2015).

33. Bahdanau, Cho and Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).