Abstract
With the development of the wide-area monitoring system (WAMS), power system operators are capable of providing an accurate and fast estimation of time-varying load parameters. This study proposes a new spatial-temporal deep attention network to capture the dynamic and static patterns of electrical load consumption by modeling complicated and non-stationary interdependencies between time sequences. The designed deep attention-based network uses a long short-term memory (LSTM) based component, arranged as an encoder-decoder recurrent neural network, to learn temporal features in the time and frequency domains. Furthermore, to inherently learn spatial features, a convolutional neural network (CNN) based attention mechanism is developed. In addition, this paper develops a loss function based on the pseudo-Huber concept to enhance the robustness of the proposed network in noisy conditions and to improve the training performance. The simulation results on the IEEE 68-bus system demonstrate the effectiveness and superiority of the proposed network through comparison with several previously presented and state-of-the-art methods.
Introduction
Load modeling is a crucial task in power system studies, e.g., voltage stability analysis [1, 2], planning programs [3], and power quality studies [4]. A load model represents consumption patterns by a mathematical expression over a specific time interval. The emergence of smart grid technologies and renewable energies as a potential solution to prevent overexploitation of fossil fuel-based energies has led to new challenges in restructured power systems, such as the complex behavior of electricity consumption patterns [5].
A two-stage process is required to accurately model a load energy consumption profile: 1) selecting a comprehensive and practical model, and 2) designing a fast and accurate parameter identification method. Load models fall into two main categories: physical and measurement-based models. A physical load model is a detailed model consisting of a large number of components that together provide a comprehensive description of the physical behavior of electrical loads. It is built by aggregating several individual load models or obtained from experimental studies. It is almost impossible to provide accurate and comprehensive data for modeling and aggregating individual loads [6]. Moreover, wide-area load modeling expressions based on physical models are too complex, and characterizing both time-variant and wide-area relationships in a physical expression is too difficult.
For these reasons, measurement-based load models are preferred over physical load models. Measurement-based load models are divided into three main subcategories: static, dynamic, and composite load models. Static load models express the measured active and reactive power of a connected load as a function of the bus voltage and frequency. The impedance-current-power (ZIP) model, the exponential model, the frequency-dependent model, and LOADSYN (presented by the Electric Power Research Institute (EPRI)) are the most conventional types of static load models [5]. Dynamic load models represent only the dynamic behavior of electrical loads connected to a specific bus in a specific time interval. The most common dynamic load models are the induction motor (IM) and exponential recovery load models. However, a single static or dynamic model cannot provide a realistic load model. Thus, composite load models (CLM) are highly preferred because they model the dynamic and static behavior of load consumption patterns simultaneously, and the CLM is therefore selected in this paper. As reported in [7], the CLM composed of the IM and ZIP models is the most suitable load model due to its ability to model conditions, locations, and compositions. Thus, this paper investigates IM + ZIP as a composite time-varying load model.
In terms of parameter identification, we can divide the CLM parameter identification methods into state-space models, optimization-based, and artificial intelligence (AI)-based methods.
The state-space based methods formulate the state-space equations as a least-squares problem [8, 9], or use Kalman filter methods such as the extended Kalman filter (EKF) [10] and the unscented Kalman filter (UKF) [11], to identify the parameters of electrical loads. Although state-estimation based load parameter identification methods are fast and easy to implement, they cannot capture the inherent correlation between loads at different locations and are only useful for modeling the electrical load on the corresponding bus, whereas considering the correlations between loads at different locations can enhance the modeling accuracy.
In optimization-based load parameter identification, an error-based objective function is defined from the difference between the actual and estimated values. Several optimization algorithms have been studied in previous investigations to find the optimal solution of this objective function. For instance, particle swarm optimization (PSO) has been presented to identify the parameters of a dynamic load model (IM model) in [12]. In [13], a Lagrangian coefficient based optimization algorithm has been presented to estimate the CLM parameters. In [14], four different optimization algorithms, including the differential evolution algorithm (DEA), grid search algorithm (GSA), interior-point algorithm (IPA), and active-set algorithm (ASA), are used to solve the error-based optimization problem to identify the CLM parameters. However, these error-based optimization algorithms suffer from two major disadvantages: i) high computational cost, and ii) ignoring the dependency of the electrical load consumption time series on the previous time step.
AI-based methods are fast, are easy to implement, and can capture the correlation of electrical loads at different buses in the power system. AI-based methods rely on historical data, which can be provided by systems such as WAMS in modern power systems. An AI-based method is trained on historical data and then tested on measurement data and the parameters estimated in previous time steps. AI-based methods are generally categorized into shallow and deep methods [15]. Shallow methods such as artificial neural networks (ANN) [16], fuzzy logic combined with ANN [17], and support vector machines (SVM) [18] have been presented to identify load parameters in power systems. Shallow methods perform poorly in capturing complicated time-varying signals because they are unable to characterize raw power system measurement data. Moreover, shallow methods are highly sensitive to measurement noise and suffer from a lack of generality due to their small hypothesis space [19, 20]. The deep neural network is a revolutionary concept in machine learning and data science and is widely used in different areas. Power system studies are no exception, and deep learning has already shown great performance in short-term load forecasting [19, 21], renewable energy forecasting [22, 23], electricity price forecasting [24, 25], electrical machine fault detection [26], power transformer protection [27], power quality assessment [28, 29], load flow [30], wind turbine monitoring [31], etc. The long short-term memory (LSTM), a deep gated recurrent neural network structure, has been presented in [6]. However, LSTM, and recurrent neural networks in general, are unable to learn spatial features. Moreover, a large share of the previous investigations focuses only on the electrical load model connected to a specific bus. Hence, these methods cannot perform when a local measurement device is interrupted.
Despite all the efforts devoted to electrical load parameter identification in recent years, several challenges remain: 1) It is still challenging to design a method that characterizes load parameters with complex spatial and temporal features. 2) Measurement and process noise is unavoidable in power systems; therefore, it is essential to design a structure that performs robustly in noisy conditions. 3) Learning and taking into account spatial features plays a pivotal role in accurately characterizing load parameters. 4) A multi-variant structure is required to handle a large number of unknown parameters.
To address these challenges, this study proposes a spatial-temporal deep attention network. The deep attention concept is developed in this paper to capture nonlinear, nonstationary, and complicated interdependencies between previous time steps as well as to extract robust and spatial features. In the proposed method, an LSTM with an attention mechanism is used to learn dependencies and temporal features in raw time-domain signals and to capture frequency-domain features without any additional feature extraction technique. Then, a convolutional neural network (CNN) based attention network is added to learn the spatial features between measurement signals at different locations. Furthermore, the CNN-based attention network can learn time interdependencies between multiple parameters with different dynamic behaviors. Finally, a pseudo-Huber loss function is developed to enhance the robustness of the designed deep attention network and to improve the training performance. The efficiency and superiority of the proposed method are verified by a numerical study on the IEEE 68-bus system and a comparison with several previously presented and state-of-the-art methods.
Thus, the key contributions of this paper can be summarized as follows: 1) A new deep attention-based structure is designed to identify CLM parameters by learning spatio-temporal features. 2) The designed deep network can capture the slow and fast dynamic behaviors of the CLM parameters as well as the interdependencies between multi-variant signals throughout a time interval. 3) A pseudo-Huber loss function is developed to enhance the robustness of the proposed deep network against noisy conditions and to improve the training performance. 4) The proposed deep network considers the correlation between the loads at different locations through a CNN-based attention mechanism.
The remainder of the paper is organized as follows: Section II describes the wide-area load modeling. The structure of the designed deep network is described in detail in Section III. Section IV discusses the numerical results of the proposed deep learning-based CLM parameter identification. Finally, the conclusion of the paper is given in Section V.
Wide-area load model
The time-varying load model consists of two main components: a static component (ZIP model) and a dynamic component (IM model). The ZIP model is a conventional static load model with three components: constant impedance (Z), constant current (I), and constant power (P). In the ZIP model, the active and reactive power follow quadratic functions. To identify the ZIP model, a set of time-varying coefficients should be estimated. The ZIP model is described as follows:
Note that in ZIP load parameter identification, the constants should be:
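As a minimal sketch, assuming the standard polynomial ZIP form in which active and reactive power are quadratic in the voltage ratio and the three coefficients of each polynomial sum to one, the static part of the load could be evaluated as follows. The function and argument names (`zip_power`, `v0`, `p0`, `q0`, `p_coeffs`, `q_coeffs`) are hypothetical and are not the notation of this paper.

```python
def zip_power(v, v0, p0, q0, p_coeffs, q_coeffs, tol=1e-9):
    """Evaluate a standard polynomial ZIP load (assumed form, hypothetical names).

    p_coeffs and q_coeffs are (Z, I, P) fractions for the constant-impedance,
    constant-current, and constant-power shares; each triple must sum to 1.
    """
    for name, coeffs in (("p_coeffs", p_coeffs), ("q_coeffs", q_coeffs)):
        if abs(sum(coeffs) - 1.0) > tol:
            raise ValueError(f"{name} must sum to 1, got {sum(coeffs)}")
    ratio = v / v0  # bus voltage relative to its nominal value
    # Quadratic in the voltage ratio: Z share * ratio^2 + I share * ratio + P share.
    p = p0 * (p_coeffs[0] * ratio**2 + p_coeffs[1] * ratio + p_coeffs[2])
    q = q0 * (q_coeffs[0] * ratio**2 + q_coeffs[1] * ratio + q_coeffs[2])
    return p, q
```

At nominal voltage the function returns the nominal powers, and a purely constant-impedance load scales with the square of the voltage ratio, which is a quick sanity check for any identified coefficient set.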
To address the dynamic model, a third-order IM model is utilized, considering meteorological impacts and load consumption patterns. The third-order state-space model is:
The d/q-axis stator currents
The d-axis and q-axis components of the bus voltage are computed from the measured bus voltage as:
In addition,
The active and reactive power of the IM are:
Thus, consumed active and reactive power based on CLM are:
To model a load connected to a single bus, let the following denote the set of parameters in the CLM and the set of measurable variables, respectively:
Generally, the CLM is defined as a function of measurement parameters and corresponding noise,
In conventional load modeling, the aim is to identify the parameters of a single load connected to a specific bus. However, the measurement devices of a single bus may be interrupted or their data may be missing. Therefore, wide-area load modeling is preferred. In wide-area load modeling, a centralized controller can simultaneously estimate the parameters of multiple loads. In addition, analysis of real case scenarios indicates that the electrical loads at various buses in a power network are dependent variables. Wide-area load modeling incorporates the correlations of electrical loads into load modeling. Thus, a wide-area measurement load model is expressed by a nonlinear function,
By replacing the conventional load model in (16) with wide-area load modeling in (15), the wide-area load model is written as (22).
where $\Theta^{d_\Theta \times k_\Theta}$, $\chi^{d_\chi \times (k_\Upsilon+1),i}$, and $\chi^{d_\chi \times (k_\Upsilon+1)}$ are:
An analytical model cannot estimate the parameters in (22); therefore, this paper proposes a data-driven method to identify the time-varying parameters.
A historical dataset consists of
This section presents the background and the proposed deep attention network for wide-area CLM parameter identification. First, the background of the deep attention network is provided; then, the proposed network is described.
Background of deep attention network
The attention mechanism was originally integrated with deep neural networks in [32] for multi-variant time series forecasting. To identify the CLM parameters with the conventional deep attention network, a recurrent neural network (RNN) constructs a time-varying vector $v_t$ using a set of hidden states $H_t = \{h_1, h_2, \cdots, h_{t-1}\}$. The output vector $v_t$ is produced based on $h_i$, $\forall i \in \{1, \cdots, t-1\}$, and each of these output vectors includes features associated with time interval $t$. To estimate the parameters of a time-varying model, $v_t$ is integrated into the hidden state at time interval $t$, $h_t$. The output vectors are obtained based on the scoring function
The deep attention networks rely on an encoder and decoder process. To estimate time-varying parameters, the following steps are required: 1) calculation of the score of each encoder hidden state; 2) calculation of the attention weights; 3) calculation of the output vectors based on (22); 4) concatenation of the output vectors at the current time with the outputs of the previous time steps; and 5) decoding of the final outputs.
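The five steps can be sketched in plain Python as shown here. This is only an illustration under stated assumptions: a dot-product scoring function, softmax attention weights, and a linear decoder are assumed, and the names `attention_step` and `w_out` are hypothetical rather than taken from this paper or from [32].

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_step(prev_hidden, h_t, w_out):
    """One step of a conventional attention mechanism (illustrative sketch).

    prev_hidden: hidden states h_1 .. h_{t-1}, each a list of floats
    h_t:         current hidden state
    w_out:       hypothetical linear decoder, one row per estimated
                 parameter and 2 * len(h_t) columns
    """
    # 1) Score each previous hidden state (dot-product scoring assumed).
    scores = [dot(h_i, h_t) for h_i in prev_hidden]
    # 2) Convert the scores into attention weights.
    weights = softmax(scores)
    # 3) Output vector v_t: weighted sum of the previous hidden states.
    v_t = [sum(w * h_i[k] for w, h_i in zip(weights, prev_hidden))
           for k in range(len(h_t))]
    # 4) Concatenate the output vector with the current hidden state.
    combined = v_t + h_t
    # 5) Decode the concatenation into parameter estimates.
    estimates = [dot(row, combined) for row in w_out]
    return weights, v_t, estimates
```

Because the weights come from a softmax, they always sum to one, so $v_t$ is a convex combination of the previous hidden states.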
The main principle of the conventional deep attention network for CLM parameter estimation is illustrated in Fig. 1, which shows all five of the steps listed above in detail.

Overall procedure of conventional deep attention network.
Although the deep attention network allows power system operators to treat CLM parameter identification as a multi-variant identification problem, the conventional multi-variant deep attention network cannot fully capture the spatial-temporal features of the measurement and parameter data. To this end, an LSTM is integrated to learn temporal interdependency features, and CNN is:
When a deep attention network is applied to the CLM parameter identification problem, the typical attention mechanism cannot fully learn the temporal and spatial features because the output vector depends on the hidden states of the previous time steps. Thus, in the CLM parameter identification task, in which each time step corresponds to multiple parameters and measurement values, it fails to estimate parameters whose measurements are noisy. In addition, the conventional deep attention network can only track the average of the parameters across a long sequence; therefore, it is vulnerable in noisy conditions and unable to capture the full set of features in the learning process. To address these problems, a modified deep attention network is first proposed in this paper. Then, a pseudo-Huber loss function is developed to enhance the robustness in noisy conditions.
The design of the proposed deep attention network is illustrated in Fig. 2. As shown in this figure, the scoring function in the proposed structure is not fed directly from the hidden states; instead, the hidden states are transposed and their rows are separated. Then, the attention weights are calculated based on the transposed hidden states and the scoring function. The output vectors are the summation of the row vectors and include temporal features from the current and previous time steps.

Structure of the developed attention mechanism.
As mentioned before, a CNN is implemented in the designed network to capture temporal features. In wide-area load monitoring, it is essential to learn the spatial features between the measurements and the load model parameters at different locations. To this end, CNN filters are applied to the row vectors to generate,
Thus, based on the scoring function $f_{sc}(\cdot)$, the output vectors of the designed deep attention network are defined as:
In the conventional deep attention network, softmax function,
The output vector is determined as:
The new hidden states (shown in Fig. 2) are obtained as:
Consequently, the parameters of the CLM model are computed as the output of the designed deep attention network:
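The row-wise mechanism described above (transposing the hidden states, filtering each row with CNN filters, scoring the filtered rows, and summing them into an output vector) might look like the following sketch. Several choices here are assumptions that the text does not confirm: each filter spans all previous time steps, the score is a bilinear product with a hypothetical trainable matrix `w_a`, and a sigmoid replaces the softmax so that several rows can be selected at once. The function name `row_attention` is also hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def row_attention(hidden, h_t, filters, w_a):
    """Row-wise attention over CNN-filtered hidden states (illustrative only).

    hidden:  previous hidden states, one per time step (each of length m)
    h_t:     current hidden state (length m)
    filters: k one-dimensional filters, each spanning all previous time steps
    w_a:     k x m scoring matrix (hypothetical trainable weights)
    """
    m = len(h_t)
    # Transpose so that each row holds one hidden feature across time.
    rows = [[step[i] for step in hidden] for i in range(m)]
    # Apply every filter to every row: conv[i][j] = <row i, filter j>.
    conv = [[sum(r * c for r, c in zip(row, filt)) for filt in filters]
            for row in rows]
    # Project the current hidden state once, then score each filtered row.
    proj = [sum(w * h for w, h in zip(w_row, h_t)) for w_row in w_a]
    scores = [sum(c * p for c, p in zip(conv_row, proj)) for conv_row in conv]
    # Sigmoid weights (an assumption) allow several rows to be selected together.
    alphas = [sigmoid(s) for s in scores]
    # Output vector: weighted sum of the filtered rows.
    v_t = [sum(a * conv_row[j] for a, conv_row in zip(alphas, conv))
           for j in range(len(filters))]
    return alphas, v_t
```

The design point this illustrates is that attention is applied per hidden feature (row) rather than per time step, so the weights select which features matter instead of which past instants matter.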
To design a structure to estimate CLM parameters, the squared error loss function is usually used. However, squared error loss functions might lead to a mean-unbiased and minimum-variance estimator during the training process. In noisy conditions, this feature can lead to inaccurate results in the CLM parameter identification. To tackle this possible problem, a modified loss function is formulated in this paper. To this end, a pseudo-Huber loss function is adopted from [33] to form the following loss function:
In CLM parameter identification, the pseudo-Huber loss function constructs the values of
Also,
To obtain the learning weights in (30), an optimization process based on the Adam optimization algorithm is applied [34] (more information is provided in [27]). In addition, the presented pseudo-Huber loss function enhances the robustness against measurement and process noise in the CLM.
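For reference, the standard pseudo-Huber loss can be written in a few lines. This sketch uses the textbook form $\delta^2(\sqrt{1 + (r/\delta)^2} - 1)$ for a residual $r$; the modified loss formulated in this paper may differ from it, and the function name and the default `delta` are assumptions.

```python
import math

def pseudo_huber(y_true, y_pred, delta=1.0):
    """Mean pseudo-Huber loss over paired samples (standard textbook form)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        r = y - y_hat  # residual between actual and estimated value
        # Roughly r^2 / 2 for small residuals, roughly delta * |r| for large ones.
        total += delta**2 * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)
    return total / len(y_true)
```

The quadratic behavior near zero keeps the loss smooth for training, while the linear growth for large residuals limits the influence of noisy samples compared with the squared error.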
The measurement and load parameters in the generated dataset are normalized
The proposed method is a hybrid model that combines the CNN, LSTM, encoder-decoder, and attention mechanism. In this network, the CNN-based attention mechanism is responsible for interpreting the output sequence of the LSTM encoder and for capturing the spatial features of the different loads at different locations of the power system.
In the proposed network, the following procedure is carried out to identify the CLM parameters. First, the input dataset is fed into the encoder layer, which is an LSTM-based encoder block. Then, the outputs of the encoder block are used as the input of the proposed attention mechanism (shown in Fig. 2), which constructs the output vectors. The output of the attention mechanism is then used as the input of the LSTM-based decoder block. Finally, the decoder layer outputs are the estimated parameters of the CLM.
The procedure of the proposed deep attention network is shown in Fig. 3. The parameters of the designed network are given in Table 1.

Designed deep attention network structure for CLM parameter identification.
Parameters of the Designed Deep Attention Network for CLM Parameter Identification
This section validates the proposed data-driven parameter identification method for CLM through numerical experiments. The robustness and effectiveness of the proposed deep attention network are verified by a large-scale case study in presence of noise.
For the sake of comparison, different shallow and deep structures are considered. The previously presented SVM [18], LSTM [6], and multi-modal LSTM (MLSTM) [6] methods are used in this paper to verify the superiority of the proposed method.
The dataset is generated in MATLAB and is further processed with the TensorFlow package on a computer with an Intel Core i7-5960X CPU @ 3.00 GHz and 32 GB of RAM.
Dataset generation
To evaluate the proposed method, the IEEE 68-bus system, which includes 86 lines and 16 synchronous generators, is simulated in the Power System Toolbox (PST) [35] with 0.01 samples per second. To evaluate the robustness of the proposed approach, the dataset is generated in two different ways. First, 85 fault events are considered for each line, and in each event a line is disconnected. A similar procedure is applied to the loads: at 34 different buses, the electrical load is disconnected from the network. Overall, 59500 different samples have been generated, and 70%, 15%, and 15% of this dataset are devoted to the training, validation, and testing processes, respectively. Furthermore, Gaussian noise with mean values equal to the original test case data and a standard deviation of 10% of the mean values is added to the data to verify the robustness of the proposed method.
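The split and the noise injection described above could be reproduced along the following lines. This is a sketch rather than the procedure actually used: the random shuffling, the seeds, and the function names are assumptions, and the noise is drawn per value from a Gaussian centered on the clean value with a standard deviation of 10% of its magnitude.

```python
import random

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    # Whatever remains after training and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def add_gaussian_noise(values, rel_std=0.10, seed=0):
    """Draw each noisy value from a Gaussian centered on the clean value.

    The standard deviation is rel_std times the magnitude of the clean value.
    """
    rng = random.Random(seed)
    return [rng.gauss(x, rel_std * abs(x)) for x in values]
```

With 59500 samples, this split yields 41650 training samples and 8925 samples each for validation and testing.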
Accuracy Indices
To evaluate the performance of the proposed deep structure and compare it with different methods, four indices are used: root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), as:
In these metrics, the real values of the load parameters are denoted by $y_i$ and the estimated values are denoted by
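The four indices can be computed as in the following sketch, which uses their common textbook definitions. Two points are assumptions because the text does not specify them: NRMSE is normalized by the range of the actual values (normalizing by the mean is another common convention), and MAPE is expressed in percent.

```python
import math

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def nrmse(y, y_hat):
    """RMSE normalized by the range of the actual values (assumed convention)."""
    return rmse(y, y_hat) / (max(y) - min(y))

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)
```

Note that MAPE is undefined when an actual value is zero, so parameters that can cross zero are better judged with RMSE or MAE.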
Figures 4 and 5 compare the estimated values for parameters $H_t$ and

Comparison of the actual values with the estimated parameter $H_t$ based on the designed deep attention network and MLSTM.

Comparison of the actual values with the estimated parameter
Table 2 shows the accuracy of the proposed deep attention network in terms of four different metrics. The low values of all metrics can validate the accurate performance of the proposed deep attention network.
Performance of the Proposed Method
For the sake of comparison, the results of the different methods in the estimation of $H_t$ and
Comparison of the Proposed Deep Attention Network Parameter Identification of CLM with MLSTM, LSTM, and SVM in Estimation of $H_t$
Comparison of the Proposed Deep Attention Network Parameter Identification of CLM with MLSTM, LSTM, and SVM in Estimation of
To test the robustness of the proposed method, this subsection discusses its performance in different noisy conditions. In addition, comparative results are given to demonstrate the superiority of the proposed deep attention network. Six different Gaussian noise signals with the same mean value (zero) and six different standard deviations, i.e., 0.005, 0.01, 0.015, 0.02, 0.025, and 0.3, have been considered. Figure 6 compares the results obtained by the proposed and other methods in the identification of $H_t$ based on the MAPE metric, while Fig. 7 shows the different values of estimating

Comparison of different methods in $H_t$ estimation based on different noise values (α1) in terms of MAPE.

Comparison of different methods in
In this subsection, the effectiveness of the proposed attention mechanism is verified through comparison with the deep attention networks proposed in [32] and [33]. Both of these deep attention structures have been developed for CLM parameter identification. Furthermore, to show the effect of the attention mechanism on CLM parameter identification, the proposed method is also compared with the proposed structure without the attention mechanism. The results obtained by the proposed network, the two other deep attention mechanisms, and the proposed network without the attention mechanism are given in Tables 5 and 6 for the estimation of $H_t$ and
Comparison of the Proposed Deep Attention Network Parameter Identification of CLM with MLSTM, LSTM, and SVM in Estimation of $H_t$
Comparison of the Proposed Deep Attention Network Parameter Identification of CLM with MLSTM, LSTM, and SVM in Estimation of
Load parameter identification is an essential task for power system operators in short-term and long-term power system analysis and decision-making processes. To this end, the CLM, which includes the ZIP and IM models as representations of the static and dynamic behavior of the electrical load, has gained the attention of many investigators as a way to mimic the actual load behavior. However, it is crucial to propose a fast, accurate, and robust method to identify the time-varying parameters. Thus, this paper develops a deep neural network for CLM parameter identification based on wide-area measurements. A multi-variant deep attention network is designed to capture spatial features as well as the time-varying characteristics of the raw data. A CNN-based attention mechanism is proposed to capture spatial features, and LSTM-based encoder-decoder structures are developed to capture the temporal patterns of the CLM parameters. Furthermore, to enhance the robustness of the proposed method, a pseudo-Huber loss function has also been developed. The numerical experiment on the IEEE 68-bus system illustrates the effectiveness and superiority of the proposed method in comparison with different methods. The proposed deep attention network shows at least a 60% accuracy improvement in comparison with MLSTM and LSTM as deep neural structures and SVM as a shallow structure. The robust performance of the proposed spatial-temporal deep network has also been verified through a noise sensitivity analysis. Furthermore, to assess the impact of the proposed attention mechanism, it has been compared with two state-of-the-art deep attention networks developed for CLM parameter identification. Finally, the designed network without the attention mechanism is also compared with the proposed network and shows less than 65% accuracy compared with the proposed network.
The investigations on composite load modeling approaches reveal that it would be worthwhile to explore new deep structures that estimate the probability density function (PDF) of the time-varying load parameters in large-scale power systems in order to provide full statistical information.
