A novel electric load consumption prediction and feature selection model based on modified clonal selection algorithm

Abstract

This paper presents a new combined method for Short Term Load Forecasting (STLF) based on the Clonal Selection Algorithm (CSA) and an Artificial Neural Network (ANN). Compared with other evolutionary-based algorithms in this area, the proposed technique exploits both the ANN’s ability to learn nonlinear and complex relationships and the population-based CSA’s capacity for global and local search. Moreover, to select the most informative and non-redundant features from the input feature set, a new feature selection method based on fuzzy set theory and fuzzy clustering is introduced. To enhance the overall performance of CSA, three sub-modifications are proposed that expand its search capability and avoid premature convergence. Finally, the proposed method is evaluated on a real dataset of daily peak electric load consumption, and the simulation results show improved forecasting accuracy over other popular techniques in the STLF application.

Keywords

Short Term Load Forecasting (STLF); optimization techniques; Clonal Selection Algorithm (CSA); Artificial Neural Network (ANN); fuzzy-based feature selection

Nomenclature

Z _Ni

Number of input variables for ANN

N _i

Number of neurons in the input layer of the ANN

N _h

Number of neurons in the hidden layer of the ANN

N _o

Number of neurons in the output layer of the ANN

f _1

Transfer function for input layer in ANN

f _2

Transfer function for hidden layer in ANN

w_{i, j}^{1}

Weighting factor connecting the ith neuron in the input layer to the jth neuron in the hidden layer

w_{i, j}^{2}

Weighting factor connecting the ith neuron in the hidden layer to the jth neuron in the output layer

μ _A

Fuzzy membership function

x _i

Input candidate

k

Number of training patterns

y

Output candidate

x _i,h

Data of the hth pattern of the ith input

σ

Spread factor of the membership function

c _i

Crisp output (Defuzzified)

m _i

The mean value of data set y_i

ω _i

Weight for the ith objective in fuzzy clustering

i

Order of the antibody in the population

N _w

Number of weighting factors of proposed SAMCSA-ANN

N _b

Number of biasing factors of proposed SAMCSA-ANN

{\bar{X}}_{k}

Vector of the adjusted weighting and biasing coefficient

{\bar{X}}_{i}

Position of the ith antibody in the population

P _i

Evaluated load values of the ith day in objective function

T _i

Actual load values of the ith day in objective function

N _es

Number of forecasted data samples

F (X)

Constraint objective function

G (X)

Equality constraints function

H (X)

Inequality constraint function

L1, L2

Penalty functions for constraint observation

1 Introduction

Modern society depends heavily on electrical energy because of its many advantages over other forms of energy, e.g. convenient form, easy control, greater flexibility, low cost, cleanness, and high transmission efficiency. The growth of electrical energy usage in modern life has attracted considerable research attention in recent years [1, 2]. Accurate electrical load forecasting models are important for the operation and planning of utility companies, so that all consumers can be reliably supplied with the required energy. This requirement drives researchers to seek novel methodologies that provide more accurate load forecasting models [3].

Short Term Load Forecasting (STLF) plays a key role as an intelligent system in providing the most efficient and economic operation of electrical generation sources, particularly for today’s reforming power system structures [4, 15]. In general, the time horizon of STLF ranges from one hour to several weeks. Accurate load forecasts can be used to estimate load flows and to make decisions that avoid overloading [5]. For this purpose, different simulation methods, e.g. Monte Carlo and point estimation methods, have been proposed to decrease the level of uncertainty associated with the load forecasting error [6]. Time-series forecasting methods such as Autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models are considered two powerful forecasting methods in a variety of areas, including electric load forecasting [7].

In [8], a new method based on an elliptic-orbit model with weekly periodic extension is proposed for predicting electric peak load movement. In [9], Multivariate Adaptive Regression Splines (MARS) modelling is proposed for daily peak electricity load forecasting in South Africa. Besides these methods, Artificial Neural Networks (ANNs) have recently become popular in time series forecasting, and in load forecasting in particular, owing to their ability to estimate the nonlinear relationships among load parameters. In [10], an ANN-based approach is presented to learn the relationship among past, current, and future temperatures and loads; the average absolute errors for one-hour-ahead and 24-hour-ahead forecasts are reported to be only 1.40% and 2.06%, respectively. In [11], the authors proposed an intelligent power load forecasting algorithm using wavelet packets. The wavelet packet technique is employed to decompose the load data into load components of different frequencies, and an ANN is then used to predict the load component of each wavelet packet space. In [12], a dynamic model based on a Recurrent Wavelet Network (RWN) is employed to solve the STLF problem; to overcome the problem of RWN initialization, the authors suggested a new method based on the Orthogonal Least Squares (OLS) technique. In [13], the authors introduced a new hybrid model based on Support Vector Regression (SVR) for electric load forecasting, in which the Krill Herd (KH) optimization algorithm is applied to obtain the optimal values of the SVR parameters. The authors in [14] introduced a combined method based on SVR and a Modified Firefly Algorithm (MFA) for short-term load forecasting. In that work, MFA was used as an optimization algorithm to intelligently initialize the SVR parameters, and the results were compared with some well-known existing algorithms to demonstrate the superiority of the introduced system. A new self-organizing model of fuzzy autoregressive moving average with exogenous input variables (FARMAX) for one-day-ahead hourly electric load forecasting is proposed in [16]. To self-organize the FARMAX model, identification of the fuzzy model is formulated as a combinatorial optimization problem. A combination of heuristics and an evolutionary programming (EP) scheme is then used to determine the optimal number of input variables, the best partition of the fuzzy spaces, and the associated fuzzy membership functions. In [17], the authors proposed a hybrid method based on the fuzzy regression tree, a data mining method, and the multi-layer perceptron (MLP) artificial neural network.

Recently, forecasting methods have become more complicated because of the increasingly complex behavior of the forecast factors. Therefore, along with conventional methods, evolutionary algorithms are employed to boost the performance and accuracy of load forecasting models [18]. In [19], the authors applied Support Vector Regression (SVR) to STLF. Since the precision of STLF depends heavily on the selected SVM parameters, the authors used the SCE-UA algorithm to optimize these parameters simultaneously. The authors in [20] proposed a new approach for STLF in which Curve Fitting Prediction (CFP) is combined with a Genetic Algorithm (GA) to obtain the optimum parameters of the Gaussian method, which is used to minimize the forecasted load error. Similar research has been carried out on a hybrid methodology of ANN and ARIMA that takes advantage of each model in time series forecasting [21]. In [22], the authors conducted a comparative study of three different artificial intelligence models for electric load forecasting, namely 1) Genetic Algorithm (GA), 2) Least Error Squares (LES), and 3) Least Absolute Value Filtering (LAVF), and concluded that GA obtained the best forecasting results. Although GA is considered a powerful method for forecasting peak load demand, two limitations can reduce its capability: 1) the dependence of the algorithm on various initial parameters such as the population size, fitness function, mutation rate, crossover rate, and selection method; and 2) the probability of becoming trapped in local optima. The Radial Basis Function (RBF) network, a variant of the ANN that is considered a powerful modeling technique owing to its structural simplicity and high identification performance, is proposed in [23] to forecast load consumption, with the Particle Swarm Optimization (PSO) algorithm applied to optimize the weighting factors of the RBF network. PSO has limitations similar to those of GA: it can become trapped in local optima, and a suitable value of the inertia weight is difficult to select.

Hence, a new model based on the Clonal Selection Algorithm (CSA) is introduced to improve the performance of the Multi-Layer Perceptron (MLP) Artificial Neural Network so that the prediction error is noticeably reduced. A comparison is made between the traditional MLP ANN and the proposed CSA-tuned MLP ANN. Firstly, the traditional MLP ANN is defined with the most effective architecture, e.g. the number of hidden layers, the type of transfer function, and the number of neurons in the input, hidden, and output layers. Secondly, the CSA is used to tune the weighting and biasing coefficients in order to find optimal values for these parameters. It is shown how the CSA can enhance the training process of the MLP ANN. The feature selection phase plays an important role in modeling any intelligent system, e.g. load forecasting. Several popular methods are available in the literature for this purpose, namely Genetic Algorithm [24], Bayesian Network [25], Particle Swarm Optimization [26], Support Vector Machine [27], Partial Correlation Algorithm (PSA) [28], etc. Although these are well-known feature selection methods, they become computationally expensive as the number of features in a dataset increases, owing to the complex mathematical structure used to select the informative input features. As a result, this paper introduces a new intelligent feature selection method based on fuzzy set theory and fuzzy clustering. Finally, data gathered from a 230 K substation system are used as a case study to show the efficiency and performance of the proposed method.

2 Multi-layer perceptron artificial neural network

The Multi-Layer Perceptron (MLP) is a feedforward ANN model that represents a nonlinear mapping between an input vector and an output vector. The architecture of an MLP can vary, but in general it is composed of several fully interconnected neurons distributed over multiple layers, such that each node is connected to every node in the next and previous layers. Figure 1 depicts the architecture of a two-layer MLP.

Fig.1

Multi-layer perceptron neural network architecture.

In Fig. 1, the output of the ith unit (neuron) is formulated as follows: $y_{i} = f_{1} (\sum_{j = 1}^{N_{h}} w_{i, j}^{2} f_{2} (\sum_{k = 1}^{N_{i}} w_{j, k}^{1} z_{k} + b_{j}^{1}) + b_{i}^{2})$ (1) where f ₁ and f ₂ are the input and hidden layer transfer functions, respectively; b ¹ and b ² are the biasing matrices of the hidden layer and output layer, respectively; $w_{j, k}^{1}$ is the weighting factor connecting the kth neuron in the input layer to the jth neuron in the hidden layer; and $w_{i, j}^{2}$ is the weighting factor connecting the jth neuron in the hidden layer to the ith neuron in the output layer.
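The forward pass in Equation (1) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `mlp_forward`, the choice of `tanh` for the hidden layer, the identity output function, and the convention that the output bias is added once per output neuron are all assumptions.

```python
import numpy as np

def mlp_forward(z, w1, b1, w2, b2, f_hidden=np.tanh, f_out=lambda a: a):
    """Forward pass of a single-hidden-layer MLP in the spirit of Equation (1).

    z  : (N_i,)      input vector
    w1 : (N_h, N_i)  input-to-hidden weights, w1[j, k] plays the role of w^1_{j,k}
    b1 : (N_h,)      hidden-layer biases
    w2 : (N_o, N_h)  hidden-to-output weights, w2[i, j] plays the role of w^2_{i,j}
    b2 : (N_o,)      output-layer biases
    f_hidden, f_out  : transfer functions (assumed choices, not from the paper)
    """
    hidden = f_hidden(w1 @ z + b1)   # inner sum over k, then the inner transfer function
    return f_out(w2 @ hidden + b2)   # outer sum over j, then the outer transfer function

rng = np.random.default_rng(0)
n_i, n_h, n_o = 4, 3, 1              # hypothetical layer sizes
w1, b1 = rng.normal(size=(n_h, n_i)), np.zeros(n_h)
w2, b2 = rng.normal(size=(n_o, n_h)), np.zeros(n_o)
y = mlp_forward(rng.normal(size=n_i), w1, b1, w2, b2)
print(y.shape)  # (1,)
```

Writing the two sums as matrix-vector products keeps the code close to the equation while avoiding explicit loops over neurons.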

3 Fuzzy-based feature selection

In contrast to dimensionality reduction methods based on projection (e.g. PCA) or compression, feature selection methods do not alter the original representation of the variables; instead, they reduce the number of features by identifying and removing non-informative ones. Because the original representation is preserved, removing those features and keeping only a subset of informative features is not expected to reduce the accuracy of the classifier. Furthermore, appropriate feature selection reduces the complexity and dimensionality of the feature space, which accelerates processing. Feature selection and extraction are therefore not only an important step toward solving the STLF problem, but are also important in other areas of artificial intelligence such as image classification [37], image summarization [38], and device identification based on unique signal characteristics [29]. This section proposes a new fuzzy-based feature selection method that uses fuzzy set theory to enhance the learning process for the STLF problem. Before the proposed method is introduced, fuzzy set theory is explained in more detail.

3.1 Fuzzy set theory

Fuzzy set theory was introduced by Lotfi A. Zadeh in 1965 as a robust and powerful mathematical model for practical imprecise systems. Fuzzy logic has become a practical method for intelligent systems in a wide range of areas, such as electric load forecasting [30], fuzzy logic controllers for industrial robots [31, 32], fuzzy ontologies for qualitative spatial reasoning [33, 34], system uncertainty compensation and intelligent optimization [35, 36], and adaptive control of nonlinear systems [39, 40]. This part briefly reviews fuzzy logic theory and then describes the proposed fuzzy-based feature selection for STLF. Let U be the universe of discourse and x an element of U. In a crisp set, the membership of x takes only the two values {0, 1}, whereas in a fuzzy set each element has a membership grade between 0 and 1. A fuzzy set can therefore be formulated as follows: $A = \{(x, μ_{A} (x)) | x \in X\}; A \in U$ (2)

μ _A is the membership function, which defines the degree to which a given input belongs to a set and takes values between 0 and 1. The architecture of a Fuzzy Logic System (FLS), shown in Fig. 2, is composed of the following components and major steps: 1) Fuzzifier: in this step, also known as fuzzification, the crisp input data are converted to fuzzy sets by means of linguistic variables and membership functions. 2) Inference engine: it is constructed from a set of if-then rules. 3) Defuzzifier: during this step (defuzzification), the resulting fuzzy output is converted to a crisp value using the membership functions.

Fig.2

Fuzzy Logic System (FLS) components.

Linguistic variables are the input or output variables of a fuzzy system whose values are words or sentences instead of numerical values. They are defined specifically for each problem and can differ between application areas. For instance, one practical application of fuzzy logic is the design of intelligent controllers and system identification, in which the linguistic variables are defined as the error and the change of error [41]. Fuzzy logic uses several if-then rules to formulate conditional statements over linguistic variables. As explained previously, one component of a fuzzy logic system is the fuzzy inference engine, which maps the if-then rules onto fuzzy sets. Two inference methods are available for this purpose: the Mamdani method and the Sugeno method. They are defined as follows: $\text{If } x \text{ is } A \text{ and } y \text{ is } B \text{ then } z \text{ is } C \quad \text{(Mamdani)}$ (3) $\text{If } x \text{ is } A \text{ and } y \text{ is } B \text{ then } z \text{ is } f (x, y) \quad \text{(Sugeno)}$ (4)

where the if part (x is A and y is B) is known as the antecedent and the then part (z is C) is known as the consequent. The main difference between the Mamdani (Equation 3) and Sugeno (Equation 4) methods is the way the crisp output is generated from the fuzzy input. While a Mamdani-type fuzzy inference system (FIS) defuzzifies a fuzzy output, a Sugeno-type FIS uses a weighted average to compute the crisp output.
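To make the weighted-average step of a Sugeno-type system concrete, the following minimal sketch evaluates two rules of the form in Equation (4). The Gaussian antecedent sets, the product used for "and", the constant consequents, and all names (`gauss`, `sugeno_output`, `rules`) are illustrative assumptions and are not taken from the paper.

```python
import math

def gauss(x, center, sigma):
    """Gaussian membership grade of x for a fuzzy set centred at `center`."""
    return math.exp(-((x - center) / sigma) ** 2)

def sugeno_output(x, y, rules):
    """Crisp output of a Sugeno-type system: weighted average of rule consequents.

    rules: list of (mu_a, mu_b, f), where mu_a and mu_b are the membership
    functions for "x is A" and "y is B", and f(x, y) is the consequent.
    The "and" is taken as a product (an assumption; min is also common).
    """
    num = den = 0.0
    for mu_a, mu_b, f in rules:
        strength = mu_a(x) * mu_b(y)   # firing strength of this rule
        num += strength * f(x, y)
        den += strength
    return num / den

# Two hypothetical rules: "low" inputs give 0, "high" inputs give 10.
rules = [
    (lambda x: gauss(x, 0.0, 0.5), lambda y: gauss(y, 0.0, 0.5), lambda x, y: 0.0),
    (lambda x: gauss(x, 1.0, 0.5), lambda y: gauss(y, 1.0, 0.5), lambda x, y: 10.0),
]
print(sugeno_output(0.5, 0.5, rules))  # midway between the two rule outputs
```

A Mamdani-type system would instead aggregate fuzzy output sets and defuzzify them, for example with the centroid method.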

3.2 Fuzzy-based feature selection method description

This part explains an intelligent fuzzy-based feature selection technique. Assume that k is the number of data patterns for each input-output pair (x _i, y); then, for each input variable x _i, a figure can be plotted that represents the relation of the output y to that input. The fuzzy membership function for the two-dimensional space (x _i, y) can be formulated as follows:

$μ_{i, h} = \exp (- {(\frac{x_{i, h} - x_{i}}{σ})}^{2}); h = 1, 2, 3, \dots, k$ (5)

In the above membership function, σ is selected around 25% of the input candidates. Different types of membership functions are available; here, the Gaussian membership function is chosen for the proposed fuzzy-based feature selection method. The membership function in formula (5) defines “if-then” rules between the input and output variables: for the k data patterns of any input variable, k “if-then” rules are generated. Thereafter, a defuzzification process is required to convert the fuzzy grades into crisp values. Center of Gravity (COG) and Center of Area (COA) are the most prevalent defuzzification methods. The COG method, also referred to as the centroid method, obtains the center of mass of the triggered output membership function. Here, centroid defuzzification is applied to generate the crisp output c _i by the following equation: $c_{i} (x_{i}) = \frac{\sum_{h = 1}^{k} μ_{i, h} (x_{i}) \cdot y_{h}}{\sum_{h = 1}^{k} μ_{i, h} (x_{i})}$ (6)
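As a rough illustration of Equations (5) and (6), the sketch below evaluates the defuzzified curve c _i on a grid of input values. It assumes that σ is 25% of the observed range of the input (one possible reading of the rule of thumb above); the function name `fuzzy_curve` and the grid size are hypothetical.

```python
import numpy as np

def fuzzy_curve(x, y, sigma_frac=0.25, n_points=100):
    """Evaluate the centroid-defuzzified curve c_i(x_i) of Equation (6).

    x, y       : the k observed patterns of one input variable and the output
    sigma_frac : spread as a fraction of the input range (assumed reading of
                 the 25% rule of thumb)
    Returns the evaluation grid and the curve values on that grid.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = sigma_frac * (x.max() - x.min())
    if sigma == 0:
        raise ValueError("input variable is constant; the curve is undefined")
    grid = np.linspace(x.min(), x.max(), n_points)
    # mu[g, h]: Gaussian membership of grid point g in the rule built on pattern h (Eq. 5)
    mu = np.exp(-((x[None, :] - grid[:, None]) / sigma) ** 2)
    # Centroid defuzzification (Eq. 6): membership-weighted average of the outputs
    c = (mu * y[None, :]).sum(axis=1) / mu.sum(axis=1)
    return grid, c

x_demo = np.linspace(0.0, 1.0, 50)
grid_flat, c_flat = fuzzy_curve(x_demo, np.full(50, 3.0))   # output ignores the input
grid_lin, c_lin = fuzzy_curve(x_demo, 2.0 * x_demo)         # output follows the input
print(np.ptp(c_flat), np.ptp(c_lin))
```

An output that does not depend on the input yields a flat curve, while a dependent output yields a curve with a visible range.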

The curve of the defuzzified output c _i over the corresponding input variable indicates how strongly the output variable y depends on the input variable x _i. If the plot of c _i is smooth, with no pronounced maxima or minima, the input variable has little influence on the defuzzified output and is therefore considered a non-informative feature in the feature selection process. In contrast, if the fuzzy curve shows fluctuations and irregular patterns, there is a strong dependency between the output variable and the corresponding input variable. To this end, the relevance of each feature is defined by two objectives: 1) how rough the fuzzy curve is, and 2) the distance between the maximum and minimum of the fuzzy curve. The following equation formulates the roughness of the fuzzy curve: $Obj 1 (i) = \sqrt{\frac{1}{k} \sum_{h = 1}^{k} (y_{i, h} - m_{i})^{2}}; i = 1, 2, \dots, n$ (7)

where m _i is defined as the mean value of a given data set y _i.

The distance between the maximum and minimum of the fuzzy curve is formulated as follows:

$Obj 2 (i) = \max (y_{i, j} - y_{i, h}); j, h = 1, 2, \dots, k; i = 1, 2, \dots, n$ (8)

As a result, the higher the values of Obj1 (i) and Obj2 (i), the more important and influential the corresponding feature is for feature selection.
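The two objectives can be computed directly from the sampled values of a fuzzy curve. The sketch below assumes that the quantities $y_{i, h}$ in Equations (7) and (8) are the k sampled values of the curve for input i; the function name `curve_objectives` is hypothetical.

```python
import numpy as np

def curve_objectives(curve_values):
    """Return (Obj1, Obj2) for one fuzzy curve.

    Obj1 (Eq. 7): root mean squared deviation of the curve from its mean.
    Obj2 (Eq. 8): largest difference between any two curve values,
                  i.e. maximum minus minimum.
    """
    c = np.asarray(curve_values, dtype=float)
    obj1 = np.sqrt(np.mean((c - c.mean()) ** 2))
    obj2 = c.max() - c.min()
    return obj1, obj2

flat = curve_objectives([1.0, 1.0, 1.0])    # a flat, non-informative curve
varying = curve_objectives([0.0, 2.0])      # a curve with a clear range
print(flat, varying)
```

A flat curve scores zero on both objectives, which matches the interpretation that such an input carries little information about the output.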

3.3 Fuzzy clustering method

In this part, fuzzy clustering is applied to assign membership grades and to balance the importance and relevance of each objective, Obj1 (i) and Obj2 (i), across the input variables. Here, $μ_{{Obj}_{i}}$ is the membership function, which is defined for each objective as follows:

$μ_{{Obj}_{i}} (X) = \begin{cases} 1 & \text{for } {Obj}_{i} (X) \leq {Obj}_{i}^{min} \\ 0 & \text{for } {Obj}_{i} (X) \geq {Obj}_{i}^{max} \\ \frac{{Obj}_{i}^{max} - {Obj}_{i} (X)}{{Obj}_{i}^{max} - {Obj}_{i}^{min}} & \text{for } {Obj}_{i}^{min} \leq {Obj}_{i} (X) \leq {Obj}_{i}^{max} \end{cases}$ (9)

The values of ${Obj}_{i}^{max}$ and ${Obj}_{i}^{min}$ are obtained over the n input variables. The useful features are then ordered according to the following formula: $N μ (j) = \frac{\sum_{i = 1}^{m} ω_{i} \times μ_{{Obj}_{i}} (X_{j})}{\sum_{j = 1}^{n} \sum_{i = 1}^{m} ω_{i} \times μ_{{Obj}_{i}} (X_{j})}$ (10)

where ω _i is the weight of the ith criterion and m is the number of criteria (Obj1 and Obj2), which here equals 2. Although the aforementioned objective functions (Equations 7 and 8) each improve the feature selection process separately, they may conflict with each other. To address this conflict and satisfy both objective functions simultaneously, a multi-objective model is introduced in Equation 10 based on a fuzzy minimax clustering model, which minimizes the maximum value of the set of weighted cluster variations after the normalization procedure of Equation 9.
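Equations (9) and (10) can be sketched as follows. The code implements Equation (9) as printed, so the smallest objective value receives grade 1 and the largest receives grade 0; the text does not state whether features are then ranked by ascending or descending score, so the sketch only computes the normalized scores. The function names and the example weights are assumptions.

```python
import numpy as np

def objective_membership(obj_values):
    """Membership grade of Equation (9) for one objective across n features."""
    obj = np.asarray(obj_values, dtype=float)
    lo, hi = obj.min(), obj.max()
    if hi == lo:                       # all features identical for this objective
        return np.ones_like(obj)
    return np.clip((hi - obj) / (hi - lo), 0.0, 1.0)

def normalized_scores(objectives, weights):
    """Normalized weighted membership N_mu(j) of Equation (10).

    objectives : (m, n) array, one row per objective, one column per feature
    weights    : (m,) weights omega_i (assumed positive)
    """
    weights = np.asarray(weights, dtype=float)
    mu = np.vstack([objective_membership(row) for row in objectives])
    weighted = weights @ mu            # sum over the m objectives for each feature
    return weighted / weighted.sum()   # normalize over the n features

objs = np.array([[1.0, 2.0, 3.0],      # Obj1 for three hypothetical features
                 [2.0, 4.0, 6.0]])     # Obj2 for the same features
scores = normalized_scores(objs, weights=[0.5, 0.5])
print(scores)
```

Because the scores are normalized over all features, they sum to one and can be compared directly across features.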

4 Clonal selection algorithm (CSA)

Evolutionary algorithms have become a powerful metaheuristic approach to optimization tasks owing to their flexibility with respect to the underlying fitness landscape and their random, population-based characteristics. Researchers have introduced different evolutionary-based optimization methods, namely Chaos-search Genetic Algorithm (CGA), Simulated Annealing (SA) [42], simulated rebounding algorithm (SRA) [43], Particle Swarm Optimization (PSO) [44], Artificial Bee Colony (ABC) [45], Cuckoo Search [46], etc. Although these metaheuristic techniques are well known and give satisfactory results in different applications, their main limitation is their dependency on the adjusting parameters of the ANN. Here, the Clonal Selection Algorithm (CSA) is employed to adjust the weighting and biasing factors of the ANN in order to enhance the training mechanism. CSA was introduced to the field of evolutionary computation by abstracting and applying the principles of Burnet’s clonal selection theory. The immune system is distributed in terms of control, adaptive in terms of function, and parallel in terms of operation, and all of these characteristics can be applied in the field of intelligent systems.

In clonal selection theory, the variety of antibody molecules that protect the organism from invasion is generated by B lymphocytes. A B lymphocyte is a type of white blood cell that produces unique antibodies customized for a particular type of antigen. Clonal selection theory states that each organism has a pre-existing pool of unique antibodies capable of recognizing and matching antigens with some level of accuracy. When an antigen is identified and matched by a specific antibody, it prompts the cell to produce more cells with the same receptor. Throughout the cell proliferation stage, genetic mutations occur in the clones, which improve the match with the antigen. This process can be regarded as a kind of Darwinian selection in which the cells with the highest match to the antigen survive. Applying the Clonal Selection Algorithm to an engineering problem means establishing a memory pool of antibodies that represent optimal solutions to the given problem. In other words, each antibody is considered a solution of the given problem and each antigen the evolution of the problem space. The algorithm has a powerful capability for both local and global search. Local search is achieved by hypermutation of the cloned antibodies, while global search is achieved by adding randomly generated antibodies to the population, which increases diversity and helps avoid becoming trapped in local optima. For each antibody, the fitness or affinity function is formulated as follows: $\text{Affinity} = \text{fitness} = \frac{1}{1 + \text{objective function}}$ (11)

The population is sorted by affinity, and each antibody is cloned in proportion to its affinity; that is, there is a direct relation between the number of clones generated for an antibody and its affinity. The higher the affinity of an antibody (and hence the lower its order i in the sorted population), the more clones it receives, as the following equation indicates, where β is the clonal factor and N is the population size: $\text{numClones} = \text{round} (\frac{β \times N}{i} + 0.5)$ (12)

During the affinity maturation process, which improves the match of the clones with the antigen, the mutation degree is inversely related to the parent’s affinity. In other words, clones with higher affinity have a lower mutation degree, as follows: $α_{i} = β^{- 1} \times \exp (- {fitness}_{i})$ (13)
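Equations (11) to (13) translate almost directly into code. The sketch below is a literal reading of the three formulas; the function names are hypothetical, and Python's built-in `round` is used, which rounds exact ties to the nearest even integer and may therefore differ on ties from the rounding used by the authors.

```python
import math

def affinity(objective_value):
    """Eq. (11): affinity (fitness) of an antibody; assumes a non-negative objective."""
    return 1.0 / (1.0 + objective_value)

def num_clones(beta, pop_size, rank):
    """Eq. (12): number of clones for the antibody of order `rank` (1 = best)."""
    return round(beta * pop_size / rank + 0.5)

def mutation_degree(beta, fitness):
    """Eq. (13): mutation degree, decreasing as fitness increases."""
    return math.exp(-fitness) / beta

# Hypothetical settings: clonal factor 0.5 and a population of 10 antibodies.
clone_counts = [num_clones(0.5, 10, rank) for rank in (2, 3, 4, 10)]
print(clone_counts)                       # [3, 2, 2, 1]
print(mutation_degree(0.5, affinity(0.0)) < mutation_degree(0.5, affinity(4.0)))
```

Better-ranked antibodies receive more clones, and antibodies with higher affinity are mutated less, which concentrates the search around promising solutions.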

At this point, the clones are exposed to the antigen and the fitness function is computed. These steps are iterated until all clones in the population have undergone mutation. At this stage, a tournament test is performed to select as many clones as the size of the initial population. Finally, the optimal solution and the population are updated. This procedure is repeated until the stopping criterion is met. Figure 3 shows the complete workflow of the CSA algorithm.

Fig.3

CSA algorithm workflow.

As shown in Fig. 3, the first step of CSA is the initialization of an antibody pool of fixed size N, which is divided into two parts: a memory antibody section m, which becomes the solution of the algorithm, and a remaining antibody pool r, which is used to add diversity to the population. Once initialization is complete, the algorithm executes G generations, in each of which all known antigens are exposed to the system. The number of generations G is user-configurable; however, other termination criteria can be assigned as well. Within the loop, a single antigen is chosen at random, without replacement, from the antigen pool. The system then calculates the affinity of every antibody to this antigen, and the n antibodies with the highest affinity are selected. In the next step, the selected antibodies are ranked according to their affinity and cloned accordingly. During affinity maturation, the clones are mutated at a rate inversely related to their parent’s affinity (the greater the affinity, the lower the mutation). The clones are then exposed to the antigen, and the antibodies with the highest affinity among the clones become candidate memory antibodies for replacement into m. Finally, the d individuals in the remaining antibody pool r with the lowest affinity are replaced with new random antibodies. After the iterations are complete, the memory component m of the antibody pool is taken as the solution of the algorithm.
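The workflow described above can be condensed into a short, simplified loop. The sketch below is not the authors' implementation: it minimizes a single non-negative objective over a box (so there is no separate antigen pool), uses Gaussian mutation scaled by a hypothetical `step_scale`, and keeps the best individuals by truncation instead of a tournament. All parameter names and default values are assumptions.

```python
import numpy as np

def clonal_selection(objective, lower, upper, pop_size=20, n_select=10,
                     n_replace=4, beta=0.5, step_scale=0.1,
                     generations=100, seed=0):
    """Minimise a non-negative `objective` over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    span = upper - lower

    def random_antibodies(count):
        return lower + rng.random((count, lower.size)) * span

    pop = random_antibodies(pop_size)
    cost = np.array([objective(x) for x in pop])

    for _ in range(generations):
        order = np.argsort(cost)                  # lowest cost = highest affinity
        pop, cost = pop[order], cost[order]

        clones = []
        for rank in range(1, n_select + 1):
            parent = pop[rank - 1]
            fitness = 1.0 / (1.0 + cost[rank - 1])               # Eq. (11)
            n_clones = int(round(beta * pop_size / rank + 0.5))  # Eq. (12)
            alpha = np.exp(-fitness) / beta                      # Eq. (13)
            for _clone in range(n_clones):
                noise = rng.normal(size=parent.size)
                child = parent + alpha * step_scale * span * noise  # hypermutation
                clones.append(np.clip(child, lower, upper))
        clones = np.array(clones)
        clone_cost = np.array([objective(x) for x in clones])

        # Keep the best pop_size individuals among parents and clones
        # (truncation is used here in place of the tournament test).
        merged = np.vstack([pop, clones])
        merged_cost = np.concatenate([cost, clone_cost])
        keep = np.argsort(merged_cost)[:pop_size]
        pop, cost = merged[keep], merged_cost[keep]

        # Replace the worst individuals with random antibodies for diversity.
        if n_replace > 0:
            pop[-n_replace:] = random_antibodies(n_replace)
            cost[-n_replace:] = [objective(x) for x in pop[-n_replace:]]

    best = int(np.argmin(cost))
    return pop[best], cost[best]

sphere = lambda x: float(np.sum(x ** 2))          # simple test objective
best_x, best_cost = clonal_selection(sphere, [-5.0] * 3, [5.0] * 3)
print(best_x, best_cost)
```

Because the best individuals always survive the truncation step, the best cost never increases from one generation to the next, while the random replacements keep some diversity in the population.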

4.1 Self-Adaptive Modified CSA (SAMCSA)

In this study, a new self-adaptive modification is introduced to improve both the local and global search capability of the CSA and to prevent premature convergence. The modification is composed of three sub-modifications, which respectively improve the diversity of the population, bias the population average toward the optimal solution, and establish a search mechanism balanced between local and global search.

Sub-modification portion 1: To enlarge the population diversity, Lévy flight is applied. A Lévy flight is a type of random walk in which the step lengths follow a heavy-tailed probability distribution, which allows the search space to be explored efficiently in search of the global best solution. The Lévy flight is defined as follows: $\text{Lévy} (ω) \sim τ = t^{- ω}; (1 < ω \leq 3)$ (14)

A new candidate solution is then generated, and it replaces the previous solution if it is better: $X_{i}^{new} = X_{i}^{old} + φ_{1} \oplus \text{Lévy} (ω)$ (15)

Sub-modification portion 2: This modification moves the population average toward the best solution and is defined as follows: $X_{i}^{new} = X_{i}^{old} + T_{F} (X_{Gbest} - M_{p})$ (16)

Sub-modification portion 3: As shown in Equation 12, the clonal factor (β) relates the number of clones to the corresponding antibodies. The clonal factor lies in the range (0,1]. A small clonal factor makes the algorithm explore more in the local area, whereas a large clonal factor can cause the search to wander into distant and irrelevant areas during global search. Hence, this modification forms a balanced search mechanism that provides better results both locally and globally, defined as follows: $β^{t + 1} = (1 / (10 N_{t}))^{10 / N_{t}} β^{t}$ (17)
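The three sub-modifications can be sketched as small update functions. Several points below are assumptions and not statements from the paper: heavy-tailed steps are drawn with Mantegna's algorithm using stability index ω − 1 (valid for 1 < ω < 3), the ⊕ in Equation (15) is read as an entry-wise product, M _p is read as the population mean, T _F is treated as a given scalar, and N _t is treated as a positive number supplied by the caller.

```python
import math
import numpy as np

def levy_step(omega, size, rng):
    """Heavy-tailed random step via Mantegna's algorithm (an assumed sampler)."""
    a = omega - 1.0                                   # stability index in (0, 2)
    sigma_u = (math.gamma(1 + a) * math.sin(math.pi * a / 2)
               / (math.gamma((1 + a) / 2) * a * 2 ** ((a - 1) / 2))) ** (1 / a)
    u = rng.normal(0.0, sigma_u, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / a)

def levy_move(x_old, phi, omega, rng):
    """Eq. (15): perturb a solution with a Levy step scaled entry-wise by phi."""
    return x_old + phi * levy_step(omega, x_old.shape, rng)

def mean_shift_move(x_old, x_gbest, pop_mean, t_f):
    """Eq. (16): move a solution along the direction from the mean to the best."""
    return x_old + t_f * (x_gbest - pop_mean)

def update_beta(beta, n_t):
    """Eq. (17): self-adaptive update of the clonal factor."""
    return (1.0 / (10.0 * n_t)) ** (10.0 / n_t) * beta

def accept_if_better(x_old, x_new, objective):
    """Greedy replacement: keep the new solution only if it lowers the objective."""
    return x_new if objective(x_new) < objective(x_old) else x_old

rng = np.random.default_rng(0)
steps = levy_step(2.5, 1000, rng)
moved = mean_shift_move(np.zeros(2), np.ones(2), np.zeros(2), t_f=2.0)
print(steps.shape, moved, update_beta(0.5, 10))
```

Most Lévy steps are small, but occasional very large steps occur, which is what allows the search to escape the neighbourhood of a local optimum.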

5 Artificial neural network adjusting factors based-on SAMCSA

As explained previously, the self-adaptive modified clonal selection algorithm is applied to the ANN weight and biasing coefficients in order to improve the training procedure. SAMCSA is used as an optimization technique, and the results obtained are compared with those of a classic training algorithm, e.g. backpropagation. Two considerations motivate the proposed technique. Firstly, using an evolutionary algorithm (SAMCSA in this study) to tune the adjusting factors of an ANN can be a lengthy process and can reduce the efficiency of the ANN when fast training matters. Secondly, the results obtained from evolutionary techniques alone might not be as promising as those of a traditional training algorithm such as backpropagation. Hence, this paper introduces a hybrid method, named SAMCSA-ANN, to overcome these drawbacks and enhance the training process so that the prediction accuracy of the load peak is increased. The proposed method consists of two major steps. Firstly, the ANN is trained using a traditional training algorithm; that is, all adjusting factors and an appropriate neural network architecture (biasing and weight coefficients, number of neurons in each layer, number of hidden layers, and transfer function) are chosen experimentally. Then, the optimization task is carried out. Each element of the population is made up of all the weight and biasing factors of the ANN, as formulated below:

${\bar{X}}_{i} = [w_{i, 1}, w_{i, 2}, \dots, w_{i, N_{w}}, b_{i, 1}, b_{i, 2}, \dots, b_{i, N_{b}}]_{(1, N)}; N = N_{w} + N_{b}$ (18)

The most suitable weighting and biasing coefficients are chosen as the center of the control vectors (antibodies) in CSA. Afterward, a range around these values, known as the error range, is used as the control vector constraint so that the adjusted weighting and biasing coefficients lie at the center of the defined constraints: $L_{min} {\bar{X}}_{K} \leq {\bar{X}}_{i} \leq L_{max} {\bar{X}}_{K}$ (19)

L _min and L _max are constants defined on the basis of the values of the weighting and biasing coefficients. It is worth mentioning that if the error interval defined by L _min and L _max is too small, the efficiency of SAMCSA decreases because the control vector is restricted to a small space. Conversely, if the error interval is too large, the capability of SAMCSA is negatively affected by the enlarged search space. The proposed method therefore combines the strengths of a classic training algorithm such as backpropagation and a population-based evolutionary algorithm (here CSA). Firstly, the weighting and biasing coefficients are adjusted using the classic training algorithm to find the best values of these parameters. These values are then used as the mean values of the control vector in CSA.
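A minimal sketch of the encoding in Equation (18) and the constraint in Equation (19) is given below. The function names and the values 0.8 and 1.2 for L _min and L _max are illustrative assumptions. Because multiplying a negative coefficient by L _min and L _max reverses the order of the two products, the sketch takes the entry-wise minimum and maximum so that the lower bound never exceeds the upper bound.

```python
import numpy as np

def encode(weights, biases):
    """Eq. (18): flatten all weight arrays, then all bias arrays, into one vector."""
    parts = [np.ravel(w) for w in weights] + [np.ravel(b) for b in biases]
    return np.concatenate(parts).astype(float)

def search_bounds(x_trained, l_min=0.8, l_max=1.2):
    """Eq. (19): box around the trained coefficients used to constrain antibodies."""
    a, b = l_min * x_trained, l_max * x_trained
    return np.minimum(a, b), np.maximum(a, b)

# Hypothetical trained coefficients of a tiny network.
x_k = encode(weights=[np.array([[1.0, -2.0]])], biases=[np.array([0.5])])
lower, upper = search_bounds(x_k)
print(x_k, lower, upper)
```

With L _min below 1 and L _max above 1, the trained coefficients lie inside the box, so the evolutionary search starts from and stays near the solution found by the classic training algorithm.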

6 Proposed SAMCSA-ANN method evaluation for load forecasting application

As discussed earlier, SAMCSA-ANN is an optimization technique that can be applied to the short term load forecasting problem. In any optimization problem, an objective function must be defined in order to evaluate the quality of the predicted values (here, load peak values). The relative error $\sigma_{i}$ and the Root Mean Square Error (RMSE) are used for this purpose:

$\sigma_{i} \% = \frac{|P_{i} - T_{i}|}{T_{i}} \times 100, \quad i = 1, 2, \dots, N_{es}$ (20)

$RMSE = \sqrt{\frac{1}{N_{es}} \sum_{i=1}^{N_{es}} \sigma_{i}^{2}}$ (21)
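Equations 20 and 21 translate directly into code. The sketch below assumes that $P_{i}$ is the predicted value and $T_{i}$ the actual (target) value, and that all targets are positive, as is the case for peak loads. The function names and sample numbers are illustrative.

```python
import math
from typing import List, Sequence


def relative_errors(predicted: Sequence[float], actual: Sequence[float]) -> List[float]:
    """Relative error in percent for each sample (Equation 20)."""
    return [abs(p - t) / t * 100.0 for p, t in zip(predicted, actual)]


def rmse(sigmas: Sequence[float]) -> float:
    """Root mean square of the relative errors (Equation 21)."""
    return math.sqrt(sum(s * s for s in sigmas) / len(sigmas))


# Illustrative values only, not taken from the paper's dataset.
actual = [100.0, 200.0]
predicted = [103.0, 192.0]
sigmas = relative_errors(predicted, actual)  # about 3 % and 4 %
score = rmse(sigmas)                         # sqrt((3**2 + 4**2) / 2)
```

Because the RMSE here is computed over percentage errors rather than raw load differences, its value is itself expressed in percent.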

The lower the RMSE, the more accurate the predicted results and, consequently, the lower the level of uncertainty. The SAMCSA-ANN method is implemented in the following steps:

Step 1) Determine the input variables, training algorithm, validation data, artificial neural network structure (number of neurons in the input layer, number of neurons in the output layer, type of transfer function), number of antibodies in CSA, and the error interval ($L_{min}$ and $L_{max}$).

Step 2) Data normalization: both training and test data should be normalized so that all input variables lie in a comparable range, which maps the input variables into the range over which the transfer function operates.

Step 3) Apply the fuzzy-based feature selection method so that the most useful and informative features are chosen for both input and output variables.

Step 4) ANN training process: the backpropagation training algorithm is used to adjust the weighting and biasing factors on an experimental basis.

Step 5) Apply $L_{min}$ and $L_{max}$ as the error interval on the control vector so that the trained weighting and biasing factors of the ANN lie at the center of the constraints.

Step 6) Run SAMCSA as the optimization process to find the most suitable values of the parameters in Equation 17 by minimizing the following objective function: $\min F(X) = \min RMSE$ (22)

Step 7) Convert the constrained optimization problem into an unconstrained one by using the following augmented objective function with penalty terms:

$S(X) = F(X) + L_{1} \sum_{j=1}^{N_{eq}} (G_{j}(X))^{2} + L_{2} \sum_{j=1}^{N_{ueq}} (Max[0, -H_{j}(X)])^{2}$ (23)

Step 8) Generate a pool of antibodies in CSA as the initial population.

Step 9) Evaluate the objective function (here RMSE) for all antibodies in CSA.

Step 10) Calculate the affinity function as explained in Section 4.1.

Step 11) Perform the Self-Adaptive Modified CSA (SAMCSA) operations to improve the overall search ability and avoid premature convergence.

Step 12) Update the population. Note that applying the modified parts of the CSA turns the initial population (antibodies) into a more diverse population.

Step 13) Check whether the stopping requirement is satisfied. The stopping requirement can be a predefined number of iterations or a particular value of the objective function. If the stopping requirement is met, terminate the algorithm; otherwise go to Step 11.
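The penalty formulation of Step 7 (Equation 23) can be sketched as a small wrapper around the objective function. The sketch assumes the convention implied by the $Max[0, -H_{j}(X)]$ term, namely that an inequality constraint is satisfied when $H_{j}(X) \geq 0$. The function name `penalized`, the penalty factors and the toy constraint are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Func = Callable[[Vector], float]


def penalized(f: Func, eqs: List[Func], ineqs: List[Func], l1: float, l2: float) -> Func:
    """Build S(X) = F(X) + L1 * sum(G_j^2) + L2 * sum(max(0, -H_j)^2)."""

    def s(x: Vector) -> float:
        eq_penalty = sum(g(x) ** 2 for g in eqs)                # G_j(X) = 0 expected
        ineq_penalty = sum(max(0.0, -h(x)) ** 2 for h in ineqs)  # H_j(X) >= 0 expected
        return f(x) + l1 * eq_penalty + l2 * ineq_penalty

    return s


# Toy example: minimise x0^2 subject to x0 >= 1, written as H(X) = x0 - 1 >= 0.
objective = lambda x: x[0] ** 2
s = penalized(objective, eqs=[], ineqs=[lambda x: x[0] - 1.0], l1=1.0, l2=1000.0)
feasible_cost = s([2.0])    # no penalty is added
infeasible_cost = s([0.0])  # the violation of 1.0 is squared and scaled by L2
```

In the setting of this paper, the bounds of Equation 19 could be expressed as inequality constraints of this form, one pair per coefficient.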

In this way, the most efficient and appropriate weighting and biasing coefficients are found by the SAMCSA algorithm, so that the whole search space is explored.
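The population loop of Steps 8 to 13 can be illustrated with a generic clonal selection sketch. This is a plain clonal selection scheme, not the SAMCSA variant: the three sub-modifications proposed in this paper are not reproduced, and the mutation rule, clone count and replacement of the worst antibody are assumptions chosen for brevity. The default of 20 antibodies and 150 iterations follows the settings reported in the simulation section.

```python
import random
from typing import Callable, List, Sequence


def clonal_selection(
    objective: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_antibodies: int = 20,
    n_iterations: int = 150,
    n_clones: int = 5,
    seed: int = 0,
) -> List[float]:
    """Minimise `objective` inside [lower, upper] with a basic clonal selection loop."""
    rng = random.Random(seed)

    def random_antibody() -> List[float]:
        return [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]

    def mutate(antibody: List[float], scale: float) -> List[float]:
        # Gaussian perturbation, clipped back into the bounds.
        return [
            min(max(v + rng.gauss(0.0, scale * (hi - lo)), lo), hi)
            for v, lo, hi in zip(antibody, lower, upper)
        ]

    population = [random_antibody() for _ in range(n_antibodies)]
    for _ in range(n_iterations):
        population.sort(key=objective)  # best (lowest cost) first
        next_population = []
        for rank, antibody in enumerate(population):
            # Worse-ranked antibodies are mutated more strongly.
            scale = 0.1 * (rank + 1) / n_antibodies
            candidates = [antibody] + [mutate(antibody, scale) for _ in range(n_clones)]
            next_population.append(min(candidates, key=objective))
        # Replace the worst antibody with a random newcomer to keep diversity.
        next_population.sort(key=objective)
        next_population[-1] = random_antibody()
        population = next_population
    return min(population, key=objective)


def sphere(x: List[float]) -> float:
    """Stand-in objective with its minimum at (0.3, 0.3)."""
    return sum((v - 0.3) ** 2 for v in x)


best = clonal_selection(sphere, lower=[-1.0, -1.0], upper=[1.0, 1.0])
```

In the forecasting application, `sphere` would be replaced by a function that loads the antibody into the ANN as its weights and biases and returns the RMSE of Equation 21 on the training data.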

7 Simulation results

To demonstrate the effectiveness and enhanced performance of the proposed method, a real dataset of the daily peak load consumption of Tehran province, Iran, from August 2001 to August 2016 was gathered, and the proposed forecasting method was applied to the data of September 2016. The initial population of SAMCSA is set to 20 antibodies, and the stopping requirement is 150 iterations. This number was selected because the algorithm was observed to saturate after this many iterations, with no further improvement. For the PSO algorithm, the number of particles and the number of generations are 35 and 10000, respectively, and the maximum velocity and inertia weight factor are set to 2 and 0.7, respectively. For the Genetic Algorithm, the initial population and the number of iterations are set to 700 individuals and 200 iterations, and the mutation and crossover probabilities are set to 0.07 and 0.7, respectively. As discussed earlier, the major benefit of the proposed fuzzy-based feature selection is its fast response at the same accuracy as other existing methods. Fig. 4 depicts the processing time of the proposed fuzzy-based feature selection method along with other popular methods.

Fig.4

Computational time of different feature selection methods.

The original dataset contained 70 features, which were reduced to the 10 features listed in Table 1 after applying the fuzzy-based feature selection to leave out the irrelevant and non-informative features. After the feature selection method sorted the input variables in descending order, different numbers of input variables, such as 4 and 13, were tested, and 10 input variables were eventually chosen. In fact, $N_{\mu}$ in Equation 9 can be considered an effective factor for estimating the number of input features.

Table 1

Most relevant feature set for electric load prediction

Parameter	Definition
INP_Feature_1	Electric load consumption in the same day of 6 months ago
INP_Feature_2	Electric load consumption in the same day of 5 months ago
INP_Feature_3	Electric load consumption in the same day of 4 months ago
INP_Feature_4	Electric load consumption in the same day of 3 months ago
INP_Feature_5	Electric load consumption in the same day of 2 months ago
INP_Feature_6	Electric load consumption in the same day of 1 month ago
INP_Feature_7	Electric load consumption in 4 days before the predicted day
INP_Feature_8	Electric load consumption in 3 days before the predicted day
INP_Feature_9	Electric load consumption in 2 days before the predicted day
INP_Feature_10	Electric load consumption in 1 day before the predicted day
OUT_Feature_1	Electric load value in the predicted day
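The ten inputs of Table 1 can be assembled from a daily load series as sketched below. The sketch assumes that "the same day of k months ago" means the same calendar day number k months earlier, clamped to the last day of shorter months; the paper does not state how such cases are handled. The function names and the synthetic series are illustrative.

```python
import calendar
from datetime import date, timedelta
from typing import Dict, List, Optional


def months_ago(d: date, k: int) -> date:
    """Same calendar day k months earlier, clamped to the month end (assumption)."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) - k, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def feature_row(load: Dict[date, float], target: date) -> Optional[List[float]]:
    """INP_Feature_1..10 for `target`, or None if any lagged value is missing."""
    lag_dates = [months_ago(target, k) for k in range(6, 0, -1)]         # features 1-6
    lag_dates += [target - timedelta(days=k) for k in range(4, 0, -1)]   # features 7-10
    if any(d not in load for d in lag_dates):
        return None
    return [load[d] for d in lag_dates]


# Synthetic series: the "load" is just the day index, for illustration only.
start = date(2016, 1, 1)
load = {start + timedelta(days=i): float(i) for i in range(300)}
row = feature_row(load, date(2016, 9, 15))
```

The output feature for the same row would be the load on the target day itself.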

Constructing the ANN architecture, i.e. choosing the number of input variables, the number of hidden layers and the number of neurons in each layer, relies on experimental knowledge to arrive at a structure with a satisfactory level of accuracy. In this study, the ANN is constructed with two hidden layers of 4 and 7 neurons, respectively. Fig. 5 shows the relative error $\sigma_{i}$ for 30 days ahead for the different prediction models, namely traditional ANN, ANN-PSO, ANN-GA, ANN-CSA and ANN-SAMCSA. A comparative analysis of the traditional ANN and the evolutionary-based ANNs shows that the proposed training method yields lower relative error and higher prediction accuracy. Additionally, Table 2 shows the superiority of the proposed method in terms of reduced Root Mean Square Error (RMSE): the RMSE decreases from 4.1943 for the traditional ANN to 1.9759 for the proposed ANN-SAMCSA method. To illustrate the results of the proposed ANN-SAMCSA method relative to the traditional ANN, Fig. 6 compares the actual load consumption values with the load values predicted by the traditional ANN and by the proposed ANN-SAMCSA.
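For the architecture described above (10 inputs, two hidden layers of 4 and 7 neurons, one output), the length $N$ of each antibody in Equation 18 can be counted as follows. The count assumes fully connected layers with one bias per hidden and output neuron, which the text does not state explicitly; the function name is illustrative.

```python
from typing import List, Tuple


def count_parameters(layer_sizes: List[int]) -> Tuple[int, int]:
    """Return (N_w, N_b) for a fully connected feed-forward network."""
    n_w = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))  # weights between layers
    n_b = sum(layer_sizes[1:])                                      # one bias per non-input neuron
    return n_w, n_b


n_w, n_b = count_parameters([10, 4, 7, 1])
n = n_w + n_b  # N_w = 10*4 + 4*7 + 7*1 = 75, N_b = 4 + 7 + 1 = 12, N = 87
```

Under these assumptions each antibody is a vector of 87 coefficients, which gives a sense of the dimension of the search space handled by the optimizer.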

Fig.5

Relative Error ($\sigma_{i}$) for 30 days ahead electric load forecasting.

Fig.6

Comparison between the traditional ANN and proposed ANN-SAMCSA method.

Table 2

RMSE values for five different methods

Method	Root Mean Square Error
ANN	4.1943
ANN-GA	3.7932
ANN-PSO	2.9780
ANN-CSA	2.4855
ANN-SAMCSA	1.9759
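The RMSE values in Table 2 can also be read as relative reductions with respect to the traditional ANN. The short computation below uses only the table values; the percentages themselves are derived here and are not reported in the paper.

```python
# RMSE values copied from Table 2.
rmse_values = {
    "ANN": 4.1943,
    "ANN-GA": 3.7932,
    "ANN-PSO": 2.9780,
    "ANN-CSA": 2.4855,
    "ANN-SAMCSA": 1.9759,
}

baseline = rmse_values["ANN"]
# Percentage reduction of each method relative to the traditional ANN.
reduction = {
    name: (baseline - value) / baseline * 100.0
    for name, value in rmse_values.items()
}

for name, pct in reduction.items():
    print(f"{name}: {pct:.2f} % lower RMSE than ANN")
```

On these figures the proposed ANN-SAMCSA roughly halves the RMSE of the traditional ANN (a reduction of about 53 %), compared with about 41 % for the unmodified ANN-CSA.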

8 Conclusion

In this article, a new method combining fuzzy-based feature selection with the SAMCSA-ANN training method is proposed to enhance the performance and accuracy of load forecasting. To demonstrate the superiority of the proposed hybrid method over the traditional load forecasting method, the ANN was first trained with a traditional training algorithm such as backpropagation, so that the ANN architecture and the weighting and biasing parameters were obtained experimentally; SAMCSA, a modified population-based evolutionary optimization algorithm, was then applied to decrease the relative error by tuning the weighting and biasing factors efficiently. In addition to the proposed ANN training algorithm, a fuzzy-based feature selection method, built on fuzzy set theory and fuzzy clustering, is introduced to select only the informative features in the dataset and discard noisy data, which reduces the number of features while keeping a satisfactory prediction accuracy. To evaluate the performance of the proposed method, a real dataset of daily peak load of Fars province in Iran is used. The results obtained indicate not only that the population-based evolutionary technique can serve as a preferred alternative training algorithm for ANN, but also that the modifications made to the CSA method enhance its overall performance and accuracy as an optimization model.
