Abstract
This paper presents a new combined method based on the Clonal Selection Algorithm (CSA) and an Artificial Neural Network (ANN) for the Short Term Load Forecasting (STLF) application. Compared with other evolutionary-based algorithms in this area, the proposed technique exploits both the ANN's ability to learn nonlinear and complex problems and the population-based CSA's capacity for global and local search. Moreover, to select the most informative and non-redundant features from the input feature set, a new feature selection method based on fuzzy set theory and fuzzy clustering is introduced. To enhance the overall performance of CSA, three sub-modifications are proposed that expand its search capability and avoid premature convergence. Finally, to demonstrate the effectiveness of the proposed method, a real dataset of daily peak electric load consumption is used, and simulation results show improved forecasting accuracy over other popular techniques in the STLF application.
Keywords
Nomenclature
Number of input variables for ANN
Number of input layer for ANN
Number of hidden layer for ANN
Number of output layer for ANN
Transfer function for input layer in ANN
Transfer function for hidden layer in ANN
Weighting factor connecting the ith neuron in the input layer to the jth neuron in the hidden layer
Weighting factor connecting the ith neuron in the hidden layer to the jth neuron in the output layer
Fuzzy membership function
Input candidate
Number of training patterns
Output candidate
Data of the hth pattern of the ith input
Spread factor of the membership function
Crisp output (Defuzzified)
The mean value of data set yi
Weight for the ith objective in fuzzy clustering
Order of the antibody in the population
Number of weighting factors of proposed SAMCSA-ANN
Number of biasing factors of proposed SAMCSA-ANN
Vector of the adjusted weighting and biasing coefficient
Situation of the kth antibody in the population
Evaluated load values of the ith day in objective function
Actual load values of the ith day in objective function
Number of forecasted data samples
Constraint objective function
Equality constraints function
Inequality constraint function
Penalty functions for constraint observation
Introduction
Modern society depends heavily on electrical energy because of its many advantages over other forms of energy, e.g. its convenient form, ease of control, flexibility, low cost, cleanliness, and high transmission efficiency. The growth of electrical energy usage in modern life has attracted considerable research attention in recent years [1, 2]. Accurate electrical load forecasting models are important for the operation and planning of companies that must reliably supply all consumers with the required energy. This requirement motivates researchers to seek novel scientific methodologies that provide more accurate load forecasting models [3].
Short Term Load Forecasting (STLF) plays a key role as an intelligent system in providing the most efficient and economic operation of electrical generation sources, particularly for today's restructured power systems [4, 15]. In general, the time horizon of STLF ranges from one hour to several weeks. Accurate load forecasts can be used to estimate load flows and to make decisions that avoid overloading [5]. For this purpose, different simulation methods, e.g. Monte Carlo and point estimation methods, have been proposed to decrease the level of uncertainty associated with the load forecasting error [6]. Time-series forecasting methods such as Autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models are considered two powerful forecasting methods for a variety of areas, including electric load forecasting [7].
In [8], a new method based on an elliptic-orbit model with weekly periodic extension is proposed for predicting electric peak load movement. In [9], Multivariate Adaptive Regression Splines (MARS) modelling is proposed for daily peak electricity load forecasting in South Africa. Besides these methods, Artificial Neural Networks (ANNs) have recently become popular in time series forecasting, and in load forecasting in particular, because of their ability to estimate the nonlinear relationships among load parameters. In [10], an ANN-based approach is presented to learn the relationship among past, current, and future temperature and load; the reported average absolute errors for one-hour-ahead and 24-hour-ahead forecasts are 1.40% and 2.06%, respectively. In [11], the authors propose an intelligent power load forecasting algorithm using wavelet packets: the wavelet packet technique decomposes the load data to extract load components at different frequencies, and an ANN then predicts the load component of each wavelet packet space. In [12], a dynamic model based on a Recurrent Wavelet Network (RWN) is employed to solve the STLF problem; to overcome the problem of RWN initialization, the authors suggest a new method based on the Orthogonal Least Squares (OLS) technique. In [13], the authors introduce a new hybrid model based on Support Vector Regression (SVR) for electric load forecasting, in which the Krill Herd (KH) optimization algorithm is applied to obtain the optimal values of the SVR parameters. The authors of [14] introduce a combined method based on SVR and a Modified Firefly Algorithm (MFA) for short-term load forecasting.
In that work, MFA was used as an optimization algorithm to intelligently initialize the SVR parameters, and the results were compared with several well-known existing algorithms to demonstrate the superiority of the introduced system. A new self-organizing model of fuzzy autoregressive moving average with exogenous input variables (FARMAX) for one-day-ahead hourly electric load forecasting is proposed in [16]. To make the FARMAX model self-organizing, identification of the fuzzy model is formulated as a combinatorial optimization problem. A combination of heuristics and an evolutionary programming (EP) scheme is then used to determine the optimal number of input variables, the best partition of the fuzzy spaces, and the associated fuzzy membership functions. In [17], the authors propose a hybrid method based on the fuzzy regression tree, a data mining method, and the multi-layer perceptron (MLP) artificial neural network.
Recently, forecasting methods have become more complicated because of the increasingly complex behavior of the forecast factors. Therefore, along with conventional methods, evolutionary algorithms are employed to boost the performance and accuracy of load forecasting models [18]. In [19], the authors apply Support Vector Regression (SVR) to STLF. Since the precision of STLF depends heavily on the selected parameters of the SVR model, the authors use the SCE-UA algorithm to optimize these parameters simultaneously. The authors of [20] propose a new approach for STLF in which Curve Fitting Prediction (CFP) together with a Genetic Algorithm (GA) is used to obtain the optimum parameters of a Gaussian method that minimizes the forecasted load error. Similar research has been carried out on a hybrid methodology of ANN and ARIMA that takes advantage of each model in time series forecasting [21]. In [22], the authors conduct a comparative study of three different artificial intelligence models for electric load forecasting, namely 1) Genetic Algorithm (GA), 2) Least Error Squares (LES), and 3) Least Absolute Value Filtering (LAVF), and conclude that GA obtains the best forecasting results. Although GA is considered a powerful method for forecasting peak load demand, it has two limitations that can reduce its capability: 1) its dependence on various initial parameters such as the population size, fitness function, mutation rate, crossover rate, and selection method, and 2) the probability of becoming trapped in local optima. The Radial Basis Function (RBF) network, a variant of the ANN that is considered a powerful modeling technique because of its structural simplicity and high identification performance, is proposed in [23] to forecast load consumption.
In [23], the Particle Swarm Optimization (PSO) algorithm is applied to optimize the weighting factors of the RBF network. However, PSO shares the limitations of GA: it may become trapped in local optima, and selecting a suitable value for the inertia weight is difficult.
Hence, a new model based on the Clonal Selection Algorithm (CSA) is introduced to improve the performance of the Multi-Layer Perceptron (MLP) Artificial Neural Network so that the prediction error is reduced noticeably. A comparison is made between the traditional MLP ANN and the proposed CSA-trained MLP ANN. First, the traditional MLP ANN is defined with the most effective architecture, e.g. the number of hidden layers, the type of transfer function, and the number of neurons in the input, hidden, and output layers. Second, the CSA algorithm is used to tune the weighting and biasing coefficients in order to find the optimal values of these parameters. It is shown how the CSA algorithm can enhance the training process of the MLP ANN. The feature selection phase plays an important role in modeling any intelligent system, e.g. load forecasting. Several popular methods are available in the literature for this purpose, namely Genetic Algorithm [24], Bayesian Network [25], Particle Swarm Optimization [26], Support Vector Machine [27], Partial Correlation Algorithm (PSA) [28], etc. Even though these algorithms are well-known feature selection methods, they become computationally expensive as the number of features in a dataset grows, owing to the complex mathematical structure needed to select the informative and useful input features. As a result, this paper introduces a new intelligent feature selection method based on fuzzy set theory and fuzzy clustering. Finally, data gathered from a 230 K substation system is used as a case study to show the efficiency and performance of the proposed method.
Multi-layer perceptron artificial neural network
The Multi-Layer Perceptron (MLP) is a feedforward ANN model that represents a nonlinear mapping between an input vector and an output vector. The architecture of an MLP can vary, but in general it is composed of several fully interconnected neurons distributed over multiple layers, such that each node is connected to every node in the next and previous layers. Figure 1 depicts the architecture of a two-layer MLP.

Figure 1. Multi-layer perceptron neural network architecture.
In Fig. 1, the output of the ith unit (neuron) is formulated as follows:
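As a minimal sketch of the neuron computation in a two-layer MLP, assuming the standard weighted-sum-plus-bias form with a tanh hidden transfer function and a linear output layer (the function names, weight layout, layer sizes, and numeric values below are illustrative assumptions, not taken from the paper):

```python
import math

def layer_forward(inputs, weights, biases, transfer):
    """Output of one fully connected layer.

    weights[j][i] connects input i to neuron j (assumed layout).
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(transfer(net))
    return outputs

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Two-layer MLP: tanh hidden layer followed by a linear output layer."""
    hidden = layer_forward(x, hidden_w, hidden_b, math.tanh)
    return layer_forward(hidden, out_w, out_b, lambda v: v)

# Illustrative weights only: 2 inputs, 2 hidden neurons, 1 output.
hidden_w = [[0.5, -0.2], [0.1, 0.3]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_b = [0.2]
y = mlp_forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
```

Each neuron applies its transfer function to the weighted sum of the previous layer's outputs plus a bias; these weights and biases are the coefficients that a training algorithm adjusts.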
In contrast to dimensionality reduction methods such as projection (e.g. PCA) or compression, feature selection methods do not alter the original representation of the variables; instead, they reduce the number of features by identifying and removing the non-informative ones. Because the original representation is preserved, classifier accuracy can be maintained after those features are removed and only a subset of informative features is kept. Furthermore, appropriate feature selection reduces the complexity and dimensionality of the feature space, which accelerates processing. Feature selection and extraction are therefore not only an important step toward solving the STLF problem but are also important in other areas of artificial intelligence, such as image classification [37], image summarization [38], and device identification based on unique signal characteristics [29]. A new fuzzy-based feature selection method that uses fuzzy set theory is proposed below to enhance the learning process for the STLF problem. Before the proposed method is introduced, fuzzy set theory is explained in more detail in the following section.
Fuzzy set theory
Fuzzy set theory was introduced by Lotfi A. Zadeh in 1965 as a robust and powerful mathematical model for practical imprecise systems. Fuzzy logic has become a practical method for intelligent systems in a wide range of areas, such as electric load forecasting [30], fuzzy logic controllers for industrial robots [31, 32], fuzzy ontologies for qualitative spatial reasoning [33, 34], compensation of system uncertainties and intelligent optimization [35, 36], and adaptive control of nonlinear systems [39, 40]. This part briefly reviews fuzzy logic theory and then describes the proposed fuzzy-based feature selection for STLF. Assume U is the universe of discourse and x is an element of U. In a crisp set, the membership of x takes only one of two values, {0, 1}, whereas in a fuzzy set each element has a membership grade between 0 and 1. Therefore, a fuzzy set can be formulated as the following equation:
μ_A is the membership function, which defines the degree to which a given input belongs to a set and takes values between 0 and 1. The architecture of a Fuzzy Logic System (FLS), shown in Fig. 2, is composed of the following components and major steps: 1) Fuzzifier: in this step, also known as fuzzification, the crisp input data are converted to fuzzy sets by means of linguistic variables and membership functions. 2) Inference engine: constructed from a set of if-then rules. 3) Defuzzifier: in this step (defuzzification), the resulting fuzzy output is converted back to a crisp value using the membership functions.

Figure 2. Fuzzy Logic System (FLS) components.
Linguistic variables are the input or output variables of a fuzzy system whose values are words or sentences instead of numerical values. Linguistic variables are defined specifically for each problem and can differ between application areas. For instance, one practical application of fuzzy logic is the design of intelligent controllers and system identification, in which the linguistic variables are defined as the error and the change of error [41]. Fuzzy logic is composed of several if-then rules that formulate conditional statements over linguistic variables. As explained previously, one component of a fuzzy logic system is the fuzzy inference engine, which is employed to map the if-then rules onto fuzzy sets. Two methods are available for this purpose: the Mamdani method and the Sugeno method. The following definitions show these two methods:
where the if part (x is A and y is B) is known as the antecedent and the then part (z is C) is known as the consequent. The main difference between the Mamdani (Equation 3) and Sugeno (Equation 4) methods is the way the crisp output is generated from the fuzzy input. While a Mamdani-type FIS defuzzifies a fuzzy output, a Sugeno-type FIS uses a weighted average to compute the crisp output.
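The difference between the two inference styles can be illustrated with a small sketch. It assumes Gaussian membership functions, the minimum operator for "and", constant (zero-order) Sugeno consequents, and centroid defuzzification for Mamdani; the two rules and all numeric values are hypothetical.

```python
import math

def gauss(v, center, spread):
    """Gaussian membership grade in (0, 1]."""
    return math.exp(-((v - center) / spread) ** 2)

# Two illustrative rules: IF x is A_r AND y is B_r THEN z is C_r.
# Each tuple holds the centers of A_r, B_r and C_r (hypothetical values).
RULES = [(0.0, 0.0, 1.0), (1.0, 1.0, 3.0)]
SPREAD = 0.5

def firing_strengths(x, y):
    """AND is taken as the minimum of the antecedent grades."""
    return [min(gauss(x, a, SPREAD), gauss(y, b, SPREAD)) for a, b, _ in RULES]

def sugeno(x, y):
    """Zero-order Sugeno: weighted average of constant consequents."""
    w = firing_strengths(x, y)
    return sum(wi * c for wi, (_, _, c) in zip(w, RULES)) / sum(w)

def mamdani(x, y, z_min=0.0, z_max=4.0, steps=400):
    """Mamdani: clip output sets, aggregate by max, defuzzify by centroid."""
    w = firing_strengths(x, y)
    num = den = 0.0
    for k in range(steps + 1):
        z = z_min + (z_max - z_min) * k / steps
        grade = max(min(wi, gauss(z, c, SPREAD))
                    for wi, (_, _, c) in zip(w, RULES))
        num += z * grade
        den += grade
    return num / den
```

For an input midway between the two rules, e.g. `sugeno(0.5, 0.5)` and `mamdani(0.5, 0.5)`, both rules fire equally and both methods return a value at the midpoint of the two consequent centers; they generally differ for other inputs because Mamdani integrates over the clipped output sets.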
This part explains an intelligent fuzzy-based feature selection technique. Assume k is the number of data patterns for each input and output variable (x_i, y); then k plots, each representing the relation of the output y to the corresponding input variable x_i, can be drawn. The fuzzy membership function for the two-dimensional space (x_i, y) can be formulated as below:
In the above membership function, σ is selected as around 25% of the input candidates. As mentioned earlier, different types of membership functions are available; here, the Gaussian membership function is chosen for the proposed fuzzy-based feature selection method. The membership function given in formula (2) defines "if-then" rules between the input and output variables. For the k data patterns of any input variable, k "if-then" rules are generated. Thereafter, a defuzzification process is required to convert the fuzzy grades into crisp values. Center of Gravity (COG) and Center of Area (COA) are the most prevalent defuzzification methods. The COG method, also referred to as the centroid method, mathematically obtains the center of mass of the triggered output membership function. Here, centroid defuzzification is applied to generate the crisp output variables c_i by the following equation:
The curve pattern of the defuzzified output c_i for the corresponding input variable represents how dependent the output variable y is on the input variable x_i: if the plot of c_i is smooth, with no large maxima or minima, it can be concluded that the output variable has little dependency on that input variable. The input is therefore considered a non-informative feature in the feature selection process. In contrast, if the fuzzy curve shows fluctuations and irregular patterns, a high dependency exists between the output variable and the corresponding input variable. To this end, the relevance of each feature is defined by two objectives: 1) how rough the fuzzy curve is, and 2) the distance between the maximum and minimum of the fuzzy curve. The following equation formulates the roughness of the fuzzy curve:
where m_i is the mean value of a given data set y_i.
The distance between the maximum and minimum of the fuzzy curve is formulated as follows:
As a result, the higher the values of Obj1(i) and Obj2(i), the more important and influential the corresponding feature is for feature selection.
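The fuzzy-curve relevance measures described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulation: the spread is taken as 25% of the input range, Obj1 is taken as the mean squared deviation of the curve from its own mean, and Obj2 as the maximum minus the minimum of the curve. All function and parameter names are hypothetical.

```python
import math

def fuzzy_curve(xs, ys, grid_points=50, spread_fraction=0.25):
    """Centroid-defuzzified fuzzy curve c_i evaluated on a grid of x values.

    Each pattern (x_h, y_h) contributes one Gaussian rule centered at x_h.
    The spread is a fraction of the input range (an assumed reading of
    the "around 25%" choice in the text).
    """
    lo, hi = min(xs), max(xs)
    sigma = spread_fraction * (hi - lo) or 1.0  # guard against zero range
    curve = []
    for g in range(grid_points):
        x = lo + (hi - lo) * g / (grid_points - 1)
        mu = [math.exp(-((xh - x) / sigma) ** 2) for xh in xs]
        curve.append(sum(m * y for m, y in zip(mu, ys)) / sum(mu))
    return curve

def roughness(curve):
    """Assumed Obj1: mean squared deviation of the curve from its mean."""
    mean = sum(curve) / len(curve)
    return sum((c - mean) ** 2 for c in curve) / len(curve)

def value_range(curve):
    """Assumed Obj2: distance between the curve's maximum and minimum."""
    return max(curve) - min(curve)
```

An input on which the output clearly depends (for example y = 2x) yields a curve with a large range, whereas an input unrelated to the output yields a nearly flat curve with low values of both measures.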
In this part, fuzzy clustering is applied to assign membership grades and to establish a balance between the importance and relevance of each objective, Obj1(i) and Obj2(i), across the input variables. A membership function is defined for each objective as follows:
The values of
where ω_i is the weight of the ith criterion and m is the number of criteria (Obj1 and Obj2), which here equals 2. Although the aforementioned objective functions (Equations 6 and 7) can each improve the feature selection process separately, they may conflict with each other. To address this conflict and satisfy both objective functions simultaneously, a multi-objective model based on the fuzzy minimax clustering model is introduced in Equation 8. It minimizes the maximum value of the set of weighted cluster variations, which are obtained from Equation 9 after normalization.
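One way to picture the minimax trade-off between the two objectives is the sketch below. It is only an illustration of the idea: the linear normalization, the use of (1 - membership) as the weighted variation, and the equal default weights are all assumptions, and the function names are hypothetical.

```python
def normalize(values):
    """Scale objective values to [0, 1] (assumed linear membership)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def minimax_rank(obj1, obj2, weights=(0.5, 0.5)):
    """Rank features by their worst weighted shortfall across objectives.

    A smaller score means the feature does well on both objectives.
    Returns feature indices, best first.
    """
    mu1, mu2 = normalize(obj1), normalize(obj2)
    scores = [
        max(weights[0] * (1.0 - a), weights[1] * (1.0 - b))
        for a, b in zip(mu1, mu2)
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i])

# Hypothetical objective values for four candidate features.
ranking = minimax_rank([0.9, 0.1, 0.5, 0.6], [0.8, 0.9, 0.1, 0.6])
```

With these values, a feature that is moderately good on both objectives is ranked above one that is excellent on a single objective but poor on the other, which is the behavior a minimax criterion is meant to produce.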
Evolutionary algorithms have become a powerful metaheuristic solution for optimization tasks because of their flexibility with respect to the underlying fitness landscape and their random, population-based characteristics. Researchers have introduced different evolutionary-based optimization methods, namely Chaos-search Genetic Algorithm (CGA), Simulated Annealing (SA) [42], Simulated Rebounding Algorithm (SRA) [43], Particle Swarm Optimization (PSO) [44], Artificial Bee Colony (ABC) [45], Cuckoo Search [46], etc. Although these metaheuristic techniques are well known and give satisfactory results in different applications, their main limitation is their dependency on the adjusting parameters of the ANN. Here, the Clonal Selection Algorithm (CSA) is employed to adjust the weighting and biasing factors of the ANN in order to enhance the training mechanism. CSA is a new method introduced to the field of evolutionary computation by abstracting and applying the principles of Burnet's clonal selection theory. The immune system is considered distributed in terms of control, adaptive in terms of function, and parallel in terms of operation. All of these immune characteristics can be applied in the field of intelligent systems.
In clonal selection theory, the variety of antibody molecules that protect the organism from invasion is generated by B lymphocytes. A B lymphocyte is a type of white blood cell that produces unique antibodies customized for a particular type of antigen. Clonal selection theory states that each organism has a pre-existing pool of unique antibodies capable of recognizing and matching antigens with some level of accuracy. When an antigen is identified and matched by a specific antibody, it prompts the cell to produce more cells with the same receptor. Throughout the cell proliferation stage, genetic mutations occur in the clones, which improve the match with the antigen. This process can be viewed as a kind of Darwinian selection in which the cells with the highest match to the antigen survive. In engineering problems, the Clonal Selection Algorithm establishes a memory pool of antibodies that represent optimal solutions to a given problem. In other words, each antibody is considered a solution of the given problem and each antigen an evaluation of the problem space. The algorithm has a powerful capability for both local and global search. Local search is achieved by hypermutation of the cloned antibodies; global search is achieved by generating random antibodies that are added to the population to increase diversity and avoid becoming trapped in local optima. For each antibody, the fitness or affinity function is formulated as follows:
The population is sorted by affinity, and each antibody is cloned in proportion to its affinity, which means that there is a direct relation between the number of clones generated for an antibody and its affinity. The higher the affinity of an antibody, the more clones it receives, as the equation below indicates:
During the affinity maturation process, which improves the match between the clones and the antigen, the mutation degree of the clones is inversely related to their parent's affinity. In other words, clones with higher affinity have a lower mutation degree, as follows:
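A minimal sketch of these two relations is given here, assuming the rank-based clone count and exponentially decaying mutation rate commonly used in CLONALG-style implementations. The parameters `beta` and `rho` and the function names are illustrative assumptions, and the paper's own formulas may differ.

```python
import math

def clone_counts(num_selected, beta=1.0):
    """Clones per antibody for ranks 1..num_selected (rank 1 = best).

    Higher-affinity (better-ranked) antibodies receive more clones.
    """
    return [max(1, round(beta * num_selected / rank))
            for rank in range(1, num_selected + 1)]

def mutation_rate(normalized_affinity, rho=3.0):
    """Mutation rate that decays as normalized affinity (0..1) grows."""
    return math.exp(-rho * normalized_affinity)
```

For example, `clone_counts(5)` assigns the most clones to the best-ranked antibody, while `mutation_rate` returns 1.0 for the worst normalized affinity (0) and a much smaller value for the best (1), so good solutions are refined gently and poor ones are perturbed strongly.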
At this point, the clones are exposed to the antigen and the fitness function is computed. These steps are repeated until the entire clone population has undergone mutation. At this stage, tournament selection is performed to select a number of clones equal to the size of the initial population. Finally, the optimal solution and the population are updated. This procedure is iterated until the stopping criterion is met. Figure 3 shows the complete workflow of the CSA algorithm.

Figure 3. CSA algorithm workflow.
As shown in Fig. 3, the first step of CSA is the initialization of an antibody pool of fixed size N. The pool is then divided into two parts: a memory antibody section m, which becomes the algorithm's solution, and a remaining antibody pool r, which is used to add diversity to the population. Once initialization is complete, the algorithm executes a number of iterations such that all known antigens are exposed to the system. The number of generations G is user configurable, although the system can also apply other termination criteria. In each iteration, a single antigen is chosen at random, without replacement, from the antigen pool. The system then calculates the affinity of every antibody to that antigen, and the n antibodies with the highest affinity are selected. In the next step, the selected antibodies are ranked and cloned in proportion to their affinity. During affinity maturation, the clones are mutated to a degree inversely related to their parent's affinity (the greater the affinity, the lower the mutation). The clones are then exposed to the antigen, and the antibodies with the highest affinity among the clones become candidate memory antibodies for replacement into m. Finally, the d individuals with the lowest affinity in the remaining antibody pool r are replaced with new random antibodies. Once the iterations are complete, the memory component m of the antibody pool is taken as the algorithm's solution.
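The workflow above can be condensed into a short illustrative loop for a continuous minimization problem. This sketch simplifies the procedure (a single implicit antigen given by the cost function, elitist truncation instead of a separate memory pool, Gaussian mutation) and uses hypothetical parameter names, so it should be read as a basic clonal selection loop rather than the exact algorithm of Fig. 3.

```python
import math
import random

def clonal_selection(cost, dim, bounds, pop_size=20, n_select=5,
                     n_replace=3, generations=150, beta=1.0, rho=3.0,
                     seed=0):
    """Minimize `cost` with a basic clonal selection loop (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_antibody():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    population = [random_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)  # lowest cost = highest affinity
        selected = population[:n_select]
        clones = []
        for rank, parent in enumerate(selected, start=1):
            n_clones = max(1, round(beta * pop_size / rank))
            # Better-ranked parents receive smaller perturbations.
            affinity = 1.0 - (rank - 1) / n_select
            step = math.exp(-rho * affinity) * (hi - lo)
            for _ in range(n_clones):
                clones.append([min(hi, max(lo, g + rng.gauss(0.0, step)))
                               for g in parent])
        # Keep the best individuals among parents and matured clones.
        pool = sorted(population + clones, key=cost)[:pop_size]
        # Replace the worst members with random antibodies for diversity.
        pool[pop_size - n_replace:] = [random_antibody()
                                       for _ in range(n_replace)]
        population = pool
    return min(population, key=cost)

def sphere(v):
    """Simple test cost with its minimum at the origin."""
    return sum(g * g for g in v)

best = clonal_selection(sphere, dim=3, bounds=(-5.0, 5.0))
```

Cloning and mutating the best antibodies provides the local search, while the random replacements at the end of each generation provide the global search component.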
In this study, a new self-adaptive modified technique is introduced to improve both the local and global search capability of CSA and to prevent the algorithm from converging prematurely. The modified technique is composed of three sub-modifications, which respectively improve the diversity of the population, bias the average of the population toward the optimal solution, and establish a balanced search mechanism both locally and globally.
A new candidate solution is then generated; if it is better than the previously generated solution, it replaces that solution, which can be defined as follows:
As explained previously, the self-adaptive modified clonal selection algorithm (SAMCSA) is applied to the ANN weighting and biasing coefficients in order to boost the training procedure. SAMCSA is used as an optimization technique, and the results obtained are compared with a classic training algorithm, e.g. backpropagation. Two considerations motivate the proposed technique. First, using an evolutionary algorithm (SAMCSA in this study) to tune the adjusting factors of the ANN can be a lengthy process and can reduce the efficiency of the ANN when fast training matters. Second, the results obtained from evolutionary techniques alone might not be as promising as those of a traditional training algorithm such as backpropagation. Hence, this paper introduces a hybrid method, named SAMCSA-ANN, to overcome these drawbacks and enhance the training process so that the prediction accuracy of the peak load increases remarkably. The proposed method consists of two major steps. First, the ANN is trained with a traditional training algorithm, which means that all adjusting factors and an appropriate neural network architecture (biasing and weighting coefficients, number of neurons in each layer, number of hidden layers, and transfer function) are chosen by experimental effort. Then the optimization task is carried out. Each element in the population is made up of all the weighting and biasing factors of the ANN, as formulated below:
The most suitable weighting and biasing coefficients are chosen as the center of the control vectors (antibodies) in CSA. Afterward, the difference range, known as the error range, is used as the control vector constraint so that the adjusted weighting and biasing coefficients lie at the center of the defined constraints.
L_min and L_max are constant values defined on the basis of the values of the weighting and biasing coefficients. If the error interval defined by L_min and L_max is too small, the efficiency of SAMCSA decreases because the control vector is restricted to a small space. Conversely, if the error interval is too large, the capability of SAMCSA can be affected negatively because the search space is enlarged. The proposed method therefore establishes a win-win situation between a classic training algorithm such as backpropagation and the population-based evolutionary algorithm (here CSA). First, the weighting and biasing coefficients are adjusted by the classic training algorithm to find the best values of these parameters. These values are then used as the mean values of the control vector in CSA.
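The idea of centering the search space on the backpropagation result can be sketched as follows, assuming that L_min and L_max act as positive offsets below and above each trained coefficient (this interpretation, the function names, and the numeric values are assumptions for illustration).

```python
def search_bounds(trained_coeffs, l_min, l_max):
    """Per-coefficient (lower, upper) limits around the trained values."""
    return [(w - l_min, w + l_max) for w in trained_coeffs]

def clip_to_bounds(candidate, bounds):
    """Project a candidate antibody back into the allowed intervals."""
    return [min(hi, max(lo, v)) for v, (lo, hi) in zip(candidate, bounds)]

# Hypothetical coefficients obtained from a prior backpropagation run.
trained = [0.5, -1.0]
bounds = search_bounds(trained, l_min=0.2, l_max=0.2)
clipped = clip_to_bounds([1.0, -1.0], bounds)
```

A narrow interval keeps the evolutionary search close to the trained solution, while a wide interval gives the optimizer more freedom at the cost of a larger search space, which is the trade-off described above.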
As discussed earlier, SAMCSA-ANN is an optimization technique that can be used for the short term load forecasting problem. In any optimization problem, a yardstick must be defined as an objective function to evaluate the improvement and quality of the predicted values (here, the peak load values). The relative error σ_i and the Root Mean Square Error (RMSE) are defined as the two objective functions, respectively.
The lower the RMSE value, the more accurate the predicted results and, consequently, the lower the level of uncertainty. The implementation of SAMCSA-ANN follows the steps below:
To this end, the most efficient and appropriate weighting and biasing coefficients are discovered by the SAMCSA algorithm, so that the whole search space is explored.
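The two error measures used as objective functions in this section can be sketched as follows, assuming the usual definitions: a per-sample relative error expressed as a percentage of the actual load, and RMSE over all forecasted samples. The function names and the percentage scaling are assumptions.

```python
import math

def relative_errors(actual, predicted):
    """Relative error per sample, in percent (assumed definition)."""
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]

def rmse(actual, predicted):
    """Root mean square error over all forecast samples."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

An optimizer such as CSA can then use `rmse` as the cost of an antibody by running the ANN with that antibody's weights and biases and comparing the predictions with the actual daily peak loads.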
To demonstrate the effectiveness and enhanced performance of the proposed method, a real dataset of daily peak load consumption in Tehran province, Iran, from August 2001 to August 2016 was gathered, and the proposed forecasting method was applied to the data for September 2016. The initial population in SAMCSA is set to 20 antibodies, and the stopping criterion is 150 iterations. This number was selected because the algorithm was observed to saturate after this many iterations, with no further improvement. In the PSO algorithm, the number of particles and the number of generations are 35 and 10000, respectively, and the maximum velocity and inertia weight factor are set to 2 and 0.7, respectively. For the Genetic Algorithm, the initial population and the number of iterations are set to 700 individuals and 200 iterations, and the mutation and crossover probabilities are set to 0.07 and 0.7, respectively. As discussed earlier, the major benefit of the proposed fuzzy-based feature selection is its quick response at the same accuracy as other existing methods. Figure 4 depicts the processing time of the proposed fuzzy-based feature selection method along with other popular methods.

Figure 4. Computational time of different feature selection methods.
The original dataset contained 70 features, which were reduced to 10 after the fuzzy-based feature selection was applied to leave out the irrelevant and non-informative features; the selected features are listed in Table 1. After the feature selection method sorted the input variables in descending order, different numbers of input variables, such as 4 and 13, were tested, and eventually 10 input variables were chosen. In fact, N_μ in Equation 9 can be considered an effective factor for estimating the number of input features.
Table 1. Most relevant feature set for electric load prediction
Constructing the ANN architecture, i.e. choosing the number of input variables, the number of hidden layers, and the number of neurons in each layer, relies on experimental knowledge to arrive at the structure with a satisfactory level of accuracy. In this study, the ANN is constructed with two hidden layers of 4 and 7 neurons, respectively. Fig. 5 shows the relative error σ_i for 30 days ahead for different prediction models, namely traditional ANN, ANN-PSO, ANN-GA, ANN-CSA, and ANN-SAMCSA. A comparative analysis of the traditional ANN and the evolutionary-based ANNs shows that the proposed training method gives better results in terms of lower relative error and higher prediction accuracy. Additionally, Table 2 shows the superiority of the proposed method in terms of a lower Root Mean Square Error (RMSE): the RMSE decreases from 4.1943 for the traditional ANN to 1.9759 for the proposed ANN-SAMCSA method. To demonstrate the promising results of the proposed ANN-SAMCSA method compared with the traditional ANN, Fig. 6 compares the actual load consumption values with the load values predicted by the traditional ANN and by the proposed ANN-SAMCSA.

Figure 5. Relative error (σ_i) for 30 days ahead electric load forecasting.

Figure 6. Comparison between the traditional ANN and proposed ANN-SAMCSA method.
Table 2. RMSE values for five different methods
In this article, a new method that uses fuzzy-based feature selection and the SAMCSA-ANN training method is proposed to enhance the performance and accuracy of load forecasting. To demonstrate the superiority of the proposed hybrid method over the traditional load forecasting method, the ANN was first trained with a traditional training algorithm, e.g. backpropagation, so that the ANN architecture and the weighting and biasing parameters were obtained by experimental effort. SAMCSA, a modified population-based evolutionary optimization algorithm, was then applied to decrease the relative error by tuning the weighting and biasing factors efficiently. In addition to the proposed ANN training algorithm, a fuzzy-based feature selection method, built on fuzzy set theory and fuzzy clustering, is introduced to select only the informative features in the dataset and discard noisy data, which reduces the number of features while keeping a satisfactory prediction accuracy. To evaluate the performance and improved efficiency of the proposed method, a real dataset of daily peak load for Fars province in Iran is used. The results indicate not only that the population-based evolutionary technique can serve as a preferred alternative training algorithm for ANNs, but also that the modifications made to CSA enhance its overall performance and accuracy as an optimization model.
