Abstract
Differential evolution (DE) is a versatile and fast evolutionary algorithm (EA) for real world global optimization problems. It has been widely applied to diverse areas as a remarkable search technique. However, because of large step sizes in mutation, DE is known to be incapable of exploiting the existing population than exploring the search region. DE inherent drawback is avoiding the true optima. To heal its poor exploitation, hybridization with local search techniques will be a healthy choice, because local search strategies are proven to be robust in exploitation. In this paper, we hybridize DE variant, JADE with two gradient based local search (LS) techniques, Steepest Decent Method (SDM) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. The new algorithm is known as hybidJADE. The SDM is applied iteratively to locate promising solutions in the evolution, afterwards BFGS is applied to fine tune the elite solutions. If BFGS found solutions are in the neighborhood of current best solution, a restart is incorporated, where JADE and BFGS explore the population. The performance of hybridJADE is tested on 15 benchmarks from literature. The obtained experimental results are compared with the results of two state-of-the-art algorithms, JADE, and jDE. The results show significance performance of hybridJADE in terms of success rate on various benchmarks. The sensitivity analysis of hybridJADE to its various parameters is also presented.
Introduction
Optimization means finding the best solution for an objective function (s) subject to constraints. Optimization problems can be categorized as single objective, multi-objective, and many objective unconstrained/constrained real-parameter or discrete optimization problems. The problem having just one objective to be optimized is called single objective optimization problem. When there are two to three objectives, the problem is called multi-objective optimization problem, while with more than three objectives the problem becomes many objective optimization problem [18]. In unconstrained optimization the main issue is to get to optimal value for the objective function (s), while in constrained optimization the constrains, both bound and functional, are also meant to be satisfied along with optimizing the objective function (s) [21].
Evolutionary optimization is inspired from Darwinian theory of evolution [28]. Algorithms belonging to this category are very efficient for finding global optimum of many real world problems, including problems from mathematics, engineering, economics, business, and medicines etc. In general, each EA starts with a population of candidate solutions (individuals). The quality of each solution is evaluated according to a fitness function, which represents the problem at hand. In each iteration of an EA, a selection process is applied to select a new set of solutions (population). The selection in EA is biased towards the most promising traits of the current population of solutions to increase their chances of being propagated to the new generation. At each iteration (generation), the population is evolved through a predefined set of operators, like mutation and recombination. This procedure is repeated until convergence is achieved. The best solution found by this procedure is expected to be a nearly optimum solution [14, 39].
Among different variants of EAs, Differential Evolution (DE) [39, 41] is a relatively recent algorithm. DE is efficient in solving many optimization problems [1, 46]. DE has many advantages. For example, it is easy to understand and implement, has a few parameters to control [43], and is robust. There is no doubt that DE is a remarkable optimizer but different reviews suggested that due to large step sizes in mutation, it is not very capable of exploiting the existing solutions than the exploration of search area [20]. Unlike the traditional deterministic methods, DE has the inherent drawback of skipping the true optima [20].
With higher reliability and modern technology, systems are getting complex day by day. It is very difficult to find a failure free algorithm, stand alone or hybrid, to solve above mentioned optimization problems [16]. Hybrid algorithms have shown performance improvement over stand alone algorithms. In he following, we briefly review some hybrid algorithms.
In [33], water cycle algorithm (WCA) (population based approach) is combined with gradient based LS; the new hybrid showed good performance against algorithms in comparison. In another recent experiment [30], particle swarm optimization (PSO) is hybridized with functional calculus (FC) as a LS. Some researchers combined two global search techniques, GA and PSO [17] and time varying acceleration (TVAC) with PSO [34] for constrained optimization problems. Their proposed hybrid algorithm works well for constrained optimization [29]. Likewise, harmony search (HS) is enhanced to heal its parameter sensitivity [3]. In all of these algorithms, experiments are conducted on discrete, constrained and multi-objective optimization problems.
Although there exist many DE variants [7, 47–50] for coping with complex real-world optimization problems, the performance of DE for some real-life problems can be further improved to bring better solutions. Moreover, various studies have disclosed that entrenching LS method can greatly improve the search capability of DE [22, 37]. In recent years the hybridization of DE with LS methods have gained much attraction due to their individual merits. Many hybrid algorithms have shown significant performance improvement. Here we review some of the methods in this category, which some how are related to our work as well.
In [38], the Levy Flight inspired LS mechanism is hybridized with DE known as Levy Flight DE (LFDE). LFDE exploits the search region brought by the best solution. A recent experiment [11] incorporated orthogonal LS in DE, denoted by (OLSDE). In every iteration of OLSDE, a DE process is used at first, and then to enhance the quality of some solutions an OLS based local search information is utilized. In an other improved DE algorithm, LSDE [19], an LS operator is merged in DE. In [27], an LS mechanism is employed in DE to enhance its poor LS ability. A new DE algorithm with localization around the best point (DELB) is proposed in [23]. In DELB, the initial steps are the same as DE except that the mutation scale factor F is chosen randomly for each mutant vector. DELB also modifies the selection step by introducing reflection and contraction. The trial vector is compared with the current best and the parent vector. If the parent is worse than the trial vector, it is replaced by a new concentrated or reflected vector. Thus in DELB, the trial vector can be replaced by its parent vector, reflected vector or contracted vector, while in classic DE only the trial vector replaces the parent. In NSDE [2], DE is hybridized with nonlinear simplex method. Here, a nonlinear simplex method with uniform random numbers is applied to initialize DE population. Initially, N(p) individuals are generated uniformly and then next N(p) individuals are generated from these N(p) points by application of Nelder-Mead Simplex (NMS) [15, 25]. Then from 2N(p) population, the fittest N(p) individuals are selected as DE’s initial population and the rest of DE is unaltered in their algorithm. Thus, NSDE only modifies DE in the population step. It has shown good performance in reducing function evaluations and the CPU time. In an other experiment, Janez et al. [8] hybridized DE with Sequential Quadratic Programming (SQP), an efficient but expensive gradient-based LS method. Their hybrid applies the DE algorithm until function evaluations reach 30% of the maximum function evaluations. It then applies SQP for the first time to the best point thus obtained. Afterwards, SQP is applied after each 100 generations to the best solution of the current search. In their hybrid the population size keeps reducing and the process ends with a minimum population size.
Motivated by concerns about the complexity of systems and real-life problems, the rich literature of hybrid algorithms and it performance improvement over other stand alone algorithms (particularly of DE), and to avoid the inherent drawback of trapping in local optima of DE and thus improve it performance, this study proposes a hybrid DE algorithm, called hybridJADE, which combines two different gradient based LS strategies, one expensive and one inexpensive, with an improved variant of DE, JADE [49]. In hybridJADE, we incorporate gradient based information to the elite individuals to promote the balance between exploration and exploitation [9]. Also, we design a simple restart mechanism to enhance the robustness. Furthermore, in this work, we focus on single objective unconstrained optimization.
The rest of the paper is organized as follows. Section 2 describes the basic DE, JADE, SDM and BFGS algorithms. Section 3 details the proposed algorithm. Section 4 gives the experimental results. Finally, Section 5 concludes this paper and discusses future research direction.
Basic DE, JADE, SDM and BFGS
Our new hybrid algorithm will rely on basic DE variant, JADE [49], BFGS [15] and SDM [15]. Thus this section presents the basic operators of DE, BFGS and SDM, whereas the detailed description of JADE is already given in our previous work [26].
The iterative process of a standard DE algorithm [39, 41] relies on four basic steps: initialization, mutation, crossover, and selection. The iterations terminate when a termination criteria is met. It starts with a random initial population of N(p) n-dimensional real-valued vectors in the search space, which is bounded by

Differential evolution.
JADE [49] is an efficient version of DE. JADE updates the control parameters F and Cr of the probability distributions adaptively. The values of F and Cr that yield successful offspring in the recent generations are considered in the updating scheme of these parameters. It also employs a new mutation strategy DE/current/to-pbest and keeps a record of the inferior solutions of the selection scheme in an external archive (For more details readers are referred to [26, 49]).
The Quasi Newton search/BFGS algorithm is basically gradient-dependent approach, which employs the Hessian matrix in locating a suitable search direction. BFGS is considered as a very good LS method due to its efficiency. The Pseudo-code of BFGS is given in our previous work [24]. For more details on BFGS see [44].
The SDM is also a gradient based method. The gradient of a function at a point indicates the rapid increase in the function at that point. So the negative gradient will be the direction of most rapid decrease in the function at that point [44]. The method which searches the minimum along the negative gradient is called steepest descent method [10]. The details of SDM are presented in Algorithm 1.
Pseudo-code of steepest descent method (SDM) [44]
In this section, a new hybrid algorithm based on hybridization of JADE and two LS techniques with a restart strategy, namely hybridJADE is presented. Before presenting the algorithm, the main idea on which it is based is given. The key idea of hybridJADE is to examine whether the performance of a hybrid algorithm can be improved by applying two LSs, one cheap and one costly, instead of only implementing a costly LS method. hybridJADE not only incorporates the two LSs together, but it also applies a restart if it stucks in a local optimum. The restart in hybridJADE is a local restart strategy instead of an entire restart. By local restart we mean that the new solutions will be generated in the neighborhood of the local optimum. The newly sampled solutions increase the diversity of the algorithm. Thus we hope that it will increase the performance of hybridJADE. Moreover, we are interested in examining the performance of hybridJADE when it relies on only expensive LS after restart instead of both LSs.
hybridJADE
The detailed description of hybridJADE is given in Algorithm 2, and its flowchart is presented in Fig. 2.
Flowchart of hybridJADE. Algorithmic scheme for hybridJADE Algorithmic framework of initialization of the search range for restart
In the following, we discuss the implementation of hybridJADE.
Since the issue is to design a model which has capabilities of both global and local search and, as a result a balance between exploration and exploitation is achieved, implementing JADE (as a global search method) at the beginning of the algorithm (
Algorithmic framework of restart strategy
Algorithmic framework of restart strategy
The population which is brought by JADE is already concentrated to some extent. The implementation of an inexpensive LS to the top, say q individuals would hopefully decrease their function values, which would be further refined later. An appropriate number for parameter q, say q = 3 is very important, because if it is chosen one in population of size 100, it will have no effect on the population. On the other hand if it is too large, it will waste function evaluations. SDM is less expensive as compared to BFGS. Because it approximates the gradient and incorporates some line search technique to compute the step lenght α i as given in Algorithm 1, for searching a new direction. SDM does not require Hessian approximation. That is why its computational complexity is less than that of BFGS. Thus it can be implemented in global search at regular intervals and for more than one iterations. That is why, SDM iteration counter, γ is set to 2 in our experiments.
Refinement
BFGS is applied to the best solution among the q solutions found by the cheap LS of
In the restart strategy, the initial population is already concentrated, so BFGS is applied ς, set to 1 here, iterations to the current best solution of the search. Moreover, the LS call of BFGS is implemented after each η generations after restart with a hope that it will find the desired optimum. The parameter η is set to 30 generations such that BFGS can do the LS after a significant gap of global search by JADE during restart. Thus, it is expected that in this way a balance between exploration and exploitation could be established during the restart.
Updating the population
Adding promising solutions to the population of hybridJADE and removing the worst points from it can improve the quality of offspring in the next generations. Contrary to good parents which can produce good offspring, worse parents have the chance of producing worse solutions. So their removal can have a good effect on the entire population.
Stopping condition
The stopping condition is borrowed from [36] for hybridJADE. hybridJADE stops when one or both of the following conditions are met.
The maximum number of function evaluations is reached. |f (
Deciding the neighborhood size
When BFGS in
The search range, illustrated in Fig. 3, varies with the problem dimension n and the values of L and U. To avoid manual adjustment of θ, we made it dimension dependent so that it can choose appropriate values with varying dimension. For example, in case of test instance F1, the values of L and U are -100 and 100, respectively. Thus the value of θ with n = 30 is 10. Similarly, in case of test instance F8, θ = 3.20 with n = 30 and L = -32 and U = 32. Moreover, if θ < 0.50, then θ = 0.50 is taken as the default value. An example of the latter case is test instance F11, where θ = 0.05 with L = -0.5, U = 0.5 and n = 30. In this case, we take θ = 0.50, because a very small region (θ < 0.50) would not be an ideal space for search of hybridJADE.

2D Concept of neighborhood size.
The advantage of θ defined in Equation (theta) is that it adjusts itself according to the dimension and search bounds. Hence, it helps the algorithm to sample a new population at an appropriate distance from the local optimum. We hope it will discourage the pre-mature convergence of the algorithm.
Next to answer the question that why the initialization of the search range is necessary for the new population, we comment that if we initialize again the population in its previous search range within the same function evaluations, it would be unwise as the majority of the function evaluations are already done. Doing the same search again will waste function evaluations and the algorithm would not be able to reach the desired optimum. On the other hand, if we generate the new population just around the best solution, then there is a big chance of getting into the same local optimum again. Thus, it is necessary to generate a new population at a certain distance and within bounds.
If the BFGS found a local optimum, N(p) - 1 new solutions are generated in the predefined search range
Comparison studies
Two sets of experiments are carried out. The first set of experiments was carried out to compare the hybridJADE with DE variant, JADE. In the second set of experiments, a set of comparison studies was made to state-of-the-art algorithm jDE. All these experiments were carried out using MATLAB software. A t-test was also applied for validation of these results. We compare hybridJADE with JADE and jDE for 30 dimensional problems.
Test instances for both experiments
To study the performance of hybridJADE, we use CEC2005 competition problems for real parameter optimization. More details about these instances can be found in [35]. The instances of CEC2005 can be divided into:
Unimodal functions (F1 - F5). Multimodal functions (F6 - F14). Hybrid composition test instance (F15)
The 15
th
test instance, F15 is designed by combining ten different benchmark functions, i.e., two Rastrigin’s functions, two Weirstrass’s functions, two Griewank’s functions, two Ackley’s functions and two Sphere functions. Its value to reach is 120.
Parameter settings
Basic parameter settings are the same as given in [35], i.e., N(p) = 100, n = 30, maxFES = n × 10000, error = 10-2 for F7 and 10-8 for all other functions. REST, initially set to zero, is set to 1, when the restart strategy is called. q = 3, the number of points which undergo SDM. γ = 2, the number of iterations of SDM and BFGS before the restart process. ς = 1, the number of iterations after the restart. Once the restart strategy is called, BFGS is then applied after each η (=30) generations.
Evaluation metrics
Thirty independent runs were conducted for JADE, jDE and hybridJADE. The mean and standard deviation of the function error |f (
We also record the Success Rate (SR) for each test instance. A run is considered as successful if it achieves the desired accuracy within the maximum allowed function evaluations. The SR for a particular function is calculated as:
The average and standard deviation of function error values obtained in each run are used to compare the performance of both algorithms. They are shown in Table 1. The function evaluations are recorded when an algorithm has achieved the desired accuracy level during a run, within the maximum function evaluations.
Unimodal test instances (F1 - F5)
“Mean Error”, “Std Dev” and “SR” of the function error values of JADE and hybridJADE over 30 independent runs on 15 test instances of 30 variables with 3 × 105 FES
“Mean Error”, “Std Dev” and “SR” of the function error values of JADE and hybridJADE over 30 independent runs on 15 test instances of 30 variables with 3 × 105 FES
“-”, “+” and “≈” denote that the performance of JADE is worse than, better than or similar to that of hybridJADE, respectively.
The SRs of both algorithms are given in Table 1. This table shows that the SRs of JADE are better than those of hybridJADE on two unimodal test instances F4 and F5, while on three unimodal functions, F1-F3 the SRs of both algorithms are equal. The lower SRs of hybridJADE than JADE on test functions F4 and F5 could be due to our two techniques, firstly, the restart utilized extra N(p) - 1 function evaluations, and also we use the efficient LS which is fast but utilize more function evaluations. Hence, sometimes hybridJADE might finish the maximum number of function evaluations before reaching the desired local optimum. Another possible cause of failure could be the parameter η which is the interval between the LS calls after restart, chosen 30 here. Since LS after restart is BFGS and calling it after each 30 generations may not be an appropriate gape between the LS calls. The convergence graphs of function error values against FES are also plotted in Fig. 4.

Convergence graphs of JADE and hybridJADE for six representative test functions at n=30, with population size=100. (a) F1 (b) F2 (c) F7 (d) F8 (e) F9 (f) F10.
Table 1 shows that hybridJADE is better, in terms of function average error values, than JADE on basic two multimodal functions, F6 and F7. On three test instances, F8, F10 and F12, both algorithms are equivalent, while on four test instance, F9, F11, F13 and F14 JADE is superior in getting better value of mean than that of hybridJADE. Though hybridJADE performance is worse than JADE on four instances in multimodal test instances, it has higher success rate, SR, than JADE on two test instances F6 and F12; 80% SR against 63.3% SR of JADE. This good value may be due to the combination of LS methods which increases the exploitation ability of the hybridJADE and helps it to reach a better solution.
On expanded multimodal functions, F13 and F14, JADE is better than hybridJADE in terms of function error values, although both have the same SR, 0.
Hybrid composition test instance (F15)
On hybrid composition function F15, JADE and hybridJADE had the same performance in terms of function error values. But JADE got higher SR than hybridJADE, as shown in Table 1.
Hence from the above results it can be concluded that JADE showed good performance on eight test instances out of 15.
Comparison with jDE
We further compare the hybridJADE with jDE [6]. jDE introduced a self adaptive mechanism for control parameters F and CR. The experimental results for both algorithms hybridJADE and jDE were obtained in 30 independent runs. The function error values and SR are presented in Table 2 and the convergence graphs of both algorithms are shown in Fig. 5. The detailed comparison of both algorithms is presented below.
Convergence graphs of jDE and hybridJADE for eight representative test functions at n = 30, with population size=100.
“Mean Error”, “Std Dev” and “SR” of the function error values of jDE and hybridJADE over 30 independent runs on 15 test instances of 30 variables with 3 × 105 FES
“Mean Error”, “Std Dev” and “SR” of the function error values of jDE and hybridJADE over 30 independent runs on 15 test instances of 30 variables with 3 × 105 FES
“-”, “+” and “≈” denote that the performance of jDE is worse than, better than or similar to that of hybridJADE, respectively.
Overall, in case of unimodal test instances, both jDE and hybridJADE performed equally well.
The performance of hybridJADE in this category of multimodal test instances can be observed in Table 2. hybridJADE performed well for three out of nine multimodal test instances, F7, F10 and F12 in getting a better local optimal solution. This success can also be observed from graphs presented in Fig. 5. For these instances SR is 0 for both algorithms except F12 where hybridJADE succeeded in achieving the desired accuracy with SR of 3· 3 %.
These test instances have many local optima, so it is usually difficult for any algorithm to get the desired accuracy in each run. For three test instances, F6, F9 and F13, jDE performed good in attaining a better average value than hybridJADE as can be seen in Table 2. Though jDE performed well on F6 in function error value, hybridJADE was good on F6 in finding a higher SR. Overall hybridJADE’s performance is better than that of jDE for multimodal test instances in SRs as it reached the desired level in few runs for three test instances, while jDE got it for only one test instance, F9.
Hybrid composition test instance (F15)
For this hard composition test instance jDE’s performance is similar to that of hybridJADE in function error values. None of the algorithm could have a positive SR for this tricky test instance.
Overall hybridJADE dominates jDE in terms of function error values, on five test instances, F2, F3, F7, F10 and F12 and jDE on the other hand also surpasses hybridJADE in exactly five test instances, F4, F5, F6, F9 and F13. However the performance of hybridJADE is better in terms of SR, as evident by Table 2 for six (F1, F2, F5, F6, F9 and F12) out of fifteen test instances. It has reached the required accuracy for some of the runs, but jDE could do it only for two test instances, F1 and F9. The cause for the good performance in SRs is that hybridJADE utilizes two local search methods, which can be of great help in exploring the small regions. jDE on the other hand is a global search method, hence it may suffer poor exploitation. The inferior performance of hybridJADE to jDE on some test instances may be due to gradient based LS which some times get trapped in local optimum.
Sensitivity study of hybridJADE to its parameters
To investigate the behaviour of proposed algorithm to various parameters we conducted experiments on F8 as representative test instance.
Sensitivity of hybridJADE to parameter ξ
To illustrate the sensitivity of the performance of hybridJADE to the interval between the LS calls, ξ, we have tested 6 values of ξ (i.e., ξ = 5, 10, 15, 20, 25, 30) on F8. Each value of ξ is executed 30 times. All other parameters are kept the same.
Figure 6 (a) displays the objective function average error values versus values of ξ. The graph shows high sensitivity of hybridJADE to the increasing different values of ξ. Thus, small value of ξ is suggested.
The objective function average error values vs. parameters ξ, q, η, and γ obtained with hybridJADE on F8 (CEC2005) representative test function.
To analyze the behaviour of hybridJADE to parameter q (the number of solutions undergoing LS), hybridJADE is tested with 10 values of q (i.e., q = 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) on F8. Each value of q is tested in 30 independent runs. All other parameters remain the same. Figure 6 (b) gives the objective function average error values versus values of q. The graph shows sensitive behaviour of hybridJADE to various values of q. A higher value of this parameter can produce better results.
Sensitivity of hybridJADE to parameter η
To illustrate the sensitivity of the performance of hybridJADE to the interval between the LS calls, η, we have experimented with 6 values of η (i.e.,η = 10, 20, 30, 40, 50, 60) on F8. Each value of η is tested with 30 independent runs. All other parameters were kept the same. Figure 6 (c) plots the objective function average error values versus values of η. The graph shows sensitivity of hybridJADE to different values of η. It also confirms our appropriate choice (η = 30) of this parameter.
Influence of parameter γ on hybridJADE
In order to study the sensitivity of hybridJADE to parameter γ, we have tried 5 values of γ (i.e., γ = 1, 2, 3, 4, 5) on F8. Each value of γ is tried 30 independent times. All other parameters remain the same. Figure 6 (d) displays the objective function average error values versus different values of γ. The graph in this figure indicate that hybridJADE is sensitive to changes in values of γ; a high value of γ can produce better results.
In general, hybridJADE is sensitive to all its parameters, as shown by the graphs in Fig. 6. Thus, appropriate choice of values for these parameters is necessary for the better performance of hybridJADE.
Conclusion
In this paper, a hybrid algorithm, hybridJADE of a global search optimizer, JADE with two local search optimizers, SDM and BFGS is proposed to complement the exploration strength of JADE with the exploitation strengths of SDM and BFGS. Whenever the algorithm gets trapped in a local optima, a restart strategy is called and BFGS is applied. In fact, we want to spend much efforts on good points that are identified during the evolution. That is why, an efficient LS, BFGS is called after the restart strategy. From experimental results, the following points can be concluded.
HybridJADE obtained better statistics than JADE on two CEC2005 test instances, F6 and F7, it also performed equally good on six test instances. However, it showed inferior performance on eight test instances against JADE. The failure could be due to its restart strategy, which utilizes an expensive LS, or may be the neighborhood size is not suited to each test instance.
The comparison of hybridJADE with state-of-the-art jDE on CEC2005 test instances indicated that hybridJADE had found better function error values for five test instances, F2, F3, F7, F10 and F12. However, it loses performance on the other five instances, F4, F5, F6, F9 and F13 against jDE. For the rest both had the same performance.
Sensitivity analysis to newly introduced parameters is provided, which identified a bit receptive behaviour of hybridJADE to these parameters.
In the future, with intend to embed some other LSs, e.g., conjugate gradient and Davidon-Fletcher-Powel methods in PSO, Firefly algorithm (FA) and some other promising EAs. The existing algorithm can be further improved and extended to solve latest CEC test function suits and constrained multiplicative real life engineering optimization problems.
Footnotes
Acknowledgment
The authors would like to thank King Khalid University of Saudi Arabia for supporting this research under the grant number R.G.P.2/7/38.
