Abstract
This paper proposes a new metaheuristic global optimization algorithm inspired by Wildebeest herding behavior called Wildebeest Herd Optimization (WHO) algorithm. WHO algorithm mimics the way nomadic Wildebeest herds search vast areas of grasslands efficiently for regions of high food density. The WHO algorithm models five principal Wildebeest behaviors: firstly Wildebeests have limited eyesight and can only search for food locally, secondly Wildebeests stick to the herd to escape predators, thirdly Wildebeest herd as a whole migrates to regions of high food availability based on historical knowledge of annual grass growth rates and rainfall patterns, fourthly Wildebeests move out of crowded overgrazed regions and finally Wildebeests move to avoid starvation. The WHO algorithm is compared to Physics inspired, Swarm based, Biologically inspired and Evolution inspired global optimization algorithms on an extended test suite of benchmark optimization problems including rotated, shifted, noisy and high dimensional problems. Extensive simulation results indicate that the WHO algorithm proposed in this paper significantly outperforms state-of-the-art popular metaheuristic optimization algorithms like Particle Swarm Optimization Algorithm (PSO), Genetic Algorithm (GA), Gravitational Search Algorithm (GSA), Artificial Bee Colony Algorithm (ABC) and Simulated Annealing (SA) on shifted, high dimensional and large search range problems.
Keywords
Introduction
The need to find the global minimum of high-dimensional non-convex function arises in many real world applications involving engineering design, machine learning, optimal filter design, VLSI circuit optimization, bioinformatics and power system optimization [1]. In a majority of applications the cost function being minimized is highly multimodal and high dimensional involving a large number of independent variables. The exact solution to global optimization problems cannot be computed in polynomial time and involves an exhaustive search of all possible solutions. Thus the problem of finding the exact solution to global optimization problems is computationally intractable despite their ubiquitous appearance in applications. Thus heuristic based optimization algorithms that compute an approximate or acceptable solution to global optimization problems are of interest [16]. These heuristic algorithms do not always find the global minimum but usually succeed in locating a deep local minimum. In recent years heuristic global optimization algorithms inspired by Physics, Biology, Social, Swarm and Herd behavior have been proposed and applied successfully to a wide variety of economically important and challenging problems.
Heuristic global optimization algorithms attempt to perform a biased random search of the solution space with the search biased towards promising areas as identified by the heuristic used. The stochastic nature of heuristic based algorithms allow escape from local minima and enables exploration of large high dimensional spaces. Deterministic algorithms such as Nelder Mead Simplex [14], Gradient Descent and Quasi Newton are proven to get stuck in local minimum [16] and underperform on high dimensional problems [16]. In most deterministic algorithms, a sequence of points with decreasing function values is computed iteratively and hence these algorithms cannot escape from local minima. On the other hand stochastic algorithms such as Genetic Algorithm [3], Particle Swarm Optimization algorithm [19], Ant Colony Optimization algorithm [15] and Artificial Bee colony algorithm [4] perform a biased random search and hence have the ability to escape local minimum [12]. However even stochastic search algorithms are not guaranteed to escape from all local minima and exploration of new local minima must always be performed at the cost of exploitation of already known good solutions. If too much computational effort is expended in exploration of numerous local minima then the task of accurately determining the location of the local minima will be compromised [16]. Thus a fundamental and unavoidable trade-off exists between exploration of new solutions and exploitation of already known good solutions. In most practical applications, accuracy of more than four decimal digits is seldom useful, the proposed WHO algorithm was tuned to limit exploitation of local minima to this accuracy to avoid wasting computational resources meant for exploration of new solutions.
A review of extant literature confirms that metaheuristics are mostly inspired by principles of physics [1], natural evolution [2] and collective animal behaviour [5]. A majority of metaheuristic optimization algorithms are derivative-free and hence can be applied to discontinuous and non-differentiable cost functions that frequently occur in practical applications [21]. Metaheuristics can be broadly classified into single agent based and multi-agent based depending upon the number of candidate solutions that are considered at a time [12]. In the case of a single-agent metaheuristic, a single solution is continuously improved until the solution stagnates. Single-agent metaheuristics are cheaper in terms of memory and computational resources as they do not keep track of multiple intermediate locations with high and low cost values. Since single agent metaheuristics do not consider multiple solutions simultaneously these approaches suffer from slow or premature convergence to local minima. Simulated annealing and Hill climbing are examples of single-agent metaheuristics which consider only one tentative solution at a time. Multi-agent based metaheuristics (also referred to as population based metaheuristics) on the other hand consider multiple solutions at the same time. Exchange of information between the search agents (members of the population) improves exploration and prevents premature convergence to the nearest local minimum without considering better solutions in distant regions on the search space.
Nature inspired multiagent metaheuristics can be broadly classified as evolutionary, swarm and herd behaviour based approaches. Evolutionary approaches are inspired by natural evolution processes which result in ever more complex and better adapted life forms [11]. Swarm and herd behavior based approaches are attempts to mimic collective intelligence, decentralized problem solving and emergent behaviour exhibited by large collections of animals. Large collections (insect swarms and animal herds) of interacting animals can exhibit surprisingly complex behaviour although composed of simple individuals with limited intelligence. In such collections, individuals act locally to secure their own selfish goals and often mindlessly copy the behaviour of their neighbors without any planning or global awareness [5]. However such collections due to the complexity of interactions between individuals often exhibit surprisingly complex problem solving abilities without any centralized coordination or planning. The terms swarm is usually applied to collections of simple organisms while the term herd is applied to collections of more complex animals which exhibit ‘neighbor copy’ behaviour [19]. Although numerous optimization metaheuristics inspired by swarm behaviour have been proposed only a small number of herd behaviour based metaheuristics have been proposed [8]. One such herd behavior based algorithm is krill herd optimization algorithm [9]. Models of herd behavior have demonstrated their usefulness in numerous domains from neuroscience to economics. Thus exploration of new herd behaviour based optimization metaheuristics with novel features is of interest. In this paper a new herd behaviour based metaheuristic based on Wildebeest herd foraging behaviour is proposed and shown to be competitive with swarm and evolutionary approaches.
Regardless of their origin, the performance of all global optimization algorithms is primarily determined by the trade-off made by the particular algorithm between exploration of new solutions and exploitation of known regions of the search space. Many popular algorithms suffer from poor exploration and exhibit premature convergence to local minima. Still other algorithms have an inbuilt predetermined bias for the origin (x = (0, 0, . . . , 0)) and converge to a location near the origin even when the true global minimum is not located at the origin. Thus when testing the performance of different global optimization algorithm, care must be exercised to use test functions with global minima significantly shifted fromx = (0, 0, . . . , 0). In many test suites only small random shifts in a small interval like [– 2, 2] are considered. Since the average of these random shifts is zero in each coordinate, these small random shifts do not really differentiate between an algorithm that blindly converges to zero and an algorithm that actually finds the global minimum. The WHO algorithm proposed in this paper is shown to perform significantly better that state-of-the-art algorithms on high dimensional test functions with large shifts.
The remainder of this paper is organized as follows: A review of the literature on global optimization metaheuristics is presented in section 2, the Wildebeest herd behaviour and relationship to global optimization is discussed in section 3, a mathematical model of Wildebeest herd behaviour and a new global optimization behaviour based on Wildebeest behaviour is presented in 4 and finally the performance of the proposed WHO algorithm is compared to a wide variety of heuristic global optimization algorithms on challenging high dimensional, rotated and shifted benchmark problems.
Literature review
Metaheuristic algorithms offer several advantages over existing classical gradient based optimization approaches that converge to the nearest local minimum. If the initial position lies in the neighborhood of a local minimum then gradient based approaches cannot escape from that neighborhood. Some approaches also require the calculation of matrices such as Jacobians and Hessians which are computationally very expensive for high dimensions [21]. Metaheuristic approaches on the other hand are stochastic and perform a random search. This inherent nature of the metaheuristic algorithms help escape from local minimum and assist in exploring the multidimensional solution spaces. Popular metaheuristics algorithms developed are swarm inspired, evolution inspired or physics inspired.
Algorithms such as PSO, ACO, ABC, GSO and GWO models swarm behavior. The PSO algorithm copies the behavior of migratory flock of birds and school of fishes [19, 26]. The movement of the particles in the search space is controlled by the best position found by each particle and the best position found by the entire the swarm. These positions get updated every time better positions are identified thereby guiding the swarm to better solutions. Updating each particle’s best solution contributes to the local search whereas the initial random distribution of the particles contributes to the exploration of the search space. PSO has very simple update formulae and does not require the cost problem to be differentiable. Since the PSO algorithm moves every particle towards a single global best solution, it can potentially converge to a local minimum [16].
A recently proposed biologically inspired optimizer is based on the hunting behaviour of grey wolves [20]. Grey wolf optimizer divides its population among four types of wolves namely alpha, beta, delta and omega. The best solution is termed as the alpha which tops the hierarchy. The second and third best become the beta and delta respectively. Searching for prey by the omegas are guided by the positions of the alpha, beta and gamma. They then converge towards the location of the alpha to carry out the attack. Diversification is provided during searching the prey and convergence happens during attack. Since the search of the prey is heavily influenced by the position of the alpha, the algorithm might converge to locally optimal solution.
Another popular group of metaheuristics draws inspiration from the Darwin’s theory of evolution. The most widely used among them is the Genetic Algorithm (GA) [12]. An initial population of candidate solutions is randomly generated. Candidate solutions are improved by applying operators such as selection, crossover and mutation which model evolutionary processes in nature. Since fit individuals have a higher chance of being selected for the next generation the solutions get better with each generation. The selected individuals are then recombined and mutated to generate the next population. The crossover operation allows the resultant offspring to inherit the favorable traits of both the parents. Mutation and crossover operations ensure that the offsprings are not identical to their parents resulting in exploration of unkown regions of the search space. Many variants of GAs have been proposed and applied to a variety of engineering problems. Some of them are Nelder Mead GA [2], Gradient Descent GA and Real valued GA [3]. A feature of GA is the loss of valuable information stored by less fit individuals during the selection process. This loss of information reduces the performance of GAs on highly multimodal cost functions. Other evolution based metaheuristics include Differential evolution, Evolutionary algorithm and Evolutionary Programming [2].
The third main category of metaheuristics draws inspiration from the laws of physics. These include Gravitational Search algorithm [7], Electromagnetic field optimization [10], Electromagnetism metaheuristic [22] and Ray optimization [1]. Gravitational search algorithm simulates the interaction between collection of masses in the universe based on Newton’s laws of gravity and motion. The gravitational force which acts upon the masses is directly proportional to the product of their masses and inversely proportional to the square of the distance between them. Good solutions are assumed to have higher mass and therefore attract other masses towards them. This helps in convergence to the global minimum. Masses also have velocity and acceleration governed by the Newton’s laws of motion [7]. Velocity of a particle depends upon the previous velocity and the acceleration at the moment.
Since comparison to all known metaheuristic optimization algorithms is not possible, representatives from each category are taken for comparison in this paper.
Wildebeest herd behaviour
The biological inspiration behind WHO algorithm is discussed in this section. The mathematical formulation is provided in Section 4. The movement of individual wildebeests and entire wildebeest herds in search of regions with high grass density is very similar to the operation of population based stochastic global search algorithms. The search by wildebeests to identify regions of high grass density is analogous to the search for points of minimum cost by global optimization algorithms. Although the exact behaviour of wildebeests in a herd is quite complicated and difficult to model, the following simplified wildebeest behaviours are proposed as the basis of a new global optimization algorithm:
Mathematical formulation of the WHO algorithm
A simplified mathematical model of wildebeest behaviour is presented in this section as the basis of a new heuristic global optimization algorithm. The complete WHO algorithm is presented in Table 1. The distribution of a population of wildebeest in agrassland is modelled as random points within a search space (Step 1 in Table 1).
Nomenclature
Nomenclature
Wildebeest herd optimization algorithm
First the ‘local movement’ or Milling Behaviour can be modelled as taking a fixed number
Next a fixed number
The ‘herd instinct’ is modelled by movement of each wildebeest towards other wildebeests. This can be achieved by choosing a random individual in the herd and moving towards that individual with a small probability if that random individual is in a more fertile region. The herd instinct driving a wildebeest to move towards other wildebeest is only actualized if the other wildebeest is located in a region with more food. This movement is implemented by linearly combining the position of each wildebeest x
p
with that of a randomly chosen fitter wildebeest x
h
as in (3). Figure 2 illustrates the combination of x
p
and x
h
.

A herd of wildebeest grazing the grasslands [13].

Illustration of the combination of two wildebeest’s position.
Movement of wildebeest away from regions without grass to avoid starvation is modelled by movement away from regions with least grass. This behaviour is analogous to movement away from the worst solution and can be modelled as taking a random step away from the worst solution (4)
In Equation (4),
The population pressure in regions of highest grass density lead to territorial conflicts between wildebeest and movement of losing individuals away from regions of highest grass density. The strongest wildebeest dominate the regions of high grass density and drive away competitors. This is modelled by movement of wildebeest near but not too close to the best solution away from it in accordance with (5).
The social memory of the entire herd of locations of regions of high grass density and growth rates leads the herd to explore previously known fertile regions. However the social memory is not perfect resulting in different individuals in the herd remembering a slightly different location. This social memory of best locations and imperfect recall of the best location can be modelled as random search around the best location. This local search around the best solution is implemented by taking random exploratory steps around the best solution to refine it as in (6).
The WHO algorithm proposed in this paper was compared to representative algorithms from swarm inspired, evolution inspired and physics inspired metaheuristics. The benchmark functions taken for comparative analysis is presented in Table 3. To challenge the explorative ability of the WHO algorithm, the search range was set as large as possible and multimodal high-dimensional test functions were used. Testing on high dimensional search spaces is done to check if the performance degrades gracefully with increasing dimension [12]. The choice of the test suite of benchmark functions was guided by [17, 18] for large scale optimization problems. A brief description of the challenged posed by different benchmark functions is first provided before discussion of results. The Sphere function is a unimodal convex function. It is also differentiable and separable. The Sphere function tests the ability of an algorithm to quickly and accurately explore a single global minimum. Simple local search algorithms algorithms like Gradient Descent and Nelder-Mead Simplex method perform well on the Sphere function. Stochastic search algorithms are designed for test functions with many local minima and hence perform poorly on unimodal functions because computational effort is wasted by exhaustive random search of the solution space. If a function is known to be convex very efficient convex optimization techniques exist and so global optimization methods should not be used. The Sphere function tests the local search or exploitative ability of a global optimization algorithm. Zakharov is also a unimodal function but with a flat region near the minimum. Flat regions are quite challenging to optimization algorithms since local movement around any point in flat regions do not lead to a reduction in the function value. Since all search directions are equally feasible inside flat regions, a random search must be performed to explore flat regions. The Rastrigin, Ackley and Griewangk functions are highly multi modal functions which test the ability of the algorithm to escape local minima and locate the true global minimum. The minimum of the Noisy Sphere function, is offset by a Gaussian random number vector. This tests the robustness of the proposed algorithm to incorrect or noisy function values. Rosenbrock function has a flat narrow ridge that makes finding good search directions extremely challenging. Rotated version of the Ackley and Griewank test functions were obtained by applying Salomon’s random coordinate rotation technique. The global minimum of Shifted-Sphere, Shifted-Rosenbrock and Shifted-Griewangk is randomly located in the hypercube [– 10, 10]D. Shifted and rotated test functions are used to weed out algorithms that have an inherent systemic bias. Algorithms which are biased to a predefined point like [0, 0, ... 0] will perform badly on shifted test functions. Algorithms that have a systemic bias in the search direction will perform poorly on rotated test functions. Thus rotated and shifted test functions provide a measure of the true performance of an optimization algorithm because they provide an unbiased performance metric.
Benchmark Test functions used for comparison
Benchmark Test functions used for comparison
The parameters of the WHO algorithm were set to perform uniformly well on all test functions chosen. A population size of 20 was found to be adequate and provide good performance for all dimensions up to 100. The population size and the number of iterations is usually progressively increased with increasing dimension [6]. However WHO algorithm demonstrated good performance on the entire test suite with the population size fixed at 20. Parameters α1, β1control the local movement of the wildebeest towards region of higher grass density and were empirically observed to provide good local search ability when set to 0.9 and 0.3 respectively. Parameters α2, β2 control the global movement of the wildebeest and were observed to provide good exploration ability when set to 0.2 and 0.8 respectively. These parameter choices were empirically observed to provide a good tradeoff between exploration and exploitation. Improper parameter settings can result in premature convergence to local minima or divergence to infinity.
Figure 2 illustrates the linear combination of two wildebeest’s positions x p and x h . The choice of α determines the movement of the wildebeest. If α is 0.5 then it is midway between the two members. Since good solutions need to be explored near x h the value was set to be 0.9. If the value of α is more than 1, it will overshoot the position of the wildebeest.
The performance of the proposed WHO algorithm was compared with GA, PSO, ABC, SA and GSA. The algorithms were tested on unimodal, multimodal, rotated and shifted benchmark functions for 30, 50 and 100 dimensions. The average performance over 50 independent trials was considered to reduce the effects of lucky or unlucky random initial positions. The mean, median and standard deviation for the 50 independent runs for each test function is presented in Tables 4 6.
Test results for 30 dimensions
Test results for 50 dimensions
Test results for 100 dimensions
The mean and standard deviation are significantly influenced by outliers therefore the median is also provided for comparison. The test results presented in Tables 4 to 6 shows that the proposed WHO algorithm is competitive with other popular algorithms. Table 4 compares the WHO algorithm with other algorithms for 30 dimensions. WHO algorithm returns better results than the other algorithms for all the functions. WHO algorithm returns zero within machine precision for the step function. GSA provides the second best solution for Schwefel 2.21 and the shifted versions of the Sphere, Rosenbrock and Griewangk. GA SA,ABC and PSO underperform in almost all the test functions
Table 5 presents the results for 50 dimensions. It is observed that the proposed WHO algorithmperforms significantly better for 50 dimensions. GSA algorithm which performed competitively well with the WHO algorithm in 30 dimensional search space, underperforms in 50 dimensions. As in the case of 30 dimensions, GA, SA, ABC and PSO underperform in almost all the test functions.
The results for 100 dimensions is presented in Table 6. It is observed that the WHO algorithm returns the best result for all the test functions. The significantly better performance of the WHO algorithms for 100 dimensions clearly demonstrates its superior explorative ability. The WHO algorithm is also observed to display graceful degradation of performance with increasing dimension in contrast to other algorithms. GSA, ABC, SA, GA and PSO return considerably poor results for 100 dimensions clearly indicating the steep degradation in the performance with increase in the dimension.
Convergence plots are shown in Figs. 3 10 for the 100 dimension test cases. Figures 3 10 illustrate the convergence characteristics of the algorithms for Griewangk, Schwefel2.21, Rotated Ackley, Noisy sphere, Step, Ackley, Rastrigin and Sphere respectively. Figures 3 10 indicate that the WHO algorithm approaches close to the global optimum in less than 200 iterations for a majority of test functions.

Convergence characteristics of Griewangk function.

Convergence characteristics of Schwefel 2.21 function.

Convergence characteristics of Rotated Ackley function.

Convergence characteristics of Noisy Sphere function.

Convergence characteristics of Step function.

Convergence characteristics of Ackley function.

Convergence characteristics of Rastrigin function.

Convergence characteristics of Sphere function.
Many optimization algorithms perform poorly when the global minimum is significantly shifted from the origin. These algorithms have a systematic bias and converge to points near the origin even when the global minimum is not located at the origin. Thus to obtain a realistic measure of performance on real world problems test functions with global minimum significantly shifted from x = (0, 0, . . . , 0) must be considered. Results for large shifts is presented in Tables 4, 5 & 6. The results clearly show that WHO algorithm performs significantly better than all other algorithms taken for comparison on test functions with large shifts.
The time complexity of stochastic optimization algorithms can be measured by computing the average time required to achieve a predefined accuracy. This average must be take over a large number of independent runs to reduce the effects of lucky or unlucky random number sequences generated during the run of the algorithm. In this paper the average over 30 independent trials is taken since the average of 30 independent random variables is approximately Gaussian according to the Central Limit Theorem. Table 7 below shows the average time required to compute the global minimum to an accuracy of 0.0001 for 50 dimensions. This accuracy is chosen since it is sufficient for most engineering applications.
Test results on mean running time for 50 dimensions
Test results on mean running time for 50 dimensions
The functions that were taken for comparison represent the unimodal, multimodal and shifted benchmark test functions. The functions f1 to f6are Griewangk, Step, Rotated Griewangk, Sumsquares, Noisy Sphere and Shifted Sphere respectively. The best mean running time is highlighted in bold. Table 7 clearly indicates that the proposed algorithm converged to the pre-set accuracy with the smallest mean running time in most of the test functions.
In this paper a new meta-heuristic global optimization algorithm inspired by Wildebeest herd behaviour was proposed and shown to perform significantly better on high-dimensional and shifted test functions. The proposed WHO algorithm was based on a simplified mathematical model of Wildebeest milling behaviour, herd instinct, starvation avoidance instinct, population pressure and herd social memory. Extensive simulation results on 15 benchmark test functions indicated that the WHO algorithm is highly competitive with well-known global optimization algorithms like GA, GSA, PSO, ABC and SA especially on high dimensional and shifted test functions. In particular the WHO algorithm exhibited superior explorative ability and performed significantly better on high dimensional test functions with global minimum significantly shifted from x = (0, 0, . . . , 0). The WHO algorithm also demonstrated graceful degradation in performance with increasing dimension compared to other algorithms. Future work can explore parallel implementation and variations of the WHO algorithm for discrete (combinatorial)optimization.
