Abstract
Antlion Optimization Algorithm (ALO) is a promising bionic swarm intelligence algorithm, which has good robustness and convergence, but there are still many areas to be improved and modified. Aiming at the fact that the ALO algorithm is more likely to fall into the local optimum, proposes three strategies to improve the classic ALO algorithm in this paper. First of all, we adopt a parallel idea in the algorithm, through the communication strategy between groups based on Quantum-Behaved to enhance the diversity of the population. Secondly, we adopted two strategies, Opposition Learning, and Gaussian Mutation, to balance the performance of exploration and exploitation during the execution of the algorithm, further formed the MSALO algorithm. The CEC2013 Benchmark function is selected as the standard, and MSALO is compared with other intelligent optimization algorithms. The experimental results show that MSALO has stronger optimization performance compared with other intelligent algorithms. Besides, we applied MSALO to the practical scenarios of feature selection, and use SVM classifiers as training evaluators to improve the accuracy of feature extraction from high-dimensional data.
Introduction
Optimization refers to the process of finding the best value for some variables or combination schemes of a specific problem based on certain constraints so that a certain performance of the problem can be optimized [38]. In our daily life and work, we will encounter various optimization problems, which are widely present in the fields of mathematical science, intelligent computing, industrial control, distribution and scheduling, and play a central role. Intelligent Computing is divided into three branches: Artificial Neural Network(ANN), Genetic Algorithm(GA) [40, 41], and Fuzzy [51], reasonable optimization algorithms have an increasing influence on intelligent computing. Optimization algorithms can be divided into Exact algorithms and Heuristic algorithms [31], where Exact algorithms such as branch and bound algorithm, background segmentation algorithm, dynamic programming, and other computing more complex, only suitable for small-scale problem solving; Heuristic [26] algorithms are divided into meta-heuristic and specific. Among them, meta-heuristic has individual-based algorithms, such as Simulated Annealing Algorithm (SA) [13], which applies the physical process of solid heating annealing to the algorithm; Tabu Search Algorithm (TS) [19, 30], which uses flexible memory technology to choose. There are also population-based algorithms, such as Genetic Algorithm (GA) [36], which refers to the mechanism of natural selection and biological genetic evolution; Particle Swarm Optimization (PSO) [18, 57], which simulates the behavior of migration and clustering in the predation process of birds [12, 32]; Ant Colony Algorithm (AC) [11, 43], which simulates the behavior of path marking when ants foraging [37]; Differential evolution algorithm (DE) [27] uses the principle of cross mutation in genetics. With the advancement of science and technology, more and more complex high-dimensional optimization problems will be encountered in many modern disciplines and scientific fields. When we use traditional optimization methods to solve, there will always be problems such as high complexity, a large amount of calculation, long calculation time, etc. Especially for some difficult and nonlinear combinatorial optimization problems. Traditional methods need to traverse all search solutions in the optimization process, which further increases the search range and optimization difficulty. Therefore, the improvement of algorithms has become a hot topic in society, and look for more efficient optimization method is particularly important during the development of a variety of algorithms. In recent decades, optimization algorithms have received extensive attention from all walks of life. To solve more complex optimization problems and real scenarios. Research scholars based on the existing classic optimization evolutionary algorithm [5, 53], combined with a variety of innovative strategies to synchronize the improvement of the algorithm, such as Parallel Strategy [20, 35], Adaptive Parameter Technology [3, 33], Opposite Learning Strategy, Neural Network Technology [56], Neighborhood Operation [6, 48], Chaos Mechanism, QUATRE [15, 34], etc. Combine these strategies organically, learn from each other’s concepts, enabling the algorithm to achieve a more efficient global optimization.
The ALO algorithm is inspired by the foraging behavior of antlions and obtains the global optimal solution by simulating the trap mechanism of antlions. ALO was first proposed by Australian scholar Mirjalili in 2015. As a meta-heuristic algorithm, because of its better robustness, convergence, and fewer adjustable parameters, and is not bound by the object, at this point, many scholars have proposed a variety of improvement strategies and applied the improved algorithms to multiple fields. For example, Zhang and others applied the Antlion algorithm to the problem of air material configuration optimization; Zhao et al. optimized the SVM parameters with the improved ALO [55]; Chen et al. Applied the ALO algorithm to the optimization of PID controller parameters; Xu and others applied ALO improved by the hybrid strategy to the optimization of wireless sensor network coverage. First of all, we divide the population into several groups using parallelism and communicate between groups based on based on quantum-behaved. Besides, we combine The Opposition Learning Strategy and The Gaussian Mutation Strategy [7] to improve the Antlion algorithm to form MSALO. We selected 28 test functions of CEC2013 as the standard, including single-mode functions, basic multi-mode functions, to test the performance of the algorithm, and compared with other algorithms in terms of optimization capabilities. Experiments show that MSALO is superior to other algorithms in optimization ability, and is superior to the original ALO.
Feature Selection [14, 52] is an important method for dimensionality reduction of high-dimensional big data. It is a process of screening data features and obtains a set of optimal feature subsets by reducing irrelevant features and redundant features, And this feature subset does not lose important data but preserves the true meaning of the data set. We use the wrapper method combined with our MSALO algorithm for feature selection and specify the SVM classifier as our learning algorithm to train the evaluator on the initial feature set, to select a subset of features that can achieve high performance for real-world application problems, to reduce data dimensions and improve algorithm optimization efficiency without affecting data accuracy.
Section II will briefly introduce several optimization algorithms, such as PSO and the original ALO algorithm, and our strategy to improve ALO. Section III introduces MSALO in detail and how to apply it to feature selection. The results of The Section IV performance demonstrates MSALO CEC2013 Benchmark suite, and compared with other algorithms. Finally, in Section V, the experimental results of the comparison and analysis of the fourth quarter, and draw conclusions.
Related works
ALO algorithm
Ants walk randomly and fall into traps
ALO algorithm [2] simulates the trap mechanism used by antlions when preying. The antlion will dig a cone-shaped sandpit, then hide at the bottom of the sandpit and wait for the ants. When the ant falls into the sandpit, it will slide to the bottom of the pit. Recorrect the pit to prepare for the next hunting. Among them, the ant represents the attempted solution; the antlion represents the local optimal solution; the elite antlion represents the global optimal solution. Ants will walk randomly within the search range, and the formula for moving is
Antlions will build traps proportional to their fitness. We use mathematical methods to simulate the process of ants sliding down the trap, and adaptively reduce the radius of the trap, that is, the swimming radius of the ant.
The Individual learning process is similar to the Quantum-Behaved [9], and every particle in quantum space can simulate an individual. According to the Aggregation properties of swarm intelligence and the movement of particles in quantum space, we know that particles will constantly approach the point p
i
. There is potential energy at point pi that attracts the particles to reach a bound state, which is the particle in the bound state. There will be a certain probability to appear at any point in space so that it can search in quantum space under the attraction of potential energy. Based on Quantum behavior, the evolution equation is derived by the Monte Carlo method as follows,
In MSALO, we use the idea of quantum-behaved. Every 11 iterations of the algorithm, we select several better ones at the same time Point, use Equ.(12) to update the better antlion, and select a few weak points from the antlion population, use Equ.(13) to perform the position of the poor antlion. The difference between Equ.(12) and Equ.(13) is that Equ.(12) needs to judge the updated antlion after updating the better antlion. Whether the antlion is more than the antlion before the update, if we can make the better antlion better, we will replace it. while Equ.(13) directly updates the position to make the poor antlion better [9]; This is a greedy strategy in which based on quantum-behaved, the evolution equation is derived by Monte Carlo method as follows,
When the optimization algorithm is performing optimization, the random solution in the search space may be far from the optimal solution, and it takes a long time to converge. Quasi-Opposite Learning [44] is an optimization strategy that increases the diversity of the population by solving the opposite solution of the current solution, and then selects the best of the population and the opposite population.
Gaussian Mutation [17] is to add the disturbance term of the normal distribution to the original solution vector. During the mutation, the direction is guided by the change of the elite antlion, which not only increases the diversity of the population but does not affect the local development.
We select an improvement rate γ as the standard, select the best fitness antlion of the current iteration minus the best fitness antlion of the previous tenth iteration, and divide the absolute value of the difference by the best fitness of the current iteration, When the value is less than the improvement rate γ, the gaussian mutation is performed, and the mutation formula is as Equ.(15).
In the data to be processed in real life, an object has multiple attributes, and data with too many attributes will cause dimensionality disasters when applied. The characteristics of the data can be divided into related features, irrelevant features, and redundant features. The factors that cause dimensional disasters are irrelevant features and redundant features. Feature selection is very important in the field of machine learning and big data.
In addition, another reason for feature selection is: when we need a large number of features, too many or too few features will have a great impact. When the features are insufficient, data overlap is prone to occur, in which case any classifier will fail. As shown in Fig. 1, relying only on the x-axis or y-axis cannot distinguish the two types of data. Adding features can be understood as mapping to a high-dimensional space. When this "dimensionality" is too high, it is easy to cause similar data to be far away in the space and become sparse, which also easily makes many classification algorithms invalid. As shown in Fig. 1, only the x-axis can be divided into features, but the introduction of the y-axis makes the same category no longer clustered. Furthermore, we know that there is a method similar to feature selection, that is, data dimensionality reduction. The effect of the two is the same, that is, to reduce the features in the feature data set. The difference between simple dimensionality reduction and feature selection is that the former is combined through the linear relationship between multiple features to become a new feature, which changes the space of the original feature set; while The latter is to select beneficial features as feature subset. The advantage is to remove features that don’t affect the results of the learning algorithm, do not change the space of the original feature data set, and maintain the authenticity and validity of the feature data set.

Two phenomena caused by the number of features.
Fig. 2 shows the four processes of feature selection. First, the Generation process is to search for combined data to generate feature subsets; The second is the Evaluation function, which evaluates the generated feature subsets; The third is the Stop condition to determine the stop of feature selection; The fourth is the Verification process to verify whether the selected feature subset is valid.

Feature selection process.
In this paper, we use the Wrapper and select the SVM classifier for feature selection.
MSALO algorithm
In terms of improving the ALO algorithm, we first adopted a parallel idea, dividing ants and antlions into groups for optimization, and comparing and exchanging information between groups based on quantum-behaved in the communication between groups. Besides, we use two strategies, opposition learning, and Gaussian mutation, to increase the diversity of the population and prevent the algorithm from stagnating to the local optimum. The steps of the algorithm are as follows: The first step is to initialize variables and population; In the second step, the ant go through the process from Equ.(1) to Equ.(8), but the process of ant learning from the opposite is added; The third step is to iterate continuously, every 11 iterations will find the global optimal value, and use the communication strategy based on quantum-behaved to update the Antlion position. Among them, if there is little change between the current optimal iteration and the first ten iterations optimal, that is, when Equ.(15) is less than the improvement rate, Gaussian mutation is performed.
The application of MSALO in feature selection
In this paper, we choose Wrapper for feature selection, use LIBSVM [1, 4] as the evaluation criterion for feature subsets, use recursive feature elimination (RFECV) to evolve the training set and test set in SVM [16], and combine the optimization process of MSALO to select The best feature subset.
SVM and LIBSVM
SVM is essentially a two-class model, which can not only learn the characteristics of known samples but also predict unknown samples. Its basic principle is to find an optimal classification hyperplane in the n-dimensional data space. The hyperplane equation is

Two-dimensional classification example.
Vectors, which determine the position of the line.
When f (x) is equal to 0, x is a point on the hyperplane, and the point with f (x) greater than 0 corresponds to the data class represented by the circle, and the point with f (x) less than 0 corresponds to the data class represented by the Pentagram.
We need to find the optimal value of weight ω and threshold b to minimize the weight cost function Equ.(17).
For the problem of linear inseparability, SVM maps the samples in the n-dimensional space to the higher-dimensional space, so that the samples in the high-dimensional space are linearly separable. Choose different kernel functions according to the nature of the problem and the size of the data.
In the paper, we choose a simple, easy-to-use, fast, and effective software package LIBSVM developed and designed by Professor Lin for SVM pattern recognition and regression, and it provides many default parameters and can interact. Inspection is currently the most widely used SVM library in China. The definition of the RBF kernel function we used is as follows.
In this article, we choose Wrapper for feature selection, use LIBSVM as the feature subset evaluation function, use recursive feature elimination (RFECV) to evolve the training set and test set in SVM, and combine the optimization process of MSALO to select The best feature subset. The main steps of MSALO applied to feature selection are as follows and the detailed description is shown in Fig. 4.

Flowchart of feature selection by MSALO.

omparison of the best ?tness for functions f1 - f4 with 30D optimization.

Comparison of the best ?tness for functions f5 - f28 with 30D optimization.
Step1, Initialization parameters, and population;
Step2, Normalize the data set and divide the data set into 10 points, of which 1/10 is used as the test set;
Step3, Use libsvm for 10 times of training, and verify on the test set, and calculate the error rate of all subsets through LIBSVM;
Step4, Use MSALO to optimize the feature subset and find the feature subset with the smallest error.
In this section, we choose CEC2013 BenchMark [23] to test the performance of our improved algorithm and introduce the results of the MSALO application in feature selection.
Simulation results of CEC2013 benchmark test of MSALO
CEC2013 BenchMark contains 28 benchmark functions, including 5 Unimodal Functions(f1 - f5), 15 Basic Multimodal Functions(f6 - f20), and 8 Composition Functions(f21 - f28). In order to further verify its performance, we compare the proposed minimum value of MSALO with ALO, PSO, PPSO, SOS [9], BA. The spatial dimension of these algorithms for optimization is 30, the search range is [-100, 100], and the parallel algorithms that need to be grouped are divided into 4 groups.
As shown in Table 1, it is the Mean and Standard deviation of the optimal value result of each algorithm compared in the CEC2013 benchmark function test, where the value in bold is the minimum fitness value for the comparison of the six algorithms. The MSALO algorithm is better than other algorithms in 13 benchmark functions of f6, f8, f9, f11, f14, f15, f17, f18, f19, f22, f23, f25, and f28, and can reach the optimal fitness value; The ALO algorithm is tested in the 5 benchmark functions of f1, f5, f10, f16, f21, the PPSO in the 3 benchmark functions of f2, f4, and f12, and the SOS algorithm in the 6 benchmark functions of f7, f13, f20, f24, f26, f27, the BA algorithm in the benchmark function of f3, the optimization performance is better than other algorithms. Among the 28 benchmark functions, the PSO algorithm does not get the optimal value in comparison with other algorithms.
The mean and standard deviation of 20 runs of 28 functions
The mean and standard deviation of 20 runs of 28 functions
The error rate of the algorithm for feature selection
The global optimal iterative process of the 6 algorithms based on 28 benchmark functions is shown in the Figs. 10 and 11. We can see that in the process of optimizing the MSALO algorithm and other algorithms on 28 functions, the convergence speed and convergence results of the MSALO algorithm are better in most functions. According to the optimization results and the optimization process, the MSALO algorithm has better optimization capabilities than other algorithms and faster convergence.
In this section, we show the results of the feature selection of our proposed MSALO, and compare them with the results of feature selection of ALO, PSO, PPSO, SOS, BA to verify the performance of MSALO in practical applications. This application experiment is to optimize the error rate with a single objective. The data set is Parkinson (used to detect and classify Parkinson’s disease), and the optimized result is the error rate. The table summarizes the comparison results of the algorithm for feature selection, and the figure shows the error rate of the algorithm for application. From the table, we can see that in the environment of 30-dimensional space, 100 individuals, and 1000 times, the error rate of the MSALO algorithm for feature selection is 0.1838, which is much smaller than the error rate of other algorithms, and the error is small. Therefore, MSALO has better accuracy in single target feature selection.
Conclusions
As a new intelligent optimization algorithm, ALO has been applied in many fields such as 5G [47, 50] and smart grid [49], and achieved results. However, it may still fall into a local optimum. For this, we propose an improved ALO with multiple strategies based on quantum-behaved, called MSALO. MSALO adopts a parallel idea, divide the population into several subgroups for independent optimization, and then communicates between the subgroups based on quantum-behaved so that the better solutions in the population become better, and the poor solutions are replaced with good points, Always maintain the quality of the population, thereby improving the convergence speed of the algorithm. In addition, MSALO combines the Gaussian mutation strategy and the quasi-adversarial learning strategy to enhance the diversity of the population and prevent the algorithm from falling into a local optimal stagnation. Use 5 Unimodal Functions, 15 Basic Multimodal Functions, and 8 Composition Functions to compare the optimization process and results of MSALO with several optimization algorithms. The comparison result shows that MSALO is better than ALO, PSO, PPSO, SOS, BA intelligent algorithms. In addition, we apply MSALO to feature selection and use Wrapper to optimize the error rate of feature selection. The experiment shows that MSALO can be effectively applied to feature selection for data dimensionality reduction. In future work, we will further study more efficient and comprehensive optimization algorithms, innovative communication strategies, and improvement strategies to better improve algorithm efficiency and performance. For the application of the algorithm, there are many aspects to discuss, for example, use different search strategies and evaluation functions for feature selection; feature selection in binary and non-binary forms; try to apply the algorithm to multi-objective feature selection. Through further research and improvement, more efficient feature selection will be applied to scenarios such as document processing, drug diagnosis, and gene selection.
Footnotes
Acknowledgment
This work was support edinpart by National Natural Science Foundation of China under Grant No. 619761-26, Shandong Natural Science Foundation under Grant No. ZR2019MF003, No. ZR2017MF054.
