CBDA: Chaos-based binary dragonfly algorithm for evolutionary feature selection

Abstract

The goal of feature selection in machine learning is to maintain high classification accuracy while removing as many attributes as possible. In this paper, we first design a fitness function that captures both objectives jointly. We then propose a chaos-based binary dragonfly algorithm (CBDA), which incorporates several improvements over the conventional dragonfly algorithm (DA), as a wrapper-based feature selection method for optimizing this fitness function. Specifically, CBDA introduces three improved factors on top of the conventional DA, namely a chaotic map, an evolutionary population dynamics (EPD) mechanism, and a binarization strategy, to balance the exploitation and exploration capabilities of the algorithm and make it better suited to the formulated problem. We conduct experiments on 24 well-known data sets from the UCI repository. CBDA is compared with three ablated versions, each targeting a different component of the algorithm, in order to explain the contribution of each component, and with five established algorithms, in terms of fitness value, classification accuracy, CPU running time, and number of selected features. The results show that the proposed CBDA has clear advantages on most of the tested data sets.

Keywords

Feature selection; dragonfly algorithm; chaos; evolutionary population dynamics; classification accuracy

1. Introduction

The expansion of data in fields such as data mining, pattern recognition, and machine learning has resulted in an increase in unnecessary and repetitive information, hindering the development of learning algorithms [1]. In high-dimensional data sets, there may be numerous unhelpful or misleading features. Therefore, it is crucial to find ways to efficiently select valid data by removing irrelevant ones from the data set being analyzed [2].

Feature selection is a type of dimensionality reduction technique that aims to choose the most optimal subsets from entire data sets while maintaining the classification accuracy of the original data [3]. Feature selection methods are commonly used in various fields, including active matter [4], supervised learning [5], compression, and cluster analysis in spectroscopy [6], among others.

There are three main types of feature selection methods: filters, wrappers, and embedded approaches [7]. The filter-based methods assess features based on their statistical scores, keeping those with high scores and discarding those with low scores [8]. The wrapper-based methods utilize a defined classification algorithm, such as a machine learning classifier, to determine the quality of feature subsets. This method typically achieves better classification accuracy than filter-based methods [9]. The embedded approach is a specific case of the wrapper-based method, as it is integrated into the learning process [10].

Feature selection involves selecting subsets of features and evaluating their effectiveness. The search method used for feature selection can be categorized into three types: complete, random, and heuristic search. In the complete search method, all possible feature subsets are evaluated, which can be impractical for high-dimensional data sets since the number of combinations is $2^{N}$ (where N is the number of features). In the random search method, potential feature subsets are combined randomly until an optimal subset is found. However, if the entire data set is selected as a subset, the random search method is equivalent to the complete search method in the worst-case scenario [11]. As an alternative, the heuristic search method is commonly used because it incorporates heuristic information to guide the search process, combining the advantages of both complete and random search methods [12].
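To make the size of the complete search concrete, the following minimal sketch enumerates every binary feature mask for a small number of features and counts them. The helper name `all_feature_masks` is hypothetical and is used only for illustration.

```python
from itertools import product


def all_feature_masks(n_features):
    """Return every binary mask over n_features (1 = feature selected)."""
    return list(product((0, 1), repeat=n_features))


masks = all_feature_masks(4)
print(len(masks))  # number of candidate subsets for 4 features, i.e. 2**4
print(2 ** 30)     # number of subsets a complete search would visit for 30 features
```

Even at 30 features the count exceeds one billion, and each candidate would require training and evaluating a classifier in a wrapper setting.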

Swarm intelligence algorithms are a type of meta-heuristic algorithm that generates a population of initial solutions and updates them at each iteration based on the previous iteration. Compared to traditional methods, swarm intelligence algorithms present distinct advantages in dealing with feature selection problems [13]. Several swarm intelligence algorithms, such as particle swarm optimization (PSO) [14], invasive weed optimization (IWO) [15], ant colony optimization (ACO) [16], gray wolf optimization (GWO) [17], artificial bee colony (ABC) [18], and the grasshopper optimization algorithm (GOA) [19], are commonly applied to optimization problems. Among these algorithms, the dragonfly algorithm (DA), originally proposed by Mirjalili et al. in 2016 [20], simulates the behaviors of dragonflies in nature and can effectively help find the global optimal solutions [21] in some optimization problems. Nevertheless, DA also has clear deficiencies: it was designed for continuous optimization problems, and it lacks the ability to converge quickly or to avoid local optima when applied to complex optimization problems [22]. For this reason, there is ongoing research focused on enhancing the efficiency of the conventional DA for feature selection optimization.

The key accomplishments of this paper can be outlined as follows:

Our goal is to achieve two optimization objectives: reducing the number of selected features while improving classification accuracy. To achieve this, we have developed a fitness function aimed at optimizing both objectives simultaneously.

We have developed a novel algorithm named the chaos-based binary dragonfly algorithm (CBDA) to address the fitness function we formulated. CBDA leverages a chaotic map to enhance the exploitation potential of the traditional dragonfly algorithm by regulating the primary parameters of dragonflies’ movements. Moreover, we have integrated a fitness-based evolutionary population dynamics (EPD) mechanism to maintain a balance between the algorithm’s exploration and exploitation capabilities. Lastly, we have introduced a binarization technique to transform the continuous solution space into binary, rendering it appropriate for feature selection solutions.

We have carried out comprehensive experiments to assess the efficacy of CBDA against other algorithms on 24 well-known data sets from the UC Irvine (UCI) Machine Learning Repository. We evaluate the algorithms on several criteria: fitness value, classification accuracy, CPU running time, and the number of selected features. The experiments are divided into two sets: ablation experiments involving different ablated versions of CBDA, and comparative experiments involving five established optimization algorithms (four meta-heuristic algorithms and one neural network algorithm). Finally, the advantages of CBDA are consolidated and summarized.

The remaining sections of this paper are structured as follows. In Section 2, we provide a concise overview of the relevant literature. Section 3 outlines the background of the research focus in this paper. Section 4 provides an in-depth explanation of CBDA. Section 5 presents the outcomes of the experimental analysis. Finally, Section 6 concludes the paper and suggests potential areas for future research.

2. Related work

Wrapper-based selection methods play an important role in feature selection optimization [23]. These methods treat feature selection as a black box, in which meta-heuristic algorithms and a classifier are deployed to obtain the optimal subset [24]. Many classical meta-heuristic algorithms have been modified to solve feature selection problems, such as the binary bat algorithm (BBA) [25], the bare bones particle swarm optimization algorithm (BPSO) [26], the binary gray wolf optimization algorithm (BGWO) [27], and the binary gravitational search algorithm (BGSA) [28].

In recent years, a growing number of newly designed algorithms have been proposed to optimize various wrapper-based feature selection problems because of their importance. For instance, Hou et al. [29] propose a binary improved fruit fly optimization algorithm (BIFFOA) and employ four different EPD strategies to enhance BIFFOA in solving feature selection problems. Thaher et al. [30] propose an efficient feature selection approach based on a Boolean variant of BPSO boosted with EPD, aiming to avoid local optima by boosting the algorithm's exploration ability. Mafarja et al. [31] propose an enhanced hybrid meta-heuristic approach using GWO and WOA to alleviate the drawbacks of both algorithms in feature selection. In [32], Xue et al. propose a self-adaptive particle swarm optimization (SaPSO) algorithm with a typical self-adaptive mechanism for feature selection, particularly for large-scale feature selection. In [33], the researchers utilize binary variants of the butterfly optimization algorithm (BOA) to select the optimal feature subset for classification in wrapper mode. In [34], Paniri et al. propose a multi-label relevance-redundancy feature selection method named MLACO, which is based on ACO and searches the feature space for the most promising features by introducing two heuristic functions, one unsupervised and one supervised. Alilbrahim et al. [35] present a hybrid optimization approach that combines the salp swarm algorithm (SSA) with PSO to improve the efficacy of the exploration and exploitation steps. Zhang et al. [36] introduce the Gaussian mutation operator and the chaotic local search strategy into the basic fruit fly optimization algorithm (FOA) to improve its exploitative tendencies and enhance its local search ability in dealing with feature selection problems.

Several state-of-the-art studies apply DA to feature selection optimization. DA was designed to solve continuous problems, so transfer functions are required to convert the conventional DA into a binary one (BDA) [37], which enables it to tackle feature selection optimization. Hammouri et al. [38] use different strategies to update the values of the five main coefficients of BDA to solve feature selection problems. Cui et al. [39] propose a hybrid improved dragonfly algorithm (HIDA), which combines the advantages of both mRMR and an improved dragonfly algorithm (IDA) in order to generate promising candidate subsets and achieve a higher classification accuracy rate. Li et al. [40] extend BDA to develop an improved BDA (IBDA) for feature selection problems. The improved factors in their paper include EPD, a crossover operator, and a novel binary mechanism. Tawhid et al. [41] present a new hybrid binary version of the dragonfly and enhanced PSO algorithms, named HBDESPO, to solve feature selection problems. To prevent DA from falling into local optima, Chen et al. [42] put forward a Spark-based BDA for feature selection, which integrates the global optimization ability of DA with the parallel computing ability of Spark. In addition, in [43], the authors embed various chaotic maps into the search iterations of DA for discriminating features.

Although the above works have successfully solved certain feature selection problems in various scenarios, according to the no free lunch (NFL) theorem [44], no single method can solve all problems in the field of optimization. In addition, none of the above works can find the optimal subsets for all tested data sets. Given the strong potential of DA in the area of feature selection optimization, in this work we incorporate several improved factors into DA to solve feature selection problems more efficiently.

3. Background

3.1 Population-based evolutionary computation algorithms for feature selection

Population-based evolutionary computation is a computational intelligence approach inspired by the principles of natural evolution. It involves creating a population of potential solutions and iteratively applying genetic operators such as mutation, crossover, and selection to evolve and improve the solutions over generations [45].

In the context of feature selection, evolutionary computation algorithms aim to search for an optimal subset of features that maximizes the performance of a given learning algorithm or model. By representing potential feature subsets as individuals in a population, these algorithms iteratively evaluate and evolve the solutions based on their fitness, which is typically determined by the performance of the selected features on a specific evaluation criterion [46].

However, the task of feature selection using evolutionary computation can be challenging due to several factors:

The large search space of possible feature subsets makes it computationally demanding to explore all possible combinations. The number of potential feature subsets grows exponentially with the number of features, leading to a combinatorial explosion [47].

The evaluation of fitness for each individual in the population requires training and evaluating the learning algorithm on the selected features. This process can be time-consuming, especially for large and complex data sets.

The presence of redundant or irrelevant features, noisy data, and the existence of interdependencies among features can further complicate the optimization process [48]. These factors introduce additional complexity and make it challenging to find the optimal subset of features that truly captures the relevant information for accurate and efficient learning.

Overall, the appeal of population-based evolutionary computation for feature selection lies in its ability to explore the vast search space of feature subsets. However, the inherent complexity of this problem, including the large search space, the computational demands, and the challenges listed above, makes it difficult to achieve optimal results.

Based on the preceding analysis, it is evident that exploring the adaptation of an existing nature-inspired algorithm into a population-based evolutionary computation algorithm presents a viable research approach for addressing the challenges associated with feature selection.

On the other hand, DA mimics the behavior of dragonflies in their search for prey. This algorithm utilizes the concept of swarming behavior, where dragonflies communicate and coordinate their movements to optimize their hunting efficiency. This collective intelligence enables DA to effectively explore the search space and find promising solutions for feature selection [49]. DA incorporates a combination of local search and global search strategies. The local search component allows dragonflies to exploit the local neighborhood and refine their solutions, while the global search component enables exploration of the entire search space to discover potentially better solutions. This balance between exploration and exploitation enhances the algorithm’s ability to find optimal or near-optimal feature subsets [50]. Additionally, DA employs adaptive mechanisms that dynamically adjust its parameters during the optimization process. These adaptive mechanisms improve the algorithm’s adaptability and robustness, allowing it to handle different types of data sets and feature selection objectives effectively.

Given the aforementioned benefits of DA in addressing feature selection problems, integrating genetic operators such as mutation and crossover makes DA a feasible and potentially efficient evolutionary computation algorithm for tackling feature selection problems.

3.2 Conventional DA

DA simulates the swarm behaviors of dragonflies in natural environments, which can be divided into two types: the static (feeding) swarm and the dynamic (migrating) swarm. In the static swarm, dragonflies gather in clusters to search for prey in various areas. They create sub-swarms to hunt for food, which involves local movements and abrupt changes in flight path. In the dynamic swarm, a large number of dragonflies migrate long distances in one direction. The static and dynamic swarms correspond to the global search (exploration) and the local search (exploitation) phases of the algorithm, respectively.

There are five main behaviors of dragonflies in sub-swarms, which are separation ( $S$ ), alignment ( $A$ ), cohesion ( $C$ ), attraction toward food sources ( $F$ ), and avoidance of enemies ( $E$ ). In the algorithm, these behaviors are defined as follows.

The separation behavior involves avoiding collisions with other individuals. Mathematically, this behavior can be expressed as:

$\displaystyle S_{i}=-\sum\limits_{j=1}^{N}(X-X_{j})$ (1)

In this equation, $S_{i}$ denotes the separation vector of the $i^{\text{th}}$ dragonfly, which steers it away from collisions, and $X$ denotes the position of the current dragonfly. $X_{j}$ denotes the position of the $j^{\text{th}}$ individual in the sub-swarm, and $N$ represents the total number of individuals in the sub-swarm.

The alignment behavior involves matching the velocity of a dragonfly with that of other individuals in its vicinity. This behavior of alignment is modeled as follows:

$\displaystyle A_{i}=\frac{\sum\nolimits_{j=1}^{N}{V_{j}}}{N}$ (2)

where $V_{j}$ denotes the flying velocity of the $j^{\text{th}}$ dragonfly in the sub-swarm.

The cohesion behavior refers to dragonflies’ tendency to congregate towards the center of the sub-swarm. This behavior can be represented using the following model:

$\displaystyle C_{i}=\frac{\sum\nolimits_{j=1}^{N}{X_{j}}}{N}-X$ (3)

Following the cohesion behavior, dragonflies are drawn to a food source. This procedure can be modeled as:

$\displaystyle F_{i}=X^{\textit{food}}-X$ (4)

where $X^{\textit{food}}$ represents the position of food.

Dragonflies endeavor to maintain distance from their adversaries. The distraction mechanism can be computed using the following formula:

$\displaystyle E_{i}=X^{\textit{enemy}}+X$ (5)

where $X^{\textit{enemy}}$ represents the enemy position.

The step vector, which combines the five behaviors mentioned above, is computed as follows:

$\displaystyle\Delta X_{t+1}=(s\times S_{i}+a\times A_{i}+c\times C_{i}+f\times F_{i}+e\times E_{i})+\omega\times\Delta X_{t}$ (6)

where $t$ represents the iteration counter, while $s$ , $a$ , $c$ , $f$ , and $e$ represent the weights of the separation, alignment, cohesion, food, and enemy factors in the Eqs (1)–(5), respectively. The variable $\omega$ represents the inertia weight.

In summary, the position updating method of the conventional DA is listed as follows:

$\displaystyle X_{t+1}=X_{t}+\Delta{X}_{t+1}$ (7)
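As an illustration of how Eqs. (1)–(7) fit together, the following minimal sketch performs one update for a single dragonfly in plain Python. The function name `da_step`, the dictionary of weights, and the toy numbers are assumptions made for this example and are not taken from any reference implementation.

```python
def da_step(x, dx, neighbors_x, neighbors_v, x_food, x_enemy, w):
    """One DA update for a single dragonfly, following Eqs. (1)-(7).

    x, dx        : current position and step vector of the dragonfly
    neighbors_x  : positions of the individuals in the sub-swarm
    neighbors_v  : velocities (step vectors) of those individuals
    w            : weights with keys s, a, c, f, e and omega
    """
    n = len(neighbors_x)
    dim = len(x)
    sep = [-sum(x[d] - nx[d] for nx in neighbors_x) for d in range(dim)]      # Eq. (1)
    ali = [sum(nv[d] for nv in neighbors_v) / n for d in range(dim)]           # Eq. (2)
    coh = [sum(nx[d] for nx in neighbors_x) / n - x[d] for d in range(dim)]    # Eq. (3)
    food = [x_food[d] - x[d] for d in range(dim)]                              # Eq. (4)
    enemy = [x_enemy[d] + x[d] for d in range(dim)]                            # Eq. (5)
    new_dx = [
        w["s"] * sep[d] + w["a"] * ali[d] + w["c"] * coh[d]
        + w["f"] * food[d] + w["e"] * enemy[d] + w["omega"] * dx[d]            # Eq. (6)
        for d in range(dim)
    ]
    new_x = [x[d] + new_dx[d] for d in range(dim)]                             # Eq. (7)
    return new_x, new_dx


# Toy example with all weights set to 1 (arbitrary values chosen for readability).
unit_weights = {"s": 1.0, "a": 1.0, "c": 1.0, "f": 1.0, "e": 1.0, "omega": 1.0}
new_x, new_dx = da_step(
    x=[1.0, 2.0],
    dx=[0.5, -0.5],
    neighbors_x=[[2.0, 2.0], [0.0, 4.0]],
    neighbors_v=[[1.0, 0.0], [3.0, 2.0]],
    x_food=[3.0, 3.0],
    x_enemy=[-1.0, 0.0],
    w=unit_weights,
)
print(new_dx, new_x)
```

The sketch works on continuous positions only; converting positions to binary feature masks is a separate step.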

It should be noted that in the absence of a nearby sub-swarm, DA utilizes the Lévy flight mechanism to help the dragonflies search for additional potential solutions. This improves DA's capacity for exploration. The technique for updating positions, in conjunction with Lévy flight, is outlined below:

$\displaystyle X_{t+1}=X_{t}+L\acute{e}vy\otimes X_{t}$ (8)

This article will not delve into the specifics of Lévy flight, but interested readers can refer to [37] for information on the Lévy mechanism employed in DA.

The key steps of the conventional DA are presented in Algorithm 3.2.

Algorithm: DA
Input: $N$: the population size; $X_{i}$ ( $i=$ 1, 2, …, $N$ ): the populations (dragonflies); $t_{\max}$: the maximum iteration
Output: $X^{\textit{food}}$
for $t=1:t_{\max}$
    Calculate the fitness value of each dragonfly;
    Update the position of food (i.e., $X^{\textit{food}}$ ) and enemy (i.e., $X^{\textit{enemy}}$ );
    Update the weights of $\omega$ , $s$ , $a$ , $c$ , $f$ and $e$ ;
    for $i=1:N$
        if there is no neighborhood in the sub-swarms
            Update the position of $i^{\text{th}}$ dragonfly by Eq. (8);
        else
            Calculate $S_{i}$ , $A_{i}$ , $C_{i}$ , $F_{i}$ , $E_{i}$ by Eqs (1)–(5), respectively;
            Calculate the step vector by Eq. (6);
            Update the position of $i^{\text{th}}$ dragonfly by Eq. (7);
        end if
        Update $X^{\textit{food}}$ if there is a better position of dragonfly in the swarm;
        Update $X^{\textit{enemy}}$ if there is a worse position of dragonfly in the swarm;
    end for
end for
Return $X^{\textit{food}}$ . // $X^{\textit{food}}$ is the best solution obtained by DA

3.3 Problem formulation

Feature selection is a challenging optimization problem that involves multiple objectives. Typically, the two primary goals are minimizing the number of features selected and improving classification accuracy [51]. To address these objectives, we have developed a single-objective fitness function that considers both goals using the linear weighting method. The fitness function is designed to balance the trade-off between the two objectives and is expressed as follows:

$\displaystyle\textit{Fitness}(S)=\alpha\times\gamma(S)+\beta\times\frac{|R|}{|C|}$ (9)

where $\textit{Fitness}(S)$ denotes the fitness of the feature subset $S$ , and $\gamma(S)$ denotes the classification error rate obtained with the feature subset $S$ . In addition, $\alpha$ and $\beta$ are the coefficients utilized to balance the two optimization objectives. It is important to note that $\alpha+\beta=1$ , and both coefficients range from 0 to 1. Furthermore, $|R|$ represents the number of selected attributes in the subset, while $|C|$ denotes the total number of attributes in the data set.

Eq. (9) has two parts: the first part reflects classification accuracy through the error rate $\gamma(S)$ , and the second part reflects the number of selected features. In practice, if the goal is to achieve higher classification accuracy, the value of $\alpha$ should be increased, which typically results in more features being selected. Conversely, if the goal is to select fewer features at the cost of lower classification accuracy, the value of $\beta$ should be increased.

In addition, the classification accuracy is dependent on the specific classifier being used. Previous research has shown that the $K$ -nearest neighbor ( $K$ -NN) method is a supervised learning algorithm that is straightforward to implement in a wrapper approach since it only requires consideration of the parameter $K$ [52]. Therefore, following previous works [29, 31, 40], we adopt the $K$ -NN method with $K=5$ as the classifier and use the Euclidean distance to calculate the distance between test and training samples in this study.
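The wrapper evaluation described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function names, the toy data, the default $\alpha=0.99$, and the choice to score an empty subset with the worst error rate are all assumptions made for the example, not values taken from this paper.

```python
import math
from collections import Counter


def knn_error_rate(train_X, train_y, test_X, test_y, mask, k=5):
    """Error rate of a K-NN classifier (Euclidean distance) restricted to
    the features whose mask bit is 1."""
    cols = [d for d, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # assumption: an empty subset gets the worst error rate
    errors = 0
    for x, y in zip(test_X, test_y):
        query = [x[d] for d in cols]
        dists = sorted(
            (math.dist(query, [row[d] for d in cols]), label)
            for row, label in zip(train_X, train_y)
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] != y:
            errors += 1
    return errors / len(test_X)


def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Eq. (9) with beta = 1 - alpha; alpha=0.99 is an assumed example value."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * n_selected / n_total


# Toy data: feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 5.0], [0.2, 1.0], [0.4, 9.0], [0.1, 3.0], [0.3, 7.0], [0.5, 2.0],
           [10.0, 4.0], [10.2, 8.0], [10.4, 0.0], [10.1, 6.0], [10.3, 2.0], [10.5, 9.0]]
train_y = [0] * 6 + [1] * 6
test_X = [[0.25, 9.0], [10.25, 1.0]]
test_y = [0, 1]

for mask in ([1, 0], [1, 1]):
    err = knn_error_rate(train_X, train_y, test_X, test_y, mask)
    print(mask, fitness(err, sum(mask), len(mask)))
```

On this toy data both masks classify the two test points correctly, so the mask with fewer selected features obtains the lower (better) fitness value, which is the trade-off Eq. (9) is meant to express.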

4. The proposed algorithm

Feature selection is a challenging problem that is often considered NP-hard [53], so it cannot be easily solved by conventional algorithms. Population-based evolutionary computation algorithms, which belong to the family of meta-heuristic algorithms, are promising methods for solving this problem. Nevertheless, these algorithms may not be effective in dealing with high-dimensional and discrete optimization problems, which motivates us to improve existing evolutionary computation algorithms for this optimization problem. In this study, we propose CBDA, which combines a chaotic map, an EPD mechanism, and a binarization strategy with the conventional DA to improve the effectiveness of solving feature selection problems.

4.1 CBDA

Based on prior research and analysis [54], DA outperforms some other meta-heuristic algorithms to some extent. However, DA was initially designed to address continuous problems, whereas the optimization of feature selection is discrete and its solution space is binary and considerably large. As a result, the conventional DA may lack the ability to efficiently exploit the solution space and solve such optimization problems. In this section, we introduce CBDA, an enhanced version of DA that incorporates several improved factors, including a chaotic map, EPD mechanism, crossover strategy, and binarization strategy, to make it well-suited for feature selection problems.

Since feature selection solutions are binary, the distance between dragonflies cannot be clearly determined in a discrete space, and thus, in CBDA, we assume that all dragonflies belong to a single sub-swarm.

Algorithm 4.1 outlines the overall structure of CBDA, and the details are provided below.

Algorithm: CBDA
Input: $N$: the population size; $N_{\textit{dim}}$: the dimension size; $X_{i}$ ( $i=$ 1, 2, …, $N$ ): the populations (dragonflies); $t_{\max}$: the maximum iteration
Output: $X^{\textit{food}}$
for $t=1:t_{\max}$
    Calculate the fitness value of each dragonfly;
    Sort the fitness values of populations $X_{i}$ ( $i=$ 1, 2, …, $N$ ) in ascending order;
    Update the position of food (i.e., $X^{\textit{food}}$ ) and enemy (i.e., $X^{\textit{enemy}}$ );
    Reposition new solutions from the worst half of populations around the guide solutions and update the new populations by using Algorithm 4.1 (EPD_ERS);
    Calculate the chaotic value of $\tau$ by using Eq. (11);
    for $i=1:N$
        Calculate $s$ , $a$ , $f$ , $c$ , $e$ by Eq. (10);
        Calculate $\omega$ by Eq. (12);
        Calculate $S_{i}$ , $A_{i}$ , $C_{i}$ , $F_{i}$ , $E_{i}$ by Eqs (1)–(5), respectively;
        Calculate the step vector by Eq. (6);
        Update the position of $i^{\text{th}}$ dragonfly by Eq. (7);
        for $j=1:N_{\textit{dim}}$
            For each dragonfly in the swarm, convert its $j$ th dimension from continuous to binary by using Eq. (17);
        end for
        Update $X^{\textit{food}}$ if there is a better position of dragonfly in the swarm;
        Update $X^{\textit{enemy}}$ if there is a worse position of dragonfly in the swarm;
    end for
end for
Return $X^{\textit{food}}$ . // $X^{\textit{food}}$ is the best solution obtained by CBDA

4.1.1 Chaotic map in CBDA

Parameter determination is crucial for the performance of an evolutionary algorithm. However, adjusting five parameters ( $s$ , $a$ , $c$ , $f$ , and $e$ ) independently can be challenging and time-consuming. To address this, we adopt the similar tuning principle introduced in [20], which involves using the parameter $\tau$ to represent the five aforementioned parameters simultaneously. This is calculated as follows:

$\displaystyle\begin{split}s&=2\times\textit{rand}(0,1)\times\tau\\ a&=2\times\textit{rand}(0,1)\times\tau\\ f&=2\times\textit{rand}(0,1)\times\tau\\ c&=2\times\textit{rand}(0,1)\times\tau\\ e&=\tau\end{split}$ (10)

where $\textit{rand}(0,1)$ involves generating a random number between 0 and 1. The parameter $\tau$ also varies between 0 and 1 and controls the magnitude of the five parameters mentioned earlier. Therefore, determining the value of $\tau$ is crucial for the entire algorithm. In previous studies [40, 55], $\tau$ is a parameter that depends on the current and maximum iteration, and other parameters are introduced to determine its value. However, this may increase the computational complexity of the algorithm.

In DA, the initial populations are usually generated randomly, which can lead to a search lacking direction and cause DA to become stuck in local optima. To avoid this, it is crucial to improve DA’s capability to explore a wide range of solutions, which entails careful consideration of $\tau$ .

Chaos is a dynamic system that is extremely sensitive to its initial conditions and parameters. As a result of its properties, such as ergodicity, randomness, and irregularity [56], chaos can be utilized in optimization to generate chaotic numbers between 0 and 1 instead of relying on a pseudo-random number generator. Studies have demonstrated that utilizing chaotic sequences for population initialization, selection, crossover, and mutation can have a significant impact on the algorithm’s overall performance, often resulting in superior outcomes compared to using pseudo-random numbers [57].

This paper utilizes a chaotic map called Tent to modify the value of $\tau$ during each iteration. The Tent map is chosen because it exhibits unstable dynamic behavior and has shown to perform the best in our preliminary tests. The description of the map is as follows:

$\displaystyle\tau[k+1]=\left\{\begin{array}{ll}\frac{\tau[k]}{\lambda},&0<\tau[k]\leqslant\lambda\\ \frac{1-\tau[k]}{1-\lambda},&\lambda<\tau[k]\leqslant 1\end{array}\right.$ (11)

The variable $k$ represents the index of the chaotic sequence, and thus $\tau[k]$ refers to the $k$ th element of the sequence. Additionally, $\lambda$ is a variable that ranges between 0 and 1. In this paper, the initial values of both $\tau$ and $\lambda$ are set to 0.7 (i.e., $\tau[0]=0.7$ and $\lambda=0.7$ ), which has been found to attain the optimal results in preliminary testing (refer to Section 5.2.1 for details) and also is consistent with [43, 58, 59]. Figure 1 displays the curve of the map for $\tau$ over 500 iterations.
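A minimal sketch of the Tent map in Eq. (11), together with the coefficient draw in Eq. (10), is given below. The function names are hypothetical. One caveat applies to the seed: applying Eq. (11) literally to $\tau[0]=\lambda=0.7$ gives 1 and then 0, and 0 lies outside the domain stated in Eq. (11). How that boundary case is handled is not specified here, so the sketch starts from an arbitrary illustrative seed of 0.37 instead.

```python
import random


def tent_step(tau, lam=0.7):
    """One application of the Tent map in Eq. (11)."""
    if 0.0 < tau <= lam:
        return tau / lam
    if lam < tau <= 1.0:
        return (1.0 - tau) / (1.0 - lam)
    raise ValueError("tau must lie in (0, 1]")


def tent_sequence(tau0, lam=0.7, steps=10):
    """Return [tau[0], tau[1], ..., tau[steps]]."""
    seq = [tau0]
    for _ in range(steps):
        seq.append(tent_step(seq[-1], lam))
    return seq


def draw_coefficients(tau, rng):
    """Eq. (10): s, a, f, c are random multiples of tau; e equals tau."""
    return {
        "s": 2.0 * rng.random() * tau,
        "a": 2.0 * rng.random() * tau,
        "f": 2.0 * rng.random() * tau,
        "c": 2.0 * rng.random() * tau,
        "e": tau,
    }


rng = random.Random(0)               # fixed seed only for reproducibility
seq = tent_sequence(0.37, steps=8)   # 0.37 is an assumed seed, not the paper's
weights = draw_coefficients(seq[-1], rng)
print([round(v, 4) for v in seq])
print(weights)
```

Because every coefficient is scaled by the same $\tau$, a single chaotic value per iteration controls the overall magnitude of all five behaviors.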

Figure 1.

The curve of $\tau$ using chaotic map.

Specifically, $\omega$ in Eq. (6) is calculated as follows:

$\displaystyle\omega=\psi-t\times\frac{0.5}{t_{\max}}$ (12)

where $\psi$ is the parameter that controls the size of the inertia weight $\omega$ , and $\omega\in[\psi-0.5,\psi]$ .
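As a small worked example of Eq. (12), the sketch below evaluates $\omega$ at the start, middle, and end of a run. The value $\psi=0.9$ and the function name are assumptions chosen for illustration only.

```python
def inertia_weight(t, t_max, psi=0.9):
    """Eq. (12): omega decreases linearly with the iteration counter t."""
    return psi - t * (0.5 / t_max)


for t in (0, 50, 100):
    print(t, inertia_weight(t, t_max=100))
```

With these assumed values, $\omega$ moves from $\psi$ at $t=0$ down to $\psi-0.5$ at $t=t_{\max}$, so the influence of the previous step vector in Eq. (6) shrinks as the search proceeds.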

4.1.2 EPD mechanism in CBDA

EPD is a technique that falls under the umbrella of self-organized criticality (SOC) [60]. SOC refers to a phenomenon where a small modification to a specific population can be amplified throughout the entire population and influence them in a complex manner without any external intervention [61]. EPD is a process inspired by evolutionary algorithms (EAs) that involves removing the worst solutions (i.e., individuals with the lowest fitness values) by repositioning them around the guide solutions (i.e., the selected individuals with the highest fitness values). In EPD, mutation and crossover operations are performed to eliminate the worst solutions, rather than simply replacing them with guide solutions. This approach can effectively maintain the diversity of the population. Hence, we have integrated EPD into CBDA to balance its exploitation and exploration capabilities.

The EPD mechanism in CBDA comprises the following fundamental steps: (i) selection of guide solutions, (ii) identification of the worst solutions, (iii) application of mutation to the guide solutions, (iv) execution of crossover between the worst solutions and the guide solutions, and (v) updating of the populations.

(a) The selection method for EPD

The selection method plays a crucial role in balancing the intensification and diversification aspects of EPD. There are several selection methods available, including best-based selection (BB), roulette wheel selection (RWS), linear rank-based selection (LRS), stochastic universal sampling (SUS), and others. However, not all selection mechanisms are appropriate for DA. Li et al. discovered in their study [40] that using LRS in DA yields better results. We have adopted a fitness-based selection method called exponential rank selection (ERS) as the selection method in EPD, inspired by their research. To maintain the diversity of populations in CBDA, we have introduced a novel approach called EPD_ERS.

Algorithm 4.1 outlines the primary steps of EPD_ERS, and additional information on ERS can be found in [62].

Algorithm: EPD_ERS
Input: $N$: the population size; $X_{i}$ ( $i=$ 1, 2, …, $N$ ): the populations (dragonflies); $X^{\textit{EPD\_ERS}}_{i}$ ( $i=$ 1, 2, …, $\frac{N}{2}$ ): the populations of EPD_ERS
Output: $X$
Sort the fitness values of populations $X_{i}$ ( $i=$ 1, 2, …, $N$ ) in ascending order and record the index of $X_{i}$ in an array $S$ ;
for $k=0:\frac{N}{2}$
    Calculate the chosen probability of the $k^{\text{th}}$ solution in $X_{S}$ by using Eq. (13);
    $X^{\textit{EPD\_ERS}}_{k}=$ the chosen probability;
end for
for $j=\frac{N}{2}:N$
    Select a solution $X^{E}$ from $X_{S}$ according to their probabilities (i.e., $X^{\textit{EPD\_ERS}}$ );
    Mutate $X^{E}$ by using Eq. (14) and generate $X^{\textit{NEW}}$ ;
    Cross $X^{\textit{NEW}}$ with $X_{S[j]}$ by using Eq. (16);
    Update the population $X_{S[j]}$ by using EPD_ERS algorithm;
end for
Return $X$ . // $X$ is the updated populations

As illustrated in Algorithm 4.1, the individuals are first sorted in ascending order of fitness value. Subsequently, the selection probability of each individual in the top half of the population is calculated as follows:

$\displaystyle P_{k}=\frac{c^{M-k}}{\sum_{g=1}^{M}c^{M-g}}$ (13)

where $M=\frac{N}{2}$ and $c$ is the base of the exponent, which ranges from 0 to 1. The denominator $\sum_{g=1}^{M}c^{M-g}$ normalizes the probabilities so that $\sum_{k=1}^{M}P_{k}=1$. Furthermore, for each individual in the worst 50% of the population, EPD_ERS selects a guide solution and generates a new solution by repositioning the individual around that guide solution.
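As a rough illustration of Eq. (13), the following Python sketch computes the ERS selection probabilities and draws one guide solution. The function names `ers_probabilities` and `select_guide` are hypothetical, the value of $c$ in any call is the caller's choice, and the mapping between rank $k$ and list position (index $k-1$ holds rank $k$) is an assumption made only for this example.

```python
import random


def ers_probabilities(m, c):
    """Return [P_1, ..., P_M] from Eq. (13): P_k = c^(M-k) / sum_g c^(M-g)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 < c < 1.0:
        raise ValueError("c is assumed to lie strictly between 0 and 1")
    weights = [c ** (m - k) for k in range(1, m + 1)]
    total = sum(weights)  # normalizing term, so the probabilities sum to 1
    return [w / total for w in weights]


def select_guide(ranked_top_half, c, rng=random):
    """Draw one guide solution; index k-1 is assumed to hold rank k of Eq. (13)."""
    probs = ers_probabilities(len(ranked_top_half), c)
    return rng.choices(ranked_top_half, weights=probs, k=1)[0]
```

Because $c<1$, the weight $c^{M-k}$ grows with $k$, so under Eq. (13) as written the entry with rank $M$ receives the largest selection probability.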

Figure 2.

The mechanism of EPD in CBDA.

(b) The mutation and crossover operation for EPD

Once the guide solutions have been identified for each member in the worst half of the population, mutation and crossover operations are employed to maintain population diversity. This approach improves EPD_ERS’s ability to explore a broader range of the solution space in CBDA. Furthermore, for a solution of dimension $N_{\textit{dim}}$, it is unnecessary to modify every dimension; it is more appropriate to let the mutation and crossover operators change only some of them. In EPD_ERS, the mutation operator is defined as follows [29].

$\displaystyle x^{d}=\left\{\begin{array}{ll}1-x^{d},&\eta\geqslant\textit{rand}(0,1)\\ x^{d},&\textit{otherwise}\end{array}\right.$ (14)

where $x^{d}$ represents the $d$ th dimension of a solution and $\eta$ is the mutation rate, which is defined as follows.

$\displaystyle\eta=0.8-\frac{0.79}{1+{e^{5-\frac{10t}{t_{\max}}}}}$ (15)

The present guide solution, $X^{E}$ , is subjected to mutation using Eq. (14) to create a new solution, $X^{\textit{NEW}}$ .
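The mutation step of Eqs. (14) and (15) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation; the names `mutation_rate` and `mutate` are hypothetical, and solutions are assumed to be plain lists of 0/1 integers.

```python
import math
import random


def mutation_rate(t, t_max):
    """Mutation rate eta from Eq. (15); decreases as iteration t approaches t_max."""
    return 0.8 - 0.79 / (1.0 + math.exp(5.0 - 10.0 * t / t_max))


def mutate(solution, eta, rng=random):
    """Apply Eq. (14): flip bit x^d when eta >= rand(0, 1), otherwise keep it."""
    return [1 - x if eta >= rng.random() else x for x in solution]
```

Under Eq. (15), $\eta$ starts close to 0.8 at $t=0$, passes through about 0.405 at $t=t_{\max}/2$, and falls to roughly 0.015 at $t=t_{\max}$, so guide solutions are perturbed heavily early on and only slightly near the end of the run.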

The last step of EPD_ERS involves creating new solutions that lie between the worst solutions and guide solutions. To achieve this, the crossover operator is used with the aim of exploiting EPD. The crossover operator used in EPD_ERS is defined as follows, based on [63].

$\displaystyle x^{d}=\left\{\begin{array}{ll}x_{a}^{d},&\textit{rand}(0,1)\leqslant 0.5\\ x_{b}^{d},&\textit{otherwise}\end{array}\right.$ (16)

$x^{d}$ denotes the $d$ th dimension of a solution in the above equation, while $x_{a}^{d}$ and $x_{b}^{d}$ represent the two dimensions of distinct solutions that are being crossed. Additionally, $x_{a}\in[X^{\frac{N}{2}},X^{N}]$ and $x_{b}\in X^{\textit{NEW}}$ .
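Eq. (16) is a uniform crossover: each dimension of the child is copied from one of the two parents with equal probability. A minimal Python sketch is given below; the function name `uniform_crossover` is hypothetical, and both parents are assumed to be equal-length lists of 0/1 integers.

```python
import random


def uniform_crossover(x_a, x_b, rng=random):
    """Apply Eq. (16): take x_a^d when rand(0, 1) <= 0.5, otherwise x_b^d."""
    if len(x_a) != len(x_b):
        raise ValueError("both parents must have the same dimension")
    return [a if rng.random() <= 0.5 else b for a, b in zip(x_a, x_b)]
```

Here `x_a` would be one of the worst solutions and `x_b` the mutated guide solution $X^{\textit{NEW}}$, so the child agrees with both parents wherever they agree and mixes them elsewhere.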

As a result, we have successfully carried out all the steps of the EPD mechanism in CBDA. The proposed enhancements of EPD in CBDA are depicted in Fig. 2. The figure illustrates the process of selecting guide solutions using the ERS method, choosing the worst solutions in populations, applying the mutation operation to guide solutions, crossing the two solutions, and ultimately repositioning the worst solutions around the guide solutions. By doing so, we have achieved a balance between the exploitation and exploration capabilities of CBDA, making it a potentially superior algorithm for solving feature selection problems compared to other methods.

4.1.3 Binarization strategy

Conventional DA was designed to tackle continuous optimization problems, but feature selection involves binary dimensions in the form of discrete variables. Therefore, DA cannot be directly employed for feature selection. To address binary optimization problems without modifying their structure, transfer functions are frequently utilized [64]. There are three primary types of binarization methods [30]: two-step binarization, operator transformation, and K-means Transition Algorithm.

In this study, we employ a $V$ -shape transfer function, which falls under the category of two-step binarization and was initially introduced in [65]. The $V$ -shape transfer function can be recursively defined as follows.

$\displaystyle x_{t+1}=\left\{\begin{array}{ll}1-x_{t},&\textit{rand}(0,1)\leqslant T(\Delta{X}_{t+1})\\ x_{t},&\textit{otherwise}\end{array}\right.$ (17)

where $x_{t+1}$ represents the binary solution for each dimension at the $(t+1)$ th iteration. Additionally, $T(\Delta{X}_{t})$ denotes the transfer rate, which can be computed using the following formula.

$\displaystyle T(\Delta{X}_{t})=\left|1-\frac{2}{1+{e^{\Delta{X}_{t}}}}\right|$ (18)

where $\Delta{X}_{t}$ indicates the step factor in Eq. (7). The curve of $T(\Delta{X}_{t})$ against $\Delta{X}_{t}$ is illustrated in Fig. 3.

Figure 3.

The curve of $T(\Delta{X}_{t})$ against $\Delta{X}_{t}$ .

Accordingly, in each iteration, the binarization technique has been integrated into CBDA to convert continuous solutions to binary ones without altering the structure of DA. Additionally, by integrating the chaotic map, EPD mechanism, ERS method, mutation and crossover operation, CBDA can effectively address feature selection problems.
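The two-step binarization of Eqs. (17) and (18) can be sketched as follows. The function names `v_transfer` and `binarize` are hypothetical. The sketch evaluates Eq. (18) through the identity $\left|1-\frac{2}{1+e^{x}}\right|=\left|\tanh(x/2)\right|$, which avoids overflow for large step values; this is a numerical convenience and not something the paper specifies.

```python
import math
import random


def v_transfer(delta_x):
    """Transfer rate T from Eq. (18), computed as |tanh(delta_x / 2)|."""
    return abs(math.tanh(delta_x / 2.0))


def binarize(x_t, delta_x, rng=random):
    """Apply Eq. (17) per dimension: flip the bit when rand(0, 1) <= T(step)."""
    if len(x_t) != len(delta_x):
        raise ValueError("position and step vectors must have the same length")
    return [1 - x if rng.random() <= v_transfer(d) else x
            for x, d in zip(x_t, delta_x)]
```

A step value near zero gives a transfer rate near zero, so the bit is almost always kept, whereas a large step in either direction makes a flip nearly certain.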

4.2 Complexity analysis

This section analyzes the complexity of CBDA. Let the maximum number of iterations be $t_{\max}$, the population size be $N$, and the solution dimension be $N_{\textit{dim}}$. As can be seen from Algorithm 4.1, there are three nested loops in CBDA. The complexity of conventional DA is $\mathcal{O}(t_{\max}\times N)$, and CBDA requires only $N_{\textit{dim}}$ additional calculations per solution for the binarization operation. Therefore, the complexity of our proposed algorithm is $\mathcal{O}(t_{\max}\times N\times N_{\textit{dim}})$. However, CBDA may take longer than DA in practice because it involves extra operations, such as chaotic initialization and re-sorting. The exact computation time is difficult to predict and is discussed in the following sections.

Table 1
Benchmark data sets

No.	Data set	No. of features	No. of instances
1	Arrhythmia	278	452
2	Breastcancer	10	699
3	BreastEW	30	569
4	Dermatology	34	366
5	Diabets	8	768
6	Exactly	14	1000
7	Exactly-II	14	1000
8	Glass	9	214
9	HeartEW	14	294
10	Hepatitis	19	142
11	Hillvalley	100	606
12	Ionosphere	34	351
13	Krvskp	36	3196
14	Lung	326	72
15	Lung-Cancer	56	32
16	Lymphography	18	148
17	M-of-N	14	1000
18	Movementlibras	90	360
19	Semeion	265	1593
20	Sonar	60	208
21	Spect	22	267
22	Tic-tac-toe	9	958
23	Vowel	10	901
24	WDBC	31	569

Figure 4.

An example of binary representation of dragonfly.

Table 2

Parameter tuning results of $\psi$ on various data sets

Data set	Indicators	$\psi=$ 0.6	$\psi=$ 0.7	$\psi=$ 0.8	$\psi=$ 0.9	$\psi=$ 1
Hillvalley	Accuracy	0.6153	0.5906	0.5890	0.5903	0.5909
	Features	44.3	44.8333	42.9333	40.8667	44.1333
	Fitness	0.3853	0.4098	0.4112	0.4097	0.4094
	Time	87.2180	228.9810	179.9838	179.6043	178.3977
Movementlibras	Accuracy	0.8259	0.8065	0.8062	0.8041	0.8044
	Features	39.9667	43.2	43.1	42.7667	44.6
	Fitness	0.1768	0.1964	0.1966	0.1987	0.1986
	Time	64.5774	110.6099	87.3613	88.0809	88.1602
Sonar	Accuracy	0.9221	0.8862	0.8884	0.8871	0.8838
	Features	22.2333	26.4667	29.7667	28.4667	29.1
	Fitness	0.0809	0.1171	0.1154	0.1165	0.1199
	Time	57.3624	59.7972	72.2880	71.9469	72.0044

Table 3

Parameter tuning results of $\tau[0]$ and $\lambda$ in the Tent map

$\tau[0]$	Indicators	$\lambda=$ 0.1	$\lambda=$ 0.3	$\lambda=$ 0.5	$\lambda=$ 0.7	$\lambda=$ 0.9
0.1	Accuracy	0.9154	0.8842	0.9029	0.8833	0.8875
	Features	125.7333	136.5	126.8333	142.4333	146.5333
	Fitness	0.0876	0.1189	0.1000	0.1199	0.1159
	Time	54.9148	58.6204	53.5672	56.9378	53.1728
0.3	Accuracy	0.8929	0.9163	0.9108	0.8929	0.8892
	Features	147.3333	125.7667	126.1	130.9	138.6333
	Fitness	0.1105	0.0868	0.0922	0.1100	0.1140
	Time	63.1808	46.2257	52.9679	66.8107	61.8252
0.5	Accuracy	0.8879	0.8846	0.91	0.8863	0.8875
	Features	140.9	140.3667	122.3667	139.1	136.3
	Fitness	0.1153	0.1186	0.0929	0.1169	0.1156
	Time	61.5629	65.6223	46.9615	51.2486	67.8870
0.7	Accuracy	0.8913	0.8871	0.9079	0.9183	0.8829
	Features	138.0667	139.7	122.4333	126.3667	139.7333
	Fitness	0.1119	0.1161	0.0949	0.0847	0.1202
	Time	48.1014	50.6427	61.2871	47.7112	61.4752
0.9	Accuracy	0.8854	0.8854	0.9075	0.8842	0.9158
	Features	141.5333	133.2667	120.3667	141	131.7667
	Fitness	0.1178	0.1175	0.0953	0.1190	0.0874
	Time	54.8142	55.4756	54.6168	53.9002	50.1134

4.3 Solution representation of feature selection with CBDA

The solution obtained from CBDA is a binary vector of length $N_{\textit{dim}}$ , where $N_{\textit{dim}}$ represents the number of features in the given data set. Each dimension of the solution can either be 1 (representing that the feature is selected) or 0 (representing that the feature is not selected). For a given data set, the number of feasible solutions is $2^{N_{\textit{dim}}}$ , which implies that the solution space can be enormous if this data set contains numerous features, making a complete search impractical. Therefore, each dragonfly is tasked with finding its optimal solution within the solution space during the execution of CBDA. An example of the binary representation of a dragonfly is shown in Fig. 4. If the population size is $N$ , the population of the proposed CBDA can be represented as follows:

$\displaystyle\textit{pop}=\left[\begin{array}{c}X_{1}\\ X_{2}\\ \vdots\\ X_{i}\\ \vdots\\ X_{N}\end{array}\right]=\left[\begin{array}{cccc}D_{1}^{1}&D_{2}^{1}&\cdots&D_{N_{\textit{dim}}}^{1}\\ D_{1}^{2}&D_{2}^{2}&\cdots&D_{N_{\textit{dim}}}^{2}\\ \vdots&\vdots&&\vdots\\ D_{1}^{i}&D_{2}^{i}&\cdots&D_{N_{\textit{dim}}}^{i}\\ \vdots&\vdots&&\vdots\\ D_{1}^{N}&D_{2}^{N}&\cdots&D_{N_{\textit{dim}}}^{N}\end{array}\right]$ (19)

where $X_{i}=(D_{1}^{i},D_{2}^{i},D_{3}^{i},\ldots,D_{N_{\textit{dim}}}^{i})$ is the $i$ th individual (dragonfly) of the population.
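To make the binary representation concrete, the short Python sketch below decodes a solution vector into the indices of the selected features and applies it to a data matrix stored as a list of rows. The helper names `selected_indices` and `apply_mask` and the toy values are hypothetical.

```python
def selected_indices(solution):
    """Return the 0-based indices of the features whose bit equals 1."""
    return [d for d, bit in enumerate(solution) if bit == 1]


def apply_mask(rows, solution):
    """Keep only the selected feature columns of every instance (row)."""
    idx = selected_indices(solution)
    return [[row[d] for d in idx] for row in rows]


def search_space_size(n_dim):
    """Number of candidate feature subsets for n_dim features: 2 ** n_dim."""
    return 2 ** n_dim
```

For example, the solution `[1, 0, 1, 1, 0]` keeps the first, third, and fourth features. Even a data set with only 14 features already admits $2^{14}=16384$ candidate subsets, which is why an exhaustive search quickly becomes impractical.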

Table 4

The setups of the comparative algorithms in the ablation experiments

No.	Algorithm	Chaos	EPD mechanism	$V$ shape transfer function
1	CBDA	$\checkmark$	$\checkmark$	$\checkmark$
2	CBDA-V1	$\times$	$\checkmark$	$\checkmark$
3	CBDA-V2	$\checkmark$	$\times$	$\checkmark$
4	CBDA-V3	$\checkmark$	$\checkmark$	$\times$

Table 5

Fitness function values for various versions of CBDA

Data set	CBDA		CBDA-V1		CBDA-V2		CBDA-V3
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	0.3199	0.0123	0.3247	0.0083	0.3383	0.0087	0.3501	0.0062
Breastcancer	0.0272	0	0.0272	0	0.0290	0.0021	0.0274	0.0005
BreastEW	0.0389	0.0001	0.0391	0.0005	0.0436	0.0033	0.0430	0.0016
Dermatology	0.0132	0.0022	0.0129	0.0031	0.0182	0.0036	0.0186	0.0020
Diabets	0.2557	0	0.2557	0	0.2558	0.0002	0.2557	0
Exactly	0.0046	0	0.0046	0	0.0183	0	0.0234	0.0278
Exactly-II	0.2123	0.0063	0.2118	0.0044	0.2234	0.0097	0.2110	0.0037
Glass	0.3071	0	0.3071	0	0.3102	0.0064	0.3071	0
HeartEW	0.1527	0.0060	0.1545	0.0075	0.1643	0.0084	0.1591	0.0061
Hepatitis	0.2690	0.0079	0.2708	0.0094	0.2852	0.0138	0.2878	0.0067
Hillvalley	0.3853	0.0113	0.3868	0.0093	0.3908	0.0120	0.4045	0.0069
Ionosphere	0.0936	0.0134	0.0991	0.0102	0.1130	0.0137	0.1130	0.0137
Krvskp	0.0245	0.0034	0.0258	0.0030	0.0349	0.0068	0.0392	0.0040
Lung	0.0847	0.0199	0.0939	0.0162	0.1038	0.0222	0.1240	0.0098
Lung-cancer	0.0268	0.0009	0.0269	0.0006	0.0306	0.0078	0.0391	0.0120
Lymphography	0.5617	0.0088	0.5623	0.0085	0.5773	0.0168	0.5705	0.0107
M-of-N	0.0046	0	0.0046	0	0.0061	0.0020	0.0136	0.0109
Movementlibras	0.1768	0.0068	0.1771	0.0069	0.1830	0.0079	0.1961	0.0035
Semeion	0.0248	0.0029	0.0264	0.0024	0.0289	0.0020	0.0316	0.0012
Sonar	0.0809	0.0121	0.0851	0.0139	0.1037	0.0130	0.1120	0.0070
Spect	0.2681	0.0098	0.2641	0.0125	0.2826	0.0128	0.2786	0.0081
Tic-Tac-Toe	0.1564	0	0.1564	0	0.1821	0.0281	0.1564	0
Vowel	0.0590	0	0.0590	0	0.0598	0.0016	0.0590	0
WDBC	0.0389	0.0001	0.0395	0.0017	0.0435	0.0037	0.0439	0.0018
Rank (F-test)	1.4167		1.8333		3.4375		3.3125
Wins	16.82		4.82		0		2.32

Table 6

$p$ -values of the Wilcoxon rank-sum test for the fitness values of CBDA versus its ablated versions (values with $p\leqslant 0.05$ are significant and shown in bold)

Data set	CBDA-V1	CBDA-V2	CBDA-V3
Arrhythmia	2.1153E-01	4.6836E-08	5.2251E-11
Breastcancer	1.0000E+00	5.0000E-06	4.0085E-02
BreastEW	3.4536E-01	7.9788E-12	4.2808E-11
Dermatology	5.4886E-01	8.3892E-07	7.5513E-10
Diabets	1.0000E+00	3.1731E-01	1.0000E+00
Exactly	1.0000E+00	2.0000E-06	1.3942E-11
Exactly-II	5.5136E-01	6.0000E-06	3.0823E-01
Glass	1.0000E+00	6.2800E-04	1.0000E+00
HeartEW	3.8198E-01	7.2131E-07	2.3900E-04
Hepatitis	4.8633E-01	2.0000E-06	2.0003E-10
Hillvalley	6.0997E-01	1.6015E-01	4.9882E-09
Ionosphere	6.7819E-02	5.0000E-06	4.0000E-06
Krvskp	7.8498E-02	1.2556E-08	4.2855E-11
Lung	9.3295E-02	3.4600E-04	1.3917E-10
Lung-cancer	4.0214E-01	2.0000E-06	1.1741E-10
Lymphography	7.4868E-01	1.1500E-04	3.2700E-04
M-of-N	1.0000E+00	1.3700E-04	5.5824E-10
Movementlibras	9.4695E-01	5.1990E-03	4.2697E-11
Semeion	1.6276E-02	4.1082E-07	5.7660E-11
Sonar	1.8328E-01	7.6695E-08	6.3499E-11
Spect	6.2524E-01	7.0381E-08	4.1022E-08
Tic-Tac-Toe	1.0000E+00	2.6000E-05	1.0000E+00
Vowel	1.0000E+00	1.0575E-02	1.0000E+00
WDBC	1.3060E-02	3.7673E-12	3.7841E-12

Figure 5.

Convergence curves for various versions of CBDA on the first 12 data sets.

Figure 6.

Convergence curves for various versions of CBDA on the last 12 data sets.

Table 7

Classification accuracies for various versions of CBDA

Data set	CBDA		CBDA-V1		CBDA-V2		CBDA-V3
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	0.6812	0.0124	0.6765	0.0084	0.6628	0.0087	0.6530	0.0040
Breastcancer	0.9786	0	0.9786	0	0.9768	0.0021	0.9785	0.0004
BreastEW	0.9614	0	0.9613	0.0003	0.9584	0.0033	0.9598	0.0008
Dermatology	0.9922	0.0022	0.9925	0.0031	0.9876	0.0039	0.9861	0.0033
Diabets	0.7468	0	0.7468	0	0.7468	0.0000	0.7468	0
Exactly	1	0	1.0000	0	0.9867	0.0000	0.9814	0.0332
Exactly-II	0.7869	0.0051	0.7875	0.0033	0.7777	0.0081	0.7880	0.0027
Glass	0.6955	0	0.6955	0.0000	0.6923	0.0069	0.6955	0.0000
HeartEW	0.8500	0.0059	0.8479	0.0077	0.8382	0.0088	0.8483	0.0056
Hepatitis	0.7318	0.0076	0.7298	0.0094	0.7158	0.0138	0.7196	0.0058
Hillvalley	0.6153	0.0114	0.6137	0.0092	0.6097	0.0115	0.5966	0.0046
Ionosphere	0.9078	0.0132	0.9027	0.0099	0.8888	0.0131	0.8832	0.0062
Krvskp	0.9806	0.0038	0.9791	0.0033	0.9705	0.0067	0.9641	0.0041
Lung	0.9183	0.0202	0.9092	0.0164	0.8996	0.0224	0.8867	0.0109
Lung-cancer	0.9750	0	0.9750	0	0.9725	0.0076	0.9725	0.0076
Lymphography	0.4369	0.0091	0.4362	0.0090	0.4211	0.0172	0.4311	0.0111
M-of-N	1	0	1	0	0.9988	0.0017	0.9916	0.0142
Movementlibras	0.8259	0.0069	0.8257	0.0067	0.8197	0.0079	0.8098	0.0047
Semeion	0.9799	0.0028	0.9781	0.0024	0.9757	0.0022	0.9733	0.0009
Sonar	0.9221	0.0120	0.9179	0.0140	0.9000	0.0131	0.8948	0.0100
Spect	0.7337	0.0098	0.7378	0.0125	0.7200	0.0126	0.7251	0.0071
Tic-Tac-Toe	0.8521	0	0.8521	0	0.8250	0.0295	0.8502	0.0103
Vowel	0.9495	0	0.9495	0	0.9486	0.0019	0.9471	0.0027
WDBC	0.9614	0	0.9609	0.0017	0.9585	0.0034	0.9601	0.0008
Rank (F-test)	1.3958		1.8542		3.4167		3.3333
Wins	16.58		5.58		0.25		1.58

Table 8

$p$ -values of the Wilcoxon rank-sum test for the classification accuracies of CBDA versus its ablated versions (values with $p\leqslant 0.05$ are significant and shown in bold)

Data set	CBDA-V1	CBDA-V2	CBDA-V3
Arrhythmia	2.3542E-01	7.0694E-08	4.8930E-11
Breastcancer	1.0000E+00	5.0000E-06	4.0085E-02
BreastEW	3.1731E-01	1.0235E-09	6.8781E-09
Dermatology	5.9734E-01	3.0000E-06	3.2406E-08
Diabets	1.0000E+00	1.0000E+00	1.0000E+00
Exactly	1.0000E+00	2.0000E-06	1.3942E-11
Exactly-II	8.9538E-01	1.1000E-05	7.4033E-02
Glass	1.0000E+00	5.3200E-03	1.0000E+00
HeartEW	3.9012E-01	9.2792E-07	1.4600E-04
Hepatitis	4.3324E-01	2.0000E-06	6.0024E-10
Hillvalley	7.0548E-01	1.6374E-01	1.9778E-09
Ionosphere	7.0025E-02	4.0000E-06	2.0000E-06
Krvskp	6.3352E-02	2.9226E-08	5.7065E-11
Lung	8.4177E-02	1.7510E-03	4.5386E-10
Lung-cancer	1.0000E+00	7.8040E-02	1.2300E-04
Lymphography	6.7858E-01	5.6000E-05	8.2500E-04
M-of-N	1.0000E+00	1.3700E-04	5.5824E-10
Movementlibras	9.4640E-01	4.7650E-03	4.1574E-11
Semeion	1.1927E-02	4.5275E-07	1.4767E-10
Sonar	2.6414E-01	1.6002E-07	6.9814E-11
Spect	5.4193E-01	6.8864E-08	7.4584E-08
Tic-Tac-Toe	1.0000E+00	2.6000E-05	1.0000E+00
Vowel	1.0000E+00	1.0563E-02	1.0000E+00
WDBC	4.0174E-02	3.3521E-08	6.4100E-12

Table 9

CPU running time for various versions of CBDA (unit: seconds)

Data set	CBDA		CBDA-V1		CBDA-V2		CBDA-V3
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	132.7140	24.1561	123.7801	68.4973	128.7005	47.2411	187.7014	36.1294
Breastcancer	111.1555	12.1363	109.5283	2.2165	111.5254	4.6058	98.5639	4.2633
BreastEW	122.4751	2.5040	109.4492	12.0065	95.1268	5.9881	90.1860	0.5885
Dermatology	64.4641	2.7943	110.6360	31.3739	81.3629	15.6498	71.7093	15.4222
Diabets	127.5080	3.2313	96.8591	7.0329	103.6624	10.6824	116.8891	1.5657
Exactly	136.8022	2.0120	146.0902	9.6531	150.5411	11.4694	154.4473	6.9173
Exactly-II	129.6584	6.8367	125.7781	5.7811	130.2918	13.6554	151.9577	10.5933
Glass	54.7229	2.6522	62.8163	0.8887	63.1975	1.8355	58.7629	0.8570
HeartEW	67.8323	2.2770	63.7590	0.5888	63.6692	1.2262	65.5760	0.5079
Hepatitis	49.7728	0.3034	54.1268	0.4118	53.1787	1.6929	50.0605	0.3482
Hillvalley	87.2180	0.2941	90.6468	4.5637	89.7113	4.4418	89.9374	0.9297
Ionosphere	74.6088	0.7857	73.6253	0.8471	75.3173	3.2480	75.3173	3.2480
Krvskp	671.4306	22.9274	850.9055	99.0412	757.9151	51.5596	774.4277	42.7218
Lung	47.7112	0.3046	49.2094	0.4092	47.6832	0.8284	49.3515	0.3867
Lung-cancer	37.2211	0.7502	36.2836	0.5913	36.6555	1.5731	36.0222	0.1334
Lymphography	48.7474	0.2911	48.3308	0.3119	48.2442	0.5290	48.9554	0.2336
M-of-N	156.0127	16.2207	138.8007	0.9791	141.0121	3.2649	149.5911	3.1781
Movementlibras	64.5774	0.3226	65.6866	0.3177	64.7146	0.3928	66.1732	0.5143
Semeion	306.6203	4.3851	357.1970	60.8150	361.4434	50.2149	350.3256	37.6053
Sonar	57.3624	1.5805	53.9277	0.2632	53.5336	0.3505	53.6316	0.4761
Spect	61.2698	0.7403	60.6576	0.4512	61.2580	1.7826	59.2015	0.2597
Tic-Tac-Toe	175.0410	5.8974	156.8271	7.2825	156.9415	9.2693	145.7773	9.1107
Vowel	120.3700	1.0681	112.3534	1.0494	121.6968	1.9847	112.5386	0.6607
WDBC	90.5595	0.9031	89.4882	0.9403	90.5934	2.7553	90.8353	1.2852
Rank (F-test)	2.5		2.2917		2.5625		2.6458
Wins	8		7		4		5

Table 10

Number of selected features for various versions of CBDA

Data set	CBDA		CBDA-V1		CBDA-V2		CBDA-V3
	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	118.8	13.8125	122.6	10.6499	122.3	22.3578	117.2333	42.7549
Breastcancer	6	0	6	0	6.0333	0.6687	6	0
BreastEW	2.2	0.4068	2.3667	0.6687	7.1	2.0736	10.2	2.9054
Dermatology	18.6	2.5134	18.5667	2.2695	20.2	3.7637	21.5	2.7637
Diabets	4	0	4	0	4.0333	0.1826	4	0
Exactly	6	0	6	0	6.6667	0.6609	7.1	0.48066
Exactly-II	1.7333	1.6595	1.8333	1.4875	4.3333	2.8567	1.3	0.9879
Glass	5	0	5	0	5	0.5252	5	0
HeartEW	5.4333	1.1943	5.1	0.8847	5.3667	1.3257	5.3333	1.4700
Hepatitis	6.5	1.0748	6.2	1.0954	7.3333	1.0283	9.2	1.6060
Hillvalley	44.3	4.3004	44.2667	5.0305	43.9667	8.9923	30.8333	22.2727
Ionosphere	7.6667	2.2489	9.2333	2.9088	9.7667	3.2022	9.7667	3.2022
Krvskp	19	2.6392	18.5333	2.0126	20.3	2.6801	20.7	2.4090
Lung	126.3667	17.4780	128.6	13.4923	142.6333	18.8268	155.5	40.5520
Lung-cancer	11.6333	5.0205	12.1667	3.1741	19.1	4.9852	24.8333	5.6268
Lymphography	7.6667	1.6884	7.5	1.9073	7.5667	1.8323	8.8	1.5844
M-of-N	6	0	6	0	6.4	0.4983	7.0667	0.6397
Movementlibras	39.9667	4.1893	41.3333	5.1215	40.5	6.0898	47.4667	6.8919
Semeion	129.7	10.7034	127.4667	10.2107	129.7	18.0137	143.0667	19.3122
Sonar	22.2333	4.5081	23.1667	4.3078	28.1	5.6223	29.1333	6.3449
Spect	9.9	1.9538	9.9	1.4227	11.8333	2.4925	12.0333	2.0254
Tic-Tac-Toe	9	0	9	0	8.0333	1.0662	9	0
Vowel	9	0	9	0	8.9333	0.6397	9	0
WDBC	2.1333	0.3457	2.5	0.6823	7.5	1.9608	10.7333	3.4435
Rank (F-test)	1.9792		2		2.8125		3.2083
Wins	9.41		8.41		2.25		3.91

Table 11

Descriptions of the comparative algorithms in this paper

Algorithm	Full name	Year	Description	Parameter settings
IBDA	Improved binary dragonfly algorithm	2020	IBDA introduces a novel evolutionary population dynamics (EPD) mechanism, an adaptive crossover (AC) factor and a binary strategy to improve the performance of conventional DA and make it more suitable for feature selection.	$w=$ [0.6, 0.1], $s=$ [0.2, 0], $a=$ [0.2, 0], $c=$ [0.2, 0], $f=$ [0.2, 0], $e=$ [0, 0.1]
BDA	Binary dragonfly algorithm	2018	BDA integrates eight transfer functions into DA to leverage the impact of the step vector on balancing exploration and exploitation. Therefore, BDA can be regarded as the fundamental algorithm for CBDA and IBDA. In this paper, we adopt the time-varying S-shaped transfer function, which performed best in their research, to conduct the numerical experiments.	$w=$ [0.9, 0.4], $s=$ [0.2, 0], $a=$ [0.2, 0], $c=$ [0.2, 0], $f=$ [0.2, 0], $e=$ [0, 0.1]
BWO	Beluga whale optimization	2022	BWO is inspired by the behaviors of beluga whales in nature and contains three search phases: exploration, exploitation, and whale fall. Note that we employ the $S$ -shape transfer function (the same as Eq. (20)) in BWO to enable it to solve the discrete feature selection problem.	$W_{f}=$ [0.1, 0.05]
BES	Bald eagle search	2020	BES mimics the hunting strategy and intelligent social behaviour of bald eagles as they search for fish. BES can be divided into three stages: selecting a place, searching in space, and swooping. To make BES suitable for solving the feature selection problem, we integrate the $S$ -shape transfer function (the same as Eq. (20)) into all three stages of BES.	$d=$ 10, $v=$ 2, $G=$ 1.5, $m_{1}=m_{2}=$ 2
CNN-CNN	Dual Convolutional Neural Network	2023	CNN-CNN is a combination of two convolutional neural networks, wherein the first CNN model is leveraged to select the significant features that contribute to the given optimization problem. The second CNN utilizes the features identified by the first CNN to build a robust detection model.	$lr=$ 0.01 (learning rate), $bs=$ 64 (batch size)

Table 12

Fitness function values for CBDA versus other optimizers

Data set	CBDA		IBDA		BDA		BWO		BES		CNN-CNN
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	0.320	0.012	0.321	0.010	0.334	0.013	0.348	0.004	0.349	0.004	0.344	0.009
Breastcancer	0.027	0	0.027	0	0.029	0.002	0.027	0.000	0.027	0.001	0.027	0
BreastEW	0.039	1E-04	0.039	4E-04	0.042	0.003	0.043	1E-03	0.043	1E-03	0.041	0.003
Dermatology	0.013	0.002	0.014	0.003	0.019	0.006	0.019	0.003	0.018	0.003	0.013	0.003
Diabets	0.256	0	0.256	0	0.258	0.004	0.256	0	0.256	0	0.256	0
Exactly	0.005	0	0.005	0	0.049	0.107	0.024	0.033	0.009	0.003	0.005	0
Exactly-II	0.212	0.006	0.213	0.004	0.215	0.008	0.211	0.003	0.214	0.006	0.212	0.005
Glass	0.307	0	0.307	0.002	0.313	0.009	0.307	0.000	0.307	2.0E-04	0.307	0
HeartEW	0.153	0.006	0.153	0.007	0.163	0.010	0.154	0.005	0.155	0.006	0.153	0.006
Hepatitis	0.269	0.008	0.270	0.009	0.287	0.015	0.282	0.006	0.285	0.008	0.272	0.009
Hillvalley	0.385	0.011	0.378	0.010	0.391	0.013	0.404	0.005	0.408	0.004	0.390	0.009
Ionosphere	0.094	0.013	0.095	0.014	0.105	0.014	0.118	0.006	0.125	0.006	0.108	0.014
Krvskp	0.024	0.003	0.024	0.003	0.040	0.011	0.041	0.004	0.037	0.003	0.028	0.003
Lung	0.085	0.020	0.090	0.024	0.106	0.022	0.117	0.011	0.121	0.010	0.130	0.014
Lung-cancer	0.027	0.001	0.026	0.001	0.031	0.009	0.031	0.007	0.032	0.007	0.028	0.005
Lymphography	0.562	0.009	0.565	0.011	0.575	0.018	0.568	0.011	0.568	0.006	0.565	0.011
M-of-N	0.005	0	0.005	0	0.015	0.039	0.014	0.015	0.007	0.002	0.005	0
Movementlibras	0.177	0.007	0.174	0.009	0.185	0.008	0.193	0.005	0.193	0.004	0.182	0.006
Semeion	0.025	0.003	0.025	0.002	0.028	0.004	0.031	0.001	0.031	0.001	0.029	0.002
Sonar	0.081	0.012	0.083	0.010	0.105	0.018	0.108	0.010	0.108	0.007	0.093	0.013
Spect	0.268	0.010	0.262	0.009	0.282	0.012	0.277	0.007	0.274	0.008	0.267	0.009
Tic-Tac-Toe	0.156	0	0.156	0	0.196	0.029	0.158	0.010	0.156	0	0.156	0
Vowel	0.059	0	0.059	0	0.063	0.007	0.061	0.002	0.059	0.001	0.059	0
WDBC	0.039	1E-04	0.039	6E-04	0.042	0.004	0.043	9E-04	0.044	2E-03	0.042	4E-03
Rank (F-test)	1.75		2.1875		4.9167		4.6458		4.7083		2.7917
Wins	11.5		6		0		1.5		0		5

Table 13

$p$ -values of the Wilcoxon rank-sum test for the fitness values of CBDA versus other optimizers (values with $p\leqslant 0.05$ are significant and shown in bold)

Data set	IBDA	BDA	BWO	BES	CNN-CNN
Arrhythmia	9.5873E-01	2.3200E-04	3.1732E-11	2.8701E-11	9.3085E-10
Breastcancer	1.0000E+00	3.1576E-07	1.5377E-01	2.0676E-02	1.0000E+00
BreastEW	1.5287E-01	6.1551E-09	5.5169E-12	5.4535E-12	1.0000E-06
Dermatology	5.9914E-01	2.0000E-06	2.1499E-08	2.1499E-08	6.3555E-01
Diabets	1.0000E+00	2.9800E-04	1.0000E+00	1.0000E+00	1.0000E+00
Exactly	1.0000E+00	1.3900E-04	1.1312E-07	1.4260E-08	1.0000E+00
Exactly-II	2.5360E-01	2.0814E-01	4.5880E-01	4.0717E-02	6.7568E-01
Glass	3.1731E-01	2.7000E-05	1.0000E+00	1.0000E+00	1.0000E+00
HeartEW	8.6208E-01	6.4000E-05	7.3557E-02	1.6384E-02	9.9271E-01
Hepatitis	8.6148E-01	2.0000E-06	9.9286E-08	2.9871E-08	1.0278E-01
Hillvalley	1.1951E-02	1.3731E-01	1.9504E-09	1.4600E-10	8.7697E-02
Ionosphere	9.4106E-01	1.4030E-03	2.2137E-09	2.1384E-10	3.0900E-04
Krvskp	6.0481E-01	1.1202E-09	3.1732E-11	2.8683E-11	1.0850E-03
Lung	3.9935E-01	2.5600E-03	1.3441E-09	1.3885E-10	1.3876E-10
Lung-cancer	7.0900E-04	2.1280E-02	5.7634E-09	8.9452E-11	1.6026E-02
Lymphography	3.2570E-01	3.4700E-04	2.1700E-03	5.8400E-04	1.2050E-01
M-of-N	1.0000E+00	2.6650E-03	4.4164E-08	1.5400E-09	1.0000E+00
Movementlibras	4.4197E-01	3.0900E-04	5.2673E-10	1.0366E-10	1.3541E-02
Semeion	6.1000E-01	2.4600E-04	8.5323E-11	2.4794E-10	8.5075E-07
Sonar	4.5520E-01	2.0000E-06	8.4600E-10	9.3944E-11	6.9018E-04
Spect	2.4541E-02	5.2677E-08	1.6466E-07	4.0887E-06	6.2259E-02
Tic-Tac-Toe	1.0000E+00	1.2224E-07	3.1731E-01	1.0000E+00	1.0000E+00
Vowel	1.0000E+00	6.2000E-05	5.9000E-05	1.0000E+00	1.0000E+00
WDBC	2.4477E-01	3.3495E-11	3.5514E-12	3.7115E-12	1.4647E-08

Figure 7.

Convergence curves for CBDA and other optimizers on the first 12 data sets.

Figure 8.

Convergence curves for CBDA and other optimizers on the last 12 data sets.

5. Experiments

This section showcases the outcomes of CBDA on various data sets. Initially, we describe the data sets employed in this study. Subsequently, we elaborate on the experimental configuration. Lastly, we present and scrutinize the experimental findings.

5.1 Benchmark data sets

In this study, we evaluate the effectiveness of CBDA by conducting experiments on 24 data sets obtained from the UCI data repository [66]. The specifics of these data sets are outlined in Table 1.

5.2 Parameter tuning and experiment setups

5.2.1 Parameter tuning

As discussed in Subsection 4.1.1, the values of $s$, $a$, $f$, $c$, and $e$ in Eq. (11) are determined by $\tau$ using a chaotic sequence. Therefore, we now turn our attention to $\psi$ in Eq. (12). Since 24 data sets are adopted as benchmarks in this study, it is arduous to calibrate $\psi$ for each data set, and the optimal value of $\psi$ for one data set may not be applicable to another. In [40], the authors selected a standard data set with a median number of features (Lung-cancer) to tune $\psi$. Inspired by their work and without loss of generality, we select three data sets with a median number of features (i.e., Hillvalley, Movementlibras, and Sonar) from Table 1 to determine the value of $\psi$ in the preliminary experiments. Note that $\psi$ ranges from 0.5 (more precisely, $\psi$ should be greater than 0.5) to 1 according to its definition in Eq. (12); thus we vary $\psi$ from 0.6 to 1 with a step size of 0.1. To reduce the effect of randomness, each experiment is repeated 30 times, and the average fitness value, classification accuracy, number of selected features, and CPU running time are reported in Table 2. As can be seen from the results, CBDA achieves the best values on most indicators when $\psi=0.6$. Therefore, we apply this value to all data sets in the comparative experiments.

The determination of crucial parameters, such as the initial value of the chaotic sequence and the value of $\lambda$ in the Tent map, requires careful consideration. It is widely acknowledged that the performance of chaos is closely tied to its coefficient and initial value. To address this, we conducted extensive experiments to fine-tune the parameters $\lambda$ and $\tau[0]$. We varied each of the two parameters from 0.1 to 0.9 with a step size of 0.2, resulting in 25 different combinations of values. In our preliminary experiments, we selected the Lung data set, which has a moderate number of selected features (326 features and 72 instances), to jointly optimize $\lambda$ and $\tau[0]$. The results obtained from this tuning process are presented in Table 3. CBDA achieved the highest classification accuracy and the lowest fitness value when $\lambda=0.7$ and $\tau[0]=0.7$, although some other combinations selected fewer features or ran slightly faster. This suggests that these specific values of the chaotic map are suitable for the proposed algorithm.

5.2.2 Experiment setups

Python 3.10 was utilized to implement the proposed CBDA algorithm, and the experiments were performed on a computer equipped with an 8th generation Intel processor (i5-8500U) and 16 GB of RAM. As mentioned in Section 3.3, the $K$ -NN method, with $K$ set to 5, was utilized as the classifier. In the subsequent experiments, the complete data set is divided into three subsets: a learning set for model learning and training, a validation set for feature selection evaluation, and a separate test set for final model evaluation on completely unseen data. The learning set and validation set together encompass 80% of the complete data set, and the split ratio of learning set to validation set is 4:1. This partitioning method aligns with the methodology described in a prior study [67]. During each iteration, the classifier is trained using the feature subset derived from the learning set, and its performance is evaluated on the validation set. Additionally, $\alpha$ and $\beta$ in the fitness function given in Eq. (9) were set to 0.99 and 0.01, respectively, which is consistent with most of the previous relevant studies, such as [29, 30, 31].
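The settings above can be illustrated with a small Python sketch. Eq. (9) is not reproduced in this part of the paper, so the weighted-sum form used in `fitness` (classification error weighted by $\alpha$ plus the ratio of selected features weighted by $\beta$) is an assumption, as are the helper names and the rounding rule in `split_sizes`. The assumed form is at least consistent with Table 2: for Hillvalley at $\psi=0.6$ (accuracy 0.6153, 44.3 of 100 features on average) it gives approximately 0.3853.

```python
def fitness(accuracy, n_selected, n_total, alpha=0.99, beta=0.01):
    """Assumed form of Eq. (9): alpha * error rate + beta * selected-feature ratio."""
    return alpha * (1.0 - accuracy) + beta * n_selected / n_total


def split_sizes(n_instances, train_val_fraction=0.8, learn_to_val=(4, 1)):
    """Sizes of the learning, validation, and test sets (rounding is an assumption)."""
    n_train_val = round(n_instances * train_val_fraction)
    n_learn = round(n_train_val * learn_to_val[0] / sum(learn_to_val))
    n_val = n_train_val - n_learn
    n_test = n_instances - n_train_val
    return n_learn, n_val, n_test
```

With the stated ratios, a data set of 1000 instances would be divided into 640 learning, 160 validation, and 200 test instances.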

Moreover, all compared algorithms were configured with the same values of $\alpha$ and $\beta$ and employed the same $K$ -NN classifier. To ensure an impartial comparison, each algorithm was executed 30 times on the same 24 benchmark data sets to minimize random fluctuations. Additionally, the population size was set to 24 (with the solution dimension equivalent to the number of features in each data set), and the maximum number of iterations was set to 100 for all methods.

5.3 Experiment results

This section showcases the feature selection outcomes achieved by various algorithms, accompanied by corresponding analyses. The optimal values attained by an algorithm in a variety of metrics are emphasized in bold typeface.

5.3.1 Evaluation metrics

The fundamental metrics utilized in the outcomes obtained from various experiments are the average value (AVG) and standard deviation (STD). Additionally, two non-parametric statistical tests, Wilcoxon rank-sum and Friedman, are conducted at a 5% significance level for each algorithm to demonstrate the significance of the results obtained in this study.

Furthermore, the cumulative scores of the four metrics are also used to represent a more equitable distribution of wins for each method. The term “win” is employed to count the methods that achieve the best outcome, and this “win” count is normalized to 1 over all comparative methods within each data set. Consequently, a win score of 1 is assigned to a method when it is the sole winner. On the other hand, if multiple methods tie for the best result, the single win is divided among all tied methods, with each method receiving a fractional win score inversely proportional to the number of tied methods. For instance, if five methods all achieve the best result on one data set, each winner receives a win score of 0.2. Therefore, the term “wins” represents the sum of these win scores across all data sets for each method.
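The win-score rule described above can be sketched in Python as follows. The function name `win_scores` and the nested-dictionary input format are hypothetical, and ties are detected by exact equality of the reported values, which is an assumption about how the tables were scored.

```python
def win_scores(results, lower_is_better=True):
    """Sum fractional wins per method.

    results maps each data set name to a dict {method: metric value}.
    The single win of a data set is shared equally among tied best methods.
    """
    totals = {}
    for per_method in results.values():
        for method in per_method:
            totals.setdefault(method, 0.0)
        values = per_method.values()
        best = min(values) if lower_is_better else max(values)
        winners = [m for m, v in per_method.items() if v == best]
        for method in winners:
            totals[method] += 1.0 / len(winners)
    return totals
```

Applied to three rows of Table 5 (Arrhythmia, Glass, and Dermatology), CBDA wins Arrhythmia outright, shares Glass three ways with CBDA-V1 and CBDA-V3, and loses Dermatology to CBDA-V1, so CBDA and CBDA-V1 each accumulate $1+\frac{1}{3}$ wins and CBDA-V3 accumulates $\frac{1}{3}$.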

5.3.2 The results of ablation experiments of CBDA

In this experimental section, we conduct ablation experiments to compare the distinct impacts of various enhanced factors, including the chaotic map, EPD mechanism, and binarization strategy, on the algorithm. The purpose of these experiments is to analyze and understand the individual contributions of these factors to the overall performance.

In this section, we compare CBDA with three ablated algorithms: CBDA-V1, CBDA-V2, and CBDA-V3. Each of these algorithms incorporates a different subset of the enhanced factors. The details of these algorithms are given in Table 4.

In CBDA-V3, we utilize the Sigmoid transfer function (referred to as the “ $S$ -shape” function) to facilitate the mapping of continuous solution spaces into binary ones. This allows us to evaluate the effectiveness of the proposed binarization strategy, which employs the $V$ -shape transfer function. It should be noted that apart from the difference in transfer rate, all other aspects of the $S$ -shape function used in the proposed methods remain the same as the $V$ -shape function described in Section 4.1.3. The mathematical representation of the transfer rate of the widely used Sigmoid function is as follows [68]:

$\displaystyle P(\Delta{X}_{t})=\frac{1}{1+e^{-\Delta{X}_{t}}}$ (20)
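As a minimal sketch of how the two transfer functions differ, the code below implements the Sigmoid rate of Eq. (20) next to a $V$-shape rate. Two points are assumptions rather than the paper's definitions: the $V$-shape form $|\Delta X/\sqrt{1+\Delta X^{2}}|$, which is the one commonly used in binary DA and may differ from the function in Section 4.1.3, and the bit-flip update rule in `binarize`. All function names are hypothetical.

```python
import math
import random


def s_shape(delta_x):
    """Sigmoid transfer rate of Eq. (20), written to avoid overflow in exp()."""
    if delta_x >= 0:
        return 1.0 / (1.0 + math.exp(-delta_x))
    e = math.exp(delta_x)
    return e / (1.0 + e)


def v_shape(delta_x):
    """ASSUMED V-shape rate |dx / sqrt(1 + dx^2)|; Section 4.1.3 may define it differently."""
    return abs(delta_x / math.sqrt(1.0 + delta_x * delta_x))


def binarize(position, step, transfer, rng):
    """ASSUMED update rule: flip bit d with probability transfer(step[d])."""
    return [1 - bit if rng.random() < transfer(dx) else bit
            for bit, dx in zip(position, step)]


rng = random.Random(0)
position = [1, 0, 1, 1, 0]
zero_step = [0.0] * len(position)

# With a zero step the V-shape rate is 0, so no bit can change.
kept = binarize(position, zero_step, v_shape, rng)
print(kept, s_shape(0.0), v_shape(0.0))
```

Under these assumptions, a zero step leaves the solution untouched with the $V$-shape function, whereas the Sigmoid function still assigns a rate of 0.5, which is one way to see why the two strategies can behave differently near convergence.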

Table 5 presents the fitness function values of CBDA compared to its ablated versions, namely CBDA-V1, CBDA-V2, and CBDA-V3. CBDA demonstrates a superior performance compared to its ablated counterparts, with the fitness values ranked as follows: CBDA $>$ CBDA-V1 $>$ CBDA-V3 $>$ CBDA-V2. CBDA achieves optimal average fitness values in 21 instances, surpassing the other methods. CBDA-V1 achieves optimal results on 9 data sets, followed by CBDA-V3 with 5 data sets. Unfortunately, CBDA-V2 does not achieve any optimal results. Moreover, CBDA exhibits stability in terms of standard deviation, achieving optimal results on 10 data sets. The wins and F-test statistic results for average fitness values of CBDA are both ranked first, providing further evidence of CBDA’s superiority. Based on these experimental findings, it can be concluded that the combined use of chaos, EPD, and the $V$ -shape transfer function enables CBDA to more effectively obtain optimal solutions. Specifically, chaos allows CBDA to explore a wider solution space, while the mutation and crossover operators help eliminate poor-quality solutions. Additionally, the $V$ -shape transfer function of CBDA effectively transforms continuous solutions into binary form, outperforming the Sigmoid function. Overall, CBDA overcomes the primary limitation of getting trapped in local optima better than the comparative algorithms, enhancing its ability to balance between exploration and exploitation.

Table 6 reports the significance obtained by the Wilcoxon rank-sum test for CBDA against its ablated versions. CBDA outperforms CBDA-V2 and CBDA-V3 on almost all the data sets. Specifically, CBDA is significantly better than CBDA-V2 on 22 data sets, the exceptions being Diabets and Hillvalley, and significantly better than CBDA-V3 on 19 data sets. These results imply that the EPD mechanism and the $V$-shape transfer function play a crucial role in the proposed algorithm. CBDA-V1, by contrast, shows no significant difference from CBDA on the majority of the data sets because it incorporates the EPD mechanism and the $V$-shape transfer function; it is therefore also an effective candidate method for the proposed feature selection problem.

The convergence curves obtained from CBDA and its ablated versions are presented in Figs 5 and 6. It is important to note that these curves correspond to the 15th test of each experiment. Upon visual inspection, it is evident that CBDA exhibits a faster convergence rate on 18 data sets. Specifically, CBDA achieves optimal convergence on data sets such as Arrhythmia, Hepatitis, Ionosphere, Lung, and Tic-Tac-Toe. Additionally, CBDA-V1 demonstrates exclusive optimal convergence on 5 data sets, namely Dermatology, Hillvalley, Movementlibras, Semeion, and Spect. However, CBDA-V2 and V3 fail to achieve any minimum convergence across the 24 data sets when compared to the other algorithms. These observations indicate that the utilization of chaos in the proposed algorithm may expedite the search for optimal solutions more effectively than the other two improvement factors in CBDA.

The average classification accuracies achieved by different versions of CBDA are presented in Table 7. The distribution of results aligns with the fitness values shown in Table 5. CBDA demonstrates significantly superior performance compared to its counterparts, with a total of 16.58 wins, accounting for 69.08% of the total number of data sets. CBDA-V1 ranks second with 5.58 wins (23.25%), and the $S$-shape version (CBDA-V3) achieves more wins than the version without EPD. Additionally, CBDA-V1 outperforms CBDA-V3 on 8 data sets, indicating that the $V$-shape transfer function is more suitable than the $S$-shape function in the proposed algorithm. Regarding the standard deviation (STD) results, CBDA-V3 ranks first on 13 data sets, followed by CBDA with 11 data sets, which indicates that the proposed CBDA remains stable during the solution-searching process. The F-test results further support the superiority of CBDA over the other algorithms. CBDA is therefore capable of achieving higher-quality solutions than its ablated versions.

To assess the significance of the classification accuracies obtained by CBDA in comparison to the other competing methods, we perform the Wilcoxon rank-sum test, whose results are shown in Table 8. The results indicate that CBDA exhibits significant advantages over CBDA-V2 and CBDA-V3. This demonstrates the effectiveness and rationality of the optimization strategy (more specifically, the EPD mechanism and the $V$-shape transfer function) employed in the proposed algorithm. As for CBDA-V1, CBDA is significantly better on two data sets, Semeion and WDBC, so the use of chaos also has an effect.

Table 9 presents the average computational time of CBDA and its various versions. CBDA demonstrates the lowest CPU time consumption on 8 data sets, achieving the highest number of wins. CBDA-V1 follows closely with 7 data sets. In terms of standard deviation (STD) results, CBDA maintains a leading position among the four algorithms, achieving the minimum value on 11 data sets. CBDA-V3 performs well on 8 data sets and ranks second in this aspect. Overall, CBDA exhibits a low running time and good stability. The F-test statistic further supports our argument in favor of CBDA’s superiority in this regard.

The final comparison between CBDA and its ablated versions focuses on the number of selected features and the corresponding results are presented in Table 10. It is widely acknowledged that a higher number of features theoretically leads to better classification accuracy. Therefore, it is challenging to achieve a satisfactory classification accuracy while maintaining a lower number of selected features. However, CBDA consistently selects fewer features compared to other methods, achieving a remarkable 9.41 wins in the overall comparison, which ranks it first. Following closely, CBDA-V1 ranks second with 8.41 wins. The analysis of standard deviations for CBDA demonstrates that it is a relatively stable algorithm across various data sets. Additionally, the F-test results for CBDA show that it consistently achieves the highest ranking, suggesting that CBDA is capable of obtaining statistically optimal solutions in terms of selected features. These findings indicate that CBDA effectively selects a reasonable number of features while maintaining superior classification accuracy. In other words, the improved features of CBDA synergize well, enabling it to achieve high efficiency. Consequently, CBDA can be considered as a promising approach to address the proposed feature selection problem.

5.3.3 The results of CBDA versus other well-known optimizers

In this research, we compared the performance of CBDA with five cutting-edge optimizers: IBDA [40], BDA [69], BWO [70], BES [71], and CNN-CNN [72]. The details of these optimization algorithms are listed in Table 11.

Table 12 reports the fitness values attained by CBDA and the other comparative algorithms on each data set. CBDA attains the optimal average fitness value on 13 data sets (11.5 wins, nearly half of the total wins), followed by IBDA with 7 data sets (6 wins, 25% of the total wins) and CNN-CNN with 5 data sets (also 5 wins). The STD outcomes suggest that CBDA performs satisfactorily on most data sets, since it attains the best STD on 10 data sets and ranks first among the algorithms. The F-test statistic computed on the average fitness values shows that CBDA achieves the best rank among the algorithms, which supports the conclusion that CBDA performs better in terms of fitness value.

Table 13 displays the $p$-values of the Wilcoxon rank-sum test for the fitness values on each data set. The results indicate that the fitness values obtained by CBDA are significantly better than those of BDA, BWO, and BES on most data sets. Additionally, CBDA slightly outperforms IBDA on three data sets, namely Hillvalley, Lung-cancer, and Spect. This is because both algorithms are derived from DA and employ similar improved factors, although CBDA is more adept at handling certain data sets. Moreover, CBDA is significantly better than CNN-CNN on only 10 data sets, which indicates that the neural network method is also proficient in solving the proposed feature selection problem. The results presented in Tables 12 and 13 highlight the effectiveness of the improved factors introduced in CBDA, such as the chaotic map, EPD_ERS, and binarization strategy.

Figures 7 and 8 illustrate the convergence curves of CBDA and the other algorithms during their execution. Each curve is taken from the middle of the 30 runs, specifically the 15th test. As can be observed from these figures, CBDA achieves a faster or equivalent convergence rate compared to the other algorithms on 17 data sets. In particular, CBDA achieves the minimum on 4 data sets, namely Arrhythmia, Hepatitis, Krvskp, and Sonar. These results confirm that the introduced improved factors can enhance CBDA’s ability to converge rapidly to optimal solutions.

Table 14 reports the classification accuracies attained by CBDA and the other algorithms. The numerical outcomes are akin to those in Table 12. CBDA achieves the highest average accuracy on 17 data sets and obtains 12.52 wins over the 30 runs, which is notably higher than the other algorithms. IBDA is ranked second with 6.27 wins. Moreover, the STDs of the accuracies obtained by CBDA are the best on 11 data sets, ranking second among the six algorithms. The F-test outcomes for average accuracy suggest that CBDA outperforms the comparative algorithms. Consequently, CBDA is a competitive algorithm in terms of fitness value and classification accuracy.

As shown by the ablation experiments in Section 5.3.2, the superiority of CBDA can be primarily attributed to the EPD mechanism and the $V$-shape transfer function, which increase the chance of finding high-quality solutions. Additionally, the chaos integrated into the search stage of the algorithm broadens the explored solution space, which further helps to balance the exploitation and exploration capabilities of CBDA. Therefore, CBDA achieves competitive results among the comparative algorithms.

Consequently, CBDA has a higher likelihood of discovering optimal solutions in comparison to other algorithms. We also employed the Wilcoxon rank-sum test for classification accuracy as a statistical analytical method to further verify the outcomes. Table 15 corroborates the aforementioned arguments. It can be observed that CBDA significantly outperforms BDA, BWO, and BES on most data sets. Furthermore, the difference between CBDA and IBDA is significant on four data sets, namely Exactly, Hillvalley, Movementlibras, and Spect.

Table 14
Classification accuracies for CBDA versus other optimizers

Data set	CBDA		IBDA		BDA		BWO		BES		CNN-CNN
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	0.681	0.012	0.680	0.010	0.666	0.013	0.653	0.004	0.653	0.004	0.657	0.009
Breastcancer	0.979	0	0.979	0	0.976	0.003	0.978	3.6E-04	0.978	0.001	0.979	0
BreastEW	0.961	0	0.961	3.2E-04	0.960	0.003	0.960	0.001	0.960	0.001	0.960	0.003
Dermatology	0.992	0.002	0.992	0.003	0.986	0.007	0.986	0.003	0.987	0.003	0.992	0.003
Diabets	0.747	0	0.747	0	0.745	0.004	0.747	0	0.747	0.000	0.747	0
Exactly	1	0	1.000	0	0.956	0.108	0.981	0.033	0.996	0.003	1	0
Exactly-II	0.787	0.005	0.787	0.003	0.784	0.007	0.788	0.003	0.785	0.006	0.788	0.004
Glass	0.695	0	0.695	0.002	0.689	0.010	0.695	0.000	0.695	0.000	0.695	0
HeartEW	0.850	0.006	0.850	0.006	0.840	0.011	0.848	0.006	0.847	0.005	0.850	0.006
Hepatitis	0.732	0.008	0.730	0.009	0.713	0.016	0.720	0.006	0.716	0.008	0.728	0.010
Hillvalley	0.615	0.011	0.622	0.010	0.608	0.013	0.597	0.005	0.593	0.004	0.611	0.009
Ionosphere	0.908	0.013	0.907	0.014	0.896	0.013	0.883	0.006	0.877	0.006	0.894	0.013
Krvskp	0.981	0.004	0.981	0.004	0.964	0.011	0.964	0.004	0.968	0.000	0.978	0.003
Lung	0.918	0.020	0.913	0.024	0.895	0.022	0.887	0.011	0.883	0.010	0.873	0.014
Lung-cancer	0.975	0	0.975	0	0.972	0.009	0.973	0.008	0.973	0.008	0.974	0.005
Lymphography	0.437	0.009	0.434	0.011	0.423	0.019	0.431	0.011	0.430	0.006	0.434	0.011
M-of-N	1	0	1	0	0.989	0.039	0.992	0.014	0.998	0.001	1	0
Movementlibras	0.826	0.007	0.828	0.009	0.817	0.008	0.810	0.005	0.810	0.004	0.821	0.007
Semeion	0.980	0.003	0.980	0.002	0.976	0.004	0.973	0.001	0.974	0	0.976	0.002
Sonar	0.922	0.012	0.919	0.010	0.898	0.018	0.895	0.010	0.896	0	0.910	0.014
Spect	0.734	0.010	0.740	0.009	0.719	0.013	0.725	0.007	0.728	0.008	0.736	0.009
Tic-Tac-Toe	0.852	0	0.852	0	0.810	0.031	0.850	0.010	0.852	0	0.852	0
Vowel	0.949	0	0.949	0	0.945	0.008	0.947	0.003	0.949	0.001	0.949	0
WDBC	0.961	0	0.961	0.000	0.959	0.004	0.960	0.001	0.959	0.001	0.959	0.003
Rank (F-test)	1.7292		2.2083		5.25		4.5		4.4792		2.8333
Wins	12.52		6.27		0		0.45		0.7		2.02

Table 15

$p$-values of the Wilcoxon rank-sum test for the classification accuracies of CBDA versus other optimizers ($p\leqslant 0.05$ are significant and shown in bold)

Data set	IBDA	BDA	BWO	BES	CNN-CNN
Arrhythmia	8.9979E-01	5.1000E-05	2.8176E-11	2.6093E-11	1.4146E-09
Breastcancer	1.0000E+00	3.1297E-07	1.5377E-01	2.0676E-02	1.0000E+00
BreastEW	3.1731E-01	2.6000E-05	1.3467E-10	4.4178E-10	2.6550E-03
Dermatology	4.9935E-01	2.4279E-07	1.3057E-08	4.0496E-08	6.0970E-01
Diabets	1.0000E+00	1.0587E-02	1.0000E+00	1.0000E+00	1.0000E+00
Exactly	1.0456E-02	3.2422E-01	1.1312E-07	1.4260E-08	1.0000E+00
Exactly-II	1.8190E-01	1.2923E-01	6.0885E-01	7.4033E-02	9.0930E-01
Glass	3.1731E-01	6.2000E-05	1.0000E+00	1.0000E+00	1.0000E+00
HeartEW	9.3434E-01	6.8000E-05	5.2697E-02	7.3680E-03	9.6351E-01
Hepatitis	6.1958E-01	6.2000E-05	6.7446E-08	2.8332E-08	1.6212E-01
Hillvalley	1.3151E-02	4.4830E-02	1.9504E-09	1.2208E-10	1.2304E-01
Ionosphere	9.7636E-01	1.0820E-03	1.9374E-09	2.1773E-10	3.9700E-04
Krvskp	5.3432E-01	1.5305E-09	4.0245E-11	4.4218E-11	1.2930E-03
Lung	3.2240E-01	2.6100E-04	1.8101E-08	1.7544E-09	4.0253E-10
Lung-cancer	1.0000E+00	4.0085E-02	7.8040E-02	7.8040E-02	3.1731E-01
Lymphography	2.3733E-01	4.1800E-04	5.9420E-03	1.2000E-03	1.4852E-01
M-of-N	1.0000E+00	1.3040E-03	4.4164E-08	1.5400E-09	1.0000E+00
Movementlibras	7.4487E-07	6.6000E-05	3.7130E-10	1.8282E-10	8.5020E-03
Semeion	7.4951E-01	7.0000E-06	7.4012E-11	2.1731E-10	4.8277E-07
Sonar	4.1234E-01	2.0000E-06	9.8429E-10	1.4254E-10	1.3796E-03
Spect	1.6170E-02	4.2293E-08	1.4884E-07	5.5399E-06	6.3306E-02
Tic-Tac-Toe	1.0000E+00	1.2224E-07	3.1731E-01	1.0000E+00	1.0000E+00
Vowel	1.0000E+00	6.2000E-05	5.6000E-05	1.0000E+00	1.0000E+00
WDBC	1.0000E+00	1.2000E-05	1.3951E-09	8.8227E-12	1.3610E-04
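The significance threshold in the caption of Table 15 can be applied mechanically. The short sketch below copies the IBDA column of Table 15 and lists the data sets with $p\leqslant 0.05$; the variable names are hypothetical. Note that a $p$-value alone does not say which method is better, so the direction of each difference has to be read from the averages in Table 14.

```python
ALPHA = 0.05  # significance level used throughout the paper

# IBDA column of Table 15 (CBDA versus IBDA, classification accuracy).
ibda_p_values = {
    "Arrhythmia": 8.9979e-01, "Breastcancer": 1.0000e+00, "BreastEW": 3.1731e-01,
    "Dermatology": 4.9935e-01, "Diabets": 1.0000e+00, "Exactly": 1.0456e-02,
    "Exactly-II": 1.8190e-01, "Glass": 3.1731e-01, "HeartEW": 9.3434e-01,
    "Hepatitis": 6.1958e-01, "Hillvalley": 1.3151e-02, "Ionosphere": 9.7636e-01,
    "Krvskp": 5.3432e-01, "Lung": 3.2240e-01, "Lung-cancer": 1.0000e+00,
    "Lymphography": 2.3733e-01, "M-of-N": 1.0000e+00, "Movementlibras": 7.4487e-07,
    "Semeion": 7.4951e-01, "Sonar": 4.1234e-01, "Spect": 1.6170e-02,
    "Tic-Tac-Toe": 1.0000e+00, "Vowel": 1.0000e+00, "WDBC": 1.0000e+00,
}

# Keep the data sets whose p-value is at or below the threshold.
significant = [name for name, p in ibda_p_values.items() if p <= ALPHA]
print(len(significant), significant)
```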

Table 16 exhibits the running time in seconds for all algorithms. It is evident that CBDA delivers commendable computing performance on six data sets. Although it is not the swiftest, the difference is not noteworthy, which is also substantiated by the F-test statistic. Consequently, we can infer that the enhanced factors in CBDA do not necessitate much supplementary computing time in comparison to conventional ones, and the overall overhead of CBDA is reasonable.

Table 16

CPU running time for CBDA versus other optimizers (unit: seconds)

Data set	CBDA		IBDA		BDA		BWO		BES		CNN-CNN
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	132.7	24.2	133.0	24.0	111.8	23.7	179.4	0.5	116.2	0.5	126.1	2.9
Breastcancer	111.2	12.1	110.5	11.8	112.1	3.2	183.8	3.2	95.6	0.5	106.0	4.0
BreastEW	122.5	2.5	122.4	2.8	103.9	7.7	199.1	8.9	96.5	0.9	144.9	15.7
Dermatology	64.5	2.8	65.3	2.2	248.6	169.4	132.7	1.8	66.4	0.3	287.6	167.7
Diabets	127.5	3.2	127.0	3.0	119.7	20.6	185.8	0.7	100.3	1.5	107.9	5.1
Exactly	136.8	2.0	136.3	1.5	140.0	5.5	267.3	1.3	142.9	0.8	163.2	33.3
Exactly-II	129.7	6.8	132.8	4.2	126.7	10.8	272.6	1.4	140.6	0.8	139.9	4.0
Glass	54.7	2.7	55.0	4.2	59.3	4.3	102.0	0.2	53.8	1.1	59.9	1.3
HeartEW	67.8	2.3	68.0	2.2	68.4	1.6	123.6	1.2	63.6	0.2	119.1	6.4
Hepatitis	49.8	0.3	49.5	0.2	55.8	5.1	97.0	0.3	52.0	0.8	56.6	1.1
Hillvalley	87.2	0.3	87.6	0.2	191.4	213.3	172.4	0.4	105.2	3.3	150.6	15.2
Ionosphere	74.6	0.8	72.9	1.6	80.2	2.5	145.1	3.1	77.7	6.1	103.1	10.3
Krvskp	671.4	22.9	671.1	17.8	886.6	113.5	1236.5	3.5	635.5	9.2	983.2	79.7
Lung	47.7	0.3	48.7	0.3	53.0	1.7	112.3	0.6	82.8	0.5	99.2	21.6
Lung-cancer	37.2	0.8	37.8	0.7	41.2	1.7	71.8	0.4	41.0	0.1	38.9	0.5
Lymphography	48.7	0.3	48.6	0.4	52.3	0.7	96.5	0.1	48.9	0.0	56.8	2.6
M-of-N	156.0	16.2	155.3	15.8	138.9	5.9	268.9	2.2	143.3	0.4	212.2	45.2
Movementlibras	64.6	0.3	64.8	0.3	80.8	25.8	126.8	2.5	71.7	0.2	118.4	8.8
Semeion	306.6	4.4	307.1	5.3	789.5	304.8	556.9	0.6	315.7	1.0	396.2	69.0
Sonar	57.4	1.6	52.8	1.3	139.9	31.2	101.5	1.7	57.3	0.1	132.8	25.4
Spect	61.3	0.7	61.2	0.6	68.5	2.3	113.0	0.5	61.4	0.1	79.1	2.6
Tic-Tac-Toe	175.0	5.9	169.7	6.5	97.8	7.5	216.1	1.0	120.5	0.7	136.1	3.9
Vowel	120.4	1.1	117.6	1.6	133.1	5.9	202.3	1.1	109.9	0.7	145.2	7.8
WDBC	90.6	0.9	90.3	1.0	141.6	43.9	177.6	1.0	91.7	0.4	148.2	19.7
Rank (F-test)	2.4583		2.25		3.6667		5.75		4.4583		4.4167
Wins	6		7		4		0		7		0

Table 17 reports the average number of selected features obtained by the proposed CBDA and the other optimizers. CBDA is not dominant on most data sets, as the number of selected features and the classification accuracy are not consistently correlated. Although CBDA has the best classification accuracies in Table 14, it often selects more features than the other algorithms. Nonetheless, CBDA is more stable than the other algorithms (it attains the best STD on 11 data sets). The F-test statistic on the average numbers of selected features shows that CBDA performs better than BWO, BES, and CNN-CNN. Overall, CBDA can achieve the best average classification accuracy by selecting slightly more features, which is acceptable in most cases.

Table 17

Numbers of selected features for CBDA versus other optimizers

Data set	CBDA		IBDA		BDA		BWO		BES		CNN-CNN
	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD	AVG	STD
Arrhythmia	118.80	13.81	107.60	15.62	84.40	27.95	123.87	8.65	138.50	9.56	122.53	9.66
Breastcancer	6	0	6	0	5.57	0.86	6	0	5.93	0.25	6	0
BreastEW	2.20	0.41	2.07	0.25	5.27	1.89	9.10	1.75	10.20	1.35	3.63	1.35
Dermatology	18.60	2.51	18.23	2.43	16.33	3.84	18.63	2.75	20.07	2.43	18.80	1.85
Diabets	4	0	4	0	4.13	0.63	4	0.00	4	0	4	0
Exactly	6	0	6	0	6	1.46	6.83	0.70	6.73	0.45	6	0
Exactly-II	1.73	1.66	1.87	1.22	2.13	2.08	1.23	0.57	1.83	0.99	1.73	1.57
Glass	5	0	5	0	4.80	0.61	5	0	5.03	0.18	5	0
HeartEW	5.43	1.19	5.47	1.07	5.23	1.33	4.43	0.77	4.87	1.28	5.40	1.10
Hepatitis	6.50	1.07	5.97	1.35	6.07	1.68	7.97	1.77	8.03	1.94	6.40	1.45
Hillvalley	44.30	4.30	41.03	5.12	30.57	12.39	43.30	4.54	47.50	4.95	45.97	5.77
Ionosphere	7.67	2.25	8.23	3.18	7.57	2.86	9.87	2.46	11.23	2.79	10.33	2.96
Krvskp	19.00	2.64	19.57	2.24	17.57	3.42	18.77	2.25	19.70	2.55	18.90	2.29
Lung	126.37	17.48	122.27	17.62	96.03	32.17	144.73	8.22	156.93	7.17	140.23	9.05
Lung-cancer	11.63	5.02	7.60	4.03	13.87	5.10	21.10	3.06	24.43	3.46	14.30	3.56
Lymphography	7.67	1.69	7.40	2.11	7.03	2.17	8.63	1.63	8.10	1.69	8.40	1.71
M-of-N	6.00	0.00	6.00	0.00	6.27	0.45	6.93	0.78	6.80	0.41	6	0
Movementlibras	39.97	4.19	37.97	5.11	32.80	7.91	39.70	5.04	43.93	4.03	39.93	5.42
Semeion	129.70	10.70	132.50	11.19	105.90	23.24	122.37	8.17	132.53	7.52	130.50	8.22
Sonar	22.23	4.51	21.27	3.98	20.50	5.58	25.80	2.83	27.50	3.79	24.23	3.47
Spect	9.90	1.95	10.40	2.14	8.93	2.66	9.67	1.99	10.60	1.94	11.00	1.44
Tic-Tac-Toe	9	0	9	0	7.17	1.49	8.93	0.37	9.00	0	9.00	0
Vowel	9	0	9	0	8.17	1.02	8.30	0.99	8.93	0.37	9.00	0
WDBC	2.13	0.35	2.37	0.76	5.13	2.29	9.83	1.46	12.00	2.35	4.30	1.95
Rank (F-test)	3.2038		3		2.0625		3.75		5.0625		3.9167
Wins	1.78		3.78		14.25		2.2		0.2		0.78

5.4 Comparison with other algorithms from the literature

In this section, we compare the classification accuracy of CBDA with that of five well-known filter-based algorithms: CFS, FCBF, F-score, IG, and Spectrum, as reviewed in [73]. The comparative results of classification accuracy on the same benchmark data sets are reported in [74], and Table 18 shows the results of other established algorithms.

As seen in Table 18, CBDA outperforms the filter-based algorithms on 76.92% of the data sets (10 out of 13). A more visual comparison of classification accuracy is presented in Fig. 9, which shows that CBDA performs better than the other algorithms on most data sets. The only three data sets where CBDA does not achieve the best accuracy are Breastcancer, Lymphography, and Spect, which share similar characteristics, such as having a small number of features and relatively few instances (except for Breastcancer). However, due to the randomness of the initial populations, CBDA may reach the optimal solution on Breastcancer in another experiment. Therefore, we can conclude that CBDA is more advantageous than the five filter-based algorithms in feature selection optimization. The main reason for the superior accuracy of CBDA is the effectiveness of the introduced improved factors, which enable CBDA to alleviate stagnation and avoid getting trapped in local optima, thus discovering more high-quality solutions and achieving better performance on most data sets. This does not mean, however, that CBDA is always better than the filter-based algorithms on other feature selection problems. In particular, when a problem requires a short running time, a filter-based method may be more suitable than CBDA.

Figure 9.

Visualization of classification accuracy results.

Table 18

Classification accuracies for CBDA versus other filter-based algorithms

	CBDA	CFS	FCBF	F-Score	IG	Spectrum
Breastcancer	0.98	0.96	0.99	0.98	0.96	0.96
BreastEW	0.96	0.83	0.80	0.93	0.93	0.77
Exactly	1	0.67	0.44	0.60	0.62	0.58
Exactly-II	0.79	0.71	0.55	0.68	0.62	0.66
HeartEW	0.85	0.65	0.65	0.76	0.76	0.80
Ionosphere	0.92	0.86	0.86	0.73	0.80	0.83
Krvskp	0.98	0.77	0.93	0.96	0.93	0.38
Lymphography	0.44	0.50	0.57	0.67	0.67	0.77
M-of-N	1	0.79	0.82	0.82	0.82	0.58
Semeion	0.98	0.88	0.88	0.88	0.87	0.88
Sonar	0.92	0.31	0.21	0.05	0.19	0.05
Spect	0.73	0.74	0.77	0.79	0.79	0.74
Tic-Tac-Toe	0.85	0	0	0.01	0.01	0.17
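The share of data sets on which CBDA attains the best accuracy can be recomputed directly from Table 18. The sketch below copies the table as printed (rounded to two decimals) and counts the rows in which the CBDA value is at least as high as every filter-based value; the variable names are hypothetical.

```python
# Rows of Table 18: (CBDA, CFS, FCBF, F-Score, IG, Spectrum).
table_18 = {
    "Breastcancer": (0.98, 0.96, 0.99, 0.98, 0.96, 0.96),
    "BreastEW": (0.96, 0.83, 0.80, 0.93, 0.93, 0.77),
    "Exactly": (1.00, 0.67, 0.44, 0.60, 0.62, 0.58),
    "Exactly-II": (0.79, 0.71, 0.55, 0.68, 0.62, 0.66),
    "HeartEW": (0.85, 0.65, 0.65, 0.76, 0.76, 0.80),
    "Ionosphere": (0.92, 0.86, 0.86, 0.73, 0.80, 0.83),
    "Krvskp": (0.98, 0.77, 0.93, 0.96, 0.93, 0.38),
    "Lymphography": (0.44, 0.50, 0.57, 0.67, 0.67, 0.77),
    "M-of-N": (1.00, 0.79, 0.82, 0.82, 0.82, 0.58),
    "Semeion": (0.98, 0.88, 0.88, 0.88, 0.87, 0.88),
    "Sonar": (0.92, 0.31, 0.21, 0.05, 0.19, 0.05),
    "Spect": (0.73, 0.74, 0.77, 0.79, 0.79, 0.74),
    "Tic-Tac-Toe": (0.85, 0.00, 0.00, 0.01, 0.01, 0.17),
}

# CBDA is in column 0; the filter-based methods are in columns 1..5.
cbda_best = [name for name, row in table_18.items() if row[0] >= max(row[1:])]
not_best = [name for name in table_18 if name not in cbda_best]
share = 100.0 * len(cbda_best) / len(table_18)
print(f"{len(cbda_best)}/{len(table_18)} data sets = {share:.2f}%", not_best)
```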

5.5 The superiority of the proposed algorithm

After thoroughly analyzing the comparative experimental results presented in this study, it becomes evident that the incorporation of improvement factors contributes significantly to CBDA in the following ways: (i) The integration of a chaotic map expands the search space for CBDA, enabling it to explore a wider range of potential solutions. As a result, CBDA exhibits a robust global search capability compared to other algorithms. (ii) The EPD mechanism enhances the performance of CBDA by effectively eliminating inferior solutions through mutation and crossover operations during the iterative search process. Consequently, CBDA generates a greater number of high-quality solutions compared to its counterparts. (iii) The binarization strategy employed by CBDA facilitates the efficient conversion of solution spaces from continuous to binary representation. This strategy enhances the stability and reliability of feature selection, improving the overall performance of CBDA.

In summary, the comprehensive evaluation of comparative experimental results highlights the significant contributions of incorporated improvement factors to CBDA, including its strong global search capability, efficient performance through EPD, and enhanced stability and reliability through the binarization strategy. Moreover, the superiority of CBDA and the specific circumstances that CBDA can be applied to are summarized as follows:

Global search capability: CBDA possesses strong global search capability, allowing it to quickly find the optimal feature subset within the search space. This enables it to handle feature selection problems in high-dimensional data sets and identify the most representative and discriminative features.

Efficient performance: The parallel search strategy of CBDA, together with its improvement factors, enables simultaneous exploration of multiple regions of the solution space and accelerates the search process. Additionally, it utilizes adaptive parameter adjustment and adaptive neighborhood search mechanisms, further enhancing the algorithm’s efficiency and performance.

Scalability: CBDA exhibits flexibility in adapting to different problems and data sets, and can be combined with other optimization algorithms. It can adapt to different feature selection tasks by adjusting parameters and modifying fitness functions, and is capable of handling large-scale data sets.

Robustness: CBDA demonstrates a certain level of robustness against noise and outliers. It can adapt to different data feature distributions and noise conditions through adaptive neighborhood search and parameter adjustment, thereby improving the stability and reliability of feature selection.

In conclusion, CBDA possesses strong global search capability, efficient performance, scalability, and robustness in feature selection, making it an effective method in this domain.

6. Conclusions and future works

This work studies the problem of feature selection in machine learning. Firstly, the fitness function is designed to reduce the number of selected features while improving classification accuracy. Based on conventional DA, we propose a new algorithm called CBDA with several improved factors, including chaos, the EPD_ERS mechanism, and a $V$-shape binary transfer function, to address the problem. The Tent chaotic map is adopted to improve the exploitation capability and find more diverse solutions. Then, we introduce the EPD mechanism to balance the exploration and exploitation capabilities of CBDA and use the ERS selection method to select guide solutions. Afterwards, we apply the mutation and crossover operations in CBDA to further enhance the exploration and exploitation capabilities of the algorithm. Next, we use a $V$-shape transfer function to transform continuous solutions into binary ones in each iteration. We then evaluate the proposed CBDA on 24 UCI data sets in two groups of experiments. To begin with, we performed ablation experiments to compare CBDA with its ablated versions (CBDA-V1, CBDA-V2, and CBDA-V3). The findings indicate that the EPD mechanism and the $V$-shape transfer function effectively eliminate the lowest-quality solutions and facilitate the conversion of continuous solutions into binary ones. Moreover, the incorporation of a chaotic map expands the search space of CBDA. Subsequently, we conducted experiments to evaluate CBDA against five state-of-the-art algorithms (IBDA, BDA, BWO, BES, and CNN-CNN) using various evaluation metrics. The results consistently demonstrate the superiority of CBDA over all other algorithms. Lastly, we compared CBDA with the reported results of five filter-based algorithms. This comparison further confirms the advantages of CBDA across most tested data sets.

However, CBDA has some drawbacks, such as selecting more features than other algorithms to ensure classification accuracy and possibly being unstable on specific data sets. In future work, we plan to overcome these drawbacks and apply CBDA to a wider range of multi-objective problems. We also intend to introduce more effective improvement measures into other evolutionary computation algorithms to solve feature selection problems.

Acknowledgments

This study is supported in part by the National Natural Science Foundation of China (62172186, 62002133, 61872158, 62272194), in part by the Science and Technology Development Plan Project of Jilin Province (20210101183JC, 20210201072GX), and in part by the Young Science and Technology Talent Lift Project of Jilin Province (QT202013).

References

Khaire

U.M.

and Dhanalakshmi

, Stability of feature selection algorithm: A review, Journal of King Saud University-Computer and Information Sciences 34(4) (2022), 1060–1073.

MÃ¼ller

Segin

Weigand

and Schmitt

, Feature selection for measurement models, International Journal of Quality & Reliability Management, ahead-of-print, 2022. doi: 10.1108/IJQRM-07-2021-0245.

Bolón-Canedo

Alonso-Betanzos

Morán-Fernández

and Cancela

, Feature Selection: From the Past to the Future, 2022, 11–34. ISBN 978-3-030-93051-6. doi: 10.1007/978-3-030-93052-3_2.

Cichos

Gustavsson

Mehlig

and Volpe

, Machine Learning for Active Matter, Nature Machine Intelligence 2(2) (2020), 94–103.

Taylor

Griffiths

Hall

and Mouzakitis

, Feature Selection for Supervised Learning and Compression, Applied Artificial Intelligence, 2022, 1–35. doi: 10.1080/08839514.2022.2034293.

Crase

Hall

and Thennadil

, Feature selection for cluster analysis in spectroscopy, Computers, Materials & Continua 71 (2022), 2435–2458. doi: 10.32604/cmc.2022.022414.

Rao

Shi

Rodrigue

A.K.

Feng

Xia

Elhoseny

Yuan

and Gu

, Feature selection based on artificial bee colony and gradient boosting decision tree, Applied Soft Computing 74 (2019), 634–642.

Sun

Zhang

and Xiao

, Bio-inspired feature selection: An improved binary particle swarm optimization approach, IEEE Access 8 (2020), 85989–86002.

Ghosh

Guha

Sarkar

and Abraham

, A wrapper-filter feature selection technique based on ant colony optimization, Neural Computing and Applications 32(12) (2020), 7839–7857.

10.

Barddal

J.P.

Enembreck

Gomes

H.M.

Bifet

and Pfahringer

, Merit-guided dynamic feature selection filter for data streams, Expert Systems with Applications 116 (2019), 227–242.

11.

Mafarja

and Mirjalili

, Whale optimization approaches for wrapper feature selection, Applied Soft Computing 62 (2018), 441–453.

12.

Wootton

A.J.

Taylor

S.L.

Day

C.R.

and Haycock

P.W.

, Optimizing echo state networks for static pattern recognition, Cognitive Computation 9(3) (2017), 1–9.

13.

Brezočnik

Fister Jr

and Podgorelec

, Swarm intelligence algorithms for feature selection: A review, Applied Sciences 8(9) (2018), 1521.

14.

A.-D.

Xue

and Zhang

, Multi-objective particle swarm optimization for key quality feature selection in complex manufacturing processes, Information Sciences 641 (2023), 119062.

15.

Sathiyadhas

S.S.

and Soosai Antony

M.C.V.

, A network intrusion detection system in cloud computing environment using dragonfly improved invasive weed optimization integrated Shepard convolutional neural network, International Journal of Adaptive Control and Signal Processing 36(5) (2022), 1060–1076.

16.

Karimi

Dowlatshahi

M.B.

and Hashemi

, SemiACO: A semi-supervised feature selection based on ant colony optimization, Expert Systems with Applications 214 (2023), 119130.

17.

Pan, Chen and Xiong, A high-dimensional feature selection method based on modified Gray Wolf Optimization, Applied Soft Computing 135 (2023), 110031.

18.

Sahu, Singh B.K. and Nirala, An improved feature selection approach using global best guided Gaussian artificial bee colony for EMG classification, Biomedical Signal Processing and Control 80 (2023), 104399.

19.

Ewees A.A., Gaheen M.A., Yaseen Z.M. and Ghoniem R.M., Grasshopper optimization algorithm with crossover operators for feature selection and solving engineering problems, IEEE Access 10 (2022), 23304–23320.

20.

Mirjalili, Dragonfly algorithm: A new meta-heuristic optimization technique for solving single-objective, discrete, and multi-objective problems, Neural Computing and Applications 27(4) (2016), 1053–1073.

21.

Mahalakshmi, Balamurugan S.A.A., Chinnadurai and Vaishnavi, Nature-Inspired Feature Selection Algorithms: A Study, in: Sustainable Communication Networks and Application: Proceedings of ICSCN 2021, Springer, 2022, pp. 739–748.

22.

Rahman C.M., Rashid T.A., Alsadoon, Bacanin, Fattah and Mirjalili, A survey on dragonfly algorithm and its applications in engineering, Evolutionary Intelligence, 2021, 1–21.

23.

Maldonado, Riff M.C. and Neveu, A review of recent approaches on wrapper feature selection for intrusion detection, Expert Systems with Applications 198 (2022), 116822.

24.

Sadeghian, Akbari, Nematzadeh and Motameni, A review of feature selection methods based on meta-heuristic algorithms, Journal of Experimental & Theoretical Artificial Intelligence, 2023, 1–51.

25.

Eskandari and Seifaddini, Online and offline streaming feature selection methods with bat algorithm for redundancy analysis, Pattern Recognition 133 (2023), 109007.

26.

Pan J.-S., Chu S.-C. and Sun, Multi-surrogate assisted binary particle swarm optimization algorithm and its application for feature selection, Applied Soft Computing 121 (2022), 108736.

27.

Rajammal R.R., Mirjalili, Ekambaram and Palanisamy, Binary grey wolf optimizer with mutation and adaptive k-nearest neighbour for feature selection in Parkinson’s disease diagnosis, Knowledge-Based Systems 246 (2022), 108701.

28.

Venkateswaran, Ramachandran, Chinnasamy, Sivaji and Amudha, An extensive study on gravitational search algorithm, Materials and Its Characterization 1(1) (2022), 9–16.

29.

Hou and Li, BIFFOA: A novel binary improved fruit fly algorithm for feature selection, IEEE Access 7 (2019), 81177–81194.

30.

Thaher, Chantar, Too, Mafarja, Turabieh and Houssein E.H., Boolean Particle Swarm Optimization with various Evolutionary Population Dynamics approaches for feature selection problems, Expert Systems with Applications 195 (2022), 116550.

31.

Mafarja, Qasem, Heidari A.A., Aljarah, Faris and Mirjalili, Efficient hybrid nature-inspired binary optimizers for feature selection, Cognitive Computation 12(1) (2020), 150–175.

32.

Xue and Zhang, Self-adaptive particle swarm optimization for large-scale feature selection in classification, ACM Transactions on Knowledge Discovery from Data (TKDD) 13(5) (2019), 1–27.

33.

Arora and Anand, Binary butterfly optimization approaches for feature selection, Expert Systems with Applications 116 (2019), 147–160.

34.

Paniri, Dowlatshahi M.B. and Nezamabadi-Pour, MLACO: A multi-label feature selection algorithm based on ant colony optimization, Knowledge-Based Systems 192 (2020), 105285.

35.

Ibrahim R.A., Ewees A.A., Oliva, Abd Elaziz and Lu, Improved salp swarm algorithm based on particle swarm optimization for feature selection, Journal of Ambient Intelligence and Humanized Computing 10(8) (2019), 3155–3169.

36.

Zhang, Heidari A.A., Chen and Li, Gaussian mutational chaotic fruit fly-built optimization and feature selection, Expert Systems with Applications 141 (2020), 112976.

37.

Mafarja M.M., Eleyan, Jaber, Hammouri and Mirjalili, Binary Dragonfly Algorithm for Feature Selection, in: 2017 International Conference on New Trends in Computing Sciences, 2017, pp. 12–17.

38.

Hammouri A.I., Mafarja, Al-Betar M.A., Awadallah M.A. and Abu-Doush, An improved dragonfly algorithm for feature selection, Knowledge-Based Systems 203 (2020), 106131.

39.

Cui, Fan, Wang and Zheng, A hybrid improved dragonfly algorithm for feature selection, IEEE Access 8 (2020), 155619–155629.

40.

Kang, Sun, Feng and Ji, IBDA: Improved binary dragonfly algorithm with evolutionary population dynamics and adaptive crossover for feature selection, IEEE Access PP(99) (2020), 1–1.

41.

Tawhid M.A. and Dsouza K.B., Hybrid binary dragonfly enhanced particle swarm optimization algorithm for solving feature selection problems, Mathematical Foundations of Computing 1(2) (2018), 181–200.

42.

Chen, Liu, Han, Yao, Jin and Hu, A Spark-based Distributed Dragonfly Algorithm for Feature Selection, in: 2020 15th International Conference on Computer Science & Education (ICCSE), IEEE, 2020, pp. 419–423.

43.

Sayed G.I., Tharwat and Hassanien A.E., Chaotic dragonfly algorithm: An improved metaheuristic algorithm for feature selection, Applied Intelligence 49(1) (2019), 188–205.

44.

Wolpert D.H. and Macready W.G., No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation 1(1) (1997), 67–82.

45.

Telikani, Gandomi A.H. and Shahbahrami, A survey of evolutionary computation for association rule mining, Information Sciences 524 (2020), 318–352.

46.

Moslehi and Haeri, An evolutionary computation-based approach for feature selection, Journal of Ambient Intelligence and Humanized Computing 11 (2020), 3757–3769.

47.

Rostami, Berahmand, Nasiri and Forouzandeh, Review of swarm intelligence-based feature selection methods, Engineering Applications of Artificial Intelligence 100 (2021), 104210.

48.

Zaman E.A.K., Mohamed and Ahmad, Feature selection for online streaming high-dimensional data: A state-of-the-art review, Applied Soft Computing, 2022, 109355.

49.

Sayed G.I., Tharwat and Hassanien A.E., Chaotic dragonfly algorithm: An improved metaheuristic algorithm for feature selection, Applied Intelligence 49 (2019), 188–205.

50.

Chantar, Tubishat, Essgaer and Mirjalili, Hybrid binary dragonfly algorithm with simulated annealing for feature selection, SN Computer Science 2(4) (2021), 295.

51.

Al-Tashi, Kadir, Rais H.M., Mirjalili and Alhussian, Binary Optimization Using Hybrid Grey Wolf Optimization for Feature Selection, IEEE Access, 2019.

52.

Altman N.S., An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician 46(3) (1992), 175–185.

53.

Aljarah, Al-Zoubi, Faris, Hassonah M.A., Mirjalili and Saadeh, Simultaneous feature selection and support vector machine optimization using the grasshopper optimization algorithm, Cognitive Computation 10(3) (2018), 478–495.

54.

Khalilpourazari and Khalilpourazary, Optimization of time, cost and surface roughness in grinding process using a robust multi-objective dragonfly algorithm, Neural Computing and Applications 32(1) (2020).

55.

Chatterjee, Biswas, Majee, Sen, Oliva and Sarkar, Breast cancer detection from thermal images using a Grunwald-Letnikov-aided Dragonfly algorithm-based deep feature selection method, Computers in Biology and Medicine 141 (2022), 105027.

56.

Bilal, Alatas, Erhan, Akin, A., Bedri and Ozer, Chaos embedded particle swarm optimization algorithms, Chaos, Solitons & Fractals 40(4) (2009), 1715–1734.

57.

Demir F.B., Tuncer and Kocamaz A.F., A chaotic optimization method based on logistic-sine map for numerical function optimization, Neural Computing and Applications 32(17) (2020), 14227–14239.

58.

Sayed G.I., Hassanien A.E. and Azar A.T., Feature selection via a novel chaotic crow search algorithm, Neural Computing and Applications 31(1) (2019), 171–188.

59.

Saremi, Mirjalili and Lewis, Biogeography-based optimisation with chaos, Neural Computing & Applications 25(5) (2014), 1077–1097.

60.

Ansari Shiri and Mansouri, Hybrid filter-wrapper feature selection using equilibrium optimization, Journal of Algorithms and Computation 55(1) (2023), 101–122.

61.

Walter and Hinterberger, Self-organized criticality as a framework for consciousness: A review study, Frontiers in Psychology 13 (2022), 911620.

62.

Shehab, Alshawabkah, Abualigah and AL-Madi, Enhanced a hybrid moth-flame optimization algorithm using new selection schemes, Engineering with Computers 37 (2021), 2931–2956.

63.

Faris, Mafarja M.M., Heidari A.A., Aljarah and Fujita, An Efficient Binary Salp Swarm Algorithm with Crossover Scheme for Feature Selection Problems, Knowledge-Based Systems, 2018.

64.

Mirjalili and Lewis, S-shaped versus V-shaped transfer functions for binary particle swarm optimization, Swarm and Evolutionary Computation 9 (2013), 1–14.

65.

Liu J.H., Yang R.H. and Sun S.H., The analysis of binary particle swarm optimization, Journal of Nanjing University (Natural Sciences), 2011.

66.

Bache and Lichman, UCI Machine Learning Repository, 2013.

67.

Emary, Zawbaa H.M. and Hassanien A.E., Binary ant lion approaches for feature selection, Neurocomputing 213 (2016), 54–65.

68.

Abdel-Basset, El-Shahat and Sangaiah A.K., A modified nature inspired meta-heuristic whale optimization algorithm for solving 0–1 knapsack problem, International Journal of Machine Learning and Cybernetics 10 (2019), 495–514.

69.

Mafarja, Aljarah, Heidari A.A., Faris, Fournier-Viger and Mirjalili, Binary dragonfly optimization for feature selection using time-varying transfer functions, Knowledge-Based Systems 161 (2018), 185–204.

70.

Zhong and Meng, Beluga whale optimization: A novel nature-inspired metaheuristic algorithm, Knowledge-Based Systems 251 (2022), 109215.

71.

Alsattar H.A., Zaidan and Zaidan, Novel meta-heuristic bald eagle search optimisation algorithm, Artificial Intelligence Review 53 (2020), 2237–2264.

72.

Alabsi B.A., Anbar and Rihan S.D.A., CNN-CNN: Dual convolutional neural network approach for feature selection and attack detection on internet of things networks, Sensors 23(14) (2023), 6507.

73.

Zhong, Chen and Peng, Feature selection based on a novel improved tree growth algorithm, International Journal of Computational Intelligence Systems 13(1) (2020), 247–258.

74.

Mafarja, Aljarah, Heidari A.A., Hammouri A.I., Faris, Al-Zoubi A.M. and Mirjalili, Evolutionary population dynamics and grasshopper optimization approaches for feature selection problems, Knowledge-Based Systems 145 (2018), 25–45.

