A discrete equilibrium optimization algorithm for breast cancer diagnosis

Abstract

Diagnosis is the essential step in deciding on a treatment. Technological advancements in medical equipment now produce many features that describe breast cancer with more comprehensive and discriminant data. Based on patients’ medical data, several data-driven models have been proposed for breast cancer diagnosis using learning techniques such as naive Bayes, neural networks, and SVM. However, the generated models are hard to understand, so doctors cannot interpret them. This work studies breast cancer diagnosis using the associative classification technique, which generates interpretable diagnosis models. We propose an associative classification approach for breast cancer diagnosis based on the Discrete Equilibrium Optimization Algorithm (DEOA), named Discrete Equilibrium Optimization Algorithm for Associative Classification (DEOA-AC). DEOA-AC aims to generate accurate and interpretable diagnosis rules directly from datasets. First, all features in the dataset that contain continuous values are discretized. Second, for each class, a new dataset is created from the original dataset that contains only the instances of the chosen class. Finally, the proposed DEOA is called on each new dataset to generate an optimal rule set. The DEOA-AC approach is evaluated on five well-known and recently used breast cancer datasets and compared with two recently proposed and three classical breast cancer diagnosis algorithms. The comparison results show that the proposed approach can generate more accurate and interpretable diagnosis models for breast cancer than the other algorithms.

Keywords

Breast cancer diagnosis, interpretable diagnosis, associative classification, population-based intelligence, discrete equilibrium optimization algorithm

1. Introduction

Breast cancer results from specific breast cells that multiply and form a mass called a tumor. The tumor is benign if its cells are normal, whereas it is malignant when the cells overgrow (cancerous cells). Breast cancer is the most diagnosed cancer and the principal cause of death from cancer among women worldwide. In Canada, for example, one in eight women will develop breast cancer in her lifetime, and one in thirty women will die from it. However, the chances of recovery depend on the type of cancer and when it is detected. The priority in the medical field is therefore early detection and diagnosis, which helps to cure breast cancer at an early stage. In China, early cancer detection allows about 30% of cancer patients to survive for a long time [1]. Hence, early detection, diagnosis, and monitoring of women’s health are significant issues that must be considered carefully.

Over the past years, with the development of biomedical and electronic equipment on the one hand and the evolution of information technology on the other, large amounts of data related to breast cancer have been collected and saved in large medical databases. Doctors cannot manually process this enormous amount of data to diagnose the presence or absence of cancer, which calls for computer-aided diagnosis. Besides, in recent years, researchers in computer science, and especially in the data mining field, have developed advanced techniques for discovering knowledge from data. This advancement motivates many researchers to develop reliable data mining methods applied to medical data. Because Breast Cancer Diagnosis (BCD) serves to classify the tumor type as benign or malignant, it is treated as a classification problem in several research works [2]. In recent years, several classification approaches have been developed and successfully applied to BCD problems, including Support Vector Machine (SVM) [3] and Deep Learning (DL) [4]. These methods generate black-box models that users cannot understand: they only maximize the model’s classification accuracy and overlook its understandability or interpretability, which is very important in diagnosis. In medical diagnosis especially, doctors need to explain their decisions because a human life is at stake. Associative Classification (AC) answers this concern by generating interpretable models, so it is important to develop accurate AC models for BCD. The main contribution of this work is to propose a novel AC approach for generating accurate and interpretable BCD models employing a recent efficient optimization algorithm called Equilibrium Optimization Algorithm (EOA) [5]. In the proposed approach, the accuracy and interpretability of the diagnosis models are used as the main evaluation metrics.

The remainder of this paper is structured as follows. Section 2 presents a literature review of the BCD problem. Section 3 describes the AC task in data mining. Section 4 presents this work’s research motivation and contributions. Section 5 describes the details of our approach. The experimental results and discussion are presented in Section 6. Finally, Section 7 concludes our research work.

2. Related works

In the last decades, BCD has been an active research area and has attracted researchers’ attention. Several new diagnosis approaches and models are proposed in the literature for the diagnosis in general and especially for medical diagnosis. In the literature, various data-driven and Machine learning (ML) classification methods have been proposed to generate accurate BCD models based on recorded clinical data. In this section, we summarize and classify BCD approaches. The classification algorithms investigated in BCD are categorized into two main categories: black-box and white-box classification algorithms. The first category includes SVM, Neural Network (NN), and DL approaches. The second category includes the algorithms based on Decision Trees (DT) and AC or rule-based classifiers.

2.1 Black-box classification approaches

Many researchers have introduced NN methods for BCD. The authors in [9] used a multilayer perceptron with back-propagation to classify breast cancer, and the generated model has an accuracy of 96.21%. In 2009, Murat [10] proposed a BCD system based on association rules and NN. The association rules are used to reduce the dimensionality of the datasets, and the classification is performed by a NN architecture that contains only one hidden layer in addition to the input and output layers. The proposed system obtains an accuracy of 95%. Alkim et al. [11] proposed a fast and adaptive BCD system called learning vector quantization artificial NNs, which considerably reduces the diagnosis decision time. In 2016, Zribi and Boujelbene [12] used the NN with an incremental learning algorithm for BCD. This approach was compared with previous algorithms and obtains a model with an accuracy of 99.95% on the Wisconsin breast cancer dataset.

As another data-driven classification technique, SVM has been widely used for medical diagnosis because of its capability to handle high-dimensional data. In recent years, SVM has attracted the most attention from researchers in the BCD field [13]. Liu et al. [14] applied SVM for clinical BCD in 2003. The SVM method was compared with the NN method and several other machine learning techniques, and the experimental results show that SVM had the best performance. Datasets generally contain many irrelevant features, which increase the computational complexity of the SVM training phase [15]. Therefore, to improve the computational performance of SVM, several works apply feature selection techniques before constructing the classifier. Akay [16] proposed an F-score feature selection algorithm with SVM for breast cancer prediction. In another work, Chen et al. [17] applied a rough set technique for feature selection and SVM for classification; the number of features was halved in the Wisconsin original breast cancer dataset. In 2004, Genetic Algorithms (GA) were used with SVM for feature selection for BCD [18]. The authors used the Wisconsin breast cancer dataset to validate this approach, and the experimental results show that the performance of the generated SVM diagnosis models is considerably improved. In 2018, an SVM-based weighted ensemble learning method called WAUCE [4], with six kernel functions, was proposed for BCD. The WAUCE model achieves a higher accuracy with significantly lower variance than other methods.

DL methods have dominated the literature for the past few years and have been effective in classification accuracy, taking advantage of increasing hardware computational power [5]. DL approaches, including Convolutional Neural Networks (CNNs), Deep Autoencoders (DANs), Deep Belief Networks (DBNs), Stacked Autoencoders (SAE) and Recurrent Neural Networks (RNNs), have recently been applied to BCD with great success. CNNs are used for image analysis [19], breast cancer classification [20], and mammographic BCD [21]. Although CNN approaches obtain good results on large datasets, they fail to achieve significant results on small datasets [22].

To reduce the training time of CNNs, transfer learning has been combined with CNNs in some works [22, 23]. In addition, several works have sought to improve the performance of DL approaches for BCD. Feng et al. [24] successfully applied a deep manifold preserving autoencoder for classifying breast cancer images. Abdel-Zaher et al. [25] proposed a DBN for breast cancer classification. An RNN system is proposed in [26] for breast lesion classification.

2.2 White-box classification approaches

To overcome the drawback of black-box classification methods, interpretable models have been introduced to allow the decision-makers and doctors in BCD to explain their decisions.

DTs and random forests are commonly used as white-box classification methods in several domains, especially for BCD, because they are easily converted into a set of classification rules understandable by decision-makers. C4.5, CART, and ID3 are the most well-known decision tree methods. However, in DTs, a small change in the training data may produce a much larger change in the generated model [27].

Many AC algorithms have been developed based on classification rules. The classifiers are based on classification rules generated from dependencies between the class feature and the other features. AC algorithms are implemented in two principal stages: the rule generation stage and the prediction stage. Based on the techniques used in the rule generation stage, we distinguish two groups of algorithms. The first group contains the association rule-based algorithms, which first generate the association rule set and then select from it the classification rules to use. The second group contains the Classification Rule-based Algorithms (CRA). Contrary to the first group, CRA generate the classification rules directly from the dataset. In [7], the authors proposed a new algorithm, Classification Based on Associations (CBA), that generates an associative classifier from the association rule set. It uses the Apriori algorithm [28] to generate the association rules, after which the classification rules are extracted from the association rule set.

Li et al. [29] proposed an algorithm called Classification Based on Multiple Association Rules (CMAR), in which they implemented the CR-Tree and FP-Tree as algorithms for the rule generation. CMAR was tested and compared with CBA and other classical classification algorithms using UCI datasets.

To resolve the multi-scanning problem of CBA, Thabtah et al. [30] proposed an algorithm called Multi-class Classification Based on Association Rules (MCAR), in which the Tid-list method is used during the rule generation stage. Hadi [31] developed the Enhancement Class Association Rules (ECAR) and also a new Fast Associative Classification Algorithm (FACA) [31]. In FACA, to enhance the training stage’s speed, the Diffset method is used as an algorithm for the rule generation from the dataset. FACA is evaluated and compared with CBA, CMAR, MCAR, and ECAR and obtains the best results.

The weighted Classification Based on Association Rules (WCBA) is proposed by Alwidian et al. [32] for resolving the estimation measures used in the rule generation process.

More recently, taking a different route, Wang et al. [33] proposed a method called Improved Random Forest-based Rule Extraction (IRFRE) for extracting rules from DTs. This method aims to derive classification rules for BCD from a set of decision trees generated by the classical Random Forest Algorithm (RFA). It employs a multi-objective evolutionary algorithm to optimize the extracted set of classification rules.

However, one of the main challenges in the classification rule generation from association rules is that it tends to produce many association rules. The number of rules grows exponentially with the number of features in the datasets. Simultaneously, the associative classifier’s performance highly depends on the classifier size (number of rules) and the rule size (number of conditions in the rule). The direct CR generation approach from the dataset can build classifiers whose size is independent of the dataset’s features.

Several recent approaches have been proposed in the meta-heuristic optimization field and have shown promising results on several optimization problems, such as the EOA proposed in 2020 [5]. More recently, González-Patiño et al. [34] proposed a new AC method for BCD that uses an older algorithm, the Artificial Immune System (AIS) [35] (proposed in 2001), as a meta-heuristic optimizer to generate classification rules directly from the dataset. The generated classification model achieves higher accuracy than association rule-based AC but still falls short of the results of recent deep learning methods. Given this limitation, we propose in this paper a discrete version of EOA and use it for CR mining. Our approach combines the advantages of direct CR generation from the dataset with the efficiency of the recent meta-heuristic EOA.

3. Associative classification overview

Data mining is a set of tasks and approaches for extracting knowledge from datasets for decision support in several domains, including engineering, economics, biology, and medicine. Data classification is among the most interesting and widely used data mining tasks [6]. In 1998, Liu et al. [7] introduced a new type of classification called AC or rule-based classification. AC aims to generate classification models in the form of a Classification Rule (CR) set, which decision-makers can easily comprehend and use to interpret their decisions [8]. AC is the hybridization of two data mining tasks: Association Rule (AR) mining and data classification using a set of CRs. Let $D=\{t_{1},t_{2},\ldots,t_{n}\}$ be a dataset, where $t_{i}$ is the $i^{\text{th}}$ transaction in $D$; each transaction is described by $m+1$ features $F=\{f_{1},f_{2},\ldots,f_{m},C\}$, where the $f_{i}$, $1\leqslant i\leqslant m$, are called items and $C$ is called the class item. An AR is an association between two sub-sets of the feature set of $D$. For example, the rule $R: X\Rightarrow Y$, where $X$ and $Y$ are sub-sets of $F$ and $X\cap Y=\emptyset$, is an association rule; $X$ is called the antecedent and $Y$ the consequent of $R$. Each CR is a particular case of an AR: it has the form $X\Rightarrow Y$, where $X$ is a sub-set of $F-\{C\}$ and $Y=\{C\}$, i.e., the consequent must be a singleton containing the class feature.

The associative classifier is defined as a set of classification rules $AC=\{CR_{1},CR_{2},\ldots,CR_{N}\}$ , where $N$ is the number of classification rules (classifier size), and $CR_{i}$ is the $i^{\text{th}}$ classification rule. Because AC can be applied only to discrete data (categorical attributes), AC methods generally operate in two phases: first, data discretization if the dataset contains continuous data; second, CR Mining (CRM) from the discrete dataset. Many approaches for mining classification rules have been proposed in the literature; they are reviewed in Section 2.

4. Motivation and contributions

Recently proposed classification algorithms usually aim mainly to improve the accuracy of classifiers. However, in the BCD case, the diagnosis decision directly affects the patient’s safety. BCD therefore requires both high classification accuracy and high interpretability of the diagnosis models, allowing doctors to make safe decisions, which is a further challenge for researchers. AC methods generate interpretable models composed of classification rules, which are easily readable and interpretable [36]. Rule generation is an essential task in AC that needs intelligent approaches. Most approaches proposed in the literature generate rules indirectly, i.e., they generate the association rules and then extract the classification rules from them. However, these approaches generate many rules, which complicates the interpretability of the classifier. Based on a recent and efficient meta-heuristic, this study proposes an intelligent CR generation algorithm for an efficient AC approach. Our approach generates the classification rules directly from data. Our work has three main contributions:

First, our study designs an effective AC method that generates interpretable classification models for BCD.

Second, the proposed approach uses the recent meta-heuristic algorithm EOA to generate classification rules directly from the dataset, improving the models’ accuracy and interpretability. EOA obtained the best results in several optimization problems [5].

Third, we propose a new discrete version of the EOA, called DEOA, to solve the CR generation problem. The new discrete operators it introduces can improve the performance of the discrete algorithm.

5. Proposed approach methodology

This work aims to design an AC approach for generating interpretable BCD models. The proposed approach comprises two main components: associative classifier for BCD and discrete EOA for classification rule generation. Based on the proposed discrete EOA, the associative classifier is generated for BCD. The proposed approach is presented in detail in the rest of this Section. We further evaluate and compare the proposed approach performance with other AC approaches.

5.1 Associative classification-based BCD framework

This Section gives a general description of the proposed DEOA-AC. Fig. 1 presents the workflow of the proposed approach, which consists of two main parts: diagnosis module generation and breast cancer diagnosis.

Figure 1. Flowchart of the AC-based BCD framework.

The first part involves two stages: (1) data discretization and (2) diagnosis module generation. In the second part, the user exploits the generated diagnosis module to diagnose each patient through an intermediate interface. The user can also add new data, together with its diagnosis result, to the training dataset in order to enrich it.

Stage 1: Data discretization

We used several benchmark datasets from the UCI dataset repository. Generally, these datasets contain discrete-valued (nominal) and real-valued features, while our approach works only with nominal data. Therefore, all real-valued features were converted to nominal (discrete) features. This process is called data discretization and is detailed in sub-Section 6.1.2.

Stage 2: Associative classifier generation

The diagnosis module (associative classifier) generation method used in this work aims to generate the diagnosis module from a Training Set (TS). The rules that compose the associative classifier are mined through an iterative process. In each iteration, one rule is generated using the proposed DEOA. Rules are generated class by class, starting with the class that has the highest number of instances in the TS. Our method operates in four phases, as presented in Algorithm 1:

The Rule Set (RS) is initialized as empty.

While the TS is not empty, a class C with the highest number of non-covered instances in TS is chosen, and a New Training Set (NTS) is created from the instances of class C. After that, all these instances are removed from the TS.

While the NTS is not empty, the proposed DEOA is called for generating one CR from the NTS at each iteration until all NTS instances are covered.

The generated CR is added to RS, and all instances correctly classified by the CR are removed from the NTS. This process continues until all instances are covered. Finally, the RS is considered an associative classifier once the TS is empty.

Stage 3: Breast cancer diagnosis

For any new patient, the user inputs the patient’s data to the diagnosis module through an interface and receives the diagnosis result. The associative classifier checks the classification rules one by one, in order. Once it finds a rule that matches the new data, i.e., the new data satisfies the rule’s antecedent, it classifies the data according to the class label of that rule. If no rule matches the new data, the data is reported as not classifiable by the classifier.
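The prediction step described above can be sketched in Python as follows. This is only a minimal illustration, not the Java implementation used in this study: the `Rule` structure, the `classify` helper, and the feature names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """A classification rule X => {C} (illustrative structure only)."""
    antecedent: Dict[str, str]  # feature name -> required discrete value
    label: str                  # class predicted when the antecedent matches

    def matches(self, instance: Dict[str, str]) -> bool:
        # A rule matches when every condition of its antecedent holds.
        return all(instance.get(f) == v for f, v in self.antecedent.items())


def classify(rules: List[Rule], instance: Dict[str, str]) -> Optional[str]:
    """Return the label of the first matching rule, or None if no rule matches."""
    for rule in rules:  # rules are checked one by one, in order
        if rule.matches(instance):
            return rule.label
    return None  # the instance cannot be classified by this rule list


# Hypothetical rule list and patients (feature names and bins are made up).
rules = [
    Rule({"clump_thickness": "high", "bare_nuclei": "high"}, "malignant"),
    Rule({"clump_thickness": "low"}, "benign"),
]
patient_a = {"clump_thickness": "high", "bare_nuclei": "high"}
patient_b = {"clump_thickness": "medium", "bare_nuclei": "low"}
```

Here `classify(rules, patient_a)` returns the label of the first rule, while `patient_b` matches no rule and yields `None`, which corresponds to the unclassifiable case.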

Algorithm 1: Pseudo code of the associative classifier generation algorithm
Input: training dataset (TS)
Output: list of rules RS (associative classifier)
RS $\leftarrow$ Ø // Initialize the rule set as empty
while (TS is not empty)
    Choose the class (C) with the highest number of non-covered instances in the TS dataset;
    Create a new training dataset (NTS) from the instances of the class C;
    Remove all instances of the class C from the TS dataset;
    while (NTS is not empty)
        CR $\leftarrow$ DEOA(NTS); // the pseudo code of the function DEOA() is defined in Algorithm 2
        RS $\leftarrow$ RS $+$ CR; // Add the rule to the set of rules
        Consider the instances covered by this rule as correctly covered and remove them from the NTS dataset;
    End while
End while
End
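The covering loop of Algorithm 1 can be sketched in Python as below. This is a hedged illustration under stated assumptions: the rule generator is passed in as a function, and `exact_match_rule` is a hypothetical stand-in for DEOA that simply copies the first remaining instance as a rule. The guard against a rule that covers nothing is also an addition of this sketch, so that the loop cannot run forever.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Instance = Dict[str, str]
Labeled = Tuple[Instance, str]        # (discrete features, class label)
Rule = Tuple[Dict[str, str], str]     # (antecedent, class label)


def covers(rule: Rule, instance: Instance) -> bool:
    antecedent, _ = rule
    return all(instance.get(f) == v for f, v in antecedent.items())


def build_classifier(ts: List[Labeled],
                     generate_rule: Callable[[List[Labeled]], Rule]) -> List[Rule]:
    rs: List[Rule] = []               # RS <- empty
    ts = list(ts)                     # work on a copy of the training set
    while ts:
        # Class with the highest number of remaining instances in TS.
        label = Counter(c for _, c in ts).most_common(1)[0][0]
        nts = [(x, c) for x, c in ts if c == label]
        ts = [(x, c) for x, c in ts if c != label]
        while nts:
            rule = generate_rule(nts)             # DEOA(NTS) in Algorithm 1
            remaining = [(x, c) for x, c in nts if not covers(rule, x)]
            if len(remaining) == len(nts):        # safety guard (assumption)
                raise ValueError("generated rule covers no instance")
            rs.append(rule)
            nts = remaining                       # drop the covered instances
    return rs


def exact_match_rule(nts: List[Labeled]) -> Rule:
    """Hypothetical stand-in for DEOA: turn the first instance into a rule."""
    x, c = nts[0]
    return dict(x), c


# Tiny made-up training set with two binary features.
training_set = [
    ({"a": "1", "b": "0"}, "benign"),
    ({"a": "1", "b": "1"}, "benign"),
    ({"a": "0", "b": "1"}, "malignant"),
]
rule_set = build_classifier(training_set, exact_match_rule)
```

With this stand-in generator, every instance becomes its own rule; a real rule miner would be expected to return more general rules and hence a smaller rule set.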

5.2 Proposed Discrete Equilibrium Optimization Algorithm for AC (DEOA-AC)

5.2.1 Original Equilibrium Optimizer Algorithm (EOA)

The original EOA is a physics-based meta-heuristic algorithm proposed recently by Faramarzi et al. [5] for solving continuous optimization problems. As in other population-based algorithms, the search agents are particles, and their concentrations represent positions (candidate solutions of the problem). The particles update their concentrations using the physical mass balance equation and the best-so-far solutions, called the equilibrium pool [5], and seek the equilibrium state, which is considered the optimal solution.

Because EOA is a population-based algorithm, the particles start with random initial positions (concentration) in the search space. Then, the particle’s position is updated with the physical mass balance equations as in Eq. (1):

$\displaystyle X_{i+1}=X_{eq}+\frac{G}{\lambda\ast V}\ast(1-F)+(X_{i}-X_{eq})\ast F$ (1)

Where $X_{i}$ and $X_{i+1}$ are the particle’s current and new position vectors, respectively, and $X_{eq}$ is the equilibrium position randomly chosen from the equilibrium pool ( $X_{eq_{\textit{pool}}}$ ), which is built as given in Eqs (2) and (3). $\lambda$ is a random vector in the interval [0, 1], and $V$ is the control volume. The two parameters $F$ and $G$ are essential factors in EOA, as they ensure the balance between exploitation and exploration of the search space. $F$ is the exponential term defined in Eq. (4), and $G$ is the generation rate defined in Eq. (6). The equilibrium pool $X_{eq_{\textit{pool}}}$ is composed of five equilibrium candidates. The first four elements are the four best-so-far solutions obtained in the whole optimization process; they give the optimization algorithm a better exploration capability. The fifth element is the average of these four elements, calculated as in Eq. (3), which helps the optimization algorithm better exploit the search space.

$\displaystyle X_{eq_{\textit{pool}}}=\{X_{eq0},X_{eq1},X_{eq2},X_{eq3},X_{\textit{ave}}\}$ (2)

$\displaystyle X_{\textit{ave}}=\frac{X_{eq0}+X_{eq1}+X_{eq2}+X_{eq3}}{4}$ (3)
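As a small illustration of Eqs (2) and (3), the following Python sketch builds an equilibrium pool from a population. It makes two simplifying assumptions that are not taken from the text: the fitness is minimized, and the four best candidates are picked from the population passed in rather than tracked over the whole run. The function name `build_equilibrium_pool` is hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def build_equilibrium_pool(population: Sequence[Vector],
                           fitness: Callable[[Vector], float]) -> List[Vector]:
    """Return [X_eq0, X_eq1, X_eq2, X_eq3, X_ave] as in Eqs (2) and (3)."""
    # Assumption: at least four particles, and lower fitness is better.
    best_four = sorted(population, key=fitness)[:4]
    dims = len(best_four[0])
    # Eq. (3): component-wise average of the four best candidates.
    average = [sum(p[k] for p in best_four) / 4 for k in range(dims)]
    return [list(p) for p in best_four] + [average]


def sphere(x: Vector) -> float:
    # Simple test function used only for this example.
    return sum(v * v for v in x)


population = [[3.0, 3.0], [0.0, 0.0], [9.0, 9.0], [1.0, 1.0], [2.0, 2.0]]
pool = build_equilibrium_pool(population, sphere)
```

In this example the worst particle `[9.0, 9.0]` is excluded, and the fifth pool element is the average of the four remaining ones.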

The exponential term $F$ is introduced in Eq. (1) to control the exploration procedure and is defined in Eq. (4):

$\displaystyle F=a_{1}\textit{sign}({r-0.5})({e^{-\lambda t}-1})$ (4)

Where $a_{1}$ is a tuned parameter used to control the optimization algorithm’s diversification capability, $r$ and $\lambda$ are random vectors in the interval [0, 1], and $t$ is the time, which decreases with the number of iterations as presented in Eq. (5).

$\displaystyle t=\left(1-\frac{\textit{iter}}{\textit{Max}_{\textit{iter}}}\right)^{\left(a_{2}\frac{\textit{iter}}{\textit{Max}_{\textit{iter}}}\right)}$ (5)

Where iter and Max_iter are the current iteration and the maximum number of iterations, respectively, and $a_{2}$ is another tuned parameter that controls the optimization algorithm’s exploitation capability.

$G$ is another important vector called generation rate. It is introduced in Eq. (1) for improving the exploitation capability of the EOA and calculated as in Eq. (6):

$\displaystyle G=G_{0}\ast F$ (6)

Where $G_{0}$ is calculated by Eq. (7).

$\displaystyle G_{0}=\textit{GCP}\ast({X_{eq}-\lambda\ast X_{i}})$ (7)

Where GCP is calculated as in Eq. (8).

$\displaystyle \textit{GCP}=\left\{\begin{array}{ll}0.5\ast r_{1}&\text{if }r_{2}\geqslant \textit{GP}\\ 0&\text{if }r_{2}<\textit{GP}\end{array}\right.$ (8)

Where $F$ is the exponential term calculated as in Eq. (4), $X_{eq}$ is the vector randomly chosen from the equilibrium pool, $\lambda$ is a random vector in the interval [0, 1], $X_{i}$ is the current concentration inside the control volume, and $r_{1}$ and $r_{2}$ are two random numbers in [0, 1]. GP is a constant that controls the exploration and exploitation capability of the algorithm.
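To make the interplay of Eqs (1) and (4)–(8) concrete, the sketch below performs one continuous EOA update of a single particle in plain Python. It is an illustrative reading of the equations, not the reference implementation of [5]: the default values $a_{1}=2$, $a_{2}=1$, $GP=0.5$ and $V=1$ are assumptions of this sketch, $\lambda$ is drawn from (0, 1] to avoid a division by zero, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def time_term(it: int, max_iter: int, a2: float = 1.0) -> float:
    """Eq. (5): t = (1 - iter/Max_iter) ** (a2 * iter/Max_iter)."""
    ratio = it / max_iter
    return (1.0 - ratio) ** (a2 * ratio)


def eoa_update(x: Vector, x_eq: Vector, it: int, max_iter: int,
               a1: float = 2.0, a2: float = 1.0, gp: float = 0.5,
               volume: float = 1.0,
               rng: Optional[random.Random] = None) -> Vector:
    """One update of a particle x toward the equilibrium candidate x_eq."""
    rng = rng or random.Random()
    t = time_term(it, max_iter, a2)
    r1, r2 = rng.random(), rng.random()
    gcp = 0.5 * r1 if r2 >= gp else 0.0                  # Eq. (8)
    new_x = []
    for xi, xeq in zip(x, x_eq):
        lam = 1.0 - rng.random()                         # in (0, 1], assumption
        r = rng.random()
        f = a1 * sign(r - 0.5) * (math.exp(-lam * t) - 1.0)   # Eq. (4)
        g0 = gcp * (xeq - lam * xi)                      # Eq. (7)
        g = g0 * f                                       # Eq. (6)
        # Eq. (1): X_new = X_eq + (X_i - X_eq) * F + G / (lambda * V) * (1 - F)
        new_x.append(xeq + (xi - xeq) * f + g / (lam * volume) * (1.0 - f))
    return new_x


mid_run = eoa_update([1.0, 2.0], [0.5, 0.5], it=50, max_iter=100,
                     rng=random.Random(0))
last_step = eoa_update([1.0, 2.0], [0.5, 0.5], it=100, max_iter=100)
```

At the last iteration $t=0$, so $F=0$ and $G=0$, and the particle lands exactly on the chosen equilibrium candidate; earlier in the run the update mixes the current position, the candidate, and the generation term.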

5.2.2 Proposed Discrete EOA (DEOA)

The original EOA was proposed for solving continuous optimization problems, while our optimization problem (CR mining) is modeled as a combinatorial optimization problem, which is discrete by nature. Given the advantages of the original EOA, it is worth investigating its performance on discrete problems, which requires discretizing the original EOA. In this sub-section, a discrete version of the equilibrium optimization algorithm is designed and adapted to solve CR mining problems in AC for BCD. The steps for building our DEOA are shown in the following.

Step 1: Solution encoding and particle representation

In a CR mining problem, a solution is a CR generated for a given class from the training dataset. Each particle in the DEOA represents a CR, as presented in Section 3. The CR is composed of a subset of features, and each of these features is assigned one of its predefined possible values. Therefore, we must represent, on the one hand, the selection of a subset of features in the CR (first sub-problem) and, on the other hand, the selection of one value for each feature from its possible predefined values (second sub-problem). Each sub-problem is represented by a d-dimensional vector, where $d$ is the total number of features in the dataset. The particle is thus represented by two d-dimensional vectors (rows). The first row represents the selection of features in the CR: each element indicates whether the corresponding feature is present in the CR, taking the value 0 (feature not selected) or 1 (feature selected). The second row represents the selection of each feature’s value from its predefined values: the $i^{\text{th}}$ element in the second row is a natural number representing a value of the $i^{\text{th}}$ feature in the CR. Therefore, the first row is a binary vector, and the second row is a discrete vector, as presented in Fig. 2.

Figure 2. Particle’s position encoding.
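The two-row encoding can be illustrated with a short Python sketch that decodes a particle into a rule. It assumes, for illustration only, that the natural numbers in the second row are zero-based indices into each feature’s list of discrete values; the function name `decode_particle` and the feature names and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Particle = Tuple[List[int], List[int]]   # (selection row, value row)


def decode_particle(particle: Particle,
                    feature_names: Sequence[str],
                    feature_values: Sequence[Sequence[str]],
                    label: str) -> Tuple[Dict[str, str], str]:
    """Turn a two-row particle into a rule (antecedent, class label)."""
    selected, value_idx = particle
    antecedent = {
        feature_names[k]: feature_values[k][value_idx[k]]
        for k in range(len(feature_names))
        if selected[k] == 1              # keep only the selected features
    }
    return antecedent, label


# Made-up features with their discrete values (d = 3).
feature_names = ["age", "tumor_size", "inv_nodes"]
feature_values = [["young", "middle", "old"], ["small", "large"], ["no", "yes"]]

# First row: features 0 and 2 are selected; second row: value indices.
particle = ([1, 0, 1], [2, 1, 0])
rule = decode_particle(particle, feature_names, feature_values, "malignant")
```

The value stored for the unselected feature `tumor_size` is simply ignored when the rule is decoded.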

Step 2: Initialization

In the initialization step, a population of $n$ particles with $d$ dimensions is randomly generated and represented by a 2-dimensional matrix, as shown in Eq. (9).

$\displaystyle X_{i}^{j}=\left[\begin{array}{cccc}x_{1}^{1}&x_{2}^{1}&\ldots&x_{d}^{1}\\ x_{1}^{2}&x_{2}^{2}&\ldots&x_{d}^{2}\\ \vdots&\vdots&\ddots&\vdots\\ x_{1}^{n}&x_{2}^{n}&\ldots&x_{d}^{n}\end{array}\right]$ (9)

Each element of the matrix has two values that represent the feature selection in the CR and the feature’s value in the CR, and are generated using Eqs (10) and (11), respectively.

$\displaystyle x_{i}^{j}[1]=\left\{\begin{array}{ll}0,&\text{if }r<0.5\\ 1,&\text{if }r\geqslant 0.5\end{array}\right.\quad\text{where }1\leqslant i\leqslant d,\ 1\leqslant j\leqslant n$ (10)

$\displaystyle x_{i}^{j}[2]=\text{a random value from }\text{val}(f_{i}),\quad\text{where }1\leqslant i\leqslant d,\ 1\leqslant j\leqslant n$ (11)

Where $r$ is a random value in the interval [0, 1], $\text{val}(f_{i})$ is a possible value of the feature $f_{i}$ in the dataset.
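A possible reading of this initialization step is sketched below in Python. It assumes that each feature’s possible values are given as a list and that the second row stores a zero-based index into that list; the name `init_population` and the example feature values are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Particle = Tuple[List[int], List[int]]   # (selection row, value row)


def init_population(n: int,
                    feature_values: Sequence[Sequence[str]],
                    rng: Optional[random.Random] = None) -> List[Particle]:
    """Randomly generate n particles over d = len(feature_values) features."""
    rng = rng or random.Random()
    population = []
    for _ in range(n):
        # Eq. (10): a feature is selected when the random draw r is >= 0.5.
        selected = [0 if rng.random() < 0.5 else 1 for _ in feature_values]
        # Eq. (11): pick one of the possible values of each feature at random.
        values = [rng.randrange(len(vals)) for vals in feature_values]
        population.append((selected, values))
    return population


# Made-up discrete features (d = 3) and a seeded population of n = 5 particles.
feature_values = [["young", "middle", "old"], ["small", "large"], ["no", "yes"]]
population = init_population(5, feature_values, random.Random(42))
```

Seeding the generator, as done here, only serves to make the example reproducible.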

Step 3: Fitness function definition

To evaluate the quality of each particle’s position (Rule), we use the coverage fitness [37] defined in Eq. (12):

$\displaystyle\textit{Fitness}({R_{i}})=\frac{NC}{|{TS}|}$ (12)

Where $NC$ is the number of instances covered by the rule $R_{i}$ , and $|TS|$ is the total number of instances in the dataset.
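The coverage fitness of Eq. (12) can be computed as in the following Python sketch, where a rule is represented only by its antecedent and the instances are dictionaries of discrete feature values. The representation and the name `coverage_fitness` are assumptions made for this example.

```python
from typing import Dict, List


def coverage_fitness(antecedent: Dict[str, str],
                     instances: List[Dict[str, str]]) -> float:
    """Eq. (12): fraction of instances whose values satisfy the antecedent."""
    if not instances:
        return 0.0                       # avoid dividing by zero
    nc = sum(
        1 for inst in instances
        if all(inst.get(f) == v for f, v in antecedent.items())
    )
    return nc / len(instances)


# Made-up instances with two binary features.
instances = [
    {"a": "1", "b": "0"},
    {"a": "1", "b": "1"},
    {"a": "0", "b": "1"},
    {"a": "1", "b": "1"},
]
general_rule_fitness = coverage_fitness({"a": "1"}, instances)
specific_rule_fitness = coverage_fitness({"a": "1", "b": "1"}, instances)
```

In this example the shorter antecedent covers three of the four instances and the longer one covers two, so adding conditions can only keep or reduce the coverage.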

Step 4: Concentration (particle’s position) update

The positions of the particles are updated using Eq. (1) in the original EOA. However, in the discrete EOA, Eq. (1) is modified using discrete operators and replaced by Eqs (13) and (14).

$\displaystyle X_{i+1}=X_{eq}+B=\left\{\begin{array}{ll}X_{eq}&\text{if }X_{eq}\neq 0\wedge B=0\\ B&\text{if }X_{eq}=0\wedge B\neq 0\\ X_{eq}\vee B\text{ randomly}&\text{if }X_{eq}\neq 0\wedge B\neq 0\\ 0&\text{if }X_{eq}=0\wedge B=0\end{array}\right.$ (13)

Where $B$ is calculated as in Eq. (14).

$\displaystyle B=({X_{i}-X_{eq}})\ast F+\frac{G}{\lambda}\ast({1-F})$ (14)

The multiplication operator “*” in Eq. (14) is redefined as in Eq. (15).

$\displaystyle B=\left\{\begin{array}{ll}X_{i}-X_{eq}&\text{if }F\geqslant 0\\ \frac{G}{\lambda}&\text{if }F<0\end{array}\right.$ (15)

Where $F$ is the exponential parameter vector calculated using Eq. (4). The subtraction operator “ $-$ ” in Eq. (15) is redefined as in Eq. (16).

$\displaystyle X_{i}-X_{eq}=\left\{\begin{array}{ll}X_{eq}&\text{if }X_{i}\neq X_{eq}\\ 0&\text{if }X_{i}=X_{eq}\end{array}\right.$ (16)

Where $X_{i}$ is the current position vector, $X_{eq}$ is the chosen element from the equilibrium pool. The division operator “/” in Eq. (15) is redefined as in Eq. (17).

$\displaystyle \frac{G}{\lambda}=\left\{\begin{array}{ll}G&\text{if }\lambda=0\\ 0&\text{if }G=0\wedge\lambda=1\\ G\vee\lambda\text{ randomly}&\text{if }G=1\wedge\lambda=1\end{array}\right.$ (17)

Where $G$ is the generation rate parameter vector calculated using Eq. (18) and $\lambda$ is a random discrete vector.

The vectors, $X_{i}$ , $X_{eq}$ , G, and $\lambda$ are discrete and have the structure as in Fig. 3. However, the vector $F$ is a real vector because it balances exploration and exploitation.
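The redefined operators of Eqs (13) and (15)–(17) can be read element by element, as in the Python sketch below. It is a hedged interpretation: Eq. (17) lists only the cases $G=0$ and $G=1$ with $\lambda=1$, so this sketch assumes that any non-zero $G$ with non-zero $\lambda$ falls into the "randomly" case. The function names are hypothetical, and the generation rate `g` is taken as an input.

```python
import random


def d_sub(x_i: int, x_eq: int) -> int:
    """Eq. (16): X_i - X_eq gives X_eq when the two differ, else 0."""
    return x_eq if x_i != x_eq else 0


def d_div(g: int, lam: int, rng: random.Random) -> int:
    """Eq. (17): redefined division G / lambda."""
    if lam == 0:
        return g
    if g == 0:
        return 0
    return rng.choice((g, lam))          # assumption: G != 0 and lambda != 0


def d_b(x_i: int, x_eq: int, g: int, lam: int, f: float,
        rng: random.Random) -> int:
    """Eq. (15): the real-valued F selects which discrete term is used."""
    return d_sub(x_i, x_eq) if f >= 0 else d_div(g, lam, rng)


def d_add(x_eq: int, b: int, rng: random.Random) -> int:
    """Eq. (13): redefined addition X_eq + B."""
    if x_eq != 0 and b == 0:
        return x_eq
    if x_eq == 0 and b != 0:
        return b
    if x_eq != 0 and b != 0:
        return rng.choice((x_eq, b))     # pick one of the two at random
    return 0


rng_demo = random.Random(0)
# One element update: current value 0, equilibrium value 1, F >= 0.
b_value = d_b(x_i=0, x_eq=1, g=1, lam=1, f=0.3, rng=rng_demo)
new_value = d_add(1, b_value, rng_demo)
```

For a whole particle, these scalar operators would be applied to every element of both rows, with one value of $F$ per element.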

The generation rate parameter calculation (G)

In the original EOA, the parameter $G$ is calculated using Eq. (6). So, in our DEOA, the multiplication operator “*” is redefined, and Eq. (6) is replaced by Eqs (18)–(20).

$\displaystyle G=\left\{\begin{array}{ll}G_{0}&\text{if }F>0\\ |1-G_{0}|&\text{if }F\leqslant 0\end{array}\right.$ (18)

Where $G_{0}$ is calculated as in Eq. (19) and $F$ is the exponential parameter vector calculated using Eq. (4). In Eq. (19), the subtraction operator “ $-$ ” is also redefined.

$\displaystyle G_{0}=X_{eq}-T=\left\{\begin{array}{ll}X_{eq}&\text{if }X_{eq}\neq T\\ 0&\text{else}\end{array}\right.$ (19)

Where $X_{eq}$ is the chosen element from the equilibrium pool, and $T$ is calculated as in Eq. (20).

$\displaystyle T=\lambda\ast X_{i}=\left\{\begin{array}{ll}0&\text{if }\lambda=0\vee X_{i}=0\\ \lambda\vee X_{i}\text{ randomly}&\text{else}\end{array}\right.$ (20)
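
The chain of Eqs (18)–(20) can be illustrated for a single component as follows. This is a minimal sketch under assumptions: the function names are hypothetical, components are assumed to be 0/1 valued so that $|1-G_{0}|$ stays discrete, and "randomly" is assumed to be a uniform choice.

```python
import random


def discrete_mul(lam, x_i, rng=random):
    """Eq. (20): T = lambda * X_i is 0 as soon as one operand is 0."""
    if lam == 0 or x_i == 0:
        return 0
    return rng.choice((lam, x_i))  # lambda or X_i, picked at random


def initial_generation(x_eq, t):
    """Eq. (19): G0 = X_eq - T keeps X_eq where it differs from T."""
    return x_eq if x_eq != t else 0


def generation_rate(x_eq, x_i, lam, f, rng=random):
    """Eq. (18): G is G0 when F > 0 and |1 - G0| otherwise."""
    t = discrete_mul(lam, x_i, rng)
    g0 = initial_generation(x_eq, t)
    return g0 if f > 0 else abs(1 - g0)
```

For example, with $X_{eq}=1$, $X_{i}=0$ and $\lambda=1$, $T$ is 0 and $G_{0}$ is 1, so $G$ is 1 for a positive $F$ and 0 otherwise.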

Based on the modifications introduced above, the details of the proposed DEOA are presented in Algorithm 1.

6. Experimental design and results

6.1 Experimental design

The experimental setup is presented in terms of (1) the benchmark breast cancer datasets used in this study, chosen from the UCI repository and presented in Table 1; (2) the discretization of all numerical attributes of the chosen datasets, explained in sub-section 6.1.2; (3) the metrics used to evaluate the tested algorithms; (4) the parameter setting and evaluation of the proposed DEOA-AC; and (5) the comparison of DEOA-AC with benchmark algorithms.

Our proposed algorithm was developed and implemented in the Java programming language. All experiments of this study were run on a personal computer equipped with an Intel Core i5 3.2 GHz processor and 8 GB of RAM.

6.1.1 Datasets

To test and evaluate the performance of the proposed DEOA-AC, we selected five well-known and recently used benchmark breast cancer datasets from the UCI repository [38]: the Breast Cancer Dataset (BCDS), the Wisconsin Original Breast Cancer Dataset (WOBC), the Wisconsin Diagnostic Breast Cancer Dataset (WDBC), the Wisconsin Prognostic Breast Cancer Dataset (WPBC, labeled BCWP in the tables and in the discussion of the results), and the Mammographic Mass Data Set (MMDS).

The WDBC and WOBC datasets were collected from the University of Wisconsin Hospitals, Madison. The samples in the WDBC dataset consist of visually measured nuclear features taken from the patient’s breast; each instance represents fine needle aspirate (FNA) test measurements. WDBC does not contain missing values; the instances are divided into 212 malignant and 357 benign. The WOBC dataset includes 65.52% benign instances and 34.48% malignant instances. Sixteen instances include missing values in the feature “bare nuclei”.

The WPBC dataset was also extracted from digitized FNA images from the breasts of 198 people. It contains 151 benign instances and 47 malignant instances; four instances include missing data. The MMDS was collected from digital mammograms of patients between 2003 and 2006 at the Institute of Radiology of the University Erlangen-Nuremberg. It contains 961 instances divided into 516 benign and 445 malignant. Seventy-six instances have missing values.

The instances containing missing values in all datasets were removed before the experiments since their number was very low, as presented in Table 1.
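
The removal step can be sketched in a few lines of Python. This is only an illustration: the function name and the sample records are hypothetical, and the "?" marker is an assumption about how missing values are encoded in the data files.

```python
def drop_incomplete(rows, missing="?"):
    """Keep only the instances in which no attribute equals the missing marker."""
    return [row for row in rows if missing not in row]


# Hypothetical WOBC-style records: nine attribute values followed by the class.
rows = [
    ["5", "1", "1", "1", "2", "1", "3", "1", "1", "benign"],
    ["8", "4", "5", "1", "2", "?", "7", "3", "1", "malignant"],
    ["3", "1", "1", "1", "2", "2", "3", "1", "1", "benign"],
]
complete = drop_incomplete(rows)  # the second record is discarded
```

With the figures of Table 1, this step would leave 699 - 16 = 683 instances for WOBC.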

The dataset information is presented in Table 1, including the number of attributes, number of instances, number of missing values, and number of classes for each dataset.

Table 1
Benchmark datasets used in the study

N ${}^{\circ}$	Datasets	# Attributes	# Instances	Missing values	# Classes
1	WDBC	32	569	0	2
2	WOBC	9	699	16	2
3	BCDS	9	286	9	2
4	BCWP	33	198	4	2
5	MMDS	5	961	76	2

Algorithm 2: Pseudo code of the Discrete Equilibrium Optimization Algorithm for Classification Rule Mining
Function DEOA (TS)
Input: training set (TS) for each class C, nb_agents, it_tmax, a1, a2
Output: classification rule CR
    Initialize randomly the positions of all agents using Eq. (9)
    for (it $=$ 1 to it_tmax)
        Evaluate the fitness of all agents using Eq. (12)
        Construct the equilibrium vector Ceq $=$ [Ceq$_{0}$, Ceq$_{1}$, Ceq$_{2}$, Ceq$_{3}$] using Eq. (2)
        // where Ceq$_{0}$, Ceq$_{1}$, Ceq$_{2}$, Ceq$_{3}$ are the four best agents in the population
        Calculate Ceq$_{\text{avg}}$ using Eq. (3)
        Ceq $=$ [Ceq$_{0}$, Ceq$_{1}$, Ceq$_{2}$, Ceq$_{3}$, Ceq$_{\text{avg}}$]
        Calculate t using Eq. (5)
        // Update the agents’ positions
        for (i $=$ 1 to nb_agents)
            Accomplish the memory sizing
            Choose randomly one candidate from Ceq
            Generate two discrete vectors r and $\lambda$ randomly
            Calculate the vector F using Eq. (4)
            Calculate G using Eq. (18)
            Calculate $\frac{G}{\lambda}$ using Eq. (17)
            Update the i$^{\text{th}}$ agent position using Eq. (13)
        end for
    end for
    Return (Ceq$_{0}$)
end

6.1.2 Data discretization

Most breast cancer datasets contain many continuous attributes that take real values, whereas AC works only with discrete values, so continuous values must be discretized. In this work, the unsupervised discretization [39] called Entropy Minimization Heuristic (EMH) [40] is performed on all continuous attributes. The EMH is offered by the Weka tool [41]; it partitions the range of a real-valued attribute into $n$ intervals, where $n$ is a tunable parameter of the EMH algorithm called bins. To analyze the effect of the parameter bins on the performance of the proposed DEOA-AC approach, we ran several experiments with different values of bins, from 2 to 10, and examined the classification accuracy of the proposed approach on all used datasets, as shown in Table 2. The results were obtained with the following DEOA-AC parameters: population size $=$ 10, maximum number of iterations $=$ 10, $a_{1}=2$, and $a_{2}=1$.
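
To make the role of the parameter bins concrete, the following Python sketch partitions an attribute range into $n$ equal-width intervals. It is an assumption, not the EMH procedure itself: equal-width binning is simply the most basic scheme driven by a number of bins, and the function names are hypothetical.

```python
def equal_width_edges(values, bins):
    """Return the bins - 1 interior cut points of an equal-width partition."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + k * width for k in range(1, bins)]


def discretize(value, edges):
    """Index of the interval holding value; a value on a cut point goes left."""
    return sum(value > edge for edge in edges)


# An attribute graded from 1 to 10, split into 4 bins.
edges = equal_width_edges(range(1, 11), 4)   # [3.25, 5.5, 7.75]
codes = [discretize(v, edges) for v in (1, 3, 4, 8, 10)]
```

Under this assumption, a 1 to 10 attribute split into 4 bins gives the cut points 3.25, 5.5 and 7.75, which coincide with the thresholds that appear in the WOBC rules of Table 8.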

Table 2
Classification accuracy of the proposed approach obtained by varying the number of intervals of the discretization algorithm

Datasets	2	3	4	5	6	7	8	9	10
WDBC	91.84	95.15	96.49	92.19	92.98	79.12	79.73	75.43	71.49
BCDS	65.17	70.51	76.72	66.89	70.68	65.34	65.51	72.93	68.62
WOBC	85.14	55.42	58.07	49.71	48.99	51.14	43.14	40.14	35.85
BCWP	82.75	93.25	97.5	87.5	75.75	68.5	74.5	68.75	59.25
MMDS	66.08	46.28	35.41	41.95	32.21	18.81	20.15	20.77	13.81

The accuracy changes with the number of discretization intervals on every dataset, which shows that this number plays an essential role in the discretization algorithm. Table 2 shows that setting the parameter bins to 4 gives the highest classification accuracy on three of the five datasets (WDBC, BCDS, and BCWP), whereas WOBC and MMDS reach their highest accuracy with 2 bins. In all the following experiments, we fixed the value of the parameter bins at 4.
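
The best number of bins per dataset can be read off Table 2 mechanically. The short Python sketch below does so; the values are copied from Table 2, and the names `TABLE_2`, `BINS` and `best_bins` are hypothetical.

```python
BINS = list(range(2, 11))  # column headers of Table 2

# Classification accuracy (%) per dataset, copied from Table 2.
TABLE_2 = {
    "WDBC": [91.84, 95.15, 96.49, 92.19, 92.98, 79.12, 79.73, 75.43, 71.49],
    "BCDS": [65.17, 70.51, 76.72, 66.89, 70.68, 65.34, 65.51, 72.93, 68.62],
    "WOBC": [85.14, 55.42, 58.07, 49.71, 48.99, 51.14, 43.14, 40.14, 35.85],
    "BCWP": [82.75, 93.25, 97.5, 87.5, 75.75, 68.5, 74.5, 68.75, 59.25],
    "MMDS": [66.08, 46.28, 35.41, 41.95, 32.21, 18.81, 20.15, 20.77, 13.81],
}


def best_bins(table, bins):
    """Map each dataset to the number of bins with the highest accuracy."""
    return {
        name: bins[max(range(len(acc)), key=acc.__getitem__)]
        for name, acc in table.items()
    }


best = best_bins(TABLE_2, BINS)
```

Here `best` maps WDBC, BCDS and BCWP to 4 bins, and WOBC and MMDS to 2 bins.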

6.1.3 Evaluation metrics

To evaluate our approach and compare it with the other algorithms, we consider not only the classification accuracy (CA) but also the interpretability (or simplicity) of the obtained classification model [42]. In our work, the model’s interpretability is measured by two metrics [43]. First, the Classifier Size (CS) is the number of rules in the obtained classification model. Second, the Average Rules Size (ARS) is the average number of terms in the antecedents of the rules of the classification model.

The calculation formulas of the three metrics and the statistical standard deviation metric (Std) are presented in Eqs (21)–(24).

$\displaystyle\textit{ClassificationAccuracy}=\frac{\textit{CCI}}{TI}$ (21)

Where CCI is the number of correctly classified instances and $TI$ is the total number of instances in the test dataset.

$\displaystyle CS=|C|$ (22)

Where $CS$ is the classifier size, which measures the number of rules in the classifier: for a classifier $C$ that includes $n$ rules, $|C|$ is the number of rules in $C$.

$\displaystyle\textit{ARS}=\sum\limits_{i=1}^{n}\frac{1}{n}\textit{Terms}(R_{i})$ (23)

Where ARS measures the average number of terms in the antecedent of all classification rules. $n$ is the number of rules in the classifier, $R_{i}$ is the $i^{\text{th}}$ rule, and $\textit{Terms}(R_{i})$ is the number of terms that rule $R_{i}$ includes in its antecedent.

In addition to the three previous metrics, to compare the reliability of the models generated by different runs of the algorithms, the statistical standard deviation (Std) is measured using the formula presented in Eq. (24). A smaller Std indicates that the algorithm converges to similar solutions across runs.

$\displaystyle\textit{Std}=\sqrt{\frac{1}{M-1}\sum\limits_{i=1}^{M}(\textit{Res}_{i}-AR)^{2}}$ (24)

Where $M$ is the number of runs of the algorithm tested, $\textit{Res}_{i}$ is the result obtained by the $i^{\text{th}}$ run of the algorithm, and $AR$ is the average of the results obtained with the $M$ runs of the algorithm, calculated as in Eq. (25).

$\displaystyle AR=\frac{1}{M}\mathop{\sum}\limits_{i=1}^{M}BR_{i}$ (25)

Where $BR_{i}$ is the best result obtained from the $i^{\text{th}}$ run by the algorithm.
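
The metrics of Eqs (21)–(25) are simple to compute. The Python sketch below is illustrative: the function names are hypothetical, and a rule is assumed to be stored as a pair made of its list of antecedent terms and its class.

```python
import math


def classification_accuracy(predicted, actual):
    """Eq. (21): correctly classified instances over total instances."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def classifier_size(rules):
    """Eq. (22): number of rules in the classifier."""
    return len(rules)


def average_rule_size(rules):
    """Eq. (23): mean number of antecedent terms per rule."""
    return sum(len(antecedent) for antecedent, _ in rules) / len(rules)


def average_result(results):
    """Eq. (25): mean of the results over the M runs."""
    return sum(results) / len(results)


def std(results):
    """Eq. (24): sample standard deviation over the M runs (M >= 2)."""
    ar = average_result(results)
    return math.sqrt(sum((r - ar) ** 2 for r in results) / (len(results) - 1))


# Hypothetical two-rule classifier used only to exercise the functions.
rules = [(["a < 1", "b > 2"], "Malignant"), (["c < 3"], "Benign")]
```

For this toy classifier, CS is 2 and ARS is 1.5.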

6.2 Experiments

In this Section, several experiments are conducted to evaluate the effectiveness of the proposed DEOA-AC approach.

6.2.1 DEOA-AC parameter setting

Like all meta-heuristic algorithms, DEOA-AC has parameters that must be tuned. There are four: the number of particles ($N$), the maximum number of iterations (Max_iter), $a_{1}$, and $a_{2}$. The values of the last two parameters are taken from the original EO algorithm [5]: $a_{1}$ equal to 2 and $a_{2}$ equal to 1. To analyze the effect of the first two parameters on the performance of DEOA-AC, we used the F-Race racing algorithm [44] and conducted several experiments to find a better configuration of these parameters. We tested three different values (10, 50, and 100) for each of $N$ and Max_iter, which gives nine combinations in total. DEOA-AC was run 20 times for each of the nine combinations. The average values of the fitness function presented in Eq. (12) over the 20 separate runs on each dataset were calculated and are presented in Fig. 3.

Figure 3 shows that DEOA-AC reaches its maximum performance on almost all datasets when the number of particles $N$ is set to 10 and the number of iterations Max_iter is set to 10. These values are therefore used in the rest of this study.
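
The nine-combination protocol can be sketched as an exhaustive grid evaluation. The sketch below does not implement F-Race; it only averages a fitness over repeated runs for every pair of values. The callable `run_once(n, max_iter, seed)` is a hypothetical stand-in for one DEOA-AC run returning its fitness.

```python
from itertools import product
from statistics import mean


def grid_search(run_once, values=(10, 50, 100), runs=20):
    """Average the fitness of each (N, Max_iter) pair and return the best pair."""
    scores = {}
    for n, max_iter in product(values, values):
        scores[(n, max_iter)] = mean(
            run_once(n, max_iter, seed) for seed in range(runs)
        )
    best = max(scores, key=scores.get)  # fitness is assumed to be maximized
    return best, scores


# Placeholder fitness used only to exercise the function; not a real result.
best, scores = grid_search(lambda n, max_iter, seed: 1.0 / (n + max_iter))
```

A racing procedure such as F-Race would discard poor configurations early instead of running all of them 20 times.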

Figure 3.

Effect of varying the population size and the maximum number of iterations on the performance of DEOA-AC.

6.2.2 Comparison of DEOA-AC with existing methods

In this sub-section, our goal is to compare the performance of DEOA-AC with other well-known classification algorithms that generate rule-based classifiers. These algorithms adopt different rule mining mechanisms. Three classical and well-known algorithms provided by the Weka tool, C4.5 (Algorithm J48 in Weka) [45], OneR [46], and PART [47], were chosen and run with the default values of their parameters. We also chose two recently proposed rule-based classifier algorithms: Artificial Immune System for Associative Classification (AISAC) [34] and Improved Random Forest-based Rule Extraction (IRFRE) [33]. The parameter settings for DEOA-AC, AISAC, and IRFRE are shown in Table 3. Our DEOA-AC algorithm was implemented in the Java programming language, while the Weka data mining tool was used to run the C4.5, OneR, and PART algorithms. Ten-fold cross-validation [48] was used for all experiments: a dataset is divided into ten partitions, nine of which are used for training, while the tenth is used for testing. Since DEOA-AC is a stochastic algorithm, it was executed 20 times independently, and the average results are considered. The results of the AISAC and IRFRE algorithms are taken from their original studies [34, 33].
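
The ten-fold protocol can be sketched as follows. This is a plain, unstratified split written for illustration; the function name and the seed are hypothetical, and the tool actually used for the experiments may partition the data differently (for example with stratification).

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal partitions
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 683 = number of WOBC instances left after removing the 16 incomplete ones.
splits = list(k_fold_indices(683))
```

Each of the ten test folds then holds 68 or 69 instances, and the remaining nine folds form the training set.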

The comparison was carried out using the criteria presented in sub-section 6.1.3. Tables 4–7 summarize the obtained results.

Table 3
Parameter setting for DEOA-AC and tested algorithms

Algorithm	Parameter	Value
DEOA-AC	Number of particles	10
	Number of iterations	10
	a ${}_{1}$	2
	a ${}_{2}$	1
IRFRE	Population size	250
	Max depth	10
	Tree estimators of random forest	500
	Crossover’s probability	0.5
	Mutation’s probability	0.1
	Patience	5
AISAC	Number of antibodies	90
	Number of iterations	100

Table 4

Average accuracy and standard deviation of the rule-based classifier obtained when applying DEOA-AC and other algorithms

No	Dataset	DEOA-AC A_Ac	DEOA-AC B_Ac	DEOA-AC Sd	AISAC A_Ac	IRFRE A_Ac	IRFRE Sd	C4.5	OneR	PART
1	WDBC	0.973	1.00	0.02	0.936	0.950	0.024	0.94	0.89	0.93
2	WOBC	0.970	1.00	0.48	0.974	0.964	0.037	0.91	0.90	0.92
3	BCDS	0.796	1.00	0.20	0.737	–	–	0.72	0.65	0.69
4	BCWP	0.975	1.00	1.80	0.792	–	–	0.74	0.76	0.73
5	MMDS	0.873	1.00	0.62	0.789	–	–	0.77	0.76	0.76

To evaluate the predictive ability of the proposed algorithm, 20 independent runs of DEOA-AC are performed for each dataset. The average accuracy (A_Ac) and the best accuracy (B_Ac) of the 20 runs on the tested datasets are presented in Table 4. The best values are highlighted in bold. The standard deviation (Sd) is also reported for each dataset to compare the reliability of the results obtained by the tested algorithms. The symbol ‘ $-$ ’ means that no result is available because the algorithm was not tested on the dataset concerned.

For the first two datasets, WDBC and WOBC, the average accuracies in Table 4 show that, on WDBC, the proposed DEOA-AC (0.973) outperforms the two recent algorithms, AISAC (0.936) and IRFRE (0.950), and the classical algorithms C4.5, OneR, and PART. On WOBC, DEOA-AC (0.970) outperforms IRFRE (0.964) and the classical algorithms, and is close to AISAC (0.974). The best accuracy achieved by the proposed DEOA-AC is 100% on both datasets.

For the last three datasets, BCDS, BCWP, and MMDS, the proposed DEOA-AC algorithm obtains the highest average accuracies and considerably outperforms the AISAC algorithm. For example, on the BCWP dataset, DEOA-AC obtains an accuracy of 97.5%, while AISAC obtains only 79.29%, an improvement of 18.21 percentage points. Moreover, the best accuracy obtained by DEOA-AC is 100% on all datasets.

In addition to predictive power, we evaluate the interpretability and complexity of the models using the classifier size and the average rule length.

We report in Table 5 the average number of rules generated by each algorithm on each dataset. The column A_Nr refers to the average number of rules over 20 runs of the DEOA-AC algorithm, and B_Nr refers to the number of rules of the best solution, in terms of accuracy, among the 20 runs. The results reported for IRFRE are taken from its original study [33]. The symbol “ $-$ ” means that the result is not available.

According to Table 5, the proposed DEOA-AC algorithm generates fewer rules than IRFRE on the WDBC dataset (9.95 versus 12.90 on average) and considerably fewer on the WOBC dataset (6.05 versus 12.50).

On the WOBC dataset, the average accuracies of DEOA-AC and IRFRE are close (0.970 and 0.964 in Table 4), but the number of rules generated by the proposed algorithm is considerably lower: DEOA-AC obtains an average of 6.05 rules per classifier, whereas IRFRE generates a classifier with 12.5 rules. This supports our initial objective of improving BCD accuracy while keeping the diagnosis model interpretable.

For the length of the rule antecedent, our algorithm is compared with the IRFRE algorithm in Table 6. IRFRE slightly outperforms our algorithm on the WDBC dataset (2.169 versus 2.56 terms on average), while our algorithm outperforms IRFRE on the WOBC dataset (2.02 versus 2.210).

These results indicate that the proposed algorithm can generate accurate and interpretable classification models with modest complexity, which is very important in BCD.

Table 5

Average number of generated rules when applying DEOA-AC and other methods

No	Dataset	DEOA-AC A_Nr	DEOA-AC B_Nr	DEOA-AC Sdt	IRFRE A_Nr	IRFRE Sdt	C4.5	OneR	PART
1	WDBC	9.95	10.00	2.620	12.90	4.346	4.00	2.00	4.00
2	WOBC	6.05	5.00	0.270	12.50	0.500	22.00	4.00	12.00
3	BCDS	6.8	4.00	2.800	–	–	4.00	13.00	12.00
4	BCWP	11.8	11.00	2.280	–	–	10.00	4.00	23.00
5	MMDS	3	4.00	0.910	–	–	18.00	8.00	11.00

Table 6

Antecedent average length for each rule obtained when applying DEOA-AC and other methods

No	Dataset	DEOA-AC A_ARS	DEOA-AC B_ARS	DEOA-AC Sdt	IRFRE A_ARS	IRFRE Sdt
1	WDBC	2.56	2.30	0.160	2.169	0.229
2	WOBC	2.02	1.80	0.051	2.210	0.209
3	BCDS	1.25	0.39	0.260	–	–
4	BCWP	5.14	4.67	0.740	–	–
5	MMDS	1.43	1.50	0.410	–	–

Based on Eq. (24), the standard deviations of the three performance measures are calculated and reported in the column “Sdt” of Tables 4–6 (“Sd” in Table 4). DEOA-AC is better than IRFRE on the WDBC and WOBC datasets, both for the variability of the average accuracy and for the variability of the interpretability, measured by the average number of rules in the classifier and the average rule length.

In conclusion, DEOA-AC extracts few, short rules thanks to the rule representation and the search power of the equilibrium optimization algorithm. Besides, the proposed discrete operators can guide DEOA-AC toward straightforward rules.

7. Classification rules obtained via DEOA-AC

In AC, the classifier comprises effective rules that are interpretable and comprehensible by users (doctors in the medical area). For the tested datasets WDBC, WOBC, BCDS, and BCWP, the rules generated by the run that gives the best performance among the 20 runs (accuracy equal to 100%) are listed in Tables 7–10. Each rule is described by its rank in the classifier (column 1), its antecedent (column 2), the class in its consequent (column 3), and its complexity, evaluated by the number of nodes or terms in the rule (column 4).

Each diagnosis model in Tables 7–10 correctly diagnoses all instances of the test fold (the tenth fold) of the corresponding dataset. As shown in Tables 7–10, the diagnosis models for WDBC, WOBC, BCDS, and BCWP are composed of 10, 5, 4, and 11 rules, respectively. The average rule length is around 2.7, 1.4, 1.5, and 1.8, respectively, so the models are easily readable and explainable by doctors.

The results indicate that the accuracy is good, the generated rules are short, and each dataset needs only a small number of rules, so the rules are easily comprehensible.

Table 7
Extracted rules of dataset WDBC

N ${}^{\circ}$	Extracted rule	Class	Rule length
1	compactness_mean $<$ 0.100885 & concavity_mean $<$ 0.1067 & concavity_se $<$ 0.099	Benign	3
2	radius_mean $<$ 12.26325 & area_mean $<$ 732.875 & area_se $<$ 140.6515 & fractal_dimension_se $<$ 0.008131 & radius_worst $<$ 14.9575 & fractal_dimension_worst $<$ 0.093155	Benign	6
3	smoothness_mean $<$ 0.080322	Benign	1
4	perimeter_se $<$ 6.06275	Benign	1
5	symmetry_worst $<$ 0.283325 & fractal_dimension_worst $<$ 6.06275	Malignant
6	symmetry_se $<$ 0.025649	Malignant	1
7	smoothness_se $<$ 0.009067 & compactness_se $<$ 0.035539 & radius_worst $>$ 14.9575 & radius_worst $<$ 21.985 & fractal_dimension_worst $<$ 6.06275	Malignant	5
8	radius_mean $>$ 17.5455 & radius_mean $<$ 22.82775 & perimeter_se $<$ 6.06275 & radius_worst $>$ 14.9575 & radius_worst $<$ 21.985 & fractal_dimension_worst $<$ 0.093155	Malignant	6
9	concave_points_worst $>$ 0.21825	Malignant	1
10	texture_se $<$ 1.4914 & concave_points_worst $>$ 0.1455 & concave_points_worst $<$ 0.21825	Malignant	3

Table 8

Extracted rules of dataset WOBC

N ${}^{\circ}$	Extracted rule	Class	Rule length
1	Cell_Size_Uniformity $<=$ 3.25 & Normal Nucleoli $<=$ 3.25	Malignant	2
2	Single_Epithelial_Cell_Size $>$ 7.75	Malignant	1
3	Clump Thickness $>$ 7.75 & Mitoses $>$ 5.5 & Mitoses $<=$ 7.75	Malignant	2
4	Mitoses $<=$ 3.25	Malignant	1
5	Cell_Size_Uniformity $<=$ 3.25	Benign	1
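
A rule of this form is easy to evaluate against a patient record. The Python sketch below is illustrative: it encodes rules 2 and 3 of Table 8 as lists of (attribute, operator, threshold) terms, the attribute keys are written with underscores, the patient record is hypothetical, and no assumption is made about how the classifier resolves conflicts between several matching rules.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, "=": operator.eq}

# Rules 2 and 3 of Table 8, as (antecedent terms, class).
RULE_2 = ([("Single_Epithelial_Cell_Size", ">", 7.75)], "Malignant")
RULE_3 = (
    [("Clump_Thickness", ">", 7.75), ("Mitoses", ">", 5.5), ("Mitoses", "<=", 7.75)],
    "Malignant",
)


def covers(rule, instance):
    """True when the instance satisfies every term of the rule antecedent."""
    antecedent, _ = rule
    return all(OPS[op](instance[attr], value) for attr, op, value in antecedent)


# Hypothetical patient record, used only for illustration.
patient = {"Clump_Thickness": 9, "Mitoses": 7, "Single_Epithelial_Cell_Size": 3}
matching = [rule for rule in (RULE_2, RULE_3) if covers(rule, patient)]
```

For this record only rule 3 matches, so a doctor can trace the suggested class back to two readable conditions on clump thickness and mitoses.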

Table 9

Extracted rules of dataset BCDS

N ${}^{\circ}$	Extracted rule	Class	Rule length
1	irradiat $=$ no	recurrence_events	1
2	inve_nodes $>$ 0 & inve_nodes $<$ 2	no_recurrence_events	2
3	node_caps $=$ yes & deg_malin $=$ 2	no_recurrence_events	2

Table 10

Extracted rules of dataset BCWP

N ${}^{\circ}$	Extracted rule	Class	Rule nodes
1	texture_se $<=$ 0.092403 & smoothness_mean $<=$ 4.18475 & concavity_se $<=$ 0.003955	N	3
2	area_mean $<=$ 0.062048 & concavity_mean $<=$ 0.021014 & concave_points_worst $<=$ 1356.825	N	3
	& time $>$ 94
3	smoothness_worst $<=$ 0.009783 & concave_points_se $<=$ 121.875	N	3
4	texture_mean $<=$ 833.7	N	1
5	concavity_se $>$ 0.003955 & concavity_se $<=$ 0.006824	N	2
6	area_worst $<=$ 1.147325	N	1
7	smoothness_se $<=$ 89.4925	N	1
8	area_se $>$ 1.0064 & area_se $<=$ 1.4127	N	2
9	fractal_dimension_worst $<=$ 0.093155	R	1
10	texture_worst $>$ 0.112388 & texture_worst $<=$ 0.178725	R	2
11	time $<=$ 32	R	1

8. Conclusion

BCD is essential for early detection, which helps doctors cure the disease at an early stage. Several BCD methods proposed in the literature generate accurate models. However, a model’s performance is usually evaluated only in terms of accuracy, whereas doctors also need to interpret its decisions. This paper proposes a new approach for generating accurate and interpretable classifiers directly from breast cancer datasets, based on a recent and efficient optimization algorithm. We propose a discrete version of the EOA, called DEOA, for classification rule mining in the associative classifier construction process, to address the two aforementioned evaluation criteria. The proposed DEOA is an important part of our approach because it directly affects the interpretability of the generated rules.

To evaluate the performance of our proposed approach, we have tested it against two recent AC algorithms and three classical rule-based classification algorithms running on five well-known breast cancer datasets taken from UCI.

The experimental results show that the proposed DEOA-AC algorithm outperforms all other compared algorithms and can significantly increase BCD performance in terms of accuracy and interpretability. The rules discovered by our algorithm generally have higher accuracy and comprehensibility.

This new approach contributes to AC for the disease diagnosis problem. It opens a way for future research in disease diagnosis based on new intelligent optimization algorithms. In the future, the proposed approach can be used to build interpretable diagnosis systems for many other diseases, such as heart disease, diabetes, Parkinson’s disease, and Alzheimer’s disease. We will also integrate new optimization algorithms to improve the performance of the diagnosis models by increasing their accuracy and interpretability.

References

1. Majali, Niranjan, Phatak and Tadakhe, Data mining techniques for diagnosis and prognosis of cancer, International Journal of Advanced Research in Computer and Communication Engineering 4(3) (2015), 613–616.

2. Liu E.S., Gao and Liu G.Q., A novel intelligent classification model for breast cancer diagnosis, Information Processing & Management 56(3) (2019), 609–623.

3. Wang, Zheng, Yoon S.W. and Ko H.S., A support vector machine-based ensemble algorithm for breast cancer diagnosis, European Journal of Operational Research 267(2) (2018), 687–699.

4. Papandrianos, Papageorgiou, Anagnostis and Feleki, A deep-learning approach for diagnosis of metastatic breast cancer in bones from whole-body scans, Applied Sciences 10(3) (2020), 997.

5. Faramarzi, Heidarinejad, Stephens and Mirjalili, Equilibrium optimizer: A novel optimization algorithm, Knowledge-Based Systems 191 (2020), 105190.

6. Basiri, Taghiyareh and Faili, RACER: Accurate and efficient classification based on rule aggregation approach, Neural Computing and Applications 31(3) (2019), 895–908.

7. Liu, Hsu and Ma, Integrating classification and association rule mining, KDD 98 (1998), 80–86.

8. Freitas A.A., Wieser D.C. and Apweiler, On the importance of comprehensible classification models for protein function prediction, IEEE/ACM Transactions on Computational Biology and Bioinformatics 7(1) (2008), 172–182.

9. Guo and Nandi A.K., Breast cancer diagnosis using genetic programming generated feature, Pattern Recognition 39(5) (2006), 980–987.

10. Karabatak and Ince M.C., An expert system for detection of breast cancer based on association rules and neural network, Expert Systems with Applications 36(2) (2009), 3465–3469.

11. Alkım, Gürbüz and Kılıç, A fast and adaptive automated disease diagnosis method with an innovative neural network model, Neural Networks 33 (2012), 88–96.

12. Zribi and Boujelbene, The neural networks with an incremental learning algorithm approach for mass classification in breast cancer, Biomedical Data Mining 5(118) (2016), 2.

13. Aruleba et al., Applications of computational methods in biomedical breast cancer imaging diagnostics: A review, Journal of Imaging 6(10) (2020), 105.

14. Liu H.X. et al., Diagnosing breast cancer based on support vector machines, Journal of Chemical Information and Computer Sciences 43(3) (2003), 900–907.

15. Kumar and Rath S.K., Classification of microarray using MapReduce based proximal support vector machine classifier, Knowledge-Based Systems 89 (2015), 584–602.

16. Akay M.F., Support vector machines combined with feature selection for breast cancer diagnosis, Expert Systems with Applications 36(2) (2009), 3240–3247.

17. Chen H.L., Yang, Liu and Liu D.Y., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis, Expert Systems with Applications 38(7) (2011), 9014–9022.

18. Olfati, Zarabadipour and Shoorehdeli M.A., Feature subset selection and parameters optimization for support vector machine in breast cancer diagnosis, in: Iranian Conference on Intelligent Systems, 2014, February, pp. 1–6.

19. Rakhlin, Shvets, Iglovikov and Kalinin A.A., Deep convolutional neural networks for breast cancer histology image analysis, in: International Conference Image Analysis and Recognition, Springer, Cham, 2018, pp. 737–744.

20. Ting F.F., Tan Y.J. and Sim K.S., Convolutional neural network improvement for breast cancer classification, Expert Systems with Applications 120 (2019), 103–115.

21. Zou et al., A technical review of convolutional neural network-based mammographic breast cancer diagnosis, Computational and Mathematical Methods in Medicine, 2019.

22. Khan, Islam, Jan, Din I.U. and Rodrigues J.J.C., A novel deep learning based framework for the detection and classification of breast cancer using transfer learning, Pattern Recognition Letters 125 (2019), 1–6.

23. Samala R.K. et al., Evolutionary pruning of transfer learned deep convolutional neural network for breast cancer diagnosis in digital breast tomosynthesis, Physics in Medicine & Biology 63(9) (2018), 095005.

24. Feng, Zhang and Mo, Deep manifold preserving autoencoder for classifying breast cancer histopathological images, IEEE/ACM Transactions on Computational Biology and Bioinformatics 17(1) (2018), 91–101.

25. Abdel-Zaher A.M. and Eldeib A.M., Breast cancer classification using deep belief networks, Expert Systems with Applications 46 (2016), 139–144.

26. Antropova, Huynh and Giger, Recurrent neural networks for breast lesion classification based on DCE-MRIs, in: Medical Imaging 2018: Computer-Aided Diagnosis, International Society for Optics and Photonics, vol. 10575, 2018, pp. 593–598.

27. Han, Kamber and Pei, Data mining concepts and techniques third edition, The Morgan Kaufmann Series in Data Management Systems, 2011, 83–124.

28. Hegland, The apriori algorithm – a tutorial, in: Mathematics and Computation in Imaging Science and Information Processing, 2007, pp. 209–262.

29. Han and Pei, CMAR: Accurate and efficient classification based on multiple class-association rules, in: Data Mining, in: ICDM 2001, Proceedings IEEE International Conference, IEEE, 2001, pp. 369–376.

30. Thabtah, Cowling and Peng, MCAR: multi-class classification based on association rule, in: The 3rd ACS/IEEE International Conference on Computer Systems and Applications, IEEE, 2005, p. 33.

31. Hadi W.E., Aburub and Alhawari, A new fast associative classification algorithm for detecting phishing websites, Applied Soft Computing 48 (2016), 729–734.

32. Alwidian, Hammo B.H. and Obeid, WCBA: Weighted classification based on association rules algorithm for breast cancer disease, Applied Soft Computing 62 (2018), 536–549.

33. Wang et al., An improved random forest-based rule extraction method for breast cancer diagnosis, Applied Soft Computing 86 (2020), 105941.

34. González-Patiño et al., AISAC: An artificial immune system for associative classification applied to breast cancer detection, Applied Sciences 10(2) (2020), 515.

35. Watkins A.B. and Boggess L.C., A resource limited artificial immune classifier, in: Proceedings of the 2002 Congress on Evolutionary Computation, CEC’02 (Cat. No. 02TH8600), IEEE, vol. 1, 2002, pp. 926–931.

36. Letham, Rudin, McCormick T.H. and Madigan, Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model, The Annals of Applied Statistics 9(3) (2015), 1350–1371.

37. Han and Kamber, Data mining: concepts and techniques, Morgan Kaufmann Publishers Inc, San Francisco, CA, 2000.

38. Lichman, UCI Machine Learning Repository, 2013. Available online: http://archive.ics.uci.edu/ml (accessed on 25 October 2020).

39. Dougherty, Kohavi and Sahami, Supervised and unsupervised discretization of continuous features, in: Machine Learning: Proceedings of the Twelfth International Conference, vol. 12, 1995, pp. 194–202.

40. Fayyad and Irani, Multi-interval discretization of continuous valued attributes for classification learning, in: Thirteenth International Joint Conference on Artificial Intelligence, 1993, pp. 1022–1027.

41. Holmes, Donkin and Witten I.H., Weka: a machine learning workbench, in: Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, IEEE, 1994, pp. 357–361.

42. Huysmans et al., An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models, Decision Support Systems 51(1) (2011), 141–154.

43. Daud N.R. and Corne D.W., Human readable rule induction in medical data mining, in: Proceedings of the European Computing Conference, Springer, Boston, MA, 2009, pp. 787–798.

44. Birattari, Stützle, Paquete and Varrentrapp, A Racing Algorithm for Configuring Metaheuristics, in: Gecco, 2, 2002, July.

45. Quinlan J.R., Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research 4 (1996), 77–90.

46. Holte R.C., Very simple classification rules perform well on most commonly used datasets, Machine Learning 11(1) (1993), 63–90.

47.

Witten