Multi-objective techniques for feature selection and classification in digital mammography

Abstract

Feature selection is a crucial stage in the design of a computer-aided classification system for breast cancer diagnosis. The main objective of the proposed research design is to discover the use of multi-objective particle swarm optimization (MOPSO) and Nondominated sorting genetic algorithm-III (NSGA-III) for feature selection in digital mammography. The Pareto-optimal fronts generated by MOPSO and NSGA-III for two conflicting objective functions are used to select optimal features. An artificial neural network (ANN) is used to compute the fitness of objective functions. The importance of features selected by MOPSO and NSGA-III are assessed using artificial neural networks. The experimental results show that MOPSO based optimization is superior to NSGA-III. MOPSO achieves high accuracy with a 55% feature reduction. MOPSO based feature selection and classification deliver an efficiency of 97.54% with 98.22% sensitivity, 96.82% specificity, 0.9508 Cohen’s kappa coefficient, and area under curve $A_{Z}=$ 0.983 $\pm$ 0.003.

Keywords

Multi-objective particle swarm optimization nondominated sorting genetic algorithm-III artificial neural network feature selection mammography classification

1. Introduction

Breast cancer is one of the leading causes of death in women’s all over the world. At Present, no one has discovered any technique for the avoidance of breast disease, so recognition of malignant growth in its initial stage is significant. Mammography is perhaps the best tool available for the discovery of breast malignancy in its primary stage. The radiologist analyses breast malignant growth by examining mammograms, although studying of mammograms is a difficult task. So a tissue is removed for clinical assessment utilizing a biopsy method for testing the presence or absence of cancer. The available statistics show that 10%–30% of cancerous/noncancerous lesions misinterpreted by the radiologist [1]. The problem of misinterpretation could be minimized with the use of computer-aided diagnosis (CADx) systems. These systems act as a second pursuer for the radiologist to improve the diagnosis and minimize the death rate. The design and development of such a classification system require relevant features for the training. These features are obtained from mammograms. The Features extracted from mammograms can be irrelevant or exceptionally correlated with each other, so feature selection methods are used to obtain relevant features with high discriminatory power [2, 3]. The studies show that the predictive ability of the classification system highly depends on feature selection. So, to improve the predictive power of the classifier the proposed methodology uses Multi-objective particle swarm optimization (MOPSO) and Nondominated sorting genetic algorithm-III (NSGA-III) for feature selection. An ANN is used as a classifier to classify the masses.

The remainder of the paper composed of: a review of related work is presented in Section 2. The proposed research methodology in Section 3. Results and discussions presented in Section 4, comparative analysis of proposed work in Section 5, and the conclusion of work in Section 6.

2. Related work

In recent years multi-objective approaches are extensively studied for feature selection and classification of various medical datasets. A brief overview of these techniques used in some recent applications presented below.

Habib et al. [4] proposed a Multi-objective particle swarm optimization (MOPSO) technique for feature selection. The methodology was tested on several medical datasets. The results show that MOPSO is better than both Multi-objective Nondominated sorting algorithm-II (NSGA-II) and Multi-objective evolutionary algorithm based on decomposition (MOEA/D). A survey on recently proposed machine learning algorithms for data processing, classification, clustering, and association rules are presented in Alexandropoulos et al. [5]. The hybrid optimization method for feature selection based on Salp swarm (SA) with particle swarm optimization (PSO) is presented in Ibrahim et al. [6]. The performance of the method tested over two experimental series. In the first, it is compared with similar approaches using benchmarking functions, and in the second-best set of features are evaluated using different UCI databases.

Manohar et al. [7] proposed a binary PSO based feature selection method with a fuzzy support vector machine (FSVM) for the diagnosis of schizophrenia disorder in MR brain images. The proposed method assessed on the database of the National Alliance for Medical Image Computing (NAMIC). The authors prove that BPSO-FSVM is superior to BPSO-SVM.

Agarwalla et al. [8] present a feature selection method for the cancer database. The authors use Multi-objective MOBPSO with a blended Laplacian operator to avoid the problem of local trapping. Results show that the proposed methodology selects relevant features (genes) that are useful for the classification of cancer.

Amoozegar et al. [9] proposed a feature selection method. It uses Multi-objective PSO with feature elitism. In this method features in the archive set are first ranked based on their frequencies. Then these feature ranks are used to guide the particles and refine the archive set. Zhang et al. [10] proposed a multi-objective particle swarm optimization method for cost-based feature selection in classification. An objective of the proposed method is to generate Pareto front of Nondominated solutions. The searching capability of the proposed method is improved by incorporating probability-based encoding, hybrid operator, and external archive along with crowding distance and Pareto domination relationship to PSO.

Taghanaki et al. [11] proposed a Pareto-optimal Multi-objective dimensionality reduction deep auto-encoder for mammography classification. The results of the proposed methodology show that Pareto optimal Multi-objective optimization generates more discriminative features that will improve classifier performance than traditional auto-encoders.

Dioşan et al. [12] proposed a multi-objective approach for breast cancer classification using multi-expression programming. The proposed method uses image features like a histogram of oriented gradients and kernel descriptors for breast cancer classification.

Xue et al. [13] proposed a Multi-objective PSO approach for feature selection. The proposed method investigates two Multi-objective PSO methods; first based on Nondominated sorting and second based on the idea of crowding, mutation, and dominance. These methods compared with two conventional feature selection methods and three evolutionary multi-objective algorithms on 12 benchmark datasets. Results show that the second method is superior to all the algorithms.

Hamdan et al. [14] proposed a feature selection method based on Nondominated sorting genetic algorithm-II (NSGA-II). The technique uses two objective functions, the number of features and classification error for finding feature sets. The method is tested on five UCI databases.

Annavarapu et al. [15] proposed a MOBPSO based method feature selection in cancer microarray data. The scheme uses two objective functions cardinality of feature subsets and unique features of those selected subsets. The proposed methodology tested on benchmark gene expression datasets.

Figure 1.

Proposed methodology.

Zhu et al. [16] proposed a multi-objective scheme using NSGAIII for feature selection in the intrusion detection system. The author further suggested an improved NSGAIII (I-NSGAIII) scheme based on niche preservation procedures. Results show that the I-NSGAIII achieves both higher classification accuracy and minimal computational complexity. Sohrabi and Tajik [17] proposed a Multi-objective feature selection scheme for warfarin dose prediction. The method uses NSGAII and MOPSO with an artificial neural network (ANN). Results show that MOPSO is better than NSGAII. Jain and Deb [18] proposed a multi-objective scheme to solve constrained problems. The author extended NSGA-III and MOEA/D to solve generic constrained multi-objective problems. In earlier work author used these algorithms to solve unconstrained (with box limited alone) problems [19]. Outcomes of constrained NSGA-III and constrained MOEA/D show an edge of the previous especially in tackling issues having countless objectives.

Raj et al. [20] presented a Deep Learning model for feature selection and classification of medical images. The proposed technique is applied for the classification of lung cancer, brain image, and Alzheimer’s disease. An Opposition-based Crow Search (OCS) algorithm was used to reduce the feature space.

Gupta and Ahlawat [21] proposed a modified Binary Bat algorithm for feature selection. The performance of the proposed algorithm is compared with the original Binary Bat algorithm. The authors claim that the modified binary Bat algorithm is better than the original Binary Bat algorithm.

Arora et al. [22] proposed an ensemble-based method for feature selection. The method is evaluated using ten publicly available datasets and compared its performance with Decision Tree, Logistic Regression, K-nearest neighbours (KNN), and Random Forest classifiers.

Gupta et al. [23] presented a modified crow search algorithm for selecting the relevant features that are useful for the development of any software. The authors conclude that the proposed algorithm is better than existing algorithms.

Dash and Misra [24] proposed a Pareto differential evaluation technique for feature selection and classification in microarray data. Optimal features are selected with the use of a wrapper based differential evaluation technique. The significance of features selected is tested with the use of ANN, naïve Bayesian, SVM, and KNN classifiers. In recent years [25, 26, 27, 28, 29, 30] proposed methods for feature selection and classification in medical datasets.

When reviving the literature, we found that researchers implemented NSGA-III for solving problems other than digital mammography, and very few applied MOPSO for mammography. The promising findings of MOPSO and NSGA-III in other fields inspire us to use these algorithms to solve the mammography problem.

3. Proposed methodology

The proposed research methodology comprises two main stages. In the first stage, we utilized MOPSO and NSGA-III as feature selection algorithms. In the second stage, features were chosen by MOPSO, and NSGA-III is used to characterize suspicious lesion into benign and malignant using ANN. A review of the proposed procedure is as presented in Fig. 1.

3.1 Feature selection

In the proposed methodology, mammogram images first segmented using morphological and watershed transformations to obtain the region of interest (ROI). Each ROI represents a segmented mass. Then 25 features based on intensity (six), texture (eleven), and shape (eight) are obtained from each ROI. The six (F1–F6) intensity-based features are average gray level, average contrast, smoothness, skewness, uniformity, and entropy1. The eleven texture-based features obtained from the gray-level co-occurrence matrix are: Energy, entropy2, contrast, mean, std. deviation, variance, correlation, homogeneity, sum average, sum variance and sum entropy. Similarly, the eight shape-based features obtained from the shape of ROI are: area, perimeter, compactness, normalized std. deviation, area ratio, contour roughness, normalized residual value, and the overlapping ratio [31].

Feature selection is an essential step of any classification system that eliminates irrelevant, redundant, or noisy features. A feature set with a large number of an immaterial or unnecessary feature may degrade the prediction power of the classifier. That is why optimal feature selection is essential. It helps to minimize the computational overheads and improve the predictive power of the classifier [32, 33]. The detailed description of objective functions and feature selection using two methods namely MOPSO and NSGA-III presented in the following sections.

3.1.1 Multi-objective functions

Problems that involve two or more contradictory objectives are called multi-objective optimization problems. These problems have many solutions, called Pareto-optimal solutions. These Pareto-optimal solutions are Nondominated solutions. The multi-objective problem be formulated as-

Suppose, decision variables are represented by a vector $X$ and objective functions by $f_{i}(X)$ where $i=1,2,3,\ldots,N$ then,the objective functions to be Minimize $\{f_{1}(X),f_{2}(X),\ldots,f_{N}(X)\}$ by obtaining Pareto-optimal fronts.

As discussed above, the objective of feature selection is to reduce the dimensionality of feature space that will increase the predictive power of classifiers as compared to the use of all features. In the proposed methodology, two conflicting objectives, Ratio of Features selected (RF) and classifier performance (MSE), are used. Both the objective function should be minimized. An ANN is utilizing to compute the fitness of objective functions.

The first objective function RF computed as shown in Eq. (1) below-

$\displaystyle\textit{RF}=\textit{NF}/S$ (1)

where NF is the number of features selected by the classification algorithm, and S is the total number of features.

The second objective function, Mean Squared Error (MSE) is defined in Eq. (2) as shown below-

$\displaystyle\textit{MSE}=\frac{1}{n}\sum^{n}_{i=1}(T_{i}-O_{i})^{2}$ (2)

where $O_{i}$ is the predicted value and $T_{i}$ is the target value.

3.1.2 Multi-objective particle swarm optimization (MOPSO)

Particle swarm optimization (PSO) is a widely used bio-inspired algorithm developed by Eberhart and Kennedy in 1995 [34]. It is an optimization technique based on the social behavior of birds flock. In this method, each possible solution to the problem is represented as a particle moving in the search space, and the location of the particle represents the optimal solution. During each iteration of the algorithm, the adaptable velocity vector used to modify the populations of each particle as well as to favor the best-performing individual.

The MOPSO algorithm proposed by Coello and Lechunga [35], based on the idea of the repository. The particles use the pool to deposit their flight experience, after each generation in the repository. The repository is used by the particle to identify a leader who will guide the search. The fundamental objective of MOPSO is to solve multi-objective problems where the goal is to select Pareto optimal solutions. The PSO and MOPSO update particle velocity and position in the same way. The velocity and location of the $i^{\text{th}}$ particle at $t^{\text{th}}$ iteration is updated using Eqs (3) and (4) as given below:

$\displaystyle v^{i}(t)=w\times v^{i}(t-1)+c_{1}\times\text{rand}_{1}\times(p^{% i}_{\text{best}}-x^{i})+c_{2}\times\text{rand}_{2}\times(g^{i}_{\text{best}}-x% ^{i})$ (3) $\displaystyle x^{i}(t)=x^{i}(t-1)+v^{i}(t)$ (4)

In the above equations, the velocity of the particle represented by ‘ $v$ ’ and inertia weight by $w$ . $c_{1}$ and $c_{2}$ represents the cognitive and social learning vectors. rand ${}_{1}$ and rand ${}_{2}$ represent the two independent random variables in the range $[0,1]$ . The solution vector is given by $x$ while personal best and global bests for the entire population represented by ${p}_{\text{best}}$ and ${g}_{\text{best}}$ , respectively. The MOPSO for feature selection presented as in Algorithm 3.1.2.

MOPSO Algorithm

Load dataset# It consists of 651 samples.

Initialize MOPSO parameters # number of generations # population size # number of decision variables # local and global learning paramters # Repository size # Mutation parameter

Define objective functions: $f_{1}(x)$ and $f_{2}(x)$

Create initial population ( $K=1,2,\ldots$ , Pop_size)

Compute the fitness of initial population for two objective functions

Determine personal best and domination from initial population

Forfordoend loop # iteration

iteration $=$ 1 to Max_it

Forfordoend loop # K

$K=1$ to Pop_size

Select the leader from repository

Update velocity of $i^{\text{th}}$ particle with the use of Eq. (1)

Update the position of the particle with the use of Eq. (2)

Maintain the particles within the search space

Evaluate the fitness of particles in the population

Updated the Personal_best Pareto dominance

Update the repository with all nondominated solutions

Remove any dominated member from the repository

Generate the Pareto optimal front

Select the optimal features

NSGA-III Algorithm

Load dataset #It consist of 651 samples.

Initialize NSGA-III parameters # Number of generation # Population size # Crossover probability # Mutation parameter

Define objective functions: $f_{1}(x)$ and $f_{2}(x)$

Create the initial population ( $K=1,2,\ldots$ , Pop_size)

Compute the fitness of initial population using two objective functions

Sort and select the population using nondominated sorting with reference point

Forfordoend loop # iteration

iteration $=$ 1 to Max_it Perform crossover operation and evaluate the fitness of generated population.

Perform mutation operation and evaluate the fitness of the generated population.

Merge the current population with the population generated by crossover and mutation

Sort and select the population using nondominated sorting with a reference point

Generate the Pareto optimal front

Select the optimal features

3.1.3 Nondominated sorting genetic algorithm-III (NSGA-III)

The multi-objective NSGA-III is proposed by Deb and Jain in 2013 [36]. The basic structure of NSGA-III is analogous to the original NSGA-II [37] with a substantial change in the selection operator. In NSGA-III, the following approaches are used instead of crowding distance operators.

•
NSGA-III uses the same domination princple [38] as used by NSGA-II for the identification of nondominated fronts.
•
The Multi-objective NSGA-III ensures diversity in selected solutions with the use of predefined reference points. These reference points are widely distributed over normalized hyperplane; also, the selected solutions are widely distributed on or near the Pareto optimal front.
•
In NSGA-III each population member is adaptively normalized.
•
Once the population is normalized, each member of the population is associated with a referencing point.
•
NSGA-III uses a method to protect the niches. The niche preservation strategy aims to identify one entity with one reference point.
•
After this process, the new population is generated by performing crossover and mutation operations. The Multi-objective NSGA-III for feature selection is presented in Algorithm 3.1.2.

3.2 Artificial neural network (ANN)

The proposed methodology uses an artificial neural network for the classification of mammograms. The artificial neural net based on the neural structure of the brain. It consists of highly interconnected nodes called neurons with an ability to learn and adopt. This helps the network to address a large number of problems. In general, supervised and unsupervised training methods utilized for training the system.

In this article, a Multilayer Perceptron (MLP) is used for the classification of breast lesions into benign and malignant. MLP is also called as feed-forward neural network because in this network, data flows only in the forward direction. It generally consists of three layers: input, hidden and output [39]. The neuron at each layer is connected to every other neuron at the next layer via a weighted link. The network trained using a pair of input-output samples. The training of the system halted when the MSE values quit diminishing and when there is an increase in the MSE values, which means that over-training [40, 41]. Once the ANN trained, its performance evaluated with the help of unseen data.

Figure 2.

Pareto-optimal fronts of MOPSO and NSGA-III.

4. Results and discussion

A total of 651 (337 malignant and 314 benign) mammogram images are collected from the digital database for screening mammography (DDSM) and employ to determine the efficiency of the suggested methods [42, 43].

4.1 Performance evaluation of mopso and NSGA-III

As examined in Sections 3.1.2 and 3.1.3, MOPSO, and NSGA-III are used to choose the best features from a set of 25 features extracted from each segmented ROI. The parameter’s value used by the selection algorithms is as shown in Table 1.

Table 1
MOPSO and NSGA-III parameters

MOPSO parameters
Feature variables	25
Inertia weight	0.5
Population size	25
Personal learning coefficient	2
Maximum generations	25/30/40/50/100
Global learning coefficient	2
Repository size	25
Mutation rate	0.1
NSGA-III parameters
Feature variables	25
Maximum generations	25/30/40/50/100
Crossover probability	0.8
Mutation rate	0.02
Population size	25

The Selection of optimal features by both MOPSO and NSGA-III depends on the minimization of two objective functions; the ratio of features selected (RF), and the mean squared error (MSE). The Pareto optimal fronts generated by MOPSO and NSGA-III for the iterations; 25, 30, 40, 50 and 100 are as shown in Fig. 2.

The subset of optimal features selected by MOPSO from Pareto-optimal fronts based on two objective functions shown in Table 2. One can observe from Table 2; MOPSO selects five feature sets (F1 to F5). The feature set F1 appear to be best with 11 features. These features are selected based on two objective values MSE and RF as 0.154183 and 0.44. Similarly, the subset of features selected by NSGA-III from Pareto-optimal fronts based on two objective functions shown in Table 3. From Table 3; one can see that NSGA-III select five feature sets (F1 to F5). The feature set F1 appear to be best with 12 features. These features are selected based on two objective values MSE and RF as 0.149308 and 0.48.

Table 2

Features selected by MOPSO method

Input data size (features $\times$ no. of cases)	Number of iterations	Population size	Feature set	No. of feature selected	Features selected	MSE	RF
25 $\times$ 651	25	25	MOPSO-F1	11	3, 6, 8, 9, 10, 14, 15, 20, 22, 23, 25	0.154183	0.44
25 $\times$ 651	30	25	MOPSO-F2	10	1, 4, 6, 8, 12, 16, 18, 20, 23, 24	0.155373	0.40
	40	25	MOPSO-F3	9	1, 3, 4, 6, 10, 17, 20, 21, 25	0.165541	0.36
	50	25	MOPSO-F4	9	4, 6, 7, 8, 11, 13, 17, 20, 23	0.163486	0.36
	100	25	MOPSO-F5	11	4, 6, 8, 9, 13, 15, 16, 17, 20, 22, 23	0.15624	0.44

Table 3

Features selected by NSGA-III method

Input data size (features $\times$ no. of cases)	Number of iterations	Population size	Feature set	No. of feature selected	Features selected	MSE	RF
25 $\times$ 651	25	25	NSGA-F1	12	4, 6, 7, 9, 12, 13, 17, 19, 20, 22, 23, 24	0.149308	0.48
25 $\times$ 651	30	25	NSGA-F2	12	5, 6, 11, 12, 13, 14, 17, 18, 20, 22, 23, 24	0.151512	0.48
	40	25	NSGA-F3	9	6, 8, 12, 14, 16, 18, 19, 21, 23	0.157453	0.36
	50	25	NSGA-F4	11	1, 3, 5, 6, 8, 11, 18, 19, 20, 21, 24	0.152759	0.44
	100	25	NSGA-F5	11	1, 6, 8, 11, 12, 14, 18, 19, 20, 21, 22	0.154066	0.44

The accuracy of features selected by both MOPSO and NSGA-III algorithms is shown in Fig. 3. From Fig. 3 we can observe that the features selected by MOPSO achieve a classification accuracy of 97.54% for feature set-1, 96.46% for feature set-2, 96.13% for feature set-3, 96.15% for feature set-4 and 96.77% for the feature set-5 respectively. Similarly, features selected by NSGA-III achieve a classification accuracy of 97.08% for feature set-1, 96% for feature set-2, 96.31% for feature set-3, 96% for feature set-4 and 95.85% for the feature set -5 respectively.

From, the above discussion we can observe that, feature set F1 appear to be the best-performing feature set for both MOPSO and NSGA-III. It accomplishes the most exceptional classification accuracy (or minimum error) while keeping the most modest number of features. Although feature set F1 of MOPSO appear to be best when compared with feature set F1 of NSGA-III. The feature set F1 selected by MOPSO removes 55% of irrelevant features. It reduces the dimensionality of original feature set from 25 features to 11 features. The features of F1 selected by MOPSO and NSGA-III that contribute to achieving high performance is as shown in Tables 4 and 5 respectively.

Table 4

Features of MOPSO-F1

Smoothness, entropy1, entropy2, contrast, mean, homogeneity, sum average, compactness, area ratio, contour roughness, overlapping ratio.

Table 5

Features of NSGA-III-F1

Skewness, entropy1, energy, contrast, variance, correlation, sum entropy, perimeter, compactness, area ratio, contour roughness, normalized residual value.

Figure 3.

Accuracy of features selected by MOPSO and NSGA-III.

Figure 4.

Frequency of features selected.

The frequency of features selected by MOPSO and NSGA-III for all iterations is as shown in Fig. 4. One can observe from Fig. 4, 6 ${}^{\text{th}}$ feature (entropy1) is selected 100% times, 20 ${}^{\text{th}}$ feature (compactness) selected 90% times, 23 ${}^{rd}$ feature (contour roughness) selected 70% times while 4 ${}^{\text{th}}$ (skewness), 8 ${}^{\text{th}}$ (entropy2) and 17 ${}^{\text{th}}$ (sum entropy) features are selected 60% time. The features have significant impact on classifiers performance than all than other features.

The importance and relevance of features selected by MOPSO and NSGA-III is evaluated using the ANN classifier, discussed in the following section.

4.2 Performance evaluation parameters

The performance of the classifier is evaluated using some statistical parameters, as shown in Eqs (5) to (11) below:

$\displaystyle\text{Accuracy }=(\textit{TP}+\textit{TN})/(\textit{TP}+\textit{% TN}+\textit{FP}+\textit{FN})$ (5) $\displaystyle\text{Sensitivity}=\textit{TP}/(\textit{TP}+\textit{FN})$ (6) $\displaystyle\text{Specificity}=\textit{TN}/(\textit{TN}+\textit{FP})$ (7) $\displaystyle\text{Type-I error}=\textit{FP}/(\textit{FP}+\textit{TN})$ (8) $\displaystyle\text{Type-II error}=\textit{FN}/(\textit{FN}+\textit{TP})$ (9) $\displaystyle\text{F-measure}=2\textit{TP}/(2\textit{TP}+\textit{FP}+\textit{% FN})$ (10) $\displaystyle\text{Cohen's kappa coefficie nt}={P_{O}\!-\!P_{e}}/(1\!-\!P_{e})$ (11)

where, $P_{\textit{YES}}=((\textit{TP}+\textit{FN})/(\textit{TP}+\textit{TN}+\textit{% FP}+\textit{FN}))*((\textit{TP}+\textit{FP})/(\textit{TP}+\textit{TN}+\textit{% FP}+\textit{FN}));$ $P_{\textit{NO}}=((\textit{FP}+\textit{TN})/(\textit{TP}+\textit{TN}+\textit{FP% }+\textit{FN}))*((\textit{FN}+\textit{TN})/(\textit{TP}+\textit{TN}+\textit{FP% }+\textit{FN}));$ $P_{0}=(\textit{TP}+\textit{TN})/(\textit{TP}+\textit{TN}+\textit{FP}+\textit{% FN})$ ; and $P_{e}=P_{\textit{YES}}+P_{\textit{NO}};$ TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative

4.2.1 Performance of the classifier

The feature sets, MOPSO-F1 and NSGA-F1, are utilized for training and testing of artificial neural network classifiers. An input data consist of 651 samples, partitioned as 70% samples utilize to train the network and the remaining 30% utilize to test the network. The proposed feed-forward neural network comprises three layers namely input layers, hidden layer, and output layers. The number of nodes at input layers varies with input data (feature variables). The proposed method uses ten hidden nodes and two output nodes (for the benign and malignant category). The Bayesian regularization strategy that updates the weight and bias values, as indicated by the Levenberg-Marquardt technique, is used to train the ANN. The network trained for 1000 epochs. The performance of the ANN evaluated against various statistical parameters is as shown in Fig. 5.

Figure 5.

Performance of ANN.

One can see from Fig. 5, for a whole dataset containing 25 features, ANN achieves an accuracy of 92.31% with 93.47% sensitivity and 91.08% specificity, and for MOPSO feature set, ANN accomplishes the highest accuracy of 97.54% with 97.92% sensitivity and 96.18% specificity. For NSGA-III feature set, ANN achieves an accuracy of 96.18% with 97.92% sensitivity and 96.18% specificity.

The comparative analysis of type-I and type-II errors is as shown in Fig. 6. From Fig. 6, one can see that the type-I error (FPR) or false alarm for the entire dataset, MOPSO and NSGA-III is 8.91%, 3.18%, and 3.82% respectively while type-II error (FNR) for the entire dataset, MOPSO, and NSGA-III is 6.52%, 1.78%, and 2.07% respectively.

Table 6

The area under the curve for MOPSO, NSGA-III and the entire dataset

Method	Area	Error	95% C. I.
			LB	UB
MOPSO	0.983	0.005	0.972	0.993
NSGAIII	0.981	0.005	0.971	0.992
Entire Dataset	0.964	0.007	0.950	0.978

Figure 6.

Type-I and Type-II error.

Figure 7.

Comparison of F-Score and Cohen’s Kappa coefficient.

Table 7

Comparison with performance reported in related work

Authors	Feature selection & classification Method	Database used	Accuracy (%)
Habib et al. [4]	MOPSO	WBCD	91.90
	NSGA-II	(Breast cancer)	90.40
Manohar et al. [7]	BPSO-SVM	NAMIC (MR brain)	90.00
Taghanaki et al. [11]	NSGAII-KNN	INbreast, DDSM	96.21
	NSGAII- Naïve Bayes		98.45
Proposed	MOPSO	DDSM	97.54
	NSGA-III		97.08

So, as far as accuracy, sensitivity, specificity, type-I, and type-II errors are concerned MOPSO feature set is outperformed both the entire dataset and NSGA-III. MOPSO improves the efficiency of ANN by 5.23% over whole dataset.

To assess the model performance more rigorously the two important parameters; F-score and Cohen’s kappa coefficient are used. The results for both the parameters are as shown in Fig. 7. The parameter F-score is used to measure the test accuracy of the binary classification. It is the harmonic mean of precision and recall. From Fig. 7, one can observe that F-score for the entire dataset, MOPSO, and NSGA-III features are 0.9262, 0.9765, and 0.9720 respectively. Results produce by MOPSO is better than both the entire dataset and NSGA-III features.

One more vital parameter to evaluate the model performance is Cohen’s kappa coefficient [44] as shown in Fig. 7. It was adopted as a test of psychological behavioral consensus between the observers. Cohen’s Kappa first aim was to measure the degree of agreement or disagreement between at least two individuals that observed a similar phenomenon. It can take values from 0 to 1. If its value is $\leqslant$ 0, then there is no agreement between two raters. If its value is equal to 1, then there is a complete agreement between two raters. The better performing classifiers must have higher values of Cohen’s kappa coefficient. From Fig. 7, one can observe that the kappa coefficient for the entire dataset, MOPSO and NSGA-III is 0.8461, 0.9508 and 0.9415 respectively. These results show that MOPSO-ANN is the best classification model. It achieves high accuracy with minimal features.

Figure 8.

Comparison of ROC curves of the entire dataset, MOPSO and NSGA-III.

4.3 Performance analysis for Area under ROC curve

The receiver operating characteristics (ROC) curve used to determine the accurate accuracy of medical diagnostic tests. It is a plot between Sensitivity and 1-Specificity. The values of the curve lie in the range of ‘0’ and ‘1’. The predictive model is said to be 100% accurate if the area under the curve is equal to ‘1’ [45]. The area under the ROC curve is measured using high performing feature sets of MOPSO, NSGA-III and the entire dataset. The ROC curve for MOPSO, NSGAIII and the entire dataset is as shown in Fig. 8, and the area under the curve as shown in Table 6.

One can see from Table 6, an area under the ROC curve with 95% Confidence Interval (C.I.) for MOPSO, NSGA-III and the entire dataset is $A_{Z}=$ 0.983 $\pm$ 0.005, $A_{Z}=$ 0.981 $\pm$ 0.005 and $A_{Z}=$ 0.964 $\pm$ 0.007 respectively. From Fig. 8, we can conclude that the performances of both MOPSO and NSGA-III optimization methods are superior to the entire dataset since the area under the curve for both the models is close to 1.

4.4 Comparative analysis of proposed work

The results of the proposed MOPSO and NSGA-III techniques were compared with some related works, as appeared in Table 7. From Table 7, we can see that the proposed MOPSO and NSGA-III methods are superior to the techniques presented in similar work. The performance of given work is marginally lower than the method proposed by [11] when compared with accuracy. Even though the performance is slightly moderate, the methods provided in the article are better. The reason is that the technique presented by [11] evaluated using accuracy parameter while proposed methods are rigorously evaluated using accuracy, F-score, Cohen’s Kappa coefficient, and area under the ROC curve.

The complexity of both MOPSO and NSGA-III is $O(n^{2}m)$ , where $m$ be the number of objective functions to be optimized and $n$ be the population size.

With this, we conclude that both MOPSO and NSGA-III give off an impression of being the best techniques, while the performance of MOPSO is marginally better than NSGA-III. The NSGA-III method is implemented first time for solving digital mammography problem. Both techniques presented in the article achieve high accuracy with minimum number of features. These methods help to improve breast cancer findings and reduce the death rate.

5. Conclusion

The proposed article investigates multi-objective techniques for feature selection and classification in digital mammography. The methodology uses multi-objective particle swarm optimization and Nondominated sorting genetic algorithm-III for feature selection. The optimal features are selected from Pareto-optimal fronts generated by MOPSO and NSGA-III for two conflicting objective functions; the number of selected features and the mean squared error. The chosen feature sets used to train and test artificial neural network classifiers. The significance of MOPSO and NSGA-III features are evaluated using statistical parameters; sensitivity, specificity, accuracy, F-score and Cohen’s kappa coefficient. The test results show that MOPSO outperforms NSGA-III strategy. The proposed system can assist in improving breast cancer diagnosis and reduce the death rate. Further, this work could be extended by implementing hybrid multi-objective techniques where the parameters of multi-objective techniques are tuned with the use of a basic evolutionary technique to achieve better performance.

References

Majid

de Paredes

Doherty

Sharma

Salvador

. Missed breast carcinoma: Pitfalls and pearls. Radiographics. 2003; 23(4): 881-895.

Abu-Amara

Abdel-Qader

. Hybrid mammogram classification using rough set and fuzzy classifier. International Journal of Biomedical Imaging. 2009; 2009.

Ganesan

Acharya

Chua

Min

Abraham

. Computer-aided breast cancer detection using mammograms: A review. IEEE Reviews in Biomedical Engineering. 2012; 6: 77-98.

Habib

Aljarah

Faris

Mirjalili

. Multi-objective particle swarm optimization: Theory, literature review, and application in feature selection for medical diagnosis. in: Evolutionary Machine Learning Techniques. 2020; 175-201. Springer, Singapore.

Alexandropoulos

Aridas

Kotsiantis

Vrahatis

. Multi-objective evolutionary optimization algorithms for machine learning: A recent survey. in: Approximation and Optimization. 2019; 35-55.

Ibrahim

Ewees

Oliva

Elaziz

. Improved salp swarm algorithm based on particle swarm optimization for feature selection. Journal of Ambient Intelligence and Humanized Computing. 2019; 10(8): 3155-3169.

Manohar

Ganesan

. Diagnosis of schizophrenia disorder in MR brain images using multi-objective BPSO based feature selection with fuzzy SVM. Journal of Medical and Biological Engineering. 2018; 38(6): 917-932.

Agarwalla

Mukhopadhyay

. Feature selection using multi-objective optimization technique for supervised cancer classification. in: Multi-Objective Optimization. 2018; 195-213.

Amoozegar

Minaei-Bidgoli

. Optimizing multi-objective PSO based feature selection method using a feature elitism mechanism. Expert Systems with Applications. 2018; 113: 499-514.

10.

Zhang

Gong

Cheng

. Multi-objective particle swarm optimization approach for cost-based feature selection in classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015; 14(1): 64-75.

11.

Taghanaki

Kawahara

Miles

Hamarneh

. Pareto-optimal multi-objective dimensionality reduction deep auto-encoder for mammography classification. Computer Methods and Programs in Biomedicine. 2017; 145: 85-93.

12.

Dioşan

Andreica

. Multi-objective breast cancer classification by using multi-expression programming. Applied Intelligence. 2015; 43(3): 499-511.

13.

Xue

Zhang

Browne

. Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics. 2012; 43(6): 1656-1671.

14.

Hamdani

Won

Alimi

Karray

. Multi-objective feature selection with NSGA II. in: International Conference on Adaptive and Natural Computing Algorithms. 2007; 240-247. Springer, Berlin, Heidelberg.

15.

Annavarapu

Dara

Banka

. Cancer microarray data feature selection using multi-objective binary particle swarm optimization algorithm. EXCLI Journal. 2016; 15: 460.

16.

Zhu

Liang

Chen

Ming

. An improved NSGA-III algorithm for feature selection used in intrusion detection. Knowledge-Based Systems. 2017; 116: 74-85.

17.

Sohrabi

Tajik

. Multi-objective feature selection for warfarin dose prediction. Computational Biology and Chemistry. 2017; 69: 126-133.

18.

Jain

Deb

. An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach. IEEE Transactions on Evolutionary Computation. 2013; 18(4): 602-622.

19.

Deb

Jain

. An improved NSGA-II procedure for many-objective optimization, Part I: Solving problems with box constraints. KanGAL Report. 2012; 2012009.

20.

Raj

Shobana

Pustokhina

Pustokhin

Gupta

Shankar

. Optimal feature selection-based medical image classification using deep learning model in internet of medical things. IEEE Access. 2020; 8: 58006-58017.

21.

Gupta

Ahlawat

. Usability feature selection via MBBAT: A novel approach. Journal of Computational Science. 2017; 23: 195-203.

22.

Arora

Agrawal

Tiwari

Gupta

Khanna

. Ensemble feature selection method based on recently developed nature-inspired algorithms. in: International Conference on Innovative Computing and Communications. 2020; 457-470.

23.

Gupta

Rodrigues

Sundaram

Khanna

Korotaev

de Albuquerque

. Usability feature extraction using modified crow search algorithm: A novel approach. Neural Computing and Applications. 2018; 18: 1-1.

24.

Dash

Misra

. Gene selection and classification of microarray data: A Pareto DE approach. Intelligent Decision Technologies. 2017; 11(1): 93-107.

25.

Thawkar

Ingolikar

. Classification of masses in digital mammograms using the genetic ensemble method. Journal of Intelligent Systems. 2018; 29(1): 831-845.

26.

Gupta

Agrawal

Arora

Khanna

. Bat-inspired algorithm for feature selection and white blood cell classification. in: Nature-Inspired Computation and Swarm Intelligence. 2020; 179-197.

27.

Huang

. A two stages algorithm for feature selection based on feature score and genetic algorithms. Intelligent Decision Technologies. 2019; 13(2): 139-151.

28.

Hasanpour

Meibodi

Navi

. Optimal selection of ensemble classifiers using particle swarm optimization and diversity measures. Intelligent Decision Technologies. 2019; 13(1): 131-137.

29.

Roy

Sikaria

Susan

. A deep learning based CNN approach on MRI for Alzheimer’s disease detection. Intelligent Decision Technologies. (Preprint): 1-1.

30.

Selvathi

Emala

. MRI brain pattern analysis for detection of Alzheimer’s disease using random forest classifier. Intelligent Decision Technologies. 2016; 10(4): 331-340.

31.

Thawkar

Ingolikar

. Classification of masses in digital mammograms using Biogeography-based optimization technique. Journal of King Saud University-Computer and Information Sciences. 2018.

32.

Sameti

Ward

Morgan-Parkes

Palcic

. A method for detection of malignant masses in digitized mammograms using a fuzzy segmentation algorithm. in: Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. Magnificent Milestones and Emerging Opportunities in Medical Engineering (Cat. No. 97CH36136). 1997; 2: 513-516.

33.

Wang

Liu

Freedman

. Computerized radiographic mass detection. II. Decision support by featured database visualization and modular neural networks. IEEE Transactions on Medical Imaging. 2001; 20(4): 302-313.

34.

Kennedy

Eberhart

. Particle swarm optimization. in: Proceedings of ICNN’95-International Conference on Neural Networks. 1995; 4: 1942-1948.

35.

Coello

Lechuga

. MOPSO: A proposal for multiple objective particle swarm optimization. in: Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No. 02TH8600). 2002; 2: 1051-1056.

36.

Deb

Jain

. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation. 2013; 18(4): 577-601.

37.

Deb

Pratap

Agarwal

Meyarivan

. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002; 6(2): 182-197.

38.

Chankong

Haimes

. Multiobjective decision making: Theory and methodology. Courier Dover Publications; 2008.

39.

Cheng

Shi

Min

Cai

. Approaches for automated detection and classification of masses in mammograms. Pattern Recognition. 2006; 39(4): 646-668.

40.

Saritas

Ozkan

Sert

. Prognosis of prostate cancer by artificial neural networks. Expert Systems with Applications. 2010; 37(9): 6646-6650.

41.

Wichard

Cammann

Stephan

Tolxdorff

. Classification models for early detection of prostate cancer. BioMed Research International. 2008; 2008.

42.

Heath

Bowyer

Kopans

Kegelmeyer

Moore

Chang

Munishkumaran

. Current status of the digital database for screening mammography. in: Digital Mammography. 1998; 457-460.

43.

Bowyer

Kopans

Kegelmeyer

Moore

Sallam

Chang

Woods

. The digital database for screening mammography. in: Third International Workshop on Digital Mammography. 1996; 58: 27.

44.

Landis

Koch

. The measurement of observer agreement for categorical data. Biometrics. 1977; 159-174.

45.

Swets

. Measuring the accuracy of diagnostic systems. Science. 1988; 240(4857): 1285-1293.

Multi-objective techniques for feature selection and classification in digital mammography

Abstract

Keywords

1. Introduction

2. Related work

3.1 Feature selection

3.1.1 Multi-objective functions

4.1 Performance evaluation of mopso and NSGA-III

Table 1 MOPSO and NSGA-III parameters

4.4 Comparative analysis of proposed work

5. Conclusion

References

Table 1
MOPSO and NSGA-III parameters