A classification framework using filter–wrapper based feature selection approach for the diagnosis of congenital heart failure

Abstract

Premature mortality from cardiovascular disease can be reduced through early detection of heart failure, by analysing patients’ risk factors and ensuring accurate diagnosis. This work proposes a clinical decision support system for the diagnosis of congenital heart failure that uses a data pre-processing approach for handling missing values and a filter-wrapper based method for selecting the most relevant features. Missing values are imputed using the missForest method in four of the eight heart disease datasets collected from the Machine Learning Repository maintained by the University of California, Irvine. The Fast Correlation Based Filter is used as the filter approach, while the union of the Atom Search Optimization Algorithm and the Henry Gas Solubility Optimization serves as the wrapper, with a fitness function that combines accuracy, G-mean, and Matthews correlation coefficient as measured by a Support Vector Machine. Four boosted classifiers, namely XGBoost, AdaBoost, CatBoost, and LightGBM, are trained using the selected features. For these four classifiers respectively, the proposed work achieves an accuracy of 89%, 84%, 83%, 80% for Heart Failure Clinical Records; 81%, 80%, 83%, 82% for Single Photon Emission Computed Tomography; 90%, 82%, 93%, 80% for Single Photon Emission Computed Tomography F; 80%, 80%, 81%, 80% for Statlog Heart Disease; 80%, 85%, 83%, 86% for Cleveland Heart Disease; 82%, 85%, 85%, 82% for Hungarian Heart Disease; 80%, 81%, 79%, 82% for VA Long Beach; and 97%, 89%, 98%, 97% for Switzerland Heart Disease. When evaluated against Random Forest, Classification and Regression Trees, Support Vector Machine, and K-Nearest Neighbor, the suggested technique outperformed these classifiers.

Keywords

Henry gas solubility optimization; Atom search optimization algorithm; XGBoost; AdaBoost; CatBoost; LightGBM

1 Introduction

Cardiovascular disease (CVD) refers to a wide range of conditions affecting the heart and is among the leading causes of death in the world. The most important risk factors for CVD include smoking, high blood pressure, high cholesterol, high triglycerides, excessive alcohol consumption, an unhealthy diet, being overweight, and physical inactivity [1]. Because the rate of CVD is rising at an alarming pace in the majority of nations, it is more crucial than ever to detect CVD at an early stage and make an accurate diagnosis [2]. CVD can lead to a number of different cardiovascular conditions, including coronary heart disease, cardiomyopathy, heart failure, and stroke [3]. The number of persons dying from cardiovascular diseases in the western world is declining as better pharmacological diagnosis and better approaches to preventing CVD are established [4].

The machine learning approach analyses clinical records that can be found in open sources or directly in the hospital repository if the right permissions are given [5]. Clinical decision support systems (CDSS) evaluate the medical records of a patient and provide doctors with a second opinion based on an analysis of previously collected electronic health record (EHR) data [6]. Clinicians who use a CDSS in practice may accept its suggestions, which might lead to an accurate diagnosis [7].

Classification, Association Rule Mining, and Clustering are the three most essential subtasks in data mining. Classification is a two-stage supervised learning approach that relies on labelled examples to create a training set: a model is first constructed from the training dataset and then evaluated in a testing phase, after which it can categorize unseen instances. Association Rule Mining is an unsupervised learning method that identifies sets of frequently co-occurring data and uses them to construct robust rules. Clustering is an unsupervised learning method that groups data according to similarity, with the goal of increasing similarity within each group while decreasing similarity across groups [8].

To improve the performance of classifiers, data preprocessing is necessary; it involves dealing with any missing, noisy, or outlier values in the dataset. Missing or inaccurate values are common in clinical health records due to human error, inaccurate instrument readings, and data duplication. During missing value imputation, predictions for the missing values are calculated by either machine learning or statistical approaches [9]. Three types of missingness are usually distinguished. In data that are Missing Completely at Random (MCAR), the probability of a value being missing is the same for all values and there is no relationship between the missingness and the data. In data that are Missing at Random (MAR), the probability of a value being missing is related to the observed data, but not to the missing values themselves. In data that are Missing Not at Random (MNAR), the missingness is related to the values of the variables that are themselves missing, so it is not random.
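The three missingness mechanisms can be made concrete with a small simulation. The following is only an illustrative sketch: the two-column records (age, cholesterol), the median cut-offs, and the function name `make_missing` are assumptions introduced here and do not come from the datasets used in this work.

```python
import random

def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Hide some cholesterol values in rows of (age, chol) under a given mechanism.

    MCAR: every value has the same chance of being hidden.
    MAR:  the chance depends on the always-observed age column.
    MNAR: the chance depends on the cholesterol value that is being hidden.
    """
    rng = random.Random(seed)
    age_cut = sorted(r[0] for r in rows)[len(rows) // 2]   # median age
    chol_cut = sorted(r[1] for r in rows)[len(rows) // 2]  # median cholesterol
    out = []
    for age, chol in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if age >= age_cut else 0.0
        elif mechanism == "MNAR":
            p = 2 * rate if chol >= chol_cut else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((age, None if rng.random() < p else chol))
    return out

def demo_rows(n=200, seed=1):
    """Hypothetical patient records used only for this illustration."""
    rng = random.Random(seed)
    return [(rng.randint(30, 80), rng.randint(150, 300)) for _ in range(n)]
```

Under MAR the gaps can in principle be explained by the observed age column, whereas under MNAR the cause of the gap is exactly the information that has been lost, which is why MNAR is generally the hardest case for imputation.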

When irrelevant features that have nothing to do with the class label are present, they create bias in the prediction and degrade classification performance. Feature selection is used to determine which features of the original dataset are the most important [10]. The most relevant features are selected by using filter, wrapper, embedded, or hybrid approaches. Methods such as the chi-square test, the Analysis of Variance (ANOVA) test, and Linear Discriminant Analysis (LDA) are used in the filter approach [11] to rank features. The wrapper method determines how important each feature in the dataset is by measuring how well the classifier performs, using methods such as Recursive Feature Elimination (RFE), Backward Elimination, and Forward Selection [12, 13]. In the embedded method, feature selection is built into the learning algorithm itself, which combines the best parts of the filter and wrapper methods. Ridge regression and Lasso regression are examples of such methods, in which features are chosen while the model is trained. In the hybrid method, the features are first ranked by a filter method using a measure such as Pearson’s Correlation, Chi-Square, ANOVA, Fisher’s score, the Gini index, Information Gain, or a Feature Importance Score. A wrapper approach [14] is then used to evaluate the top-ranked set of features and determine how much each feature adds to the classification.
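The hybrid idea of ranking with a filter and then refining with a wrapper can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in this work: Pearson correlation stands in for the filter, greedy forward selection for the wrapper, and a leave-one-out 1-nearest-neighbour classifier for the learner; all function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_rank(X, y, top_k):
    """Filter step: keep the top_k features by |correlation with the label|."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features))
        correct += y[nearest] == y[i]
    return correct / len(X)

def forward_select(X, y, candidates):
    """Wrapper step: add the feature that most improves accuracy until none does."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in candidates:
            if f in selected:
                continue
            acc = loo_1nn_accuracy(X, y, selected + [f])
            if acc > best:
                best, best_f, improved = acc, f, True
        if improved:
            selected.append(best_f)
    return selected, best
```

Calling `forward_select(X, y, filter_rank(X, y, top_k))` runs the two stages in sequence; the filter keeps the wrapper's search space small, which matters because every wrapper evaluation retrains or re-evaluates a classifier.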

Zhiyong Hu et al. [15] proposed the Gaussian Process Latent Variable Model (GPLVM) as a missing value imputation method for the efficient prediction of heart failure, using a Constrained Support Vector Machine (cSVM) along with Logistic Regression (LR) with Least Absolute Shrinkage and Selection Operator (LASSO) regularization to select the most relevant features. Fahad B. Mostaf et al. proposed an imputation framework that uses Multiple Imputation by Chained Equations to predict liver diseases with binary classifiers such as Artificial Neural Network (ANN), Random Forest (RF), and Support Vector Machine (SVM) [16]. Liying Fang et al. proposed a method, based on mutual information, for extracting features from multi-dimensional clinical time series data using the Kozachenko–Leonenko (K–L) information entropy estimation method [17]. Elham Nasarian et al. proposed a hybrid feature selection method for predicting coronary artery disease that combines the Fisher score algorithm (FSA), the F score algorithm (FA), and the extra trees classifier algorithm (ETCA) [18]. Mohammed Al-Sarem et al. proposed an ensemble feature selection method for predicting Parkinson’s disease based on XGBoost, CatBoost, and random forest [19]. A. N. M. Bazlur Rashid et al. [20] suggested using the cooperative co-evolution method with a penalty-based objective function as the fitness value for classifying medical datasets. These studies motivated further investigation of missing value imputation methods and of different optimization algorithms for selecting the most important features to improve classifier performance.

In this work, the missing values in four of the eight datasets are addressed by the data pre-processing subsystem using the missForest method. The filter approach used is the Fast Correlation-Based Filter (FCBF), and the features are chosen by combining the Atom Search Optimization Algorithm (ASO) and the Henry Gas Solubility Optimization (HGSO). The fitness function combines accuracy, G-mean, and the Matthews correlation coefficient (MCC), as measured by a Support Vector Machine (SVM). The chosen features are used to train the four boosted classifiers: XGBoost, AdaBoost, CatBoost, and LightGBM.
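A fitness function of this kind can be sketched from the confusion matrix of a binary classifier. The exact way the three measures are combined is not stated at this point in the text, so the unweighted mean below is an assumption, as are the function names; in the wrapper, `y_pred` would come from an SVM trained on the candidate feature subset.

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def fitness(y_true, y_pred):
    """Unweighted mean of accuracy, G-mean, and MCC (the weighting is assumed)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return (accuracy + g_mean + mcc) / 3
```

Accuracy alone can look good on imbalanced clinical data when the minority class is ignored; G-mean and MCC both penalise that behaviour. Note that MCC ranges from -1 to 1, so this combined score is not confined to the interval from 0 to 1.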

The rest of the paper is organized as follows. Section 2 presents the list of abbreviations used, in alphabetical order. Section 3 gives an overview of the related work. Section 4 outlines the materials and methods. Section 5 explains the framework of the feature selection subsystem. Section 6 explains the classification subsystem. Section 7 presents the results and discussion. Section 8 presents the conclusion and future scope.

2 List of abbreviations

Table 1 shows an outline of the list of abbreviations that are used in the rest of the manuscript. The list is in alphabetical order.

Table 1
Abbreviations used

Sl No	Abbreviation	Phrase
1	ABCO	Artificial Bee Colony Optimization
2	ACO	Ant Colony Optimization
3	AFO	Artificial Flora Optimization Algorithm
4	ANOVA	Analysis of Variance
5	AO	Aquila Optimizer
6	ANN	Artificial Neural Network
7	ASO	Atom Search Optimization Algorithm
8	AUC	Area Under Curve
9	BP	Backpropagation
10	BAO	Binary Arithmetic Optimization Algorithm
11	BUN	Blood Urea Nitrogen
12	CART	Classification and Regression Trees
13	CDSS	Clinical Decision Support System
14	CHD	Cleveland Heart Disease
15	CMVO	Chaotic Multi-Verse Optimization Algorithm
16	CT	Computed Tomography
17	CTED	CT Emphysema Database
18	CVD	Cardio Vascular Disease
19	DE	Differential Evolution
20	DMO	Dwarf Mongoose Optimization Algorithm
21	DISON	Diverse Intensified Strawberry Optimized Neural Network
22	EOA	Ebola Optimization Algorithm
23	ELM	Extreme Learning Machine
24	EFB	Exclusive Feature Bundling
25	ERT	Extremely Random Tree-Based
26	FCBF	Fast Correlation-Based Filter
27	FELM	Fuzzy Sets and Extreme Learning Machine
28	FFANN	Feed Forward Artificial Neural Network
29	FFO	Firefly Optimization
30	FURIA	Fuzzy Unordered Rule Induction Algorithm
31	GA	Genetic Algorithm
32	GAN	Generative Adversarial Networks
33	GDBA	Gradient Descent Backpropagation algorithm
34	GOSS	Gradient-based One-Side Sampling
35	GSO	Glow Worm Swarm Optimization
36	GSA	Gravitational Search Algorithm
37	GP	Gaussian Process
38	HFCD	Heart Failure Clinical Records
39	HGSO	Henry Gas Solubility Optimization
40	HHD	Hungarian Heart Disease
41	IABCO	Improved Artificial Bee Colony Optimization
42	IRSA	Improved Reptile Search Algorithm
43	ICSM	Istituto Clinico Scientifico Maugeri
44	IGA	Intelligent Genetic Algorithm
45	ILPD	Indian Liver Patients Dataset
46	IRCCS	Istituto di Ricovero e Cura a Carattere Scientifico
47	KDD	Knowledge Discovery in Databases
48	K-NN	K-Nearest Neighbor
49	LDA	Linear Discriminant Analysis
50	LFC	Loss Function Change
51	LOA	Lion Optimization Algorithm
52	LIDCIDRI	Lung Image Database Consortium-Image Database Resource Initiative
53	MAR	Missing at Random
54	MCC	Matthews Correlation Coefficient
55	MFO	Moth Flame Optimization
56	MGH	Massachusetts General Hospital
57	MICE	Multiple Imputations using Chained Equations
58	MCV	Mean Corpuscular Volume
59	MLR	Machine Learning Repository
60	MLP	Multi-Layer Perceptron
61	MNAR	Missing Not at Random
62	MSE	Mean Square Error
63	MOPSO	Multi-Objective Particle Swarm Optimization
64	PCA	Principal Component Analysis
65	PFA	Paddy Field Algorithm
66	PID	Pima Indian Diabetes
67	RBFNN	Radial Basis Function Neural Network
68	RDW	Red Cell Distribution Width
69	RF	Random Forest
70	RFE	Recursive Feature Elimination
71	RMSE	Root Mean Square Error
72	RMSEN	Normalized Root Mean Squared Error
73	ROI	Region of Interest
74	SHD	Statlog Heart Disease
75	SFC	Spatial Fuzzy C-Means Clustering
76	SHAP	Shapley Additive Explanations
77	SMO	Spider Monkey Optimization
78	SOA	Strawberry Optimization Algorithm
79	SMOTE	Synthetic Minority Over-sampling Technique
80	SPECT	Single Photon Emission Computed Tomography
81	SPECTF	Single Photon Emission Computed Tomography F
82	SRBCT	Small Round Blue Cell Tumors
83	SVM	Support Vector Machine
84	SWD	Switzerland Heart Disease
85	TB	Tuberculosis
86	TEJ	Taiwan Economic Journal
87	TSD	Thoracic Surgery Dataset
88	UCI	University of California Irvine
89	LBV	VA Long Beach
90	WBC	White Blood Cell
91	WDBC	Wisconsin Diagnostic Breast Cancer

3 Related works

The majority of the studies that are included below focused on classification frameworks for clinical datasets. These works utilized bio-inspired wrapper-based algorithms for data pre-processing and feature selection.

Nancy et al. [21] proposed a missing value imputation approach for unevenly spaced clinical time series data utilizing Inverse Distance Weighting (IDW) and Particle Swarm Optimization (PSO). The known data points were chosen based on the TR, and PSO was used to select the influence factor, which assigns weights to the known data. Based on the attribute dependencies, the nearest significant set was generated for each missing value. The IDW uses the influence factor and the neighborhood set to estimate the missing data. Training and testing sets were generated using ten-fold cross-validation and two independent evaluation runs. The approach was evaluated on clinical time-series data of hepatitis and thrombosis patients from Chiba hospital and achieved classification accuracies of 83.57% and 80.15% for missing rates of 10% and 15%, respectively. In combination with the decision tree, the approach achieved accuracies of 81.14% and 77.91% for missing rates of 10% and 15%, and with SVM it achieved 78.89% and 76.19% for the same missing rates.
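The core of inverse distance weighting for an unevenly spaced series can be sketched as follows. This is an illustrative reading rather than the implementation of [21]: the influence factor is interpreted here as the exponent applied to the time distance, it is fixed by hand instead of being tuned by PSO, and the neighbourhood is simply every known point passed in.

```python
def idw_impute(times, values, t_missing, power=2.0):
    """Estimate the value at t_missing as a distance-weighted mean of known points.

    `power` plays the role of the influence factor (an assumption): larger
    values make nearby observations dominate the estimate.
    """
    num = den = 0.0
    for t, v in zip(times, values):
        d = abs(t - t_missing)
        if d == 0:
            return v  # an observation exists exactly at the requested time
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den
```

For example, with observations 10 at time 0 and 40 at time 3, the estimate at time 1 is 20 with `power=1` and 16 with `power=2`, showing how a larger influence factor pulls the estimate toward the closer observation.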

Kindie et al. [22] proposed a classification framework for clinical datasets based on FELM, a hybrid approach that combines the properties of fuzzy sets and the ELM. In this study, missing values are imputed using the K-NN algorithm, and outliers are removed from the datasets during the pre-processing phase. The dataset’s properties are mapped to the fuzzy set, and the ELM is used to conduct classification. Experiments were performed on clinical datasets collected from the MLR maintained by UCI and achieved an accuracy of 93.55% for CHD with two class labels, 73.77% for CHD with five class labels, 94.44% for the SHD dataset, and 92.54% for the PID dataset.

Ching-Hsue Cheng [23] proposed an imputation framework utilizing a purity-based K-NN algorithm, which enhances the performance of missing value imputation. In this work, imputation was performed by normalizing all features to the same scale and evaluating their purity. The dataset was then partitioned into instances with and without missing values. The Euclidean distance metric was used to determine the nearest neighbors, and the purity was determined by aggregating the votes cast by the neighbors. Positive purity, which indicates that an instance is extremely similar to its neighbors, was used to replace missing values, whereas negative purity was taken to indicate that an instance is extremely dissimilar to its neighbors. Experiments were performed on nine datasets, eight from the MLR maintained by UCI and one from the TEJ financial distress dataset, and achieved an accuracy of 94.37% for the banknote dataset, 81.31% for the blood dataset, 98.17% for the climate dataset, 79.75% for the Haberman dataset, 85.42% for the Pima dataset, 89.01% for the Vertebral 2 C dataset, and 86.67% for the Vertebral 3 C dataset.
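The normalize, partition, and nearest-neighbour steps described above can be sketched as a plain K-NN imputer. The purity voting of [23] is not reproduced here; this sketch simply averages the neighbours' values, and the function names and the choice of `k` are assumptions made for illustration.

```python
import math

def min_max_scale(rows):
    """Scale each column to [0, 1], ignoring missing (None) entries."""
    cols = list(zip(*rows))
    lo = [min(v for v in c if v is not None) for c in cols]
    hi = [max(v for v in c if v is not None) for c in cols]

    def scale(v, j):
        if v is None:
            return None
        return (v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0

    return [[scale(v, j) for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest complete rows."""
    scaled = min_max_scale(rows)
    complete = [i for i, r in enumerate(rows) if None not in r]
    result = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None:
                continue
            # Distance is measured only on the columns observed in row i.
            obs = [c for c, x in enumerate(r) if x is not None]
            nearest = sorted(
                complete,
                key=lambda m: math.dist([scaled[i][c] for c in obs],
                                        [scaled[m][c] for c in obs]),
            )[:k]
            result[i][j] = sum(rows[m][j] for m in nearest) / len(nearest)
    return result
```

Scaling before measuring distance keeps a feature with a large numeric range from dominating the neighbour search, which is the reason the normalization step comes first.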

Arianna Dagliati et al. [24] proposed an imputation framework for the detection of diabetes complications utilizing two techniques, namely imputation using the mean and median approach and imputation using the RF approach. The imputation technique starts from a dataset with no missing values, after which the percentage of missingness in the original dataset is estimated by randomly removing features. Artificial missingness was then generated by removing the same estimated percentage of values in order to evaluate the imputing capability. The mean, median, and missForest algorithms were used to address the missing values in the dataset. The missForest method was used for random forest imputation, with 100 trees and 100 iterations, and the RMSE and RMSEN were used to evaluate the effectiveness of the missForest imputation. The experiments were performed on the Type 2 Diabetes Mellitus clinical dataset collected from the IRCCS, a research hospital, ICSM, Hospitals of Pavia, Italy, and the SVM classifier achieved an accuracy of 83.8%.
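The evaluation protocol of hiding known values, imputing them, and scoring the result can be sketched as below. Mean imputation stands in for missForest purely to keep the example self-contained, and the normalized error is computed here by dividing the RMSE by the standard deviation of the true values, which is one common convention and an assumption about the definition used in [24].

```python
import math
import random

def mask_values(values, rate, seed=0):
    """Hide a fraction `rate` of the entries; return the masked list and hidden indices."""
    rng = random.Random(seed)
    n_hide = max(1, round(rate * len(values)))
    hidden = set(rng.sample(range(len(values)), n_hide))
    return [None if i in hidden else v for i, v in enumerate(values)], sorted(hidden)

def mean_impute(values):
    """Replace every None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def rmse(truth, imputed, idx):
    """Root mean square error over the artificially hidden positions only."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))

def nrmse(truth, imputed, idx):
    """RMSE divided by the standard deviation of the true values (assumed convention)."""
    mean = sum(truth) / len(truth)
    var = sum((t - mean) ** 2 for t in truth) / len(truth)
    return rmse(truth, imputed, idx) / math.sqrt(var)
```

Because the hidden values are known, the error is measured only at the masked positions; a normalized error close to 1 indicates an imputer that does little better than predicting the column mean.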

Malek Alzaqebah et al. [25] proposed a wrapper-based approach for selecting the most relevant features utilizing an MFO, with the accuracy of an SVM classifier as the fitness function. The MFO algorithm is modelled on the flight path of moths, which keep to a straight line by using the Moon as a reference light source. In this wrapper-based feature selection, three neighbourhood approaches, NBChange, NBMove, and NBSwap, were introduced to prevent moths from being trapped in a local optimum. The datasets used in this study were collected from the MLR maintained by UCI, and the SVM classifier achieved an accuracy of 78.63% for the German dataset, 88.42% for the Heart dataset, 89.64% for the Ionosphere dataset, 89.71% for the Parkinson’s dataset, 86.38% for the Spectf dataset, 88.09% for the Sonar dataset, 98.96% for the WDBC dataset, and 98.35% for the WBC dataset.

Mehrdad Rostami et al. [26] proposed a wrapper-based approach for selecting the most relevant features utilizing the PSO with the MOPSO and a node centrality technique for medical datasets. The features present in the medical datasets are initially represented as a weighted graph, and the feature popularity is evaluated using the node centrality approach. This yields the starting population for the PSO algorithm, and the MOPSO is used to choose the most relevant features. Node centrality is computed to avoid early convergence: features with high centrality are treated as initial solutions rather than being randomly assigned. On the datasets collected from the Bioinformatics Research Group, the approach achieved an accuracy of 85.10% for the Colon dataset and 88.89% for the Leukaemia dataset. On the remaining datasets from Universidad Pablo de Olavide, the SVM classifier attained an accuracy of 82.10% for the SRBCT dataset, 81.67% for the Prostate Tumor dataset, and 88.19% for the Lung Cancer dataset.

Golnaz Sahebi et al. [27] proposed a wrapper-based method for identifying the most relevant features using an IGA and the K-NN classifier. The multi-population scheme was achieved by running the modified GA in parallel on the initial population. The precision of the K-NN classifier was used to evaluate the quality of each solution. A new genetic operator named inverse was proposed, which implements the weighting method by updating the weights of each feature present in the dataset. In addition to removing the improved solutions from the original population, the replacement operator was also used to remove the improved solutions from the initial population. The technique was iterated until the optimal population solution was obtained. The experiments were performed on five clinical datasets collected from the MLR maintained by UCI and achieved an accuracy of 95.83% for Lung, 97.62% for Dermatology, 99.02% for Arrhythmia, 98.51% for WDBC, and 94.2% for Hepatitis.

Huseyin Polat et al. [28] proposed the selection of the most relevant features for the diagnosis of chronic kidney disease by combining the filter and wrapper approaches. A correlation-based feature selector was used as the filter approach, while a greedy step-wise feature selector and a wrapper-based subset evaluator were employed as the wrapper approach. The greedy search selects features using forward feature selection and eliminates irrelevant features using backward feature selection. When a new subset of features is created, newly performing features are added after the most relevant subset has been identified by evaluating each subset of features. The correlation-based feature selector selects the most relevant features by ranking all the features in the dataset according to their correlation. On the Chronic Kidney Disease dataset collected from the MLR maintained by UCI, the approach achieved an accuracy of 98.5%.

Sushama Nagpal et al. [29] proposed a wrapper-based method for selecting the most relevant features using GSA, with the accuracy of K-NN as the fitness function. The GSA is based on Newtonian mechanics: particle interactions are modelled on the notion that the force of gravity drives all particles towards heavier particles. The total interaction force is determined from the particle directions and is used to update the particle positions. The accuracy of the K-NN classifier was employed as the fitness function, and the method was tested on three clinical datasets collected from the MLR maintained by UCI, with an accuracy of 82.96% for the heart disease dataset, 95.7% for the breast cancer dataset, and 96.7% for the dermatology dataset.

V. R. Elgin Christo et al. [30] proposed a wrapper-based approach for selecting the most relevant features using three nature-inspired algorithms, namely DE, LOA, and GSO, with the accuracy of the AdaBoostSVM classifier as the fitness function. The selected relevant features were used to train a neural network using gradient descent backpropagation. Each bio-inspired algorithm selects a relevant subset of features, and a correlation-based ensemble feature selection is then performed to find the best subset of features for performance evaluation. A gradient-based backpropagation neural network was used to classify the resultant subset of features. The experiment was performed on clinical datasets collected from the MLR maintained by UCI and achieved an accuracy of 95.51% for the Hepatitis dataset and 98.40% for WDBC.

Sreejith S et al. [31] proposed a classification framework utilizing an ANN optimized with a GA and an AFO for the diagnosis of clinical datasets. The ANN’s topology was optimized using the GA algorithm, whereas its parameters were optimized using the AFO algorithm. The optimization of the topology was improved with a dropout approach, and the optimization of the parameters was accomplished with a weight regularization method. The dataset was normalized using a min-max scaling method, and feature selection was performed using an embedded method that employs a random forest classifier to evaluate feature importance. The GA was used to optimize the total number of hidden layers and the number of neurons per hidden layer, while the AFO approach was used to optimize the weights and biases. The proposed work was tested on three clinical datasets from the MLR maintained by UCI, and achieved an accuracy of 86.82% for the HCV dataset, 84.91% for the VC dataset, 95.65% for the SHD dataset, and 93.79% for the ESDRP dataset.

Anisha Isaac et al. [32] proposed a wrapper-based approach for selecting the most relevant features utilizing MFO, FFO, ABCO, and ACO, with the fitness function evaluated by the SVM classifier, for the diagnosis of pulmonary emphysema. The lung CT slice was segmented using the Spatial Intuitive Fuzzy C Means Algorithm, and the segmented images were used to extract the region of interest. This was followed by a feature selection subsystem utilizing the four bio-inspired algorithms, and a reduced feature subset was obtained using the SVM classifier. The overall performance of the model was evaluated using a trained ELM. The data were collected at a hospital in Chennai, Tamil Nadu, India. Overall, the proposed approach achieved an accuracy of 89.02% for MFO, 86.64% for FFO, 84.86% for ABCO, and 78.64% for ACO.

Leema N. et al. [33] proposed an evaluation parameter setting for the training of feed-forward artificial neural networks. The backpropagation technique was applied to clinical datasets using the parameters weights, training epochs, learning rate, momentum, neurons per hidden layer, activation function, and biases. Twelve distinct backpropagation methods were evaluated based on the effect of varying network parameter values. This work’s pre-processing subsystem handles noisy, irregular, and missing values by either removing the instances or imputing the missing values using the class’s most frequent feature values. The classifier is a multi-layer feed-forward neural network with one hidden layer, where each input node corresponds to a significant feature in the dataset. The input, hidden, and output nodes of the neural network are initialized, and the output of each hidden node is calculated. Using backpropagation techniques, the error estimate for the network is revised by changing the weight and bias values. Experiments were conducted on three clinical benchmark datasets (WBC, PID, and Hepatitis) as well as a dataset taken from the MLR maintained by UCI.

Anisha Isaac et al. [34] proposed a wrapper-based method for determining the most relevant features utilizing an IABCO with the accuracy of an SVM classifier for the diagnosis of Cavitary TB and Miliary TB. The active contour region-based model was used for segmentation and for the ROI. The improved ABCO technique employs a search algorithm with two evaluation functions, namely mutual information and the rough dependency measure, to select the most relevant features for training the RBFNN classifier. Experiments were conducted on the Tuberculosis dataset and the LIDC-IDRI dataset and achieved an accuracy of 88.34% and 92.63%, respectively.

Sreejith et al. [35] proposed a wrapper-based approach for selecting the most relevant features using the chaotic evolutionary algorithm CMVO and a SMOTE approach for resolving the class imbalance problem. The model’s overall performance was improved by the SMOTE technique enhanced with the Orchid algorithm. The arithmetic mean of the MCC and the F-Score (F1), evaluated using an RF classifier, was employed as the fitness function. The RF classifier consists of 100 decision trees and uses the information gain ratio as the split criterion. The datasets used in this research were collected from the MLR maintained by UCI and achieved an accuracy of 82.46% for ILPD, 86.88% for TSD, and 89.0% for PID.
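The interpolation step at the heart of SMOTE can be sketched as follows. This covers only the basic technique, in which a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours; the Orchid-algorithm enhancement used in [35] is not reproduced, and the function name and defaults are assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the class is enlarged without duplicating records exactly, which is the usual motivation for preferring SMOTE over plain oversampling.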

Sreejith et al. [36] proposed an embedded approach to feature selection employing ERT, together with different enhanced SOA variants, for modelling a classification framework for clinical datasets. The weights and biases of a feed-forward neural network were tuned using the strawberry optimization algorithm and the backpropagation algorithm. The issue of local optima in neural networks was addressed by combining a two-phase training approach with a stochastic duplicate-elimination strategy. The ensemble approach employed by the ERT provides a series of randomly generated decision trees that reduce the correlation between the data points, hence reducing the total variance of the classifier. DISON is optimized by an SOA and a GDBA. The datasets used in this research include the Vertebral Column, PID, CHD, and SHD datasets collected from the MLR maintained by UCI, and achieved an accuracy of 87.17%, 90.92%, 93.67%, and 94.57%, respectively.

Anisha Isaac et al. [37] proposed a wrapper-based approach for selecting the most relevant features via competitive coevolution by utilizing the SMO algorithm and PFA with the accuracy of the SVM classifier as the fitness function. The lung tissues have been segmented using the SFCM clustering approach, and validated pixel-based segmentation has been utilized to retrieve the ROI. The ROI’s most relevant features, namely shape, texture, and run-length, have been extracted. In the Competitive Coevolution Model, the two wrapper algorithms compete to select the most relevant features, which are then employed to train the linear SVM classifier. The datasets used in this research include a real-time emphysema dataset and a CTED dataset, on which the approach achieved an accuracy of 81.95% and 93.74%, respectively.

Priti Bansal et al. [38] proposed a wrapper-based feature selection approach for selecting the most relevant features for the diagnosis of osteosarcoma using a binary arithmetic optimization algorithm (BAO). In this wrapper approach, the classification accuracy of the RBF-SVM classifier is employed as the fitness function. Histological images of osteosarcoma stained with Hematoxylin and Eosin are included in the publicly accessible dataset used for the experiments. The suggested approach has been compared to deep learning models in which feature extraction is performed using the EfficientNet-B0 and Xception deep learning models. The model’s performance has been evaluated using an SVM classifier, which yielded an accuracy of 99.54%.

Jeffrey O. Agushaka et al. [39] proposed the Dwarf Mongoose Optimization Algorithm (DMO), a metaheuristic algorithm for solving a diverse range of discrete and multidimensional problems by mimicking the foraging behaviour of mongooses. It is a population-based algorithm in which the population is divided into three distinct groups: the alpha group, the babysitter group, and the scout group. The alpha group searches for food, which determines the search path and the distance covered. The babysitters are chosen from the general population of mongooses; they remain with the babies until the alpha group returns from foraging. As the mongooses never return to a previously visited sleeping mound, the scout group necessarily engages in exploration as it looks for the next one. After the random initialization of the population, the fitness is evaluated. The suggested method has been used to solve 31 benchmark problems and outperforms existing techniques.

Mohammad H. Nadimi-Shahraki et al. [40] proposed a wrapper-based feature selection approach for COVID-19 diagnosis employing a binary Aquila optimizer (AO). This research proposed two variations of the AO algorithm for selecting features: the S-shaped binary Aquila optimizer (S-BAO) and the V-shaped binary Aquila optimizer (V-BAO). The AO algorithm considers each solution as a position vector with real-valued parameters. In S-BAO, an S-shaped transfer function is utilized to move solutions around the search space. In V-BAO, a V-shaped transfer function has been used to determine the likelihood of position changes. In both implementations of the AO algorithm, the classification accuracy of K-NN has been employed as the fitness function. The selected features are used to train the K-NN classifier, which evaluates the model’s performance. On a real-time dataset comprising 864 instances and 15 features, the approach achieved an accuracy of 96.80% for S-BAO and 96.15% for V-BAO.

Zenab Elgamal et al. [41] proposed wrapper-based feature selection methods utilizing an IRSA for classifying clinical datasets with the accuracy of the K-NN classifier as the fitness function. The classification performance of the basic RSA is improved by utilizing chaotic maps and a Simulated Annealing (SA) approach. Initializing the solutions with chaotic maps makes the IRSA converge faster and results in diversified solutions. The SA algorithm prevents the local optima problem and improves the exploitation of the search space. The proposed approach has been evaluated on 20 datasets collected from the MLR maintained by UCI, of which 19 are clinical datasets, and performs well when compared with other state-of-the-art optimization algorithms.

Olaide N. Oyelade et al. [42] proposed a bio-inspired population-based optimization algorithm termed the Ebola optimization algorithm (EOA) for handling different complex problems. Individuals, represented by scalar and vector quantities, have been classified as susceptible, infected, recovered, deceased, and vaccinated. Using the fitness values of all susceptible individuals, an index case is generated and considered as the global and current best. The position of each susceptible individual has been updated so that a short displacement favours exploitation, while the infection rate rises with longer displacement, which favours exploration. The newly infected individuals have been created and evaluated as a distinct pool of potential solutions. The global best value has been returned as the final solution, which allows the algorithm to tackle both discrete and continuous problems.

The related research focused primarily on imputation frameworks, on wrapper-based feature selection techniques for selecting the most relevant features, and on classification frameworks. Based on inferences from the related work, this study presents a CDSS that combines a novel imputation framework utilizing missForest with feature selection using a filter-wrapper technique. The following conclusions may be drawn from the related work to improve the proposed work: the performance of the classifier improves with the selection of the most relevant features and the treatment of the dataset’s missing values, and evolutionary algorithms, which can be used effectively for selecting the most relevant features, must be formulated for feature selection on the datasets used in this work.

The outline of our contribution is highlighted below:

Using a filter-wrapper method, a CDSS for diagnosing Congenital Heart Failure is proposed.

In this work, four of the eight datasets used for experimentation contain missing values, which are imputed by a missForest algorithm. The most relevant features are selected using an FCBF filter approach and the union operation between two wrapper-based approaches for selecting the most relevant features, ASO and HGSO.

The selected features are utilized to train the four boosted classifiers, namely AdaBoost, CatBoost, LightGBM, and XGBoost, which evaluate the model’s overall performance.

4 Materials and methods

The overall system framework includes three subsystems namely, Data Pre-processing, Feature Selection, and Classification. The outline of the system framework is presented in Fig. 1.

Fig. 1

System framework.

4.1 Data pre-processing subsystem

The presence of missing and irrelevant values affects the overall performance of the classifier. The data pre-processing subsystem handles the dataset’s missing values, hence enhancing performance. Four of the eight datasets on cardiovascular disease include missing values and these datasets were collected from MLR maintained by UCI. Table 2 presents an overview of the experimental datasets.

Table 2
The datasets used for experimentation

Name of Dataset	No. of Features	No. of Instances	Instances per Class Label	Interpretation of Class Labels
HFCD	13	299	1: 96 / 0: 203	1 = Critical / 0 = Not Critical
SPECT	22	267	1: 110 / 0: 157	1 = Abnormal / 0 = Normal
SPECTF	44	267	1: 254 / 0: 95	1 = Abnormal / 0 = Normal
SHD	13	270	1: 150 / 2: 120	2 = Presence / 1 = Absence
CHD	14	303	1: 139 / 0: 164	1 = Presence / 0 = Absence
HHD	14	294	1: 106 / 0: 188	1 = Presence / 0 = Absence
LBV	14	200	1: 149 / 0: 51	1 = Presence / 0 = Absence
SWD	14	123	1: 115 / 0: 8	1 = Presence / 0 = Absence

In the HFCD, the Death event class label has been replaced by Critical and Not-Critical, with 1 indicating critical and 0 indicating not critical. Each dataset’s missing values are imputed using a missForest algorithm employing an RF approach. Section 4.3 presents an overview of missing value imputation using missForest.

4.2 Missingness percentage

The percentage of missing data in a dataset directly affects the inferences that may be drawn statistically and visually. To obtain the percentage of missing data for each attribute, the number of missing data points for that attribute is divided by the total number of instances in the dataset and multiplied by 100, as presented in Equation (1).

$\text{Miss Percentage} = \frac{\text{Total Number of Missing Values}}{\text{Total Number of Observations}} \times 100$ (1)
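
Equation (1) can be illustrated with a short sketch. The records below are made-up values rather than entries from the datasets in Table 2, the column names are only for illustration, and `None` is assumed to mark a missing entry.

```python
def missing_percentage(rows, columns):
    """Return {column: percentage of missing entries}, following Equation (1)."""
    n_observations = len(rows)
    result = {}
    for j, name in enumerate(columns):
        n_missing = sum(1 for row in rows if row[j] is None)
        result[name] = 100.0 * n_missing / n_observations
    return result


# Toy records (hypothetical values); None marks a missing entry.
columns = ["age", "chol", "thal"]
rows = [
    [63, 233, None],
    [67, None, None],
    [41, 204, 3],
    [56, 236, None],
]
pct = missing_percentage(rows, columns)
print(pct)
```

Here one of four `chol` values and three of four `thal` values are missing, so the attributes are reported at 25% and 75% missingness, respectively.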

4.3 Missing value imputation using missForest

The missForest imputation approach is a non-parametric method that uses a Random Forest (RF) for filling the missing values in a dataset [44]. The initial step in missForest is to impute the missing values in the clinical dataset using the mean operation. An RF model is then fitted to the observed values of each variable and used to predict that variable’s missing values, and this process is iterated until the stopping criteria are met or a certain number of iterations has elapsed [45]. The parameters for the missForest algorithm are outlined in Table 3:

Table 3
Parameters used for missForest algorithm for imputation

No	Parameters	Phrase
1	$N_{obs}^{s}$	Non-Missing Values
2	M_miss	Missing Values
3	D_s	Arbitrary Variable
4	O_obs	Observed Values
5	$R_{obs}^{s}$	Random Missingness
6	γ	Stopping Criteria
7	$D_{old}^{imp}$	Previously Imputed Matrix
8	$X_{new}^{imp}$	Newly Imputed Matrix

The outline for the missForest algorithm is presented in section 4.3.1.

4.3.1 Outline of missForest algorithm

Input: Heart Disease Dataset with Missing Values.

Process

Step 1: Consider the dataset D_x with missing values and calculate the percentage of missingness P for each variable.

Step 2: Make the initial guess for the missing values using the mean operation.

Step 3: Arrange the variables in ascending order of their percentage of missingness P.

Step 4: While the stopping criterion γ is not met, store the current imputed dataset as $D_{old}^{imp}$ (in the first iteration, the dataset imputed using the mean operation).

Step 5: For each variable D_s, fit a Random Forest $N_{obs}^{(s)} \sim R_{obs}^{(s)}$ , where D_s is the arbitrary variable being imputed, $N_{obs}^{(s)}$ denotes the non-missing values present in the dataset, and $R_{obs}^{(s)}$ represents the random missingness.

Step 6: Predict the missing values of D_s with the fitted Random Forest using $N_{obs}^{(s)}$ .

Step 7: Update the imputed matrix $X_{new}^{imp}$ using the predicted M_miss .

Step 8: Repeat steps 4 to 7 until the stopping criteria are met.

Output: Heart Disease Dataset with no missing values.
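
The control flow of the outline above can be sketched as follows. This is a simplified illustration under stated assumptions, not the missForest implementation used in this work: the data are purely numeric with `None` marking missing entries, the Random Forest is replaced by a small k-nearest-neighbour regressor (`knn_predict`) so that the example stays self-contained, and the stopping rule (stop once the change between successive imputations no longer decreases) is the usual missForest convention, assumed here. All function names are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Stand-in regressor (NOT a Random Forest): mean target of the k nearest rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def iterative_impute(rows, max_iter=10):
    n, p = len(rows), len(rows[0])
    missing = [[rows[i][j] is None for j in range(p)] for i in range(n)]
    data = [list(r) for r in rows]

    # Step 2: initial guess with the mean of the observed values of each column.
    for j in range(p):
        observed = [rows[i][j] for i in range(n) if not missing[i][j]]
        mean_j = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][j]:
                data[i][j] = mean_j

    # Step 3: visit incomplete columns in ascending order of missingness.
    incomplete = [j for j in range(p) if any(missing[i][j] for i in range(n))]
    order = sorted(incomplete, key=lambda j: sum(missing[i][j] for i in range(n)))

    prev_diff = math.inf
    for _ in range(max_iter):
        old = [list(r) for r in data]  # previously imputed matrix
        for j in order:
            others = [c for c in range(p) if c != j]
            obs = [i for i in range(n) if not missing[i][j]]
            train_x = [[data[i][c] for c in others] for i in obs]
            train_y = [data[i][j] for i in obs]
            for i in range(n):
                if missing[i][j]:  # Steps 5-7: predict and update
                    data[i][j] = knn_predict(train_x, train_y, [data[i][c] for c in others])
        diff = sum((data[i][j] - old[i][j]) ** 2 for i in range(n) for j in range(p))
        if diff >= prev_diff:  # stopping criterion: change no longer decreases
            return old
        prev_diff = diff
    return data


rows = [
    [1.0, 2.0, 3.0],
    [2.0, None, 6.0],
    [3.0, 6.0, None],
    [4.0, 8.0, 12.0],
    [5.0, 10.0, 15.0],
]
imputed = iterative_impute(rows)
```

Only the entries that were missing are ever overwritten; the observed values are used as training targets and remain unchanged.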

5 Feature selection subsystem

The most important features for the classifier model are selected by the feature selection subsystem. Features are selected with a filter-wrapper approach that uses an FCBF as the filter and the union of two wrapper-based algorithms, namely ASO and HGSO, with the accuracy, G-mean, and MCC measured by the SVM as the fitness function. The overall performance of the classifier is improved by selecting the feature subset using two criteria: the highest accuracy and the minimum feature size.

The wrapper algorithm uses the fitness function to determine which solution is the most relevant, and the corresponding feature subset is taken as the optimal solution. In this research, the SVM was used inside the fitness function to evaluate the wrapper-based feature selection methods with respect to their accuracy, G-mean, and MCC.

5.1 Feature selection using filter-based approach

Filter-based feature selection operates by assigning each feature in the dataset a score based on how relevant it is to the problem at hand, and then ranking the features in order of their score. The score is obtained by applying a statistical measure to each input feature and the class label to determine their correlation, and features with low relevance scores are discarded.

5.1.1 Fast correlation based filter approach (FCBF)

FCBF is a multivariate, two-phase algorithm that filters features based on their correlation, measured by Symmetric Uncertainty (SU), a normalized version of the Information Gain (IG) whose values fall within the interval [0, 1]. In the FCBF approach, the SU between each feature and the class label is computed and sorted; SU reduces the bias of IG towards features with more values and thereby quantifies the effectiveness of each feature in the dataset [46]. FCBF’s primary strategy is split into two parts: the first is the relevance analysis, and the second is the redundancy analysis. After the irrelevant features have been filtered out, correlation analysis is used to determine how important each remaining feature is. The parameters for the FCBF algorithm are outlined in Table 4.

Table 4
Parameters used for FCBF algorithm

No	Parameters	Phrase
1	$N_{obs}^{s}$	Non-Missing Values
2	M_miss	Missing Values
3	D_s	Arbitrary Variable
4	O_obs	Observed Values
5	$R_{obs}^{s}$	Random Missingness
6	γ	Stopping Criteria
7	$D_{old}^{imp}$	Previously Imputed Matrix
8	$X_{new}^{imp}$	Newly Imputed Matrix
9	SU	Symmetric Uncertainty
10	F_i	The First Feature in the Sorted List
11	F_s	Feature Subset
12	C_s	Class Label
13	δ	Threshold
14	IG	Information Gain
15	X and Y	Features
16	IG (X_i|Y_i)	Information Gain
17	H_i(X), H_i(Y)	Entropy
18	P(x_i)	Prior Probability
19	P(x_a|y_b)	Posterior Probability
20	R	Linear Correlation Coefficient
21	$\bar{x}_{i}, \bar{y}_{i}$	Mean of X and Y

In the redundancy analysis, the Symmetric Uncertainty between individual features is calculated based on the Markov blanket concept: for two relevant features F_i and F_j, one of the two features is eliminated if the condition SU_j,i ≥ SU_i,c holds. The subset of relevant features is represented as F_s, and each of its features is highly correlated with the class label C_s, that is, its SU with the class label is at least the threshold δ. The equation for the evaluation of Symmetric Uncertainty is presented in Equation (2).

$SU (X_{i}, Y_{i}) = 2 \frac{IG (X_{i} | Y_{i})}{H_{i} (X_{i}) + H_{i} (Y_{i})}$ (2)

The Information Gain (IG) for each feature with respect to the class label is presented in Equation (3).

$IG (X_{i} | Y_{i}) = H_{i} (X_{i}) - H (X_{i} | Y_{i})$ (3)

The uncertainty of a variable is measured by its entropy, and the conditional entropy measures the uncertainty that remains once a second variable is known. Both are calculated as presented in Equations (4) and (5).

$H_{i} (X) = - \sum_{a} P (x_{a}) \log_{2} (P (x_{a}))$ (4)

$H (X | Y) = - \sum_{b} P (y_{b}) \sum_{a} P (x_{a} | y_{b}) \log_{2} (P (x_{a} | y_{b}))$ (5)
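
As a worked illustration of Equations (2) to (5), the following sketch estimates the probabilities from value counts of discrete variables. The function names and the toy vectors are illustrative; the conditional entropy is computed as the weighted average of the entropies of X within each value of Y, so it is non-negative.

```python
import math
from collections import Counter


def entropy(values):
    """H(X) = -sum_a P(x_a) log2 P(x_a), with probabilities estimated from counts."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X|Y): entropy of X within each value of Y, weighted by P(y_b)."""
    n = len(x)
    total = 0.0
    for y_val, count in Counter(y).items():
        subset = [xa for xa, yb in zip(x, y) if yb == y_val]
        total += (count / n) * entropy(subset)
    return total


def symmetric_uncertainty(x, y):
    """SU = 2 * IG / (H(X) + H(Y)), with IG = H(X) - H(X|Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    information_gain = hx - conditional_entropy(x, y)
    return 2.0 * information_gain / (hx + hy)


label = [0, 0, 1, 1]
su_identical = symmetric_uncertainty([0, 0, 1, 1], label)    # feature copies the label
su_independent = symmetric_uncertainty([0, 1, 0, 1], label)  # feature ignores the label
```

A feature that reproduces the class label reaches the upper end of the interval [0, 1], while a feature that is statistically independent of the label scores 0.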

Given two relevant features F_i and F_j (i ≠ j), F_j forms an approximate Markov blanket for F_i iff SU_j,c ≥ SU_i,c and SU_i,j ≥ SU_i,c, where c denotes the class label. The linear correlation coefficient r used as the relevance score of each feature is calculated using Equation (6).

$r = \frac{\sum_{i} (x_{i} - {\bar{x}}_{i}) (y_{i} - {\bar{y}}_{i})}{\sqrt{\sum_{i} {(x_{i} - {\bar{x}}_{i})}^{2}} \sqrt{\sum_{i} {(y_{i} - {\bar{y}}_{i})}^{2}}}$ (6)

The steps in the Fast Correlation Based Filter (FCBF) are presented in Section 5.1.2.

5.1.2 Steps in Fast Correlation Based Filter (FCBF)

Input: Heart Disease Dataset with no missing values

Process

Step 1: The Symmetric uncertainty score for each feature is calculated based on SU (X, Y).

Step 2: The relevant features are stored on a list SU_List’ based on the threshold δ, in which the features are selected if their value is above or equal to that threshold value.

Step 3: Arrange the SU_List’ in decreasing order based on the SU (X, Y) values.

Step 4: Take the first feature F_i in the SU_List’; if F_i is an approximate Markov blanket for the next feature F_j in the list, then the feature F_j is removed from the sorted list.

Step 5: Repeat this filtering with the feature F_i for all the remaining features in the SU_List’, and then take the next remaining feature as the new F_i.

Step 6: Iterate steps 4-5 until no more features can be removed from the list.

Output: Optimal Feature Subset (Filter Approach).
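
The steps above can be sketched compactly for discrete features. This is an illustrative reading of the procedure rather than the implementation used in this work: the helper names, the toy features `a` to `d`, the four-class toy label, and the threshold value 0.1 are all assumptions, and the information gain is computed through the identity IG = H(X) + H(Y) - H(X, Y).

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def su(x, y):
    """Symmetric uncertainty, using IG = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    information_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * information_gain / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """features: {name: list of discrete values}. Returns the selected feature names."""
    # Steps 1-3: score every feature, keep those with SU >= delta, sort descending.
    su_class = {name: su(col, labels) for name, col in features.items()}
    su_list = sorted((n for n in su_class if su_class[n] >= delta),
                     key=lambda n: su_class[n], reverse=True)
    selected = []
    # Steps 4-6: the current first feature removes every later feature for which
    # it is an approximate Markov blanket, i.e. SU(F_i, F_j) >= SU(F_j, class).
    while su_list:
        f_i = su_list.pop(0)
        selected.append(f_i)
        su_list = [f_j for f_j in su_list
                   if su(features[f_i], features[f_j]) < su_class[f_j]]
    return selected


labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],  # relevant
    "b": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of a, hence redundant
    "c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label, hence irrelevant
    "d": [0, 0, 1, 1, 0, 0, 1, 1],  # relevant and complementary to a
}
selected = fcbf(features, labels, delta=0.1)
```

In this toy run, `c` is dropped by the threshold in the relevance analysis, `b` is dropped in the redundancy analysis because `a` is a Markov blanket for it, and `a` and `d` are kept.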

5.2 Feature selection subsystem using wrapper-based approach

The union of two wrapper-based algorithms, namely ASO and HGSO, with the combination of accuracy, G-mean, and MCC measured by the SVM classifier as the fitness function, is used for selecting the most relevant features. Section 5.2.1 presents the outline of the ASO algorithm in detail.
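
The union operation itself is simple to state in code. In the sketch below, the two binary masks and the feature names are hypothetical placeholders standing in for the subsets returned by the two wrappers; a feature is kept if either wrapper selected it.

```python
def union_mask(mask_a, mask_b):
    """Element-wise OR of two binary feature masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must describe the same feature set")
    return [int(bool(a) or bool(b)) for a, b in zip(mask_a, mask_b)]


# Hypothetical feature names and wrapper outputs (1 = selected, 0 = not selected).
feature_names = ["f1", "f2", "f3", "f4", "f5"]
aso_mask = [1, 0, 1, 0, 0]
hgso_mask = [0, 0, 1, 1, 0]

combined = union_mask(aso_mask, hgso_mask)
selected = [name for name, keep in zip(feature_names, combined) if keep]
```

The union keeps features found by only one of the two searches, so the combined subset is never smaller than either individual subset.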

5.2.1 Atom search optimization algorithm (ASO)

The ASO algorithm mimics the core assumptions of atomic and molecular dynamics, modelling fundamental properties such as the interaction force between atoms, the geometric constraint force between them, and the potential function between them. The population of solutions in this method is referred to as atoms, and each atom has two crucial properties: its position and its velocity [47]. The inter-atomic distance varies within a certain range because of the atoms’ repulsion feature. To utilize ASO for feature selection, two aspects must be defined adequately: the representation of the atoms and the definition of the fitness function. Each atom is represented as a binary vector in which a “1” signifies that the corresponding feature has been chosen, whereas a “0” indicates that it has not. Table 5 presents the ASO’s parameter settings.

Table 5
Parameters for ASO for feature selection

No	Parameters	Phrase
1	a_i	Acceleration of the atom
2	F_i	Interaction Force
3	G_i	Constraint Force
4	m_i	Mass of the atom
5	ɛ	The depth of the potential well
6	σ	The inter-particle distance
7	r_ij	Positions of atoms in the n^th dimension
8	${(\frac{σ}{r_{ij}})}^{12}$	Repulsion of the atoms
9	${(\frac{σ}{r_{ij}})}^{6}$	The attraction of the atoms
10	η(x)	Depth Function
11	T	Maximum Number of Iterations = 100
12	α	Depth Weight = 50
13	h_min, h_max	Lower and Upper Limit
14	u	Upper Limit
15	g	Drift Function
16	rand_i	A random number between [0,1]
17	m_i	Mass of the atoms
18	$X_{best}^{d}$	Position of the Best Atom
19	λ	Lagrangian multiplier
20	Fit_best	Best Solution
21	Fit_worst	Worst Solution
22	N	Number of Search Agents
23	β	Multiplier Weight which is set as 0.2
24	h	A function for each iteration

Like other metaheuristic algorithms, the ASO randomly initializes its population of atoms. The atoms are the candidate solutions, each described by two vectors, namely its position and its velocity. The acceleration of an atom is presented in Equation (7):

$a_{i} = \frac{F_{i} + G_{i}}{m_{i}}$ (7)

The Lennard–Jones potential between the i^th and j^th atoms is presented in Equation (8).

$U (r_{ij}) = 4 ɛ [{(\frac{σ}{r_{ij}})}^{12} - {(\frac{σ}{r_{ij}})}^{6}]$ (8)

The interaction force between the atoms, which depends on the distance between them, is calculated from the Lennard–Jones potential as presented in Equation (9):

$F_{ij}^{'} = \frac{24 ɛ}{σ} [2 {(\frac{σ}{r_{ij}})}^{13} - {(\frac{σ}{r_{ij}})}^{7}]$ (9)
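
Equations (8) and (9) can be checked numerically with a few lines. The values ɛ = 1 and σ = 1 below are arbitrary illustrative choices, and the function names are hypothetical.

```python
def lj_potential(r, eps=1.0, sigma=1.0):
    """Lennard-Jones potential of Equation (8)."""
    return 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)


def lj_force(r, eps=1.0, sigma=1.0):
    """Interaction force of Equation (9), the negative derivative of the potential."""
    return (24.0 * eps / sigma) * (2.0 * (sigma / r) ** 13 - (sigma / r) ** 7)


r_eq = 2.0 ** (1.0 / 6.0)       # equilibrium distance for sigma = 1
u_at_sigma = lj_potential(1.0)  # the potential crosses zero at r = sigma
u_min = lj_potential(r_eq)      # the well depth equals -eps
f_eq = lj_force(r_eq)           # the force vanishes at equilibrium
f_close = lj_force(0.9)         # repulsion dominates at short range
f_far = lj_force(1.5)           # attraction dominates at long range
```

The force changes sign at the equilibrium distance 2^(1/6) σ: it is positive for closer atoms and negative for more distant ones, which is what keeps the inter-atomic distance within a bounded range.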

The attraction force of atoms is always in a positive direction and the repulsion is always in a negative direction, which results in the atoms converging to a specific position. Equation (9) is modified using the depth function, which adjusts the attraction and the repulsion properties of the atoms, as presented in Equation (10):

$F_{ij}^{'} = η [2 {(\frac{σ}{r_{ij}})}^{13} - {(\frac{σ}{r_{ij}})}^{7}]$ (10)

The depth function is calculated in Equation (11) as follows:

$η (t) = α {(1 - \frac{t - 1}{T})}^{3} e^{- \frac{20 t}{T}}$ (11)

The scaled distance h_ij (t), which appears in Equation (18), is calculated in Equations (12) and (13) as follows:

$h_{ij} (t) = \begin{cases} h_{min} & \text{if } \frac{r_{ij} (t)}{σ (t)} < h_{min} \\ \frac{r_{ij} (t)}{σ (t)} & \text{if } h_{min} \leq \frac{r_{ij} (t)}{σ (t)} \leq h_{max} \\ h_{max} & \text{if } \frac{r_{ij} (t)}{σ (t)} > h_{max} \end{cases}$ (12)

$\begin{cases} h_{min} = g_{0} + g (t) \\ h_{max} = u \end{cases}$ (13)

The drift function, which moves the algorithm from the exploration phase to the exploitation phase, is presented in Equation (14).

$g (t) = 0.1 \times \sin (\frac{π}{2} \times \frac{t}{T})$ (14)

The mass of the atom is presented in Equation (15).

$M_{i} = e^{- \frac{Fit_{i} - Fit_{best}}{Fit_{worst} - Fit_{best}}}$ (15)

The number of neighbors K is presented in Equation (16) as follows:

$K (t) = N - (N - 2) \sqrt{\frac{t}{T}}$ (16)

The weighted sum of the components of the forces in the d^th dimension is presented in Equation (17).

$F_{i}^{d} (t) = \sum_{j \in KBest} rand_{j} F_{ij}^{d} (t)$ (17)

The acceleration of the atoms is calculated using Equation (18).

$a_{i}^{d} = - α {(1 - \frac{t - 1}{T})}^{3} e^{- \frac{20 t}{T}} \sum_{j \in KBest} \frac{rand_{j} [2 {(h_{ij})}^{13} - {(h_{ij})}^{7}]}{m_{i}} \frac{X_{j}^{d} - X_{i}^{d}}{∥ X_{i}, X_{j} ∥} + β e^{- \frac{20 t}{T}} \frac{X_{best}^{d} - X_{i}^{d}}{m_{i}}$ (18)

The calculation of the constraint force between the atoms is presented in Equation (19).

$G_{i}^{d} = λ (X_{best}^{d} - X_{i}^{d})$ (19)

The K atoms with the best fitness values are grouped as neighbours, which balances the processes of exploration and exploitation. A higher value of K drives the atoms towards exploration, whereas a lower value of K drives them towards exploitation, which in turn yields the best solution, that is, the best feature subset.
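
The schedule-like quantities of Equations (11), (14), (15), and (16) can be sketched as follows, using α = 50 and T = 100 from Table 5. Rounding K(t) down to an integer and the handling of a population with identical fitness values are assumptions of this sketch, and the fitness values are made up.

```python
import math


def depth(t, T, alpha=50.0):
    """Depth function eta(t) of Equation (11)."""
    return alpha * (1.0 - (t - 1) / T) ** 3 * math.exp(-20.0 * t / T)


def drift(t, T):
    """Drift function g(t) of Equation (14)."""
    return 0.1 * math.sin(math.pi / 2.0 * t / T)


def num_neighbours(t, T, N):
    """K(t) of Equation (16), rounded down (the rounding is an assumption)."""
    return int(N - (N - 2) * math.sqrt(t / T))


def masses(fitness, maximise=True):
    """M_i of Equation (15); the best atom gets mass 1, the worst gets exp(-1)."""
    best = max(fitness) if maximise else min(fitness)
    worst = min(fitness) if maximise else max(fitness)
    if best == worst:  # assumption: equal masses when all fitness values coincide
        return [1.0] * len(fitness)
    return [math.exp(-(f - best) / (worst - best)) for f in fitness]


T, N = 100, 20
k_first = num_neighbours(1, T, N)  # many neighbours early: exploration
k_last = num_neighbours(T, T, N)   # two neighbours at the end: exploitation
m = masses([0.9, 0.5, 0.7])        # hypothetical fitness values, larger is better
```

With these settings, K shrinks from 18 neighbours in the first iteration to 2 in the last one, while the depth function decays towards zero, so the interaction forces fade as the search converges.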

The fitness of each candidate feature subset is evaluated with the Support Vector Machine (SVM) classifier: the fitness function takes the square root of the combination of the accuracy, G-mean, and MCC obtained by the SVM.
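
The three measures that enter the fitness function can be computed from the confusion matrix of the classifier’s predictions, as sketched below for binary labels. The predictions are made up, and since the exact way the three measures are combined is not spelled out here, `combined_score` uses a plain arithmetic mean purely as a placeholder assumption.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, G-mean, and MCC for labels in {0, 1} (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, g_mean, mcc


def combined_score(accuracy, g_mean, mcc):
    """Placeholder combination (arithmetic mean); an assumption, not the paper's formula."""
    return (accuracy + g_mean + mcc) / 3.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical classifier output
acc, gm, mcc = binary_metrics(y_true, y_pred)
score = combined_score(acc, gm, mcc)
```

Using the G-mean and MCC alongside accuracy makes the score sensitive to errors on the minority class, which matters for imbalanced datasets such as several of those in Table 2.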

Steps for Feature Selection using ASO Algorithm

The ASO algorithm mathematically models the atomic motion found in nature, which is governed by the interaction forces and the constraint forces between the atoms. In the original algorithm, the distance between two atoms is calculated using the Euclidean distance metric, whereas in this work the Mahalanobis distance metric is used. The Mahalanobis distance takes the mean and variance of each attribute in the dataset into account and thereby avoids the scaling and correlation problems of the Euclidean distance. It measures the distance between two data points in the multivariate space and is presented in Equation (20):

$D^{2} = {(x - m)}^{T} C^{- 1} (x - m)$ (20)
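
Equation (20) can be evaluated without forming the inverse explicitly, by solving the linear system C z = (x - m) and taking the dot product of (x - m) with z. The sketch below does this with Gaussian elimination; the function names and the small covariance matrices are illustrative assumptions.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = s / m[i][i]
    return z


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance D^2 = (x - m)^T C^-1 (x - m)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    z = solve(cov, d)
    return sum(di * zi for di, zi in zip(d, z))


# Uncorrelated attributes with different variances: each axis is rescaled.
d2_scaled = mahalanobis_sq([2.0, 1.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
# Correlated attributes: the shared variation is discounted.
d2_corr = mahalanobis_sq([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
```

In the first case the squared Euclidean distance would be 5, whereas the Mahalanobis value is 2 because the first attribute has four times the variance of the second; this is the scaling effect referred to above.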

The Outline of Feature Selection Using the ASO algorithm is presented in Fig. 2:

Fig. 2

The Outline of Feature Selection Using ASO algorithm.

Input: Optimal Feature Subset (Filter Approach).

Process:

Step 1: Initialize the position and the velocity of the atoms randomly.

Step 2: Calculate the fitness of each atom in the population.

Step 3: Calculate the attraction and repulsion forces between the atoms to enhance the exploration using Equation (8).

Step 4: Calculate the mass of each atom using Equation (15).

Step 5: Calculate the K neighbours for each atom in the population using Equation (16).

Step 6: Calculate the interaction force between the atoms along with the constraint force using Equation (17).

Step 7: Once the interaction force between the atoms is calculated, calculate the acceleration of the atoms using Equation (18).

Step 8: Update the atom position and velocity using Equation (19), and Equation (20).

Step 9: Repeat steps 2 to 8 until the stopping criteria are met.

Output: Optimal feature subset (Wrapper Approach).

5.2.2 Henry Gas Solubility Optimization (HGSO)

Henry’s Law states that “at a fixed temperature, the amount of a given gas that dissolves in a given kind and volume of liquid is directly proportional to the partial pressure of that gas in equilibrium with that liquid,” and this is the basis for HGSO, a physics-based population algorithm. HGSO maintains a collection of potential solutions as gas particles that can be combined with a liquid [48]. In the phase of exploration and exploitation, the properties of these particles are modified to locate the best possible solutions in the search space. Table 6 presents the HGSO’s feature-selection settings.

Table 6
Parameters for HGSO for feature selection

No	Parameters	Phrase
1	S_g	Solubility of a gas
2	H	Henry’s constant
3	P_g	The partial pressure of the gas
4	rand	Random Function
5	LB, UB	Lower and Upper Bounds
6	H_j	Henry’s constant
7	l, r₁	Random Number and a Constant Value
8	C_j	C_j is a constant value of type j.
11	T	Temperature between the Gas particles
12	T^θ	The constant value which is equal to 298.15
13	K	Constant Value
14	P_ij (t)	The partial pressure on each solution
15	η	The ability of a gas to interact with other solutions
16	F_ij	Overall Fitness Function
17	F_g	The value that is used to change the direction of the gas
18	α	Influence parameter
19	F_b	Solution with the highest fitness value
20	r, β, t, j	Constant Value
21	U_ib	The best gas molecule in the cluster group
22	G_ij	Position of the gas
23	$G_{ij}^{max}$ and $G_{ij}^{min}$	The boundary of the particular problem
24	ɛ	Value to avoid dividing by zero error
25	G_c	Type of the gas
26	N	Total number of Agents
27	$N_{ω}$	Total number of Worst agents
28	c₂, c₁	Constant parameters

According to Henry’s law, the solubility of a gas (S_g) is directly proportional to the partial pressure of the gas (P_g), as presented in Equation (21):

$S_{g} = H \times P_{g}$ (21)

The positions of the N gases in the HGSO are initialized with random values, as presented in Equation (22).

$U_{i} = LB + rand \times (UB - LB), \quad rand \in [0, 1]$ (22)

Henry’s constant is presented in Equation (23):

$H_{j} = l \times r_{1}, \quad j = 1, 2, \ldots, G_{c}, \quad l = 5E-2$ (23)

The best solution within each group and the global best solution with the highest fitness value are then determined. The update of the coefficient of Henry (H_j) is presented in Equation (24).

$H_{j}^{t + 1} = H_{j}^{t} \times \exp [- C_{j} \times (\frac{1}{T^{t}} - \frac{1}{T^{θ}})], \quad T^{t} = \exp (\frac{- t}{t_{max}})$ (24)

The solubility of each gas is presented in Equation (25).

$S_{ij} (t) = K \times H_{j} (t + 1) \times P_{ij} (t)$ (25)

where K is a constant and P_ij (t) is the partial pressure on each solution. The position update based on the fitness value is presented in Equation (26):

$U_{ij} (t + 1) = U_{ij} (t) + F_{g} \times r \times η \times (U_{ib} (t) - U_{ij} (t)) + F_{g} \times r \times α \times (S_{ij} (t) \times U_{ib} (t) - U_{ij} (t)), \quad η = β \times \exp (- \frac{F_{b} (t) + ɛ}{F_{ij} (t) + ɛ})$ (26)

To avoid becoming stuck in local optima, the exploration mechanism seeks new solutions across the search space by replacing the worst agents, whose number is presented in Equation (27).

$N_{ω} = N \times (c_{1} + rand (0, 1) \times (c_{2} - c_{1}))$ (27)

The update of the worst solutions in HGSO is presented in Equation (28).

$G_{ij} = G_{ij}^{min} + r \times (G_{ij}^{max} - G_{ij}^{min}), \quad i = 1, 2, \ldots, N_{ω}$ (28)
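
The updates of Equations (24), (25), (27), and (28) can be sketched as follows. The values l = 5E-2 and T^θ = 298.15 are those stated in the text and in Table 6; the values chosen for C_j, K, c₁, c₂, the partial-pressure scale, and the bounds are assumptions made only so that the example runs, and the function names are hypothetical.

```python
import math
import random

T_THETA = 298.15  # reference temperature from Table 6


def update_henry(h_j, c_j, t, t_max):
    """Henry's coefficient update of Equation (24)."""
    temp_t = math.exp(-t / t_max)
    return h_j * math.exp(-c_j * (1.0 / temp_t - 1.0 / T_THETA))


def solubility(h_next, p_ij, k=1.0):
    """Solubility of Equation (25); k = 1.0 is an assumed constant."""
    return k * h_next * p_ij


def num_worst_agents(n_agents, rng, c1=0.1, c2=0.2):
    """Number of worst agents, Equation (27); c1 and c2 are assumed values."""
    return int(n_agents * (c1 + rng.random() * (c2 - c1)))


def reinitialise(g_min, g_max, rng):
    """New position of a worst agent inside the bounds, Equation (28)."""
    return [lo + rng.random() * (hi - lo) for lo, hi in zip(g_min, g_max)]


rng = random.Random(0)
h0 = 0.05 * rng.random()                             # Equation (23) with l = 5E-2
h1 = update_henry(h0, c_j=0.01, t=1, t_max=100)      # c_j is an assumed value
s = solubility(h1, p_ij=100.0 * rng.random())        # pressure scale is assumed
n_w = num_worst_agents(50, rng)
new_position = reinitialise([0.0] * 4, [1.0] * 4, rng)
```

Because the temperature term decreases with the iteration counter, Henry’s coefficient and hence the solubility shrink over time, while between 10% and 20% of the agents (under the assumed c₁ and c₂) are re-sampled within the bounds in each iteration.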

As with other metaheuristic algorithms, the HGSO algorithm begins with an initial set of candidate solutions, then modifies the existing solutions, and evaluates their overall fitness before selecting the most efficient solution. The outline of feature selection using the HGSO algorithm is presented in Fig. 3:

Fig. 3

Outline of Feature Selection Using the HGSO Algorithm.

Steps for Feature Selection using HGSO

Input: Optimal Feature Subset (Filter Approach).

Process:

Step 1: Randomly initialize the population of solutions S_u.

Step 2: Split the population into two groups G_u with the same Henry Constant Value.

Step 3: Calculate the fitness of each solution F_ij.

Step 4: Determine the best solution in the group and also determine the overall best solution.

Step 4.1: Update the fitness F_ij of each gas particle.

Step 4.2: Update the coefficient of Henry using Equation (24).

Step 4.3: Update the solubility of the solutions using Equation (25).

Step 4.4: Perform the position update using Equation (26).

Step 4.5: Avoid becoming stuck in local optima by selecting the worst agents using Equation (27).

Step 4.6: Update the position of the worst gas solutions using Equation (28).

Step 4.7: Update the local best solution and global best solution.

Step 5: Repeat steps 2 to 4.7 until the stopping criteria are met.

Output: Optimal Feature Subset (Wrapper Approach).
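The update rules in Equations (25) to (28) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: the function names, the default values of F_g, a, β and ɛ, and the 0.5 threshold that converts a continuous position into a selected or unselected feature are all assumptions made for the example.

```python
import math


def solubility(k, henry, pressure):
    """Equation (25): S_ij = K * H_j * P_ij."""
    return k * henry * pressure


def update_position(u, u_best, s, f_best, f_ij, r,
                    f_g=1.0, a=1.0, beta=1.0, eps=0.05):
    """Equation (26) for a single coordinate of one gas particle.

    The default values of f_g, a, beta and eps are assumed for illustration.
    """
    eta = beta * math.exp(-(f_best + eps) / (f_ij + eps))
    pull_to_best = f_g * r * eta * (u_best - u)
    solubility_term = f_g * r * a * (s * u_best - u)
    return u + pull_to_best + solubility_term


def number_of_worst(n, c1, c2, rnd):
    """Equation (27): number of worst particles to re-initialize (truncated)."""
    return int(n * (c1 + rnd * (c2 - c1)))


def reinitialize(g_min, g_max, r):
    """Equation (28): place a worst particle back inside [g_min, g_max]."""
    return g_min + r * (g_max - g_min)


def to_feature_mask(position, threshold=0.5):
    """Assumed binarization: a feature is kept when its coordinate exceeds the threshold."""
    return [value > threshold for value in position]


new_u = update_position(u=0.2, u_best=0.8, s=1.0, f_best=0.5, f_ij=0.5, r=0.5)
mask = to_feature_mask([0.2, 0.7, 0.5])
```

In a full wrapper, the random numbers r and rand (0, 1) would be drawn afresh for every particle and iteration, and the fitness values would come from the SVM-based fitness function.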

6 Classification subsystem

A union operation between the wrapper-based bio-inspired algorithms known as ASO and HGSO is used to select the features that are the most relevant to the problem at hand. The selected features are used to train four boosted classifiers namely, AdaBoost, XGBoost, CatBoost, and LightGBM. The dataset is divided into two sets: the training set, which is used to train the classifier, and the testing set, which is used to evaluate the performance of the classifier.
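As a hedged illustration of the union operation and the train/test division described above, the following sketch combines two feature masks and splits the reduced data. The helper names, the 20% test fraction, and the fixed seed are assumptions for the example and are not taken from this work.

```python
import random


def union_of_masks(aso_mask, hgso_mask):
    """A feature is kept if either wrapper algorithm selected it."""
    return [a or h for a, h in zip(aso_mask, hgso_mask)]


def select_columns(rows, mask):
    """Keep only the columns whose mask entry is True."""
    keep = [i for i, flag in enumerate(mask) if flag]
    return [[row[i] for i in keep] for row in rows]


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


aso = [True, False, True, False]      # hypothetical ASO selection
hgso = [False, False, True, True]     # hypothetical HGSO selection
combined = union_of_masks(aso, hgso)
reduced = select_columns([[1, 2, 3, 4]], combined)
```

The union keeps every feature chosen by at least one wrapper, so the combined subset is never smaller than either individual subset.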

The performance of the classifier is influenced by a wide range of factors, including imputation of missing values, the detection and removal of outlier values, feature engineering to select the most relevant features for the classifier while avoiding the irrelevant features present in the dataset, tuning the hyperparameters, applying a variety of classification models, using an ensemble approach, and many more.

Several of these factors are addressed in this work. The datasets contain missing values, and rather than removing the affected records, the missForest imputation method is used to impute the missing values. The irrelevant features in the datasets are handled by a Filter-Wrapper based feature selection approach that uses the FCBF as the filter and the union of two algorithms, ASO and HGSO, as the wrapper, with a fitness function that combines accuracy, G-mean, and MCC as measured by the SVM classifier.

The proposed approach is evaluated using four boosted classifiers, XGBoost, AdaBoost, CatBoost, and LightGBM, and achieves superior classification performance compared to other classifiers.

6.1 AdaBoost classifier

AdaBoost is an iterative ensemble approach that combines multiple weak classifiers, thereby improving the overall accuracy. By merging the weak classifiers, AdaBoost creates a robust classifier with a higher accuracy. In each iteration, the AdaBoost classifier sets the weights of the weak classifiers and re-weights the training samples, which helps it make accurate predictions for unseen data samples [49].

Two properties make AdaBoost more reliable than other classifiers: each weak learner is trained on a differently weighted version of the training data, and the training error is minimized at every iteration. Initially, all the observations in the dataset are given equal weights and a model is built on a subset of the data samples. This model is used to classify the dataset, and the errors are calculated by comparing the predicted classes with the actual classes. The training weights are then updated, giving more weight to incorrectly predicted instances and less weight to correctly predicted instances. In the AdaBoost classifier, overfitting is handled through the n_estimators parameter, which is used to find the point at which the model begins to overfit; once this point is identified, the algorithm is executed again. The entire process is iterated until the error function no longer changes or the specified number of estimators is reached. The outline of the AdaBoost classifier is presented in Section 6.1.1. Table 7 outlines the parameters used for the AdaBoost classifier in this research.

Table 7
Parameters for AdaBoost classifier

No	Parameters	Phrase
1	Z _t	Normalization Factor
2	D	Training Instance
3	α_t	Weights t
4	h _t	Weak Learner
5	x _i	Data Sample
6	n_estimators	Base Estimators for Identifying Overfitting

6.1.1 Classification using AdaBoost

Input: Heart disease training dataset.

Process

Step 1: Assign equal weights to all the observations in the dataset.

Step 2: Fit a Decision Tree model to the random samples and predict the class labels for the dataset.

Step 3: Calculate the total error based on the weights of misclassification.

Step 4: Calculate the performance of the base learner with respect to the total error.

Step 5: Update the sample weights using Equation (29), increasing the weights of the misclassified data points so that the next weak learner focuses on them.

$D_{t + 1} (i) = \frac{D_{t} (i) e^{- α_{t} y_{i} h_{t} (x_{i})}}{Z_{t}}$ (29)

Where y_i is the true class label of the data sample x_i.

Step 6: Termination Condition: Repeat Steps 2 to 5 until the termination criteria are met (low training error).

Step 7: Present the test data to the trained AdaBoost classifier and obtain the predicted class labels.

Output: The accuracy of the classifier is computed, where the accuracy is the percentage of the testing set correctly classified by the classifier.
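One boosting round of the weight update in Equation (29) can be sketched as follows. The sketch assumes class labels coded as -1 and +1 and uses the standard AdaBoost expression for the learner weight, α_t = 0.5 ln((1 - error) / error), which is not stated in this section; the function name is hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """Return the learner weight alpha_t and the re-normalized sample weights.

    Labels and predictions are assumed to be -1 or +1, and the weighted
    error is assumed to lie strictly between 0 and 1.
    """
    error = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Correct predictions (y * p = +1) shrink, wrong ones (y * p = -1) grow.
    unnormalized = [w * math.exp(-alpha * y * p)
                    for w, y, p in zip(weights, y_true, y_pred)]
    z = sum(unnormalized)  # normalization factor Z_t
    return alpha, [u / z for u in unnormalized]


start = [0.25, 0.25, 0.25, 0.25]            # Step 1: equal weights
alpha, updated = adaboost_round(start, [1, 1, -1, -1], [1, -1, -1, -1])
```

With one of four equally weighted samples misclassified, the misclassified sample ends the round holding half of the total weight, which is what forces the next weak learner to concentrate on it.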

6.2 XGBoost classifier

XGBoost is a gradient boosting method that builds an ensemble of models sequentially, with each new model created from the errors of the previous models. Each new model estimates the residual error of the previous model, and this residual is added to form the final prediction [49]. The decision trees in the XGBoost classifier are built based on the similarity score, which is calculated using Equation (30).

$\text{Similarity score} = \frac{Gradient^{2}}{Hessian + λ}$ (30)

Where Hessian is equal to the number of residuals, Gradient^2 is the squared sum of residuals, and λ is a regularization hyperparameter.
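A small worked example of Equation (30) is given below as a sketch. It reads the numerator as the square of the summed residuals, which is an interpretation of the definition above, and the function name and residual values are hypothetical.

```python
def similarity_score(residuals, lam=1.0):
    """Equation (30): (sum of residuals)^2 / (number of residuals + lambda)."""
    gradient = sum(residuals)
    hessian = len(residuals)
    return gradient ** 2 / (hessian + lam)


score = similarity_score([1.0, 2.0, 3.0], lam=1.0)   # 36 / 4
opposed = similarity_score([-2.0, 2.0], lam=1.0)     # residuals cancel out
```

Residuals that share a sign give a high score, while residuals that cancel each other give a score near zero, so nodes that group similar residuals are preferred.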

In tree algorithms, the features in the dataset form the conditional (internal) nodes, and the tree is split into branches (edges) starting from the root node. A node with no further edges is a leaf node, and the best solution is found by splitting the nodes. XGBoost applies decision-tree algorithms to a given dataset and classifies the data based on their results; it is built on gradient-boosted trees, with supervised learning as the primary approach. Section 6.2.1 presents an overview of the XGBoost classifier. Table 8 outlines the parameters used for the XGBoost classifier in this research.

Table 8

Parameters for XGBoost classifier

No	Parameters	Phrase
1	w _j	Constant in the region R_j
2	I [x ∈ R_j]	Indicator function for input x in the region R_j
3	x	Input variable
4	T	Number of leaves in the tree
5	G, H	Gain and Loss reduction
6	L, R	Left and Right branches of the tree
7	$\hat{b} (x)$	Minimization Risk Factor
8	f_k (x)	Sum of all Base Learners
9	λ	Regularization Parameter
10	W^*	Leaf Weight
11	early_stopping_rounds	Number of Iterations to be Performed

6.2.1 Classification using XGBoost

Input: Heart disease training dataset.

Process

Step 1: The base learners are initialized and the additive model is considered as the sum of the base learners.

Step 2: For each base learner, the tree model is outlined in Equation (31).

$F (x) = \sum_{j = 1}^{T} w_{j} I [x \in R_{j}]$ (31)

Where w_j is a constant in the region R_j, I [x ∈ R_j] is the indicator function that equals 1 when the input x falls in the region R_j of a leaf node and 0 otherwise, and T is the number of leaves in the tree.

Step 3: Determine the leaf weight w^*, which is presented in Equation (32).

$w^{*} = \frac{G}{H}$ (32)

Where G and H are gain and loss reduction respectively.

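Step 4: Determine the tree structure by choosing the splits that maximize the gain, which is presented in Equation (33).

$A = \frac{1}{2} \left[ \frac{G_{L}^{2}}{H_{L}} + \frac{G_{R}^{2}}{H_{R}} - \frac{G^{2}}{H} \right]$ (33)

Where L and R are the left and right branches of the tree. The score of the unsplit node, G^2 / H, is subtracted so that A measures the improvement obtained by the split.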

Step 5: Iterate the process by adding more trees based on Equation (34).

$f_{k} (x) = f_{k - 1} (x) + \hat{b} (x)$ (34)

Where $\hat{b} (x)$ is the minimization risk factor for the base learners, f_k (x) is the sum of all base learners.

Step 6: Termination Condition: Repeat Steps 2 to 5 until the termination criteria are met (the maximum depth of the tree is reached or there is no further node to process).

Step 7: Present the test data to the trained XGBoost classifier and obtain the predicted class labels.

Output: The accuracy of the classifier is computed, where the accuracy is the percentage of the testing set correctly classified by the classifier.
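The leaf weight and split gain used in Steps 3 and 4 can be illustrated with the sketch below. It assumes the standard form of the gain, in which the score of the unsplit node is subtracted from the scores of the two children, and it omits the regularization terms; the function names and the G and H values are hypothetical.

```python
def leaf_weight(g, h):
    """Leaf weight as written in Equation (32): w* = G / H."""
    return g / h


def split_gain(g_left, h_left, g_right, h_right):
    """Gain of a candidate split: children scores minus the parent score."""
    g_parent = g_left + g_right
    h_parent = h_left + h_right
    return 0.5 * (g_left ** 2 / h_left
                  + g_right ** 2 / h_right
                  - g_parent ** 2 / h_parent)


def best_split(candidates):
    """Pick the candidate (g_left, h_left, g_right, h_right) with the largest gain."""
    return max(candidates, key=lambda c: split_gain(*c))


separating = (-3.0, 2.0, 3.0, 2.0)   # children with opposite statistics
mixed = (0.0, 2.0, 0.0, 2.0)         # split that separates nothing
chosen = best_split([mixed, separating])
```

A split that sends opposing statistics to different children receives a positive gain, whereas a split that changes nothing receives a gain of zero and is not chosen.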

6.3 CatBoost classifier

CatBoost is a machine learning technique that implements ordered boosting together with a novel approach for dealing with categorical information. It is a gradient boosting-based decision tree method that handles categorical and ordered features. The permutation operation is the primary highlight of the CatBoost classifier, which ranks the created model’s features using Loss Function Change (LFC) [51]. LFC computes the change in prediction when the value of a feature changes. During model training, a series of decision trees (DT) is generated sequentially so that each successive tree has a lower loss. In other words, each DT learns from the preceding tree and influences the subsequent tree to improve model performance, hence constructing a robust learner.

The CatBoost algorithm develops its prediction model over several boosting stages. In conventional gradient boosting, this process generates a prediction shift in the created model, which is caused by a particular type of target leakage. By employing the ordered boosting framework, the CatBoost algorithm overcomes this challenge. Moreover, in contrast to typical learning classifiers, the CatBoost method addresses the overfitting problem by employing several permutations of the training dataset. This is the principal reason for employing it in the present research work. Before constructing new trees, the CatBoost classifier checks the number of iterations, and the classifier is deemed overfitted if the number of iterations exceeds the value set in the training parameters. The outline of the CatBoost classifier is presented in Section 6.3.1. Table 9 outlines the parameters used for the CatBoost classifier in this research.

Table 9
Parameters for CatBoost classifier

No	Parameters	Phrase
1	α	Step size
2	h^t	Boosting Tree
3	F^t	Approximate Function
4	x ∈ R_j	Indicator Function
5	I	Feature Index
6	n	Total Number of Instances
7	a	Weight Parameter
8	P	Target variable
9	od_type	Control Parameter for Overfitting

6.3.1 Classification using CatBoost

Input: Heart disease training dataset.

Process

Step 1: Randomly permute the rows of S to generate new training sets, and construct symmetric decision trees recursively by partitioning the entire feature space, as presented in Equation (35).

$H (x) = \sum_{j = 1}^{J} b_{j} \mathbb{1}_{\{x \in R_{j}\}}$ (35)

Where $\mathbb{1}_{\{x \in R_{j}\}}$ is an indicator function.

Step 2: Calculate the estimated value for each predicted class label.

Step 3: Minimize the expected loss in a greedy manner, as presented in Equation (36).

$F^{t} = F^{t - 1} + α h^{t}$ (36)

Where α is the step size, h^t is a boosting tree, and F^t is the approximate function.

Step 3.1: Apply Ordered Target Statistics (TS) for categorical features in the dataset.

Step 3.2: Build a new Ordered Boosting tree T based on the gradient of each instance.

Step 4: The loss function is calculated based on the least square function and the negative gradient is used to solve the minimization problem.

Step 5: Termination Condition: Repeat Steps 2 to 4 until the termination criteria are met (the specified number of trees has been built).

Step 6: Calculate the series of approximate functions and sum them to obtain the final model, which is presented in Equation (37).

$F (x) = \sum_{t = 1}^{N} h^{t}$ (37)

Output: The accuracy of the classifier is computed, where the accuracy is the percentage of the testing set correctly classified by the classifier.
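The Ordered Target Statistics of Step 3.1 can be sketched as follows. The sketch assumes that the rows are already in a random permutation order and that each categorical value is encoded from the targets of the rows that precede it, smoothed by a prior with weight a; the function name, the prior of 0.5, and the sample data are assumptions for the example.

```python
def ordered_target_statistics(categories, targets, prior=0.5, a=1.0):
    """Encode each categorical value using only the rows seen before it.

    Using earlier rows only means that the target of a row never
    contributes to its own encoding.
    """
    sums, counts = {}, {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        encoded.append((seen_sum + a * prior) / (seen_count + a))
        # Only now is the current row added to the running statistics.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


encoded = ordered_target_statistics(["A", "B", "A", "A"], [1, 0, 0, 1])
```

The first occurrence of each category receives the prior alone, and later occurrences move towards the mean target of the earlier rows with the same category.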

6.4 LightGBM classifier

LightGBM is a gradient boosting framework, introduced by Microsoft, that is built on decision tree algorithms. The LightGBM classifier is based on DT and is less complex because its decision trees grow in a leaf-wise manner, which optimizes the loss generated by the branches. LightGBM uses a histogram algorithm that discretizes the features in the data into K small bins, and these bins are used to construct a histogram of width K. The histogram collects the required statistics, namely the gradients and the number of samples in each bin. The main aim of the LightGBM classifier is to find the leaf node with the largest split gain by using a leaf-wise traversal [51].
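The histogram construction described above can be illustrated with a short sketch. It assumes equal-width bins over the observed range of one feature, which is a simplification chosen for the example; the function name and the sample values are hypothetical.

```python
def build_histogram(values, gradients, k):
    """Discretize one feature into k bins and collect per-bin statistics.

    Returns the sum of gradients and the number of samples in each bin.
    """
    low, high = min(values), max(values)
    width = (high - low) / k or 1.0   # avoid a zero width for constant features
    gradient_sum = [0.0] * k
    sample_count = [0] * k
    for value, gradient in zip(values, gradients):
        # The maximum value would index bin k, so it is clamped to k - 1.
        bin_index = min(int((value - low) / width), k - 1)
        gradient_sum[bin_index] += gradient
        sample_count[bin_index] += 1
    return gradient_sum, sample_count


grad_hist, count_hist = build_histogram(
    values=[0.0, 1.0, 2.0, 3.0, 4.0],
    gradients=[1.0, -1.0, 2.0, -2.0, 3.0],
    k=2,
)
```

Candidate splits are then evaluated at the K bin boundaries instead of at every distinct feature value, which is what reduces the cost of finding the best split.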

In the LightGBM classifier, the controlling parameter is early_stopping_rounds, which controls whether the model is overfitting or not. The outline of the LightGBM classifier is presented in Section 6.4.1. Table 10 outlines the parameters used for the LightGBM classifier in this research.

Table 10
Parameters for LightGBM classifier

No	Parameters	Phrase
1	N	Number of Decision Trees
2	h (x, θ_t)	Prediction Result
3	θ_t	Output of the Decision Tree
4	t	Decision Tree
5	x	Input Variable
6	early_stopping_rounds	Number of Iterations to be Performed

6.4.1 Classification using LightGBM

Input: Heart Disease Training Dataset.

Process

Step 1: Initialize n Decision Trees.

Step 2: The weight for the training set data needs to be set as $\frac{1}{n}$ .

Step 3: The training of the weak classifier is performed.

Step 4: The weight is updated based on the weight of the weak classifier.

Step 5: The final classifier is obtained as presented in Equation (38).

$H_{t} (x) = \sum_{t = 1}^{T} h (x, θ_{t})$ (38)

Where h (x, θ_t) is the prediction result after training, θ_t is the output of the decision tree, t is the decision tree, and x is the input variable.

Step 6: Termination Condition: Repeat Steps 2 to 5 until the termination criteria are met (no improvement in the accuracy of the model).

Output: The overall accuracy is computed.

7 Results and discussions

This section discusses the results of integrating a Filter-Wrapper technique that uses the FCBF as the filter, the union of two bio-inspired wrapper algorithms for feature selection, and the reduced feature subset used to train four boosted classifiers. The overall performance of the four boosted classifiers with feature selection is compared with that of four other classifiers: RF, CART, SVM, and K-NN. The fitness function used for the two wrapper-based bio-inspired algorithms, ASO and HGSO, combines accuracy, G-mean, and MCC as measured by the SVM, and is presented in Equations (39) to (42).

$\text{Fitness function} = \sqrt{Accuracy \times G\text{-}Mean \times MCC}$ (39)

$Accuracy = \frac{TP + TN}{TP + FP + FN + TN}$ (40)

$G\text{-}Mean = \sqrt{Sensitivity \times Specificity} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{FP + TN}}$ (41)

$MCC = \frac{TN \times TP - FN \times FP}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}}$ (42)

The accuracy is calculated using Equation (43):

$\text{Classification Accuracy} = \frac{\text{Number of samples classified correctly}}{\text{Total number of samples classified}} = \frac{TP + TN}{TP + FP + FN + TN}$ (43)

The G-mean is calculated as shown in Equation (44):

$G\text{-}Mean = \sqrt{Sensitivity \times Specificity}$ (44)

Sensitivity represents the true positive rate and refers to how well the positive class was predicted. Equation (45) shows the calculation for sensitivity.

$Sensitivity = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} = \frac{TP}{TP + FN}$ (45)

Specificity represents the true negative rate and refers to how well the negative class was predicted. Equation (46) shows the calculation for specificity.

$Specificity = \frac{\text{True Negative}}{\text{False Positive} + \text{True Negative}} = \frac{TN}{FP + TN}$ (46)

The F1-Score is the harmonic mean of Precision and Recall, so it accounts for the incorrectly classified cases, and is calculated using Equation (47):

$F1\text{-}Score = 2 \times \frac{Precision \times Recall}{Precision + Recall}$ (47)

Precision represents the positive predictive value (PPV); the best precision is 1.0 and the worst is 0.0, as shown in Equation (48).

$Precision = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$ (48)
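The metrics in Equations (39) to (48) can be computed from the four confusion matrix counts as in the sketch below. The function name and the counts are hypothetical, recall is taken to be the same quantity as sensitivity, and returning a fitness of 0 when the MCC is not positive is an assumption, since the square root in Equation (39) is undefined for a negative product.

```python
import math


def evaluation_metrics(tp, tn, fp, fn):
    """Compute the metrics of Equations (39) to (48) from confusion matrix counts.

    Assumes both classes are present and at least one positive prediction
    was made, so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)          # also used as recall
    specificity = tn / (fp + tn)
    g_mean = math.sqrt(sensitivity * specificity)
    mcc = (tn * tp - fn * fp) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    product = accuracy * g_mean * mcc
    fitness = math.sqrt(product) if product > 0 else 0.0  # assumed guard
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "g_mean": g_mean, "mcc": mcc,
            "precision": precision, "f1": f1, "fitness": fitness}


metrics = evaluation_metrics(tp=40, tn=45, fp=5, fn=10)
```

Because the fitness multiplies the three measures, a feature subset scores well only when accuracy, G-mean, and MCC are all high, which penalizes subsets that favour one class.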

7.1 Missingness in the clinical datasets

Four clinical datasets out of the eight datasets employed for experimentation have missing values, and the missingness % for the four clinical datasets is depicted in Fig. 4.

Fig. 4

Percentage of missingness in clinical dataset.

The Cleveland Heart Disease dataset has 303 cases with 14 features and 6 missing values. The Hungarian Heart Disease dataset consists of 294 cases with 14 features, with a total of 779 missing values. The Switzerland Heart Disease dataset consists of 123 cases with 14 features and a total of 274 missing values. The VA Long Beach dataset consists of 200 instances with 14 features and 698 missing values in total. Figure 4 outlines the percentage of missingness present in each dataset used in this research.

7.2 Fitness evaluation

Tables 11 and 12 give the performance metrics of the two wrapper-based algorithms, with the fitness function combining accuracy, G-mean, and MCC as measured by the SVM classifier.

Table 11
Evaluation using fitness function measured by SVM classifier (ASO)

Dataset	Accuracy	Sensitivity	Specificity	Precision	Recall	Proposed Fitness
HFCD	0.92	0.92	0.95	0.96	0.92	0.84
SPECT	0.91	0.95	0.82	0.93	0.95	0.8
SPECTF	0.94	0.96	0.9	0.96	0.96	0.86
SHD	0.92	0.95	0.86	0.93	0.95	0.89
CHD	0.95	0.96	0.93	0.97	0.96	0.9
HHD	0.93	0.96	0.89	0.92	0.96	0.88
LBV	0.93	0.95	0.88	0.95	0.95	0.85
SWD	0.9	0.88	0.89	0.88	0.89	0.89

Table 12

Evaluation using fitness function measured by SVM classifier (HGSO)

Dataset	Accuracy	Sensitivity	Specificity	Precision	Recall	Proposed fitness
HFCD	0.94	0.92	0.93	0.9	0.91	0.9
SPECT	0.9	0.88	0.89	0.9	0.88	0.89
SPECTF	0.92	0.9	0.89	0.91	0.9	0.9
SHD	0.95	0.92	0.9	0.9	0.93	0.9
CHD	0.96	0.92	0.93	0.93	0.93	0.91
HHD	0.95	0.94	0.9	0.95	0.94	0.89
LBV	0.91	0.95	0.81	0.89	0.9	0.86
SWD	0.89	0.87	0.86	0.88	0.88	0.85

The evaluation metrics employed for the ASO method are accuracy, sensitivity, specificity, precision, and recall, together with the proposed fitness as measured by the SVM classifier. The graphical representation of this performance evaluation is presented in Fig. 5.

Fig. 5

Evaluation Using Fitness Function Measured by SVM Classifier (ASO).

From the graph in Fig. 5, it can be inferred that the heart disease datasets perform well in terms of accuracy, sensitivity, specificity, precision, recall, and proposed fitness with respect to the ASO algorithm and the fitness function.

The evaluation metrics employed for the HGSO method are accuracy, sensitivity, specificity, precision, and recall, together with the proposed fitness as measured by the SVM classifier. The graphical representation of this performance evaluation is presented in Fig. 6.

Fig. 6

Evaluation Using Fitness Function Measured by SVM Classifier (HGSO).

From the graph in Fig. 6, it can be inferred that the heart disease datasets perform well with respect to the HGSO algorithm and the fitness function in terms of accuracy, sensitivity, specificity, precision, recall, and the proposed fitness.

The union operation is performed between both wrapper approaches, and the most relevant features are chosen. XGBoost, AdaBoost, CatBoost, and LightGBM are trained using the selected features. The overall performance of feature selection with the four classifiers is summarized in the following section.

7.3 Classifier performance evaluation

The overall performance evaluation using the XGBoost classifier based on accuracy, sensitivity, F1-score, precision, and recall is presented in Table 13. The Graphical plot for Table 13 is shown in Fig. 7. From the graphical plot it is inferred that with respect to the XGBoost classifier along with the proposed feature selection approach the datasets used for experimentation perform well in terms of accuracy, precision, recall, and F1-Score.

Table 13
Performance evaluation using XGBoost classifier

XGBoost Classifier
Dataset	Accuracy	F1-Score	Precision	Recall
HFCD	0.89	0.85	0.87	0.86
SPECT	0.81	0.78	0.8	0.79
SPECTF	0.9	0.85	0.87	0.86
SHD	0.8	0.77	0.79	0.78
CHD	0.8	0.77	0.79	0.78
HHD	0.82	0.77	0.79	0.78
LBV	0.8	0.76	0.78	0.77
SWD	0.97	0.94	0.96	0.95

Fig. 7

Performance Evaluation using XGBoost Classifier.

The overall performance evaluation using the AdaBoost classifier based on accuracy, sensitivity, F1-score, precision, and recall is presented in Table 14. The graphical plot for Table 14 is shown in Fig. 8. From the graphical plot it is inferred that, with the AdaBoost classifier and the proposed feature selection approach, the datasets used for experimentation perform well in terms of accuracy, precision, recall, and F1-Score.

Table 14

Performance evaluation using AdaBoost classifier

AdaBoost Classifier
Dataset	Accuracy	F1-Score	Precision	Recall
HFCD	0.84	0.78	0.81	0.79
SPECT	0.8	0.74	0.77	0.75
SPECTF	0.82	0.75	0.78	0.76
SHD	0.8	0.75	0.78	0.76
CHD	0.85	0.77	0.8	0.78
HHD	0.85	0.78	0.81	0.79
LBV	0.81	0.76	0.79	0.77
SWD	0.89	0.85	0.88	0.86

Fig. 8

Performance Evaluation using AdaBoost Classifier.

The overall performance evaluation using the CatBoost classifier based on accuracy, sensitivity, F1-score, precision, and recall is presented in Table 15. The graphical plot for Table 15 is shown in Fig. 9. From the graphical plot in Fig. 9 it is inferred that, with the CatBoost classifier and the proposed feature selection approach, the datasets used for experimentation perform well in terms of accuracy, precision, recall, and F1-Score.

Table 15

Performance evaluation using CatBoost classifier

CatBoost Classifier
Dataset	Accuracy	F1-Score	Precision	Recall
HFCD	0.83	0.78	0.81	0.79
SPECT	0.89	0.85	0.88	0.86
SPECTF	0.93	0.88	0.91	0.89
SHD	0.81	0.77	0.8	0.78
CHD	0.83	0.76	0.79	0.77
HHD	0.85	0.8	0.83	0.81
LBV	0.79	0.74	0.77	0.75
SWD	0.98	0.93	0.96	0.94

Fig. 9

Performance Evaluation using CatBoost Classifier.

The overall performance evaluation using the LightGBM classifier based on accuracy, sensitivity, F1-score, precision, and recall is presented in Table 16. The Graphical plot for Table 16 is shown in Fig. 10. From the graphical plot in Fig. 10 it is inferred that with respect to the LightGBM classifier along with the proposed feature selection approach the datasets used for experimentation perform well in terms of accuracy, precision, recall, and F1-Score.

Table 16

Performance Evaluation using LightGBM Classifier

LightGBM Classifier
Dataset	Accuracy	F1-Score	Precision	Recall
HFCD	0.8	0.76	0.78	0.77
SPECT	0.82	0.8	0.82	0.81
SPECTF	0.8	0.77	0.79	0.78
SHD	0.8	0.75	0.77	0.76
CHD	0.86	0.77	0.79	0.78
HHD	0.82	0.77	0.79	0.78
LBV	0.82	0.78	0.8	0.79
SWD	0.97	0.93	0.95	0.94

Fig. 10

Performance Evaluation using LightGBM Classifier.

7.4 Confusion matrix for the best performing classifier

The confusion matrix is used to evaluate the performance of classification algorithms. Figures 11 to 14 present the confusion matrices of the four boosted classifiers on the SWD dataset, the dataset on which they achieved their best accuracy.

Fig. 11

Confusion Matrix for XGBoost Classifier.

Fig. 12

Confusion Matrix for AdaBoost Classifier.

Fig. 13

Confusion Matrix for CatBoost Classifier.

Fig. 14

Confusion Matrix for LightGBM Classifier.

7.5 Misclassification Rate

The misclassification rate measures the proportion of inaccurate predictions generated by the model and is calculated using Equation (51):

$\text{Misclassification rate} = \frac{\text{Incorrect predictions}}{\text{Total number of predictions}}$ (51)

The relation between the misclassification rate and accuracy is Accuracy = 1 - misclassification rate. The misclassification rate for each dataset with the four boosted classifiers is presented in Table 17:

Table 17

Misclassification rate for the classifiers

Dataset	XGBoost	AdaBoost	CatBoost	LightGBM
HFCD	0.16	0.16	0.17	0.20
SPECT	0.20	0.20	0.11	0.18
SPECTF	0.18	0.18	0.07	0.20
SHD	0.20	0.20	0.19	0.20
CHD	0.15	0.15	0.17	0.14
HHD	0.15	0.15	0.15	0.18
LBV	0.19	0.19	0.21	0.18
SWD	0.03	0.11	0.02	0.03
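The relation between the misclassification rate and accuracy can be checked with a short sketch. The function name and the label vectors are hypothetical and serve only to illustrate Equation (51).

```python
def misclassification_rate(y_true, y_pred):
    """Equation (51): incorrect predictions / total number of predictions."""
    incorrect = sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)
    return incorrect / len(y_true)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
rate = misclassification_rate(y_true, y_pred)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

For any set of predictions the two quantities sum to 1, so each entry of Table 17 can be cross-checked against the corresponding accuracy in Tables 13 to 16.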

7.6 ROC-AUC curves

The ROC-AUC curve, which plots the true positive rate against the false positive rate, may be used to analyze binary classification problems. The area under the curve evaluates the classifier’s ability to differentiate between the two classes across various user-specified thresholds. Figures 15 to 22 show the ROC-AUC curves for the eight datasets classified by the four boosted classifiers.
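The construction of a ROC curve and its area can be sketched as follows. The sketch lowers the decision threshold one distinct score at a time and integrates with the trapezoidal rule; the function names, labels, and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    Assumes labels are 0 or 1 and that both classes are present.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc = area_under_curve(points)
```

An area of 1.0 corresponds to perfect separation of the two classes, while 0.5 corresponds to a classifier that ranks the samples no better than chance.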

Fig. 15

ROC-AUC for HFCD Dataset.

Fig. 16

ROC-AUC for SPECT Dataset.

Fig. 17

ROC-AUC for SPECTF Dataset.

Fig. 18

ROC-AUC for SHD Dataset.

Fig. 19

ROC-AUC for CHD Dataset.

Fig. 20

ROC-AUC for HHD Dataset.

Fig. 21

ROC-AUC for LBV Dataset.

Fig. 22

ROC-AUC for SWD Dataset.

7.7 Loss function curves for four classifiers

The performance of a model that learns incrementally from a training dataset is evaluated using learning curves. After each update during training, the model may be evaluated on the training dataset and the validation dataset, and the measured performance can be plotted as learning curves. Using the learning curves of model performance on the training and validation datasets, it is possible to determine whether a model is underfit, overfit, or well-fit. The loss function curves are generated for the eight datasets with each of the four classifiers. Figures 23 to 30 show the loss function curves for the XGBoost classifier, Figures 31 to 38 for the AdaBoost classifier, Figures 39 to 46 for the CatBoost classifier, and Figures 47 to 54 for the LightGBM classifier.

7.7.1 Loss function curves for XGBoost classifier

Fig. 23

Loss Function Curve for HFCD Dataset.

Fig. 24

Loss Function Curve for SPECT Dataset.

Fig. 25

Loss Function Curve for SPECTF Dataset.

Fig. 26

Loss Function Curve for SHD Dataset.

Fig. 27

Loss Function Curve for CHD Dataset.

Fig. 28

Loss Function Curve for HHD Dataset.

Fig. 29

Loss Function Curve for LBV Dataset.

Fig. 30

Loss Function Curve for SWD Dataset.

7.7.2 Loss function curves for AdaBoost classifier

Fig. 31

Loss Function Curve for HFCD Dataset.

Fig. 32

Loss Function Curve for SPECT Dataset.

Fig. 33

Loss Function Curve for SPECTF Dataset.

Fig. 34

Loss Function Curve for SHD Dataset.

Fig. 35

Loss Function Curve for CHD Dataset.

Fig. 36

Loss Function Curve for HHD Dataset.

Fig. 37

Loss Function Curve for LBV Dataset.

Fig. 38

Loss Function Curve for SWD Dataset.

7.7.3 Loss function curves for CatBoost classifier

Fig. 39

Loss Function Curve for HFCD Dataset.

Fig. 40

Loss Function Curve for SPECT Dataset.

Fig. 41

Loss Function Curve for SPECTF Dataset.

Fig. 42

Loss Function Curve for SHD Dataset.

Fig. 43

Loss Function Curve for CHD Dataset.

Fig. 44

Loss Function Curve for HHD Dataset.

Fig. 45

Loss Function Curve for LBV Dataset.

Fig. 46

Loss Function Curve for SWD Dataset.

7.7.4 Loss function curves for LightGBM classifier

Fig. 47

Loss Function Curve for HFCD Dataset.

Fig. 48

Loss Function Curve for SPECT Dataset.

Fig. 49

Loss Function Curve for SPECTF Dataset.

Fig. 50

Loss Function Curve for SHD Dataset.

Fig. 51

Loss Function Curve for CHD Dataset.

Fig. 52

Loss Function Curve for HHD Dataset.

Fig. 53

Loss Function Curve for LBV Dataset.

Fig. 54

Loss Function Curve for SWD Dataset.

In the learning curve of an underfit model, the training loss remains constant regardless of the amount of training, and this continues until the training process is complete. In the learning curve of an overfit model, the training loss decreases with training, but the validation loss first decreases and subsequently increases, which leaves a gap between the training and validation loss curves. In Figs. 23 to 54, the loss of the model is lower on the training dataset than on the validation dataset, so a gap exists between the training and validation loss learning curves.
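The reading of the learning curves described above can be expressed as a simple rule, sketched below. The function name, the tolerance, and the loss values are hypothetical, and the rule is a rough heuristic for illustration rather than the procedure used to assess the models in this work.

```python
def diagnose_fit(train_loss, val_loss, tolerance=1e-3):
    """Label a pair of loss curves as underfit, overfit, or well-fit.

    Underfit: the training loss barely changes over training.
    Overfit: the validation loss rises again after reaching its minimum.
    """
    if train_loss[0] - train_loss[-1] < tolerance:
        return "underfit"
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    if val_loss[-1] - val_loss[best_epoch] > tolerance:
        return "overfit"
    return "well-fit"


flat = diagnose_fit([1.0, 1.0, 1.0], [1.1, 1.1, 1.1])
rising = diagnose_fit([1.0, 0.5, 0.2, 0.1], [1.0, 0.6, 0.7, 0.9])
steady = diagnose_fit([1.0, 0.5, 0.3, 0.25], [1.1, 0.6, 0.4, 0.35])
```

In the third case the validation loss stays above the training loss but keeps decreasing, so the gap alone is not treated as a sign of overfitting.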

7.8 Overall accuracy comparison

Table 18 compares the overall accuracy of the classifiers trained with the proposed method of missing value imputation and feature selection. The classifiers used for comparison are Random Forest Classifier (RF), Classification and Regression Trees (CART), K-Nearest Neighbour Classifier (K-NN), and Support Vector Machine (SVM), evaluated using the performance metrics accuracy, precision, recall, and F1 score. The graphical plot for the accuracy comparison is presented in Fig. 55.

Table 18
Accuracy comparison proposed approach with RF, CART, K-NN, SVM

	RF	CART	K-NN	SVM	XGBoost	AdaBoost	CatBoost	LightGBM
HFCD	0.78	0.85	0.86	0.84	0.89	0.84	0.83	0.8
SPECT	0.77	0.83	0.84	0.86	0.81	0.8	0.89	0.82
SPECTF	0.79	0.83	0.92	0.85	0.9	0.82	0.93	0.8
SHD	0.77	0.85	0.87	0.76	0.8	0.8	0.81	0.8
CHD	0.78	0.83	0.88	0.8	0.8	0.85	0.83	0.86
HHD	0.81	0.85	0.8	0.83	0.82	0.85	0.85	0.82
LBV	0.8	0.8	0.86	0.76	0.8	0.81	0.79	0.82
SWD	0.83	0.79	0.84	0.85	0.97	0.89	0.98	0.97

Based on the accuracy shown in Fig. 55, obtained with the proposed missing value imputation method and feature selection method, it can be inferred that the boosted classifiers achieve a higher accuracy than RF, CART, K-NN, and SVM on several datasets, most notably on SWD.

Fig. 55

Accuracy Comparison With RF, CART, K-NN, SVM.

Table 19 compares the overall precision of the classifiers trained with the proposed technique of missing value imputation and feature selection. The classifiers used for comparison are Random Forest Classifier (RF), Classification and Regression Trees (CART), K-Nearest Neighbour Classifier (K-NN), and Support Vector Machine (SVM), evaluated using the performance metrics accuracy, precision, recall, and F1 score.

Table 19

Precision Comparison Proposed Approach with RF, CART, K-NN, SVM

	RF	CART	K-NN	SVM	XGBoost	AdaBoost	CatBoost	LightGBM
HFCD	0.77	0.84	0.84	0.81	0.87	0.81	0.81	0.78
SPECT	0.76	0.82	0.82	0.83	0.8	0.77	0.88	0.82
SPECTF	0.78	0.82	0.9	0.82	0.87	0.78	0.91	0.79
SHD	0.76	0.84	0.85	0.73	0.79	0.78	0.8	0.77
CHD	0.77	0.82	0.86	0.77	0.79	0.8	0.79	0.79
HHD	0.8	0.84	0.78	0.8	0.79	0.81	0.83	0.79
LBV	0.79	0.79	0.84	0.73	0.78	0.79	0.77	0.8
SWD	0.82	0.78	0.82	0.82	0.96	0.88	0.96	0.95

From the precision shown in Fig. 56, obtained with the proposed missing value imputation approach and feature selection approach, it is inferred that the boosted classifiers achieve a higher precision than RF, CART, K-NN, and SVM on several datasets, most notably on SWD.

Fig. 56

Precision Comparison With RF, CART, K-NN, SVM.

The overall Recall of the classifiers under the proposed missing value imputation and feature selection approach is compared in Table 20. The classifiers used for comparison are Random Forest Classifier (RF), Classification and Regression Trees (CART), K-Nearest Neighbour Classifier (K-NN), and Support Vector Machine (SVM), evaluated using the performance metrics accuracy, precision, recall, and F1 score. The graphical plot of the Recall comparison is presented in Fig. 57.

Table 20

Recall comparison of the proposed approach with RF, CART, K-NN, and SVM

	RF	CART	K-NN	SVM	XGBoost	AdaBoost	CatBoost	LightGBM
HFCD	0.76	0.83	0.83	0.8	0.86	0.79	0.79	0.77
SPECT	0.75	0.81	0.81	0.82	0.79	0.75	0.86	0.81
SPECTF	0.77	0.81	0.89	0.81	0.86	0.76	0.89	0.78
SHD	0.75	0.83	0.84	0.72	0.78	0.76	0.78	0.76
CHD	0.76	0.81	0.85	0.76	0.78	0.78	0.77	0.78
HHD	0.79	0.83	0.77	0.79	0.78	0.79	0.81	0.78
LBV	0.78	0.78	0.83	0.72	0.77	0.77	0.75	0.79
SWD	0.81	0.77	0.81	0.81	0.95	0.86	0.94	0.94

From the Recall plotted in Fig. 57, obtained with the proposed missing value imputation and feature selection approach, it is inferred that the boosted classifiers achieve higher Recall than RF, CART, K-NN, and SVM, most clearly on the SWD dataset.

Fig. 57

Recall Comparison With RF, CART, K-NN, SVM.

The overall F1-Score of the classifiers under the proposed missing value imputation and feature selection approach is compared in Table 21. The classifiers used for comparison are Random Forest Classifier (RF), Classification and Regression Trees (CART), K-Nearest Neighbour Classifier (K-NN), and Support Vector Machine (SVM), evaluated using the performance metrics accuracy, precision, recall, and F1 score. The graphical plot of the F1-Score comparison is presented in Fig. 58.

Table 21

F1-score comparison of the proposed approach with RF, CART, K-NN, and SVM

	RF	CART	K-NN	SVM	XGBoost	AdaBoost	CatBoost	LightGBM
HFCD	0.77	0.84	0.85	0.81	0.85	0.78	0.78	0.76
SPECT	0.76	0.82	0.83	0.83	0.78	0.74	0.85	0.8
SPECTF	0.78	0.82	0.91	0.82	0.85	0.75	0.88	0.77
SHD	0.76	0.84	0.86	0.73	0.77	0.75	0.77	0.75
CHD	0.77	0.82	0.87	0.77	0.77	0.77	0.76	0.77
HHD	0.8	0.84	0.79	0.8	0.77	0.78	0.8	0.77
LBV	0.79	0.79	0.85	0.73	0.76	0.76	0.74	0.78
SWD	0.82	0.78	0.83	0.82	0.94	0.85	0.93	0.93

From the F1-Score plotted in Fig. 58, obtained with the proposed missing value imputation and feature selection approach, it is inferred that the boosted classifiers achieve higher F1-Score than RF, CART, K-NN, and SVM, most clearly on the SWD dataset.

Fig. 58

F1-Score Comparison with Standard Classifiers.
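
The four metrics compared in the tables above all derive from the binary confusion matrix. As a reminder of how they relate, the following Python sketch computes them from the standard definitions. The counts used in the example call are hypothetical and are not taken from the experiments reported here, and the function name `classification_metrics` is an assumption made for illustration. Whether the tabulated values were averaged per class is not stated in this section, so the sketch covers only the plain positive-class definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives. Undefined ratios fall back to 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for a 100-sample test split.
    for name, value in classification_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.2f}")
```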

The proposed approach addresses binary classification problems; extending it to multiclass problems is left for future work. The missing value imputation performed in this study is the random forest-based missForest technique, whose imputation time increases as the number of observations increases. The wrapper approach is computationally intensive because it relies on a learning algorithm to evaluate candidate feature subsets. Because feature selection is tied to the classifier, the reduced feature subset shortens training time.

The No Free Lunch Theorem suggests that, under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. Although the suggested approach, which combines several methodologies, performed well, the search algorithms used in this research may perform poorly for one set of objective functions but effectively for another. The results observed in the proposed work also depend on the parameter settings of the optimization algorithms used.

From the results, it is evident that the boosted classifiers performed comparably to other state-of-the-art algorithms. Their performance may be enhanced further by hyperparameter tuning, which adjusts the parameters of each boosted classifier to yield a learning model with higher performance. The performance of the proposed work was compared with that of current feature selection algorithms. Table 17 illustrates that the proposed approach performed well in terms of accuracy when compared with other classifiers. The classifiers RF, CART, SVM, and K-NN were also compared. Among the datasets used in this work, all the boosted classifiers performed best on the SWD dataset.

8 Conclusion and future scope

In this work, a novel classification framework for the diagnosis of heart disease is proposed. The data pre-processing subsystem imputes missing values with the non-parametric missForest strategy. Feature selection is performed first by a filter approach, for which FCBF is used. The features selected by the filter are passed to two wrapper-based algorithms, ASO and HGSO, whose fitness function combines accuracy, G-mean, and MCC as measured by the SVM classifier; the final features are obtained by taking the union of the two wrapper outputs. The selected features are used to train four boosted classifiers, namely XGBoost, AdaBoost, CatBoost, and LightGBM. Performance is reported in terms of Accuracy, Precision, Sensitivity, Recall, and F1-Score, and is compared with the traditional classifiers RF, CART, K-NN, and SVM. As future scope, the Random Forest used in the missing value imputation could be replaced with boosted trees such as XGBoost, CatBoost, or LightGBM. Implementing the latest state-of-the-art wrapper-based feature selection approaches can also be considered as a future study. The proposed work concentrates on binary classification; future work can address multi-class classification and apply the approach to high-dimensional and time-series datasets.
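
The wrapper stage summarized above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the implementation used in this work: this section does not specify how accuracy, G-mean, and MCC are combined, so an unweighted mean is assumed here, and the confusion-matrix counts and feature indices in the example are hypothetical. The function names `g_mean`, `mcc`, `fitness`, and `union_of_selected` are likewise introduced only for illustration.

```python
import math


def g_mean(tp, fp, fn, tn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fitness(tp, fp, fn, tn):
    """ASSUMPTION: unweighted mean of accuracy, G-mean and MCC.

    The counts would come from a classifier (an SVM in this work) evaluated
    on the candidate feature subset; the weighting is not given in the text.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return (accuracy + g_mean(tp, fp, fn, tn) + mcc(tp, fp, fn, tn)) / 3.0


def union_of_selected(aso_features, hgso_features):
    """Union of the feature indices chosen by the two wrapper searches."""
    return sorted(set(aso_features) | set(hgso_features))


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts for one candidate subset.
    print(f"fitness = {fitness(tp=40, fp=10, fn=10, tn=40):.4f}")
    # Hypothetical feature indices returned by ASO and HGSO.
    print("final features:", union_of_selected([0, 2, 5], [2, 3]))
```

Taking the union rather than the intersection keeps every feature that either search found useful, at the cost of a larger final subset.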

Table 7
Parameters for AdaBoost classifier

No	Parameters	Phrase
1	Z_t	Normalization Factor
2	D	Training Instance
3	α_t	Weights t
4	h_t	Weak Learner
5	x_i	Data Sample
6	n_estimators	Base Estimators for Identifying Overfitting

Table 9
Parameters for CatBoost classifier

No	Parameters	Phrase
1	α	Step size
2	h^t	Boosting Tree
3	F^t	Approximate Function
4	x ∈ R_j	Indicator Function
5	I	Feature Index
6	n	Total Number of Instances
7	a	Weight Parameter
8	P	Target variable
9	od_type	Control Parameter for Overfitting

Table 10
Parameters for LightGBM classifier

No	Parameters	Phrase
1	N	Number of Decision Trees
2	h(x, θ_t)	Prediction Result
3	θ_t	Output of the Decision Tree
4	t	Decision Tree
5	x	Input Variable
6	early_stopping_rounds	Number of Iterations to be Performed