Diagnosis of Covid-19 from CT slices using Whale Optimization Algorithm, Support Vector Machine and Multi-Layer Perceptron

Abstract

BACKGROUND:

The coronavirus disease 2019 is a serious and highly contagious disease caused by infection with a newly discovered virus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

OBJECTIVE:

A Computer-Aided Diagnosis (CAD) system that assists physicians in diagnosing Covid-19 from chest Computed Tomography (CT) slices is modelled and evaluated experimentally.

METHODS:

The lung tissues are segmented using Otsu’s thresholding method. The Covid-19 lesions are annotated as the Regions of Interest (ROIs), from which texture and shape features are extracted. The obtained features are stored as feature vectors and split into training and test sets in an 80:20 ratio. To choose the optimal features, the Whale Optimization Algorithm (WOA) is employed with the accuracy of a Support Vector Machine (SVM) classifier as the fitness function. A Multi-Layer Perceptron (MLP) classifier is then trained to perform classification with the selected features.

RESULTS:

Comparative experiments between the proposed system and eight existing benchmark Machine Learning classifiers on a real-time dataset demonstrate that the proposed system, with 88.94% accuracy, outperforms the benchmark classifiers. Statistical analyses, namely the Friedman test, the Mann-Whitney U test and Kendall’s Rank Correlation Coefficient test, have been performed; they indicate that the proposed method has a significant impact on the novel dataset considered.

CONCLUSION:

The MLP classifier yielded an accuracy of 80.40% without feature selection, whereas with feature selection using WOA it yielded 88.94%.

Keywords

Covid-19, WOA, SVM, MLP, Kendall’s correlation coefficient graph

1 Introduction

Since December 2019, the world has faced a global health crisis due to the rapid spread of coronavirus disease 2019 (COVID-19), as named by the World Health Organization (WHO). Deep sequencing analysis of pulmonary samples from patients identified a previously unknown coronavirus on January 7th, 2020, which was subsequently named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. Symptoms associated with Covid-19 include fever, cough, vomiting, myalgia, fatigue, and diarrhoea [2, 3]. As of October 21st, 2023, a total of 771,549,718 Covid-19 cases had been diagnosed worldwide, resulting in 6,974,473 fatalities [4]. In chest CT imaging, radiologists have identified three types of anomalies linked to Covid-19, namely Ground-Glass Opacities (GGO), Consolidation, and Pleural Effusion [5, 6].

The diagnostic tests for this disease include rapid antigen tests, nucleic acid amplification tests and serology tests. However, the false-negative rate for nucleic acid testing is as high as 17% to 25.5%, and tests such as real-time reverse transcriptase polymerase chain reaction (rRT-PCR), although regarded as the gold standard, are time-consuming and also suffer from a high false-negative rate [7, 8]. Therefore, radiological imaging techniques, such as chest radiographs and CT, can be used for effective diagnosis and disease evaluation [9, 10]. Because of its high spatial resolution and the distinct relationship between CT density and lung air content [11, 12], CT is commonly used to detect and segment the typical indications of Covid-19 disease. Compared to chest radiographs, which can present only a two-dimensional view, CT screening is widely preferred because it reveals the structural and anatomical details of the lung in a three-dimensional view [13, 14]. The qualitative identification of infection and longitudinal changes in CT slices could therefore aid in diagnosing Covid-19.

CAD systems used to interpret CT slices of the lung to diagnose disorders involve the following process. Firstly, from the input chest CT slices, segmentation of lung parenchyma is done using Otsu’s thresholding method. For further feature analysis, the ROIs are extracted based on shape, run-length and texture features. Feature selection techniques are used to choose an optimal feature subset (OFS) for improving disease diagnosis accuracy [15, 16]. To improve prediction performance and diagnostic accuracy of the CAD system, these techniques help to minimise irrelevant and redundant features.

Feature Selection (FS) is the process of reducing dimensionality to improve model performance and interpretability, alleviating redundancy by focusing on a subset of essential features. This process involves eliminating irrelevant features and selecting the best possible subset. Feature Selection approaches are broadly categorized into filter, wrapper, hybrid, and embedded methods [17, 18]. Filter methods assess the relevance of features based on statistical measures or domain knowledge, without involving a machine learning algorithm. These methods rank or score features and select the top-performing ones for further analysis. They are computationally efficient but may not consider feature interactions or the specific learning algorithm used. Wrapper methods, in contrast, use machine learning algorithms to evaluate feature subsets. Hybrid methods combine elements of both filter and wrapper techniques. Embedded methods incorporate feature selection into the model training process itself [19, 20]. The nature-inspired metaheuristic search algorithms have been found to be efficient in finding optimum solutions with a higher degree of success based on the methodologies discussed in [21, 22].

Bio-inspired algorithms help in solving complex optimization and search problems by emulating the behaviour of organisms. In particular, these algorithms are often valued for complex problems with large search spaces where traditional methods may struggle to find optimal solutions [19]. In this study, Otsu’s thresholding method has been utilized for segmentation. The ROIs have been extracted, and the features are stored as feature vectors and divided into training and testing sets containing 80% and 20% of the feature vectors, respectively. A wrapper-based feature selection approach has been employed, utilizing the Whale Optimization Algorithm (WOA) with the Support Vector Machine (SVM) classifier’s accuracy as the fitness function to select the optimal features. Further, a Multi-Layer Perceptron (MLP) classifier has been trained on the selected features. The combination of WOA and the MLP classifier provided more accurate results (88.94%) compared to eight traditional ML classifiers.

The remainder of the manuscript is structured as follows: Section 2 presents a literature review of related research. Section 3 outlines the system framework. The details of the obtained results are presented in Section 4. Finally, in Section 5, the conclusion and the scope of the work are summarized.

2 Related works

Anisha et al. proposed a CAD system for the diagnosis of pulmonary emphysema from chest CT slices. Segmentation has been done using Spatial Fuzzy C-Means (SFCM) clustering algorithm and the ROIs have been extracted using pixel-based segmentation. From each extracted ROI, features have been extracted and a competitive coevolution model has been proposed for feature selection. Feature selection is performed as a wrapper approach, using the bio-inspired algorithms namely, Spider Monkey Optimization (SMO) and Paddy Field Algorithm (PFA) with the accuracy of the Support Vector Machine (SVM) classifier as the fitness function. This CAD system has been tested using two datasets namely CT emphysema database (CTED) and Real-time emphysema datasets. The accuracy, precision, specificity and recall obtained for both the datasets are (93.74%, 81.95%), (90.61%, 72.92%), (95.3%, 86.46%), (90.61%, 72.92%), which are better compared to the performance of SMO and PFA algorithms applied individually for feature selection and the CAD system without performing feature selection [16].

Anisha et al. proposed a CAD framework for the detection of pulmonary emphysema. Within this system, the segmentation process is carried out using the Spatial Intuitionistic Fuzzy C-Means clustering algorithm. A wrapper-based approach, incorporating four bio-inspired algorithms, namely Moth-Flame Optimization (MFO), Artificial Bee Colony Optimization, Fire Fly Optimization (FFO), and Ant Colony Optimization (ACO), with the accuracy of the Support Vector Machine (SVM) classifier as the fitness function was used to select the optimal feature subset. The four optimal feature subsets that were selected by the bio-inspired algorithms were trained independently using the ELM classifier, and the performance was evaluated. The system has been trained using the real-time emphysema dataset and the public emphysema dataset. Compared to other three bio-inspired optimization algorithms, the MFO algorithm produced the best accuracy of 89.02% for the real-time emphysema dataset, and the framework that used the FFO algorithm produced the best accuracy of 91.89% for the public emphysema dataset [23].

Sweetlin et al. proposed a CAD framework for the detection of pulmonary hamartoma disease from chest CT slices. Segmentation has been done using Otsu’s thresholding method. The feature selection process utilizes a population-based meta-heuristic approach incorporating the ACO algorithm with filter evaluation functions, specifically the Cosine Similarity Measure (CSM) and Rough Dependency Measure (RDM). These features have been used to select two subsets for training Naïve Bayes and SVM classifiers, using 10-fold cross-validation. The proposed ACO-RDM FS algorithm has outperformed the performance scores of NB, SVM, and Decision Tree classifiers, by achieving respective accuracies of 91.02%, 94.36%, and 90%. On the other hand, the ACO-CSM algorithm has yielded slightly lower accuracies of 85.64%, 83.07%, and 84.87%. It is evident that the adoption of RDM with ACO significantly enhances disease diagnosis accuracy [24].

Sunil et al. proposed a CAD framework for the detection of three diseases namely, tuberculosis, bronchiectasis, and pneumonia. Segmentation has been done using the supervised segmentation algorithm. This framework incorporates a Distance-Based Genetic Algorithm (DGA) for feature selection. The performance of this CAD system, equipped with DGA for feature selection, is evaluated against systems using features selected by Differential Evolution and Statistical Repair Mechanism Feature Selection (DEFS) and those that do not employ feature selection at all. A K-Nearest Neighbour (k-NN) classifier is used to classify the extracted ROIs into four classes namely, healthy, tuberculosis, bronchiectasis, and pneumonia. Significantly, the CAD system utilizing DGA achieves an accuracy of 88.16%, while the systems with DEFS and without FS achieve only 83.47% and 86.46%, respectively [25].

Ali et al. proposed an efficient hybrid technique based on the Ant Lion Optimizer (ALO), which is designed to work with Multi-Layer Perceptron (MLP) neural networks. The proposed training model, based on the ALO method, is validated using sixteen standard datasets. The proposed ALO-MLP algorithm outperforms other optimization algorithms such as Differential Evolution, Genetic Algorithm, Particle Swarm Optimization (PSO) and Population-Based Incremental Learning in terms of worst, average, highest, and median accuracies. Additionally, the convergence tendencies of all competitors are monitored and analysed. The outcomes strongly support the assertion that ALO-MLP efficiently classifies a majority of datasets with exceptional performance [26].

Waleed et al. proposed an efficient training technique based on the MFO to train MLPs, resulting in the MFO-MLP model. This approach focuses on finding optimal weights and biases for MLPs to reduce the error rate and achieve high classification accuracy. To evaluate the performance of MFO-MLP, five standard classification datasets (xor, iris, heart, balloon, and breast cancer) and three function-approximation datasets (sine, sigmoid, and cosine) have been used. The performance of MFO-MLP is compared with four well-known optimization algorithms, including Genetic Algorithm, PSO, ACO, and Evolution Strategy. The experimental results underscore the superior accuracy and ability of the MFO algorithm to mitigate local optima issues [27].

Moura et al. proposed a classification system for Covid-19 pneumonia utilizing texture-based feature extraction algorithms applied to chest X-ray images. Their study involved the investigation of various classification models to distinguish between Covid-19 pneumonia and other types of pneumonia in chest radiology. Training and evaluation has been conducted on a dataset comprising 136 antero posterior and postero anterior chest X-rays sourced from two public databases. The lung images have been initially segmented using a pre-trained U-net-inspired segmentation model to generate lung masks. Radiomic statistical texture-based features have been subsequently extracted, selected, and trained across four classification models: Ada Boost, SVM, Random Forest, and Logistic Regression. Impressively, the proposed framework yielded an accuracy of 93% and a sensitivity of 95% [28].

Godbin et al. proposed a new metaheuristic-based fusion model for Covid-19 diagnosis using chest X-ray images. Initially, the Wiener filter has been used to pre-process the images. Then, fusion-based feature extraction is performed by combining the gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRM), and local binary patterns (LBP). The optimal feature subset has then been chosen using the salp swarm algorithm (SSA). An artificial neural network is applied in the classification step to classify the images as belonging to infected or healthy patients. The proposed model’s performance has been assessed using the Chest X-ray image dataset, and the obtained results outperform the state-of-the-art methods. The proposed CAD model achieved accuracies of 95.1% and 95.65% on binary and multi-class classification, respectively [29].

Duchesne et al. proposed a novel approach testing the ability of a deep learning algorithm at extracting features from chest x-rays (CXR) to track and predict radiological evolution. This deep learning method was trained using the open CheXNet dataset and patient data from two open-source datasets. On patients from the CheXnet dataset, the area under ROC curves ranged from 0.71 to 0.93 for seven imaging features and one diagnosis. The study includes ROC studies and Mann-Whitney tests, with label learning determined by the ROC-AUC curve and comparisons made between different outcome groups using Mann-Whitney tests [30].

Marwa et al. proposed a feature selection approach based on WOA that mimics the natural behaviour of humpback whales. A wrapper-based approach has been used to find the optimal feature subset. The principal characteristic of this approach is that the classifier is used as a guide to feature selection, based on the selected feature subset. This work uses a k-NN classifier to preserve the quality of the feature set selected. WOA has been employed adaptively to find the OFS that maximizes the performance of classification. To assess the performance of WOA on feature selection, GA and PSO have been used as benchmarks. Sixteen different datasets from the UCI ML data repository have been used. The statistical results obtained demonstrate that WOA outperforms the use of all features, proving the ability of the wrapper-based approach in feature selection. The outcomes obtained with WOA FS have shown the best classification accuracy values on average [31].

Vijay Wasule et al. proposed a CAD system for classifying brain tumours using MRIs. Tumours can be either malignant or benign. Noise has been eliminated from the images in the pre-processing step. With morphological filtering, the MRI image’s shape, boundaries, and skeleton are extracted. Using GLCM, fourteen texture characteristics have been extracted. The dataset has been collected from Sahyandri Hospital, Pune, and includes MRI images and BRATS 2012 training data. The suggested approach is evaluated using the BRATS 2012 training database with 251 malignant and 166 benign images and a total of 80 low-grade and high-grade glioma images. The suggested study produced accuracies of 96% and 86% for SVM and k-NN on the gathered datasets, and 85% and 72.5% for the BRATS database, respectively. Therefore, it is observed that the SVM classifier’s accuracy is greater than the k-NN classifier’s accuracy [32].

Majdi et al. proposed a new wrapper-based feature selection model based on WOA. The key objective of this approach is to remove redundant or irrelevant data present in the features that could degrade the classifier’s accuracy. A k-NN classifier has been used in the classification system. The datasets include eighteen UCI benchmark datasets to assess and compare the efficiency of the Whale Optimization Algorithm Crossover Mutation (WOA-CM) approach. The proposed method has been tested against algorithms including GA, ALO, PSO and five standard filter feature selection methods. The results indicate that on 14 of 18 datasets, the suggested WOA-CM methodology outperformed all other approaches. The adaptive mechanism of this algorithm also accelerates the convergence speed in proportion to the number of iterations [33].

Hoda et al. proposed a feature selection-based WOA (FSWOA), which is a meta-heuristic method for feature selection. This algorithm is based on how Humpback Whales hunt, which involves three key steps: a) surrounding the prey, b) attacking with a spiral bubble net, and c) looking for prey. Four benchmark datasets namely, Pima Indian Diabetes (PID), Original WBC, Statlog and Hepatitis have been downloaded from the UCI ML repository and are used to test how well the suggested FSWOA algorithm worked. On these medical datasets, the suggested algorithm has an accuracy of 78.57% for PID, 97.86% for Breast Cancer, 77.05% for Statlog Disease, and 87.10% for Hepatitis. For PID, Breast Cancer, Heart Disease, and Hepatitis datasets, the rate of feature decrease is 60%, 12.5%, 53.85%, and 57.89%, respectively. The algorithm works well enough to lower the size of medical datasets used to diagnose diseases [34].

The research works reviewed above present CAD systems for the diagnosis of various pulmonary disorders [16, 23–25]. They also suggest that employing nature-inspired algorithms for feature selection improves diagnostic performance [23–28, 31–34], and that various methods can be used to select the most relevant features [28–30].

3 System framework

Segmentation, Feature Extraction, Feature Selection, and Classification are the core components that collectively form the CAD framework. Figure 1 provides a visual representation of the system’s framework for Covid-19 diagnosis.

Fig. 1

System Framework of the Proposed CAD System.

3.1 Segmentation

Covid-19 lesions can be observed in various areas of the lung, including the edges or boundaries; therefore, accurate segmentation of the lung tissues is essential. Otsu’s thresholding method has been used to segment the lung tissues from their background. It seeks the threshold that maximizes the between-class variance, effectively separating the image into two classes (foreground and background) based on the intensity values [35, 36].

Input: Gray Scale CT Slice I

Pseudocode:

Compute the normalized histogram of the input image. Denote the components of the histogram P_i, i = 0,1,2, . . . L–1.

Compute the cumulative sums P₁(k), for k = 0,1,2, . . . L–1 using (1) $P_{1} (k) = \sum_{i = 0}^{k} P_{i}$ (1)

Compute the cumulative means m(k), for k = 0,1,2, . . . L–1 using (2) $m (k) = \sum_{i = 0}^{k} i P_{i}$ (2)

Compute the global intensity mean m_G using (3) $m_{G} = \sum_{i = 0}^{L - 1} i P_{i}$ (3)

Compute the between-class variance $σ_{B}^{2} (k)$, for k = 0,1,2, . . . L–1 using (4) $σ_{B}^{2} (k) = \frac{{[m_{G} P_{1} (k) - m (k)]}^{2}}{P_{1} (k) [1 - P_{1} (k)]}$ (4)

Obtain Otsu’s threshold k^* as the value of k for which $σ_{B}^{2} (k)$ is maximum.

Obtain the separability measure η^* by evaluating (5) at k = k^* $η^{*} = \frac{σ_{B}^{2} (k^{*})}{σ_{G}^{2}}$ (5) where $σ_{G}^{2}$ is the global variance.

Output: Segmented lung tissues.
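The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in this work: the function name `otsu_threshold`, the flat-list input and the toy pixel values are assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return (k_star, eta) for a flat list of integer gray levels in [0, levels)."""
    n = len(pixels)
    # Normalized histogram P_i
    hist = [0.0] * levels
    for v in pixels:
        hist[v] += 1.0 / n
    # Global mean m_G, equation (3), and global variance sigma_G^2
    m_g = sum(i * p for i, p in enumerate(hist))
    var_g = sum((i - m_g) ** 2 * p for i, p in enumerate(hist))
    best_k, best_var = 0, -1.0
    p1 = 0.0  # cumulative sum P_1(k), equation (1)
    m = 0.0   # cumulative mean m(k), equation (2)
    for k in range(levels):
        p1 += hist[k]
        m += k * hist[k]
        denom = p1 * (1.0 - p1)
        if denom <= 1e-12:
            continue  # one class is empty, so equation (4) is undefined
        var_b = (m_g * p1 - m) ** 2 / denom  # equation (4)
        if var_b > best_var:
            best_k, best_var = k, var_b
    eta = best_var / var_g if var_g > 0 else 0.0  # equation (5)
    return best_k, eta


# Hypothetical bimodal "slice": dark background and brighter tissue.
toy = [20] * 50 + [30] * 50 + [200] * 40 + [210] * 60
k_star, eta = otsu_threshold(toy)
mask = [v > k_star for v in toy]  # True marks the brighter class
```

On the toy data the threshold falls between the two intensity clusters, and the separability measure is close to 1 because the clusters are well separated.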

3.2 ROI extraction

In this research work, Covid-19 lesions have been considered as ROIs. The lesions have been analysed based on sites and sizes. The majority of these lesions have been found in the sub-pleural and posterior respiratory zones. The ROIs extracted in this work are patchy Ground Glass Opacity (GGO), Bilateral GGO, Sub Pleural GGO, peripheral GGO, Broncho-vascular thickening, Traction bronchiectasis, Reverse halo sign, air space consolidations, consolidations, GGO with consolidations and crazy paving appearance with inter / intra lobular septal thickening.

The process of extracting ROIs is as follows:

Input: Segmented lung tissues

Pseudocode:

Extract the ROIs with pixel intensity values in the range 125–155 from the segmented lung tissues.

Have a skilled radiologist annotate and label each ROI.

Assign class label 1 to the Covid-19 lesions.

Assign class label 2 to the non-Covid-19 lesions.

Output: ROIs with class labels 1 and 2.
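The intensity-based extraction step can be illustrated as follows. This is only a sketch under stated assumptions: the text specifies the 125–155 intensity range but not how pixels are grouped into regions, so the 4-connected flood fill, the function name `candidate_rois` and the toy slice are all hypothetical. The class labels still come from the radiologist, not from the code.

```python
def candidate_rois(slice_2d, low=125, high=155):
    """Group in-range pixels into 4-connected regions (lists of (row, col))."""
    rows, cols = len(slice_2d), len(slice_2d[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not (low <= slice_2d[r][c] <= high):
                continue
            # Flood fill from this seed pixel.
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and low <= slice_2d[ny][nx] <= high):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions


# Hypothetical segmented slice with two in-range blobs.
toy_slice = [
    [0, 130, 140, 0, 0],
    [0, 150, 0, 0, 126],
    [0, 0, 0, 0, 155],
    [200, 0, 0, 0, 0],
]
rois = candidate_rois(toy_slice)
```

Each returned region would then be shown to the radiologist, who assigns class label 1 or 2.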

3.3 Feature extraction

The features have been extracted from the ROIs of each slice either with or without Covid-19. The obtained features from the ROIs have been concatenated as feature vectors in the feature database [37, 38]. Table 1 outlines all the features extracted from ROIs.

Table 1
Outline of Features Extracted

Extracted Features
Geometric	Texture (0°, 45°, 90°, 135°)
1.	Equivalent Diameter	1.	Sum of squares variance
2.	Convex Area	2.	Contrast
3.	Perimeter	3.	Sum Entropy
4.	Major axis length	4.	Autocorrelation
5.	Eccentricity	5.	Sum Average
6.	Euler number	6.	Correlation
7.	Minor axis length	7.	Cluster Prominence
8.	Orientation	8.	Maximum Probability
9.	Area	9.	Cluster Shade
10.	Solidity	10.	Dissimilarity
11.	Filled area	11.	Information Measure of correlation
12.	Extent	12.	Energy
		13.	Entropy
		14.	Homogeneity
		15.	Difference Variance
		16.	Inverse Difference
		17.	Difference Entropy

Input: ROIs with class labels 1 and 2.

Pseudocode:

Extract eighty geometric and texture features, with the texture features computed from the Gray Level Co-occurrence Matrix (GLCM).

Extract twelve geometric features.

Extract seventeen texture features in each of four orientations (0°, 45°, 90°, 135°).

Construct the feature vector pertaining to each ROI from the eighty extracted features (sixty-eight texture features and twelve geometric features) with a class label.

Store each feature vector in a feature database.

Output: Feature Vectors.
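To make the texture step concrete, the sketch below builds a symmetric, normalized GLCM at the four orientations and computes four of the seventeen texture features. It is an illustration only: the pixel distance of 1, the symmetric counting, the exact feature formulas (energy as the sum of squared entries, homogeneity with a squared-difference denominator) and the 4-level toy ROI are assumptions, since the text does not state these details.

```python
import math

# (row, col) offsets for the four orientations at an assumed distance of 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels):
    """Symmetric, normalized gray level co-occurrence matrix."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
                total += 2
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """A subset of the texture features listed in Table 1."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


# Hypothetical ROI quantized to 4 gray levels.
toy_roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
feature_vector = []
for angle in (0, 45, 90, 135):
    feats = texture_features(glcm(toy_roi, angle, levels=4))
    feature_vector.extend(feats[name] for name in sorted(feats))
```

With all seventeen texture features, the same loop would yield the sixty-eight texture values, to which the twelve geometric features and the class label are appended.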

3.4 Feature selection

The main aim of feature selection is to find the optimal feature subset, eliminating redundant features from the extracted features and thereby enhancing the classifier’s predictive accuracy. A wrapper technique employing the bio-inspired Whale Optimization Algorithm has been used to select the optimal feature subset, with the SVM classifier’s accuracy as the fitness function. The Whale Optimization Algorithm combines exploration and exploitation, inspired by the behaviour of whales, to search effectively for the subset of features that enhances the performance of machine learning models. This strategy can be helpful when working with high-dimensional datasets because it can improve computational efficiency and enhance model interpretability. WOA is inspired by the natural behaviour of the humpback whale. The algorithm’s model comprises three operators: 1) simulating the humpback whale’s search for prey (exploration phase), 2) encircling the prey, and 3) bubble-net foraging (exploitation phase) [39–43].

The WOA algorithm starts with a set of randomly generated solutions (a population of whales). Each solution is then evaluated using the proposed fitness function (the SVM classifier’s accuracy). The fittest solution in the population is marked as X^* (prey). In each iteration, the solutions (whales) update their positions relative to each other to mimic the bubble-net attacking and prey searching methods. To mimic bubble-net attacking, a probability of 0.5 is assumed to choose between the shrinking encircling mechanism and the spiral updating position model. When the shrinking encircling mechanism is employed, a balance between exploration and exploitation is required. A random vector A, whose magnitude may be greater than or less than 1, is used. If the magnitude of A is greater than 1, exploration (the searching-for-prey method) is employed by searching in the neighbourhood of a randomly selected solution, while the neighbourhood of the best solution found so far is exploited when the magnitude of A is less than 1. This process is repeated until the stopping criterion is met, which is the maximum number of iterations (100). The parameters used in WOA are outlined in Table 2. The pseudocode of WOA is given as follows.

Table 2
Outline of Parameters used for WOA

Parameter	Value	Definition
N	10	No. of search agents
t	100	No. of iterations
b	1	Spiral shape constant
a	Range [2, 0]	Convergence factor
r	0 to 1	Random number

Input: Feature vectors

Pseudocode:

Initialize N, the population of search agents (whales), and the initial settings.

Compute the whale’s fitness (fitness function is the SVM classifier’s accuracy)

Find out the solution (X^*) having the maximum fitness value. This is considered as the location of the best whale observed so far. This step is called Prey Encircling.

Update whale’s location based on shrinking encircling and spiral-shaped paths

Execute shrinking encircling behavior, if switch probability p < 0.5

Calculate the coefficient vectors $\vec{A}$ and $\vec{C}$ using (6) and (7) $\vec{A} = 2 \vec{a} \cdot \vec{r} - \vec{a}$ (6) $\vec{C} = 2 \cdot \vec{r}$ (7)

Update the position of each whale

Employ the spiral updating position method using (8). $\vec{X} (t + 1) = {\vec{D}}^{'} \cdot e^{bl} \cdot \cos (2 π l) + {\vec{X}}^{*} (t)$ (8) where ${\vec{D}}^{'} = | {\vec{X}}^{*} (t) - \vec{X} (t) |$ indicates the distance between the current whale and the prey (the best solution obtained so far), b is the spiral shape constant and l is a random number in [–1, 1].

The whales switch between the two behaviours with a probability of 50% each, as described by equation (9) $\vec{X} (t + 1) = \begin{cases} {\vec{X}}^{*} (t) - \vec{A} \cdot \vec{D} & \text{if } p < 0.5 \\ {\vec{D}}^{'} \cdot e^{bl} \cdot \cos (2 π l) + {\vec{X}}^{*} (t) & \text{if } p ⩾ 0.5 \end{cases}$ (9) where p is a number drawn at random from the range [0, 1].

To perform a global search, update the location of the whales using a randomly chosen whale instead of the best whale. The whale’s new position is determined using equations (10) and (11), which have the same form as the encircling update but with the randomly selected whale ${\vec{X}}_{rand}$ in place of the best whale ${\vec{X}}^{*} (t)$: $\vec{X} (t + 1) = {\vec{X}}_{rand} - \vec{A} \cdot \vec{D}$ (10) $\vec{D} = | \vec{C} \cdot {\vec{X}}_{rand} - \vec{X} (t) |$ (11) where ${\vec{X}}_{rand}$ denotes the location of the whale selected at random from the current population. The positions of the N whales are updated in every generation until a terminating condition is reached.

Repeat steps 3 to 6 until convergence is met.

Output: Optimal feature subset
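The pseudocode can be sketched as a small wrapper-style search. This is a hedged illustration, not the implementation used in this work: positions are kept in [0, 1] and thresholded at 0.5 to obtain a feature mask (the text does not specify the binarization), A and C are drawn as one scalar per whale, and `toy_fitness` is a hypothetical stand-in for the SVM classifier’s accuracy.

```python
import math
import random


def woa_select(fitness, n_features, n_whales=10, n_iter=100, b=1.0, seed=0):
    """Return (feature_mask, best_fitness) found by a basic WOA loop."""
    rng = random.Random(seed)

    def to_mask(pos):
        return [x > 0.5 for x in pos]  # assumed binarization rule

    whales = [[rng.random() for _ in range(n_features)]
              for _ in range(n_whales)]
    best = max(whales, key=lambda w: fitness(to_mask(w)))[:]
    best_fit = fitness(to_mask(best))
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter            # convergence factor: 2 -> 0
        for w in whales:
            A = 2.0 * a * rng.random() - a    # equation (6)
            C = 2.0 * rng.random()            # equation (7)
            p, l = rng.random(), rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore
                # around a randomly chosen whale, equations (10)-(11).
                ref = best if abs(A) < 1 else rng.choice(whales)
                new = [ref[d] - A * abs(C * ref[d] - w[d])
                       for d in range(n_features)]
            else:
                # Spiral update around the best whale, equation (8).
                new = [abs(best[d] - w[d]) * math.exp(b * l)
                       * math.cos(2 * math.pi * l) + best[d]
                       for d in range(n_features)]
            w[:] = [min(1.0, max(0.0, x)) for x in new]  # keep in [0, 1]
            fit = fitness(to_mask(w))
            if fit > best_fit:
                best, best_fit = w[:], fit
    return to_mask(best), best_fit


# Hypothetical fitness: reward three "informative" features and
# penalize subset size (stands in for SVM accuracy on a validation set).
INFORMATIVE = (0, 3, 5)


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


mask, fit = woa_select(toy_fitness, n_features=12)
```

In the wrapper setting described above, `fitness` would train an SVM on the masked feature columns and return its accuracy, which makes each evaluation far more expensive than this toy function.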

3.5 Classification

The selected input features are combined into a single training set for training the MLP classifier. In this work, the MLP classifier consists of a single feed-forward neural network (FNN) with an input layer of 24 neurons corresponding to the number of optimal features, three hidden layers of 100 neurons each, and an output layer of two neurons for classes 1 and 2. The weights and biases have been initialized at random. The hidden layers use the sigmoid activation function [44]. Table 3 shows the parameter settings of the MLP classification subsystem. The pseudocode for the MLP classifier is outlined as follows.

Table 3
Outline of Parameters used for MLP Classifier

Dataset	Covid-19 dataset
Learning algorithm	MLP
No. of nodes in input layer using WOA for FS	24
Number of hidden layer nodes	2f
Output layer nodes	2 (Class 1, Class 2)
Activation function	Sigmoidal function
Dataset division	Training-testing divide into (80:20)

Input: Optimal feature subset

Pseudocode:

Initialize the input weight vectors w_ij, i = 1,2, . . . ,f, j = 1,2, . . . ,2f, where f is the number of selected features.

Calculate the hidden nodes’ input iH using (12). $i H_{j} = \sum_{i = 1}^{f} x_{i} w_{ij} + B_{j}, j = 1, 2, \dots, 2 f$ (12) where x_i is the input feature value, w_ij is the weight between the i-th input node and the j-th hidden node, and B_j is the randomly initialized bias of the j-th hidden node.

Compute the hidden nodes’ output H_j using (13). $H_{j} = \frac{1}{1 + e^{- i H_{j}}}, j = 1, 2, \dots, 2 f$ (13)

Compute the weights between the hidden and output layers using (14) $W_{k} = H_{k}^{+} T, k = 1, 2, \dots, 2 f$ (14) where $H_{k}^{+}$ is the Moore-Penrose inverse of the output H from the hidden layer, and T indicates the target class.

Compute the value of the neurons in the output layer using (15) $O = \sum_{k = 1}^{2 f} H_{k} W_{k}$ (15) where W_k is the weight between the k-th hidden node and the output layer.

Output: Trained MLP classifier.
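The forward pass of equations (12), (13) and (15) can be sketched as below for a single hidden layer of 2f nodes. This is an illustration only: the function names, the toy sizes and the random weights are assumptions, and the Moore-Penrose step of equation (14) is not implemented here, so the output weights are simply supplied by the caller.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(x, w, bias):
    """Equations (12)-(13): H_j = sigmoid(sum_i x_i * w[i][j] + B_j)."""
    return [sigmoid(sum(x[i] * w[i][j] for i in range(len(x))) + bias[j])
            for j in range(len(bias))]


def output_layer(hid, w_out):
    """Equation (15): weighted sum of hidden outputs per output neuron."""
    return [sum(hid[k] * w_out[k][c] for k in range(len(hid)))
            for c in range(len(w_out[0]))]


# Hypothetical sizes: f = 3 selected features, 2f = 6 hidden nodes, 2 classes.
rng = random.Random(0)
f, h, n_classes = 3, 6, 2
w = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(f)]
bias = [rng.uniform(-1, 1) for _ in range(h)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_classes)] for _ in range(h)]

x = [0.2, 0.7, 0.1]                      # one feature vector
hid = hidden_output(x, w, bias)
scores = output_layer(hid, w_out)
predicted_class = 1 + scores.index(max(scores))  # class label 1 or 2
```

With f = 24 selected features, the same functions apply unchanged; only the weight shapes grow.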

4 Experimental results

The experiment uses the real-time Covid-19 dataset collected from Bharat Scan Centre, Tamil Nadu. This section describes the experimentation dataset and results.

4.1 Dataset description

The real-time dataset consists of two classes of chest CT slices, namely patients having Covid-19 and patients without Covid-19, collected from the Bharat Scan Centre in Chennai, Tamil Nadu, India, between June and December 2020. To protect privacy, Personally Identifiable Information (PII) has been removed from all CT slices. Although the Covid-19 CT slices include enough data, properly annotated labels are also required. Each ROI has been identified and manually labelled by an expert radiologist. Table 4 provides an outline of the dataset used in this research work.

Table 4
Dataset Outline

Dataset	No. of Patients	Total no. of Slices	No. of ROIs	Training Set ROIs	Testing Set ROIs
Covid–19 Disease	26	342	343	274	69
Normal Lung	15	446	452	362	90
Total	41	788	795	636	159

4.2 Performance evaluation

The performance of the proposed CAD system on the real-time Covid-19 dataset has been compared with eight machine learning classifiers, namely K Nearest Neighbour (KNN) [45], Linear Discriminant Analysis (LDA) [46], Logistic Regression (LR) [47], Naïve Bayes (NB) [48], Extreme Gradient Boosting (XGB) [49], AdaBoost (AB) [50], kernel SVM [46] and linear SVM [46], which achieved accuracies of 84%, 88%, 87%, 60%, 87%, 86%, 65% and 84% respectively. The performance measures of the proposed system and the above-mentioned classifiers are compared in Fig. 2. The extraction of Covid-19 lesions from a chest CT slice with Covid-19 disease is shown in Figs. 3a to 3d. Figure 3a shows the input CT slice. Figure 3b shows the segmented slice obtained from Otsu’s thresholding. The extracted ROIs are shown in Fig. 3c and the extracted Covid-19 lesions are shown in Fig. 3d. Figures 4a to 4d show the steps towards obtaining ROIs that confirm the absence of Covid-19 disease. Two classes of ROIs have been used, where class 1 represents the ROIs with Covid-19 and class 2 represents the ROIs without Covid-19. The algorithm’s performance was assessed by evaluating the accuracy, specificity, sensitivity, and precision using equations (16) to (19). $Accuracy = \frac{(TP + TN)}{(TP + TN + FP + FN)}$ (16) $Specificity = \frac{TN}{(TN + FP)}$ (17) $Sensitivity = \frac{TP}{(TP + FN)}$ (18) $Precision = \frac{TP}{(TP + FP)}$ (19)

Where, TP, FP, FN and TN denotes True Positives, False Positives, False Negatives and True Negatives respectively.

Fig. 2

Performance Metrics Comparison with other Classifiers.

Fig. 3

Experimental Images for Covid-19 Disease.

Fig. 4

Experimental Images for Normal Lung Slices.

Table 5 displays the confusion matrix obtained with WOA. Table 6 compares the performance of the CAD system with and without WOA. The accuracy, precision, specificity and sensitivity (recall) values are calculated for the 24 features selected with WOA and for the 80 features without WOA. The system achieved a maximum accuracy of 88.94% using WOA. The aim is to decrease the number of false negatives (FN) and false positives (FP), i.e., to increase the sensitivity and the specificity respectively. However, there is often a trade-off between sensitivity and specificity: as one increases, the other decreases. The model was also reviewed by the radiologist, who reported that it works well and suggested that more CT slices be included so that it may be used to diagnose different lung diseases.

Statistical analysis, namely the Mann-Whitney U test, the Friedman test and Kendall’s Rank Correlation Coefficient test, has been performed on the novel dataset used. The Mann-Whitney U test showed that the difference between the two variables in relation to the dependent variable was statistically significant (p < 0.001); thus, the null hypothesis is rejected. The Friedman test showed a significant difference between the dependent variables (p < 0.001), so the null hypothesis is again rejected.

Kendall’s rank correlation coefficient, denoted Kendall’s tau, offers a robust, nonparametric way to measure and interpret the association between two variables [51]. Here it is used to assess the strength and direction of the association between the extracted features. For example, for the extracted feature “Homogeneity”, Kendall’s tau can help determine whether the homogeneity measure is correlated with other attributes such as texture or contrast. The coefficient indicates the degree of agreement or disagreement between the two variables. Its value ranges from –1 to +1, where –1 indicates perfect discordance, +1 indicates perfect concordance and 0 indicates no association. The Kendall’s rank correlation coefficient graph is shown in Fig. 5.
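
To make the notion of concordance concrete, the following minimal sketch computes Kendall's tau by counting concordant and discordant pairs. It implements the simple tau-a form, without tie correction, whereas statistical packages often report the tie-corrected tau-b variant. The function name `kendall_tau` and the feature values are hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 is a tie and counts towards neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical feature values for five ROIs (illustrative only).
homogeneity = [0.61, 0.72, 0.55, 0.80, 0.68]
contrast = [3.1, 2.4, 3.9, 2.0, 2.9]

print(kendall_tau(homogeneity, contrast))     # ranks are exactly reversed
print(kendall_tau(homogeneity, homogeneity))  # ranks are identical
```

In this toy example the contrast values fall whenever the homogeneity values rise, so every pair is discordant and tau equals –1, while comparing a feature with itself gives +1.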

Table 5

Confusion Matrix

Actual and predicted	Correctly identified	Incorrectly identified
Correctly identified	105	4
Incorrectly identified	18	72

Table 6

Performance Comparison

FS using WOA	No. of Features	Accuracy (%)	Precision (%)	Specificity (%)	Sensitivity (%)
With FS	24	88.94	96.33	94.73	85.36
Without FS	80	80.40	77.06	75.24	85.71
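
The "With FS" row of Table 6 can be cross-checked against Table 5 using equations (16) to (19). The sketch below assumes that the cells of Table 5 are read as TP = 105, FP = 4, FN = 18 and TN = 72; this reading is an inference, chosen because it is the one consistent with the values in Table 6, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Equations (16) to (19), returned as percentages."""
    return {
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "precision": 100.0 * tp / (tp + fp),
    }


# Assumed reading of Table 5: TP = 105, FP = 4, FN = 18, TN = 72.
metrics = classification_metrics(tp=105, fp=4, fn=18, tn=72)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Under this reading the accuracy (88.94%) and precision (96.33%) match Table 6 exactly, while the specificity and sensitivity evaluate to 94.74% and 85.37% when rounded, differing from the reported 94.73% and 85.36% only in the last digit, which is consistent with truncation rather than rounding.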

Fig. 5

Kendall’s Rank Correlation Coefficient Graph.

5 Conclusion and scope for future

A CAD system to diagnose the presence or absence of Covid-19 has been designed and implemented. The lung parenchyma has been segmented from the input chest CT slices, and ROIs have been extracted. WOA has been used for wrapper-based feature selection, with the accuracy of the SVM classifier as the fitness function. The MLP classifier has been trained on the optimal feature subset, and performance measures have been used to assess the efficiency of the CAD system. The goal of our research is to improve the classification accuracy and to aid physicians in clinical decision making; hence, time complexity and space complexity are not the primary interest of this research work.

Today, physicians who use CAD systems for clinical decision making deploy them in a cloud environment and access them there. The CAD system developed in this research work can be deployed in the same way. Cloud deployment offers advantages such as scalability, accessibility and cost-effectiveness, and the maintenance overhead can also be reduced by optimizing the resources.

The proposed CAD system can identify the presence or absence of the Covid-19 disease, and it can be further enhanced to determine the severity of the disease. The system was tested on the real-time Covid-19 dataset, and the experimental results showed that the framework with WOA was more accurate (88.94%) than the one without WOA (80.40%). Statistical analysis, namely the Friedman test, the Mann-Whitney U test and Kendall’s rank correlation coefficient test, has been performed and indicates that the proposed method has a significant impact on the sensitivity, accuracy and specificity on the novel dataset considered.

The CAD system created for diagnosing the Coronavirus disease can be improved to identify its variants. It can be further extended to grade the disease by severity level, such as low, medium and high. Additionally, other chronic lung diseases and their severity levels can also be diagnosed.

References

1. Chen, Wang, Feng, Zhou, Li, Zhong, Hao, Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modelling of its spike protein for risk of human transmission, Science China Life Sciences 63 (2020), 457–460.

2. Huang, Wang, Li, Ren, Zhao, Hu, Zhang, Fan, Xu, Gu, Cheng, Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China, The Lancet 395 (2020), 497–506.

3. Guan, Wu, Wang, Zhou, Tong, Ren, Leung K.S., Lau E.H., Wong J.Y., Xing, Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia, New England Journal of Medicine 382 (2020).

4. World Health Organisation Coronavirus (COVID-19) Dashboard, https://covid19.who.int/.

5. Shi, Wang, Shi, Wu, Wang, Tang, He, Shi, Shen, Review of artificial intelligence techniques in imaging data acquisition, segmentation, and diagnosis for COVID-19, IEEE Reviews in Biomedical Engineering 14 (2020), 4–15.

6. Zhang, Wang, Huang, Song, Chest CT manifestations of new coronavirus disease 2019 (COVID-19): a pictorial review, European Radiology 8 (2020), 4381–4389.

7. Liu, Wang, Liu, Sun, Peng, Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning, Biomedical Engineering Online 1 (2020), 1–4.

8. Liu, Zhao, Yu, Heidari A.A., Li, Ouyang, Chen, Mafarja, Turabieh, Pan, Ant colony optimization with Cauchy and greedy Levy mutations for multilevel COVID 19 X-ray image segmentation, Computers in Biology and Medicine 1(136) (2021), 104609.

9. Rubin G.D., Ryerson C.J., Haramati L.B., Sverzellati, Kanne J.P., Raoof, Schluger N.W., Volpi, Yim J.J., Martin I.B., Anderson D.J., The role of chest imaging in patient management during the COVID-19 pandemic: a multinational consensus statement from the Fleischner Society, Radiology 296 (2020), 172–180.

10. Shi, Han, Jiang, Cao, Alwalid, Gu, Fan, Zheng, Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study, The Lancet Infectious Diseases 20 (2020), 425–434.

11. Simon B.A., Christensen G.E., Low D.A., Reinhardt J.M., Computed tomography studies of lung mechanics, Proceedings of the American Thoracic Society 2 (2005), 517–521.

12. Zhou S.K., Greenspan, Davatzikos, Duncan J.S., Van Ginneken, Madabhushi, Prince J.L., Rueckert, Summers R.M., A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises, Proceedings of the IEEE 5 (2021), 820–838.

13. Yang, Hou, Zhan, Chen, Lv, Tao, Sun, Xia, Correlation of chest CT and RT-PCR testing in coronavirus disease (COVID-19) in China: a report of cases, Radiology 1 (2020).

14. Huang, Han, Ai, Yu, Kang, Tao, Xia, Serial quantitative chest CT assessment of COVID-19: a deep learning approach, Radiology: Cardiothoracic Imaging 2 (2020).

15. Isaac, Khanna Nehemiah, Kannan, Computer-Aided Diagnosis System for Diagnosis of Cavitary and Miliary Tuberculosis Using Improved Artificial Bee Colony Optimization, IETE Journal of Research 8 (2021), 1–20.

16. Isaac, Nehemiah H.K., Dunston S.D., Christo V.E., Kannan, Feature selection using competitive coevolution of bio-inspired algorithms for the diagnosis of pulmonary emphysema, Biomedical Signal Processing and Control 1(72) (2022), 103340.

17. Sreejith, Nehemiah H.K., Kannan, A classification framework using a diverse intensified strawberry optimized neural network (DISON) for clinical decision-making, Cognitive Systems Research 64 (2020), 98–116.

18. Isaac, Nehemiah H.K., Dunston S.D., Kannan, Feature selection and classification using bio-inspired algorithms for the diagnosis of pulmonary emphysema subtypes, International Journal of Imaging Systems and Technology (2023).

19. Chandrashekar, Sahin, A survey on feature selection methods, Computers & Electrical Engineering 40 (2014), 16–28.

20. Saeys, Inza, Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23 (2007), 2507–2517.

21. Diao, Shen, Nature inspired feature selection meta-heuristics, Artificial Intelligence Review 44 (2015), 311–340.

22. Yusta S.C., Different metaheuristic strategies to solve the feature selection problem, Pattern Recognition Letters 30 (2009), 525–534.

23. Isaac, Nehemiah H.K., Isaac, Kannan, Computer-Aided Diagnosis system for diagnosis of pulmonary emphysema using bio-inspired algorithms, Computers in Biology and Medicine 124 (2020), 103940.

24. Sweetlin J.D., Nehemiah H.K., Kannan, Computer aided diagnosis of pulmonary hamartoma from CT scan images using ant colony optimization based feature selection, Alexandria Engineering Journal 57 (2018), 1557–1567.

25. Raj C.S.R., Elizabeth D.S., Nehemiah H.K., Distance based genetic algorithm for feature selection in computer aided diagnosis systems, Current Medical Imaging Reviews 13 (2017), 284–298.

26. Heidari A.A., Faris, Mirjalili, Ant Lion Optimizer: Theory, Literature Review, and Application in Multi-Layer Perceptron Neural Networks, Studies in Computational Intelligence 811 (2020), 23–46.

27. Yamany, Fawzy, Tharwat, Moth-Flame Optimization for Training Multi-layer Perceptrons, In Proceedings of the 11th International Computer Engineering Conference (ICENCO) (2015), 267–272.

28. Moura, Dartora, Mattjie, Barros, Silva A.M., Texture-based feature extraction for Covid-19 pneumonia classification using chest radiography, EAI Endorsed Transactions on Bioengineering and Bioinformatics (2021), 1–9.

29. Godbin A.B., Jasmine S.G., Screening of COVID-19 based on GLCM features from CT images using machine learning classifiers, SN Computer Science 4 (2022), 133.

30. Duchesne, Gourdeau, Archambault, Chartrand-Lefebvre, Dieumegarde, Forghani, Gagne, Hains, Hornstein, Le, Lemieux, Tracking and predicting Covid-19 radiological trajectory using deep learning on chest x-rays, Initial accuracy testing (2020).

31. Mafarja M.M., Mirjalili, Hybrid whale optimization algorithm with simulated annealing for feature selection, Neurocomputing 260 (2017), 302–312.

32. Wasule, Sonar, Classification of brain MRI using SVM and KNN classifier, In 2017 Third International Conference on Sensing, Signal Processing and Security (ICSSS) (2017), 218–223.

33. Mafarja, Mirjalili, Whale optimization approaches for wrapper feature selection, Applied Soft Computing 62 (2018), 441–453.

34. Hoda Zamani, Shahraki, Hossein, Feature Selection Based on Whale Optimization Algorithm for Diseases Diagnosis, International Journal of Computer Science and Information Security 14 (2016), 1243–1247.

35. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics 1 (1979), 62–66.

36. Gonzalez R.C., Digital image processing, Pearson Education India, 2009.

37. Betshrine R.R., Nehemiah K.H., Marishanjunath C.S., Manoharan R.M., Diagnosis of Pulmonary Edema and Covid-19 from CT slices using squirrel search algorithm, support vector machine and back propagation neural network, Journal of Intelligent and Fuzzy Systems (2023), 1–4.

38. Sweetlin J.D., Nehemiah H.K., Kannan, Feature selection using ant colony optimization with tandem-run recruitment to diagnose bronchitis from CT scan images, Computer Methods and Programs in Biomedicine 145 (2017), 115–125.

39. Goldbogen J.A., Friedlaender A.S., Calambokidis, McKenna M.F., Simon, Nowacek D.P., Integrative approaches to the study of baleen whale diving behavior, feeding performance, and foraging ecology, BioScience 63 (2013), 90–100.

40. Hof P.R., Van der Gucht, Structure of the cerebral cortex of the humpback whale, Megaptera novaeangliae (Cetacea, Mysticeti, Balaenopteridae), The Anatomical Record: Advances in Integrative Anatomy and Evolutionary Biology 290 (2007), 1–31.

41. Mirjalili, Lewis, The whale optimization algorithm, Advances in Engineering Software 95 (2016), 51–67.

42. Sharawi, Zawbaa H.M., Emary, Feature selection approach based on whale optimization algorithm, In 2017 Ninth International Conference on Advanced Computational Intelligence (ICACI), IEEE (2017), 163–168.

43. Zamani, Nadimi-Shahraki M.H., Feature selection based on whale optimization algorithm for diseases diagnosis, International Journal of Computer Science and Information Security 14 (2016), 1243.

44. Ruck D.W., Rogers S.K., Kabrisky, Feature selection using a multilayer perceptron, Journal of Neural Network Computing 2 (1990), 40–48.

45. Jyothula L.H., Eppa, Indhukuri A.V., Mehdi M.J., Lung Cancer Detection in CT Scans Employing Image Processing Techniques and Classification by Decision Tree (DT) and K-Nearest Neighbor (KNN), In 2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN) (2023), 1–5.

46. Basalamah, Hasan, Bhowmik, Shahriyar S.A., A Highly Accurate Dysphonia Detection System Using Linear Discriminant Analysis, Computer Systems Science and Engineering 44 (2023).

47. Awad F.H., Hamad M.M., Alzubaidi, Robust classification and detection of big medical data using advanced parallel K-means clustering, YOLOv4, and logistic regression, Life 13 (2023), 691.

48. Zaw H.T., Maneerat, Win K.Y., Brain tumor detection based on Naïve Bayes Classification, In 2019 5th International Conference on Engineering, Applied Sciences and Technology (ICEAST) 2 (2019), 1–4.

49. Bakasa, Viriri, Vgg16 feature extractor with extreme gradient boost classifier for pancreas cancer prediction, Journal of Imaging 9 (2023), 138.

50. Minz, Mahobiya, MR image classification using adaboost for brain tumor type, In 2017 IEEE 7th International Advance Computing Conference (IACC) (2017), 701–705.

51. Ponomarenko, Ieremeiev, Lukin, Egiazarian, Jin, Astola, Vozel, Chehdi, Carli, Battisti, Kuo C.C., Color image database TID2013: Peculiarities and preliminary results, In European Workshop on Visual Information Processing (EUVIP) (2013), 106–111.
