Abstract
Deep learning-based models are employed in the development of computer-aided diagnosis (CAD) tools for pediatric pneumonia (P-Pneumonia) detection. The accuracy of such a model depends on how the deep learning model is scaled. A survey of the deep learning literature shows that models with a greater number of layers achieve better performance for P-Pneumonia detection. However, identifying the optimal model remains important work for P-Pneumonia detection. This work presents a hybrid deep learning model for P-Pneumonia detection. The model leverages EfficientNetV2, which employs several advanced methodologies to balance model scaling against detection performance. The features of the EfficientNetV2 models are passed into global weighted average pooling (GWAP), which acts like an attention layer: it extracts the important features that point to the infected regions of the radiography image and discards unimportant information. The features from GWAP are high-dimensional and are reduced using kernel-based principal component analysis (K-PCA). The reduced features are then combined and passed into a stacked classifier. The stacked classifier follows a two-stage approach: the first stage employs a support vector machine (SVM) and a random forest tree (RFT) to predict P-Pneumonia from the fused features, and the second stage applies logistic regression (LRegr) to the first-stage predictions for the final classification. Detailed experiments were conducted on publicly available benchmark datasets, with various experimental settings used to identify the best model. The proposed model outperformed the other methods, improving accuracy by 4% in P-Pneumonia detection. To show that the proposed model is robust, it was also evaluated on a completely unseen P-Pneumonia dataset, on which it showed good performance. The generalization of the proposed model was studied by evaluating it on similar lung diseases such as COVID-19 (CV-19) and tuberculosis (TBS). In all the experiments, the model showed good performance on these diseases, which indicates that it is robust and generalizable to data samples from different patients with similar lung diseases. The P-Pneumonia model can be used in healthcare and clinical environments to assist doctors and healthcare professionals in improving the detection rate of P-Pneumonia.
Introduction
Pneumonia is an infectious disease caused by pathogens such as bacteria or viruses. It is an acute respiratory infection that mainly affects the air sacs of the lungs, known as the alveoli. According to reports published by the World Health Organization (WHO), the number of pneumonia cases has been gradually increasing globally over the years [1]. Recent reports show that pneumonia causes more than 1.4 million deaths among children under the age of five years and accounts for 18% of deaths in children globally when compared with other diseases. It is predominant in South Asia and sub-Saharan Africa. Fever, breathing difficulties, cough, and cold are some of the common symptoms of P-Pneumonia [2]. Diagnosing P-Pneumonia caused by viral pathogens is more challenging than diagnosing bacterial P-Pneumonia. Bacterial P-Pneumonia can be treated with antibiotics. Viral pneumonia can be diagnosed using radiography imaging, and radiography imaging techniques have recently proven effective in the diagnosis of P-Pneumonia [3]. Two imaging techniques are most commonly used for this purpose: chest X-rays (C-Xray) and chest computed tomography scans (CCTS). CCTS provides a 3D view of human organs with better imaging quality, while C-Xray provides a 2D image. The 3D view allows a better and more detailed investigation of the infected regions of the lungs. The C-Xray device, however, is smaller and less complex than a CCTS device. C-Xray is the most commonly used radiography imaging approach in healthcare and medical environments, mainly because it costs less than CCTS, requires less imaging time, and exposes the patient to less radiation.
Radiologists examine the C-Xray samples of patients to diagnose P-Pneumonia [4]. Early diagnosis of P-Pneumonia is essential, because a delayed diagnosis can lead to death. The analysis of P-Pneumonia C-Xrays by expert radiologists is generally accurate. Nevertheless, there is a considerable chance that even an expert radiologist misclassifies a P-Pneumonia C-Xray as Healthy. Overall, the manual diagnosis of C-Xray is not cost-effective and, in some cases, is slow [5]. A report by the WHO shows that there is a global shortage of radiologists [5]. Most importantly, the ratio of doctors to patients in developing countries is not the same as in developed countries. Recently, CAD tools have been introduced to automate the analysis of radiology imaging. These tools leverage advanced machine learning algorithms that learn effective features to accurately detect and classify P-Pneumonia using C-Xray and CCTS.
The application of machine learning and deep learning to medical and healthcare problems is considered one of the significant directions in research [5]. Researchers have applied various machine learning models to problems in healthcare and medical environments. These models rely on feature engineering, and selecting optimal features requires extensive domain-level knowledge of healthcare and medicine. For example, in lung disease detection using chest radiographs, researchers employed various texture feature engineering approaches and trained classical machine learning models on the resulting features [5]. Recent literature shows that deep learning models perform better than classical machine learning models in lung disease detection [5], mainly because a deep learning model automates feature engineering. Researchers reported that convolutional neural network (CNN)-based pretrained models perform better in P-Pneumonia detection than plain CNN models trained from scratch. A literature survey on P-Pneumonia detection shows that researchers have leveraged various CNN-based pretrained models with the aim of achieving a better detection rate. The literature on CNNs and deep learning shows that researchers have conducted detailed studies and experiments on the ImageNet benchmark dataset to identify better architectures. In these experiments, researchers considered the stability of model accuracy, the speed of the model during training and testing, the number of floating-point operations (FLOPs), and similar criteria. For example, recent studies on deep learning model performance show that models such as DenseNet and EfficientNet perform better on classification tasks than other CNN-based models while reducing parameter size and FLOPs by an order of magnitude.
EfficientNetV2 is an enhanced version of EfficientNet and has the capability to achieve better performance in P-Pneumonia detection. The current work investigates the performance of the EfficientNetV2 model for P-Pneumonia detection on publicly available benchmark datasets of P-Pneumonia. The major contributions of the proposed work are given below.
(1) A hybrid of the EfficientNetV2 model is presented for P-Pneumonia detection. The model takes advantage of features from different scalings of the EfficientNetV2 model to achieve a better detection rate of P-Pneumonia.
(2) GWAP is employed in the CNN-based models to extract features from the P-Pneumonia-affected regions of the C-Xray.
(3) Dimensionality reduction is applied to the EfficientNetV2 model features to extract the important features needed to accurately detect P-Pneumonia.
(4) Feature fusion of the EfficientNetV2 models is used to enhance the detection rate of P-Pneumonia.
(5) To enhance the classification of the P-Pneumonia model, stacked classifiers are employed on the fused features of the EfficientNetV2 models.
(6) A detailed investigation and analysis of the EfficientNetV2 model are shown for P-Pneumonia detection.
(7) To show that the proposed model is robust and generalizable, the performance of the trained model is shown on a completely unseen dataset of P-Pneumonia.
(8) The proposed EfficientNetV2-based model is compared with other methods.
The remaining part of the paper is structured as follows. A detailed literature survey on P-Pneumonia classification using machine learning and deep learning is included in Section 2. The proposed method is presented in Section 3. Section 4 describes the P-Pneumonia dataset, and Section 5 covers the statistical metrics used to evaluate the performance of the proposed model. Section 6 presents the results and discussion of the proposed method. The paper concludes in Section 7 with future works.
The literature shows that models using deep learning methods perform well in various applications, including disease detection and classification in the field of medicine and healthcare. In particular, recent surveys show that researchers have successfully employed CNN-based pretrained models for lung disease detection and classification [5]. Although there are many works on pneumonia detection in lung disease, work on P-Pneumonia is still at an initial stage, as only a small number of published works employ deep learning models for P-Pneumonia detection and classification using chest radiographs. This work proposes a hybrid CNN-based deep learning model for P-Pneumonia detection. The performance of the proposed method is shown on more than one publicly available dataset in order to demonstrate that the model is robust and generalizable for P-Pneumonia detection and classification in medical and healthcare environments. A recent literature survey shows that deep learning and AI-based models can be effectively utilized in P-Pneumonia detection and classification to enhance the accuracy of medical experts. The development of CNN-based pretrained models has improved in recent years. When a new model is developed, the most important criteria considered are accuracy and the speed of the model during the training, validation, and testing stages. The current work leverages an efficient CNN-based pretrained model that can achieve better performance for P-Pneumonia detection with a smaller number of model parameters.
Texture analysis of ultrasound imaging has been used for the detection of pneumonia in pediatric patients [6]. The authors presented various statistical analyses of the texture features to effectively detect P-Pneumonia. Various classical machine learning algorithms have been compared for P-Pneumonia detection using texture features [7]; kernel methods such as SVM showed better performance than the other methods. An explainable deep learning approach was proposed for P-Pneumonia detection, and the authors reported an accuracy in the range of 90%-95% on the publicly available dataset [8]. This approach finetuned the ImageNet-based VGG-16 model on chest radiographs of children aged 1-5. Transfer learning of a CNN-based model was proposed for P-Pneumonia detection using optical coherence tomography (OCT) images [9]. The authors reported that the proposed model achieves an accuracy of more than 95%. The study materials, such as the dataset and the code for the implementation of the deep learning model, were made publicly available by the authors for further research. A deep CNN-based approach was found to be better than classical machine learning-based models for P-Pneumonia detection [10], mainly because the classical machine learning model relies on manual feature engineering, and identifying the right and optimal features for the accurate classification of pneumonia is not an easy task. The authors studied in detail the performance of a deep CNN against manual feature engineering with a classical machine learning model. The study reported that the deep CNN model showed 80% accuracy, whereas the classical classification model with manual feature engineering showed 69% accuracy. This indicates that the deep learning model can perform better than the classical model with manual feature engineering for P-Pneumonia classification.
A deep CNN-based approach was proposed for P-Pneumonia detection [11]. In this work, the authors analyzed the proposed approach on a large annotated dataset. The performance of the approach was studied with different loss functions and with different numbers of data samples in the training and testing datasets. Most importantly, the class imbalance problem was examined in detail in the results and discussion. The authors reported an accuracy in the range of 60%-80%. A ResNet-based model was proposed for P-Pneumonia detection [12]. The model showed an accuracy of more than 95% in all the test experiments. However, it is possible that the train and test datasets are not completely disjoint, which may be one of the reasons the model performs so well even though the dataset is highly imbalanced. A fine-grained CNN-based approach was proposed for P-Pneumonia detection [13]. The authors report 100% accuracy on the publicly available benchmark dataset and state that the proposed approach performed better than existing methods such as Inception, ResNet, and VGG. This dataset is highly imbalanced, and a detailed experimental analysis with results is missing from the study. Hence, the work cannot be considered completely robust and generalizable to new patient data samples in a healthcare environment. ResNet and DenseNet-based models were employed with the YoloV3 model to detect pneumonia in pulmonary chest radiographs of children [14]. This study reported an accuracy in the range of 80%-90% both in detecting pneumonia and in classifying it into its subtypes. The authors presented various test cases with the aim of showing that the proposed model is robust in P-Pneumonia detection and classification. The method requires further enhancement to improve the detection rate of P-Pneumonia, which can be pursued by exploring different CNN-based pretrained models and modifications to these pretrained models.
A residual CNN-based approach was proposed to detect pneumonia using chest radiographs [15]. The approach showed 90% accuracy on the P-Pneumonia dataset, which is better than other models such as VGG16, DenseNet121, InceptionV3, and Xception. However, the model has a high misclassification rate, which indicates that the method is not robust on imbalanced P-Pneumonia datasets; this can be improved by handling the imbalance of the patient data samples during training. Another study reported that NASNetLarge and DenseNet121 perform better in various test cases for P-Pneumonia detection, with both models outperforming ResNet50, InceptionV3, VGG16, NASNetMobile, Xception, and InceptionResnetV2 [16]. A lightweight CNN-based model was proposed for P-Pneumonia detection [17]. The authors reported 94% test accuracy and claimed that the proposed CNN performs better than the residual CNN. Since the dataset is highly imbalanced, the authors employed a data augmentation approach to balance the number of samples in the classes of the pneumonia dataset. An accuracy of more than 99% was reported in detecting pneumonia and classifying it as either viral or bacterial using a densely connected residual deep learning model [18]. The authors compared the performance of the study with existing studies and reported that the proposed approach performed better. Most importantly, the model performed better in all the test experiments. An accuracy of 96.47% was reported for P-Pneumonia detection using a hybrid deep learning approach [19], in which deep learning is used for feature extraction and machine learning classifiers are used for classification. The model showed good performance on the benchmark P-Pneumonia dataset compared to other existing models.
To enhance the performance of the P-Pneumonia detection model, the authors of [20] employ a fusion of texture features and features of a CNN-based model such as VGG. The proposed model achieved 92.19% accuracy, which is higher than the accuracy obtained without fusing the texture and CNN-based features. With the aim of detecting and classifying the etiology of P-Pneumonia, a ResNet-50-based model was proposed and evaluated on a private dataset collected from healthcare and medical environments [21]. The authors report that the AI-based model reaches decisions similar to those of medical and healthcare experts and can be deployed in healthcare and medical environments as an early diagnosis tool for P-Pneumonia detection. A deep CNN-based approach was proposed for P-Pneumonia detection [22]. The authors reported that the method achieves 92.7% accuracy and outperforms existing methods including the InceptionV3 and ResNet50 models.
An ensemble of an explainable deep learning approach and CNN models was proposed to enhance the detection performance of the CNN models by incorporating the domain knowledge of medical experts [23]. The results indicate that the models supported by medical domain knowledge performed better than the purely deep learning-based CNN models. The authors present a detailed experimental analysis and results on the publicly available dataset. To improve the performance of the CNN-based pretrained model for P-Pneumonia detection, the authors of [24] employ a histogram-based approach for radiograph enhancement and then pass the features of the CNN-based pretrained models into a stacked classification model. The authors reported an accuracy of more than 90% on the publicly available dataset and studied the proposed model in detail with different network parameters and network structures. Since the dataset used in the study is highly imbalanced, it is possible that the model was biased during training and testing. Thus, the proposed model needs to be evaluated on imbalanced data with a detailed experimental analysis before it can be considered robust and generalizable to radiographs from new patients. An ensemble of DenseNet169, MobileNetV2, and Vision Transformer was used to detect pneumonia using chest radiographs [25]. The method showed 93.91% accuracy on the publicly available P-Pneumonia dataset. The study reported that an ensemble model can perform better than an individual model on P-Pneumonia detection. To support this statement, the authors conducted various experiments with different test cases, analyzed the results in detail, and provided an interpretation of the results.
The model showed a better detection rate than the existing models in P-Pneumonia detection. Models such as VGG-16, VGG-19, ResNet-50, Inception-V3, Xception, MobileNet, and SqueezeNet were studied in detail to enhance the detection rate of P-Pneumonia [26]. The best-performing models, namely ResNet-50, MobileNet, and Xception, were ensembled, and the ensemble achieved better accuracy than any single model, with a reported accuracy of 90.71%. Since the dataset is imbalanced, the authors employed a data augmentation approach in the data preprocessing stage. To handle class imbalance in the P-Pneumonia dataset, a cost-sensitive deep learning approach was proposed that assigns larger class weights to the classes that contain fewer chest radiographs and smaller weights to the class that contains more chest radiographs [27]. The authors showed that the cost-sensitive deep learning approach avoids bias during the training of a model. Moreover, the model is robust in detecting new data samples whose distribution differs from that of the training dataset of chest radiographs. A multichannel deep learning-based approach was proposed for lung disease detection and classification [28]. The authors report that the multichannel deep learning model performed better than the single-channel deep learning model on lung diseases such as CV-19, P-Pneumonia, and TBS.
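The cost-sensitive weighting described for [27] can be illustrated with a minimal sketch. The inverse-frequency formula below (number of samples divided by number of classes times class count) is one common choice and is an assumption here, not necessarily the scheme used in [27]; the label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, frequent classes smaller ones.
    This formula is an assumed, commonly used heuristic.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical label list: 3 Healthy radiographs for every 9 P-Pneumonia ones.
labels = ["Healthy"] * 3 + ["P-Pneumonia"] * 9
weights = balanced_class_weights(labels)
print(weights)
```

Such a dictionary could then be supplied to a training loss so that errors on the minority class are penalized more heavily.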
A survey of the literature shows that the existing works are not effective in identifying the regions affected by P-Pneumonia. To address this, the current work applies attention to the feature maps of the CNN layers. To maintain the balance between the performance of the P-Pneumonia detection model and its scaling, the current work employs an improved version of EfficientNet, namely EfficientNetV2. EfficientNetV2 comprises more than one model based on different scalings, and the current work employs EfficientNetV2B0, EfficientNetV2B1, EfficientNetV2B2, and EfficientNetV2B3. Since there is more than one deep learning-based finetuned model for P-Pneumonia detection, the current work fuses their hidden layer features. The classification model is implemented using the fused features of the deep learning models.
Proposed model for P-Pneumonia detection
The proposed architecture for P-Pneumonia detection is shown in Fig. 1. Detailed information on the proposed architecture is discussed in this section.

Fig. 1. Proposed P-Pneumonia detection model.
In the input layer of the proposed model, the radiology images are resized to the dimensions required by the CNN-based pretrained models, and the pixel values are normalized to the range 0-1. Since the dataset is imbalanced, data augmentation approaches are employed during training to balance the data samples in the classes of the P-Pneumonia dataset. A survey of the literature on P-Pneumonia detection shows that CNN-based models have been employed for P-Pneumonia detection using C-Xray images. The CNN pretrained models are trained on the ImageNet database, a large dataset of natural images with 1,000 classes. Researchers finetune these pretrained models on the P-Pneumonia dataset to enhance the generalization ability of the model towards the detection and classification of P-Pneumonia. In the literature, the finetuned models performed better for P-Pneumonia detection than models trained from scratch, because the pretrained models already have a rich feature representation of images. Finetuning the existing weights on the P-Pneumonia dataset is easier and extracts the optimal features required to accurately detect P-Pneumonia. The current work employs CNN-based pretrained models such as VGG16, Xception, ResNet50, InceptionV3, DenseNet121, MobileNet, and the EfficientNet family, including EfficientNetB0, EfficientNetV2B0, EfficientNetV2B1, EfficientNetV2B2, and EfficientNetV2B3, for P-Pneumonia detection. These models contain an input layer, hidden layers, and an output layer. The hidden layers include more than one convolutional layer, pooling layer, and fully connected layer, together with batch normalization and dropout layers. The output layer is a fully connected layer with two neurons, one for each class (Healthy and P-Pneumonia).
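The input-layer preprocessing described above (resizing and scaling pixel values to the range 0-1) can be sketched as follows. This is a minimal illustration using NumPy only; the 224 x 224 target size, the nearest-neighbour resizing, and the replication of a grayscale radiograph to three channels are assumptions, not details stated in this work.

```python
import numpy as np


def preprocess(image, target_hw=(224, 224)):
    """Resize an 8-bit grayscale image and scale its pixels to [0, 1].

    Nearest-neighbour resizing is used only to keep the sketch
    dependency-free; the target size is an assumed value.
    """
    h, w = image.shape[:2]
    th, tw = target_hw
    rows = np.arange(th) * h // th   # source row index for each output row
    cols = np.arange(tw) * w // tw   # source column index for each output column
    resized = image[rows][:, cols]
    scaled = resized.astype(np.float32) / 255.0
    # Assumption: replicate to 3 channels to match ImageNet-style inputs.
    return np.repeat(scaled[..., None], 3, axis=-1)


# Hypothetical radiograph: random 8-bit pixels of size 300 x 400.
rng = np.random.default_rng(0)
xray = rng.integers(0, 256, size=(300, 400), dtype=np.uint8)
batch_item = preprocess(xray)
print(batch_item.shape, batch_item.dtype)
```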
The literature on P-Pneumonia detection and classification shows that researchers have studied the effectiveness of CNN models, mainly ImageNet-based pretrained models, for P-Pneumonia. When a model is developed, the main criteria considered are accuracy, speed, and FLOPs. Researchers have treated model scaling ability as the most important property in developing a CNN model that achieves a better P-Pneumonia detection rate. The important ImageNet-based pretrained models that perform model scaling efficiently are DenseNet and EfficientNet. Both models can balance accuracy against model size, and, most importantly, they perform better than the other pretrained models on various classification problems. In addition, the EfficientNet models can achieve better performance than DenseNet. ResNet-RS is a family of ResNet models that optimizes the hyperparameters to improve training efficiency. Transformer blocks in the vision transformer also help to improve training efficiency. Recently, an enhancement of the EfficientNet model, named EfficientNetV2, was introduced [29]. It addresses several bottlenecks of EfficientNet. The smaller batch size required for the images slows down the training process in EfficientNet. The EfficientNet model uses depthwise convolutions in the early layers; although these operations have fewer parameters than standard convolutions, they often cannot fully utilize modern accelerators. Compound scaling in the EfficientNet model also fails to fully contribute towards higher training speed and fewer parameters. To avoid these bottlenecks, the researchers introduced Fused-MBConv into the search space of the model, together with training-aware neural architecture search (NAS) and scaling. These techniques jointly optimize the model accuracy, the number of network parameters, and the training speed.
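The parameter trade-off between depthwise and standard convolutions mentioned above can be made concrete with a rough weight count. The sketch below assumes the usual block layouts (MBConv: 1x1 expansion, 3x3 depthwise, 1x1 projection; Fused-MBConv: a single regular 3x3 convolution in place of the expansion and depthwise steps, followed by a 1x1 projection) and ignores biases, batch normalization, and squeeze-and-excitation. The channel counts and expansion ratio are hypothetical.

```python
def mbconv_params(c_in, c_out, expand=4, k=3):
    """Convolution weights in an MBConv block (biases, BN, SE ignored)."""
    c_mid = c_in * expand
    expand_1x1 = c_in * c_mid        # pointwise expansion
    depthwise = k * k * c_mid        # one k x k filter per channel
    project_1x1 = c_mid * c_out      # pointwise projection
    return expand_1x1 + depthwise + project_1x1


def fused_mbconv_params(c_in, c_out, expand=4, k=3):
    """Convolution weights in a Fused-MBConv block (biases, BN, SE ignored)."""
    c_mid = c_in * expand
    fused_conv = k * k * c_in * c_mid  # regular k x k conv replaces expand + depthwise
    project_1x1 = c_mid * c_out
    return fused_conv + project_1x1


# Hypothetical early-stage block with 32 input and 32 output channels.
mb = mbconv_params(32, 32)
fused = fused_mbconv_params(32, 32)
print(mb, fused)
```

The fused block carries more weights, which is consistent with the text: depthwise convolutions save parameters, and the fused variant trades that saving for operations that run more efficiently on accelerators.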
Based on model scaling that considers accuracy, parameter efficiency, and other network parameters, the researchers introduced various models in EfficientNetV2: EfficientNetV2B0, EfficientNetV2B1, EfficientNetV2B2, EfficientNetV2B3, EfficientNetV2S, EfficientNetV2M, and EfficientNetV2L. EfficientNetV2 is an improved version of the EfficientNet architectures that can achieve better performance than EfficientNet on classification problems. In addition to its detection rate for P-Pneumonia, the EfficientNetV2 model requires fewer parameters and less inference time. Generally, model performance drops when the number of parameters is reduced. However, the models of the EfficientNet family achieve better performance while lowering the number of model parameters. The main components of EfficientNetV2 are as follows.
Neural architecture search (NAS): Random search and reinforcement learning techniques can be used in the NAS of the EfficientNetV2 model because the size of the search space is reduced by removing unnecessary operations related to pooling. These techniques help in choosing the optimal network structure and its parameters by considering the detection rate of P-Pneumonia. The search prefers smaller kernel sizes and removes the last stride-1 stage of EfficientNet.
Model scaling: The compound scaling approach of the EfficientNet model is improved to avoid memory issues when the image dimensions are large.
Training: To improve performance, including training efficiency, the EfficientNetV2 model provides guidelines for training a model and introduces several new regularization approaches.
Progressive learning: The size of the C-Xray images is increased steadily during the training of a P-Pneumonia model. The progressive learning strategy achieves better accuracy in less training time. Depending on the image size, the model adaptively changes the regularization, for example the dropout ratio or the magnitude of data augmentation, to avoid overfitting.
Convolutions and building blocks of EfficientNetV2: The EfficientNetV2 model proposes an enhancement of MBConv, called Fused-MBConv. Fused-MBConv operations are included in the search space used for model selection, and they can also replace depthwise convolutions. NAS dynamically searches for the best combination of the MBConv and Fused-MBConv convolutional operations.
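The progressive learning strategy described above can be sketched as a simple schedule in which the image size and the regularization strength grow together across training stages. Linear interpolation between a start and an end value is assumed here; the number of stages, the image sizes, and the dropout rates are hypothetical and are not the settings used in this work.

```python
def progressive_schedule(num_stages, size_start, size_end, drop_start, drop_end):
    """Return (image_size, dropout_rate) per stage, linearly interpolated.

    Small images with weak regularization come first; later stages use
    larger images with stronger regularization.
    """
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        size = int(round(size_start + (size_end - size_start) * t))
        dropout = drop_start + (drop_end - drop_start) * t
        schedule.append((size, dropout))
    return schedule


# Hypothetical settings: 4 stages, 128 -> 224 pixels, dropout 0.1 -> 0.3.
schedule = progressive_schedule(4, 128, 224, 0.1, 0.3)
for stage, (size, dropout) in enumerate(schedule):
    print(f"stage {stage}: image size {size}, dropout {dropout:.3f}")
```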
The models of EfficientNetV2 for P-Pneumonia detection are tested by following a non-uniform scaling property. Like any CNN model, EfficientNetV2 is composed of an input layer, an output layer, and more than one hidden layer. The hidden layers apply convolution in the convolution layers and pooling for feature reduction, followed by more than one fully connected layer. Using fully connected layers after feature extraction by the convolution and pooling layers might result in overfitting, or may hinder the model's ability to generalize to new radiography images. Since the dimension of the features extracted at the convolution layers of the EfficientNetV2 models is high, more than one fully connected layer is needed. To avoid overfitting and to speed up model training, dropout and normalization are placed between the fully connected layers. However, this type of model representation results in the loss of features that are important for detecting P-Pneumonia. The current work therefore employs a GWAP approach instead of global average pooling. GWAP emphasizes the important features by assigning them a weighted score. Because the dimensionality of the features extracted by GWAP is high, the current work employs kernel-based PCA. In addition, the data samples of the classes in the P-Pneumonia dataset show high inter-class and intra-class similarity. The kernel-based PCA maps the features into a high-dimensional space to separate the data samples of P-Pneumonia. The RBF kernel was chosen arbitrarily for PCA in this work. Other kernels may give better performance, and detailed experiments on different kernel functions can be studied for P-Pneumonia detection. The reduced feature representations of the EfficientNetV2 models are then combined. The literature shows that feature fusion has been studied in detail for medical image classification problems and that researchers have developed many standard methods. However, the current work employs a simple approach and concatenates the features of all the models. A detailed analysis of other advanced feature fusion approaches within the proposed model can be considered as future work. Finally, the fused features are passed into the stacked classifier.
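The pipeline described above (GWAP, kernel-based PCA with an RBF kernel, and fusion by concatenation) can be sketched end to end with NumPy. The exact GWAP formulation is not given in the text, so the version below, a sigmoid-scored 1x1 projection used as spatial weights, is an assumption. The attention weights are random rather than learned, and the feature map sizes, channel counts, number of components, and `gamma` are all hypothetical.

```python
import numpy as np


def gwap(feature_map, w, b=0.0):
    """Global weighted average pooling over an (H, W, C) feature map.

    A 1x1 projection (w, b) scores each spatial location; the sigmoid of the
    score is used as the weight of that location in the average.
    """
    scores = feature_map @ w + b                  # (H, W)
    attn = 1.0 / (1.0 + np.exp(-scores))          # weights in (0, 1)
    weighted = (feature_map * attn[..., None]).sum(axis=(0, 1))
    return weighted / attn.sum()                  # (C,)


def rbf_kernel_pca(X, n_components, gamma):
    """Project the rows of X onto the top components of RBF kernel PCA."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one    # centre in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]   # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))


rng = np.random.default_rng(0)
n_images = 20
reduced = []
# Two hypothetical backbones with different channel counts.
for channels in (16, 24):
    w = rng.normal(size=channels)                 # stand-in for learned weights
    maps = rng.normal(size=(n_images, 7, 7, channels))
    pooled = np.stack([gwap(m, w) for m in maps])
    reduced.append(rbf_kernel_pca(pooled, n_components=5, gamma=1.0 / channels))

fused = np.concatenate(reduced, axis=1)           # simple concatenation fusion
print(fused.shape)
```

With all-zero projection weights every location receives the same weight, so this GWAP variant reduces to plain global average pooling, which is a convenient sanity check.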
The stacked classifier follows a two-stage approach: the first stage includes SVM and RFT for the prediction of P-Pneumonia, and the second stage includes LRegr for classification using the first stage's predictions. This enhances the learning methodology and achieves a better P-Pneumonia detection rate. Other classical machine learning classifiers could be employed in either stage in place of SVM, RFT, and LRegr; however, this work selects the well-known kernel-based SVM together with RFT in the first stage, and LRegr in the second stage. SVM is a type of kernel method in machine learning. Researchers have used SVM for classification and regression problems in signal and image processing, natural language processing, and computer science, and it is one of the most commonly used algorithms for classification. SVM maps the P-Pneumonia dataset features into an N-dimensional space, where N is the number of features, and identifies a hyperplane that separates the data samples of the P-Pneumonia classes. The data points closest to the hyperplane are called support vectors. The hyperplane is chosen to maximize the margin, that is, the distance between the hyperplane and the nearest data points of each class, and the support vectors determine this margin. The dimension of the hyperplane depends on the number of features of the dataset. The performance of SVM relies on the kernel, and since the features of P-Pneumonia are highly non-linearly separable, the current work employs the RBF kernel. RFT is an ensemble of decision trees. Each decision tree outputs a prediction, and the RFT prediction is made by voting over these predictions. RFT maintains a low correlation among the decision trees of the ensemble, which helps it obtain higher accuracy than a single decision tree classifier. Logistic regression fits an S-shaped logistic function, i.e. the sigmoid, to the independent variables given as input, and its output is thresholded to predict either 0 or 1.
SVM, RFT, and LRegr are among the most commonly used classical machine learning algorithms for classification problems. These three algorithms have parameters, and their performance depends strongly on choosing these parameters well. To achieve a better detection rate of P-Pneumonia, the current work runs experiments over the parameters of SVM, RFT, and LRegr. The optimal parameters of SVM are 0.01, 3500, 2, and RBF for tolerance, max_iter, C, and kernel respectively. The random state is set to 50 for both SVM and RFT. The optimal parameters of RFT are 200 and 200 for n_estimators and max_depth respectively. In LRegr, the number of iterations is set to 150, and C and tolerance are set to 1.5 and 0.001 respectively.
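The two-stage classifier with the reported parameters can be sketched with scikit-learn's `StackingClassifier`. This is an illustration under assumptions rather than the authors' implementation: the source does not say how the first-stage predictions are produced for the second stage (here `StackingClassifier` uses its default internal cross-validation), and the synthetic data from `make_classification` merely stands in for the fused features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def build_stacked_classifier():
    """First stage: SVM and random forest; second stage: logistic regression."""
    first_stage = [
        # Parameter values as reported in the text: tol, max_iter, C, kernel, random state.
        ("svm", SVC(kernel="rbf", C=2, tol=0.01, max_iter=3500, random_state=50)),
        ("rft", RandomForestClassifier(n_estimators=200, max_depth=200, random_state=50)),
    ]
    second_stage = LogisticRegression(C=1.5, tol=0.001, max_iter=150)
    return StackingClassifier(estimators=first_stage, final_estimator=second_stage)


# Synthetic stand-in for the fused, reduced features (hypothetical data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = build_stacked_classifier().fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
```

Because the SVM is created without probability estimates, the stacking step in this sketch feeds its decision-function scores, together with the random forest's class probabilities, to the logistic regression.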
This work considers the dataset developed by Kermany et al. for evaluating the performance of the proposed model in P-Pneumonia detection [9]. The dataset was developed at Guangzhou Women and Children’s Medical Center, Guangzhou. It was collected from a retrospective cohort of children aged one to five years. The C-X-ray imaging was done as part of the patients’ regular clinical care visits. The dataset contains radiography samples of Healthy and P-Pneumonia patients in C-X-ray format. During the development of the database, the authors used various scanning devices operated by different users. Thus, the database contains radiography images of varying quality. The detailed statistics of the dataset are shown in Table 1. Since the data distribution of the Healthy and P-Pneumonia classes is not balanced, the proposed work employs data augmentation methodologies to avoid bias during the training phase. Figure 2 shows C-X-rays of patients from the P-Pneumonia dataset. The samples are randomly chosen from the dataset. Initially, the dataset was analyzed to understand its characteristics. Figure 2 clearly shows that the similarity between the C-X-rays of Normal and P-Pneumonia patients is high, both within and between classes. Manual analysis and classification by an expert radiologist is a daunting task, and the misclassification rate can be high. To support and automate the classification process, the current work employs a deep learning model. The classification results of the proposed model can be further analyzed by the radiologist. This type of approach improves the detection accuracy of P-Pneumonia and speeds up its detection and classification.
The proposed model employs an improved EfficientNet-based deep learning model for feature extraction and the features are further passed into the stacked machine learning classifier to accurately differentiate between the Normal and P-Pneumonia patient data samples.
C-X-rays dataset of P-Pneumonia

Radiography C-Xray samples of Healthy and P-Pneumonia in Guangzhou Children dataset.
To show that the proposed method is robust and generalizable to newer datasets in healthcare and medical environments, the current work considers another dataset, VinDr [30]. The dataset can be downloaded from the PhysioNet website. The annotation was done by an expert radiologist; along with the label for each radiology image, the database contains bounding box information for the infected regions. This dataset was collected at the Phu Tho Obstetric & Pediatric Hospital (PTOPH) in 2020. The current work randomly chooses 1,000 data samples of P-Pneumonia from VinDr.
The performance of the proposed model is also evaluated on other lung diseases such as CV-19 and TBS. This detailed experimental analysis is done to show that the proposed method performs well on more than one dataset. Such experiments and results indicate that the proposed tool can be used for early disease diagnosis at healthcare centers, where it can assist healthcare experts and enhance the detection rate of lung diseases. For each lung disease in the current work, the performance of the proposed method is shown and compared with the existing studies.
The CV-19 database of the current work is taken from Mendeley [31]. It contains C-X-ray samples for CV-19 and Non-CV-19 patients. The training dataset for CV-19 detection contains 2,000 C-X-ray samples for Healthy patients and 2,000 C-X-ray samples for CV-19 patients. The testing dataset contains 977 C-X-ray samples for Healthy patients and 1,315 C-X-ray samples for CV-19 patients. During training for CV-19 detection, a validation dataset was used to set the hyperparameters. This dataset contains 1,000 C-X-ray samples for Healthy patients and 1,000 C-X-ray samples for CV-19 patients.
Several datasets for TBS exist in the literature, mainly containing C-X-ray samples. Most of the existing datasets contain a small number of data samples with classes such as Healthy and TBS. The current work considers the dataset from Liu et al. [32]. This database contains C-X-ray samples of patients categorized into Normal, Sick but not TBS, and TBS. The annotations were done by expert radiologists, and the database and its annotations were validated by a team of expert radiologists. The training dataset for TBS detection contains 2,600 C-X-ray samples for Healthy patients, 2,700 C-X-ray samples for Sick but not TBS patients, and 580 C-X-ray samples for TBS patients. The testing dataset contains 1,133 C-X-ray samples for Healthy patients, 1,000 C-X-ray samples for Sick but not TBS patients, and 219 C-X-ray samples for TBS patients.
Figure 3 shows C-X-rays of patients from the CV-19 dataset. The C-X-rays of the TBS database for each of the classes are shown in Fig. 4. The samples shown in the figures are randomly chosen from the datasets. These figures show that the similarity among the classes in both datasets is high, both within and between classes. The proposed tool can make decisions at the early diagnosis stage, and the radiologist can then investigate the decision manually. This type of system increases the accuracy of disease detection and reduces false negatives and false positives.

Radiography C-Xray samples of Healthy (left to right: first two images) and CV-19 (left to right: last two images).

Radiography C-Xray samples of Healthy (first row), Sick but not TB (second row), and TB (third row).
Accuracy, Precision, Recall, and F1-score each take a maximum value of 1, which indicates a good P-Pneumonia detection and classification model. These metrics are estimated using the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts. The TP, FP, TN, FN, Accuracy, Precision, Recall, and F1-score are defined as follows.
The Accuracy metric is defined as the ratio of correct predictions to all predictions made by the proposed P-Pneumonia detection and classification model. Since Accuracy gives equal importance to every prediction and can therefore be dominated by the larger class of the imbalanced P-Pneumonia dataset, the current study also considers Precision, Recall, and F1-score as statistical metrics to evaluate the performance of the proposed model.
The Precision metric is the ratio of correct positive predictions to all the positive predictions output by the proposed P-Pneumonia detection and classification model. The precision of the proposed model is high when the number of false positives is low.
Recall is the true positive rate, or sensitivity. It is the ratio of correct positive predictions to all the data samples that are actually positive. The recall of the proposed model for P-Pneumonia detection and classification depends on the false negatives: the model shows low recall if it outputs many false negatives.
F1-score combines recall and precision as their harmonic mean. It can be used to measure the trade-off between precision and recall. It is zero if either precision or recall is zero.
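For reference, the standard definitions of the four metrics in terms of TP, TN, FP, and FN are given below. They are the textbook forms consistent with the descriptions above and are supplied here as an assumption about what the original equations contained.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-score}  &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```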
The proposed P-Pneumonia detection and classification model can be considered good if the values of precision, recall, and F1-score are close to 1. The TP, FP, TN, and FN are defined as follows. TP: The P-Pneumonia model identifies P-Pneumonia patients as P-Pneumonia. TN: The P-Pneumonia model identifies Normal patients as Normal. FN: The P-Pneumonia model identifies P-Pneumonia patients as Normal. FP: The P-Pneumonia model identifies Normal patients as P-Pneumonia.
The values of TP, TN, FP, and FN were obtained from the confusion matrix, which is a matrix representation of the true labels versus the predictions made by the proposed P-Pneumonia model. The dimension of the matrix depends on the number of classes of the dataset. The current work reports the performance of the model for the individual classes of P-Pneumonia and as an average over all the classes. The precision, recall, and F1-score metrics are reported as both weighted and macro averages. Since the dataset of the current study is imbalanced, the macro average is preferred over the weighted average, because the weighted average weights each class by its number of samples, whereas the macro average treats all classes equally.
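The difference between macro and weighted averaging can be made concrete with a small pure-Python sketch. The confusion matrix counts below are hypothetical and chosen only to show an imbalanced two-class case; they are not results from this study, and the helper names are illustrative.

```python
def per_class_metrics(confusion, labels):
    """Compute precision, recall, F1, and support for each class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(labels)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n) if i != k)
        fn = sum(confusion[k][j] for j in range(n) if j != k)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true samples of this class
        }
    return metrics


def average(metrics, key, weighted=False):
    """Macro average (equal class weights) or support-weighted average."""
    if weighted:
        total = sum(m["support"] for m in metrics.values())
        return sum(m[key] * m["support"] for m in metrics.values()) / total
    return sum(m[key] for m in metrics.values()) / len(metrics)


labels = ["Healthy", "P-Pneumonia"]
confusion = [[90, 10],    # hypothetical: 100 Healthy samples
             [20, 380]]   # hypothetical: 400 P-Pneumonia samples
metrics = per_class_metrics(confusion, labels)
macro_recall = average(metrics, "recall")
weighted_recall = average(metrics, "recall", weighted=True)
```

With these hypothetical counts the macro recall (0.925) is lower than the weighted recall (0.94), because the macro average gives the smaller Healthy class the same influence as the larger P-Pneumonia class.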
The current work uses online GPU-based platforms to implement and run machine learning and deep learning models. The GPU-based platform is accessed on the Kaggle data science website. The platform provides GPU P100 with 16 GB GPU memory, 13 GB CPU RAM, and a 73.1 GB hard disk. The current work implements the machine learning models using scikit-learn and deep learning models using Keras and TensorFlow.

Training accuracy and Training loss of the CNN-based models for P-Pneumonia detection (left to right).
This work employs pretrained CNN-based deep learning models for P-Pneumonia detection and classification. The ImageNet-based pretrained models are Xception, VGG16, ResNet50, InceptionV3, DenseNet121, EfficientNetB0, EfficientNetV2B0, EfficientNetV2B1, EfficientNetV2B2, and EfficientNetV2B3. The performance of a pretrained model in P-Pneumonia detection and classification depends on the parameters and structure of the network. To achieve a better P-Pneumonia detection rate, the current work experiments with different network parameters and network structures. The C-X-rays of P-Pneumonia patients were shuffled during the training, validation, and testing procedures. Initially, this work runs experiments with a reasonably small model for P-Pneumonia detection and classification. The model contains an input layer and an output layer. The input layer takes C-X-rays as input, and the output layer predicts either P-Pneumonia or Normal. Between the input layer and the output layer, the model contains several hidden layers, composed of convolution, pooling, and fully connected layers. The experiments are run for parameters of the CNN network such as epochs, learning rate, optimizer, and batch size. The ImageNet-based pretrained models are loaded and their weights are updated during training using the C-X-ray samples of P-Pneumonia. The optimizers considered are Adamax, SGD, Adagrad, Adadelta, and Adam, and experiments were run for each of them. SGD showed better performance than the other optimizers for P-Pneumonia detection. The learning rates considered are 0.1, 0.001, 0.0001, and 0.00001. Training is slower with lower learning rates than with higher ones. Experiments with a learning rate of 0.0001 attained a better P-Pneumonia detection rate.
The learning rate of 0.0001 performed well in the experiments of training, validation, and testing with the P-Pneumonia dataset. A learning rate lower than 0.0001 showed slightly better performance; however, since the improvement was less than 0.1% in accuracy, the current work did not reduce the learning rate below 0.0001. This is mainly because the lower the learning rate, the slower the model trains. To balance the training time and the accuracy, the current work uses a learning rate of 0.0001. To identify the optimal batch size, experiments were run with batch sizes of 32, 64, 128, and 256. The experiment with batch size 128 showed good performance compared to the other batch sizes. When the batch size was increased from 128 to 256, the model required more memory. To balance the memory and the performance of the model, the current work uses a batch size of 128. Finally, to identify the optimal number of epochs, the current work trains the P-Pneumonia detection and classification model for up to 60 epochs. The model’s accuracy remains the same after 33 epochs. To avoid overfitting and bias during the training of the CNN-based pretrained models, the experiments were stopped at epoch 35. The detailed training accuracy and training loss are shown in Fig. 5. At the end of epoch 35, the models reached 95% training accuracy for P-Pneumonia detection and classification. Notably, the models’ accuracy reached 90% at the end of epoch 16. During training, the accuracy and loss of the models improved over the epochs. Most importantly, the experiments with the EfficientNet models showed higher accuracy and lower loss. The performance of these models is higher than that of all the other ImageNet-based deep learning models.
Experiments with the EfficientNet models showed successive improvement of accuracy and loss across the epochs during training and validation. The CNN-based pretrained models reached a training loss of less than 0.1 at the end of epoch 37. Most of the models showed a training loss of less than 0.2 after epoch 10. Most importantly, the models belonging to the EfficientNet family showed an improvement in training loss across epochs 1 to 37. In particular, the training loss of the EfficientNetV2B0 model is close to 0 at the end of epoch 37.
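The choice of when to stop training once accuracy no longer improves can be expressed as a simple plateau rule. The sketch below is illustrative only: the function name, the `min_delta` threshold, and the accuracy history are hypothetical, and the `patience` of 2 merely mirrors the gap between the reported plateau and the reported stopping epoch; the source does not describe an automated stopping criterion.

```python
def first_plateau_epoch(accuracies, min_delta=0.001, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops at the first epoch where accuracy has failed to improve by
    more than min_delta for `patience` consecutive epochs. If no plateau is
    found, the last epoch is returned.
    """
    best = float("-inf")
    stale = 0  # consecutive epochs without a meaningful improvement
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best + min_delta:
            best = acc
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(accuracies)


# Hypothetical accuracy history: improvement stops after epoch 5.
history = [0.70, 0.80, 0.88, 0.93, 0.95, 0.95, 0.95, 0.95]
stop_epoch = first_plateau_epoch(history, min_delta=0.001, patience=2)
```

In this hypothetical history the accuracy stops improving after epoch 5, so with a patience of 2 the rule stops at epoch 7.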
The training accuracy and training loss of the CNN-based finetuned models for CV-19 detection are shown in Fig. 6. During training, the loss of the model is validated using the validation dataset. The validation dataset helps to improve the accuracy on unseen C-X-ray samples and to avoid overfitting and underfitting of the model. The EfficientNetV2B1 model showed better training accuracy and training loss than all the other finetuned models for CV-19 detection. It showed more than 95% training accuracy at epoch 10 with a loss of less than 0.1, and its performance improved across the epochs up to epoch 40. Other EfficientNetV2 models such as EfficientNetV2B0 showed performance similar to EfficientNetV2B1; however, the figures clearly show an improvement with the EfficientNetV2B1 model in both the training accuracy and the training loss for CV-19 detection. The EfficientNetB0 model showed lower training accuracy and higher training loss than the improved EfficientNetV2 models for CV-19 detection.

Training accuracy and Training loss of the CNN-based models for CV-19 detection (left to right).
The performance of the proposed model is also evaluated on TBS in this work. The training accuracy and training loss are shown in Fig. 7. Both the training accuracy and the training loss of the models improve across the epochs. Both the loss and the accuracy during training are monitored with the validation dataset. The EfficientNetV2B1 model reached more than 95% accuracy with less than 0.1 training loss between epochs 15 and 20. Models such as EfficientNetV2B0 and EfficientNetV2B2 show performance similar to EfficientNetV2B1. However, the EfficientNetV2B1 model showed slightly better performance in both accuracy and loss across the 40 training epochs. The EfficientNetB0 model showed lower accuracy and higher loss than all the EfficientNetV2 models. The figures for training accuracy and training loss clearly show that the EfficientNetV2B1 model performs well for TBS detection compared to all the other models.

Training accuracy and Training loss of the CNN-based models for TBS detection (left to right).
The learnable, non-learnable, and total parameters of the finetuned models for P-Pneumonia detection are as follows: 14715201, 0, and 14715201 for VGG16; 20809001, 54528, and 20863529 for Xception; 23536641, 53120, and 23589761 for ResNet50; 6954881, 83648, and 7038529 for DenseNet121; 21770401, 34432, and 21804833 for InceptionV3; 4008829, 42016, and 4050845 for EfficientNetB0; 6514465, 62048, and 6576513 for EfficientNetV2B0; 7702403, 67568, and 7769971 for EfficientNetV2B1; 10697769, 87296, and 10785065 for EfficientNetV2B2; and 17550409, 125200, and 17675609 for EfficientNetV2B3. The numbers of learnable, non-learnable, and total parameters of the finetuned models are high. Thus, good computing platforms are required to finetune the models and evaluate them on the testing dataset. The performance of the CNN-based finetuned models in P-Pneumonia detection is shown in Table 2.
Detailed results for P-Pneumonia detection
During testing, the finetuned models VGG16, Xception, ResNet50, DenseNet121, and InceptionV3 showed accuracies of 85%, 86%, 87%, 92%, and 88% respectively for P-Pneumonia detection. The EfficientNet models EfficientNetB0, EfficientNetV2B1, EfficientNetV2B2, and EfficientNetV2B3 achieved P-Pneumonia detection accuracies of 96%, 97%, 98%, and 99% respectively. Since the number of data samples per class in the P-Pneumonia dataset is not balanced, the current work considers the macro Precision, macro Recall, and macro F1-score. The macro metric treats all classes in the dataset equally, regardless of the number of samples in each class, so it is not dominated by the larger class. In addition to the macro metrics, the weighted Precision, weighted Recall, and weighted F1-score are reported for P-Pneumonia detection. The finetuned EfficientNet models improved on the P-Pneumonia detection performance of the existing ImageNet-based models. The models showed accuracy, and macro and weighted averages of Precision, Recall, and F1-score, of 94% to 98%. The EfficientNetV2 family of models outperformed the other existing models for P-Pneumonia detection, including the EfficientNetB0 model. The finetuned models VGG16, Xception, ResNet50, DenseNet121, and InceptionV3 showed lower performance than the EfficientNet models during testing and validation. Most importantly, the macro precision, recall, and F1-score of the ResNet50, DenseNet121, and InceptionV3 models are more than 5% lower than the testing results of the EfficientNet models. Both Xception and VGG16 showed lower performance for P-Pneumonia detection than ResNet50, InceptionV3, and DenseNet121.
The proposed model enhances the performance of the EfficientNetV2 family of models by 3% for macro precision, 3% for macro recall, and 3% for macro F1-score. Since the dataset is highly imbalanced, the current study considers the macro metrics of precision, recall, and F1-score for a better comparison of the proposed method. Since each of the EfficientNetV2 models can learn its own feature representation to accurately detect P-Pneumonia, the current work fuses the features extracted from the models. The results showed that the fused features of the EfficientNetV2 models gave better performance than the single EfficientNet models. The proposed P-Pneumonia detection model performed better in P-Pneumonia detection and classification than the other deep learning-based models, with a performance improvement of 1% in the macro metrics of Precision, Recall, and F1-score. The model scaling ability of EfficientNet, together with its better search for the model parameters, helped to extract the important features that distinguish between the C-X-ray samples of Normal and P-Pneumonia. The ensemble classifier, with the concatenation of the features of more than one finetuned EfficientNet model, improved the accuracy of the classifier for P-Pneumonia detection. Mainly, the model helped to achieve better accuracy with good generalization capability. The proposed model showed good accuracy on a dataset from different patients. This indicates that the proposed method is robust and generalizable to datasets from different patients in healthcare and clinical environments.
Table 2 provides the confusion matrix of the models for P-Pneumonia detection. The proposed P-Pneumonia detection model performs well compared to the other finetuned models in each class of the P-Pneumonia dataset. The proposed finetuned P-Pneumonia detection model misclassified 5 samples of Healthy C-X-rays as P-Pneumonia and 4 samples of P-Pneumonia as Healthy. The models from the EfficientNet family have a higher number of misclassifications, in the range of 30-55 in total over the classes Healthy and P-Pneumonia. The EfficientNetB0 model showed a higher misclassification rate for both classes than the EfficientNetV2 models in P-Pneumonia detection. The other models, Xception, VGG16, ResNet50, InceptionV3, and DenseNet121, misclassified more than 100 C-X-rays altogether from the classes Healthy and P-Pneumonia. The misclassification rates of the finetuned VGG16, InceptionV3, Xception, DenseNet121, and ResNet50 models are 15%, 12%, 14%, 8%, and 13% respectively. The DenseNet121 model showed a lower misclassification rate than VGG16, InceptionV3, Xception, and ResNet50. The model scaling property of DenseNet121 may be the reason for its lower misclassification rate. The current work therefore studies the model scaling property in detail by employing the improved EfficientNet models for P-Pneumonia detection. In general, the proposed P-Pneumonia detection model showed a lower misclassification rate in each class of the P-Pneumonia dataset than all the other finetuned models.
The detailed performance of the proposed model for each class in P-Pneumonia detection is reported in Table 3. The precision, recall, and F1-score of the proposed finetuned model for the class Healthy are 98%, 98%, and 98% respectively. The precision, recall, and F1-score of the proposed model for the class P-Pneumonia are 99%, 100%, and 100% respectively. The class-level results show that the Healthy class has a higher misclassification rate than the P-Pneumonia class. A detailed study to understand the reason behind the misclassification is left as future work.
Detailed results of Normal and P-Pneumonia classes
To show that the proposed model is robust and generalizable, the model trained on the P-Pneumonia dataset is evaluated on a dataset from a different environment. The model is trained using the Guangzhou Children dataset and the trained model is evaluated on the VinDr Children dataset. The data samples of the Guangzhou Children and VinDr Children datasets are completely disjoint. The detailed performance of the trained CNN models on VinDr is reported in Table 2. The VinDr dataset contains 612 P-Pneumonia C-X-rays.
The performance of the finetuned models on the testing dataset for CV-19 is shown in Table 4. The EfficientNetB0 model showed 91% accuracy, and EfficientNetV2B0, EfficientNetV2B1, and the proposed model showed 95%, 97%, and 99% respectively for CV-19 detection using C-X-rays. The proposed model showed 99%, 97%, and 98% for macro precision, macro recall, and macro F1-score respectively. The weighted precision, weighted recall, and weighted F1-score of the proposed model are 99%, 99%, and 99% respectively. The proposed model performed better than all the other models, and the improved EfficientNetV2 models performed better than the EfficientNetB0 model. The proposed model showed good performance for CV-19 detection, and its performance remains almost the same as in P-Pneumonia detection.
Detailed results for CV-19 detection
Experimental results on the testing dataset for TBS detection are reported in Table 5. The proposed method performed better than the existing methods with an accuracy of 96%. It showed an accuracy improvement of 3%, 5%, and 9% over EfficientNetV2B1, EfficientNetV2B0, and EfficientNetB0 respectively. The EfficientNetB0 model showed lower accuracy, precision, recall, and F1-score than all the other models. The proposed model showed 96% macro and weighted precision, 96% macro and weighted recall, and 96% macro and weighted F1-score. The performance of the proposed method in terms of precision, recall, and F1-score for TBS detection is higher than that of all the other models. The proposed model performed well, but its performance in TBS detection can still be improved. Nevertheless, the proposed model showed performance similar to that obtained in the detection of P-Pneumonia and CV-19.
Detailed results for TBS detection
The VGG16, DenseNet121, ResNet50, Xception, and InceptionV3 models showed a high misclassification rate on the VinDr dataset, misclassifying 51, 32, 37, 42, and 38 data samples respectively. In general, the DenseNet121 finetuned model performed better than the VGG16, ResNet50, Xception, and InceptionV3 models, with 32 misclassified C-X-rays on the VinDr dataset. The EfficientNet models showed a lower misclassification rate of patient C-X-rays: EfficientNetB0, EfficientNetV2B1, EfficientNetV2B2, and EfficientNetV2B3 misclassified 10, 11, 8, and 0 C-X-ray samples respectively. The proposed P-Pneumonia detection and classification model classified all the samples of the VinDr dataset correctly. The misclassification rate of the EfficientNet family of models is lower than that of the other models such as Xception, VGG16, ResNet50, InceptionV3, and DenseNet121. The results reported in Table 2 show that the proposed model performed better on the completely unseen C-X-rays. The results of the proposed P-Pneumonia model on more than one testing dataset show that the model generalizes and is robust in identifying P-Pneumonia on data samples collected from similar patients.
In the current work, the author presents a hybrid deep learning-based approach for P-Pneumonia detection. The model employs EfficientNetV2 models for feature extraction and stacked classifiers for classification. More than one EfficientNetV2 model is trained on the P-Pneumonia benchmark datasets. To focus on the infected regions of P-Pneumonia radiography images, GWAP is employed in the penultimate layer of the EfficientNetV2 models. The features of GWAP are reduced by employing K-PCA. The reduced feature representation of each model is unique and disjoint. The current work combines these features and passes them into the stacked classifier. The stacked classifiers support large-scale learning by identifying a separating boundary between the classes of P-Pneumonia in the high-dimensional space of fused features. In the first stage, the proposed model employs SVM and RFT to predict P-Pneumonia, and LRegr is then employed on the predictions of the SVM and RFT to accurately detect the presence of P-Pneumonia in the radiography images. The proposed approach improved the existing model accuracy by 4%, and in all the experimental settings it performed better than the other deep learning-based models. The model’s performance on the completely unseen datasets for P-Pneumonia detection is similar, which indicates that the model is robust and generalizable to new datasets of radiography images in healthcare and medical environments. The deep learning model and the stacked classifiers are trained separately in the proposed P-Pneumonia detection model. The performance might be increased by proposing a loss function that integrates the deep learning-based feature extraction with the stacked classifier. This can be considered future work.
In the current work, important features may be removed during dimensionality reduction, and a detailed analysis of various feature dimensions is required to understand the effectiveness of the features. This type of analysis will be considered one of the significant directions for future work. The behavior of the proposed approach in an adversarial environment is not studied in the current work. This type of testing is required to show that the model is robust against adversaries. Recent literature presents improvements in adversarial deep learning, and such attacks may be able to bypass the proposed model in P-Pneumonia detection and classification. The current study can be further enhanced by integrating clinical features with the radiography images for detecting P-Pneumonia.
