Abstract
BACKGROUND:
Chest X-ray images are widely used to detect many different lung diseases. However, accurately detecting and classifying lung diseases on chest X-ray images is often difficult for doctors, and inter-reader variability is large. Thus, there is strong demand for computer-aided automated schemes that help doctors detect lung diseases depicted on chest X-ray images more accurately and efficiently.
OBJECTIVE:
To develop convolutional neural network (CNN) based deep learning models and compare their feasibility and performance in classifying 14 chest diseases or pathology patterns from chest X-rays.
METHOD:
Several CNN models pre-trained on the ImageNet dataset are modified as transfer learning models and applied to classify 14 different chest pathologies and normal chest patterns depicted on chest X-ray images. In this process, a deep convolutional generative adversarial network (DC-GAN) is also trained to generate synthetic images that balance the dataset across diseases, mitigating the effects of a small or imbalanced dataset. The classification models are trained and tested using a large dataset of 91,324 frontal-view chest X-ray images.
RESULTS:
In this study, eight models are trained and compared. Among them, the ResNet-152 model achieves an accuracy of 67% with data augmentation and 62% without it. Inception-V3, NasNetLarge, Xception, ResNet-50 and InceptionResNetV2 achieve accuracies of 68%, 62%, 66%, 66% and 54%, respectively. Additionally, ResNet-152 with data augmentation achieves an accuracy of 83%, but only for six classes.
CONCLUSION:
This study addresses the problem of limited data by using GAN-based techniques to add synthetic images, and demonstrates the feasibility of applying transfer learning with CNNs to help classify 14 types of chest diseases depicted on chest X-ray images.
Introduction
Chest-related diseases are expected to become a significant concern in the coming years because of rising air pollution, which contributes to their development [1]. These diseases do not affect only a single organ; some of them cause oxygen levels to drop, which can lead to problems in other organs. The most convenient and usually the first method of detecting these diseases is X-ray imaging, with the images read and interpreted by radiologists. As the number of patients in the medical system increases, radiologists can easily become overwhelmed. This burden can be eased by automating the process with the latest image processing and pattern recognition tools. Ongoing research tries to solve this problem with computer-aided detection or diagnosis schemes based on deep learning neural networks, aiming to help radiologists read and interpret chest X-ray images [2, 3].
A neural network tries to mimic the inner workings of the human brain for a specific task. Like the human brain, it consists of neurons, which are modeled by mathematical equations. These equations have parameters that the neural network tries to estimate in order to make correct predictions. The network is trained on existing data to perform clustering or classification tasks [4, 5].
Dataset imbalance is a common problem when training a neural network. It biases a model toward the classes with the most data, thus affecting the accuracy of the model. While reducing the data of each class to the size of the smallest class seems like a sound remedy, this process causes another problem: CNN models used for classification require large amounts of data, and they reach high accuracy only when trained on large samples. To solve the problem of limited data, new data is generated through a deep convolutional generative adversarial network (DCGAN), which synthesizes images. GANs are networks that generate new images by learning from previously fed images.
This research addresses the problem of chest pathology classification using convolutional neural networks (CNNs). The models were not built from scratch; because of computational constraints, we used pre-trained models. This not only saves time and computational power but also works better when the available dataset is small. The pre-trained models used in this study were NasNetLarge, Xception, InceptionV3, InceptionResNetV2, ResNet-50 and ResNet-152. Pre-trained models have already been trained on a larger dataset; e.g., the ResNet-152 model was first trained on the ImageNet dataset. To adapt the models to chest X-rays, we modified their classification layers according to our needs. The primary contributions of this research are the following: (1) fine-tuning pre-trained models by adding more layers for the classification of 14 chest diseases and normal chest X-ray images; (2) mitigating the effect of the imbalanced dataset by generating X-ray images using DCGAN; and (3) comparing different pre-trained models and different numbers of classes to find the best model for lung disease detection using X-rays.
The rest of the paper is arranged into five sequential sections. A brief literature review of works in the area of chest disease detection is presented in Section 2. Next, Section 3 explains the dataset used for the current research. In addition to stating the parameter values, Section 4 discusses the models used for the experiments carried out by the investigators. After the fourth section, Section 5 analyzes the study outcomes. Finally, concluding thoughts and points are presented in Section 6.
Related work
There is a rich body of literature on chest disease detection within the medical field. Sivasamy et al. [4] predicted chest diseases from chest X-rays using the Keras framework, reporting an accuracy of 86.14%. Rajpurkar et al. [6] detected pneumonia from frontal-view X-rays using binary classification. Salehinejad et al. [7] used a deep convolutional generative adversarial neural network (DCGAN) to generate artificial chest X-rays. Irvin et al. [8] identified 14 classes of chest diseases using different neural network models. Apostolopoulos et al. [9] used five different pre-trained models to detect COVID-19 from chest X-rays. Pardamean et al. [10] applied transfer learning to mammogram data for breast cancer detection. Bhandary et al. [11] worked on detecting lung abnormalities from chest X-rays and CT scans using pre-trained models. Wang et al. [12] worked on the localization of thorax diseases using a deep convolutional neural network.
In addition, several researchers analyzed the segmentation of lungs and other organs for better classification of chest diseases. For example, the work of Wang et al. [13] on segmenting the ribs from chest X-rays helped classify chest diseases. Wang et al. [14] obtained better feature extraction for segmentation by proposing a unified end-to-end trainable deep neural network. Zhao et al. [15] localized organs and then segmented them using a voxel-wise label map. Using this method, Zhao and colleagues concluded that segmentation models work well on large organs but mostly fail on small organs.
Overall, previous research obtained state-of-the-art results using promising techniques, but only for a few chest diseases or for diseases with large amounts of available data. Table 1 provides an overview of the literature by detailing the methodology, findings, and identified gaps for each of the referenced articles, and gives insight into the gap that the current study attempts to address. As previously stated, the main purpose of this study is to propose a new classification procedure to identify chest diseases from chest X-rays. Furthermore, this study uses a GAN to mitigate the effects of limited and imbalanced data, and implements and tunes pre-trained models to obtain the best accuracy.
A comparison of the chest disease diagnoses
A deep learning model requires a vast amount of data for training, and its effectiveness improves with larger data sets. Having few data points or imbalanced data can cause the model to under-fit or over-fit specific classes. The data set used in this research was taken from NIH [17] and covers 14 chest-related diseases. As shown in Fig. 1, the NIH dataset contained 91,324 frontal-view CXR images from 32,717 patients. These images included X-rays with no disease as well as X-rays in which a disease was detected. As presented in Table 2, the majority of the data belonged to six classes of chest-related (thoracic) diseases. A more complete breakdown of the NIH dataset is given in Table 3, which shows the distribution of the data across all 14 chest-related diseases. Additionally, the NIH dataset analyzed in this study was split into training, testing, and validation samples in ratios of 70%, 20%, and 10%, respectively.

A bar chart showing the 14 chest-related diseases addressed by this study.
Six classes of chest related diseases in the dataset used
14 classes of chest related diseases in the dataset
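The 70%/20%/10% split mentioned above could be carried out per class so that every disease keeps the same proportions in each subset. The following is a minimal sketch under that assumption; the function name `stratified_split`, the fixed seed, and the toy class sizes are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

def stratified_split(labels, percents=(70, 20, 10), seed=0):
    """Split sample indices into train/test/validation sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test, val = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_test = len(indices) * percents[1] // 100
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        val += indices[n_train + n_test:]  # the remainder goes to validation
    return train, test, val

# Toy labels only; these class sizes are illustrative, not the NIH counts.
labels = ["No Finding"] * 100 + ["Infiltration"] * 50 + ["Hernia"] * 10
train_idx, test_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), len(val_idx))  # 112 32 16
```

Splitting within each class keeps rare classes such as Hernia represented in all three subsets, which a purely random split over the whole dataset does not guarantee.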
Images with no disease make up the majority (n = 60,361) of the data. This creates a significant imbalance in the training data, which might cause the model to over-fit or under-fit the “no disease” class. This problem was resolved by using additional data from the Kaggle challenge [18] and data generated through image augmentation.
To analyze the data, the distributions by age and gender were plotted for all chest diseases, as shown in Fig. 2. Normal images are more frequent than images showing a condition. Among the disease cases, infiltration was the most frequent (n = 9,547), whereas hernia was the least frequent (n = 105).

Gender and age distribution of chest related disease.
Given the observation that the NIH dataset was highly imbalanced, we deduced the following. Infiltration, Effusion, Atelectasis, Nodule, Consolidation, Pleural Thickening, Emphysema, and Hernia affected more males than females. Pneumothorax, Cardiomegaly, Edema, and Fibrosis affected more females than males. Cardiomegaly starts affecting people from age ten and is common among people aged between 35 and 65, with a median age of 50. Effusion follows the same trend as Cardiomegaly but has a median age of 55. Atelectasis affects people aged 25 or older; only one case in the data fell outside this range, and it appeared to be an outlier. Mass affects people from a younger age, and the number of cases increases with age. Emphysema primarily affects people aged 40 or older; cases in individuals under 40 may be outliers. The available dataset is highly imbalanced, as 60,361 images show no chest disease.
These characteristics make training the NN model very difficult, as they can cause over-fitting. To mitigate this effect, DCGAN was used to generate images.
The NIH dataset was divided into training and testing sets. Each set contained 15 folders, one for each chest-related disease and one for normal images. Validation data was extracted during the model training process. Before use, the X-rays were pre-processed by resizing them to 150 × 150 × 3 using nearest-neighbor interpolation and then normalizing them. To better train the model, data augmentation, including image rotation, was applied. The parameters for data augmentation were as follows: rotation range = 40, width shift range = 0.2, height shift range = 40, shear range = 0.2, zoom range = 0.2, horizontal flip = true, and fill mode = nearest. This data augmentation was applied during training to both the original and the generated images.
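The pre-processing step can be illustrated with a small dependency-free sketch. It assumes a grayscale input stored as a list of rows, scales 8-bit pixel values to [0, 1], and repeats the gray value over three channels to reach the 150 × 150 × 3 shape; the text does not specify the normalization or the channel handling, so both are assumptions, and the function names are hypothetical. In practice an image library would do this work.

```python
def resize_nearest(image, out_h=150, out_w=150):
    """Resize a 2-D grayscale image (list of rows) with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    # Map every output row/column back to the source row/column it samples from.
    rows = [(r * in_h) // out_h for r in range(out_h)]
    cols = [(c * in_w) // out_w for c in range(out_w)]
    return [[image[r][c] for c in cols] for r in rows]

def preprocess(image, out_h=150, out_w=150):
    """Resize, scale 8-bit pixels to [0, 1], and repeat the gray value over 3 channels."""
    resized = resize_nearest(image, out_h, out_w)
    return [[[pixel / 255.0] * 3 for pixel in row] for row in resized]

# Hypothetical 4x4 grayscale image used only to exercise the functions.
toy = [[0, 64, 128, 255] for _ in range(4)]
out = preprocess(toy)
print(len(out), len(out[0]), len(out[0][0]))  # 150 150 3
```

Nearest-neighbor sampling copies existing pixel values rather than blending them, so no new intensity values are introduced by the resize.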
GANs are neural networks that generate new images by learning from previously given images. They consist of two parts: (a) a convolutional network (discriminator) that learns features from the images in order to distinguish original images from generated ones, and (b) a deconvolutional network (generator) that tries to generate images that resemble the originals. The GAN improves its generated images using the error obtained when the original images are compared with the images produced by the deconvolutional network. Figure 3 shows the basic working diagram of GANs.

Basic architecture of Generative Adversarial Network (GAN) model.
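For reference, the adversarial training of the two networks is commonly formalized as the minimax objective below. The symbols are not defined in this paper and are introduced here only for illustration: D is the discriminator, G the generator, p_data the distribution of real images, and p_z the noise distribution fed to the generator.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is trained to increase V by scoring real images high and generated images low, while the generator is trained to decrease it by producing images that the discriminator scores as real.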
DCGAN was proposed by Radford et al. [16]. A summary of the DCGAN architecture used is shown in Fig. 4. Although the architecture is the same as in the original paper, the parameters were tuned using GAN hacks [19, 20]. During training, the batch size was 64, with hyperparameters α = 0.0002 and β1 = 0.5 for the generator and α = 0.00015 and β1 = 0.5 for the discriminator. Here, α is the learning rate and β1 is the exponential decay rate for the first-moment estimates. To balance the dataset, 5,000 images were used for each class, so the number of generated images differed between classes. For example, the Infiltration class had 9,547 images; therefore, only 5,000 of them were used and the rest were left out. On the other hand, only 3,955 images were available for the Effusion class, so 1,045 images were generated. We limited the number of generated images in this way as a precaution: although the images produced by DCGAN were similar to the original ones, there was still room for error. This dataset was then divided into three sets: 70% for training, 20% for testing, and 10% for validation.

Summary of discriminator model (top) and summary of generator model (bottom).
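The per-class balancing rule described above amounts to simple arithmetic: keep at most 5,000 real images per class and generate the shortfall with DCGAN. A minimal sketch follows; the function name `balance_plan` is hypothetical, and only the two class counts quoted in the text are used.

```python
TARGET_PER_CLASS = 5000  # per-class image count stated in the text

def balance_plan(real_count, target=TARGET_PER_CLASS):
    """Return (real images kept, synthetic images to generate) for one class."""
    kept = min(real_count, target)
    generated = target - kept
    return kept, generated

# Class counts quoted in the text; other classes would be handled the same way.
counts = {"Infiltration": 9547, "Effusion": 3955}
plan = {name: balance_plan(n) for name, n in counts.items()}
for name, (kept, generated) in plan.items():
    print(f"{name}: keep {kept} real, generate {generated} synthetic")
# Infiltration: keep 5000 real, generate 0 synthetic
# Effusion: keep 3955 real, generate 1045 synthetic
```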
Tables 2 and 3 show the distribution of all X-ray images across the six classes and across all 14 classes, respectively, before data augmentation. Tables 4 and 5 show the class distributions after data augmentation.
Six classes of chest related diseases in the dataset used after data augmentation
14 classes of chest related diseases in the dataset used after data augmentation
Pre-trained models have previously been trained on another dataset, which reduces the training time when the same model is trained on a new dataset. The pre-trained models we used were NasNetLarge, Xception, InceptionV3, InceptionResNetV2, ResNet-50 and ResNet-152. These models were pre-trained on the ImageNet dataset, which contains 1,000 classes. These models were chosen because deeper models are generally said to work better. This was true for AlexNet and VGG; however, deeper networks were difficult to optimize because of the exploding/vanishing gradient problem. The models used in this research address this through architectural engineering. ResNet employs residual learning, in which the input from earlier layers is also passed to later layers. InceptionNet applies filters of different sizes to the output of the previous layer and feeds the results to the next layers, thus making the network deeper and wider. Finally, NasNet works as an AutoML network that searches for the best CNN architecture for the problem at hand.
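The idea of residual learning can be shown in a few lines. The sketch below is a simplification: it uses two small dense layers for the learned mapping F, whereas ResNet uses convolutional layers, and all function names and weights are hypothetical. The block computes relu(F(x) + x), so the input is added back to the output of the learned layers.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(values, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two small dense layers."""
    hidden = relu(linear(x, w1, b1))
    fx = linear(hidden, w2, b2)
    return relu([f + xi for f, xi in zip(fx, x)])  # skip connection adds the input back

# With all-zero weights F(x) = 0, so a non-negative input passes through unchanged.
x = [1.0, 2.0]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
print(residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b))  # [1.0, 2.0]
```

Because the block reduces to the identity when F contributes nothing, stacking more such blocks does not have to degrade the signal, which is one reason very deep residual networks are easier to optimize.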
These models were modified and fine-tuned by adding layers to meet the needs of the NIH dataset. Table 6 shows the layers added to the pre-existing layers of the pre-trained neural networks.
Six classes of chest related diseases in the dataset used
For classification, the models used were ResNet-152, Inception-V3, Inception ResNet V2, Xception, NasNetLarge and ResNet-50. Pre-trained versions of these models have shown promising results. We trained these models on the collected and generated data for CXR classification. Each model was trained for 50 epochs. Table 6 shows the layers added to the original models. Different hyper-parameter values were tested for each model, and the values that produced the best accuracy were chosen. The best hyper-parameter values for each model are shown in Table 7.
The proposed models and comparisons of their results
In the first model, the Adam optimizer was used with leaky-ReLU as the activation function. The model was trained for 14 classes of chest-related diseases and achieved a validation accuracy of 67%. The next model used the same ResNet-152 architecture but did not apply data augmentation. Instead, a fully connected layer of 512 units was used with ReLU as the activation function, and Softmax was the activation function of the 14-class output layer. For fine-tuning, an extra batch normalization layer was added after every convolution layer and dense layer. This model achieved an accuracy of 62%, which is 5 percentage points lower than the previous model. The following five models were transfer learning models: Inception V3, Inception ResNet-V2, Xception, NasNetLarge, and ResNet-50. The Inception-V3 model had a fully connected layer of 1,024 units with the ReLU activation function; RMSprop was used as the optimizer, and the model achieved 68% accuracy. The remaining four models were trained using ReLU as the activation function along with the Adam optimizer. For fine-tuning, an average pooling layer was added, followed by a flattening layer. After that, two dense layers were added, each followed by a dropout layer (0.5). Finally, a dense layer was used as the output layer. The accuracy achieved by each of these five models is shown in Table 7.
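The added classification head (average pooling, two dense layers, and a Softmax output) can be sketched as an inference-time forward pass. This is a dependency-free illustration, not the authors' implementation: the pooling is assumed to be global, the channel and hidden sizes are hypothetical, the weights are random, and the dropout layers are omitted because dropout acts as the identity at inference time.

```python
import math
import random

def global_average_pool(feature_map):
    """Average an H x W x C feature map over its spatial positions -> C values."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]

def dense(values, weights, bias, activation=None):
    out = [sum(w * v for w, v in zip(row, values)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if activation == "relu" else out

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def head_forward(feature_map, layers):
    """Pooling -> dense (ReLU) -> dense (ReLU) -> dense (Softmax)."""
    (w1, b1), (w2, b2), (w_out, b_out) = layers
    x = global_average_pool(feature_map)
    x = dense(x, w1, b1, "relu")
    x = dense(x, w2, b2, "relu")
    return softmax(dense(x, w_out, b_out))

rng = random.Random(0)
CHANNELS, HIDDEN, NUM_CLASSES = 8, 16, 14  # CHANNELS and HIDDEN are hypothetical sizes
layers = [random_layer(rng, CHANNELS, HIDDEN),
          random_layer(rng, HIDDEN, HIDDEN),
          random_layer(rng, HIDDEN, NUM_CLASSES)]
# Stand-in for the feature map produced by a pre-trained convolutional base.
features = [[[rng.random() for _ in range(CHANNELS)] for _ in range(3)] for _ in range(3)]
probs = head_forward(features, layers)
print(len(probs))  # 14
```

The Softmax output assigns one probability to each of the 14 classes, and the class with the highest probability is taken as the prediction.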
The Inception-V3 model achieved the highest accuracy among these models, although the improvement was not significant. The last model, also a ResNet-152, was trained to classify only six classes, which gave it an advantage over the other models. It also used image augmentation with the parameters mentioned above. In this instance, ResNet-152 achieved an accuracy of 83%, a significant improvement over the other models. This shows that reducing the number of classes substantially increases model accuracy. Therefore, a multistep model may be a better solution for classifying chest-related diseases in X-ray images and should be considered in future work.
Eight models were trained for automatic detection of chest diseases using X-ray images. From the results, we deduced that ResNet-152 trained with image augmentation on six classes, for which substantial data were available, produced a promising accuracy of 83%. This is higher than the accuracy of the models trained on 14 classes, which included classes with little available data: 67% with augmentation and 62% without. The ResNet-152 model produced a training accuracy of 99% but a validation accuracy of only 62%, which indicates that the model was over-fitting. ResNet-152 with six classes showed promising results, indicating that balanced training data produces better results. The model trained with augmented images produced better accuracy than models trained on non-augmented data. Overall accuracy remained low because the dataset was still not adequate, as only 5,000 images per class were used. We hypothesize that increasing the number of data points will increase classification accuracy. It was also observed that pre-trained models helped reduce the computational cost and gave better accuracy.
It is important to acknowledge that we conducted an experiment similar to that of Sivasamy [4]. The difference is that Sivasamy [4] used 10 classes, whereas our study used 14 classes. Despite using only 10 classes, Sivasamy [4] obtained lower results than ours. Sivasamy [4] achieved a maximum AUC value of 0.74 for edema, while a minimum AUC of 0.52 was achieved with Edema. In addition to producing higher results, we showed that model accuracy can be increased by decreasing the number of classes trained on original images.
In summary, this research attempted to solve the problem of early chest disease detection from chest X-rays using advanced techniques in image processing and pattern recognition. This approach will not only increase the speed of detection but will also reduce the overall effort (i.e., time and energy) required of radiologists. Using deep learning as the detection tool, this research explored the efficiency of different models in chest disease detection and the conditions under which a model can work efficiently.
After experimentation, it is deduced that well-balanced data can produce better results. In our case, this is illustrated by ResNet-152, which gave better accuracy with a smaller number of classes that had an almost equal number of images. It is also observed that data augmentation increases the accuracy of a model; for example, ResNet-152 produces better results with data augmentation. Conversely, the availability of fewer data severely reduces model accuracy.
Finally, this study also indicates that a lot of research work can be done to detect and localize diseases in X-rays in the future. Models can be observed over time as more data becomes available. More classes can be added, and models can be fine-tuned accordingly to get exceptional performance.
Acknowledgments
The researchers would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.
Conflicts of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
