Abstract
Accurately identifying the boundary region of pulmonary nodules in lung cancer images is one of the most challenging tasks in computer-aided diagnosis (CADx) schemes. Detecting the boundaries of different nodule structures is difficult because nodules and their surroundings share similar visual characteristics. This study proposes an approach for pulmonary nodule region of interest (NROI) detection and segmentation using computed tomography (CT) lung images. The CT images are acquired from the Lung Image Database Consortium (LIDC) public repository, which contains 1018 cases. A methodology for automated tumor grading of pulmonary nodules is proposed using a convolutional neural network (CNN). The salient features of benign and malignant nodules with different structures are learned automatically and used for classification. The methodology has four stages: 1) pre-processing the image datasets using the discrete wavelet transform (DWT); 2) NROI segmentation; 3) NROI feature extraction using a CNN; and 4) nodule classification. The CNN is trained on features self-learned from the NROI, and the nodules are classified as benign or malignant. Analyzing and segregating the extracted features plays a vital role in correctly classifying malignancy levels. The methodology is compared with conventional state-of-the-art methods and traditional hand-crafted methods. A total of 710 pulmonary nodules are used in the study, with 258 benign and 452 malignant samples. The CNN behaved consistently, with fewer false positives, a classification accuracy of 96.5%, a sensitivity of 96%, a specificity of 96.55% and a highest area under the receiver operating characteristic (ROC) curve of 0.969.
Keywords
Introduction
Lung cancer is a major public health challenge worldwide and has a high mortality rate compared with other types of cancer. For the United States in 2018, the American Cancer Society estimated about 234,030 new lung cancer cases and 154,050 deaths [1]. According to these statistics, lung cancer death rates were high compared with other cancers combined. Early detection can improve the survival rate, but lung cancer is hard to detect early because symptoms normally appear only in the final stages. Risk factors include tobacco consumption and smoking as well as biological, chemical (for example, exposure to radon gas) and environmental (for example, exposure to secondhand smoke) conditions.
Fighting lung cancer at an early stage helps limit cancer growth to a certain extent. One such measure is analyzing pulmonary nodules in the soft lung tissue according to their malignancy levels. This is a challenging task when diagnosing early-stage lung cancer from CT images [2], because nodule densities may be similar to those of other lung structures owing to their anatomical properties [3]. Pulmonary nodules are small, oval or irregularly shaped growths in the lungs with a diameter of up to 30 mm [4]. Nodules are categorized according to size, shape, location and texture. Examining these nodule structures is rigorous work and requires expertise to reduce false positives. Therefore, both detecting and diagnosing nodules play vital roles in early lung cancer diagnosis and in increasing survival rates.
Radiologists recommend various imaging modalities for detecting pulmonary nodules [5]. Owing to its high sensitivity, computed tomography (CT) is the modality radiologists commonly prefer for categorizing a pulmonary nodule as normal (non-cancerous), benign (low malignancy level) or malignant (high malignancy level). A chest CT consists of a sequence of frames (images). These images may show signs of different chest diseases such as tuberculosis, atelectasis and lung cancer, so the radiologist must assess how critical each finding is. Detecting, diagnosing and categorizing the images can be a tremendous task for radiologists. Hence, computer-aided detection systems were developed to help doctors and radiologists detect and diagnose nodules efficiently and automatically [6].
The scope and application of image processing in computer-aided diagnosis (CAD) have grown tremendously over time in terms of performance and accuracy. CAD systems analyze lung CT images to detect pulmonary nodules. A typical CAD system examines the chest CT by 1) segmenting the region of interest (ROI) while eliminating the rest, 2) extracting the features of the segmented lung nodule, and 3) classifying the nodules as normal, benign or malignant, reducing false positives while maintaining good sensitivity. Some researchers add a pre-processing step before lung segmentation to enhance the image and make its features more visible.
Accurate segmentation of pulmonary nodules directly affects the analysis of results. Recent advances in CADx systems have therefore optimized lung segmentation approaches to improve the performance of lung cancer diagnosis based on the categorized malignancy levels [7, 8, 9]. Lung nodule segmentation is a crucial step, and identifying the boundaries of different structures is difficult because nodules and their surroundings share similar visual characteristics. For example, juxta-pleural nodules (nodules attached to the chest wall) have intensity values similar to those of the chest wall, and juxta-vascular nodules are attached to blood vessels. Differentiating the intensity values of true nodules from those of false nodules is difficult.
After the potential candidate nodules are detected, the salient features of the different nodule structures are extracted. Features extracted with traditional hand-crafted methods are categorized as texture (structural), morphological, geometric (density) and statistical (histogram) features. Existing CADx systems rely on designing these features as an essential part of the model. However, extracting these features manually is time-consuming and complicated [10]. Moreover, the extracted features must correlate well to achieve the expected performance.
Recently, many researchers have made progress in training and classifying huge datasets using deep learning CNN algorithms [11, 12]. Studies show that deep learning algorithms trained on huge datasets can outperform traditional machine learning strategies [13]. This has led to significant advances in the automated self-learning of feature maps with deep learning methods, which have been a trend-changing concept in medical imaging [14] in the recent past. However, hyper-parameters must be defined explicitly for the neural network to capture the salient features from assorted volumes of CT image sets with input patches of the same size.
After feature extraction, the extracted features are classified based on homogeneity or classes of interest. In this study, the appropriate features extracted from the NROI are trained with input patches of the same size and tested for nodule classification to eliminate false positives. Other factors that affect the system's performance in detecting lung abnormalities include image acquisition; identification of nodule size, shape and location; image reconstruction; and optimizing the system with a cross-validation process to avoid false positive results.
The methodology presented in this paper focuses on the nodule region of interest segmentation, feature extraction and classification stages. An end-to-end CNN is trained and tested with NROI samples, which are classified as benign or malignant based on tumor patterns. The approach is compared with state-of-the-art methods and traditional hand-crafted methods for performance evaluation.
In the paper, Section 2 describes the related work. Section 3 describes the methodology for segmenting and classifying the candidate regions using the features extracted from CNN and traditional hand-crafted methods. Section 4 illustrates the experiments and evaluation of the proposed method. In Section 5, the results obtained are discussed.
Related work
The proposed methodology aims to segment irregularly structured candidate nodules and analyze their extracted features for better classification according to the levels of malignancy. As discussed above, extracting features for nodule classification poses challenges for both traditional hand-crafted methods and CNN-based methods. In this section, previous works related to the proposed method are discussed and organized into two categories based on the nodule detection and classification methods used. The works related to traditional segmentation and classification methods are summarized in Table 1. Table 2 summarizes the works related to deep learning based methods.
Category 1: Works related to traditional segmentation and classification methods
Ref [15] proposed a novel approach for the early detection of lung cancer using genetic K-nearest neighbour (K-NN) methods. Combining a genetic algorithm with K-NN classifies the pulmonary nodules quickly, resulting in a classification accuracy of 90%.
Ref [16] implemented a growing neural gas grouping technique to extract the pulmonary structures from the lungs. SVMs were used to classify the extracted pulmonary structures as nodules or non-nodules. The images used in the study were acquired from the LIDC dataset and comprised 29 exams with 48 nodules. The experiment reported a mean accuracy of 91%, a sensitivity of 85.93% and a specificity of 91%.
Ref [17] proposed a CAD methodology that segments the lung region from each CT slice using a greedy snake algorithm. The nodule regions of interest were extracted using region growing methods. Shape and texture features were extracted from the regions of interest and classified as cancerous or non-cancerous using a radial basis function neural network. A total of 150 test images out of 1564 slices were acquired from the LIDC dataset, resulting in an accuracy of 94.44% and a sensitivity of 92.3%.
Ref [18] proposed a novel CAD approach for segmenting and detecting lung nodules in CT images. Intensity thresholding was used to segment the lungs from the CT images. A morphological closing operation was applied to segment the nodules attached to the lung wall (pleura connected). K-means clustering and a morphological opening operation were applied to detect and segment potential pulmonary nodules. The segmented nodules were categorized into six groups based on their thickness and connectivity with the lung walls. The LIDC and Mayo hospital dataset used in the study contains 133 different lung nodules identified from 1302 slices by 2 expert radiologists. The system's sensitivity for small and large nodules was 83.33% and 93.8% respectively. Overall, the system sensitivity was 91.65% and the accuracy was 96.22%.
Ref [19] proposed an efficient classification model for lung cancer stage diagnosis using fuzzy C-means (FCM) and SVM classification. A total of 70 image samples from the LIDC image dataset were used in the experiment. As a pre-processing step, a Gaussian filter was used for smoothing and a Gabor filter for enhancing the images. The lung images were segmented using FCM and classified using SVM classifiers with a classification accuracy of 93%.
Ref [20] developed a methodology for segmenting the candidate nodules on the lobes using a neural network model based on self-organizing maps (SOM). Artificial neural networks were used to classify the nodules as benign or malignant. Images were acquired in DICOM format from CT scanners at Abant Izzet Baysal University Medical Faculty. A total of 128 CT scan images from 47 patients, with 76 benign and 52 malignant samples, were used in the experiment, resulting in a classification accuracy of 90.63%, a sensitivity of 92.30% and a specificity of 89.47%.
Ref [21] proposed a novel computer-aided pipeline for early lung cancer diagnosis comprising 4 stages: 1) pre-processing, with lung volume extraction using the circular Hough transform; 2) segmentation using self-organizing maps (SOM); 3) reduction of the extracted features using principal component analysis (PCA); and 4) classification using probabilistic neural networks. The images used in the study were taken from the LIDC dataset and consisted of 38 images with 26 malignant and 12 benign samples. The experiment achieved a classification accuracy of 95.91%, a specificity of 94.24% and a sensitivity of 97.42%.
Works related to traditional segmentation and classification methods
Works related to deep learning methods
Ref [22] proposed a nodule classification methodology using a deep structured multi-crop convolutional neural network (MC-CNN). Images were acquired from the LIDC-IDRI database, with 880 low-malignancy nodules and 495 high-malignancy nodules (a total of 2618 images) used in the study. The experiments resulted in a nodule classification accuracy of 87.14%, a specificity of 93%, a sensitivity of 77% and an AUC score of 0.93.
Ref [23] proposed an end-to-end machine learning algorithm to automatically extract the salient features from CT images for lung cancer detection. A total of 13688 samples from LIDC datasets were used in the study. The nodules were segmented using the expert radiologist’s annotations and markups. Three multichannel ROI based deep learning algorithms were designed 1) CNN 2) Deep belief networks (DBN) 3) Stacked denoising autoencoder (SDEA). The methodology recorded an AUC of 0.899 for CNN, 0.884 for DBN, 0.852 for SDEA, and 0.848 for traditional CADx.
Ref [24] designed a nodule classification methodology using an Optimal Deep Neural Network (ODNN) and Linear Discriminant Analysis (LDA). The salient features extracted from the candidate nodules were optimized using LDA and further classified using a Modified Gravitational Search Algorithm (MGSA). The experiments were conducted on 70 images. For training the network, all 70 images (Normal – 27, Benign – 21, Malignant – 22) were used, while in turn 30 images were used for testing (Normal – 8, Benign – 11, Malignant – 11). The methodology achieved an accuracy of 94.56%, a specificity of 94.2% and a sensitivity of 96.2%.
Ref [25] implemented a CADx system based on transfer learning for lung nodule detection. Images were acquired from the LIDC dataset, comprising 700 nodule and 700 non-nodule samples. The database samples are pre-processed and cropped into 224
Ref [26] designed a methodology to segment the nodule contours using an end-to-end fully automated framework. The algorithm involves 3 major phases: 1) candidate nodule detection using a Faster regional CNN; 2) false-positive reduction and candidate merging using a CNN; and 3) candidate nodule segmentation using a customized fully convolutional network. Images from the LUNA 16 image datasets, comprising 888 CT scans with 223 nodules, were used for the experiments and resulted in a classification accuracy of 91.4% at 1 false positive per scan and 94.6% at 4 false positives per scan.
Ref [27] proposed a lung nodule detection methodology using convolutional neural network. Raw image patches from LIDC datasets were used to train the CNN in the study to reduce the complexity of the system. Each CT image was split into several input patches with 3 nodule types and 3 non-nodule types for experiments. The methodology resulted in a sensitivity of 92.8% with 8 false positives per scan (FPs/scan).
Architecture of the proposed method.
Image dataset consisting of a) Lung CT image with Malignancy level 5 nodule b) Ground truth values c) Segmented NROI image.
The main goal of the study is to design and implement a segmentation process that extracts only the nodule region of interest from different lung nodule structures for automated tumor grading and classifies the nodules according to their levels of malignancy. Fig. 1 shows the implementation of the proposed method in 5 phases: data acquisition, pre-processing, NROI segmentation, feature extraction and classification. From the segmented NROI, the statistical and texture features are extracted for comparison. The extracted features are analyzed and segregated to classify the pulmonary nodules correctly as benign or malignant candidate regions.
Data acquisition
The Lung Image Database Consortium and Image Database Resource Initiative (LIDC/IDRI) collection is one of the most popular publicly available databases for detecting and diagnosing lung cancer. The LIDC image datasets consist of clinical thoracic computed tomography (CT) scans from lung cancer screening, with lesions annotated by four radiologists and stored in an xml file [28]. Seven academic centers and eight medical imaging companies collaborated to create the LIDC dataset, which consists of 1018 cases with scan slice thickness varying from 1.25 mm to 30 mm and malignancy levels varying from Level 1 to Level 5. The CT images are pre-processed, and the four radiologists' marked-up annotations and ground truth (GT) values help to segment the NROI. The annotations in each xml file categorize the findings into 3 classes: nodule less than 3 mm, nodule greater than or equal to 3 mm, and non-nodule greater than 3 mm. The images retrieved are of size 512
Image samples of 52 
Prior to nodule detection, pre-processing the CT image enhances its quality and reduces the noise artifacts introduced while capturing the images. Image processing techniques used for noise removal and contrast adjustment include wavelet transforms [30], the median filter, the Gaussian filter [31, 32], Laplacian and Sobel filters, contrast stretching (normalization), histogram equalization, CLAHE [33] and the Gabor filter [2]. In this study, the discrete wavelet transform is applied to the CT images to enhance the salient features and hidden details at different scales. Using Daubechies low-pass and high-pass filters, each image is decomposed into 4 frequency sub-bands as represented in Eq. (1). Daubechies filters are used because they allow perfect reconstruction of the original signal. These filters help to identify sudden changes in intensity in detail with respect to the original image. Finally, all 4 frequency sub-bands are recombined using the inverse DWT, resulting in an enhanced version of the original image.
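The decomposition into four sub-bands and the inverse transform described above can be illustrated with a minimal sketch in plain Python. It assumes the Haar wavelet, which is the simplest member of the Daubechies family (db1), because the filter order actually used is not stated here. The function names `haar_dwt2` and `haar_idwt2` are illustrative, and any enhancement applied to the sub-bands between the two steps is left out because it is not specified.

```python
def haar_dwt2(img):
    """Single-level 2-D Haar DWT of a 2-D list with even height and width.

    Returns four half-size sub-bands: approximation (LL) and the
    horizontal, vertical and diagonal details (LH, HL, HH).
    """
    LL, LH, HL, HH = [], [], [], []
    for r in range(0, len(img), 2):
        ll, lh, hl, hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            ll.append((a + b + d + e) / 2)  # low-pass in both directions
            lh.append((a + b - d - e) / 2)  # difference between the two rows
            hl.append((a - b + d - e) / 2)  # difference between the two columns
            hh.append((a - b - d + e) / 2)  # diagonal difference
        LL.append(ll)
        LH.append(lh)
        HL.append(hl)
        HH.append(hh)
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2: rebuild the full-size image from four sub-bands."""
    img = []
    for r in range(len(LL)):
        top, bottom = [], []
        for c in range(len(LL[0])):
            ll, lh, hl, hh = LL[r][c], LH[r][c], HL[r][c], HH[r][c]
            top += [(ll + lh + hl + hh) / 2, (ll + lh - hl - hh) / 2]
            bottom += [(ll - lh + hl - hh) / 2, (ll - lh - hl + hh) / 2]
        img += [top, bottom]
    return img


# Hypothetical 4 x 4 gray-level patch, used only to exercise the functions.
image = [[10, 12, 50, 52],
         [11, 13, 51, 53],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
bands = haar_dwt2(image)
restored = haar_idwt2(*bands)
```

With unmodified sub-bands the inverse transform returns the input patch, which is the perfect-reconstruction property mentioned above. Higher-order Daubechies filters share this property but use longer filter taps.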
The candidate regions of pulmonary nodules are segmented using the marked-up annotations and their corresponding ground truth values and masks. Each annotation in the xml file that corresponds to an image slice is retrieved, and its location on the CT image is traced to segment the candidate nodule region of interest [29]. Based on their malignancy levels, the pixel values of the candidate region are retained using the masks while the rest are padded with zero, forming tiff image samples of the 52
NROI images and their corresponding cropped CT scans.
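The masking step described above, in which nodule pixels are retained and everything else is set to zero, might be sketched as follows. This is only an illustration under stated assumptions: the function name `extract_nroi` is hypothetical, the nodule is assumed to be centred in a square patch, and the patch side is passed in as the parameter `patch` rather than fixed to a particular value.

```python
def extract_nroi(ct_slice, mask, patch):
    """Keep the pixels inside the nodule mask, zero the rest, and place the
    nodule's bounding box in the centre of a patch x patch image.

    ct_slice and mask are 2-D lists of equal size; mask entries are 0 or 1.
    Centring the nodule is an assumption made for this sketch.
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, inside in enumerate(row) if inside]
    if not coords:
        raise ValueError("mask contains no nodule pixels")
    r0 = min(r for r, _ in coords)
    r1 = max(r for r, _ in coords)
    c0 = min(c for _, c in coords)
    c1 = max(c for _, c in coords)
    height, width = r1 - r0 + 1, c1 - c0 + 1
    if height > patch or width > patch:
        raise ValueError("nodule does not fit into the requested patch")
    top, left = (patch - height) // 2, (patch - width) // 2
    out = [[0] * patch for _ in range(patch)]  # zero padding everywhere
    for r, c in coords:
        out[top + r - r0][left + c - c0] = ct_slice[r][c]
    return out


# Hypothetical 3 x 3 slice and mask, used only to exercise the function.
ct = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
nodule_mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]
nroi = extract_nroi(ct, nodule_mask, patch=4)
```

Only the two masked pixels (values 5 and 6) survive in the output. Every other entry of the 4 x 4 patch is zero.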
Feature extraction reduces the dimensionality of the input data to a minimal set of features. In this study, salient features are extracted from the segmented NROI images with both traditional hand-crafted and CNN methods; these features play a prominent role in classifying the nodule images into the specified classes. A total of 23 discriminative features are extracted from the traditional hand-crafted statistical and texture descriptors.
Traditional based feature extraction
Statistical features are histogram-based features that describe the frequencies of pixel gray levels, ranging from 0 to 255, in the image datasets. The statistical features are extracted for both benign (low malignancy) and malignant (high malignancy) samples. The commonly extracted histogram features are mean, standard deviation, variance, skewness and kurtosis. They are defined as follows.
Mean: the average gray-level pixel value of a particular region or segmented area. It reflects overall intensity rather than texture.
Variance: the fluctuation of the gray levels around the mean gray-level value. The statistical distribution of variance helps distinguish low-contrast profiles by their texture.
Standard deviation: the square root of the variance, indicating the image contrast. Images with low contrast have low variance values, whereas images with high contrast have high variance values.
Architecture of the CNN.
Skewness: a measure of the asymmetry of the gray-level values around the sample mean. The skewness of a histogram is categorized as positive, negative or zero.
Kurtosis: a measure of how prone a distribution is to outliers. Kurtosis describes the shape of the tails of the histogram distribution.
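For illustration, the five histogram features can be computed as in the sketch below. It assumes the population (1/N) definitions and non-excess kurtosis, which may differ from the exact formulas used in the study. The function name `histogram_features` is hypothetical, and whether zero-padded background pixels are included is left to the caller.

```python
import math


def histogram_features(pixels):
    """First-order statistics of a flat list of gray-level values.

    Uses population (1/N) moments; kurtosis is the plain fourth
    standardised moment (a normal distribution would give 3).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant region has no asymmetry or tails to measure.
        skewness, kurtosis = 0.0, 0.0
    else:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical gray levels, used only to exercise the function.
features = histogram_features([1, 2, 3, 4, 5])
```

For this symmetric toy input the mean is 3, the variance is 2 and the skewness is 0, which matches the interpretation of skewness as a measure of asymmetry around the mean.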
Texture features – The texture of the image is calculated using probability scores. The gray level co-occurrence matrix (GLCM) and wavelet features are extracted to improve classification performance and accuracy. GLCM texture features estimate how often pairs of pixels with given gray-level values occur in a given spatial relationship. The GLCM features extracted in the experiment are energy, entropy, correlation, contrast, homogeneity, autocorrelation, cluster prominence, cluster shade, difference entropy, difference variance, dissimilarity, information measure of correlation 1, information measure of correlation 2, inverse difference, maximum probability, sum average, sum entropy, sum of squares variance and sum variance. Of these, energy, entropy, homogeneity, contrast, correlation, sum average and sum entropy play a prominent role in classifying images as benign or malignant. They are defined as follows.
Energy: also known as the angular second moment or uniformity. It is the sum of the squared elements of the GLCM and describes the uniformity of the gray-level distribution.
Entropy: a measure of the randomness of the gray-level distribution, and hence of the amount of information needed to compress the image. Images with low entropy exhibit low contrast, whereas images with high entropy exhibit large contrast in pixel values.
Homogeneity: a measure of how close the distribution of elements in the GLCM is to the GLCM diagonal.
Contrast: a measure of the local variation in the GLCM, that is, of the intensity difference between a pixel and its neighbour.
Correlation: a measure of the joint probability of occurrence of gray-level pixel pairs with linear dependency.
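The following sketch shows how a GLCM and five of the texture features above could be computed in plain Python. Several choices are assumptions made for illustration: the pixel offset defaults to one pixel to the right, the matrix is not symmetrised, homogeneity uses the 1 + |i - j| weighting, and the function name `glcm_features` is hypothetical.

```python
import math


def glcm_features(img, levels, dr=0, dc=1):
    """Build a normalised GLCM for the offset (dr, dc) and derive five features.

    img is a 2-D list of integer gray levels in the range 0 .. levels - 1.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1  # count the pixel pair
    total = sum(map(sum, counts))
    p = [[v / total for v in row] for row in counts]  # joint probabilities

    idx = range(levels)
    cells = [(i, j, p[i][j]) for i in idx for j in idx]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))

    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    if sd_i > 0 and sd_j > 0:
        correlation = sum((i - mu_i) * (j - mu_j) * v
                          for i, j, v in cells) / (sd_i * sd_j)
    else:
        correlation = 0.0  # undefined for a constant image; 0 by convention here
    return {
        "energy": energy,
        "entropy": entropy,
        "contrast": contrast,
        "homogeneity": homogeneity,
        "correlation": correlation,
    }


# Hypothetical two-level image with one uniform row per gray level.
texture = glcm_features([[0, 0], [1, 1]], levels=2)
```

In this toy image every horizontal pixel pair has equal gray levels, so all the probability mass lies on the GLCM diagonal: contrast is 0 and homogeneity is 1, in line with the definitions above.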

Learning curve of the proposed methodology using ‘Adam’ optimizer.
Figure 5 depicts the architecture of the proposed CNN algorithm. Each CT scan of dimension 512
The first convolution layer is of size 5
Segregation of input samples based on the malignancy levels
Nodule diagnosis by 4 expert radiologists.
For the traditional feature extraction methods, support vector machine (SVM) classifiers are used to distinguish the nodules as benign or malignant based on the suspected malignancy levels. SVMs are straightforward supervised learning algorithms used for classification. The re-sampled 52
Where
DCNN architecture performance measure using Adam optimizer
DCNN architecture performance measure using Sgdm optimizer
Images false positively accepted as high malignancy nodules.
Images false positively accepted as low malignancy nodules.
The proposed methodology was first evaluated on the segmented NROI samples, resulting in a classification accuracy of 93.46%. To improve this performance, we investigated the inconsistencies that arose in nodule classification. The hand-crafted statistical and texture features were observed to have overlapping values. A study was carried out to identify the inconsistencies among candidate regions with different degrees of malignancy suspiciousness. Because the candidate nodules are segmented using marked-up annotations and their corresponding ground truth values and masks, the annotations in each xml file associated with the CT scans were analyzed to trace the inconsistencies. In some cases, a particular candidate region was assigned different malignancy levels (3, 4, 5) by the four radiologists for the same CT, as shown in Fig. 7. For example, one radiologist rates the candidate region as malignancy level 3, while the other radiologists rate the same candidate region as malignancy level 4 or 5. Owing to this inconsistent malignancy suspiciousness, segmented images were wrongly categorized as benign or malignant samples, which lowered the overall performance and produced false positive classifications of candidate nodule regions.
From a detailed study, the images with overlapping feature values were identified and traced back to re-check the corresponding radiologists' evaluations. Some candidate regions had been falsely accepted as nodules by radiologists (for example, images with ribs, vessels or diaphragm). A number of candidate nodule regions had been accepted as high-malignancy samples although they should have been considered low-malignancy samples, as depicted in Fig. 8; the extracted features of these images fall within the range of values characteristic of benign samples. Similarly, Fig. 9 shows candidate regions accepted as low-malignancy samples that should, with high probability, have been considered high-malignancy samples; the extracted features of these images fall within the range of values characteristic of malignant samples. By segregating the samples manually according to their malignancy levels, a total of 70 samples were identified as falsely categorized owing to ambiguous malignancy suspiciousness and were tested in phases, as shown in Table 3. These candidate nodules larger than 3 mm with incorrect malignancy levels were regrouped manually by placing each candidate nodule in its corresponding folder, forming a nodule grouped dataset.
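A simple way to surface such disagreements automatically is sketched below. The rule is an assumption made for illustration and is not taken from the study: the ratings of the radiologists are aggregated by their median, level 3 is treated as the boundary between low and high malignancy, and a nodule is flagged when its ratings touch or straddle that boundary. The function name `label_nodule` is hypothetical.

```python
from statistics import median


def label_nodule(ratings, threshold=3):
    """Aggregate per-radiologist malignancy ratings (1 to 5) for one nodule.

    Returns (label, ambiguous). The median rule and the threshold of 3
    are assumptions of this sketch.
    """
    m = median(ratings)
    if m < threshold:
        label = "benign"
    elif m > threshold:
        label = "malignant"
    else:
        label = "uncertain"
    lowest, highest = min(ratings), max(ratings)
    # Flag nodules whose ratings disagree and include or cross the threshold.
    ambiguous = lowest != highest and lowest <= threshold <= highest
    return label, ambiguous


# The disagreement described in the text: levels 3, 4 and 5 for one region.
example = label_nodule([3, 4, 5])
consistent = label_nodule([4, 5, 5, 4])
```

Under these assumptions the region rated 3, 4 and 5 is labelled malignant but flagged as ambiguous, which would mark it for the kind of manual re-check described above.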
Proposed methodology tested using ‘Adam’ and ‘Sgdm’ optimizer.
Comparison of proposed CNN with traditional CADx
Comparison of results with works related to traditional methods
Comparison of results with works related to state-of-art methods
The CNN architecture was tested with different configurations of convolution layers, filter sizes and epochs for performance evaluation. The CNN was trained on the nodule grouped dataset with the Adam optimizer and varying input parameters, as shown in Table 4, reaching a highest classification accuracy of 96.48%. The experiments were then repeated with the Sgdm optimizer; the results are given in Table 5, with a maximum classification accuracy of 93.9%. The Adam optimizer exhibited better performance measures than the Sgdm optimizer, as shown in Fig. 10. Finally, based on these experiments, we fixed the hidden layers' neurons as [12, 8, 6, 4] with kernel sizes of [5, 3, 5, 3] for the remaining experiments because of their relative stability.
The proposed methodology showed a consistent improvement on the nodule grouped dataset, with a highest classification accuracy of 96.48% compared with the initial dataset. Table 6 shows the performance of the CNN and the traditional hand-crafted methods for both datasets. The methodology performs well compared with traditional hand-crafted methods. In addition, the algorithm outperformed the traditional segmentation and classification methods and the conventional state-of-the-art methods, as shown in Tables 7 and 8 respectively.
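To give a feel for the size of the selected configuration, the sketch below counts the trainable parameters of four stacked convolution layers. It rests on assumptions that the text does not confirm: the values [12, 8, 6, 4] are read as the number of filters per layer, the input has a single channel, convolutions use stride 1 without padding, and no pooling layers sit in between. The input side of 32 pixels and the function name `conv_stack_summary` are purely hypothetical.

```python
def conv_stack_summary(input_side, filters, kernels, in_channels=1):
    """Return (output_side, n_parameters) for each convolution layer.

    Assumes square kernels, stride 1, no padding and no pooling.
    """
    summary = []
    side, channels = input_side, in_channels
    for n_filters, k in zip(filters, kernels):
        side = side - k + 1  # "valid" convolution shrinks each side by k - 1
        if side <= 0:
            raise ValueError("input is too small for this kernel sequence")
        # Each filter has k * k * channels weights plus one bias term.
        params = n_filters * (k * k * channels + 1)
        summary.append((side, params))
        channels = n_filters  # the next layer sees one channel per filter
    return summary


# Filter counts and kernel sizes from the text; the input side is hypothetical.
layers = conv_stack_summary(32, filters=[12, 8, 6, 4], kernels=[5, 3, 5, 3])
total_parameters = sum(params for _, params in layers)
```

Under these assumptions the four convolution layers hold 312, 872, 1206 and 220 parameters respectively, 2610 in total, so the convolutional part of the network is small.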
Area under the Receiver Operating Characteristic curve (AUC).
Confusion matrix of the proposed methodology.
The proposed approach used 710 candidate nodule images, proportionally separated into training and testing sets. The statistical and texture features were extracted from the segmented solitary nodules and classified using an SVM classifier. In addition, end-to-end automated feature learning was employed using a CNN. In general, some structures may be falsely classified as true nodules if irregularly shaped structures resemble actual nodules in their morphological appearance. Hence, identifying and designing a set of features relevant to a particular image dataset for classification is really challenging. Designing features is time-consuming and may not guarantee good results if the correlations between features are not properly considered. With a minimal set of extracted feature maps, the approach was efficient in analyzing lung cancers by distinguishing different nodule structures and classifying them as benign or malignant according to the categorized malignancy levels. Compared with previous works, the methodology gives good results for lung nodule classification, with an area under the receiver operating characteristic curve (AUC) of 0.969, a classification accuracy of 96.5%, a sensitivity of 96% and a specificity of 96.55%. Figures 11 and 12 depict the corresponding ROC curve and confusion matrix respectively.
Generalizing a template for segmenting only the solitary nodules was also difficult because of the varying sizes of the irregularly shaped nodules. Therefore, all the images in the database were re-sampled creating a 52
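For reference, the reported measures follow from the confusion matrix and the classifier scores in the standard way, as the sketch below shows. Malignant is taken as the positive class, which is an assumption. The counts and scores in the example are hypothetical and are not the study's results, and the function names are illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the probability that a positive sample
    scores higher than a negative one (ties count as one half)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, used only to exercise the functions.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
area = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

With these made-up inputs the accuracy is 0.85, the sensitivity 0.90 and the specificity 0.80, and five of the six positive-negative score pairs are ordered correctly, giving an AUC of 5/6.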
Performance evaluation of the proposed method with varying testing datasets
All the methods described above were run in MATLAB 2018b on a desktop machine with 8 GB of memory, a 12-core (4 C and 8 G) AMD A10 processor and an Nvidia GeForce GTX 960 GPU.
The proposed methodology demonstrates consistent results for classifying lung nodules as benign (low malignancy level) or malignant (high malignancy level) samples based on the categorized malignancy levels. It shows that the solitary nodule regions of interest can be detected and segmented efficiently from lung CT images. The NROI rectangle of 52
Future work
Although the current results are encouraging, increasing the number of deep learning layers may improve the performance of detailed volumetric analysis of cancer nodules in medical image analysis. A detailed investigation of the optimal input patch size and filter sizes remains to be carried out.
