Abstract
Lung cancer is the most common cancer worldwide, and identifying malignant tumors at an early stage is essential for diagnosis and treatment, so that progression to a later stage can be avoided. In recent years, deep learning architectures such as convolutional neural networks (CNNs) have shown promising results in identifying malignant tumors in CT scans. In this paper, we combine CNN features with texture features, namely Haralick and gray level run length matrix features, to benefit from both the high level and the spatial features extracted from lung nodules and thereby improve classification accuracy. The combined features are classified using an SVM classifier instead of a softmax classifier in order to reduce overfitting. Our model was validated on the LUNA dataset and achieved an accuracy of 93.53%, a sensitivity of 86.62%, a specificity of 96.55%, and a positive predictive value of 94.02%.
Introduction
Lung cancer is the most common cancer and has the highest mortality rate [1]. Most lung cancers are detected at a late stage, which increases patient mortality [2]. Early diagnosis is therefore essential to improve the survival rate, and it is made possible by screening high risk patients with low dose CT scans. Detecting malignant nodules at an early stage is challenging for the radiologist because early malignant nodules and benign nodules are highly similar. Since the radiologist has to identify malignant nodules in every slice of a CT scan, there is a need for a computer aided detection system that detects malignant tumors automatically. Researchers have proposed several machine learning algorithms for nodule classification, including SVM, Naive Bayes, decision trees, linear regression and neural networks. A neural network is a machine learning algorithm with three kinds of layers (input, hidden and output) that learns from labeled data. Recently, deep learning, and especially the convolution neural network, has played a vital role in medical image classification.
A CNN is composed of three types of layers: convolution, pooling and fully connected layers. A sequence of convolution and pooling layers extracts features from the image, whereas the fully connected and softmax layers classify the image based on the low level and high level features extracted. Several machine learning algorithms, especially deep learning algorithms [3–6], have been proposed to detect pulmonary nodules in CT, and among them CNNs have shown promising results because they extract high level features from the image automatically. Researchers have extracted both 2D-CNN [18, 19] and 3D-CNN [20, 21] features from CT scan images. A 2D-CNN cannot capture spatial information across neighbouring slices, but it requires less computation than a 3D-CNN. Several researchers have proposed variations of the CNN architecture to improve classification performance, such as the Multi-Crop Convolution Neural Network [22], Multi Scale Convolution [23], Multi-Level based Deep CNN [24], Multi-Pathway CNN architecture [25], Multi-Resolution CNN [26] and Multi-View CNN [27]. Although these variations have slightly improved the accuracy of lung nodule classification, model performance still needs to be enhanced to identify malignancy in nodules at an early stage.
Feature extraction is the most important phase in classification. Texture, defined as the spatial variation in pixel intensity, plays an important role in image classification, and texture analysis can be used to distinguish pathologically different regions of the lung. Statistical texture analysis, both second order and higher order, has provided better results in medical image classification [14, 15]. In this paper we extract second order and higher order statistical information from the CT image using gray level co-occurrence matrix and gray level run length matrix features, and use it along with the CNN features to improve the accuracy of predicting malignancy in lung nodules.
Preliminaries
Convolution neural network
A Convolution Neural Network (CNN) [7] uses multiple hidden layers, each of which learns features from the image. Our CNN architecture uses three convolution layers, three pooling layers, one fully connected layer and a softmax layer. Each convolution layer uses a ReLU activation function and is followed by a pooling layer. The CNN architecture is summarized in Fig. 1.

Extraction of CNN features.
The input to the first convolution layer is a nodule patch of size 50×50. This layer uses a 3×3 kernel with a stride of 1 and produces 32 feature maps of size 48×48, which are passed through a ReLU layer that retains only the positive values. The output of the ReLU layer is sent to a 2×2 max pooling layer, giving 32 feature maps of size 24×24. This output is fed to the second convolution layer, which uses a 3×3 kernel and generates 64 feature maps of size 22×22. These are passed through a ReLU layer followed by a max pooling layer, producing 64 feature maps of size 11×11. The third convolution layer, again with a 3×3 kernel, produces 128 feature maps of size 9×9, which are passed through a ReLU layer and a final max pooling layer to obtain 128 feature maps of size 4×4. The output is flattened into 2048 features, which are fed to the fully connected layer to obtain a feature vector of size 128.
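The feature map sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes unpadded ("valid") convolutions and non-overlapping pooling with floor rounding, which is what the quoted sizes imply; the function names are ours and purely illustrative.

```python
def conv_out(size, kernel=3, stride=1):
    """Spatial size after an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling (floor rounding)."""
    return size // window


def trace_shapes(input_size=50, filters=(32, 64, 128)):
    """Follow the patch through three conv + ReLU + max-pool stages."""
    shapes = []
    size = input_size
    for n_filters in filters:
        size = conv_out(size)
        shapes.append(("conv+relu", n_filters, size))
        size = pool_out(size)
        shapes.append(("maxpool", n_filters, size))
    flattened = filters[-1] * size * size
    return shapes, flattened


shapes, flattened = trace_shapes()
for name, channels, size in shapes:
    print(f"{name:10s} {channels:4d} maps of {size}x{size}")
print("flattened length:", flattened)
```

The trace reproduces the 48, 24, 22, 11, 9 and 4 pixel sizes and the flattened length of 128 × 4 × 4 = 2048.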
Texture is a very important feature for analyzing images. Texture based features [8] are used by many researchers and have provided promising results. We use both second order and higher order statistical texture features, obtained from the gray level co-occurrence matrix and the gray level run length matrix, respectively. Both kinds of texture feature capture the spatial relationship between the gray levels of pixels.
Gray level co-occurrence matrix (GLCM)
The GLCM captures second order statistical features, i.e. the spatial relationship between pairs of pixels, using co-occurrence matrices. A co-occurrence matrix is created for a direction θ and a distance d, which together define the relative position of the two pixels in a pair. Each entry of the co-occurrence matrix is the number of occurrences of a pair of gray levels in pixels separated by distance d at angle θ. The resulting GLCM is used to extract statistical texture features from the lung nodule in the CT scan. In this paper, we use d = 1, 2, 3, 4 and θ = 0°, 45°, 90° and 135°, which gives sixteen different gray level co-occurrence matrices. Haralick [9] used the co-occurrence matrix to define 14 statistical features. We extract six of these texture features, namely contrast, dissimilarity, homogeneity, angular second moment (ASM), energy and correlation, from each co-occurrence matrix. Since there are sixteen co-occurrence matrices and six features per matrix, 96 features are gathered from each image.
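The features are described below with respect to a co-occurrence matrix P of size M × N, normalized so that its entries sum to 1:
(1) Contrast: measures the local variation in the image and is given by
$$\text{Contrast} = \sum_{i=1}^{M} \sum_{j=1}^{N} P(i,j)\,(i-j)^2$$
(2) Dissimilarity: measures the variation between pairs of pixels and is given by
$$\text{Dissimilarity} = \sum_{i=1}^{M} \sum_{j=1}^{N} P(i,j)\,|i-j|$$
(3) Homogeneity: measures how close the distribution of GLCM elements is to the GLCM diagonal and is given by
$$\text{Homogeneity} = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{P(i,j)}{1+(i-j)^2}$$
(4) Angular Second Moment: measures the uniformity of the image and is given by
$$\text{ASM} = \sum_{i=1}^{M} \sum_{j=1}^{N} P(i,j)^2$$
(5) Energy: measures the information in the image and is given by
$$\text{Energy} = \sqrt{\text{ASM}}$$
(6) Correlation: measures the dependency of a gray level on that of the neighbouring pixels and is given by
$$\text{Correlation} = \sum_{i=1}^{M} \sum_{j=1}^{N} \frac{P(i,j)\,(i-\mu_i)(j-\mu_j)}{\sigma_i\,\sigma_j}$$
where
$$\mu_i = \sum_{i=1}^{M} \sum_{j=1}^{N} i\,P(i,j), \quad \mu_j = \sum_{i=1}^{M} \sum_{j=1}^{N} j\,P(i,j), \quad \sigma_i^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} (i-\mu_i)^2\,P(i,j), \quad \sigma_j^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} (j-\mu_j)^2\,P(i,j)$$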
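As an illustration of how the sixteen matrices and 96 features arise, the following pure-Python sketch builds a symmetric, normalized GLCM and evaluates the six features using their standard definitions. The pixel offsets chosen for each angle, the symmetric counting, the value returned for correlation on a constant image, and the tiny 4-level demo image are all assumptions made for this example, not details taken from the paper.

```python
import math

# Assumed (row, col) unit offsets per angle, with rows increasing downward.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, d, angle):
    """Symmetric, normalized co-occurrence matrix for distance d and angle."""
    dr, dc = OFFSETS[angle][0] * d, OFFSETS[angle][1] * d
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both orders
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("distance too large for this image")
    return [[v / total for v in row] for row in counts]


def glcm_features(P):
    """Contrast, dissimilarity, homogeneity, ASM, energy and correlation."""
    cells = [(i, j, p) for i, row in enumerate(P) for j, p in enumerate(row)]
    asm = sum(p * p for _, _, p in cells)
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    var_i = sum(p * (i - mu_i) ** 2 for i, _, p in cells)
    var_j = sum(p * (j - mu_j) ** 2 for _, j, p in cells)
    denom = math.sqrt(var_i * var_j)
    cov = sum(p * (i - mu_i) * (j - mu_j) for i, j, p in cells)
    return {
        "contrast": sum(p * (i - j) ** 2 for i, j, p in cells),
        "dissimilarity": sum(p * abs(i - j) for i, j, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "asm": asm,
        "energy": math.sqrt(asm),
        # Constant image: variance is zero, so report 1.0 by convention.
        "correlation": cov / denom if denom > 0 else 1.0,
    }


# Hypothetical 5x5 patch quantized to 4 gray levels.
demo = [
    [0, 0, 1, 1, 2],
    [0, 1, 1, 2, 2],
    [1, 1, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [2, 2, 3, 3, 3],
]

vector = []
for d in (1, 2, 3, 4):
    for angle in (0, 45, 90, 135):
        vector.extend(glcm_features(glcm(demo, 4, d, angle)).values())
print(len(vector))  # 16 matrices x 6 features
```

A real nodule patch would first be quantized to a fixed number of gray levels; the number of levels is a free parameter that the text does not specify.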
Higher order statistical information can be obtained from the gray level run length matrix (GLRLM) [10]. A gray level run is a group of consecutive pixels with the same gray level aligned in a given direction. The GLRLM records, for a particular direction, how often each combination of gray level and run length occurs.
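Galloway [10] introduced five texture features computed from the gray level run length matrix S, which has N_S gray levels and N_R run lengths. Here S(i,j) is the number of runs of gray level i and length j, n_r = \sum_{i}\sum_{j} S(i,j) is the total number of runs, and n_p is the number of pixels in the image. The formula for each feature is given below:
(1) Short Run Emphasis (SRE):
$$\text{SRE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} \frac{S(i,j)}{j^2}$$
(2) Long Run Emphasis (LRE):
$$\text{LRE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} S(i,j)\,j^2$$
(3) Gray Level Non uniformity (GLN):
$$\text{GLN} = \frac{1}{n_r} \sum_{i=1}^{N_S} \Big( \sum_{j=1}^{N_R} S(i,j) \Big)^2$$
(4) Run Length Non uniformity (RLN):
$$\text{RLN} = \frac{1}{n_r} \sum_{j=1}^{N_R} \Big( \sum_{i=1}^{N_S} S(i,j) \Big)^2$$
(5) Run Percentage (RP):
$$\text{RP} = \frac{n_r}{n_p}$$
Chu et al. [11] introduced two more texture features, LGLRE and HGLRE, which make use of the gray level distribution.
(6) Low Gray Level Run Emphasis (LGLRE):
$$\text{LGLRE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} \frac{S(i,j)}{i^2}$$
(7) High Gray Level Run Emphasis (HGLRE):
$$\text{HGLRE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} S(i,j)\,i^2$$
(8) Short Run Low Gray Level Emphasis (SRLGE):
$$\text{SRLGE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} \frac{S(i,j)}{i^2\,j^2}$$
(9) Short Run High Gray Level Emphasis (SRHGE):
$$\text{SRHGE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} \frac{S(i,j)\,i^2}{j^2}$$
(10) Long Run Low Gray Level Emphasis (LRLGE):
$$\text{LRLGE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} \frac{S(i,j)\,j^2}{i^2}$$
(11) Long Run High Gray Level Emphasis (LRHGE):
$$\text{LRHGE} = \frac{1}{n_r} \sum_{i=1}^{N_S} \sum_{j=1}^{N_R} S(i,j)\,i^2\,j^2$$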
In this study, we created four GLRLMs, one for each of the directions θ = 0°, 45°, 90° and 135°. We extract the 11 higher order texture features from each of the four GLRLMs, creating a feature vector of size 44.
In this paper, we propose a model that combines the features extracted by a CNN with second and higher order texture features. In addition to the dominant features learned by the CNN, the model benefits from the statistical features extracted from the gray level co-occurrence matrix and the gray level run length matrix, which capture spatial information between pixels.
The proposed architecture is outlined in Fig. 2.
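A minimal sketch of the run length computation follows, using the standard definitions of the eleven features. Gray levels are indexed from 1 inside the formulas to avoid division by zero; that indexing, the step vectors chosen for each angle, and the 3×3 demo image are assumptions made for this example only.

```python
# Assumed (row, col) step per direction, with rows increasing downward.
STEPS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glrlm(image, levels, angle):
    """S[g][k] = number of runs of gray level g with length k + 1."""
    dr, dc = STEPS[angle]
    rows, cols = len(image), len(image[0])
    S = [[0] * max(rows, cols) for _ in range(levels)]

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols

    for r in range(rows):
        for c in range(cols):
            g = image[r][c]
            if inside(r - dr, c - dc) and image[r - dr][c - dc] == g:
                continue  # this pixel continues a run started earlier
            length, nr, nc = 1, r + dr, c + dc
            while inside(nr, nc) and image[nr][nc] == g:
                length, nr, nc = length + 1, nr + dr, nc + dc
            S[g][length - 1] += 1
    return S


def glrlm_features(S, n_pixels):
    """The eleven run length features, with 1-based gray level i and run length j."""
    cells = [(i + 1, j + 1, s)
             for i, row in enumerate(S) for j, s in enumerate(row) if s]
    n_runs = sum(s for _, _, s in cells)

    def mean(weight):
        return sum(s * weight(i, j) for i, j, s in cells) / n_runs

    return {
        "SRE": mean(lambda i, j: 1 / j**2),
        "LRE": mean(lambda i, j: j**2),
        "GLN": sum(sum(row) ** 2 for row in S) / n_runs,
        "RLN": sum(sum(col) ** 2 for col in zip(*S)) / n_runs,
        "RP": n_runs / n_pixels,
        "LGLRE": mean(lambda i, j: 1 / i**2),
        "HGLRE": mean(lambda i, j: i**2),
        "SRLGE": mean(lambda i, j: 1 / (i**2 * j**2)),
        "SRHGE": mean(lambda i, j: i**2 / j**2),
        "LRLGE": mean(lambda i, j: j**2 / i**2),
        "LRHGE": mean(lambda i, j: i**2 * j**2),
    }


# Hypothetical 3x3 patch with 3 gray levels.
demo = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 0, 0],
]

vector = []
for angle in (0, 45, 90, 135):
    vector.extend(glrlm_features(glrlm(demo, 3, angle), n_pixels=9).values())
print(len(vector))  # 4 directions x 11 features
```

In the horizontal direction the demo image contains five runs (two of gray level 0 with length 2, one of gray level 1 with length 1, one of gray level 1 with length 3, and one of gray level 2 with length 1), so its run percentage is 5/9.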

Proposed architecture.
The input to the model is a segmented lung nodule, which is fed to the CNN model to extract CNN features. The image is passed through a sequence of convolution and pooling layers to generate a feature vector of size 2048, which is then fed to the fully connected layer to obtain a feature vector of size 128. Before features are extracted, the CNN model is trained on a training dataset of 1530 lung nodule images so as to update the weights. The lung nodule is also converted into sixteen GLCMs for θ = 0°, 45°, 90°, 135° and d = 1, 2, 3 and 4. From each GLCM we extract six statistical features: contrast, dissimilarity, homogeneity, angular second moment, energy and correlation. This gives 96 Haralick texture features per lung nodule. The lung nodule is further converted into four GLRLMs for the four directions 0°, 45°, 90° and 135°, and 11 higher order statistical features are extracted from each GLRLM, giving a feature vector of size 44. The 128 CNN features are concatenated with the 96 Haralick features and the 44 GLRLM features, and the concatenated feature vector is fed to an SVM classifier that predicts whether the nodule is benign or malignant. SVM has proved to be a very good binary classifier for high dimensional feature sets [13].
The training and testing phases are outlined in Fig. 3. In the training phase, we extract the CNN, GLRLM and GLCM features from the training dataset, normalize them and concatenate them. The concatenated features are used to train the SVM model with L2 regularization. We used 10-fold cross validation to evaluate the classification model. In the testing phase, the CNN, GLRLM and GLCM features are extracted from the test dataset and concatenated. The final features are then used to predict the malignancy of the nodule using the trained linear SVM classifier.

Training and testing phase.
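The normalize-and-concatenate step can be sketched as follows. The text does not say which normalization was used, so z-score scaling per feature is an assumption here, as are the function names and the random placeholder values standing in for real CNN, GLCM and GLRLM features.

```python
import random
import statistics


def zscore_columns(rows):
    """Scale each feature (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant columns, whose standard deviation is 0.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse(cnn, glcm, glrlm):
    """Normalize each feature block, then concatenate per sample."""
    blocks = [zscore_columns(block) for block in (cnn, glcm, glrlm)]
    return [c + g + r for c, g, r in zip(*blocks)]


# Placeholder features for 5 hypothetical nodules.
rng = random.Random(0)
n_samples = 5
cnn = [[rng.random() for _ in range(128)] for _ in range(n_samples)]
glcm = [[rng.random() for _ in range(96)] for _ in range(n_samples)]
glrlm = [[rng.random() for _ in range(44)] for _ in range(n_samples)]

fused = fuse(cnn, glcm, glrlm)
print(len(fused), len(fused[0]))  # samples, 128 + 96 + 44 features
```

In practice the means and standard deviations would be estimated on the training set only and reused unchanged on the test set, so that no test information leaks into training.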
Feature extraction is a key phase in classification. The proposed approach combines texture and deep features to benefit from both the high level features of the deep network and the spatial relationships between pixels captured by the GLCM and GLRLM. Texture based metrics are well suited to detecting tumor heterogeneity. The features extracted in the proposed work are outlined in Fig. 4.

Feature extraction stage in proposed approach.
The SVM classifier works well on high dimensional datasets. We use an SVM classifier with L2 regularization instead of a softmax classifier, which slightly improves the accuracy of the classification system. It also reduces the risk of overfitting and the generalization error of the CNN.
Experimental setup
The training and test samples were evaluated on a Windows machine with an i7 processor and an NVIDIA GeForce RTX 2070 (8 GB on-board memory). Our networks were implemented with the Keras 2.2.4 library using TensorFlow as the backend.
Data description
The dataset was taken from the publicly available LUNA16 [16], which contains 888 lung CT scans. The dataset is split into 10 subsets of equal size, and the CT scan of each patient contains 200 slices of 512×512 pixels. The ground truth of each lung nodule was assessed by a team of four radiologists: if the average score of the four radiologists is greater than 3, the nodule is labeled as malignant; otherwise it is labeled as benign.
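The labeling rule can be written as a one-line threshold on the mean score. The function name and the example score lists below are hypothetical.

```python
def label_nodule(scores, threshold=3.0):
    """Label a nodule from the radiologists' malignancy scores.

    A nodule is malignant only if the mean score is strictly greater
    than the threshold; a mean of exactly 3 is labeled benign.
    """
    mean_score = sum(scores) / len(scores)
    return "malignant" if mean_score > threshold else "benign"


print(label_nodule([4, 4, 3, 3]))  # mean 3.5
print(label_nodule([3, 3, 3, 3]))  # mean 3.0, not greater than 3
```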
Data pre-processing and augmentation
CT images are stored in MetaImage format, which consists of a .mhd header file and a .raw binary file. The dataset provides an annotation file that contains the UID, the x, y, z coordinates of each finding, and the class it belongs to. Nodules were cropped based on the Cartesian coordinates given in the annotation file. The dataset contains 551065 annotations, of which 1351 are labeled as malignant and the rest as benign. This class imbalance can cause overfitting. We address it by augmenting the positive samples and downsampling the negative samples. Data augmentation was achieved by rotating each positive sample by 90° and 180°.
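The rebalancing step can be sketched in plain Python as below. The rotation direction (clockwise), the random downsampling strategy, the seed and all names are assumptions; the text only states that positives were rotated by 90° and 180° and that negatives were downsampled.

```python
import random


def rotate90(patch):
    """Rotate a 2D patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def rotate180(patch):
    """Rotate a 2D patch by 180 degrees."""
    return [row[::-1] for row in patch[::-1]]


def augment_positive(patch):
    """Return the original patch plus its 90 and 180 degree rotations."""
    return [patch, rotate90(patch), rotate180(patch)]


def downsample(negatives, k, seed=0):
    """Keep a random subset of k negative samples (no replacement)."""
    return random.Random(seed).sample(negatives, k)


patch = [[1, 2],
         [3, 4]]
for variant in augment_positive(patch):
    print(variant)

kept = downsample(list(range(100)), k=10)
print(len(kept))
```

Each positive patch therefore contributes three training samples, while the negative class is reduced to a chosen size.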
Simulation
The LUNA dataset provides an annotation file that records the x, y, z coordinates of each nodule, and nodules were cropped based on these coordinates. The database consists of 6131 training samples, 1903 test samples and 1534 validation samples. We first trained the CNN for 70 epochs, with the learning rate and batch size set to 0.02 and 100, respectively. The loss function was categorical cross entropy, and the Adam optimizer provided better training accuracy in this experiment. The weights were updated after every epoch, and the updated weights were then used to generate the CNN feature maps for training and testing. During the training process, the trained CNN model generates the CNN features for each nodule. We used 80% of the dataset for training and 20% for testing.
Table 1 summarizes the accuracy of the proposed model with different classifiers.
Accuracy of the proposed work with different classifier
It is evident from Table 1 that SVM provides better results than the other classifiers. With the proposed architecture, the SVM classifier achieves an accuracy of 93.53%, a sensitivity of 86.62% and a specificity of 96.55%. Figure 5 plots the training score and the validation score of the SVM classifier for different values of the gamma parameter. Both scores increase as gamma increases, which suggests that the model does not overfit.

Validation curve with SVM classifier.
The receiver operating characteristic (ROC) curve evaluates the performance of a classification algorithm by plotting the true positive rate against the false positive rate; it is shown in Fig. 6. The AUC score of the model is 0.98, which indicates a good degree of separation between the two classes.

ROC Curve for the proposed work.
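For reference, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the labels and scores are made-up values, not results from this work.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for two benign (0) and two malignant (1) nodules.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of the 4 pairs are ranked correctly
```

This pairwise form is quadratic in the number of samples and is meant only to make the definition concrete.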
We also observe that the accuracy of the classification system increases when texture features are added. To evaluate our classifier, we use four metrics: accuracy, sensitivity, specificity and positive predictive value.
The formulas for these metrics are given below for reference:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{Positive predictive value} = \frac{TP}{TP + FP}$$
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively, obtained from the confusion matrix.
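The four metrics follow directly from the confusion matrix counts, as in the sketch below. The counts used in the example are hypothetical and are not the confusion matrix of this work.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and positive predictive value."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # precision
    }


# Hypothetical counts: 80 TP, 90 TN, 10 FP, 20 FN.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```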
Table 2 compares the accuracy of the proposed approach with that of other deep learning models.
Comparison of accuracy of deep learning model with proposed approach
From Table 2, it is evident that the proposed approach, with an accuracy of 93.53%, a sensitivity of 86.62% and a specificity of 96.55%, performs better than the other deep learning models.
In this paper, we proposed a computer aided detection system that extracts CNN, GLCM and GLRLM features and classifies lung nodules using a support vector machine. The proposed system achieved an accuracy of 93.53% and a positive predictive value of 94.02%. It also achieved a sensitivity of 86.62% while reducing the number of false positives. Adding statistical texture features to the CNN features improved the accuracy of the CNN model. We also observed that the SVM classifier provided better classification accuracy than the other classification algorithms.
