Abstract
Colon cancer has one of the highest mortality rates among diagnosed cancers worldwide. Histopathological analysis, however, relies on the expertise of pathologists and is a demanding, time-consuming process. Automated diagnosis of colon cancer from biopsy examination therefore plays an important role in patient diagnosis and prognosis. Because conventional handcrafted feature extraction requires specialized experience to select realistic features, deep learning is preferred, as abstract high-level features can be extracted automatically. This paper presents a colon cancer detection system that uses transfer learning architectures to automatically extract high-level features from colon biopsy images for automated diagnosis and prognosis. In this study, image features are extracted from a pre-trained convolutional neural network (CNN) and used to train a Bayesian optimized Support Vector Machine (BO-SVM) classifier. The AlexNet, VGG-16, and Inception-V3 pre-trained networks were compared to identify the best network for colon cancer detection. The proposed framework is evaluated on four datasets: two collected from Indian hospitals (with magnifications of 4X, 10X, 20X, and 40X) and two public colon image datasets. Compared with existing classifiers and methods on the public datasets, the test results show that the Inception-V3 network, with accuracy ranging from 96.5% to 99%, is best suited for the proposed framework.
Introduction
Colorectal cancer is common and is the fourth-leading cause of cancer-related deaths worldwide. In 2018, it accounted for 10% of the cancer cases recorded worldwide and was estimated to be the second most common cancer in both men and women [9]. The heterogeneity of biological tissue structures poses a challenge for both manual and automated histopathological study of slides [19]. Precise tumor detection is crucial for patient survival and can be achieved effectively by analyzing stained histological sections collected through biopsy or surgery. Although virtual microscopy now plays a significant role in pathology departments, intra- and inter-observer variability remains a challenge because slide inspection is qualitative. Even experienced pathologists do not always agree on tissue classification, which suggests that expert assessment alone is not sufficient as the gold standard for histopathological evaluation [37]. Computerized automation that boosts diagnostic reproducibility across different magnifications is therefore in high demand [19]. Traditional approaches build pattern recognition systems for rapid, automated cancer diagnosis from a range of handcrafted features (texture and morphological). These features are extracted from histology images and used to train a classifier that identifies cancerous cells. The paradigm has recently shifted to deep learning techniques, in which feature extraction and classification are integrated within a single framework.
Deep CNNs are feed-forward artificial neural networks with several hidden layers. They have become common in applications such as object tracking, detection, image classification, and computer vision because of their strong performance [18, 20]. However, training these deep CNNs from scratch typically requires a large dataset with ground-truth labels. Such data are often unavailable in the biomedical domain, where labeling is expensive and tedious and access is typically restricted by privacy concerns. These factors prevent deep CNNs from being deployed in many realistic circumstances.
Transfer learning has become popular among researchers as a viable alternative to training deep CNNs from scratch: the original structure of the pre-trained model is preserved and either used to extract features or further tuned on the available data. Although natural and biomedical images differ, deep CNN models trained on large-scale natural image datasets have been shown to be refined effectively to produce consistent and reliable results for biomedical image analysis [28]. On the BreaKHis dataset [33], transfer learning with the AlexNet and VGG16 models provided deep learned features that improved results with a Support Vector Machine (SVM) classifier [12]. The pre-trained models VGG16, VGG19, and ResNet50 were later analyzed on the same dataset in a magnification-independent scenario, and VGG16 with a logistic regression classifier performed best [27]. ResNet-50 and DenseNet-161 models were tested for the classification of histopathological images, and the former performed better [36]. Thus, the effectiveness of pre-trained CNN models varies with the type of histopathological image. The performance of a transfer learning model consequently depends on the choice of pre-trained model, the dataset images, their magnification, and the classifier, which makes it difficult to choose the existing CNN model best suited to a particular type of histopathological image.
In light of the above, this paper proposes a magnification-independent feature learning framework for colon cancer detection with a BO-SVM classifier. The pre-trained networks AlexNet, VGG-16, and Inception-V3 are analyzed by extracting deep learned features, and the best network is then compared against a traditional SVM and a softmax layer classifier. This is done for the specific task of colorectal cancer screening using four datasets of various magnifications. Specifically, the contributions address the following questions: (i) Which pre-trained neural network is best suited for colon cancer detection across multiple datasets with various magnifications? The performance evaluation measures of the proposed model are compared for the pre-trained networks AlexNet, VGG-16, and Inception-V3 using colon images. (ii) Does the BO-SVM classifier perform better than traditional classifiers? To verify this, the proposed framework was compared with a traditional SVM and a softmax layer classifier using the high-level features derived from the different pre-trained networks. (iii) How robust is the proposed framework? The model was evaluated on four datasets, two collected from Indian hospitals at different magnifications and two publicly available colon image datasets, and was compared with existing techniques on the public datasets to assess generalization.
The remaining sections are structured as follows: Section 2 presents a brief overview of related work. Section 3 describes the proposed methodology. Section 4 presents the experimental results and discussion. Section 5 concludes the paper.
Related works
Traditional supervised cancer screening approaches typically extract features and then train a classifier on them. For colon cancer detection, a series of colon images was tested with different combinations of textural and morphological features, and an SVM classifier was trained to detect cancer [3, 5]. Various features, including architectural, texture, and wavelet features, were extracted from colon histology images for grade prediction [4, 26]. In [25], colon biopsy samples are segmented into three clusters using Adaptive Pillar K-means and, based on the Lumen Circularity (LUC) measure, images are categorized as normal or malignant with a decision tree. In [7], colorectal gland tissues are segmented using an adjustable threshold procedure, and geometrical features are extracted to classify them as cancerous or non-cancerous using an SVM classifier with an efficiency of 93.74%. A predictive precision of 98% is achieved for colorectal cancer detection on 174 images by combining traditional texture and morphological features with some modern geometric features [23, 24]. In all of these approaches, the final extracted features are only low-level and non-representational features of the histopathological images, which can lead to poor classification results for the final model.
The recent development of deep neural networks, which combine automated feature extraction and classification in a single framework, has displaced the traditional framework, which depends on carefully tuned, complex segmentation and handcrafted features. [31] proposed a new locality-sensitive deep learning method based on CNNs for detecting and identifying nuclei in hematoxylin and eosin (H&E) stained histopathology images of colorectal adenocarcinoma. Segmentation and classification based on a deep neural network were proposed to classify benign and malignant colon images with 95% accuracy [15]. In [8], each image is cropped to obtain a square patch and affine transformations are applied to acquire more training samples. These samples are used to train a customized CNN architecture consisting of three convolution blocks, three fully connected layers, and a softmax classification layer. Though this technique surpassed handcrafted strategies and reached about 83.25% accuracy, a large dataset with ground-truth labels is always required to train such deep CNNs from scratch. The use of deep CNNs in many practical biomedical problems is therefore hindered by limited datasets and high computational cost.
Transfer learning has been identified by researchers as an alternative to training deep CNNs from scratch when data availability is restricted. [38] suggested using deep CNN activation features learned from ImageNet on large-scale images of brain tissue and colon histopathology for segmentation and classification with an SVM classifier. This showed that CNN features are better than handcrafted features. [2] proposed a shape feature, the Best Alignment Metric (BAM), extracted from the region of interest obtained after CNN segmentation, and reported an accuracy of 97% with an SVM classifier on colon histological images. The abstract features extracted from AlexNet were used with a tandem of classifiers for breast and colon histological image classification, with accuracies of 86% and 98.2% for four-class classification [17]. An analysis of feature learning with the CNN models SqueezeNet-v1.1, MobileNet-v2, ResNet-18, and DenseNet-201, followed by classification with an SVM classifier, led to higher accuracy on four publicly available datasets than the fine-tuned models [1].
Because of the complex structure of CNNs, the scarcity of annotated data, and the time required, training a CNN from scratch is less preferred. Transfer learning with fine-tuning, or feature learning followed by classification, is considered a solution. An SVM classifier is commonly used for binary classification; its performance depends on the hyperparameters, so optimum values must be chosen. Various pre-trained models have been tried depending on the type of histopathological image, and the efficiency of the framework varies. Thus, the performance of transfer learning for colon cancer detection depends heavily on the CNN model and the classifier. This paper aims to identify the CNN models best suited for colon cancer detection across various datasets, with the SVM hyperparameters optimized by Bayesian optimization.
Proposed methodology
The proposed colon cancer detection model identifies cancer images using deep neural features derived from an existing pre-trained neural network together with the BO-SVM classifier. The model comprises three stages, as shown in Fig. 1: (i) Data Augmentation; (ii) Feature Extraction with Transfer Learning; (iii) Classification. The proposed model is analyzed for three pre-trained networks, AlexNet, VGG-16, and Inception-V3, and the best network is chosen for the colon cancer detection framework. The following sections describe these stages.

Proposed methodology.
Data augmentation
The data augmentation process increases the quality and the degree of variance of the training data. It is used to obtain stronger, simpler image classification models that are invariant to image transformations and quality [22]. The following additional augmentation operations are applied to the training images: random reflection about the vertical axis, random translation by up to 30 pixels, and random scaling by up to 10%, both vertically and horizontally. Augmentation also helps to prevent the network from overfitting and memorizing the exact details of the training images. After augmentation, all images were downsampled to the network input size.
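The augmentation operations described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `sample_augmentation` and `augment` are hypothetical, "scale up to 10%" is read as a scale factor drawn from [0.9, 1.1], the flip probability is assumed to be 0.5, and nearest-neighbour sampling on a single-channel image stands in for whatever interpolation was actually used.

```python
import random


def sample_augmentation(rng, max_shift=30, max_scale=0.10):
    """Draw one random set of augmentation parameters (assumed ranges)."""
    return {
        "flip": rng.random() < 0.5,                       # reflect about the vertical axis
        "dx": rng.randint(-max_shift, max_shift),         # horizontal translation in pixels
        "dy": rng.randint(-max_shift, max_shift),         # vertical translation in pixels
        "sx": rng.uniform(1 - max_scale, 1 + max_scale),  # horizontal scale factor
        "sy": rng.uniform(1 - max_scale, 1 + max_scale),  # vertical scale factor
    }


def augment(image, params, fill=0):
    """Apply flip, scaling and translation to a 2-D image (list of rows).

    Each output pixel is looked up in the source image with
    nearest-neighbour sampling; pixels that map outside become `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation, then undo the scaling about the centre.
            src_x = (x - params["dx"] - cx) / params["sx"] + cx
            src_y = (y - params["dy"] - cy) / params["sy"] + cy
            if params["flip"]:
                src_x = (w - 1) - src_x
            ix, iy = round(src_x), round(src_y)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out
```

Calling `augment(image, sample_augmentation(random.Random(seed)))` once per training image and epoch yields a differently transformed copy each time, which is the source of the added variance.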
Feature extraction with transfer learning
Specifically, the AlexNet, Inception-V3, and VGG-16 networks were used for the transfer and feature learning experiments on colon biopsy image classification. These architectures are similar in that each comprises sequentially ordered convolutional, pooling, and fully connected layers that produce an image-level classification. However, the basic structure and number of these layers differ in each architecture (Fig. 1). A brief description of these architectures is given below.
AlexNet [16]: This CNN is named after Alex Krizhevsky. It is trained on over one million images from the ImageNet database. The network is 8 layers deep and classifies images into 1000 categories, including keyboards, mice, crayons, and animals. The network has therefore learned representations for a wide range of images. Its input size is 227 × 227, and the features are extracted from the FC8 layer for classification.
VGG-16 [29]: This neural network is trained on over one million ImageNet images. It consists of 16 layers and recognizes images in 1000 object categories, such as keyboards, mice, crayons, and animals, so it also represents a wide variety of images. The network takes a 224 × 224 input image, and the features are extracted from the FC8 layer after transfer learning.
Inception-V3 [35]: This CNN is trained on over one million ImageNet images. It is a wide network with 48 layers and can classify images into 1000 object categories, including keyboards, mice, pencils, and animals. The network thus has a rich representation of a wide variety of images, with an input resolution of 299 × 299.
Although the final fully connected stage of a network is initialized randomly, in the transfer learning version each architecture is initialized with weights obtained by pre-training on the ImageNet dataset. The images are converted into numeric feature vectors using the pre-trained CNN with its optimized weights, without any fine-tuning. These extracted features are then used for classification.
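Because each network expects a different input resolution, every image has to be resized per network before features can be extracted. The following minimal sketch records the three input sizes quoted above and resizes a single-channel image with nearest-neighbour sampling. The dictionary keys and the helper names `resize_nearest` and `prepare_for` are hypothetical, and the interpolation method is an assumption, since the text does not state which one was used.

```python
# Input resolutions quoted in the text (height, width in pixels).
# The dictionary keys are illustrative names, not identifiers from the paper.
INPUT_SIZE = {
    "alexnet": (227, 227),
    "vgg16": (224, 224),
    "inception_v3": (299, 299),
}


def resize_nearest(image, size):
    """Resize a 2-D image (list of rows) to `size` = (height, width)
    using nearest-neighbour sampling at pixel centres."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = size
    rows = [min(src_h - 1, int((y + 0.5) * src_h / dst_h)) for y in range(dst_h)]
    cols = [min(src_w - 1, int((x + 0.5) * src_w / dst_w)) for x in range(dst_w)]
    return [[image[r][c] for c in cols] for r in rows]


def prepare_for(network, image):
    """Resize `image` to the input size expected by `network`."""
    return resize_nearest(image, INPUT_SIZE[network])
```

In practice a colour image would be resized channel by channel, or with an image library, but the per-network lookup stays the same.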
Classification
The high-level features from the pre-trained networks are fed into the BO-SVM classifier to categorize the images as normal or malignant. The performance of the SVM classifier was enhanced by tuning its parameters with Bayesian optimization to obtain the optimal hyperparameters [10, 32].
The SVM-based predictor is built by fitting the model parameters ω for a specific set of hyperparameters λ (such as the width γ of the Radial Basis Function (RBF) kernel and the regularization strength C). Thus, the objective function S is maximized with f(λ) as in Equation 1.
The data are assumed to be drawn from an underlying joint distribution over samples with their binary labels μ as
Thus, the obtained optimized parameters of SVM are used to classify the samples as normal and malignant.
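The equations referred to in this subsection are not reproduced in the text. As a hedged illustration only, and not necessarily the authors' exact formulation, hyperparameter selection for an RBF-kernel SVM is commonly written as follows. The assumptions are that $f(\lambda)$ denotes a validation score such as cross-validated accuracy, that $\Lambda$ is the search range for $\lambda = (C, \gamma)$, and that expected improvement is the acquisition function, which the text does not specify.

```latex
\begin{align}
K(x, x') &= \exp\!\left(-\gamma \,\lVert x - x' \rVert^{2}\right),
  \qquad \lambda = (C, \gamma), \\
\lambda^{*} &= \arg\max_{\lambda \in \Lambda} f(\lambda), \\
\mathrm{EI}(\lambda) &= \mathbb{E}\!\left[\max\bigl(f(\lambda) - f(\lambda^{+}),\, 0\bigr)\right].
\end{align}
```

Here $\lambda^{+}$ is the best hyperparameter setting evaluated so far, and the expectation is taken under a Gaussian process posterior over $f$. Bayesian optimization evaluates $f$ at the $\lambda$ that maximizes the acquisition function, updates the posterior, and repeats until the evaluation budget is spent.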
Experimental results and discussion
This section describes the datasets used for evaluation and the performance measures, and then analyzes the results.
Data sources and performance evaluation measures
The proposed model is analyzed using different colon biopsy image datasets with various magnification factors, each consisting of normal and malignant labels. A brief description is provided in Table 1.
Dataset A: This dataset comprises images of H&E-stained colon biopsy slides, from 5-6 μm thick tissue sections, taken at different magnifications (4X, 10X, and 40X) for normal and malignant cases at the Ishita Pathology Center and the Cytocare Pathology Center. A Magcam CD5 with an Olympus CX33 was used to capture the images. Dr. Ranjana Srivastava, Senior Consultant at the Ishita Pathology Center, analyzed the H&E slides and prepared the image labels for the dataset.
Dataset B: This dataset consists of images of H&E-stained colon biopsy samples, from 5-6 μm thick tissue sections, taken at various magnifications (10X, 20X, and 40X) for normal and malignant cases at Aster Medcity, Kochi, India. The NIS element viewer microscope was used to view the slides, and a Nikon Eclipse Ci was used to capture the images. Dr. Shahin Hameed and Dr. Sarah Kuruvila, Department of Pathology, Aster Medcity, Kochi, India, analyzed the H&E colon biopsy slides and prepared the dataset with its ground-truth labels.
ImediaTreat [34]: This dataset, provided by Stoean et al., includes images captured from H&E-stained colon tissue slides obtained by the Emergency County Hospital of Craiova, Romania.
GlaS dataset [30]: This dataset consists of colon images from the University Hospitals Coventry and Warwickshire, UK, collected by a team of pathologists. A Zeiss MIRAX MIDI was used to capture the images of H&E-stained colon tissue.
The performance measures used to evaluate the proposed model are given in Table 2 in terms of true positive (P_v), true negative (N_v), false positive (FP), and false negative (FN) counts.
Datasets used for evaluation
Performance evaluation measures
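Table 2 itself is not reproduced in this text. As a hedged sketch, the measures named in the results (accuracy, specificity, F-Score, Kappa statistic, and MCC) can be computed from the four confusion-matrix counts using their standard textbook definitions, which are assumed here to match Table 2. The function name `binary_metrics` is hypothetical, and `tp` and `tn` correspond to the true positive and true negative counts in the text.

```python
import math


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = _ratio(tp + tn, n)
    sensitivity = _ratio(tp, tp + fn)          # recall / true positive rate
    specificity = _ratio(tn, tn + fp)          # true negative rate
    precision = _ratio(tp, tp + fp)
    f_score = _ratio(2 * tp, 2 * tp + fp + fn)
    # Matthews correlation coefficient.
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _ratio(accuracy - expected, 1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
    }
```

Reporting MCC and kappa alongside accuracy is useful because both account for chance agreement and class imbalance, which accuracy alone can hide.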
To validate the predictive performance of the model, we used 10-fold cross-validation to partition the data for training and testing. The proposed model is evaluated in the following stages: (i) The proposed model is evaluated with three pre-trained networks, AlexNet, VGG-16, and Inception-V3, to find the best-suited network for colon cancer detection. (ii) The BO-SVM classifier is compared with the traditional SVM and the softmax layer classifier for the different pre-trained networks. (iii) The proposed model is evaluated at the various magnifications of Datasets A and B to demonstrate its robustness. (iv) Finally, the proposed framework is compared with existing methods on the GlaS dataset.
The proposed model is evaluated on four datasets of various magnifications using three networks, AlexNet, VGG-16, and Inception-V3, as shown in Table 3. We used the 4X, 10X, and 40X images from Dataset A and the 10X, 20X, and 40X images from Dataset B for training and testing with cross-validation, so all magnifications of Datasets A and B were included. Datasets A, B, and GlaS reach their highest accuracies of 98.33%, 99.07%, and 96.67%, respectively, with the Inception-V3 network, whereas the ImediaTreat dataset reaches its highest accuracy of 100% with the AlexNet network. However, considering other measures such as F-Score, Kappa statistic, and MCC, we concluded that the Inception-V3 network is best suited for colon cancer detection with the proposed framework. F-Scores of 0.9836, 0.9907, 0.9804, and 0.9655 were observed for Datasets A, B, ImediaTreat, and GlaS, respectively, with the Inception-V3 network, which is high compared with the other networks.
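The 10-fold cross-validation protocol can be sketched as an index splitter. This is a minimal illustration rather than the authors' code: the function name `k_fold_indices` is hypothetical, and shuffling with a fixed seed and no stratification by class are assumptions, since the text does not say how the folds were formed.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Samples are shuffled once, dealt into k folds of near-equal size, and
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For each pair, the classifier is trained on the feature vectors at `train` and scored on those at `test`; the reported measure is then averaged over the ten folds.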
Performance measures of proposed model on different datasets
The proposed method classifies the samples with BO-SVM using the features extracted from each of the pre-trained networks. Figure 2 compares the accuracy of the proposed method with that of the SVM classifier and the softmax layer classifier for the three network architectures. With AlexNet, the proposed framework gave higher accuracies of 0.933, 0.9722, 1, and 0.8889 than the SVM classifier and the softmax classifier. With the VGG-16 network, however, the softmax layer classifier gave better accuracies of 1 and 0.9231 on Dataset B and GlaS than the other classifiers. With Inception-V3, the proposed framework gives better accuracies than the SVM and softmax layer classifiers: 0.9833, 0.9907, 0.9808, and 0.9667 for Datasets A, B, ImediaTreat, and GlaS, respectively. Figure 3 shows the execution time taken by these classifiers on the different datasets with the Inception-V3 network. The softmax layer classifier takes more time than the others for complete training and tuning in transfer learning on the Inception-V3 network. Training time also increases with the size of the dataset. BO-SVM takes more time than SVM but produces better results than SVM on all of the datasets. Thus, considering all the statistical error measures, the proposed framework works better than the SVM and softmax layer classifiers on the Inception-V3 network.

Bayesian Optimised SVM (BO-SVM) comparison with SVM and Softmax classifier across different network architectures and datasets.

Execution time comparison between SVM, softmax layer classifier and BO-SVM on different datasets.
To illustrate the superiority of SVM over other classifiers, features learned from the different CNN models were classified in Weka 3.8 with several classifiers: k-Nearest Neighbor (k-NN) with k=5, Naive Bayes, AdaBoost with a decision tree, and SVM. The accuracy comparison is given in Fig. 4. k-NN performs worst with all the networks. SVM performs better than the other classifiers across datasets and CNN models, with highest accuracies of 0.9583, 0.9722, 0.9688, and 0.9444 on Datasets A (Inception-V3), B (AlexNet, VGG-16), ImediaTreat (Inception-V3), and GlaS (Inception-V3), respectively. Hence, in the proposed framework, the hyperparameters of the SVM were optimized to enhance performance.

Accuracy comparison of feature learning with different classifiers.
The proposed framework has been evaluated at various magnifications: 4X, 10X, and 40X images from Dataset A, and 10X, 20X, and 40X images from Dataset B. This evaluation is shown in Fig. 5. With 4X of Dataset A, the VGG-16 and Inception-V3 networks give an accuracy of 0.975, whereas 10X and 20X in the Inception-V3 network have the highest accuracies of 0.9833 and 0.9808, respectively, compared with the other networks. With Dataset B at 10X magnification, all the networks obtained an accuracy of 0.9444, whereas at 20X and 40X magnifications the VGG-16 network performs better, with 1 and 0.9722, respectively. The difference in performance is due to variations in image acquisition, staining methods, and illumination conditions. Nevertheless, considering all these differences and the other statistical error measures, the proposed model performed better than all the traditional models across the various magnifications.

Proposed method evaluation across different magnifications for Datasets A and B.
The high-level features extracted from the trained Inception-V3 network perform better than general texture features. Figure 6 compares the proposed framework with existing texture features, including GLCM [13], Histogram [14], LBP [21], HOG [11], and a hybrid (GLCM + Histogram + LBP + HOG). These features are extracted from stain-normalized and contrast-enhanced colon images and classified using BO-SVM. In this evaluation, the performance of these features varies from dataset to dataset. The LBP and GLCM features perform better on Dataset A, but no particular feature performs well on all the datasets. In contrast, the features of the proposed model from the Inception-V3 network performed well on all the datasets, with an accuracy of more than 96.5%. Thus, the high-level features extracted from the pre-trained networks generalize for colon cancer detection.

Accuracy comparison of the proposed framework with handcrafted features on different datasets.
Moreover, an analysis of the performance evaluation measures on the different existing network architectures shows that the proposed methodology works best on the Inception-V3 network with the BO-SVM classifier. Feature map downsizing is usually carried out with max-pooling in AlexNet and VGG-16, where max-pooling followed by a convolution layer is too costly. The efficient grid size reduction in Inception-V3 concatenates two sets of feature maps into 640 feature maps, which are passed to the next stage of the inception module. Thus, the Inception-V3 network with BO-SVM detects cancer in colon biopsy images with better performance on all four datasets.
Table 4 compares the proposed method with different existing methods on the GlaS dataset. The proposed generalized model was tested on four different datasets across various magnifications, and 96.67% accuracy was observed on the GlaS dataset. The methods in [2, 24] exhibited better accuracy, F-Score, and specificity, as these techniques were tuned to work for this dataset. The image segmentation used in these techniques to find the region of interest depends on the magnification and has been proven only on the GlaS dataset. In addition, [24] evaluated 10X images, whereas the proposed model was evaluated across multiple magnifications (40X, 20X, 10X, and 4X). In the existing methods, handcrafted features are extracted from the region of interest, while in this study high-level features are extracted from the trained Inception-V3 network without any segmentation. Existing techniques [17, 34] evaluated on the ImediaTreat dataset are not included, as they address multiclass classification and grading, and detection is less explored. Thus, the proposed model detects cancer from the extracted features without relying on any segmentation step. Overall, the proposed method is a promising generalized colon cancer detection framework that extracts high-level features from the trained Inception-V3 network. It has a classification accuracy of 96.5% to 99% across four different datasets from different countries and across various magnifications (40X, 20X, 10X, and 4X).
Comparison of the proposed method with existing methods on GlaS Dataset
Conclusion
This paper presented a colon cancer detection framework based on features extracted from existing deep learning networks through transfer learning and classified using a Bayesian optimized SVM classifier. The networks are pre-trained on the large ImageNet image dataset. Depending on the specific task, the number of neurons in the final attached layer can be set and the fully connected layer parameters trained again. The proposed framework was tested with three networks, AlexNet, VGG-16, and Inception-V3, using four datasets, and the evaluations showed that the Inception-V3 network is the best for colon cancer detection, with accuracies of 0.9833, 0.9907, 0.9808, and 0.9667 for Datasets A, B, ImediaTreat, and GlaS, respectively. Thus, the proposed framework performed better than the existing classifiers and techniques on public datasets. The most important limitation lies in the heterogeneity of the histopathological appearance of colon cancer, which creates problems when trying to classify an image as benign. Our future work will therefore focus on how this effect on the analysis of histopathological images of colon cancer can be avoided or reduced. Further grading of the malignant images could also be explored.
Acknowledgment
We thank Ishita Pathology Center (Allahabad) and Aster Medcity (Kochi) for providing the datasets used in this research. We also thank Dr. Ranjana Srivastava (Consultant Pathologist, Ishita Pathology Center), Dr. Sarah Kuruvila (Former Senior Pathologist, Aster Medcity), and Dr. Jyotima Agarwal (Cytocare Center, Allahabad) for their help in completing this research. Dr. Shahin Hammed (Consultant, MVR Cancer Center, Poolacode, Kerala) was working with Aster Medcity at the time of dataset collection and has continued his valuable support throughout the research.
