Abstract
Papillary thyroid carcinoma (PTC) is a common carcinoma of the thyroid. Many benign thyroid nodules have a papillary structure that is easily confused with PTC in morphology. Pathologists therefore spend considerable time on the differential diagnosis of PTC and rely heavily on personal diagnostic experience, which makes the diagnosis subjective and makes consistency among observers difficult to obtain. To address this issue, we applied deep learning to the differential diagnosis of PTC and proposed a histological image classification method for PTC based on the Inception Residual convolutional neural network (IRCNN) and support vector machine (SVM). First, to expand the dataset and solve the problem of histological image color inconsistency, a pre-processing module was constructed that included color transfer and mirror transform. Then, to alleviate overfitting of the deep learning model, we optimized the convolutional neural network by combining the Inception Network and the Residual Network to extract image features. Finally, the SVM was trained on the image features extracted by the IRCNN to perform the classification task. Experimental results show the effectiveness of the proposed method in the classification of PTC histological images.
Introduction
Thyroid carcinoma is the most common malignant tumor of the endocrine system. Among its many subtypes, papillary thyroid carcinoma (PTC) has the highest incidence, accounting for about 85% of cases, and its incidence has increased year by year. This increase may be attributed to two factors. One is a real increase in the number of patients with papillary thyroid carcinoma. The other is the incorrect classification of some benign proliferative thyroid nodules (such as nodular goiter, adenomatous goiter, and adenomatous hyperplasia) as PTC. The clinical management and prognosis of benign proliferative thyroid nodules are completely different from those of PTC. An accurate pathological result is therefore the prerequisite for appropriate clinical management. At present, the diagnosis of PTC depends mainly on microscopic observation by pathologists, who analyze abnormalities in tissue structure and cell characteristics. However, this process is subjective, and interobserver consistency is relatively low. Furthermore, some benign proliferative thyroid nodules have papillary structures similar to those of papillary thyroid carcinoma, which can confuse the pathologist and lead to a wrong or missed diagnosis. It is hard work for pathologists to distinguish the subtle differences in sections by naked-eye observation and personal experience alone [1, 2]. Making a rapid and accurate diagnosis of PTC therefore remains a challenging task for pathologists.
The widespread application of computer-aided diagnosis (CAD) can be traced back to the early 1990s. Today, CAD has become one of the main research topics in medical imaging and radiological diagnosis [3, 4]. In the field of pathological diagnosis, CAD could reduce the workload of pathologists by screening out obviously malignant areas, which allows pathologists to focus on more difficult and suspicious cases [5]. However, the diagnostic performance of current CAD systems is still not satisfactory, because most medical image features are extracted manually. With the advent of machine learning, the convolutional neural network (CNN) and the many network models derived from it provide a more reliable classification approach for the histological diagnosis of papillary thyroid carcinoma, and the CNN [6] is the most promising among them. A CNN takes image data as input directly, with no need for manual feature extraction, and achieves high-precision classification results through fine-grained feature extraction. CNNs are therefore widely used in the field of medical image diagnosis [7–10]. Wang et al. [11] used four pre-trained deep convolutional neural networks (DCNNs) (DenseNet-121 [12], ResNet-50 [13], Inception-v3 [14], VGG-16 [15]) to extract histological image features of breast cancer, and replaced the softmax layer of the DCNN with an ensemble support vector machine (E-SVM) classifier to improve classification performance. Lin et al. [16] proposed a Taguchi-based convolutional neural network to classify lung nodule images as malignant or benign; it obtained useful information from fewer experiments and effectively improved the classification accuracy of AlexNet [17].
Daimary et al. [18] proposed Res-SegNet, a hybrid of SegNet [19] and ResNet, for brain MRI segmentation. The structure of Res-SegNet overcomes the loss of information about small brain tumors during down-sampling and achieved good performance on a public dataset. However, the study of deep learning for thyroid cancer started late, and the diagnostic performance has not been satisfactory. Applying deep learning to the histopathological diagnosis of thyroid tumors faces two challenges: the histological image dataset is relatively small and unbalanced, and the network is too deep. Both cause the deep learning model to overfit the training data easily and degrade the final diagnostic performance. Thus, the histopathologic diagnosis of thyroid nodules remains a complicated problem.
This work aims to propose a more accurate and effective histological image classification method based on IRCNN and SVM to further improve the diagnostic performance for papillary thyroid carcinoma. We designed a classification framework that uses the IRCNN model as a feature extractor on pre-processed histological images of papillary thyroid carcinoma, and then uses an SVM classifier to accomplish the classification task. Our framework demonstrates the feasibility of deep learning-based classification of papillary thyroid carcinoma and shows higher accuracy than commonly used models.
Materials
The study was conducted in accordance with the Declaration of Helsinki and was approved by the Qingdao Hospital of Traditional Chinese Medicine Medical Research Ethics Committee. The dataset is a set of high-resolution (1600×1050 pixels) histological images of thyroid tissue. All specimens were thyroidectomy specimens from the Pathology Department of Qingdao Hospital of Traditional Chinese Medicine, collected from January 2012 to June 2020. All patients signed a formal consent form approved by the ethics board. All specimens were cut into 4 μm serial sections, stained with hematoxylin-eosin (H&E), and examined under an Olympus BX60 microscope. The images were captured at magnifications of 4×, 10×, 20× and 40×. The histological diagnosis of each image was either benign hyperplasia or papillary thyroid carcinoma. Figure 1 illustrates each type of image in the dataset. The dataset includes 1044 histological images of benign thyroid (BT) and 1728 histological images of papillary thyroid carcinoma (PTC). The number of images of each type is shown in Table 1.

Various types of images in the dataset.
Number of various types of images
Deep learning models represented by the CNN have been widely used in the field of medical image processing. However, the traditional softmax classifier in a CNN does not enforce intra-class compactness and inter-class separation, which can leave the model with insufficient generalization capability. In view of this, we propose a method for the classification of histological images of papillary thyroid carcinoma based on IRCNN and SVM. Figure 2 shows the overall framework of the proposed classification method, which consists of three parts: pre-processing, feature extraction and classification. First, we pre-process the original histological images of papillary thyroid carcinoma using color transfer and mirror transform. Second, we use the proposed IRCNN model as a feature extractor to extract image features. Finally, the classification task is completed by training an SVM classifier, which offers better classification performance than the softmax layer.

The histological image classification framework for papillary thyroid carcinoma.
Histological images of papillary thyroid carcinoma show color variation because of differences among histological specimens and in staining intensity. In addition, the papillary thyroid carcinoma histology image dataset is small, which makes training difficult. Both factors affect classification. To address these problems, our proposed pre-processing module consists of two parts, color transfer and mirror transform. The module pre-processes the original input histological images, which improves the robustness of the classification method.
Color transfer
In this work, we use the Reinhard algorithm [20] to perform color transfer on the original images. The algorithm first converts the RGB signals to the perception-based color space lαβ and calculates the mean and standard deviation of each channel separately in lαβ space. It then determines a linear transformation from these statistics so that the target image (the image that needs to be color transferred) has the same means and standard deviations in lαβ space as the source image (the color reference image). Finally, the output image (the image after color transfer) is obtained by converting the result back to RGB. The process is shown in Fig. 3.

Color transfer.
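The color transfer step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names (`rgb_to_lab`, `lab_to_rgb`, `match_statistics`, `reinhard_transfer`) are hypothetical, images are assumed to be float RGB arrays with values in [0, 1], and the RGB-to-LMS matrix uses the coefficients commonly quoted from Reinhard et al. [20]. Following the naming in this paper, the target is the image to be recolored and the source is the color reference.

```python
import numpy as np

# RGB -> LMS cone space (coefficients as commonly quoted from Reinhard et al.).
_RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                     [0.1967, 0.7244, 0.0782],
                     [0.0241, 0.1288, 0.8444]])
# log-LMS -> l-alpha-beta decorrelated space.
_LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array(
    [[1.0, 1.0, 1.0],
     [1.0, 1.0, -2.0],
     [1.0, -1.0, 0.0]])


def rgb_to_lab(rgb):
    """Convert a float RGB image (H, W, 3) in [0, 1] to l-alpha-beta."""
    lms = rgb @ _RGB2LMS.T
    # Clamp before the logarithm to avoid log(0) on pure black pixels.
    return np.log10(np.maximum(lms, 1e-6)) @ _LMS2LAB.T


def lab_to_rgb(lab):
    """Inverse of rgb_to_lab (matrix inverses are computed numerically)."""
    lms = 10.0 ** (lab @ np.linalg.inv(_LMS2LAB).T)
    return lms @ np.linalg.inv(_RGB2LMS).T


def match_statistics(target_lab, source_lab):
    """Shift and scale each channel of target_lab to the source mean and std."""
    t_mean, t_std = target_lab.mean(axis=(0, 1)), target_lab.std(axis=(0, 1))
    s_mean, s_std = source_lab.mean(axis=(0, 1)), source_lab.std(axis=(0, 1))
    return (target_lab - t_mean) * (s_std / (t_std + 1e-8)) + s_mean


def reinhard_transfer(target_rgb, source_rgb):
    """Recolor target_rgb so its l-alpha-beta statistics match source_rgb."""
    matched = match_statistics(rgb_to_lab(target_rgb), rgb_to_lab(source_rgb))
    # Clipping keeps the result a valid image; it can slightly alter the statistics.
    return np.clip(lab_to_rgb(matched), 0.0, 1.0)
```

In practice one would load an H&E image with a typical staining appearance as the source and apply `reinhard_transfer` to every other image before augmentation.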
The histological image dataset is relatively small and unbalanced, which generally leads to overfitting during training and ultimately results in a low medical image recognition rate and unsatisfactory diagnostic performance. Data augmentation is one of the effective ways to solve this problem.
Mirror transform is a data augmentation technique that obtains more training data from the existing training set by rearranging the pixel positions of the original image. Therefore, after color transfer of the original dataset, we transform the dataset by horizontal mirroring, vertical mirroring, and vertical-horizontal mirroring to obtain an augmented dataset. The number of images in the augmented dataset is four times that of the original dataset. An example of histological images in the augmented dataset is shown in Fig. 4.

Mirror transform.
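The three mirror operations, together with the original image, give the fourfold expansion described above. A minimal NumPy sketch follows; the function name `mirror_augment` is hypothetical, and images are assumed to be arrays of shape (height, width, channels).

```python
import numpy as np


def mirror_augment(image):
    """Return the original image plus its three mirrored versions."""
    return [
        image,                         # original
        np.fliplr(image),              # horizontal mirroring (left-right flip)
        np.flipud(image),              # vertical mirroring (up-down flip)
        np.flipud(np.fliplr(image)),   # vertical-horizontal mirroring
    ]
```

Applying `mirror_augment` to every image of a dataset yields a dataset four times the original size.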
Integration module design
Increasing the network depth often leads to an excessive number of neural network parameters and to overfitting. Enlarging the network also consumes a large amount of computing resources, much of which is wasted. In addition, when the network reaches a certain depth, the shallow neurons are prone to vanishing gradients, which can cause the model to fail to converge. The question we need to consider is how to solve these problems caused by deep networks.
The integrated convolutional module proposed in this paper is a two-channel network structure, as shown in Fig. 5. The features extracted by the preceding layers are used as input and are fed into two channels. One channel is similar to the Inception Network [14]. In it, we introduced a convolution filter with a 1×1 kernel to fuse the features between layers, which greatly reduces the number of network parameters compared with a traditional deep convolutional neural network. We then added convolution filters with a 3×3 kernel, which increase the depth of the network and improve its nonlinear expressive ability. Finally, we concatenated the features of the three branches. In the other channel, we use the ‘Quick Connection’ (shortcut connection) structure of the Residual Network [13] to make full use of the features of each layer, so that the difference between benign and malignant tissues can be captured relatively accurately. Because of this channel, the convolution module retains shallow features accurately, which effectively avoids the vanishing and exploding gradients caused by the increase in network layers. Before the output features of the two channels are summed, we added a 1×1 convolution filter to the Inception Network channel to ensure that the output dimensions of the two channels match.

The structure of the integrated convolution module.
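The data flow of the integrated module can be illustrated with a small NumPy forward pass. This is only a sketch under assumptions: the exact branch layout and channel widths of Fig. 5 are not reproduced in the text, so the three branches below (1×1; 1×1 then 3×3; 1×1 then two 3×3), the parameter names and the helper `init_params` are all hypothetical. The sketch shows the three ideas the text describes: parallel branches that are concatenated, a 1×1 convolution that matches the channel count, and a shortcut that is added to the result.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map. x: (H, W, Cin), w: (Cin, Cout)."""
    return x @ w


def conv3x3(x, w):
    """3x3 convolution, stride 1, zero padding. w: (3, 3, Cin, Cout)."""
    h, wd, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, wd, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out


def init_params(c_in, c_branch, rng, scale=0.1):
    """Random weights for the assumed branch layout (illustrative only)."""
    def w1(a, b):
        return rng.normal(0.0, scale, (a, b))

    def w3(a, b):
        return rng.normal(0.0, scale, (3, 3, a, b))

    return {
        "b1_1x1": w1(c_in, c_branch),
        "b2_1x1": w1(c_in, c_branch), "b2_3x3": w3(c_branch, c_branch),
        "b3_1x1": w1(c_in, c_branch), "b3_3x3a": w3(c_branch, c_branch),
        "b3_3x3b": w3(c_branch, c_branch),
        "proj_1x1": w1(3 * c_branch, c_in),
    }


def integrated_module(x, p):
    # Inception-style channel: three parallel branches (layout is an assumption).
    b1 = relu(conv1x1(x, p["b1_1x1"]))
    b2 = relu(conv3x3(relu(conv1x1(x, p["b2_1x1"])), p["b2_3x3"]))
    b3 = relu(conv1x1(x, p["b3_1x1"]))
    b3 = relu(conv3x3(relu(conv3x3(b3, p["b3_3x3a"])), p["b3_3x3b"]))
    merged = np.concatenate([b1, b2, b3], axis=-1)
    # 1x1 convolution so the channel count matches the shortcut channel.
    projected = conv1x1(merged, p["proj_1x1"])
    # Residual-style channel: the input is added back unchanged.
    return relu(x + projected)
```

Because the shortcut passes the input through unchanged, the module reduces to `relu(x)` when the projection weights are zero, which is one way to see why shallow features survive in deep stacks of such modules.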
The network proposed in this paper is an IRCNN model that combines convolutional layers with two integrated convolution modules (I-Modules). The input of the network is the histological image of papillary thyroid carcinoma after pre-processing. To increase the receptive field of the IRCNN, three convolutional layers with a kernel size of 3 and a stride of 1 are added. Each convolutional layer is followed by a max pooling layer. We then used two integrated modules to increase the depth and width of the network. Finally, the features pass through the fully connected layer and are fed into the classifier, which outputs the classification results. The IRCNN network parameters are shown in Table 2.
IRCNN network parameters
The support vector machine (SVM) is a learning method derived from statistical learning theory [21]. SVM follows the principle of structural risk minimization to reduce the risk of misclassification, whereas traditional classifiers are usually trained to minimize empirical risk; SVM-based classifiers therefore tend to generalize better than traditional classifiers.
Our dataset is not linearly separable in the original feature space. SVM introduces a mapping function that maps the original feature space into a high-dimensional space, in which data that are inseparable in the original space become linearly separable, and classification can then be completed. The process is shown in Fig. 6. The kernel function selected for the SVM classifier in this paper is the radial basis function (RBF), and the penalty factor C is set to 1, a setting under which the classifier performed well. We use the convolutional neural network to extract features from the papillary thyroid carcinoma histological images, making full use of its advantages in image feature extraction. At the same time, the SVM helps to avoid overfitting and improves the generalization ability of the classification model.

Classification process.
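The classification stage can be sketched with scikit-learn's `SVC`. The kernel (RBF) and the penalty factor C = 1 follow the text; everything else is an assumption made for illustration: the function name `train_rbf_svm` is hypothetical, `gamma="scale"` is simply the library default because the paper does not report a kernel width, and the two-dimensional concentric-circle data stand in for IRCNN features, which are not available here.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_rbf_svm(features, labels, C=1.0):
    """Fit an RBF-kernel SVM on feature vectors (one row per image)."""
    clf = SVC(kernel="rbf", C=C, gamma="scale")  # gamma is an assumed default
    clf.fit(features, labels)
    return clf


# Toy stand-in for extracted features: two classes that no straight line separates.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = train_rbf_svm(X_train, y_train)
rbf_accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
```

In the proposed framework the rows of `features` would be the outputs of the IRCNN fully connected layer rather than toy coordinates.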
Setup and environment
We used the papillary thyroid carcinoma dataset described in the Materials section to conduct experiments and trained the IRCNN architecture with the Adam optimization method for 100 epochs in total. The hardware platform consisted of an Intel Core i3-6100 CPU, an NVIDIA GeForce GT 720 GPU and 4 GB of RAM, and we used TensorFlow 1.15 as the development environment.
Evaluation metrics
In this section, the metrics used to evaluate our method are introduced. To reduce the influence of chance on the experimental results, in all our experiments we randomly divided the dataset into a training set, a validation set and a test set at a ratio close to 8:1:1. The reported results are the average of multiple experiments. We evaluated the performance of the proposed papillary thyroid carcinoma histological image classification model on our dataset in terms of accuracy (ACC), precision (Pre), Recall, F1 score, and area under the receiver operating characteristic (ROC) curve (AUC). The evaluation metrics are defined as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Pre} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Pre} \times \mathrm{Recall}}{\mathrm{Pre} + \mathrm{Recall}}$$
$$\mathrm{AUC} = \frac{\sum_{i \in \mathrm{positive}} \mathrm{rank}_i - \frac{M(M+1)}{2}}{M \times N}$$
In the above formulas, TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. M is the number of positive samples, and N is the number of negative samples. ∑rank_i is the sum of the ranks (serial numbers) of the positive samples when all samples are sorted by predicted score in ascending order.
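The standard forms of these metrics, written with the symbols TP, TN, FP, FN, M, N and the rank sum defined above, can be computed as in the following sketch. The function names are hypothetical, the counts and scores in the usage note are made-up illustrations rather than results from this study, and the rank-based AUC assumes there are no tied scores.

```python
import numpy as np


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * pre * recall / (pre + recall)
    return {"ACC": acc, "Pre": pre, "Recall": recall, "F1": f1}


def rank_auc(labels, scores):
    """AUC from the rank sum of the positive samples (assumes no tied scores)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)                     # ascending by predicted score
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # rank 1 = lowest score
    m = int(labels.sum())                          # number of positive samples (M)
    n = len(labels) - m                            # number of negative samples (N)
    return (ranks[labels == 1].sum() - m * (m + 1) / 2) / (m * n)
```

For example, `rank_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: the positive samples hold ranks 2 and 4, so the numerator is 6 − 3 = 3 and the denominator is 2 × 2 = 4.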
Experiment of pre-processing module
To verify the effectiveness of the pre-processing module proposed in this article, we used two common convolutional neural network models, LeNet [22] and AlexNet [17], to perform classification experiments on the original data and on the pre-processed augmented data. The results are shown in Table 3. LeNet is one of the earliest CNN models and has seven layers in total. AlexNet is a deeper successor to LeNet and was among the first to successfully apply ReLU, Dropout and LRN in a CNN structure. Compared with the unprocessed original data, the augmented data produced by the pre-processing module improved classification accuracy to varying degrees. In particular, at magnifications of 20× and 40× the classification accuracy improved considerably, while the improvement was less obvious at 4× and 10×. A possible explanation lies in the features of the histological images: at low magnification, these two neural network models cannot easily extract the papillary structure features of papillary thyroid carcinoma. Thus, the smaller the magnification, the smaller the effect of the pre-processing. At magnifications of 20× and 40×, the accuracy of both models was about 8% higher on the augmented data than on the original data, which supports the effectiveness of our method.
Comparison of original dataset and augmented dataset between different models
In this section, we evaluate the performance of the SVM classifier and compare the two structures, IRCNN and IRCNN+SVM. To make the experimental results more convincing, we conducted comparative experiments on both the original data and the augmented data. Table 4 shows the accuracy of IRCNN and IRCNN+SVM at different magnifications on the original and augmented data. Compared with IRCNN+SVM, the IRCNN model alone has lower accuracy on every type of data and a higher rate of misdiagnosis.
Comparison of accuracy between IRCNN and IRCNN+SVM
To further verify the superiority of the IRCNN+SVM model, we mixed the histological images of papillary thyroid carcinoma at the four magnifications before classifying them. Table 5 shows that, compared with the 88.99% accuracy of IRCNN on the original mixed data, the accuracy of the IRCNN+SVM model increased by about 3.51%, and all the other performance indicators improved as well. From Table 6, it can be seen that the classification accuracy of the IRCNN+SVM model on the augmented mixed data is about 0.91% higher than that of IRCNN, and the other performance indicators are also higher. These results show that SVM, which is designed around structural risk minimization, can effectively improve the generalization ability of the model.
Comparison of performance between IRCNN and IRCNN+SVM in mixed dataset
Comparison between the proposed method and other methods
This part analyzes the performance of the complete histological image classification method for papillary thyroid carcinoma proposed in our work. To compare the accuracy of our method with that of other methods, the experimental results of the first two sections are summarized in Table 6 (Aug: Augmented). Table 6 shows that the average accuracy of our method on the histological images of papillary thyroid carcinoma at the four magnifications is clearly higher than that of other commonly used models. On 40× images, our proposed method obtains the best classification accuracy, 98.57%. Furthermore, our method still achieves an accuracy of more than 95% at magnifications of 4× and 10×. Although it is difficult to extract the papillary structure features of papillary thyroid carcinoma from histological images with relatively small magnification, the result shows that our model effectively addresses this problem, which was encountered in the first experiment. In addition, Table 7 shows the Precision, Recall and F1-score of our method on the various images. Figures 7 and 8 show the ROC curves with AUC and the confusion matrices of the proposed method for the different magnification factors, respectively. These results illustrate that the proposed method has clear advantages in the classification of histological images of papillary thyroid carcinoma.
The proposed method performance of each magnification

ROC curve with AUC of the proposed method for different magnification factors.

Confusion matrices of the proposed method for different magnification factors.
Papillary thyroid carcinoma is the most common malignant tumor of the thyroid. The identification and differential diagnosis of papillary structures are the basis for the early detection, early diagnosis and treatment of thyroid tumors, which is of great significance for improving the survival and prognosis of patients. At present, with the rapid development of medical image diagnosis technology, deep learning has been studied in the field of ultrasound imaging, especially in the preoperative diagnosis of thyroid tumors. One study [23] proposed a DCNN-based classification method for ultrasound images of thyroid nodules, which loaded a set of weights pre-trained on ImageNet into the VGG16 model and replaced the fully connected layer in VGG16 with a global average pooling layer and a Sigmoid layer, and achieved good results. Another study [24] made use of the characteristics of the RPN network and the Fast R-CNN network to design an ultrasound image recognition scheme for papillary thyroid carcinoma, and adopted multi-scale input and layer connection to extract tumor features better. In large-scale experiments, the detection model could identify complex tumor features and showed good recognition performance. The retrieved literature [25–30] shows that most current studies are aimed at identifying ultrasound images of papillary thyroid carcinoma. Nevertheless, histopathological diagnosis is the gold standard for clinical diagnosis. The diagnostic results of histopathological images, which are an important basis for clinicians to formulate and implement treatment schemes, are more convincing than those of ultrasound images.
In recent years, researchers have been trying to use deep learning to develop more reliable classification methods and to promote the application of deep learning in thyroid cancer diagnosis. In a previous study [31], the training data were expanded by flipping and rotating, and VGG-19 was used to diagnose thyroid nodules in histopathology images. However, we found that simply increasing the number of images cannot completely solve the overfitting problem, because color differences caused by the tissue specimens and the staining intensity are also an important factor in the classification results. Therefore, to overcome the limitations of the previous methods, we proposed a pre-processing module consisting of color transfer and mirror transform, which expands the dataset and solves the problem of image color inconsistency. The comparison in Table 3 shows that our pre-processing module achieved a promising effect. The experimental results in [13, 14] show that the Inception network and the residual network can effectively mitigate excessive neural network parameters and overfitting. In view of this, we proposed the IRCNN network structure to solve the problems caused by the network being too deep, and used the SVM classifier to complete the classification task. Table 6 shows that our classification model achieves an accuracy of more than 95% on images at all magnifications, and an accuracy of 98.57% on 40× images. Furthermore, the AUC values in Fig. 7 and the confusion matrices in Fig. 8 are favorable, with an AUC value of more than 0.91 when the proposed method is applied to images at the four different magnifications.
In the histological images with a magnification of 10×, which showed the highest rate of misdiagnosis, 2 images of BT were misdiagnosed as papillary thyroid carcinoma and 1 image of PTC was misdiagnosed as benign thyroid. In the histological images with magnifications of 20× and 40×, which showed the lowest rate of misdiagnosis, 1 image of BT was misdiagnosed as papillary thyroid carcinoma and no classification errors occurred in the PTC histological images. These results demonstrate that the method proposed in this paper not only achieves high accuracy but also has strong stability.
Our findings indicate that deep learning can play an active auxiliary role in the diagnosis of histological images of papillary thyroid carcinoma. However, the study has a limitation: our classification model only targets papillary thyroid carcinoma and benign thyroid nodules, and does not include other subtypes of thyroid cancer, such as follicular thyroid carcinoma (FTC), medullary thyroid carcinoma (MTC) and anaplastic thyroid carcinoma (ATC). Although these subtypes account for a relatively small proportion of thyroid cancers, their impact cannot be ignored. Therefore, in future work, we will develop deep learning models for the classification of multiple subtypes and explore better recognition methods in the field of histological image classification of thyroid nodules.
Conclusions
In this work, a new deep learning model was proposed to identify histological images of papillary thyroid carcinoma. For data pre-processing, a module including color transfer and mirror transform was applied to the dataset. As a result, classification accuracy improved because the available experimental data were richer and more balanced. For the network structure, we designed the IRCNN, which combines the strengths of the Inception Network and the Residual Network to extract image features. This design alleviated the excessive number of neural network parameters and the overfitting caused by deep networks. Moreover, we trained the SVM classifier on histological image features to improve classification stability. The experimental results showed that the proposed method achieves high accuracy and can serve as an effective decision support tool for papillary thyroid carcinoma diagnosis in a clinical environment.
Acknowledgments
This work was supported by the Key R & D Projects of Shandong Province (Grant No. 2019JMRH0109).
