Comparison of medical image classification accuracy among three machine learning methods

Abstract

BACKGROUND:

Low-quality medical images may influence the accuracy of the machine learning process.

OBJECTIVE:

This study was undertaken to compare accuracy of medical image classification among machine learning methods, as classification is a basic aspect of clinical image inspection.

METHODS:

Three machine learning methods were used: Support Vector Machine (SVM), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN). To investigate changes in accuracy related to image quality, we constructed a single dataset in two different file formats: DICOM (Digital Imaging and Communications in Medicine) and JPEG (Joint Photographic Experts Group).

RESULTS:

The JPEG format contains less color information and data capacity than the DICOM format. CNN classification was accurate for both datasets, whereas SVM and ANN accuracy decreased with the loss of data from DICOM to JPEG formats.

CONCLUSIONS:

CNN is more accurate than conventional machine learning methods that rely on manual feature extraction.

Keywords

Deep learning, CNN, DICOM, JPEG

1 Introduction

Machine learning refers to computer algorithms that learn regularities and rules from experience. For example, support vector machine (SVM) and artificial neural network (ANN) are conventional machine learning methods that can classify images from input features. In addition, convolutional neural network (CNN), which is useful for image recognition, has recently gained considerable interest. Specifically, CNN is a deep learning method that exhibits very high accuracy for image classification, extracting features through repeated convolution and pooling. Research and development of these machine learning methods are currently underway for medical applications [1, 2].

Medical images are essential for diagnosis. Indeed, we export medical images to the picture archiving and communication system (PACS) after confirming that the images are in the order required by doctors and are adequate for diagnosis. This “image inspection” is quite important, but is sometimes insufficient because of a shortage or overburdening of staff. This problem could be addressed by introducing machine learning into the system and automating image inspection.

Medical images differ from general color images in that they exhibit many gradations, very high definition, and large capacity, all of which increase the computation time required for machine learning and pose a challenge to widespread adoption of automated image inspection. Calculation time can be shortened by reducing image size and number of gradations, but these lower-quality images may influence the accuracy of the machine learning process. In a previous study, SVM, ANN, and CNN were used to compare the accuracy of classifying features from input signals for a brain-computer interface [3]. In addition, prior studies have used CNN to classify organs (e.g., spinal cord, mandible, or parotid gland) from CT images [4] and the cerebral cortex from MR images [5]. However, no prior studies have applied CNN to image classification by modality and compared it with conventional machine learning. Therefore, we evaluated the accuracy of three kinds of machine learning (SVM, ANN, CNN) in classification of computed tomography (CT) images, magnetic resonance (MR) images, and X-ray images, using Digital Imaging and Communications in Medicine (DICOM) (the original medical image format) and Joint Photographic Experts Group (JPEG) (a general image format).

2 Methods and materials

2.1 Overview

Our dataset consisted of 240 CT, MRI, and X-ray images. DICOM images were acquired from open databases, then used to generate JPEG images; these image sets were respectively designated as DICOM and JPEG datasets. Six features (median, entropy, area, contrast, energy, and homogeneity) were extracted from the images for use in SVM and ANN. CNN utilized AlexNet [3]. A total of 168 images were used for learning and the remaining 72 images were used for evaluation.

The software used in this study was MATLAB R2017a (MathWorks, Massachusetts, USA). The GPU was a Quadro M3000M 4 GB (NVIDIA, California, USA). Because calculation time depends on GPU performance, the computation times reported here were measured on this GPU. The authors declare that they have no conflict of interest in this article.

2.2 Dataset

A total of 240 DICOM images were acquired from open databases. Chest CT images (80 images) were obtained from the Lung Image Database Consortium (LIDC); brain MR images (80 images) were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI); and chest X-ray images (80 images) were obtained from the Japanese Society of Radiological Technology (JSRT). The gray-level depths of the DICOM images were: CT, 13 bits; MR, 12 bits; and X-ray, 12 bits. Sizes of the images were: CT, 512×512; MR, 256×256; and X-ray, 2048×2048. The pixel sizes of the images were: CT, 0.75 mm; MR, 1.0 mm; and X-ray, 0.175 mm. All images were resized to 227×227 with bicubic interpolation, in which each output pixel is a weighted average of the nearest 4×4 pixels. The DICOM dataset was then converted to JPEG format to generate the JPEG dataset. The gray-level depth of all JPEG images was 8 bits. Because conversion to JPEG applies a compression algorithm, the compression rate of the JPEG dataset was specified as 100%. Figure 1 shows examples of the dataset. Each DICOM image occupied approximately 100 KB, whereas each JPEG image occupied approximately 8 KB.
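As a rough illustration of how reducing the gray-level depth removes fine gradations, the following Python sketch linearly rescales 12-bit pixel values to 8 bits. The linear mapping, the function names, and the sample values are assumptions made for illustration; the study does not state how pixel values were mapped, and JPEG compression itself is not modeled here.

```python
def rescale_to_8bit(pixels, source_bits):
    """Linearly map integer pixel values from a source_bits range onto 0..255."""
    source_max = (1 << source_bits) - 1
    return [[(value * 255) // source_max for value in row] for row in pixels]


def count_gray_levels(pixels):
    """Number of distinct gray values present in the image."""
    return len({value for row in pixels for value in row})


# Hypothetical 2x4 patch of 12-bit values that differ only slightly.
patch_12bit = [
    [1000, 1004, 1008, 1012],
    [1016, 1020, 1024, 1028],
]
patch_8bit = rescale_to_8bit(patch_12bit, source_bits=12)

print(count_gray_levels(patch_12bit))  # distinct levels before conversion
print(count_gray_levels(patch_8bit))   # distinct levels after conversion
```

In this toy patch, eight distinct 12-bit values collapse onto three 8-bit values, which is the kind of information loss that can shift histogram-based features such as the median.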

Fig.1

Examples of medical images from open databases: chest computed tomography (CT) images (Lung Image Database Consortium), brain magnetic resonance (MR) images (Alzheimer’s Disease Neuroimaging Initiative), and chest X-ray images (Japanese Society of Radiological Technology).

2.3 Machine learning

2.3.1 SVM and ANN

SVM is a nonlinear pattern classifier that finds the separating hyperplane with the maximum margin between classes; with a soft margin, some data points are allowed to violate the margin so that data that are not perfectly separable can still be classified. ANN is a classifier model that expresses neurons and their neural circuits in mathematical terms. In this study, a three-layered ANN was used, consisting of one input layer (6 input units), one hidden layer (10 hidden units), and one output layer (3 output units); this structure is simple, yet it can be trained with the back propagation (BP) learning algorithm. The values of the control parameters of the BP method were as follows: maximum epochs: 1000, minimum gradient: 1e-06, sigma: 5e-05, lambda: 5e-07.
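The 6-10-3 structure described above can be sketched as a single forward pass in Python. This is only an illustration of the layer sizes: the tanh hidden activation, the softmax output, the random initial weights, and the sample feature vector are all assumptions, and the training procedure is not reproduced.

```python
import math
import random


def init_layer(n_inputs, n_outputs, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases


def dense(inputs, weights, biases):
    """Weighted sum of the inputs plus bias for every unit in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, hidden_layer, output_layer):
    """One forward pass: 6 features -> 10 tanh hidden units -> 3 class probabilities."""
    hidden = [math.tanh(h) for h in dense(features, *hidden_layer)]
    return softmax(dense(hidden, *output_layer))


rng = random.Random(0)
hidden_layer = init_layer(6, 10, rng)
output_layer = init_layer(10, 3, rng)

# Hypothetical, already normalized feature vector
# (median, entropy, area, contrast, energy, homogeneity).
features = [0.2, 0.5, 0.7, 0.1, 0.6, 0.9]
probabilities = forward(features, hidden_layer, output_layer)
print(probabilities)

# Weights and biases of both layers.
n_parameters = (6 * 10 + 10) + (10 * 3 + 3)
print(n_parameters)
```

With these layer sizes the network has 103 trainable parameters, which is small relative to the 168 training images.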

SVM and ANN require image feature quantities as input for classification. Therefore, commonly used features (median, entropy, area, contrast, energy, and homogeneity), which are presumed to characterize the image, were calculated as follows:

$Median = \begin{cases} x\left(\frac{n + 1}{2}\right) & (n : \text{odd number}) \\ \frac{x\left(\frac{n}{2}\right) + x\left(\frac{n}{2} + 1\right)}{2} & (n : \text{even number}) \end{cases}$ (1)

$Entropy = - \sum p \log_{2} p$ (2)

$Contrast = \sum_{i, j} {| i - j |}^{2} p (i, j)$ (3)

$Energy = \sum_{i, j} p {(i, j)}^{2}$ (4)

$Homogeneity = \sum_{i, j} \frac{p (i, j)}{1 + | i - j |}$ (5)

where x(k) is the k-th smallest of the n pixel values, p in Eq. (2) is the normalized histogram count of each gray level, and p(i, j) in Eqs. (3) to (5) is an element of the gray-level co-occurrence matrix. Together with the “area,” which is the total area obtained by binarizing the image, a total of six features were used.
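To make the feature definitions concrete, the following Python sketch evaluates the median, the entropy, and the three texture features on a tiny image. It is a minimal sketch under stated assumptions: the co-occurrence matrix is built from horizontally adjacent pixel pairs only, the 4×4 test image is hypothetical, and the “area” feature is omitted because the binarization threshold is not specified in the text.

```python
import math
from collections import Counter


def median(values):
    """Equation (1): middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def entropy(values):
    """Equation (2): Shannon entropy (bits) of the normalized gray-level histogram."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def cooccurrence(image):
    """Normalized co-occurrence matrix for horizontally adjacent pixel pairs."""
    counts = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[(left, right)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def texture_features(p):
    """Equations (3)-(5) evaluated on the normalized co-occurrence matrix p(i, j)."""
    contrast = sum(abs(i - j) ** 2 * v for (i, j), v in p.items())
    energy = sum(v ** 2 for v in p.values())
    homogeneity = sum(v / (1 + abs(i - j)) for (i, j), v in p.items())
    return contrast, energy, homogeneity


# Hypothetical 4x4 image with gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
pixels = [v for row in image for v in row]
contrast, energy, homogeneity = texture_features(cooccurrence(image))
print(median(pixels), entropy(pixels), contrast, energy, homogeneity)
```

For this image the four gray levels are equally frequent, so the entropy is 2 bits, and the six distinct neighbor pairs each carry a probability of 1/6.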

2.3.2 CNN

CNN is an effective classifier based on deep network learning [4], which is highly capable of automatically learning appropriate features from input data by optimizing the weight parameters of each filter, using forward and backward propagation to minimize classification errors [4]. We focused on transfer learning [5, 6], using a pre-trained AlexNet. AlexNet has five convolutional layers, three pooling layers, and two fully connected layers [7]. Parameters in the convolutional and fully connected layers were fixed, and these layers were used as deep image feature extractors [7]. The fully connected layer was replaced by an SVM that was trained on our dataset (Fig. 2). Classification was performed using this SVM, based on pre-trained AlexNet features and a batch size of 32.
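The convolution and pooling operations that a CNN repeats to extract features can be illustrated with a very small Python sketch. It shows one fixed filter followed by ReLU and 2×2 max pooling; the filter, the image, and the pooling size are hypothetical and much simpler than the pre-trained AlexNet layers used in this study, whose outputs were passed to an SVM.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation used in CNN convolution layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Set negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 5x5 image: dark left part, bright right part.
image = [[0, 0, 0, 9, 9] for _ in range(5)]
# Hypothetical fixed filter responding to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

feature_map = relu(convolve2d(image, kernel))
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)
```

In transfer learning, filters like `kernel` are not designed by hand but taken from a network trained on general images and kept fixed, so only the final classifier needs to be trained on the medical images.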

Fig.2

The configuration diagram of AlexNet. We used transfer learning, which is a combination of AlexNet and support vector machine (SVM) classifiers. The fully connected layer was replaced by an SVM classifier.

2.4 Learning and evaluation

All images were labeled as “CT,” “MRI,” or “X-ray.” The three classifiers were trained using 56 of 80 images from each imaging modality, for a total of 168 images. We evaluated the accuracy of each classifier with the remaining 24 images from each imaging modality, for a total of 72 images, and compared accuracy between DICOM and JPEG datasets. Based on the numerical output from each classifier and the label associated with the image, the classification was judged to be correct or incorrect. All results were obtained by consistent labeling of input and output images.
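The split and the correctness check described above can be sketched as follows. This is a minimal Python sketch assuming a random per-class split with a fixed seed; the text does not state how the 56 training images per modality were selected, and the function names are hypothetical.

```python
import random


def stratified_split(labels, n_train_per_class, seed=0):
    """Return (train_indices, test_indices) with a fixed number of training images per class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train_per_class])
        test.extend(shuffled[n_train_per_class:])
    return sorted(train), sorted(test)


def accuracy(true_labels, predicted_labels):
    """Fraction of images whose predicted label matches the assigned label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# 80 images per modality, as in the study; the ordering is hypothetical.
labels = ["CT"] * 80 + ["MRI"] * 80 + ["X-ray"] * 80
train_idx, test_idx = stratified_split(labels, n_train_per_class=56)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the test set balanced at 24 images per modality, so the overall accuracy is the plain average of the three per-class accuracies.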

3 Results

The computation times for processing one DICOM image were 4, 5, and 10 seconds when using SVM, ANN, and CNN, respectively. In comparison, the computation times for processing one JPEG image were reduced to 3, 3, and 7 seconds, respectively. Classification accuracy using the DICOM dataset was 100% for CNN, SVM, and ANN (Table 1), whereas using the JPEG dataset, accuracy was 100% for CNN, 94.4% for SVM, and 88.9% for ANN (Table 2). Figures 3 and 4 show example images classified erroneously in the JPEG dataset.

Table 1
Accuracy ratios for the Digital Imaging and Communications in Medicine (DICOM) dataset

	SVM	ANN	CNN
CT	100	100	100
MRI	100	100	100
X-ray image	100	100	100
Accuracy ratio (total)	100	100	100

Classification accuracy using the DICOM dataset was 100% for all three methods: convolutional neural network (CNN), support vector machine (SVM), and artificial neural network (ANN).

Table 2

Accuracy ratios for the Joint Photographic Experts Group (JPEG) dataset

	SVM	ANN	CNN
CT	87.5	70.8	100
MRI	95.8	95.8	100
X-ray image	100	100	100
Accuracy ratio (total)	94.4	88.9	100

Classification accuracy using the JPEG dataset was 100% for convolutional neural network (CNN), 94.4% for support vector machine (SVM), and 88.9% for artificial neural network (ANN).

Fig.3

Support vector machine (SVM) classification errors of Joint Photographic Experts Group (JPEG) images: three computed tomography (CT) images were classified as X-ray images, and one magnetic resonance (MR) image was classified as a CT image.

Fig.4

Artificial neural network (ANN) classification errors of Joint Photographic Experts Group (JPEG) images: six computed tomography (CT) images were classified as X-ray images, and one image was classified as a magnetic resonance (MR) image. One MR image was classified as a CT image.

Table 3 shows SVM classification results, using the JPEG dataset. Twenty-one of 24 test CT images were correctly classified, whereas three images were erroneously classified as X-ray images. Further, 23 of 24 test MR images were correctly classified, and one image was erroneously classified as a CT image.

Table 3

Results of support vector machine (SVM) classification of the Joint Photographic Experts Group (JPEG) dataset (rows: assigned class; columns: true class)

	CT	MRI	X-ray image
CT	21	1	0
MRI	0	23	0
X-ray image	3	0	24

Table 4 shows ANN classification results, using the JPEG dataset. Seventeen of 24 test CT images were correctly classified, whereas six images were erroneously classified as X-ray images and one image as an MR image. Twenty-three of 24 test MR images were correctly classified, and one image was erroneously classified as a CT image. All X-ray images were classified correctly by both SVM and ANN. In addition, Tables 5 and 6 show the feature quantities on which the SVM and ANN classification results are based.

Table 4

Results of artificial neural network (ANN) classification of the Joint Photographic Experts Group (JPEG) dataset (rows: assigned class; columns: true class)

	CT	MRI	X-ray image
CT	17	1	0
MRI	1	23	0
X-ray image	6	0	24
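The accuracy ratios in Table 2 follow directly from the counts in Tables 3 and 4. The Python sketch below recomputes them, reading each table with rows as the assigned class and columns as the true class, which is the orientation in which every column sums to the 24 test images per modality. The function and variable names are chosen for illustration.

```python
CLASSES = ["CT", "MRI", "X-ray image"]

# Rows: assigned class; columns: true class (each column sums to 24 test images).
svm_jpeg = [
    [21, 1, 0],
    [0, 23, 0],
    [3, 0, 24],
]
ann_jpeg = [
    [17, 1, 0],
    [1, 23, 0],
    [6, 0, 24],
]


def per_class_accuracy(matrix):
    """Percentage of each true class (column) that was assigned the correct label."""
    result = []
    for k in range(len(matrix)):
        column_total = sum(row[k] for row in matrix)
        result.append(100 * matrix[k][k] / column_total)
    return result


def overall_accuracy(matrix):
    """Percentage of all test images that lie on the diagonal of the matrix."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100 * correct / total


for name, matrix in [("SVM", svm_jpeg), ("ANN", ann_jpeg)]:
    per_class = [round(a, 1) for a in per_class_accuracy(matrix)]
    print(name, dict(zip(CLASSES, per_class)), round(overall_accuracy(matrix), 1))
```

For SVM this gives 21/24 = 87.5%, 23/24 = 95.8%, and 24/24 = 100% per modality, with 68/72 = 94.4% overall; for ANN the overall value is 64/72 = 88.9%.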

Table 5

Averages of six feature quantities of the Digital Imaging and Communications in Medicine (DICOM) dataset

	Median	Entropy	Area	Contrast	Energy	Homogeneity
CT	326.1	0.759	51245	0.383	0.654	0.993
MRI	390.6	0.878	45082	0.383	0.575	0.993
X-ray image	3364.5	3.1*10	65535	0.002	1.000	1.000

4 Discussion

The influence of image quality on the performance of machine learning methods has recently attracted research interest [9, 10]. In this study, we found that classification accuracy was 100% for all three types of machine learning when using the DICOM dataset, whereas accuracy was reduced for SVM and ANN when using the JPEG dataset (Tables 1 and 2). Conventional machine learning methods (SVM and ANN) need suitable feature quantities to classify accurately. The difference between the median values of CT and MRI images is small in the JPEG dataset and large in the DICOM dataset (Tables 5 and 6). Therefore, the median is a feature that contributes greatly to the accuracy of CT and MRI classification. Because the median and entropy of X-ray images differ markedly from those of the other imaging modalities, X-ray images were classified accurately. JPEG uses a lossy compression process: part of the image data is discarded and the resulting image quality is degraded. For example, a DICOM CT image that is resized to 227×227 occupies approximately 98 KB of data; however, when converted to JPEG format, the same image occupies approximately 8 KB. Notably, JPEG compression reduces color information and omits fine gradations within images, which naturally affects pixel values and features within images. For example, there was a remarkable decrease in the classification accuracy of CT images in two different imaging locations (lung field and mediastinal) (Tables 3 and 4). In contrast, original DICOM images contain sufficient information, such that, using only six features, accuracy was consistent across CNN, SVM, and ANN methods.
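The argument about the median can be checked with simple arithmetic on the mean values reported in Tables 5 and 6. The relative gap used below is an illustrative measure chosen here, not one used in the study, and it ignores the spread of values within each modality.

```python
# Mean median values taken from Tables 5 and 6.
median_dicom = {"CT": 326.1, "MRI": 390.6}
median_jpeg = {"CT": 59.96, "MRI": 60.04}


def relative_gap(values):
    """Absolute CT-MRI difference divided by the mean of the two values, in percent."""
    ct, mri = values["CT"], values["MRI"]
    return 100 * abs(ct - mri) / ((ct + mri) / 2)


print(round(relative_gap(median_dicom), 1))  # gap in the DICOM dataset (%)
print(round(relative_gap(median_jpeg), 2))   # gap in the JPEG dataset (%)
```

The gap between CT and MRI is roughly 18% of the mean in the DICOM dataset but only about 0.13% in the JPEG dataset, so after conversion the median alone can hardly separate these two modalities.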

Table 6
Averages of six feature quantities of the Joint Photographic Experts Group (JPEG) dataset

	Median	Entropy	Area	Contrast	Energy	Homogeneity
CT	59.96	5.500	23098	0.211	5.409	0.945
MRI	60.04	5.958	30192	0.214	6.119	0.916
X-ray image	200.1	7.046	49609	0.076	7.046	0.966

For general image file formats, reported decreases in CNN accuracy are largely due to blurring and noise, whereas the influence of distortion from JPEG and JPEG 2000 compression is small [8]. Because the network was pre-trained with a large number of general images, it maintained high accuracy even when used to classify medical images that were resized and converted to JPEG format. The extended processing time necessitated by CNN classification of large-capacity images is a great challenge; our current research suggests that good accuracy can be obtained with lowered data capacity, and that images can be checked, to some extent, at the preview image stage (in a general image format, not DICOM), which is displayed on the inspection terminal.

An increased number of samples is needed in a future study, as learning performance can be affected by the size of the training set; this is especially true for ANN and CNN, where a larger training set usually provides higher classification accuracy [8]. Importantly, CNN is able to attain higher accuracy with smaller numbers of samples, compared with ANN [8]. In addition, this study used images of the same body position and same body parts; thus, it is necessary to consider whether accuracy can be maintained when body positions and parts change, as would be expected in the course of clinical practice. Some previous studies have reported that CNN is more accurate in medical image classification than SVM and ANN, which are both conventional machine learning methods [11, 12]. Whereas SVM and ANN require manual feature extraction and parameter selection, these tasks are incorporated within the CNN algorithm. Moreover, transfer learning enables a significant time reduction. Medical image file size is much larger than general image file size; however, high precision was maintained when using CNN classification, even after conversion of DICOM to JPEG for shortened processing time. In this study, CNN was shown to be a suitable machine learning method for discrimination and classification of medical images, consistent with previous studies.

5 Conclusion

We applied three machine learning methods (SVM, ANN, and CNN) to the classification of CT, MR, and X-ray images, and compared the accuracy of these methods using DICOM and JPEG datasets. Using the DICOM dataset, accuracy was high for all methods, whereas when using the JPEG dataset, CNN exhibited greater accuracy than conventional SVM and ANN methods. A great advantage of CNN is its ability to accurately identify medical images with less data capacity in a shorter time span; thus, we find that it is effective to incorporate CNN into computer-aided medical image processing and inspection systems.

References

1. Xu, Feng, Mi, Deep Convolutional Neural Network-Based Early Automated Detection of Diabetic Retinopathy Using Fundus Image, Molecules 22(12) (2017).

2. E.Y. Kim, V.A. Magnotta, Liu and H.J. Johnson, Stable Atlas-based Mapped Prior (STAMP) machine-learning segmentation for multicenter large-scale MRI data, Magn Reson Imaging 32(7) (2014), 832–844.

3. Trakoolwilaiwan, Behboodi, Lee, Kim and J.W. Choi, Convolutional neural network for high-accuracy functional near-infrared spectroscopy in a brain-computer interface: Three-class classification of rest, right-, and left-hand motor execution, Neurophotonics 5(1) (2018), 011008.

4. Ibragimov and Xing, Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks, Med Phys 44(2) (2017), 547–557.

5. Ceschin, Zahner, Reynolds, Gaesser, Zuccoli, C.W. Lo, et al., A computational framework for the detection of subcortical brain dysmaturation in neonatal MRI using 3D Convolutional Neural Networks, Neuroimage 178 (2018), 183–197.

6. Krizhevsky, Sutskever, G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Proc Adv Neural Inf Process Syst 1(1) (2012), 1106–1114.

7. Bar, Diamant, Wolf and Greenspan, Chest pathology detection using deep learning with non-medical training, IEEE International Symposium on Biomedical Imaging (ISBI), (2015), 12.

8. Tajbakhsh, J.Y. Shin, S.R. Gurudu, R.T. Hurst, C.B. Kendall, M.B. Gotway et al., Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? IEEE Trans Med Imaging 35(5) (2016), 1299–1312.

9. Dodge and Karam, Understanding How Image Quality Affects Deep Neural Networks, Quality of Multimedia Experience (QoMEX), 2016 Eighth International Conference on (2016), 6.

10. Wang, Zhou, Li, Chen, Lu, Wang et al., Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from (18)F-FDG PET/CT images, EJNMMI Research 7(1) (2017), 11.

11.