Abstract
BACKGROUND:
Segmenting the liver or its lesions on computed tomography (CT) images can support tumor staging and treatment planning. However, most existing image segmentation techniques rely on manual or semi-automatic analysis, which makes the process costly and time-consuming.
OBJECTIVE:
This research aims to develop and apply a deep learning network architecture that, after parameter fine-tuning, segments liver tumors automatically.
METHODS AND MATERIALS:
The medical images were obtained from the International Symposium on Biomedical Imaging (ISBI) and include 3D abdominal CT scans of 131 patients diagnosed with liver tumors. These CT scans yielded 7,190 2D CT images together with labeled binary images. The labeled binary images are regarded as the gold standard for evaluating the results segmented by a fully convolutional network (FCN). The FCN backbones in this study are taken from Xception, InceptionresNetv2, MobileNetv2, ResNet18, and ResNet50. The investigated parameters include the optimizer (SGDM and ADAM), the number of epochs, and the batch size. CT images are randomly divided into training and testing sets at a ratio of 9:1. Several evaluation indices, including Global Accuracy, Mean Accuracy, Mean IoU (Intersection over Union), Weighted IoU, and Mean BF Score, are applied to evaluate tumor segmentation results on the testing images.
RESULTS:
Using ResNet50 in FCN with the SGDM optimizer, batch size 12, and epoch 9, the Global Accuracy, Mean Accuracy, Mean IoU, Weighted IoU, and Mean BF Score are 0.999, 0.969, 0.954, 0.998, and 0.962, respectively. Fine-tuning the parameters of the FCN model is important. The top 20 FCN models achieve higher tumor segmentation accuracy, with Mean IoU over 0.900. InceptionresNetv2, MobileNetv2, ResNet18, ResNet50, and Xception occur 9, 6, 3, 5, and 2 times, respectively. Therefore, InceptionresNetv2 performs better than the others.
CONCLUSIONS:
This study develops and tests an automated liver tumor segmentation model based on FCN. The results demonstrate that many deep learning models, including InceptionresNetv2, MobileNetv2, ResNet18, ResNet50, and Xception, have high potential to segment liver tumors from CT images with accuracy exceeding 90%. However, it remains difficult for FCN models to accurately segment tiny and small tumors.
Introduction
Clinical diagnostic tools for studying liver tumors include sonography, magnetic resonance imaging, single-photon emission computed tomography, and computed tomography (CT). CT imaging is fast, offers high resolution, and is commonly used to diagnose various diseases. Hence, it is an essential imaging tool for examining liver tumors.
In computer vision, image segmentation refers to the process of selecting or marking regions of a digital image based on features or object types. It outlines the edges of different organs and lesions or marks them with specific colors. In medical imaging, image segmentation is mainly used for tissue volume measurement, diagnosis, and customized treatment planning. In the early stage, segmenting organs or tumors semi-automatically took much time [1–4]. The main traditional algorithms for image segmentation are cluster analysis, histogram-based methods, region growing, and the level set method. With the rapid development of artificial intelligence, deep learning algorithms that automatically segment or identify different objects in images have gradually matured and have been applied in various research fields. In 2014, Jonathan Long et al. published the use of Fully Convolutional Networks (FCN) to perform semantic segmentation on 2D color images of natural objects [5]. Before this, semantic segmentation often relied on hand-defined features to identify the content and its location in the image. Olaf Ronneberger et al. used the U-Net network architecture to study biomedical image segmentation, and deep learning gradually began to be applied to image segmentation [6–8]. This research aims to train a deep learning segmentation model that can effectively segment the liver and liver tumors from computed tomography images of the liver.
Nowadays, liver disease is one of the most common diseases among Chinese people, especially liver cancer, the second leading cause of cancer death. Image segmentation is an essential technology in medical imaging. In diagnosis, tumors or lesions need to be segmented for staging and slice positioning; in radiotherapy, lesions and organs must be segmented before the treatment plan can be completed; and after treatment, follow-up requires image segmentation to evaluate the efficacy of the treatment on the lesion. Segmentation is therefore closely tied to medical imaging, yet the main segmentation methods are manual or semi-automatic and usually consume much time. Deep learning is expected to reduce the time needed for image segmentation and to improve the efficiency and accuracy of segmenting the liver and its tumors.
Fully Convolutional Networks (FCN) were developed to segment an image by classifying each pixel into a specific group. Each pixel takes its surrounding pixels as a patch for the operation [9]. In 2015, Olaf Ronneberger et al. proposed U-Net, a neural network architecture based on the fully convolutional network and suited to medical imaging applications. U-Net is an encoder-decoder structure [7, 10]. The encoder extracts features and reduces the spatial dimension, and the decoder restores the details and spatial dimensions. U-Net's encoder and decoder are connected so that details can be restored more accurately. In 2017, Han designed a 2.5D deep convolutional network [11]. In this 2.5D architecture, a stack of adjacent slices is used as the input (3D), and the segmentation of the middle slice (2D) is the output. The network is modified from the U-Net architecture, with residual connections as in ResNet added [12]. Hence, in combination with image pre-processing, the network parameters can be trained effectively. The method was successful in the liver tumor segmentation competition of the International Biomedical Imaging Symposium, achieving the highest liver tumor Dice score (DSC) of 0.670; the DSC is an index similar to IoU.
Furthermore, an efficient network architecture is important for improving the performance of CNNs [13]. To compare the segmentation performance of different well-known convolutional neural network backbones, the primary purpose of this research is to use neural network architectures such as inceptionresnetv2, mobilenetv2, resnet18, resnet50, and xception to train segmentation models for abdominal computed tomography images of liver tumors. It is expected that at least one model will prove reasonable and feasible for segmenting the boundary of liver tumors.
Materials and methods
The data
The data used in this research were obtained from the International Biomedical Imaging Symposium [14, 15]. The dataset contains 3D liver CT images of 131 patients; an example is shown as a two-dimensional image in Fig. 1(A). The images were scanned with different imaging protocols on CT devices, so the image sizes and resolutions are not the same. The number of slices per 3D CT scan ranges from 42 to 1026. The tumors in the liver were selected for segmentation. In addition to the 3D computed tomography images of the abdomen, each patient also has images marked by a professional radiologist. To obtain adequate visual contrast, radiologists usually adjust the intensity levels for diagnostic purposes, as shown in Fig. 1(B). The marked images were in binary format, with the liver tumors marked as white blocks, as shown in Fig. 1(C).
The liver tumors are hepatocellular carcinoma (HCC). The axial resolution and slice thickness of the images range from 0.56 mm to 1.0 mm and from 0.45 mm to 6.0 mm, respectively. A total of 7,190 2D liver images were used in this study. The images were 512×512 in NIFTI format. The number of tumors in one image ranges from zero to twenty-five. About 90.7% of the images contain no more than five tumors, and about 9.3% contain six or more (Table 1).

The original CT images (A), images with adjusted intensity level (B), and binary masks for liver tumors (C). These 2D images show 22 tumors with boundaries.
The numbers of tumors counted from the 7,190 2D images
All the 2D CT liver images are non-enhanced (i.e., acquired without injection of an imaging agent). A physician diagnosed the tumor types, which were confirmed by pathological tests. Figure 2 shows raw 2D axial CT liver images (A, D), binary tumor labels (B, E), and fusion images (C, F). The tumors are not clearly differentiated in contrast from normal liver tissue (Fig. 2A, D), so the images are difficult to read by eye. However, imaging without an injected agent is acceptable to most people, including those who might have complications from the imaging agent. The images used were not in DICOM (Digital Imaging and Communications in Medicine) format. Instead, they were converted to PNG with 8 bits per pixel, which limits the gray levels to the range 0 to 255. Contrast, brightness, sharpness, and details are lost or absent in the PNG images, and the boundaries in the raw images were neither clear nor sharp. That is why AI (artificial intelligence) with deep learning approaches is adopted in this study to segment the tumors on PNG images. Understanding how segmentation performance differs among AI methods is also important.

The 2D liver images (A, D), binary labeling images (B, E), and fusion images (C, F). A physician diagnosed the types of tumor, which were confirmed by pathology. (A–C) and (D–F) show two different tumors.
To prevent the over-fitting caused by insufficient data, it is necessary to enlarge the data set through data augmentation. The method used is to rotate, shift, zoom in and out, and mirror the training set images during each training period (epoch) to improve the accuracy of the models. Training is then configured through four parameters (i.e., number of epochs, batch size, ratio of the training set to all images, and optimizer), and each time a new epoch begins, the system automatically applies a random geometric transformation to the images to increase the effective size of the data set. The flowchart of this study is shown in Fig. 3.
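The per-epoch augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restriction to 90-degree rotations, the shift range of ±2 pixels, and the omission of zooming are all assumptions made to keep the example short and dependency-free. The point it shows is that the same random transform must be applied to the CT image and to its binary mask so that the labels stay aligned.

```python
import random


def mirror(img):
    """Flip a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift(img, dx, fill=0):
    """Shift a 2D image dx pixels right (left if dx < 0), padding with fill."""
    width = len(img[0])
    dx = max(-width, min(width, dx))  # clamp so the output keeps its width
    if dx >= 0:
        return [[fill] * dx + row[:width - dx] for row in img]
    return [row[-dx:] + [fill] * (-dx) for row in img]


def augment_pair(image, mask, rng):
    """Apply one random geometric transform identically to image and mask."""
    if rng.random() < 0.5:                      # random mirroring
        image, mask = mirror(image), mirror(mask)
    for _ in range(rng.randrange(4)):           # 0 to 3 quarter turns (assumed)
        image, mask = rotate90(image), rotate90(mask)
    dx = rng.randint(-2, 2)                     # small horizontal shift (assumed)
    return shift(image, dx), shift(mask, dx)


if __name__ == "__main__":
    rng = random.Random(0)                      # a fresh draw is made every epoch
    ct = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    label = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    new_ct, new_label = augment_pair(ct, label, rng)
    print(new_ct, new_label)
```

Because the transform is redrawn at every epoch, the network rarely sees exactly the same image twice, which is how augmentation enlarges the effective training set without adding new scans.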

The flowchart of this research.
This study uses Xception [16], InceptionresNetv2 [17], MobileNetv2 [18], ResNet18 [19], and ResNet50 [19] as the backbones of FCN for image segmentation models (Table 2). The architecture of Xception uses 36 convolutional layers for feature extraction. These 36 convolutional layers are organized into 14 modules with linear residual connections. The performance gain of Xception comes not from increased capacity but from more efficient use of the model parameters.
The brief properties of investigated FCN models
The architecture of inceptionresnetv2 has 824 layers, including 206 convolutional layers. Its main property is the block design of the model. In terms of computational cost, inceptionresnetv2 trains faster than the other CNNs. It can also significantly improve performance on recognition problems.
Resnet18 has 71 layers. It starts with an image input layer (224×224×3), followed by a two-dimensional convolution with 64 filters of size 7×7, a stride of 2 pixels horizontally and vertically, and zero padding of 3 pixels. There are 20 batch normalization layers, with the number of channels ranging from 64 to 512, one max pooling layer, one average pooling layer, and addition layers organized into eight groups in total. Each group includes two rectified linear unit (ReLU) layers together with two-dimensional convolution and batch normalization layers. The network ends with a fully connected layer, a softmax layer, a classification layer, and an output layer. The main advantage of resnet18 is that its deep stack of layers is easy to train without increasing the error. ResNets overcome the vanishing gradient problem by using identity mappings.
Resnet50 has a total of 177 layers. It starts with an image input layer (224×224×3), followed by a two-dimensional convolution with 64 filters of size 7×7, a stride of 2 pixels horizontally and vertically, and zero padding of 3 pixels. There are 52 batch normalization layers in total, with 64, 128, 256, 512, 1024, and 2048 channels, one max pooling layer, one average pooling layer, and 16 addition layers. Each group includes three ReLU layers together with two-dimensional convolution and batch normalization layers. The network ends with a fully connected layer, a softmax layer, a classification layer, and an output layer. The merit of resnet50 is that its 3-layer bottleneck blocks improve accuracy while reducing training time.
Mobilenetv2 has a total of 154 layers. It starts with an image input layer (224×224×3), followed by a two-dimensional convolution with 32 filters of size 3×3 and a stride of 2 pixels horizontally and vertically, batch normalization with 32 channels, a ReLU layer, and a two-dimensional grouped convolutional layer (also known as depth-wise convolution), again followed by batch normalization with 32 channels. Batch normalization can speed up the training of a convolutional neural network and reduce its sensitivity to network initialization. The network then uses a block design with 16 blocks. Each block has two ReLU layers, three two-dimensional convolution layers, and three batch normalization layers. The network ends with an average pooling layer, a fully connected layer, a softmax layer, a classification layer, and an output layer. The advantage of mobilenetv2 is that it can achieve competitive accuracy with few parameters and low computational complexity.
Stochastic gradient descent with momentum (SGDM) and adaptive moment estimation (ADAM) are two optimizers commonly used in FCN. The Adam optimizer combines momentum optimization and RMSProp [20–22]. It can be applied where the gradient is sparse, and its memory demand is small. θ_T represents the updated weights. m_T and ν_T represent the first-moment and second-moment vector estimates of the gradients (Equations 1, 2). β1 and β2 are the setting parameters; the commonly recommended settings are 0.9 for β1 and 0.999 for β2. g_T is the gradient.
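The moment estimates referred to above did not survive in the text. For reference, the standard Adam update from the literature is written out below as a sketch; whether the authors' Equations (1, 2) used exactly this form is an assumption. The step size α and the small constant ε are symbols introduced here and do not appear in the original text.

```latex
\begin{align}
m_T &= \beta_1\, m_{T-1} + (1-\beta_1)\, g_T \\
\nu_T &= \beta_2\, \nu_{T-1} + (1-\beta_2)\, g_T^{2} \\
\hat{m}_T &= \frac{m_T}{1-\beta_1^{T}}, \qquad
\hat{\nu}_T = \frac{\nu_T}{1-\beta_2^{T}} \\
\theta_T &= \theta_{T-1} - \alpha\, \frac{\hat{m}_T}{\sqrt{\hat{\nu}_T} + \epsilon}
\end{align}
```

The first two lines are the exponentially weighted first and second moments of the gradient; the third corrects their bias toward zero in early steps, and the last applies the per-parameter scaled update.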
In this study, the investigated parameters are listed in Table 3. They include two training ratios (70% and 80%), two optimizers (SGDM and ADAM), five FCN models, five batch sizes (3, 6, 9, 12, 15), and three epoch sizes (3, 6, 9). The input images were resized to 300×300, and the learning rate was 0.0001. In total, 300 combinations of parameters (i.e., 2×2×5×5×3) are investigated for the FCN models. Exploring different parameter settings during training makes it possible to find a suitable FCN more efficiently.
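The count of 300 combinations follows directly from the Cartesian product of the listed settings. The sketch below enumerates that grid; the dictionary keys and the lowercase model and optimizer names are illustrative labels, not identifiers taken from the authors' code.

```python
from itertools import product

# Settings as listed in the text; the key names are hypothetical labels.
GRID = {
    "train_ratio": (0.7, 0.8),
    "optimizer": ("sgdm", "adam"),
    "backbone": ("xception", "inceptionresnetv2", "mobilenetv2",
                 "resnet18", "resnet50"),
    "batch_size": (3, 6, 9, 12, 15),
    "epochs": (3, 6, 9),
}


def parameter_grid(grid=GRID):
    """Return every combination of settings as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


if __name__ == "__main__":
    configs = parameter_grid()
    print(len(configs))   # 2 x 2 x 5 x 5 x 3 combinations
    print(configs[0])
```

Each dictionary in the list describes one training run, so a loop over `parameter_grid()` visits every model and setting exactly once.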
The investigated parameters of FCN models
Because the training ratio, batch size, and epoch size affect the accuracy of the trained model, a total of eighteen combinations (i.e., 2×3×3) of investigated parameters were applied to each of the five FCN models. The global accuracy, mean accuracy, mean IoU (Intersection over Union), weighted IoU, and mean BF (Boundary F1) score were applied to evaluate the segmentation performance of the FCNs across all combinations of parameters. The optimal FCN model and parameters were then evaluated using the mean BF score and the mean IoU value.
Fig. 4 shows a diagram for computing the performance metrics for the segmentation of one image using FCN. Let I be an input image. Let T and G be the target and predicted areas, shown as blue and red squares, and let T^c and G^c be the areas outside the target and outside the prediction. The true positive (TP) area is defined as T ∩ G. The false positive (FP) area, predicted but not in the target, is defined as T^c ∩ G. The false negative (FN) area, in the target but not predicted, is defined as T ∩ G^c. The definitions of accuracy, IoU, and BF score are given in Equations (6, 7).
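The equations referenced here are not present in the text. The conventional definitions, written directly in terms of the target area T and the predicted area G, are sketched below; that the authors used exactly these forms is an assumption. P and R denote boundary precision and boundary recall, symbols introduced here for the BF score.

```latex
\begin{align}
\mathrm{Accuracy} &= \frac{|T \cap G|}{|T|} \\
\mathrm{IoU} &= \frac{|T \cap G|}{|T \cup G|} \\
\mathrm{BF\ score} &= \frac{2\,P\,R}{P + R}
\end{align}
```

Accuracy is the fraction of target pixels that are correctly labeled, IoU additionally penalizes predicted pixels that fall outside the target, and the BF score applies the F1 measure to how well the predicted boundary matches the target boundary.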

A diagram for computing the performance metrics for segmentation using FCN.
The global accuracy is the ratio of correctly classified pixels, regardless of class, to the total number of pixels. It gives a quick and computationally inexpensive estimate of the percentage of correctly classified pixels. The mean IoU is computed for the aggregated data set as the average of the IoU scores of all classes over all images. The weighted IoU is the average IoU of the classes, weighted by the number of pixels in each class. Weighting by class size reduces the estimation error caused by classes with few pixels. Furthermore, analysis of variance (ANOVA) was applied to test the statistical significance of differences among the investigated FCNs and parameters.
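The four pixel-based indices described above can all be derived from one confusion matrix. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, every class is assumed to occur at least once in the ground truth, and the boundary-based BF score is left out because it requires contour matching.

```python
def confusion_matrix(truth, pred, n_classes=2):
    """cm[i][j] counts pixels of true class i that were predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t_row, p_row in zip(truth, pred):
        for t, p in zip(t_row, p_row):
            cm[t][p] += 1
    return cm


def segmentation_metrics(cm):
    """Global accuracy, mean accuracy, mean IoU and weighted IoU from cm."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[i][i] for i in range(n)]
    true_px = [sum(cm[i]) for i in range(n)]                       # TP + FN
    pred_px = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # TP + FP
    acc = [tp[i] / true_px[i] for i in range(n)]
    iou = [tp[i] / (true_px[i] + pred_px[i] - tp[i]) for i in range(n)]
    return {
        "global_accuracy": sum(tp) / total,
        "mean_accuracy": sum(acc) / n,
        "mean_iou": sum(iou) / n,
        # each class IoU is weighted by its share of ground-truth pixels
        "weighted_iou": sum(iou[i] * true_px[i] for i in range(n)) / total,
    }


if __name__ == "__main__":
    truth = [[0, 0, 1, 1], [0, 0, 1, 1]]   # 1 = tumor, 0 = background
    pred = [[0, 0, 0, 1], [0, 0, 1, 1]]    # one tumor pixel is missed
    print(segmentation_metrics(confusion_matrix(truth, pred)))
```

To score a whole test set, the confusion matrices of the individual images can be summed element-wise before calling `segmentation_metrics`. When tumors occupy only a tiny share of the pixels, global accuracy and weighted IoU are dominated by the background class, which is why they can be close to 1 even when mean IoU is noticeably lower.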
According to the validation results, the CVs of Global Accuracy, Mean Accuracy, Mean IoU, Weighted IoU, and Mean BFScore generated by mobilenetv2 are the highest among the FCNs in this study (Table 4). Mobilenetv2 has a medium number of layers (157) and the smallest parameter size (13 MB) among the investigated FCNs. Both the Mean Accuracy and the Mean IoU provided by mobilenetv2 and Xception are over 0.8. For mobilenetv2, both performance indices, Mean Accuracy and Mean IoU, are the best among all FCNs. Mobilenetv2 is therefore a promising FCN for segmentation of CT images in this study. The Global Accuracy, Mean Accuracy, Mean IoU, and Weighted IoU generated by mobilenetv2 were the maximum among all FCNs (P < 0.05). The coefficient of variation (CV) was defined as Mean/Std; under this definition, the lower the CV, the greater the dispersion around the mean. All of the performance indices generated by mobilenetv2 were higher than those of the other FCNs in this study.
The segmented performance among used FCN models
* denotes a significant difference among all FCNs, with the maximum mean value, by ANOVA testing. The coefficient of variation (CV) was defined as Mean/Std; under this definition, the lower the CV, the greater the dispersion around the mean.
The investigated parameters include the batch and epoch sizes for the FCNs used. To rank the performance of the FCNs for segmentation of CT tumors, the standard deviation (Std) across Global Accuracy, Mean Accuracy, Mean IoU, Weighted IoU, and Mean BFScore is used to evaluate each FCN. An FCN is considered to segment CT images with high performance when its Std is smaller than 0.04. A radar plot is used to visualize the performance of the selected FCNs. In addition, the frequency with which each FCN is selected is used to gauge its segmentation potential. Table 5 shows the selected FCNs with Std smaller than 0.04, where Fun is the optimizer, FCN is the investigated model, G.Accuracy is the global accuracy, M.Accuracy is the mean accuracy, M.IoU is the mean IoU, W.IoU is the weighted IoU, and M.BFScore is the mean BF score.
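The Std screening rule can be sketched as follows. The text does not say whether the sample or the population standard deviation was used; the sample form (`statistics.stdev`) is assumed here, and the function names are hypothetical. The first example uses the five index values reported in the abstract for ResNet50 with SGDM, batch size 12, and epoch 9; the second set of values is invented purely to show a rejected case.

```python
from statistics import stdev


def index_std(scores):
    """Sample standard deviation across the five performance indices."""
    return stdev(scores)


def is_high_performance(scores, threshold=0.04):
    """Screening rule from the text: Std across the indices below threshold."""
    return index_std(scores) < threshold


if __name__ == "__main__":
    # Global Acc., Mean Acc., Mean IoU, Weighted IoU, Mean BF Score (abstract)
    resnet50_sgdm = [0.999, 0.969, 0.954, 0.998, 0.962]
    # Invented values with a wide spread, for contrast only
    uneven = [0.99, 0.80, 0.70, 0.98, 0.85]
    print(round(index_std(resnet50_sgdm), 4), is_high_performance(resnet50_sgdm))
    print(round(index_std(uneven), 4), is_high_performance(uneven))
```

A small Std means the five indices agree with one another, so the rule favors models whose boundary quality and overlap scores are as good as their pixel accuracy.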
The selected top-20 FCNs according to the value of mean IoU. G.Accuracy is global accuracy. M.Accuracy is mean accuracy. M.IoU is mean IoU. W.IoU is weighted IoU. M.BFScore is mean BF score. STD is standard deviation
A total of twenty FCNs were selected, with the SGDM optimizer, batch sizes of 3 to 12, and epoch sizes of 3 to 9, under Std no more than 0.04. The frequencies of inceptionresnetv2, resnet50, mobilenetv2, resnet18, and Xception are 6, 5, 4, 4, and 1, respectively. The radar plot displays the performance of the selected FCNs (Fig. 5). Accordingly, inceptionresnetv2 and resnet50 are the FCNs with high potential for segmenting tumors on CT images in this study.

The relationship between the true and estimated pixel counts of tumors, demonstrated by a scatterplot and linear regression.
The presented methods are compared with related works in Table 6. Most published works developed deep learning segmentation methods on open datasets, including SYSU-CT, subCT, MICCAI, 3Dircadb, LITS, and ISBI. The sample sizes are counted on a per-patient basis (i.e., the number of patients). In previous studies, Gao et al. [13], Kim et al. [23], Luan et al. [24], He et al. [25], Zhang et al. [26], Kushnure et al. [27], Sahli et al. [28], Ayalew et al. [29], Jin et al. [30], and Chen et al. [31] reported methods for CT liver tumor segmentation. The Dice score (DSC) and IoU are related as described by Eelbode et al. [32]. The DSC ranged from 0.3 to 0.96, and the IoU ranged from 0.59 to 0.87 on average. The presented results are similar to those of the related published works.
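The relation between DSC and IoU used to fill in missing values can be written as DSC = 2·IoU / (1 + IoU), or equivalently IoU = DSC / (2 − DSC). The sketch below implements both directions; the function names are illustrative. The identity is exact for a single segmentation, so applying it to averaged scores, as is needed when comparing published means, gives only an approximation.

```python
def dice_from_iou(iou):
    """Convert an IoU value in [0, 1] to the equivalent Dice score."""
    return 2.0 * iou / (1.0 + iou)


def iou_from_dice(dice):
    """Convert a Dice score in [0, 1] to the equivalent IoU value."""
    return dice / (2.0 - dice)


if __name__ == "__main__":
    for iou in (0.59, 0.87):   # the IoU range quoted for related works
        print(iou, round(dice_from_iou(iou), 3))
```

Because the mapping is monotonic, ranking methods by DSC or by IoU gives the same order; only the numerical scale differs, with DSC always at least as large as IoU.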
The presented results compared with published works. The relationship between DSC and IoU demonstrated by Eelbode et al. (2020) [32] was used to estimate missing DSC or IoU values in the table
The segmentation performance of the presented FCN methods is related to the number of tumors in the CT image (Table 7). The global accuracy for images with 1–5 tumors is higher than that for images with more than five tumors. The mean accuracy for images with more than five tumors is higher than that for images with 1–5 tumors. The weighted IoU and mean BF score for images with 1–5 tumors are higher than those for images with more than five tumors. The Dice score for images with more than five tumors is higher than that for images with 1–5 tumors. The Dice score is evidence that the presented FCN can reliably segment multiple tumors in a single CT image.
The segmentation performance compared across different tumor counts
Meanwhile, the relationship between the true and estimated counts of tumor pixels is shown in Fig. 5. A total of 131 patients were included in this study. The true tumor sizes ranged from 31 to 700000 voxels across the 131 3D liver CT images. As Fig. 5 indicates, tumors smaller than 31 voxels could not be easily segmented by the AI methods in this study. A regression model demonstrated a positive relationship between the true and estimated tumor sizes: the linear regression equation had a slope of 1.6759 and an R2 of 0.8894 (P < 0.05). The estimated numbers of tumor pixels were close to the gold-standard counts in this study.
In Fig. 6, tumors smaller than 31 pixels could not be segmented by the FCN models, so tiny tumors could not be easily segmented by the AI methods in this study. For example, a tiny tumor of 31 pixels spanning 5 slices (white spots in the bottom row) could not be segmented by the presented FCN models.
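The slope and R2 reported above come from an ordinary least-squares fit of estimated tumor size against true tumor size. A minimal sketch of such a fit is given below; the function name and the numbers in the example are hypothetical and are not the study data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical per-patient voxel counts, for illustration only
    true_voxels = [100, 500, 1000, 5000, 20000]
    estimated_voxels = [150, 700, 1500, 8000, 33000]
    print(fit_line(true_voxels, estimated_voxels))
```

A slope of 1 with an intercept near 0 would mean the estimated sizes match the true sizes on average; R2 measures how tightly the points follow the fitted line, not whether the line has that ideal slope.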

The case of a tiny liver tumor in this study (white spots in the bottom row). Its volume was 31 voxels, and it could not be segmented by the FCN models.
The top-20 selected FCN models were the candidate segmentation approaches, with Mean Accuracy no less than 0.944. Inceptionresnetv2, mobilenetv2, resnet18, resnet50, and Xception occurred 9, 6, 2, 5, and 3 times, respectively. Therefore, InceptionresNetv2 performs better than the others. Inceptionresnetv2, mobilenetv2, resnet18, resnet50, and Xception can all potentially perform tumor segmentation on CT liver imaging with accuracy exceeding 90%.
This study provides an automated segmentation model for liver tumors based on FCN (or U-Net). The experiments point to several aspects that can be improved in the future. First, image standardization is essential and difficult in medical imaging and in deep learning, because patients, instruments, and imaging parameters all differ. Correctly standardized images could improve the accuracy of neural networks. Second, small tumors are not easy to segment; their characteristics are challenging to learn, and specific structures or parameters are required to improve the probability of segmenting them. Third, medical images and labeled images are difficult to obtain, which limits the number of training samples. This study segmented liver tumors only in 2D. In the future, we could consider segmenting different organs or using a 3D network architecture to segment 3D images and directly obtain 3D segmentation maps.
In this study, the FCN was trained using the full intensity range of the images without truncation. The FCN has proved to be a useful approach for segmenting liver tumors on CT. The tumors closely resemble one another. Nevertheless, tiny tumor spots could hardly be segmented by the FCN in this study. Therefore, the probability of successfully segmenting a spot depends on the tumor's size in pixels relative to the input image.
Meanwhile, the images come from an open dataset and were used in PNG format, and their contrast varies. The more heterogeneous the images used in FCN, the more difficult it is to obtain a useful model. The original format of medical images is DICOM. Segmentation performance on DICOM-format images remains to be studied in the near future.
Author Contributions: Conceptualization, T.-B.C., C.-I.C., and A.M.; methodology, C.-I.C.; software, T.-B.C., N.-H.L. and S.-Y.H.; validation, K.-Y.L., T.-B.C. and N.-H.L.; formal analysis, C.-I.C. and Y.-H.H.; investigation, Y.-M.W. and Y.-H.H.; data curation, N.-H.L.; writing—original draft preparation, C.-I.C.; writing—review and editing, T.-B.C. and Y.-M.W.; project administration, T.-B.C.; funding acquisition, T.-B.C. All authors have read and agreed to the published version of the manuscript
Funding
This research received no external funding.
Institutional review board statement
Not applicable.
Informed consent statement
Not applicable.
Data availability statement
The liver CT was acquired from “LiTS - Liver Tumor Segmentation Challenge” at URL: https://competitions.codalab.org/competitions/17094.
Acknowledgments
The authors would like to thank the Ministry of Science and Technology in Taiwan for partially supporting this research financially under Contract MOST110-2118-M-214-001. The authors would also like to acknowledge AJE for English editorial assistance.
Conflicts of interest
The authors declare no conflict of interest.
