TDS-U-Net: Automatic liver and tumor separate segmentation of CT volumes using attention gates 1

Abstract

Considering the heterogeneity, diffusive shape, and complex background of tumors, automatic segmentation of hepatic lesions in computed tomography (CT) images has been considered a challenging task. The performance of existing methods remains subject to segmentation uncertainties, especially in tumor boundary regions. The pixel information in these regions will be affected by both sides, thereby exposing the segmentation results to missing marks. To this end, a new network architecture named Two Direction Segmentation U-Net (TDS-U-Net) is hereby designed based on the classic Attention U-Net to tackle this problem. As the most important blocks of the Attention U-Net network, attention gates (AGs) focus on the target structures of different shapes and sizes. In the last layer of TDS-U-Net, two dichotomous convolutional networks are applied to obtain the segmentation maps of the liver and the tumor respectively. Superimposing two segmented maps to obtain the final image addresses the above problems. The entire structure has been verified on two widely accepted public CT datasets, LiTS17 and KiTS19. Compared with the state of the art, this method exhibits superior performance and excellent shape extractions with high detection sensitivity, perfectly demonstrating its effectiveness in medical image segmentation.

Keywords

Attention gates CT deep learning liver tumor segmentation kidney tumor segmentation U-Net

1 Introduction

Liver cancer, known as one of the most common cancers with a high mortality rate in the modern world, has caused a serious threat to the health and lives of contemporary humans. Generally, computed tomography (CT) is applied to tumor diagnosis and treatment. Manual liver lesion segmentation on CT is time-consuming, difficult to reproduce, and easy to be influenced by personal subjective experiences. In this case, a fully automatic method for liver lesion segmentation should be urgently developed. Although the liver segmenting task has achieved favorable results due to CNN, localization of liver tumors remains a demanding task to be further improved. One of the main issues in the segmentation task lies in the low contrast between liver and tumor tissues, high variation in size, shape, and number of lesions, and the similarity of nearby organs. Besides, the imaging noise makes it difficult to segment the lesions, thus making automatic liver lesion segmentation a challenging task.

To address the above-mentioned issues, numerous methods based on the deep learning technology have been proposed for the liver lesion segmentation task in recent years, which can be generally divided into two categories:

1) 2D models, e.g., Cascaded-FCN [1] train and cascade two FCNs for a combined segmentation of the liver and its lesions, FCN based on VGG-16 [2] achieved semantic level segmentation of the image by combining the deep abstract feature and shallow feature in depth by building the duplicate channel;

2) 3D models, e.g., densely connected volumetric ConvNets [3] proposed a novel 3D convolutional neural network with densely-connected layers to automatically segment the prostate from Magnetic Resonance(MR) images, which consisted of a 2D DenseUNet for efficiently extracting intra-slice features and a 3D counterpart for hierarchically aggregating volumetric contexts (H-DenseUNet) [4], a 3D patch-based fully dense and fully convolutional network (FD-FCN) [5]. The previous methods have exhibited much-improved performance, but still have considerable potential for further optimization. For example, previous 2D networks fail to fully utilize the depth information of the CT images, and the segmentation accuracy is quite low. Meanwhile, 3D networks are computationally expensive and have high requirements for hardware configurations. Under the same computing resources, large amounts of parameters and heavy computation jointly limit the depth and complexity of the 3D networks.

The traditional U-Net based method has achieved great success in medical image classification, but is still subject to certain limitations: firstly, the feature map of U-Net convolution lacks a targeted refining process of feature information; secondly, most CT files in the dataset present tumors, but the pixel area of the tumor is much smaller than that of the liver, thereby leading to a smaller number of tumor categories than liver categories. This results in a category imbalance that could easily lead to errors in small liver regions, discontinuous liver regions, and blurred liver boundaries; thirdly, the process of gradual down-sampling of the U-Net network may reduce the resolution of the image, thus decreasing the segmentation accuracy; lastly, the deepening of the network depth can easily induce degradation problems.

Considering the aforementioned facts, a Two Direction Segmentation U-Net (TDS-U-Net) based 2D model was hereby proposed for image segmentation, and the main contributions of this work are listed as follows:

1. A novel feature fusion method is proposed, which has four feature fusion channels to fuse the four-layer features of the decoder based on the original encoder-decoder layer, and thus obtain the low-level features;

2. The transition layer and the final output layer of the decoder are replaced by Atrous Spatial Pyramid Pooling (ASPP) to extract richer multi-scale feature information;

3. One new method is proposed to separately segment the liver and tumor.

2 Related work

In 2017, Christ et al. [1] trained two cascaded U-Net type FCNs: the first FCN segmented a liver from the rest of the inner body tissues, while the second FCN segmented lesions from the ROIs (regions-of-interest) predicated from the first FCN. Indeed, there is a remarkable overlap between liver and lesion intensities, which leads to low overall contrast. Paper [1] proved that the intensity of liver and lesion significantly overlap in the data set, thus resulting in the low overall contrast and segmenting difficulties. To successively segment the liver and the tumor within the liver region, two segmentation networks should be designed.

In 2018, Li et al. [4] developed a hybrid densely connected UNet (H-DenseUNet) that worked in an end-to-end manner, which consisted of a 2D DenseUNet and a 3D counterpart. In H-DenseUNet, the intra-slice representations and inter-slice features could be optimized jointly through a hybrid feature fusion (HFF) layer for accurate liver and lesion segmentation. It was trained on the MICCAI 2017 Liver Tumor Segmentation (LiTS) dataset [6], validated on the 3DIRCADb dataset, and achieved a liver DICE = 98.2% and a tumor DICE = 93.7%. Notably, exclusive experiments were also conducted on the 3DIRCADb dataset through cross-validation, and achieved a liver segmentation DICE = 94.7% and a tumor segmentation DICE = 65%.

Siriapisith et al. [7] adopted the concept of variable neighborhood and proposed a 2D segmentation method that could be searched by iteratively alternating the search through intensity and gradient spaces. They claimed that they had attained the segmentation performance with a DSC of 84.48±5.84% and 76.93±8.24% for large and small liver tumors, respectively.

Zettler et al. [8] presented a study mainly focusing on comparing 2D U-Nets and 3D U-Net counterparts, and the initial results indicated a Dice improvement of about 6% at most. Unexpectedly, liver and kidney problems, for instance, were significantly better tackled by applying the faster and GPU-memory saving 2D U-Nets. Besides, there were no remarkable differences in other abdominal key organs, but 2D U-Net was observed to have a remarkable advantage in GPU computation for all the studied organs. To conclude, the improved network based on 2D U-net is endowed with more advantages.

Xi et al. [9] proposed an integrated Yolov5-Deeplabv3 + real-time segmentation network (YDRSNet) for gear pitting measurement. This very two-stage network was constructed by using Yolov5 and an improved Deeplabv3+, and could be applied to processing the video samples in real time and overcoming the problem of sample imbalance.

Qin et al. [10] proposed an efficient architecture by distilling knowledge from well-trained medical image segmentation networks to train another lightweight network. A novel distillation module tailored for medical image segmentation was further devised to transfer semantic region information from the teacher to the student network. The lightweight network was improved up to 32.6% in the present experiment while maintaining its portability in the inference phase. The entire structure was verified on two widely accepted public CT datasets of LiTS17 and KiTS19.

Generally, the attention gates (AGs) are commonly applied to the language processing (NLP) for image captioning [11]. In the image analysis, trainable attention is categorized into hard attention and soft attention via feature map sampling. Soft attention is parameterized and learns weights depending on the relationship between features. Hence, it is derivable and could be embedded into the model for direct training. Gradients could be propagated back to other parts of the model via the attention mechanism module. In [12], additive soft attention is applied to sentence-to-sentence translation, while channel-wise attention is applied to channel highlighting in [13]. In [14], the author proposed SENet, the top-performer in the ILSVRC 2017 image classification challenge, which could be regarded as the attention mechanism of the channel dimension.

Besides, hard attention mainly conducts the crop random process in local feature areas, and samples a portion of the hidden state of the inputs based on the probability rather than that of the entire encoder. To realize the backpropagation of gradients, the monte Carlo sampling method was applied to the estimation of the gradient of modules.

3 Methodology

The hereby proposed TDS-U-Net network consists of an encoder responsible for feature extraction and a decoder for feature location, which constitute the symmetric structure (Fig. 2).

The whole architecture consists of 8 residual blocks, 4 pooling layers, 4 attention gates(AGs), 1 atrous spatial pyramid pooling (ASPP) block, 4 upsampling blocks, 1 deep network composite blocks, and 2 classification convolution blocks. The convolution kernel size is 3×3, the pooling layer size is 2×2, and that of the input image is 512×512×1. After a series of operations like convolution, feature extraction, and pooling, two 512×512 binary segmentation images are obtained for each feature image, i.e., the segmentation image of the liver and that of the tumor.

Fig. 1

Uneven edge segmentation at the intersection of the liver and tumor segmentation.

Fig. 2

The encoder part increases the number of filters by 1×1 convolution, performs 3×3 convolution, restores the number of filters by 1×1 convolution, and reduces the feature size by max-pooling. The decoder part combines the upsampling with the 3x3 convolution.

In the encoder convolution unit, ordinary units are replaced by block structures, and batch normalization and ReLU activation operations are performed after each convolution. The introduction of batch normalization not only reduces the sensitivity of the model to initialization parameters, but also imposes a regularization effect to a certain extent. In the ReLU function, its ability to effectively avert gradient disappearance problems makes it extensively applied to the activation. To better obtain information from the feature mapping of the encoder, AGs are added to the decoder to adaptively extract image features.

Thus, the network could focus on specific segmentation tasks by virtue of the channel attention mechanism (see Fig. 2). The specific operations of the AGs module are as follows. During the upsampling process of the decoder, even if the features of the fusion encoder cannot make the compensation, some features will be cut by the maximum pooling module. The extraction module is dense convolution, i.e., a combination of convolution and upsampling that aims to change the channel and size of features. The features of the four directions are combined with concatenation, and the final feature size is 512×512×1.

The network could be formulated as follows: $xi = σ_{0} (W_{o}^{T} g_{i} + b_{o}), (i = 1, 2, 3, 4)$ $σ_{0} (x_{i}) = \max (0, x_{i}) x = concat (x 1, x 2, x 3, x 4)$ $y 1 = σ_{1} (W_{i}^{T} x + b_{i})$ $y 2 = σ_{1} (W_{j}^{T} x + b_{j})$ $σ_{1} (x_{i}, c) = \frac{\exp (x_{i}, c)}{\sum_{i} \exp (x_{i}, c)}$ where xi (i = 1, 2, 3, 4) denotes the output of the four convolution layers in the upsampling channel; x, a combination with the concatenation of the outputs of these four layers (xi); y1, the segmentation of the liver; y2, the segmentation of the tumor; and y1 and y2, the segmentation results of the liver and tumor, respectively, whereas y1 contains the area of y2. The final results could be obtained by preprocessing the two segmentation graphs, as presented in Fig. 2.

Fig. 3

The structure of Convolution Block, Dense Convolution, and Upsampling Convolution.

In Fig. 2, after concatenation, the two feature maps are the segmentation of the liver and tumor. In both images, there are only two colors of pixels, the white is the result of the segmentation, and the black is the background. Take the white pixel position in the tumor segmentation image and set it to gray in the liver segmentation image to obtain the final image. Gray is the tumor, white is the liver, and black is the background.

Furthermore, to alleviate the problem of resolution degradation caused by multiple down-sampling, ASPP [15] is utilized as the transition layer of the network, as presented in Fig. 4. The ASPP module could capture the contextual information of the image at multiple scales, which promotes the inclusion of multi-scale semantic information in the extracted feature map. Similarly, the ASPP module is introduced to the output of the decoder part, thereby increasing the accuracy together with the transition layer.

Fig. 4

The structure of ASPP.

In [16], the author presented a new Dual Gradient-Color U-Net (DGCU–Net) model that contains two encoders and a single decoder paths. Additionally, the atrous spatial pyramid pooling (ASPP) block is utilized in the bridge stage between the encoder and decoder sub-networks. The proposed network architecture significantly improves the skin lesion segmentation results of the baseline U-Net model, and the ASPP module has a significant effect in the field of image segmentation.

3.1 Attention Gates

A novel attentional gates (AGs) model was proposed for medical imaging in the paper [17], which automatically learns to focus on target structures of different shapes and sizes. The location information of the model features its shallowness, and the model can learn the characteristics of implicit learning with the AG training model in the inhibition of the input image not related to the area, while highlighting remarkable characteristics used for specific tasks. Similar low-level features will not keep repeating all models in the cascade extraction, thus reducing computational resources and model parameters. The AG automatic learning focuses on the target structure without additional supervision.

The attention coefficients, $a_{i}^{l} \in$ [0, 1], identify salient image regions and prune feature responses to preserve only activations relevant to the specific task, as presented in Fig. 5. The output of AGs is the element-wise multiplication of input feature-maps and attention coefficients, i.e., ${\hat{x}}_{i}^{l} = x_{i}^{l} \cdot a_{i}^{l}$ . In a default setting, a single scalar attention value is computed for each pixel vector $x_{i}^{l} \in R^{Fl}$ , where R^Fl corresponds to the number of feature-maps in Layer l. In case of multiple semantic classes, it was proposed to learn multi-dimensional attention coefficients. Thus, each AG is recommended to learn to focus on a subset of target structures. As presented in Fig. 5, a gating vector g_i ∈ R^Fg is applied to each pixel i to determine the focus regions. The gating vector contains contextual information adopted to prune lower-level feature responses. Herein, additive attention was used to obtain the gating coefficient, which is formulated as follows:

Fig. 5

The structure of AGs.

$q_{att}^{l} = W_{φ}^{T} (σ_{1} (W_{x}^{T} x_{i}^{l} + W_{g}^{T} g_{i} + b_{g})) + b_{φ}$ $a_{i}^{l} = σ_{2} (q_{att}^{l} (x_{i}^{l}, g_{i}; θ_{att}))$ ${\hat{x}}_{i}^{l} = x_{i}^{l} \cdot a_{i}^{l}$ $σ_{1} (x_{i}) = \max (0, x_{i})$ $σ_{2} (x_{i}, c) = \frac{1}{1 + \exp (- x_{i}, c)}$

AG is characterized by a set of parameters θ_att including linear transformations Wx ∈ R^Fl*Fint, Wg ∈ R^Fg*Fint, Wg ∈ R^Fint*1, and bias terms b_φ ∈ R and b_g ∈ R^Fint. The linear transformations were computed by applying channel-wise 1×1×1 convolutions to the input tensors, and a sigmoid activation function was adopted. The experimental results indicate better training convergence for the AG parameters.

3.2 Loss functions

In [18], all models were found to be subject to a tendency of missing small targets when trained with BCE or Focal Loss. Moreover, the developed losses handle not the imbalance between lesion sizes but that between classes. Herein, a simple methodology was proposed to reweight loss functions in the way that all targets contribute equally, that is, small targets have greater weights. The Universal Loss and Dice Loss formulas are as follows: $DiceLoss = 1 - \frac{2 \sum_{i} p_{i} y_{i}}{\sum_{i} (p_{i}^{2} + y_{i}^{2})}$ $Wi = \frac{\sum_{k = 0}^{K} | L_{k} |}{(K + 1) \cdot | L_{j} |}$ $UniversalLoss = 1 - \frac{2 \sum_{i} {Wip}_{i} y_{i}}{\sum_{i} (p_{i}^{2} + y_{i}^{2})}$

At the training stage, a tensor of weights was formed for each incoming patch by splitting the corresponding ground-truth patch into K + 1 connected componentsL₀,..., L_k, among which,L₀ denotes the non-lesion component (background), and K represents the number of lesions in the current patch. Afterward, a weight inversely proportional to the volume of the component was assigned to each component.

In the formula, Wi denotes the weight assigned to each voxel inside the corresponding component L_j. The constant in the denominator ensures that the sum of the weights equals that of the unit tensors of the same size (see derivation details in Supplementary Materials). This method was named inverse weighting (iW).

3.3 Dataset information

The publicly available dataset of the MICCAI 2017 LiTS [6] Challenge was applied to the training of the hereby proposed method. The LiTS dataset contains 131 and 70 contrast-enhanced 3D abdominal CT scans for training and testing, respectively. This method was tested on the 70 LiTS challenge test cases. The dataset was acquired from numerous different clinical sites with different scanners and protocols. Each slice in all CT scans has a fixed size of 512×512 pixels with the number of slices per scan varying between 42 and 1026, whereas the image resolutions differ from scan to scan.

LiTS Liver is an open-source dataset initially developed for liver lesion segmentation of CT scans, and a total of 131 CT images were hereby split into three sets of 70/30/31.

The publicly accessible KiTS19 [19] dataset embraces 210 intact abdominal CT scans labeled with manual segmentation masks of kidney and kidney tumor. There are no pre-operative arterial phase data, and the slice thickness is from 1 mm to 5 mm. The image resolution ranges from 0.4 mm to 1.0 mm in axial, and the longitudinal fields of view range from 20 to 140. The volume of most tumors varies between 9.6 cm³ to 109.7 cm³. Organizers emphasize that every patient selected into this dataset has one or more kidney tumors.

4 Experiments and results

4.1 Results

Figure 6 presents the effect picture of the present network in the LiTS dataset. The proposed model exhibits a relatively high robustness while dealing with difficult segmentation problems, e.g., small liver regions, discontinuous liver areas, and blurred liver boundaries.

Fig. 6

The remarkable effect of the proposed network segmentation, which separates the liver from the tumor and then fuses it. (a) Data; (b) Label; (c) The liver cut from the network; (d) The tumor cut from the network; (e) Results.

Table 1 depicts the quantitative comparison results of the four models. It can be observed from Table 1 that the hereby proposed model is provided with significant advantages in the comparisons on three evaluation metrics, especially for the Dice. This very method obtains the highest accuracy of 97.21±0.55. Compared with SAR-U-Net [20] and Attention U-Net [17], the Dice of the proposed model has been greatly improved.

Table 1

The results of the proposed networks on LiTS17

Approach	DICE[% ]	Type
H-DenseUNet	96.2±1.1	Liver
	93.7±2.2	Tumor
Attention U-Net	94.7±0.5	Liver
	–	Tumor
SAR-U-Net	95.7±0.5	Liver
	–	Tumor
TDS-U-Net(our)	97.2 ± 0.5	Liver
	95.1±1.5	Tumor

Figure 7 presents the effect picture of this network in the KiTS19 dataset. Due to the shape characteristics of the liver organ, the label of the liver data is rather complete, and the pixel areas of the liver are connected together. The kidney is shaped like a bow. There is always a label with a space in the middle that does not belong to any organ. It is difficult for the network to segment such a shape, and the real area is not of this shape, which belongs to the label error information.

Fig. 7

(a) Data; (b) Label; (c) The kidney cut from the network; (d) The tumor cut from the network; (e) Results.

Table 2 shows the quantitative comparison results of the three models, where it can be observed that in the KiTS19 dataset, tumor segmentation is not as favorable as kidney segmentation regardless of the network. This is determined by the characteristics of the dataset. The labeling accuracy of the label in the KiTS19 is lower than that in the LiTS17.

Table 2

The results of this networks in the KiTS19 dataset

Approach	DICE[% ]	Type
UNet++	94.3±0.2	Kidney
	64.4±0.7	Tumor
ENet+PSPNet	96.7±0.1	Kidney
	59.9±0.9	Tumor
TDS-U-Net(our)	94.2±0.5	Kidney
	67.1 ± 0.2	Tumor

5 Discussion and conclusions

This paper presents a new TDS-U-Net network for CT liver segmentation. Based on the U-Net framework, an attention mechanism was added, and the branch network was applied to the final output step to separate the liver and the tumor and then fuse them. First, the AGS-based attention mechanism was adopted to adaptively extract image features and suppress irrelevant areas. Hence, the network focused on relevant features of the liver segmentation task. Afterward, ASPP technology was utilized as the transition layer and output layer of the network to realize multi-scale extraction of feature images. Compared with related classic methods, this method exhibits superior performance on the quantitative metrics. Specifically, the proposed method presents a remarkable improvement and robustness in handling small liver regions, discontinuous liver regions, and blurred liver boundaries.

References

Christ

P.F.

, Ettlinger

, Grün

et al.,Automatic liver and tumor segmentation of CT and MRI volumes using cascaded fully convolutional neural networks[J], arXiv preprint arXiv:1702.05970, 2017.

Z.W.

, Yang

, Lou

T.T.

et al., Extraction of pig contour based on fully convolutional networks[J], Journal of South China Agricultural University 39(6) (2018), 111–119.

Zhu

, Du

, Wu

et al., Adeep learning health data analysis approach: automatic3DprostateMRsegmentation with densely-connected volumetric ConvNets[C]//2018 International Joint Conference on Neural Networks (IJCNN), IEEE (2018), 1–6.

, Chen

, Qi

et al. H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes[J], IEEE Transactions on Medical Imaging 37(12) (2018), 2663–2674.

Yang

, Zhang

Fd-fcn: 3d fully dense and fully convolutional network for semantic segmentation of brain anatomy[J], arXiv preprint arXiv:1907.09194, 2019.

Bilic

, Christ

P.F.

, Vorontsov

et al. The liver tumor segmentation benchmark (lits)[J], arXiv preprint arXiv:1901.04056, 2019.

Siriapisith

, Kusakunniran

, Haddawy

A General Approach to Segmentation in CT Grayscale Images using Variable Neighborhood Search[C]//2018 Digital Image Computing: Techniques and Applications (DICTA), IEEE (2018), 1–7.

Zettler

, Mastmeyer

Comparison of 2D vs. 3D UNet Organ Segmentation in abdominal 3D CT images[J], arXiv preprint arXiv:2107.04062, 2021.

, Qin

, Wang

YDRSNet: An integrated Yolov5-Deeplabv3+ real-time segmentation network for gear pitting measurement[J], Journal of Intelligent Manufacturing (2021), 1–15.

10.

Qin

, Bu

J.J.

, Liu

et al. Efficient medical image segmentation based on knowledge distillation[J], IEEE Transactions on Medical Imaging 40(12) (2021), 3820–3831.

11.

Anderson

, He

, Buehler

et al. Bottom-up and top down attention for image captioning and visual question answering[C]//Proceedings of the IEEE conference on computer vision and pattern recognition (2018), 6077–6086.

12.

Bapna

, Chen

M.X.

, Firat

et al. Training deeper neural machine translation models with transparent attention[J], arXiv preprint arXiv:1808.07561, 2018.

13.

Chen

, Zhang

, Xiao

et al. Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the IEEE conference on computer vision and pattern recognition (2017), 5659–667.

14.

, Shen

, Sun

Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition (2018), 7132–7141.

15.

Chen

L.C.

, Papandreou

, Kokkinos

et al. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs[J], IEEE transactions on pattern analysis and machine intelligence 40(4) (2017), 834–848.

16.

Ramadan

, Aly

DGCU–Net: A new dual gradient-color deep convolutional neural network for efficient skin lesion segmentation[J], Biomedical Signal Processing and Control 77 (2022), 103829.

17.

Oktay

, Schlemper

, Folgoc

L.L.

et al., Attention u-net: Learning where to look for the pancreas[J], arXiv preprint arXiv:1804.03999, 2018.

18.

Shirokikh

, Shevtsov

, Kurmukov

et al. Universal loss reweighting to balance lesion size inequality in 3d medical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham (2020), 523–532.

19.

Heller

, Sathianathen

, Kalapara

et al., The kits19 challenge data: 300 kidney tumor cases with clinical context, ct semantic segmentations, and surgical outcomes[J], arXiv preprint arXiv:1904.00445, 2019.

20.

Zhou

, Siddiquee

M.M.R.

, Tajbakhsh

et al. Unet++: Redesigning skip connections to exploit multiscale features in image segmentation[J], IEEE Transactions on Medical Imaging 39(6) (2019), 1856–1867.

21.

Esteva

, Kuprel

, Novoa

R.A.

et al. Dermatologist-level classification of skin cancer with deep neural networks[J], Nature 542(7639) (2017), 115–118.

22.

Schlemper

, Oktay

, Chen

et al. Attention-gated networks for improving ultrasound scan plane detection[J], arXiv preprint arXiv:1804.05338, 2018.

23.

Zhu

, Du

, Turkbey

et al. Exploiting interslice correlation for MRI prostate image segmentation, from recursive neural networks aspect[J], Complexity, (2018), 2018.

24.

Wang

, Lv

, Wang

et al. SAR-U-Net: Squeeze-and-excitation block and atrous spatial pyramid pooling based residual U-Net for automatic liver segmentation in Computed Tomography[J], Computer Methods and Programs in Biomedicine 208 (2021), 106268.

25.

Ronneberger

, Fischer

, Brox

U-net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, (2015), 234–241.

26.

Christ

P.F.

, Elshaer

MEA

, Ettlinger

et al. Automatic liver and lesion segmentation in CT using cascaded fully 514 convolutional neural networks and 3D conditional random fields[C]//International conference on medical image computing and computer-assisted intervention. Springer, Cham, (2016), 415–423.

27.

Chen

L.C.

, Yang

, Wang

et al. Attention to scale: Scale aware semantic image segmentation[C]//Proceedings of the IEEE conference on computer vision and pattern recognition (2016), 3640–3649.

28.

Huang

, Liu

, Van Der Maaten

et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition (2017), 4700–4708.

29.

Chen

L.C.

, Zhu

, Papandreou

et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European conference on computer vision (ECCV) (2018), 801–818.

30.

Noori

, Bahri

, Mohammadi

Attentionguided version of 2D UNet for automatic brain tumor segmentation[C]//2019 9th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE (2019), 269–275.

31.

Zhang

, Peng

et al. Multi-phase Liver Tumor Segmentation with Spatial Aggregation and Uncertain Region Inpainting[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, (2021), 68𡀓77.