A lightweight segmentation method based on residual U-Net for MR images

Abstract

Automatic segmentation of Magnetic Resonance Imaging (MRI) scans based on the Residual U-Net (ResU-Net) helps radiologists assess a patient’s condition quickly. However, ResU-Net requires a large number of parameters and considerable storage space for the model, which makes it inconvenient to deploy on mobile MRI devices. To solve this problem, Depthwise Separable Convolution and Squeeze-and-Excitation Residual U-Networks (DSRU-Net) is proposed for MRI segmentation. Squeeze-and-Excitation is a channel attention mechanism. The proposed method simplifies the ResU-Net model, making it more convenient to apply on mobile MRI devices. A fuzzy comprehensive evaluation method with three evaluation factors (the number of model parameters, the Dice Similarity Coefficient (DSC), and the Hausdorff Distance (HD)) is used to evaluate the test results of the proposed method on the MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset and the Automatic Cardiac Diagnosis Challenge (ACDC) dataset. The fuzzy comprehensive evaluation values obtained by the proposed method on 5 PROMISE12 samples and 15 ACDC samples are 0.9889 and 0.9652, respectively. Averaged over the two datasets, the proposed method offers the best balance between segmentation accuracy and the number of model parameters.

Keywords

Depthwise separable convolution, channel attention mechanism, residual U-Net, MRI segmentation

1 Introduction

Cancer is a disease that endangers human health [1], so it is particularly important to help radiologists determine the condition of the disease accurately. Radiologists use ultrasound imaging and magnetic resonance imaging to evaluate cancer. Magnetic Resonance Imaging (MRI) is more accurate for soft tissue imaging than ultrasound imaging and Computed Tomography (CT) [2, 3]. However, the high price of MRI equipment limits its widespread use, which prevents many patients from being examined and diagnosed early. Recently, the U.S. Food and Drug Administration (FDA) approved the world’s first mobile MRI equipment. Mobile MRI is less expensive, which helps doctors diagnose faster and find disease earlier [4].

Before diagnosing a patient’s disease, a radiologist first needs to locate the tissue accurately and then analyze and process the tissue area [5–8]. Therefore, segmentation of the tissue area is an important step in diagnosis. However, manual segmentation of MRI images not only requires time and effort but also relies heavily on experience, and the manual segmentations produced by different doctors differ. Researchers have proposed many methods to segment medical images automatically [9, 10]. In 2015, Long et al. [11] proposed Fully Convolutional Networks (FCN), which successfully applied neural networks to image segmentation. Since then, researchers have extensively studied medical image segmentation methods based on deep learning. Ronneberger et al. [12] proposed the U-Net structure for cell image segmentation. Zhang et al. [13] applied convolutional neural networks (CNN) to infant brain tissue segmentation. Milletari et al. [14] proposed the V-Net structure based on U-Net and successfully applied it to MRI segmentation. Yu et al. [15] proposed a three-dimensional neural network architecture with mixed residual connections for MRI segmentation. The architecture uses 3D spatial context information to perform efficient and accurate volume-to-volume prediction, and it uses residual connections to improve the training efficiency and discrimination capability of the network. A U-Net with a residual structure can improve segmentation accuracy, but the number of parameters also increases, which hinders its application on miniaturized mobile MRI devices.

In this paper, Depthwise Separable Convolution and Squeeze-and-Excitation Residual U-Networks (DSRU-Net) is proposed to tackle the above-mentioned challenges. The contributions of this paper can be summarized as follows.

Firstly, depthwise separable convolution replaces the standard convolution in the original U-Net model. Secondly, after the residual structure combines the deep and shallow features, channel attention is used to redistribute the weights of these features, thereby improving the segmentation effect. Thirdly, DSRU-Net uses far fewer parameters than Residual U-Net while keeping the segmentation effect from decreasing.

The performance of the proposed model is evaluated on the dataset of the Prostate MR Image Segmentation challenge hosted by the International Conference on Medical Image Computing and Computer Assisted Intervention 2012 (MICCAI 2012) and on the dataset of the Automatic Cardiac Diagnosis Challenge (ACDC).

This paper is structured as follows. Section 2 introduces the related work of lightweight networks. Section 3 presents the methods employed in this work. Section 4 describes the materials and experimental results and discusses the results obtained. Section 5 concludes this paper.

2 Related work

To simplify the neural network structure, Iandola et al. [23] proposed SqueezeNet for image classification. Badrinarayanan et al. [26] proposed SegNet, which uses a small network structure and skip connections and has been successfully applied to natural image segmentation. Zhang et al. [24] proposed ShuffleNet, which uses two new operations, pointwise group convolution and channel shuffle, to greatly reduce the computational cost while maintaining accuracy. Yu et al. [27] proposed BiSeNet, which includes two paths: a Spatial Path and a Context Path. The Spatial Path is designed to preserve the spatial information of the original image. The Context Path uses a lightweight model and global average pooling to quickly obtain a large receptive field. Chollet proposed replacing the traditional convolution with depthwise separable convolution (DWSC) [16]. Howard et al. [22] proposed MobileNets, which are based on a streamlined architecture that uses DWSC to build lightweight deep neural networks. Chen et al. [19] proposed a spatial pyramid pooling module based on DWSC and applied it to semantic image segmentation. Qi et al. [20] proposed X-Net based on DWSC, which includes a feature similarity module designed to capture long-range dependencies for better brain stroke lesion segmentation. Wang et al. [21] proposed ADSCNet based on DWSC, a lightweight neural network for real-time semantic segmentation. DWSC is divided into a depthwise convolution part and a pointwise convolution part. Using DWSC instead of standard convolution can reduce the number of parameters in the network, but the segmentation accuracy of the network is reduced. In contrast, the proposed model maintains the segmentation effect while reducing the number of parameters.

3 Methods

ResU-Net uses standard convolution, which requires many parameters and is therefore not well suited to mobile MRI devices. DSRU-Net is proposed to tackle this challenge. DSRU-Net is based on U-Net, as shown in Fig. 1. In this section, the framework of DSRU-Net is introduced first, and then its components, the depthwise separable convolution and the channel attention mechanism, are described in detail.

Fig. 1

The architecture of the proposed DSRU-Net. L represents the depth of the encoder and decoder.

3.1 DSRU-Net

The DSRU-Net structure is mainly divided into three parts: down-sampling, up-sampling, and skip connection. The network can be analyzed as a left part and a right part. The left part is the down-sampling process, namely the encoder, which is implemented through DWSC, SENet and pooling. The image size is reduced after the encoder, and the encoder extracts shallow features. The right part is the up-sampling process, namely the decoder, which is implemented through DWSC, SENet and deconvolution. The image size is increased after the decoder, and deep features are obtained through convolution and up-sampling. In the middle, concatenation is used to combine the feature maps obtained in the encoding stage with those obtained in the decoding stage. The purpose is to combine deep and shallow features to refine the image.

SENet is placed at the tail of the residual connection, where it fuses deep and shallow features and redistributes weights among the feature maps. The depthwise and pointwise convolutions use “same” padding so that the image size is unchanged after convolution, and they use ReLU activation to speed up training. The combination of DWSC and SENet can be expressed by the following formula. $u_{c} = v_{c} * X,$ (1) where * denotes convolution, X = [x₁, x₂,..., x_C] is the input feature map of the DWSC, v_c ∈ V is a 2D convolution kernel, V = [v₁, v₂,..., v_C] is the set of convolution kernels of the DWSC, and u_c ∈ U, with U = [u₁, u₂,..., u_C], is the output feature map of the DWSC and the input feature map of SENet.

3.2 Depthwise separable convolution

DWSC was first applied in deep learning by Chollet [16]. It differs from the standard convolution operation. A standard convolution applies one filter across all channels of the input: as the kernel slides over the image, it computes the weighted sum of the pixels it covers over all input channels. This means that no matter how many input channels there are, each filter produces only one output channel, as shown in Fig. 2. The number of parameters required for a standard convolution is $N \times C \times D_{k} \times D_{k},$ (2)

Fig. 2

Standard convolution. C is the number of input channels and convolution kernels of a filter, D_F is the size of the input channels, N is the number of filters and output feature maps, and D_k is the size of the convolution kernel.

where N is the number of convolution kernels, C is the number of input channels, and D_k is the size of the convolution kernel. DWSC is performed in two steps. The first step uses as many convolution kernels as there are input channels and convolves each input channel separately, so that after one convolution the output layer has the same number of feature maps as the input layer. This step is called depthwise convolution, as shown in Fig. 3. Depthwise convolution cannot expand the number of feature maps. Moreover, because it convolves each channel of the input layer independently, it does not make effective use of the feature information of different channels at the same spatial position. Therefore, pointwise convolution is needed to combine these feature maps into new feature maps.
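The first step can be illustrated with a minimal NumPy sketch. It is not the paper's Keras implementation: the function name, the loop-based formulation, and the zero "same" padding are assumptions made for illustration, and, as in most deep learning frameworks, the kernel is applied without flipping.

```python
import numpy as np

def depthwise_conv2d_same(x, kernels):
    """Filter each input channel with its own 2D kernel (zero 'same' padding).

    x:       array of shape (C, H, W), one feature map per input channel
    kernels: array of shape (C, k, k), one kernel per channel, k odd
    returns: array of shape (C, H, W), i.e. as many maps as input channels
    """
    c, h, w = x.shape
    kc, k, k2 = kernels.shape
    assert kc == c and k == k2 and k % 2 == 1
    pad = k // 2
    # Zero-pad only the two spatial axes so the output keeps size H x W.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w), dtype=float)
    for ch in range(c):            # channels never mix in this step
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * kernels[ch])
    return out

# Hypothetical toy input: 2 channels of size 3 x 3, all-ones 3 x 3 kernels.
x = np.arange(18, dtype=float).reshape(2, 3, 3)
out = depthwise_conv2d_same(x, np.ones((2, 3, 3)))
print(out.shape)  # (2, 3, 3): the number of feature maps is unchanged
```

Because each output map depends on a single input channel, a second step is needed to mix information across channels.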

Fig. 3

Depthwise convolution. C is the number of input channels, filters and output feature maps, D_F is the size of the input channels, and D_k is the size of the convolution kernel.

The second step is the pointwise convolution, a 1×1 convolutional layer that weights the input maps in the depth direction to generate new feature maps, as shown in Fig. 4. The number of output feature maps is the same as the number of filters. The number of parameters required for DWSC is $C \times D_{k} \times D_{k} + N \times C \times 1 \times 1,$ (3)

Fig. 4

Pointwise convolution. C is the number of input channels and convolution kernels of a filter, N is the number of filters and output feature maps.

where C is the number of input channels and output feature maps of the depthwise convolution, D_k is the size of the convolution kernel of the depthwise convolution, and N is the number of filters of the pointwise convolution. The ratio of the parameters required by DWSC to those required by the traditional convolution is 1/N + 1/D_k². Using DWSC can therefore greatly reduce the number of parameters.
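The parameter counts and the stated ratio can be checked with a few lines of Python. This is a sketch under stated assumptions: the depthwise step is taken to hold one D_k × D_k kernel per input channel, as the definitions above describe, bias terms are ignored, and the layer sizes (C = 64, N = 128, D_k = 3) are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def standard_conv_params(n_filters, in_channels, kernel_size):
    """Weights of a standard convolution: N x C x D_k x D_k."""
    return n_filters * in_channels * kernel_size * kernel_size

def dwsc_params(n_filters, in_channels, kernel_size):
    """Weights of a depthwise separable convolution."""
    depthwise = in_channels * kernel_size * kernel_size  # one kernel per channel
    pointwise = n_filters * in_channels                  # N filters of size 1 x 1 x C
    return depthwise + pointwise

# Hypothetical layer: 64 input channels, 128 filters, 3 x 3 kernels.
c, n, k = 64, 128, 3
std = standard_conv_params(n, c, k)
sep = dwsc_params(n, c, k)
ratio = Fraction(sep, std)

# The ratio matches 1/N + 1/D_k^2 exactly.
assert ratio == Fraction(1, n) + Fraction(1, k * k)
print(std, sep, round(float(ratio), 4))  # 73728 8768 0.1189
```

For this hypothetical layer the separable version needs roughly 12% of the weights of the standard convolution.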

3.3 Channel attention mechanism

Squeeze-and-Excitation Networks (SENet) were proposed by Hu et al. [17]. The channel attention mechanism in this paper is a modification of the SENet structure. Pointwise convolutional layers are used instead of the fully-connected layers of SENet, which allows feature maps of any number and size as input. The implementation process is shown in Fig. 5. Squeeze-and-Excitation is a module that consists of three parts, namely the squeeze operation, the excitation operation and the reweight operation. The first is the squeeze operation, which aims to compress the information of each input feature channel and reduce the deviation of the estimated mean value caused by the parameter error of the convolutional layer. The feature map after depthwise convolution and batch normalization is taken as the input of global max pooling, so as to obtain one real number with a global receptive field per channel. The number of outputs matches the number of input feature channels. Global max pooling can be described as follows. $z_{c} = \max_{i, j} u_{c} (i, j), \quad i, j \in \{1, 2, \dots, D_{F}\},$ (4) where D_F represents the size of the input channel feature, $U \in ℝ^{C \times D_{F} \times D_{F}}$ with U = [u₁, u₂,..., u_C] represents the set of input feature maps, $Z \in ℝ^{C}$ with Z = [z₁, z₂,..., z_C], and u_c and z_c represent the c-th input and output of the global max pooling, respectively.

Fig. 5

Channel attention (SENet). C represents the number of input feature channels, r represents the dimensionality reduction coefficient.

The second is the excitation operation, which generates a weight for each feature channel. It consists of two pointwise convolutional layers. The first pointwise convolutional layer reduces the dimensionality of the input feature channels by setting the number of channels to C/r, where C represents the number of input channels and r represents the dimensionality reduction coefficient. When the number of input feature channels is C = 1, the dimensionality reduction coefficient is r = 1; when 1 < C < 64, r = 8; and when C ≥ 64, r = 16. The second pointwise convolutional layer restores the number of compressed feature channels by setting the number of channels to C. The excitation operation can be described as follows. $\left\{\begin{matrix} s = σ (W_{2} \, \mathrm{ReLU} (W_{1} Z)), \\ \mathrm{ReLU} (x) = \max (0, x), \\ σ (x) = \frac{1}{1 + e^{- x}}, \end{matrix}\right.$ (5)

where $W_{1} \in ℝ^{\frac{C}{r} \times C}$ and $W_{2} \in ℝ^{C \times \frac{C}{r}}$. “Valid” padding is used in the pointwise convolution to ensure that the results are based on complete context features.
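The squeeze and excitation steps of formulas (4) and (5) can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' implementation: on a 1×1 spatial map a pointwise convolution reduces to a matrix product, so W₁ and W₂ are plain matrices here, the weights are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def reduction_ratio(c):
    """Dimensionality reduction coefficient r chosen from the channel count C."""
    if c == 1:
        return 1
    return 8 if c < 64 else 16

def squeeze(u):
    """Formula (4): global max pooling, (C, D_F, D_F) -> (C,)."""
    return u.max(axis=(1, 2))

def excite(z, w1, w2):
    """Formula (5): s = sigmoid(W2 . ReLU(W1 . z))."""
    hidden = np.maximum(0.0, w1 @ z)              # ReLU, shape (C/r,)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, shape (C,)

rng = np.random.default_rng(0)
c = 64
r = reduction_ratio(c)                            # 16 for C = 64
u = rng.normal(size=(c, 8, 8))                    # hypothetical feature maps
w1 = 0.1 * rng.normal(size=(c // r, c))           # placeholder for W1
w2 = 0.1 * rng.normal(size=(c, c // r))           # placeholder for W2
s = excite(squeeze(u), w1, w2)
print(s.shape)  # (64,): one weight in (0, 1) per channel
```

The sketch assumes C is divisible by r; how the channel count C/r is rounded for other values of C is not stated in the text.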

Finally, there is the reweight operation. The value of each weight output by the excitation indicates the importance of the corresponding feature channel. The weights output by the excitation are multiplied with the feature maps that were input to the squeeze operation, which completes the recalibration of the original features in the channel dimension. The reweight operation can be described as follows. $x_{c} = F_{scale} (u_{c}, s_{c}) = s_{c} \times u_{c},$ (6) where X = [x₁, x₂,..., x_C] is the output feature map of the channel attention mechanism. F_scale(u_c, s_c) refers to channel-wise multiplication between the scalar s_c ∈ S, S = [s₁, s₂,..., s_C], and the feature map u_c ∈ U, U = [u₁, u₂,..., u_C]. The channel attention model uses pointwise convolution, and the number of parameters introduced by these convolutional layers is $\sum_{s = 1}^{S} N_{s} \times \frac{2 \times C_{s}^{2}}{r},$ (7) where r represents the reduction rate, S represents the number of stages (a stage is the collection of blocks operating on feature maps of a common spatial size), C_s represents the number of output channels, and N_s represents the number of repeated blocks in stage s [17]. The number of parameters added by the channel attention structure is almost negligible.
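Formulas (6) and (7) translate directly into a short NumPy sketch. The function names and the stage configuration used in the example are hypothetical, and bias terms are ignored in the parameter count.

```python
import numpy as np

def reweight(u, s):
    """Formula (6): scale every channel map u_c by its scalar weight s_c.

    u: (C, D_F, D_F) feature maps, s: (C,) channel weights.
    """
    return s[:, None, None] * u

def se_extra_params(stages, r):
    """Formula (7): sum over stages of N_s * 2 * C_s**2 / r.

    stages: iterable of (N_s, C_s) pairs, one pair per stage.
    """
    return sum(n_s * 2 * c_s ** 2 / r for n_s, c_s in stages)

# Two channels of ones, weighted by 0.5 and 2.0.
x = reweight(np.ones((2, 2, 2)), np.array([0.5, 2.0]))
print(x[0, 0, 0], x[1, 0, 0])  # 0.5 2.0

# Hypothetical configuration: one block with 64 channels, one with 128, r = 16.
print(se_extra_params([(1, 64), (1, 128)], r=16))  # 2560.0
```

In this hypothetical configuration the attention layers add 2560 weights, which is small next to the 73728 weights of a single standard 3×3 convolution from 64 to 128 channels.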

4 Experiments

This section mainly introduces the dataset and evaluation criteria required for the experiment, and analyzes the experimental results.

4.1 Dataset

This paper uses two datasets to verify the effectiveness of the method. One is the MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset, a two-class segmentation task, and the other is the Automatic Cardiac Diagnosis Challenge (ACDC) dataset, a four-class segmentation task.

PROMISE12 provides only 50 patient samples as a training set. Because the number of samples is small, the 50 patient samples are divided into 40 training samples, 5 validation samples and 5 test samples.

The Automatic Cardiac Diagnosis Challenge (ACDC) dataset [25] provides 100 patient samples as a training set. The 100 patient samples are randomly divided into 70 training samples, 15 validation samples and 15 test samples.

After that, each sample is sliced, and each slice is resized to 256×256. In the preprocessing stage, the contrast of the sliced images is improved using contrast-limited adaptive histogram equalization. In addition, to make the pixel distribution more uniform, curvature-driven image denoising is applied to each image. Finally, the mean and standard deviation of the training set are calculated and used to standardize the training set, the validation set and the test set.
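The final standardization step can be sketched as follows. Only this step is shown; the histogram equalization and denoising are omitted. The function names, the small `eps` guard against division by zero, and the random stand-in data are assumptions made for illustration.

```python
import numpy as np

def fit_standardizer(train_images):
    """Compute the statistics on the training set only."""
    return float(train_images.mean()), float(train_images.std())

def standardize(images, mean, std, eps=1e-8):
    """Apply the training-set statistics to any split (eps is an added safeguard)."""
    return (images - mean) / (std + eps)

# Hypothetical stand-in data: small random "slices" instead of real MR images.
rng = np.random.default_rng(0)
train = rng.uniform(0, 255, size=(10, 8, 8))
test = rng.uniform(0, 255, size=(3, 8, 8))

mean, std = fit_standardizer(train)
train_std = standardize(train, mean, std)
test_std = standardize(test, mean, std)   # same statistics as the training set
```

Reusing the training-set mean and standard deviation for the validation and test sets keeps all three splits on the same intensity scale without using information from the held-out data.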

The training sets of PROMISE12 and ACDC contain only about 1200 images with corresponding masks, so data augmentation is needed. Random rotation, shift, zoom, inversion and elastic deformation are used to augment the training set to 150,000 images.

4.2 Evaluation

In medical image segmentation, the region to be segmented usually occupies only a small part of the entire image. In this setting, using the Dice Similarity Coefficient (DSC) as the loss function gives a better segmentation effect [18]. DSC is also an evaluation metric. It originates from binary classification and essentially measures the overlap of two samples. The DSC value ranges from 0 to 1, where “1” means that the segmentation result completely overlaps the ground truth (GT). It is calculated as follows. $DSC (y, \hat{y}) = \frac{2 \sum_{i = 1}^{N} y_{i} \cdot {\hat{y}}_{i}}{\sum_{i = 1}^{N} y_{i} + \sum_{i = 1}^{N} {\hat{y}}_{i}},$ (8) where y represents the elements of the GT segmentation map, $\hat{y}$ represents the elements of the predicted segmentation map, N represents the total number of elements in the segmentation map, and i denotes the i-th element of the segmentation map. In the experiment, 0.00001 is added to both the numerator and the denominator on the right-hand side of formula (8) to prevent the denominator from being 0 during the calculation.
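Formula (8) with the 0.00001 smoothing term can be sketched in NumPy as below. The function names are hypothetical, and defining the loss as 1 − DSC is a common convention assumed here; the text does not state the exact form of the loss.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1e-5):
    """Formula (8) with `smooth` added to numerator and denominator."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred, smooth=1e-5):
    """Assumed loss form: 1 - DSC, so perfect overlap gives a loss near 0."""
    return 1.0 - dice_coefficient(y_true, y_pred, smooth)

gt = [1, 1, 0, 0]
print(round(dice_coefficient(gt, [1, 1, 0, 0]), 4))  # 1.0    (full overlap)
print(round(dice_coefficient(gt, [1, 0, 0, 0]), 4))  # 0.6667 (2*1 / (2+1))
print(dice_coefficient([0, 0], [0, 0]))              # 1.0    (both masks empty)
```

The last call shows the purpose of the smoothing term: without it, two empty masks would produce a division by zero.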

DSC alone is not enough to evaluate the segmentation effect, because it does not assess the quality of the segmentation contour. Here, the Hausdorff Distance (HD) is introduced to evaluate how well the contour of the segmentation result matches the GT contour. The smaller the HD value is, the more closely the edge of the segmentation result coincides with the edge of the GT.

HD is a distance defined between any two sets in a metric space. It is calculated as follows. $\left\{\begin{matrix} d_{H} (X, Y) = \max \{d_{XY}, d_{YX}\}, \\ d_{XY} = \max_{x \in X} \min_{y \in Y} d (x, y), \\ d_{YX} = \max_{y \in Y} \min_{x \in X} d (x, y), \end{matrix}\right.$ (9) where Y represents the point set of the GT edge contour, Y = {y₁,..., y_m}, and X represents the point set of the edge contour of the segmentation result, X = {x₁,..., x_n}. The value of the Hausdorff distance indicates the maximum degree of mismatch between the segmentation result contour and the GT contour. When d_H(X, Y) = 0, the segmentation result contour coincides completely with the GT contour. The distance between two points is $d (x, y) = ∥ x - y ∥,$ (10) where the operator ∥·∥ denotes the norm used to measure the distance between points of the two sets, x is an element of the point set X, and y is an element of the point set Y.
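Formulas (9) and (10) can be sketched with a brute-force NumPy implementation. Taking ∥·∥ to be the Euclidean norm is an assumption made here, as are the function name and the toy point sets; for large contours a spatial index would be more efficient than the full distance matrix.

```python
import numpy as np

def hausdorff_distance(x_pts, y_pts):
    """Formula (9) for two point sets given as (n, dim) and (m, dim) arrays."""
    x_pts = np.asarray(x_pts, dtype=float)
    y_pts = np.asarray(y_pts, dtype=float)
    # Pairwise distances d(x, y), formula (10), assumed Euclidean; shape (n, m).
    d = np.linalg.norm(x_pts[:, None, :] - y_pts[None, :, :], axis=-1)
    d_xy = d.min(axis=1).max()   # farthest x from its nearest y
    d_yx = d.min(axis=0).max()   # farthest y from its nearest x
    return max(d_xy, d_yx)

# Hypothetical contours: two points each on a line.
pred = [(0, 0), (1, 0)]
gt = [(0, 0), (4, 0)]
print(hausdorff_distance(pred, gt))  # 3.0: (4, 0) lies 3 away from its nearest predicted point
print(hausdorff_distance(gt, gt))    # 0.0: identical contours
```

The example also shows why both directed distances are needed: d_XY is only 1 here, while d_YX is 3.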

4.3 Results

The experimental platform is based on Keras 2.3.1 with TensorFlow-GPU 2.1.0 as the backend, running under Python 3.7. The hardware platform is an NVIDIA GeForce RTX 2060 GPU (6 GB RAM) and an AMD Ryzen 5 3600 6-core processor at 3.59 GHz (16 GB RAM), adopting the cuDNN 7.6 library as a benchmark function to ensure that the fastest algorithm is used.

Comparative experiment: The proposed method is compared with PSPNet-ResNet50 and SegNet on the PROMISE12 dataset and the ACDC dataset. The experimental results are shown in Tables 1 and 2. On the experimental platform, the maximum batch size of PSPNet-ResNet50 and SegNet can only be set to 4. Therefore, to ensure the comparability of the experiment, the batch size of the proposed method is also set to 4 while the other parameters are kept consistent.

Table 1
Test results of PSPNet, SegNet and the proposed method on the PROMISE12 dataset

Methods	DSC	HD	Parameters	Memory
	(%)	(mm)	(million)	(MB)
PSPNet	81.60	16.16	46.76	548
SegNet	81.24	15.02	30.82	361
DSRU-Net	84.61	9.96	20.54	241

Table 2

Test results of PSPNet, SegNet and the proposed method on the ACDC Dataset

Methods	DSC (%)				HD (mm)				Parameters	Memory
	LV	RV	MYO	Mean	LV	RV	MYO	Mean	(million)	(MB)
PSPNet	88.26	66.77	81.27	78.77	4.46	9.46	7.49	7.14	46.76	548
SegNet	72.93	32.22	62.95	56.03	9.12	19.66	12.72	13.83	30.82	361
DSRU-Net	85.70	64.63	77.25	75.86	5.32	9.89	8.94	8.05	20.54	241

Ablation experiment: To verify the validity of the model, ablation experiments are conducted. Two different modifications to the architecture are designed, giving four groups of experiments, namely: ① ResU-Net – Residual U-Net architecture. ② ResU-Net with DWSC (ResU-Net-D) – Residual U-Net architecture with depthwise separable layers, using DWSC to replace the standard convolution. ③ ResU-Net with channel attention mechanism (ResU-Net-C) – Residual U-Net architecture with the attention mechanism. ④ DSRU-Net – Residual U-Net architecture with DWSC and the channel attention mechanism. In this experiment, only the structure of the model is changed, and everything else remains the same.

The backpropagation algorithm is used to train the model. The optimizer is Adam, the learning rate is set to 0.001, and the momentum is set to 0.9. When the training accuracy has not increased by 0.001 over 5 consecutive epochs, training is stopped early, the training results are saved, and the saved results are used to evaluate the model. The batch size is 16.

The comparative experimental results on the PROMISE12 dataset are shown in Table 1. The DSC values of PSPNet, SegNet and DSRU-Net are 81.60%, 81.24% and 84.61%, and their HD values are 16.16 mm, 15.02 mm and 9.96 mm, respectively. The numbers of parameters of PSPNet, SegNet and DSRU-Net are 46.76 million, 30.82 million and 20.54 million, respectively. These results show that the proposed DSRU-Net has advantages in both segmentation effect and number of parameters.

The comparative experimental results on the ACDC dataset are shown in Table 2. The mean DSC values of PSPNet, SegNet and DSRU-Net are 78.77%, 56.03% and 75.86%, and their mean HD values are 7.14 mm, 13.83 mm and 8.05 mm, respectively. The numbers of parameters of PSPNet, SegNet and DSRU-Net are 46.76 million, 30.82 million and 20.54 million, respectively. These results show that PSPNet achieves the best segmentation effect, but the proposed DSRU-Net is not far behind it. At the same time, DSRU-Net has a clear advantage in the number of parameters.

From the comparative experiment of PROMISE12 dataset and ACDC dataset, it can be seen that the proposed method can not only achieve a better segmentation effect but also effectively reduce the amount of model parameters.

The results of the ablation experiments on the PROMISE12 dataset are shown in Table 3. The DSC values of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net are 87.65%, 88.50%, 88.22% and 88.40%, respectively. The HD values of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net are 18.00 mm, 37.81 mm, 21.67 mm and 15.80 mm, respectively. The numbers of parameters of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net are 53.25 million, 19.98 million, 53.47 million and 20.76 million, respectively. The segmentation results of each model are shown in Fig. 6.

Table 3

Quantitative comparison of four groups of experiments of PROMISE12 dataset

Methods	DSC	HD	Parameters	Memory
	(%)	(mm)	(million)	(MB)
ResU-Net	87.65	18.00	53.25	624
ResU-Net-D	88.50	37.81	19.98	234
ResU-Net-C	88.22	21.67	53.47	627
DSRU-Net	88.40	15.80	20.76	244

Fig. 6

The blue curve represents the contour of the prostate obtained through deep learning, while the red curve represents the contour obtained manually and segmented by an experienced radiologist.

The results of the ablation experiments on the ACDC dataset are shown in Table 4. The mean DSC values of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net are 77.10%, 77.32%, 79.78% and 78.49%, respectively. The mean HD values of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net are 7.58 mm, 7.54 mm, 6.98 mm and 7.46 mm, respectively. The numbers of parameters of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net are 53.25 million, 19.98 million, 53.47 million and 20.76 million, respectively. The segmentation results of each model are shown in Fig. 7.

Table 4

Quantitative comparison of four groups of experiments on the ACDC dataset. LV, RV and MYO represent the left ventricle, right ventricle and myocardium, respectively.

Methods	DSC (%)				HD (mm)				Parameters	Memory
	LV	RV	MYO	Mean	LV	RV	MYO	Mean	(million)	(MB)
ResU-Net	85.11	66.90	79.28	77.10	5.49	9.21	8.03	7.58	53.25	624
ResU-Net-D	88.19	63.32	80.46	77.32	4.79	9.81	8.03	7.54	19.98	235
ResU-Net-C	88.25	68.76	82.32	79.78	4.66	8.92	7.35	6.98	53.80	631
DSRU-Net	88.12	66.26	81.10	78.49	4.73	9.39	8.27	7.46	20.54	242

Fig. 7

Qualitative comparison on ACDC dataset. The proposed DSRU-Net achieves more accurate results along with better smoothness and continuity in shape. Yellow, red and green represent the left ventricle, right ventricle and myocardium, respectively.

4.4 Discussion

From the results in Tables 3 and 4, it is difficult to judge intuitively which model is better, so the fuzzy comprehensive evaluation method is introduced. The fuzzy comprehensive evaluation method is one of the most basic methods in fuzzy mathematics. It is used to make an overall evaluation of things or objects that are constrained by multiple factors.

The purpose of this experiment is to reduce the model parameters and improve the segmentation effect. Therefore, the first-level evaluation factors are set as the model parameter quantity u₁ and the segmentation accuracy u₂, and the second-level evaluation factors DSC value u₂₁ and HD value u₂₂ are set under u₂. In this experiment, u₁ has no second-level indicators and u₂ has second-level indicators, so the weights of u₁ and u₂ are set to 0.4 and 0.6, respectively. The DSC value and the HD value are both commonly used accuracy evaluation indicators, so the weights of u₂₁ and u₂₂ are set to 0.5 and 0.5, respectively. These settings are summarized in Table 5. The values listed in Tables 3 and 4 are the values of the evaluation factors, and the evaluation values are calculated from them. The evaluation value expresses the degree of merit of an evaluation factor. The best value of each evaluation factor is taken as the benchmark and given an evaluation value of 1; the other values receive evaluation values according to how much worse they are. For a factor where higher is better, the evaluation value is proportional to the evaluation factor value (technical parameter value): evaluation value = technical parameter value/optimal technical parameter value. For a factor where lower is better, the relationship is inversely proportional: evaluation value = optimal technical parameter value/technical parameter value. A higher DSC value is better, whereas lower HD values and fewer model parameters are better. The evaluation values of the evaluation factors in Tables 3 and 4 are shown in Tables 6 and 7, respectively. The fuzzy comprehensive evaluation method uses the comprehensive evaluation value to judge the pros and cons of each model. The comprehensive evaluation value can be calculated by the following formula.

$E = A \times V,$ (11) where A = [a₁, a₂, a₃], a₁ is the weight of u₁, a₂ is the weight u₂×u₂₁, a₃ is the weight u₂×u₂₂, and $V \in ℝ^{3 \times 4}$ contains the evaluation values of each factor for the four models. Using formula (11), the comprehensive evaluation values of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net on the PROMISE12 dataset are 0.7104, 0.8254, 0.6666 and 0.9889, respectively. The comprehensive evaluation values of ResU-Net, ResU-Net-D, ResU-Net-C and DSRU-Net on the ACDC dataset are 0.7161, 0.9685, 0.7488 and 0.9652, respectively. On the PROMISE12 dataset the proposed method has an obvious advantage, while on the ACDC dataset the difference between the proposed method and ResU-Net-D is small. This suggests that the proposed method may have an obvious advantage on two-class segmentation datasets, whereas its advantage on four-class segmentation datasets is not obvious.
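Formula (11) can be checked numerically with a short NumPy sketch. The weights follow from the values stated in the text (a₁ = 0.4, a₂ = 0.6 × 0.5, a₃ = 0.6 × 0.5), and the matrices V are the evaluation values reported in Tables 6 and 7; the variable names are chosen for illustration only.

```python
import numpy as np

# a1 = weight of u1, a2 = u2 * u21, a3 = u2 * u22.
weights = np.array([0.4, 0.6 * 0.5, 0.6 * 0.5])

models = ["ResU-Net", "ResU-Net-D", "ResU-Net-C", "DSRU-Net"]

# Rows: u1 (parameters), u21 (DSC), u22 (HD); columns follow `models`.
v_promise12 = np.array([
    [0.375, 1.0, 0.372, 0.973],
    [0.990, 1.0, 0.997, 0.999],
    [0.878, 0.418, 0.729, 1.0],
])
v_acdc = np.array([
    [0.375, 1.0, 0.372, 0.973],
    [0.966, 0.969, 1.0, 0.984],
    [0.921, 0.926, 1.0, 0.936],
])

e_promise12 = weights @ v_promise12   # formula (11), one value per model
e_acdc = weights @ v_acdc
e_mean = (e_promise12 + e_acdc) / 2   # average over the two datasets

for name, p, a, m in zip(models, e_promise12, e_acdc, e_mean):
    print(f"{name:11s} {p:.4f} {a:.4f} {m:.4f}")
```

The products agree with the comprehensive evaluation values quoted in the text, and DSRU-Net has the highest value when the two datasets are averaged.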

Table 5

Second-level indicators for evaluating model effects

First-level indicators	Second-level indicators
Model parameters quantity u₁ = 0.4
Segmentation accuracy u₂ = 0.6	DSC value u₂₁ = 0.5
	HD value u₂₂ = 0.5

Table 6

The evaluation value of the evaluation factors of each model of PROMISE12 dataset

	ResU-Net	ResU-Net-D	ResU-Net-C	DSRU-Net
u₁	0.375	1	0.372	0.973
u₂₁	0.990	1	0.997	0.999
u₂₂	0.878	0.418	0.729	1

Table 7

The evaluation value of the evaluation factors of each model of ACDC dataset

	ResU-Net	ResU-Net-D	ResU-Net-C	DSRU-Net
u₁	0.375	1	0.372	0.973
u₂₁	0.966	0.969	1	0.984
u₂₂	0.921	0.926	1	0.936
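The evaluation values in the u₂₁ and u₂₂ rows of Table 6 can be reproduced from the DSC and HD columns of Table 3 with the two ratio rules described earlier. The sketch below uses a hypothetical helper name and covers only these two rows.

```python
def evaluation_values(values, higher_is_better):
    """Normalize factor values so that the best one gets evaluation value 1."""
    if higher_is_better:
        best = max(values)
        return [v / best for v in values]      # value / optimal value
    best = min(values)
    return [best / v for v in values]          # optimal value / value

# Table 3 (PROMISE12): ResU-Net, ResU-Net-D, ResU-Net-C, DSRU-Net.
dsc = [87.65, 88.50, 88.22, 88.40]   # higher is better
hd = [18.00, 37.81, 21.67, 15.80]    # lower is better

u21 = [round(v, 3) for v in evaluation_values(dsc, higher_is_better=True)]
u22 = [round(v, 3) for v in evaluation_values(hd, higher_is_better=False)]
print(u21)  # [0.99, 1.0, 0.997, 0.999]
print(u22)  # [0.878, 0.418, 0.729, 1.0]
```

Both printed rows match Table 6 to three decimal places.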

According to the results in Tables 6 and 7, the evaluation values of the model parameter quantity u₁ of the four methods are 0.375, 1.0, 0.372 and 0.973. The number of parameters with DWSC is about 60% lower than with standard convolution. This shows that the DWSC model structure can greatly reduce the number of parameters while also reducing the storage space. The evaluation values of the segmentation accuracy u₂ of the four methods are 0.9340, 0.7090, 0.8630 and 0.9995 on the PROMISE12 dataset, and 0.9435, 0.9475, 1.0 and 0.9600 on the ACDC dataset. On the PROMISE12 dataset, comparing the u₂ values of ResU-Net and ResU-Net-D shows that using DWSC instead of traditional convolution significantly reduces the segmentation effect. Comparing the u₂ values of ResU-Net-D and the proposed method shows that the channel attention mechanism can improve the segmentation performance. When the fuzzy comprehensive evaluation values obtained on the PROMISE12 dataset and the ACDC dataset are averaged, the proposed method scores highest, indicating that the combination of DWSC and the channel attention mechanism can not only reduce the model parameters but also improve the segmentation effect.

5 Conclusion

This paper proposes the DSRU-Net method. The method uses DWSC to replace the standard convolution of U-Net, which greatly reduces the number of model parameters. At the same time, the loss of segmentation performance caused by the DWSC layers is compensated by the channel attention mechanism. The comparison shows that the proposed method has a better segmentation effect than SegNet and fewer model parameters than both PSPNet and SegNet. The fuzzy comprehensive evaluation method, which includes three evaluation factors (the number of parameters required by the model, the DSC value and the HD value), is used to evaluate the ablation experiments on the PROMISE12 dataset and the ACDC dataset. The proposed method is tested on 5 selected PROMISE12 samples and 15 selected ACDC samples and obtains fuzzy comprehensive evaluation values of 0.9889 and 0.9652 on the two datasets, respectively. The results show that the proposed method is the best choice for balancing the number of parameters and the segmentation effect.

References

1. Siegel R.L., Miller K.D. and Jemal, Cancer statistics, CA Cancer J. Clin. 67(1) (2017), 7–30.

2. Rasch, Barillot, Remeijer, Touw, van Herk and Lebesque J.V., Definition of the prostate in CT and MRI: a multi-observer study, Int J Radiat Oncol Biol Phys 43(1) (1999), 57–66.

3. Boni R.A.H., Boner J.A., Debatin J.F., Trinkler, Knonagel, Vonhochstetter, Helfenstein and Krestin G.P., Optimization of prostate carcinoma staging-comparison of imaging and clinical methods, Clin Radiol 50(9) (1995), 593–600.

4. Hyperfine Research, Inc., DEVICE: Lucy Point-of-Care Magnetic Resonance Imaging Device, 510(k), NO: K192002.

5. Roehrborn C.G., et al., Serum prostate-specific antigen and prostate volume predict long-term changes in symptoms and flow rate: results of a four-year, randomized trial comparing finasteride versus placebo, Urology 54(4) (1999), 662–669.

6. Fei, Kemper and Wilson D.L., A comparative study of warping and rigid body registration for the prostate and pelvic MR volumes, Comput. Med. Imaging Graphics 27(4) (2003), 267–281.

7. Fei, et al., Slice-to-volume registration and its potential application to interventional MRI-guided radio-frequency thermal ablation of prostate cancer, IEEE Trans. Med. Imaging 22(4) (2003), 515–525.

8. Qiu, et al., Prostate segmentation: an efficient convex optimization approach with axial symmetry using 3-D TRUS and MR images, IEEE Trans. Med. Imaging 33(4) (2014), 947–960.

9. Liao, Gao, Oto and Shen, Representation learning: a unified deep learning framework for automatic prostate MR segmentation, in Medical Image Computing and Computer-Assisted Intervention: MICCAI International Conference on Medical Image Computing and Computer-Assisted Intervention (2013), 254–261.

10. Yan, Xu, Turkbey and Kruecker, Discrete deformable model guided by partial active shape model for TRUS image segmentation, IEEE Transactions on Bio-medical Engineering 57(5) (2010), 1158–1166.

11. Long, Shelhamer and Darrell, Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), 3431–3440.

12. Ronneberger, Fischer and Brox, U-Net: Convolutional networks for biomedical image segmentation, International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham (2015), 234–241.

13. Zhang, Li, Deng, Wang, Lin, Ji and Shen, Deep convolutional neural networks for multi-modality isointense infant brain image segmentation, Neuroimage 108 (2015), 214–224.

14. Milletari, Navab and Ahmadi S.-A., V-Net: Fully convolutional neural networks for volumetric medical image segmentation, 2016 Fourth International Conference on 3D Vision (3DV) (2016), 565–571.

15. Yu, Yang, et al., Volumetric ConvNets with mixed residual connections for automated prostate segmentation from 3D MR images, in Annual Conf. of Association for the Advancement of Artificial Intelligence (2017), 66–72.

16. Chollet, Xception: Deep Learning with Depthwise Separable Convolutions, The IEEE Conference on Computer Vision and Pattern Recognition (2016), 1251–1258.

17. Hu, Shen, et al., Squeeze-and-Excitation Networks, The IEEE Conference on Computer Vision and Pattern Recognition (2018), 7132–7141.

18. Sudre C.H., Li, Vercauteren, Ourselin and Cardoso M.J., Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations, Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (2017), 240–248.

19. Chen L.C., Zhu, Papandreou, et al., Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Paper Presented at the Meeting of the ECCV (7) (2018).

20. Qi, Yang, Li, et al., X-net: Brain stroke lesion segmentation based on depthwise separable convolution and long-range dependencies, Springer, Cham, International Conference on Medical Image Computing and Computer-Assisted Intervention (2019), 247–255.

21. Wang, Xiong, Wang, et al., ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time, Applied Intelligence 50(4) (2020), 1045–1056.

22. Howard, Zhu, Chen, Kalenichenko, Wang, Weyand, Andreetto and Adam, Mobilenets: efficient convolutional neural networks for mobile vision applications (2017).

23. Iandola F.N., Han, Moskewicz M.W., et al., SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5MB model size (2016).

24. Zhang, Zhou, Lin, et al., ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), 6848–6856.

25. Bernard, Lalande, Zotti, Cervenansky, et al., Deep Learning Techniques for Automatic MRI Cardiac Multi-structures Segmentation and Diagnosis: Is the Problem Solved?, IEEE Transactions on Medical Imaging 37(11) (2018), 2514–2525.

26. Badrinarayanan, Kendall and Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12) (2017), 2481–2495.

27. Yu, Wang, Peng, Gao, Yu and Sang, Bisenet: Bilateral segmentation network for real-time semantic segmentation, in Proc. European Conference on Computer Vision (ECCV) (2018), 325–334.

Table 1
Test results of PSPNet, SegNet and the proposed method on the PROMISE12 dataset

| Methods | DSC (%) | HD (mm) | Parameters (million) | Memory (MB) |
|---|---|---|---|---|
| PSPNet | 81.60 | 16.16 | 46.76 | 548 |
| SegNet | 81.24 | 15.02 | 30.82 | 361 |
| DSRU-Net | 84.61 | 9.96 | 20.54 | 241 |