Abstract
BACKGROUND:
Brain tumor segmentation plays an important role in assisting disease diagnosis, treatment planning, and surgical navigation.
OBJECTIVE:
This study aims to improve the accuracy of tumor boundary segmentation using a multi-scale U-Net network.
METHODS:
In this study, a novel U-Net with dilated convolution (DCU-Net) is proposed for brain tumor segmentation, based on the classic U-Net structure. First, the MR brain tumor images are pre-processed to alleviate the class imbalance problem by reducing the number of background pixels in the input. Then, multi-scale spatial pyramid pooling replaces the max pooling at the end of the down-sampling path, which expands the feature receptive field while maintaining image resolution. Finally, a dilated convolution residual block is added to the skip connections to improve the network's ability to recognize tumor details.
RESULTS:
The proposed model was evaluated on the Brain Tumor Segmentation (BRATS) 2018 Challenge training dataset and achieved dice similarity coefficient (DSC) scores of 0.91, 0.78 and 0.83 for whole tumor, core tumor and enhancing tumor segmentation, respectively.
CONCLUSIONS:
The experimental results indicate that the proposed model yields promising performance in automated brain tumor segmentation.
Introduction
Glioma is one of the most common types of primary tumor that occur in the brain. It grows from glial cells and can be categorized into low-grade and high-grade gliomas. High-grade gliomas (HGG) are more aggressive and malignant, with a life expectancy of at most two years, while low-grade gliomas (LGG) can be benign or malignant and grow more slowly, with a life expectancy of several years [1]. Patients with benign tumors generally recover after surgery, whereas malignant tumors are difficult to cure. Glioma is therefore a serious hazard to human health, and improving its diagnosis and treatment is essential [2].
With the development of medical imaging technology, imaging plays an increasingly important role in disease diagnosis. Medical imaging technology mainly includes X-ray examination, computed tomography (CT), ultrasound and magnetic resonance imaging (MRI). Among them, MRI offers high sharpness and tissue resolution, and can produce multiple anatomical tomographic images carrying different diagnostic information by setting different acquisition parameters. Furthermore, MRI is non-invasive and provides detailed shape, size and position information without exposing the patient to ionizing radiation. Therefore, it has attracted more and more attention in the diagnosis, treatment and surgical guidance of brain tumors. Brain tumor segmentation is effective in current diagnosis and treatment. By segmenting the brain tumor, doctors can measure the location, size and other parameters of the tumor, determine its growth state and progression, and conduct quantitative analysis and follow-up comparison.
Traditionally, the determination and staging of glioma are mainly based on a radiologist's experience and intuition, leading to poor diagnostic stability and reliability. By accurately and reliably converting medical images into quantified digital features, radiomics provides an effective solution for automatic detection and determination of glioma by describing the microenvironment of the tumor lesion [3–5]. In addition, by extracting personalized features for each individual patient, it has been shown to be effective in computer-aided diagnosis, computer-assisted surgery and radiotherapy, as well as medical research on glioma patients. With the emergence of large-scale labeled data and the development of computer technology, the automatic segmentation of brain tumors using convolutional neural networks (CNN) has become a hotspot of current research. Brain tumor segmentation can be treated as a semantic segmentation task. Fully convolutional networks (FCN) [6] established a CNN structure that is widely used for semantic segmentation. Ronneberger et al. [7] proposed the U-Net model for medical image segmentation based on the fully convolutional network model. Both U-Net and FCN have classical encoding-decoding topologies, but U-Net has a symmetrical network structure and skip connections, and it performs better than FCN in the segmentation of brain tumor images.
With its straightforward and successful structure, U-Net quickly evolved into a commonly used benchmark in medical image segmentation [8]. Segmentation methods based on U-Net can be roughly divided into two categories: 2D U-Net structures and 3D U-Net structures.
A. Research based on 2D U-Net structure. As the network deepens, U-Net loses some detailed feature information, which blurs the segmentation of target boundaries. Therefore, most researchers perform multi-scale feature fusion on the basis of U-Net. Alom et al. [9] proposed a recurrent U-Net model and a recurrent residual U-Net model, which improve on the U-Net architecture with the same number of network parameters, and showed experimentally that the models perform better in retinal image segmentation, skin cancer segmentation and lung lesion segmentation tasks. Zhang et al. [10] proposed automatic breast and fibroglandular tissue (FGT) segmentation in breast MRI using a fully convolutional residual U-Net. Owing to the residual structure, the model obtains accurate segmentation results for the breast and FGT on MRI. Seo et al. [11] added a residual path with deconvolution and activation operations to the skip connection of U-Net to avoid duplication of low-resolution feature information. In addition, their architecture has additional convolution layers in the skip connection in order to extract high-level global features of small object inputs as well as high-level features of high-resolution edge information of large object inputs. The efficacy of the modified U-Net (mU-Net) was demonstrated on the public dataset of the Liver Tumor Segmentation (LiTS) challenge 2017. Ibtehaz et al. [12] proposed MultiResUNet on the basis of U-Net, extending the residual connections and introducing a residual path, and verified its good segmentation performance on ISIC, BRATS and other datasets.
B. Research based on 3D U-Net structure. A 3D network structure can make full use of the 3D features of MR images and obtain more accurate segmentation results. Therefore, some researchers use 3D convolution kernels on the basis of U-Net to extract tumor features. Isensee et al. [8] proposed a medical image segmentation framework that can adapt to any new dataset. The framework automatically adjusts all hyper-parameters according to the attributes of a given dataset without manual intervention. This framework, nnU-Net, relies on only two 3D U-Net cascades and a robust training scheme, and achieved state-of-the-art performance in six recognized segmentation challenges. Myronenko [13] designed a semantic segmentation network for tumor sub-region segmentation from 3D MRIs based on an encoder-decoder architecture. Because the training dataset is limited in size, a variational auto-encoder branch is added to reconstruct the input image itself in order to regularize the shared decoder and impose additional constraints on its layers. The method won first place in the BRATS 2018 challenge. Chen et al. [14] proposed a novel Separable 3D U-Net structure, which exploits the intra-slice and inter-slice representations separately, and the DSC score of the model on the BRATS 2018 dataset reached 0.88.
Researchers who build 2D network structures slice the input 3D data and convert it into 2D inputs, so that the network parameters can be trained quickly on ordinary hardware. A 3D network structure can make full use of the 3D features of MR images and obtain more accurate segmentation results. However, because of the large number of parameters, such a network is difficult to train from scratch and consumes excessive GPU resources and memory, which places higher requirements on computer hardware. Therefore, researchers prefer 2D networks for the task of brain tumor segmentation.
Ronneberger et al. [7] employed up-convolution layers, which are useful for recovering the information lost through down-sampling at the pooling layers of a convolutional network. However, this technique can recover only part of the lost information, so the accuracy of the approach remains limited. To address this limitation, Chen et al. [15] adopted dilated convolutions to extract denser feature maps without using down-sampling operations at the last several layers of a pre-trained network. Guillaume et al. [16] proposed a fully convolutional network with dilated convolutions for the segmentation of handwritten text lines, and achieved better segmentation results than FCN. Vo et al. [17] investigated the effects of a cascade architecture of dilated convolutions and a deep network architecture for multi-resolution input images on the accuracy of semantic segmentation. They showed that the deep network architecture for multi-resolution input images increases the accuracy of semantic segmentation by aggregating multi-scale contextual information. These studies suggest that adding dilated convolution to a CNN can improve the learning ability of the network and yield better segmentation results.
The classic U-Net method performs well on the task of segmenting brain tumors, but it still has some problems. On the one hand, the training dataset suffers from serious class imbalance, so the model does not sufficiently learn the features of small targets, which ultimately affects the segmentation accuracy. On the other hand, the loss of detailed information in the down-sampling path of the U-Net model has not been solved; the pooling operation in particular seriously reduces the resolution of the feature map. To address these two problems, we propose a brain tumor segmentation model with higher segmentation accuracy than U-Net, obtained by pre-processing the training dataset and by introducing dilated convolution into the U-Net model to compensate for the lost feature map resolution.
Inspired by dilated convolution and U-Net, a DCU-Net segmentation network is presented in this paper. The remainder of this paper is organized as follows. In Section 2, the proposed method is presented. The databases used for evaluation and the experimental setup are detailed in Section 3. Results are presented and discussed in Section 4. Finally, the main conclusions are presented in Section 5.
Theory and method
The flow diagram of the proposed method is shown in Fig. 1. There are three main steps: pre-processing, network architecture and training, and brain tumor structure prediction.

Flow diagram of the proposed method.
Step 1. The 3D MRI training dataset is processed to produce 2D image patches with four modalities as the input of the training stage.
Step 2. The 2D image patches are input into the constructed DCU-Net model for feature extraction, and the loss function is minimized by the optimization algorithm to obtain the optimal DCU-Net parameters.
Step 3. The test images to be segmented are input into the trained DCU-Net to segment the tumor region, and the segmented images are output.
First, the MRI scans of 50 patients are randomly selected as the training set. The 3D MRI images of the four modalities, each of size 240×240×155, are cropped and processed. Because brain tumors mostly appear in the intracranial region, some of the top and bottom slices of the 155 slices are discarded to reduce computation, and only the middle 146 slices are kept. Brain tumor MRI also suffers from a serious class imbalance problem: the tumor region is much smaller than the healthy tissue region, so a network trained on such data has difficulty learning the features of the tumor. To address this, each of the 146 slices is cropped from 240×240 to 192×152, which removes some background pixels, reduces the amount of computation and alleviates the class imbalance. In total, each 3D MRI image of size 240×240×155 is cropped to a 3D image of size 192×152×146.
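The cropping arithmetic can be illustrated with a minimal sketch. The text gives only the sizes before and after cropping, so the symmetric (centered) crop used here is an assumption, and `center_crop_bounds` is a hypothetical helper name.

```python
def center_crop_bounds(full, target):
    """Return (start, stop) indices that keep the central `target` elements."""
    if target > full:
        raise ValueError("target size exceeds full size")
    start = (full - target) // 2
    return start, start + target


def voxel_count(shape):
    """Number of voxels in a volume of the given shape."""
    count = 1
    for size in shape:
        count *= size
    return count


ORIGINAL = (240, 240, 155)  # volume size stated in the text
CROPPED = (192, 152, 146)   # cropped size stated in the text

# Assumption: the crop is centered along every axis.
bounds = [center_crop_bounds(f, t) for f, t in zip(ORIGINAL, CROPPED)]

# Fraction of voxels that survive the crop.
kept_fraction = voxel_count(CROPPED) / voxel_count(ORIGINAL)

print(bounds)
print(round(kept_fraction, 3))
```

Under this assumption the crop keeps slightly less than half of the voxels of each volume, which is where the reduction in computation and in background pixels comes from.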
Second, intensity normalization is applied to the MRI images. The 1% highest and 1% lowest intensity values of each image sequence are removed so that the intensity values of different image sequences fall within a coherent range. The normalization then subtracts the mean and divides by the variance of the intensity values of each image, yielding a standardized image with intensity values in the range [0, 1] [18].
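A rough sketch of this normalization on a flat list of intensities is given below. Several details are assumptions rather than the authors' procedure: the extreme 1% values are clipped instead of discarded, the standardization divides by the standard deviation (the usual z-score), and a final min-max rescaling is used to reach the stated [0, 1] range. The function name `normalize_intensities` is hypothetical.

```python
import statistics


def normalize_intensities(values, lower_pct=1.0, upper_pct=99.0):
    """Clip extreme intensities, standardize, then rescale to [0, 1]."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple index-based percentiles (assumption: no interpolation).
    low = ordered[int((n - 1) * lower_pct / 100)]
    high = ordered[int((n - 1) * upper_pct / 100)]
    clipped = [min(max(v, low), high) for v in values]

    # Assumption: z-score with the standard deviation.
    mean = statistics.mean(clipped)
    std = statistics.pstdev(clipped) or 1.0  # avoid division by zero
    standardized = [(v - mean) / std for v in clipped]

    # Assumption: min-max rescaling to obtain the [0, 1] range.
    z_min, z_max = min(standardized), max(standardized)
    span = (z_max - z_min) or 1.0
    return [(v - z_min) / span for v in standardized]


# One extreme outlier no longer dominates the intensity range.
example = normalize_intensities(list(range(200)) + [10000])
print(min(example), max(example))
```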
Last, in order to train within limited memory, the 50 3D MRI images of size 192×152×146 are further divided into 21900 2D image patches of size 128×128, each centered on a pixel, and the intensity is normalized.
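The patch count is consistent with three patches per slice, since 50 × 146 = 7300 slices and 21900 / 7300 = 3. How the patch centers are chosen is not stated, so the sketch below only shows how a 128×128 patch centered on a given pixel could be cut from a 192×152 slice. Shifting the window to stay inside the slice is an assumption, and `patch_bounds` and `extract_patch` are hypothetical names.

```python
def patch_bounds(center, size, limit):
    """Return (start, stop) of a window of `size` around `center`,
    shifted where needed so that it stays inside [0, limit)."""
    start = center - size // 2
    start = max(0, min(start, limit - size))
    return start, start + size


def extract_patch(image, center_row, center_col, size=128):
    """Cut a size x size patch from a 2D image stored as a list of rows."""
    rows, cols = len(image), len(image[0])
    r0, r1 = patch_bounds(center_row, size, rows)
    c0, c1 = patch_bounds(center_col, size, cols)
    return [row[c0:c1] for row in image[r0:r1]]


# A blank 192 x 152 slice, matching the cropped slice size in the text.
slice_2d = [[0.0] * 152 for _ in range(192)]
patch = extract_patch(slice_2d, center_row=96, center_col=76)
print(len(patch), len(patch[0]))

# Slices per training set and patches per slice implied by the text.
slices = 50 * 146
print(slices, 21900 // slices)
```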
The proposed architecture
The architecture of DCU-Net is shown in Fig. 2. The model is trained using the label values and the processed image patches of size 128×128. DCU-Net applies convolution, up-sampling, dilated convolution, pooling, merging, Dilated Spatial Pyramid Pooling (DSPP) and batch normalization (BN) to the input image patches. A certain number of feature maps are produced, a nonlinear activation function at the end of the network transforms the features into probability predictions for the three labels, and the loss function is minimized through the back-propagation algorithm. The important components of DCU-Net are described below.

Detail diagram of DCU-Net structure.
In conventional deep neural networks, max-pooling layers and strided operators are repeatedly used to down-sample an input representation and reduce its dimensionality. These techniques also help decrease the computational cost by reducing the number of training parameters. However, they lead to a significant reduction in spatial resolution, as shown in Fig. 3, where 64×64, 54×54×5, 44×44×5 and 22×22×5 are the sizes of the image and feature maps and the arrows represent convolution or pooling. After convolution and pooling, the size of the feature map continues to decrease. In particular, the pooling operation severely reduces the resolution of the feature map, which is not conducive to the expression of detailed features. Yu et al. [19] first proposed dilated convolution in 2015. Compared with ordinary convolution, dilated convolution has one more parameter, the dilation rate, which specifies the spacing between convolution kernel elements. Dilated convolutional layers have recently been adopted to recover the spatial resolution without increasing the number of training parameters.

Deep convolutional neural network without dilated convolutions.
Unlike max-pooling layers and strided operators, a two-dimensional dilated convolution layer is applied to an input representation by dilating the filter before computing the usual convolution. The size of the filter is expanded, and the empty positions are filled with zeros. As a result, the weights are matched to distant elements in the input matrix, at a distance determined by the rate value. If the kernel center is aligned to an arbitrary location in an image, the kernel elements are matched to input elements as shown in Fig. 4.

Dilated convolutions with different rates. From left to right: (a) rate = 1, kernel = 3×3; (b) rate = 2, kernel = 5×5; (c) rate = 4, kernel = 9×9.
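The zero-filled expansion of the filter can be made concrete with a small sketch. It uses the standard relation for the effective size of a dilated kernel, k + (k − 1)(r − 1) for kernel size k and rate r; the helper names `effective_kernel_size` and `dilate_kernel` are hypothetical.

```python
def effective_kernel_size(kernel, rate):
    """Spatial extent covered by a `kernel`-sized filter dilated by `rate`."""
    return kernel + (kernel - 1) * (rate - 1)


def dilate_kernel(kernel, rate):
    """Insert zeros between the weights of a square 2D kernel."""
    k = len(kernel)
    size = effective_kernel_size(k, rate)
    dilated = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            # Original weights land `rate` positions apart.
            dilated[i * rate][j * rate] = kernel[i][j]
    return dilated


weights = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]

for rate in (1, 2, 4):
    print(rate, effective_kernel_size(3, rate))

for row in dilate_kernel(weights, 2):
    print(row)
```

The number of non-zero weights stays at nine for every rate, which is why dilation enlarges the receptive field without adding training parameters.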
The dilated convolution operation is performed on the feature maps of layer l, and the size of the output feature maps is calculated as follows:
$$L_{l+1} = \left\lfloor \frac{L_l + 2p - f - (f - 1)(r - 1)}{S_0} \right\rfloor + 1 \qquad (1)$$
where L_l and L_{l+1} represent the size of the feature maps at layers l and l + 1 respectively, f is the size of the convolution kernel, S_0 is the stride of the dilated convolution, p is the padding and r is the dilation rate. Equation (1) shows that the main advantage of dilated convolutions is to expand the receptive field of the filters at convolution layers while the resolution of the input feature maps is not reduced. In the segmentation task of brain tumor, U-Net first convolves the image and then performs 2×2 max-pooling, like a traditional CNN. Although pooling increases the receptive field, it also reduces the size and resolution of the feature maps. Up-sampling then restores the original image size, but the lost detail affects the segmentation accuracy. Therefore, in this paper, dilated convolution is adopted to replace part of the ordinary convolution operations, which enlarges the receptive field while keeping the image size unchanged and greatly reduces the loss of feature information.
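The output-size calculation can be checked numerically with the short sketch below. It uses the standard convolution arithmetic with the symbols defined above (kernel size f, stride S_0, padding p, rate r); the function names and the choice of "same" padding p = r(f − 1)/2 are illustrative assumptions.

```python
def conv_output_size(size, kernel, stride=1, padding=0, rate=1):
    """Output length along one axis of a (dilated) convolution or pooling."""
    effective = kernel + (kernel - 1) * (rate - 1)
    return (size + 2 * padding - effective) // stride + 1


def same_padding(kernel, rate):
    """Padding that keeps the size unchanged for stride 1 and odd kernels."""
    return rate * (kernel - 1) // 2


# 3x3 dilated convolutions with stride 1 keep a 128-pixel axis at 128.
for rate in (1, 2, 4, 8, 16):
    print(rate, conv_output_size(128, 3, stride=1,
                                 padding=same_padding(3, rate), rate=rate))

# A 2x2 max-pooling with stride 2 halves the same axis.
print(conv_output_size(128, 2, stride=2))
```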
The classical U-Net structure directly combines high-level semantic information and low-level detailed information through the skip connection, and then sends the result into the up-sampling path for segmentation. However, the resolution of the feature maps differs between stages. Directly integrating the features of the down-sampling path with the corresponding up-sampling path neglects the expression of shallow details, which is not conducive to the fine segmentation of the tumor. Therefore, we propose a residual skip connection with dilated convolution (RD-Skip) in this paper, as shown in Fig. 5.

The structure of RD-Skip.
Here, Dilated_X denotes the dilated convolutions with dilation rates of 16, 8, 4 and 2, respectively. The dilated convolution further expands the receptive field of the shallow feature information in the down-sampling path. An Add operation then combines the result with the original input to form a residual block with dilated convolution, which is fused with the up-sampling path at the corresponding stage to finally obtain the segmented tumor image.
The classical U-Net structure uses four max-pooling operations in its down-sampling path, both to reduce the amount of computation and to enlarge the receptive field. However, each max-pooling operation reduces the resolution of the feature map, which is not conducive to the expression of detailed features. DSPP [20] can increase the feature receptive field while maintaining the resolution of the feature map. After U-Net performs the last max-pooling operation at the end of the down-sampling path, the feature map resolution is at its smallest, and no further max-pooling is performed. Therefore, DSPP is used to replace the 3×3×1024 convolutional layer and the max-pooling operation at the end of the down-sampling path of the classic U-Net, as shown in Fig. 6. First, the DSPP module performs batch normalization on its input to improve the training speed. Then, cascaded 3×3 dilated convolutions at four scales, with dilation rates of 2, 4, 8 and 16, are applied to the input feature maps. Feature maps with different receptive fields are extracted and their information is fused. Finally, the feature maps captured with different receptive fields are cascaded and output. DSPP can improve the expression of detailed features and enhance the ability to identify tumor features.

The structure of DSPP.
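The multi-rate idea behind DSPP can be illustrated with a one-dimensional toy sketch. It is not the authors' module: batch normalization and learned weights are left out, the four rates are applied as parallel branches whose outputs are stacked (the text does not settle whether the branches are parallel or serial), and `dilated_conv1d` and `dspp_1d` are hypothetical names.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded ('same') 1-D dilated convolution, cross-correlation form."""
    half = (len(kernel) - 1) * rate // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j * rate] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def dspp_1d(signal, kernel, rates=(2, 4, 8, 16)):
    """One branch per dilation rate; every branch keeps the input length."""
    return [dilated_conv1d(signal, kernel, rate) for rate in rates]


# A single impulse shows how far each branch reaches.
signal = [0.0] * 41
signal[20] = 1.0
branches = dspp_1d(signal, [1.0, 1.0, 1.0])

for rate, branch in zip((2, 4, 8, 16), branches):
    reached = [i for i, v in enumerate(branch) if v != 0.0]
    print(rate, len(branch), reached)
```

Each branch returns a sequence of the original length, while the positions influenced by the impulse spread further apart as the rate grows, which is the resolution-preserving enlargement of the receptive field described above.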
Following [21], small 3×3 convolutional kernels make it possible to build the deepest convolutional layers with the fewest parameters, which enhances the nonlinear mapping ability and helps prevent over-fitting. Therefore, the down-sampling path of DCU-Net contains 8 ordinary convolutions of size 3×3, which are used to extract image features, and the max-pooling operation of size 3×3 is used three times to reduce parameters and improve training speed. Batch normalization is added in the convolution path to avoid vanishing gradients and improve the training speed of the network parameters. To make up for the loss of local information caused by pooling, a DSPP block is used at the end of the down-sampling path to replace pooling, expand the receptive field of the convolution kernels, and keep the image resolution unchanged so as to reduce the loss of detailed information. Unlike the direct connection of U-Net, DCU-Net introduces dilated convolutions of different scales in the skip connections to form RD-Skip, which narrows the difference in image resolution between the corresponding contraction and expansion paths. The up-sampling path contains four up-sampling operations with a magnification factor of 2×2, and a bilinear interpolation algorithm is adopted to insert new elements between the pixels of the original image. Through four merge operations, the output of RD-Skip and the up-sampled data are concatenated along the Z axis. Finally, the softmax function maps each pixel to class probabilities, producing the segmented label probability map.
Table 1 summarizes the structure parameters of the four Dilated_X layers in DCU-Net. Note that each layer is connected to all the output features of the corresponding up-sampling layer. After passing through the Dilated_X layer, the fused feature image is output and finally serves as the input of the corresponding up-sampling layer. Table 2 summarizes the structure parameters of DSPP in DCU-Net.
Structure parameters of Dilated_X layers in DCU-Net
Notes. For the Dilated_X layers, the kernel size is given as height×width, the dilation rate is given as the number of convolution kernel intervals, and every layer uses Rectified Linear Units (ReLU) as the activation.
Structure parameters of DSPP in DCU-Net
Notes. For the DSPP module, the kernel size is given as height×width, the dilation rate is given as the number of convolution kernel intervals, and every layer uses Rectified Linear Units (ReLU) as the activation.
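The DCU-Net model is trained with categorical cross entropy [22] as the loss function. Cross entropy measures the difference between the actual distribution and the expected distribution, and is defined as follows:
$$\mathrm{Loss} = -\sum_{i}\sum_{c} y_{i,c}\,\log \hat{y}_{i,c}$$
where y_{i,c} is the one-hot ground-truth label of pixel i for class c and \hat{y}_{i,c} is the corresponding predicted probability.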
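As a small worked illustration, the sketch below evaluates the standard categorical cross entropy on one-hot labels and predicted class probabilities. Averaging over pixels and the clipping constant `eps` are assumptions made for the example, and the function name is hypothetical.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross entropy over a list of pixels.

    y_true: one-hot label vectors, y_pred: predicted probability vectors.
    """
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        # Clip probabilities away from zero so that log() stays finite.
        total -= sum(t * math.log(max(p, eps))
                     for t, p in zip(true_row, pred_row))
    return total / len(y_true)


labels = [[1, 0, 0], [0, 1, 0]]
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
uncertain = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]

# Predictions closer to the labels give a smaller loss.
print(categorical_cross_entropy(labels, confident))
print(categorical_cross_entropy(labels, uncertain))
```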
Test images of size 240×240×155 are input into the trained DCU-Net to segment the tumor area. Sigmoid and softmax are commonly used classification functions. Softmax extends the binary classification function sigmoid to multi-class problems and presents the multi-class result in the form of probabilities. When segmenting the test dataset, the network maps the raw output values of multiple neurons to a probability distribution over [0, 1] through the softmax function, takes the class with the maximum probability as the label of each pixel, and thus divides the segmented image into three tumor labels pixel by pixel.
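The per-pixel decision rule can be sketched as follows. The logit values are made up for illustration, and `softmax` and `predict_label` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Map raw network outputs to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Index of the class with the highest softmax probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Raw outputs of one pixel for three classes (illustrative values).
pixel_logits = [0.1, 2.0, -1.0]
print(softmax(pixel_logits))
print(predict_label(pixel_logits))
```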
Material and experiment
Database
We evaluated our system on the fully annotated MICCAI BRATS 2018 dataset [23], which encompasses data of 285 patients with diagnosed gliomas: 210 high-grade glioblastomas (HGG) and 75 low-grade gliomas (LGG). The data comes in four co-registered modalities, as shown in Fig. 7: native pre-contrast (T1), post-contrast T1-weighted (T1c), T2-weighted (T2) and T2 Fluid Attenuated Inversion Recovery (FLAIR) MRI [24]. Every pixel carries one of four labels: healthy tissue (label 0), GD-enhancing tumor (ET, label 4), peritumoral edema (ED, label 2), and the necrotic and non-enhancing tumor core (NCR/NET, label 1). The ground truth (GT) is a manual segmentation of the brain tumors by experienced experts. According to the protocol of the BRATS 2018 dataset, the brain tumor region of each patient can be further described by three sub-regions that are assigned different labels, as shown in Table 3. The studies were interpolated to the same shape (155×240×240 with a voxel size of 1 mm³) and skull-stripped.

MRI image modal sequence. From left to right: Flair modality, T1 modality, T1c modality, T2 modality, ground truth.
Tumor segmentation sub-region and label description
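How annotation labels are grouped into evaluated sub-regions can be sketched as below. The grouping follows the usual BRATS convention (whole tumor: labels 1, 2 and 4; tumor core: labels 1 and 4; enhancing tumor: label 4) and is stated here as an assumption; `REGIONS` and `region_mask` are hypothetical names.

```python
# Assumption: standard BRATS grouping of annotation labels into sub-regions.
REGIONS = {
    "whole_tumor": {1, 2, 4},
    "tumor_core": {1, 4},
    "enhancing_tumor": {4},
}


def region_mask(labels, region):
    """Binary mask (1 = inside the sub-region) for a flat list of labels."""
    wanted = REGIONS[region]
    return [1 if value in wanted else 0 for value in labels]


# A few example pixels: healthy, edema, necrotic core, enhancing tumor.
pixels = [0, 2, 1, 4, 0, 2]
for name in REGIONS:
    print(name, region_mask(pixels, name))
```

Binary masks of this kind are what the DSC, sensitivity and specificity are computed on for each sub-region.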
MR images of 50 brain tumor patients from the BRATS 2018 dataset are randomly selected as the training set, including 35 HGG cases and 15 LGG cases. In the training stage, the Keras deep learning framework is used to learn the model parameters on the training set. The initial learning rate is set to a small value of 1e-4, the batch size is set to 8, and the pre-processed 2D image patches are divided into training and validation sets at a ratio of 4 : 1. 5-fold cross-validation is used to validate the performance of the model, and the model is trained for 100 epochs. To evaluate the performance of the proposed DCU-Net model, two sets of comparative experiments were designed:
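The stated settings and the 4 : 1 split can be written down as a short sketch. The hyper-parameter values come from the text; splitting at the patch level with a seeded shuffle, and the names `TRAINING_CONFIG` and `k_fold_splits`, are assumptions for illustration.

```python
import random

# Values stated in the text.
TRAINING_CONFIG = {
    "learning_rate": 1e-4,
    "batch_size": 8,
    "epochs": 100,
    "folds": 5,
}


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random patch-level split
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# With 21900 patches, every fold uses a 4 : 1 train/validation ratio.
splits = list(k_fold_splits(21900, k=TRAINING_CONFIG["folds"]))
for train, val in splits:
    print(len(train), len(val))
```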
In the first group of experiments, the DCU-Net, DCU-Net* and DCU-Net** networks are trained separately, and each trained network is then used to segment the images of the test set. Here, DCU-Net* denotes DCU-Net with the DSPP module replaced by the 3×3×1024 convolution block of U-Net, and DCU-Net** denotes DCU-Net with the RD-Skip module replaced by the classical skip connection. Comparing the segmentation results using the evaluation metrics verifies the advantages of the proposed DSPP and RD-Skip modules. In the second group of experiments, the DCU-Net and U-Net networks are trained separately, and each trained network is then used to segment the images of the test set. This verifies that DCU-Net achieves more accurate segmentation of tumor regions than U-Net.
The experiments were implemented in the Keras framework with a TensorFlow backend. The experimental machine uses an Intel Core i7 3.2 GHz processor and an NVIDIA GeForce GTX 1060 GPU.
Metrics
The evaluation metrics mainly include the dice similarity coefficient (DSC), sensitivity and specificity.
Dice Coefficient. The Dice coefficient is the main metric and is commonly used in biomedical image segmentation. It measures the similarity between the prediction and the annotation, and penalizes both false positives and false negatives:
$$\mathrm{DSC} = \frac{2TP}{2TP + FP + FN}$$
Sensitivity and Specificity. Sensitivity is the ratio of correctly predicted tumor voxels to all true tumor voxels, and specificity is the ratio of correctly predicted non-tumor voxels to all true non-tumor voxels. These metrics help determine whether our method is over-segmenting or under-segmenting the tumor regions:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
Here, TP, FP, TN and FN denote the numbers of true positive, false positive, true negative and false negative points, respectively. Fig. 8 illustrates the meaning of TP, FP, TN and FN: panel (a) is the brain MR image of the patient, panel (b) is the whole tumor area manually labeled by the experts (ground truth), and panel (c) shows in red the outline of the complete tumor boundary predicted by the model. Panel (d) labels the overlap between the ground truth and the model prediction as TP, the over-segmented part as FP, the under-segmented part as FN, and the correctly predicted background (normal tissue) as TN. It is worth noting that the DSC score can be used as a comprehensive evaluation standard of specificity and sensitivity. Therefore, we mainly compare the DSC scores of the different methods.
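The three metrics can be computed from binary masks with the standard definitions based on TP, FP, TN and FN, as in the sketch below. The example masks are made up, and returning 1.0 for an empty denominator is a convention chosen for this illustration.

```python
def confusion_counts(truth, pred):
    """Count TP, FP, TN and FN for two flat binary masks."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    return tp, fp, tn, fn


def dice(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # convention for empty masks


def sensitivity(tp, fn):
    denom = tp + fn
    return tp / denom if denom else 1.0


def specificity(tn, fp):
    denom = tn + fp
    return tn / denom if denom else 1.0


# One under-segmented and one over-segmented pixel (illustrative masks).
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(truth, pred)
print(tp, fp, tn, fn)
print(dice(tp, fp, fn), sensitivity(tp, fn), specificity(tn, fp))
```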

Clear representation of TP, FP, TN, and FN parts.
MR images of 30 unused brain tumor patients in the BRATS 2018 dataset are randomly selected as the test set. The trained DCU-Net model is used to segment the test images, and the segmentation results are shown in Fig. 9. Comparing the GT images with the images segmented by DCU-Net shows that the proposed structure performs well in the task of tumor segmentation and segments the edema area (green), core tumor (red) and enhancing tumor (yellow) well.

The segmentation results of the DCU-Net model. From top to bottom: T1 sequences, ground truth, DCU-Net results.
Experiment 1: Comparison with the segmentation results of DCU-Net* and DCU-Net**. The experiment uses the same training set, and the trained DCU-Net* and DCU-Net** models are used to segment tumors on the same test set. The segmentation results of DCU-Net*, DCU-Net** and DCU-Net are compared in Fig. 10, where GT- denotes the ground truth, DCU- denotes the segmentation result of the proposed DCU-Net structure, A- denotes the segmentation result of the DCU-Net* model and B- denotes the segmentation result of the DCU-Net** model. The images are divided into four groups: GT images, DCU-Net results, DCU-Net* results and DCU-Net** results. Taking the GT segmentation as the standard and comparing columns 2, 3 and 4, the segmentation contour of DCU-Net in the circled tumor areas is clearer and more accurate than that of the other two networks. This further indicates that DCU-Net improves the segmentation of the brain tumor boundary and reduces false positives and false negatives.

Comparison of segmentation results. From left to right: ground truth, segmentation results of DCU-Net, segmentation results of DCU-Net*, segmentation results of DCU-Net**.
The experiment further carried out a quantitative evaluation of the segmentation results of DCU-Net, DCU-Net* and DCU-Net** according to the main evaluation metric, DSC, as shown in Fig. 11. The comparison shows that the DCU-Net structure has a high segmentation accuracy for the whole tumor, and especially for the core tumor region. This indicates that DCU-Net improves the segmentation accuracy of the tumor region.

Quantitative comparison of segmentation results.
Experiment 2: Comparison with the segmentation results of the U-Net model. The experiment uses the same training set and test set, and compares the segmentation results of the U-Net and DCU-Net networks with the ground truth, as shown in Fig. 12, where GT- denotes the gold standard, DCU- denotes the segmentation result of the proposed DCU-Net structure, and U- denotes the segmentation result of the classic U-Net model. The images are divided into three groups: GT images, DCU-Net segmentation results and U-Net segmentation results. With the GT segmentation as the standard, in the circled tumor areas DCU-Net segments the details of the brain tumor better than U-Net, and its segmentation of the core tumor is more accurate, especially at the boundaries between different tumor regions, which helps ensure the integrity and accuracy of brain tumor segmentation.

Comparison of segmentation results. From left to right: ground truth, segmentation results of DCU-Net, segmentation results of U-Net.
The experiment further carried out a quantitative evaluation of the segmentation results of DCU-Net and U-Net according to the main evaluation metric, DSC, as shown in Fig. 13. The segmentation accuracy of the DCU-Net structure is relatively high for the whole tumor, and especially for the enhancing tumor region. We also compare the model complexity and DSC of DCU-Net and U-Net in detail, as shown in Table 4. We list the average running time for segmenting one MRI scan as a comparison of time complexity, and the number of learnable parameters of each network as a comparison of space complexity. Compared with U-Net, DCU-Net improves the segmentation efficiency on the test data with less than 10% more training parameters. Table 4 also reports the DSC and its standard deviation for the two models in the segmentation of the whole tumor. The results show that the segmentation performance of the DCU-Net model is more stable and accurate.

Qualitative comparison of segmentation results.
Comparison with U-Net model in complexity and DSC
To further evaluate the performance of the algorithm, we compared the proposed DCU-Net model, using the same BRATS 2018 training dataset, with the top three methods on the Brain Tumor Segmentation Challenge 2018 leaderboard, among which the methods [8, 25] ranked first, second and third respectively. It is also compared with several other strong methods [14, 26–28]. The evaluation results of the DCU-Net model and the other models are listed in Table 5. No algorithm ranks first on all evaluation indexes for all tumor regions, but the DCU-Net model performs well on most metrics and ranks first on several of them.
Quantitative results of proposed methods compared to the results from the excellent brain tumor segmentation algorithms published recently
Notes. Bold numbers indicate the best scores among these algorithms on the BRATS 2018 dataset.
Specifically, because the added dilated convolution expands the receptive field, the DCU-Net structure proposed in this paper achieves a higher segmentation accuracy for the whole tumor, and its DSC is higher than that of the model of Myronenko, which won the BRATS 2018 challenge. Moreover, DCU-Net fits the GT image closely in the segmentation of the enhancing tumor. In the patients' MR images, the necrotic part of the tumor contains fewer pixels than the edema area and the enhancing area, especially in the MR images of some patients with mild tumors. Because the training data for the necrotic part is insufficient, the trained model does not segment it accurately enough, which ultimately lowers the DSC of our model on the tumor core area. Therefore, there is still much room for improvement in the segmentation of the core tumor.
In terms of sensitivity, DCU-Net pre-processes the dataset by cropping, which alleviates the class imbalance problem and increases the percentage of tumor pixels. Compared with other methods, it significantly alleviates the over-segmentation of tumor regions. As a result, the sensitivity of the DCU-Net segmentation is high, ranking first in Table 5, which indicates that DCU-Net can accurately identify and segment each label of the tumor region.
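To make the effect of this pre-processing concrete, the sketch below crops an image and its label map to the bounding box of the non-zero (non-background) pixels and reports the tumor fraction before and after. The exact cropping rule used in this study is not restated here, so the bounding-box strategy, the `margin` parameter, and the helper names `crop_to_foreground` and `tumor_fraction` are assumptions for illustration only.

```python
import numpy as np

def crop_to_foreground(image, label, margin=0):
    """Crop image and label to the bounding box of non-zero image pixels."""
    image = np.asarray(image)
    label = np.asarray(label)
    coords = np.argwhere(image != 0)
    if coords.size == 0:
        # Nothing but background: return the inputs unchanged.
        return image, label
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, image.shape)
    window = tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))
    return image[window], label[window]

def tumor_fraction(label):
    """Share of pixels that carry any tumor label."""
    label = np.asarray(label)
    return np.count_nonzero(label) / label.size

# Toy 8x8 slice: a 4x4 "brain" region containing a 2x2 "tumor".
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
label = np.zeros((8, 8), dtype=int)
label[3:5, 3:5] = 1

cropped_image, cropped_label = crop_to_foreground(image, label)
before = tumor_fraction(label)          # 4 of 64 pixels
after = tumor_fraction(cropped_label)   # 4 of 16 pixels
print(before, after)
```

In this toy case the crop removes only background, so the number of tumor pixels is unchanged while their share of the input rises.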
In terms of specificity, the score of DCU-Net is high, which indicates that our model can accurately and effectively separate the background region from the tumor region, with low under-segmentation and over-segmentation rates.
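For reference, sensitivity and specificity can be computed from the voxel-wise confusion counts, as in the minimal sketch below. It uses the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); the function name `sensitivity_specificity` and the toy masks are assumptions for this example and do not reproduce the values in Table 5.

```python
import numpy as np

def sensitivity_specificity(pred, truth):
    """Return (sensitivity, specificity) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)     # tumor predicted as tumor
    fn = np.count_nonzero(~pred & truth)    # tumor missed
    tn = np.count_nonzero(~pred & ~truth)   # background kept as background
    fp = np.count_nonzero(pred & ~truth)    # background predicted as tumor
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy flattened masks for illustration only: TP = 3, FN = 1, FP = 1, TN = 5.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

sens, spec = sensitivity_specificity(pred, truth)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

By these definitions, sensitivity drops when tumor voxels are missed, and specificity drops when background voxels are labeled as tumor. Because background voxels dominate brain MR volumes, specificity tends to be close to 1 for most methods.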
In this paper, the DCU-Net image segmentation algorithm for brain tumors is proposed to address the low precision and fuzzy boundaries of each tumor region. The main contributions can be summarized as follows:
Improve skip connections. RD-Skip is designed to expand the receptive field of the low-level features from the down-sampling path. It can better integrate the feature maps of the down-sampling path and the up-sampling path.
Merge the Dilated Spatial Pyramid Pooling (DSPP). Built from a cascade of dilated convolutions at different scales, DSPP is used to replace the max-pooling, which not only expands the feature receptive fields but also maintains the resolution of the feature maps. It enhances the recognition and detection of small targets and improves the representation of detailed features.
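The claim that dilated convolution expands the receptive field while keeping the resolution can be checked with simple arithmetic, sketched below. A k×k kernel with dilation d covers k + (k − 1)(d − 1) pixels per side, and a stack of stride-1 convolutions has a receptive field of 1 + Σ (k − 1)·d. The dilation rates (1, 2, 4), the 3×3 kernels and the 240-pixel input width are assumptions chosen for illustration and are not the DSPP configuration reported in this paper.

```python
def effective_kernel(k, d):
    """Side length covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

def output_size(n, k, d=1, padding=0, stride=1):
    """Output side length of a convolution or pooling layer on an input of side n."""
    return (n + 2 * padding - d * (k - 1) - 1) // stride + 1

# Three plain 3x3 convolutions versus three dilated ones (assumed rates 1, 2, 4).
plain_rf = receptive_field([3, 3, 3], [1, 1, 1])
dilated_rf = receptive_field([3, 3, 3], [1, 2, 4])

# Resolution: a 3x3 convolution with dilation 2 and padding 2 keeps the size,
# whereas 2x2 max pooling with stride 2 halves it.
kept = output_size(240, k=3, d=2, padding=2)
halved = output_size(240, k=2, d=1, padding=0, stride=2)

print(plain_rf, dilated_rf, kept, halved)
```

Under these assumptions the dilated stack sees a 15×15 neighborhood instead of 7×7 with the same number of weights, and the feature map stays at 240 pixels per side rather than being reduced to 120 by pooling.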
Finally, the DCU-Net algorithm proposed in this paper has been verified on the BRATS 2018 dataset, and the segmentation results for the whole tumor region are satisfactory. Compared with the classical U-Net segmentation algorithm, the proposed structure is more refined; it effectively addresses the over-segmentation and under-segmentation problems in brain tumor segmentation and yields more accurate segmentation results.
Acknowledgments
This study was supported by the Key Specialized Research and Development Program of Henan Province (202102210170).
