The dilated dense U-net for spinal fracture lesions segmentation

Abstract

With the development of computer technology, deep learning algorithms are increasingly used in medical image processing. Viewing CT images is a common and important way to diagnose spinal fractures, but reading CT images correctly and segmenting spinal lesions effectively depends heavily on a doctor's clinical experience. In this paper, we present a method that combines U-net, dense blocks and dilated convolution to segment lesions objectively, so as to assist the diagnosis of spinal diseases and provide a clinical reference. First, we preprocess and augment CT images of spinal lesions. Second, we present the DenseU-net network model, which consists of dense blocks and U-net, to increase the depth of the training network. Third, we introduce dilated convolution into DenseU-net to construct the proposed DDU-net (Dilated Dense U-net), in order to enlarge the receptive field over CT images and capture more lesion information. The experiments show that DDU-net achieves good segmentation performance on spinal lesions, which can benefit both doctors and patients.

Keywords

Deep learning; Segmentation; Dense U-net; DDU-net (Dilated Dense U-net)

1 Introduction

The conventional way of viewing spinal CT images is laborious and time-consuming for doctors, and the results are subject to errors and biases arising from differences in clinical experience and diagnostic method. Spinal CT images are also difficult to read because of the irregular shape of the vertebral boundary and the low contrast of CT images. Reading CT images by computer gives an objective overview of all slices of a spinal CT series, which is a popular approach in diagnosis and convenient for both doctors and patients. Owing to the characteristics of spine CT images, popular segmentation algorithms either target only certain specific types of CT images or fail to balance segmentation accuracy against speed, and some still rely on manual work. Errors and biases in spinal lesion segmentation increase patients' suffering and doctors' burden.

At present, with the development of computer technology, many deep learning algorithms are widely applied in medical image processing and have made great progress. In this paper, we propose a framework to segment spinal fracture lesions to aid the diagnosis of spinal diseases. The scheme is based on an improved U-net [1], which extracts more feature information about spinal lesions so as to segment them correctly while keeping a relatively fast segmentation speed. The contributions of this paper are:

1. We preprocess spinal fracture CT images collected from hospitals.

2. We combine dense blocks and U-net to construct DenseUnet, which increases the depth of the training network and improves spinal lesion segmentation performance.

3. We present DDU-net by introducing dilated convolution into DenseUnet, which enlarges the receptive field over spinal lesion CT images for accurate lesion segmentation.

The rest of this paper is organized as follows. In Section 2, we give an overview of deep learning algorithms applied in medical image processing. In Section 3, we process spinal CT images collected from hospitals and describe the proposed DDU-net in detail. In Section 4, we evaluate the lesion segmentation performance of DDU-net by experiments. Finally, we conclude our work and point out some weaknesses to address in the future.

2 Related work

In recent years, more and more deep learning methods have been applied in medical image processing.

2.1 Deep learning algorithms applied in medical images

Cheng, Z., et al. [2] incorporated residual connections and an Attention Gate block into a medical image segmentation network for lung segmentation. Kaul, C., et al. [3] incorporated attention within convolutional neural networks, using feature maps generated by a separate convolutional autoencoder, for skin cancer segmentation and lung lesion segmentation. Li, C., et al. [4] proposed an attention-based nested segmentation network, named ANU-Net, which has a deeply supervised encoder-decoder architecture and a redesigned dense skip connection for organ cancer lesion segmentation. Gu, R., et al. [5] proposed a comprehensive attention-based CNN (CA-Net) for more accurate and explainable medical image segmentation that is aware of the most important spatial positions, channels and scales at the same time. Kolak, M., et al. [6] presented a fully automatic method for high-resolution 3D volumetric segmentation of medical image data using a modern supervised deep learning approach. Hassanzadeh, T. [7] proposed an evolutionary-based method to find a precise and small network for medical image segmentation from MRI (Magnetic Resonance Imaging) data. Cai, Y. [8] tried to eliminate semantic ambiguity in skip connection operations by adding attention gates (AGs), and used attention mechanisms to combine local features with their corresponding global dependencies. Yang, Jin [9] proposed a CT image segmentation algorithm based on deep learning to solve the problems of poor robustness, weak noise resistance and low segmentation accuracy in existing image segmentation algorithms. Li, W., et al. [10] proposed an improved segmentation model called the multi-scale attention dense residual U-shaped network (MAD-UNet) for pancreas segmentation. He, Kelei, et al. [11] tackled the challenging task of prostate segmentation in CT images with a two-stage network, in which the first stage quickly localizes the prostate and the second stage accurately segments it. He, K., et al. [12] proposed a two-stage framework for prostate segmentation, in which the first stage quickly localizes the prostate region and the second stage precisely segments the prostate with a multi-task FCN based on the U-Net architecture. He, K., et al. [13] proposed a framework in which the first stage quickly localizes the prostate region and the second stage precisely segments the prostate with a multi-task UNet architecture. The methods mentioned above are applications of deep learning algorithms in medical image processing.

2.2 Deep learning algorithms applied in spinal diseases

We now focus on deep learning algorithms applied to spinal diseases. Janssens, R. [14] presented a method to address the challenging problem of segmenting lumbar vertebrae from CT images acquired with varying fields of view. Chuang [15] gave an iterative vertebrae instance segmentation model with good generalization ability for segmenting all types of vertebrae, including cervical, thoracic and lumbar vertebrae. Liu, Y., et al. [16] applied a fully convolutional network (FCN) to muscle segmentation at the L3 slice in abdominal CT images. Ebrahimi, S. [17] aimed at automated detection of specific vertebral landmarks in spine radiographs, enabling automated adjustments. Chen, Y. [18] proposed a scheme for automated identification and localization of vertebrae from CT (computed tomography) images. Lu [19] developed an efficient methodology to leverage the subject-matter expertise stored in large-scale archival reporting and image data for a deep learning approach to fully automated lumbar spinal stenosis grading. Shi, D., et al. [20] developed a two-step algorithm to localize and segment only the vertebral bodies by taking advantage of the intensity pattern along the front spinal region, as well as GPU acceleration using convolutional neural networks. Buerger, C., et al. [21] proposed to use deep learning (DL) to initialize model-based segmentation (MBS) and to guide MBS robustly during segmentation, generating 24 instance segmentations, one for each vertebra. U-net is now widely used in medical image processing. In this paper, we define a novel framework for spinal fracture lesion segmentation based on an improved U-net.

3 Materials and methods

The scheme we present is based on an improved U-net to segment spinal lesions accurately. It is divided into a training part and a testing part, as depicted in Fig. 1.

Fig. 1

Overview of flowchart for lesions segmentation by improved U-net.

3.1 Materials

We collect spinal fracture CT images from Xijing Hospital (Military Medical University of Air Force) and other hospitals. With the help of orthopedic residents at Xijing Hospital, we use the software Labelme to label the spinal fracture lesions in every CT image, and classify them as cfracture (cervical fracture), tfracture (thoracic fracture) or lfracture (lumbar fracture).

3.1.1 Data preprocess

CT images of the same spinal fracture lesion may carry different information, because different CT equipment introduces different errors and interference. We therefore preprocess the CT images obtained from hospitals before training and testing. The preprocessing flow is shown in Fig. 2.

The spinal CT images we obtained from hospitals are in DICOM (Digital Imaging and Communications in Medicine) format, which contains much information about patients and spinal lesions. As shown in Fig. 2, we first input the original CT images and equalize them, because the image pixels differ between scans acquired by CT equipment with different slice thicknesses. Second, we process the CT images using the HU (Hounsfield Unit) value, a unit that measures the density of local tissue or organs of the human body. Usually, the HU value of air is -1000, that of pure water is 0, and that of dense bone is +1000. The HU value and the pixel value of a spinal CT image are linearly related and can be converted into each other. In this paper, we keep HU values above 400 and normalize the HU range [400, 2000] to [0, 1] to represent spine bones in CT images. Third, we apply a Gaussian filter to remove noise from the CT images used for training and testing. Finally, we extract the ROI (Region of Interest) of the spinal fracture lesions by binary segmentation and mathematical morphology, so as to focus on the lesions.
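
The HU conversion and the [400, 2000] normalization described above can be sketched as follows. This is a minimal illustration, not the preprocessing code used in the experiments: the function names are hypothetical, the default slope and intercept are placeholder values (in practice they would be read from the DICOM header of each scan), and clipping values outside the window to 0 or 1 is an assumption.

```python
def raw_to_hu(raw_value, slope=1.0, intercept=-1024.0):
    """Linear map from a stored pixel value to Hounsfield Units.

    slope and intercept are placeholders; real values come from the
    rescale slope / rescale intercept fields of the DICOM header.
    """
    return raw_value * slope + intercept


def normalize_hu(hu, low=400.0, high=2000.0):
    """Clip an HU value to [low, high] and rescale it to [0, 1].

    Assumption: values below `low` map to 0 and values above `high` map to 1.
    """
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


def normalize_slice(hu_slice, low=400.0, high=2000.0):
    """Apply the window normalization to a 2D slice given as nested lists."""
    return [[normalize_hu(v, low, high) for v in row] for row in hu_slice]


if __name__ == "__main__":
    # Air, water, the lower bone threshold, mid-window bone, very dense bone.
    example = [[-1000.0, 0.0, 400.0, 1200.0, 2500.0]]
    print(normalize_slice(example))
```

With this window, soft tissue and air collapse to 0, so only bone-like intensities keep any contrast in the normalized image.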

Fig. 2

The preprocessing of the original CT images.

3.1.2 Data augmentation

Because we cannot include all kinds of spinal fracture lesions, we randomly select 40 series of spinal fracture CT images, consisting of 5134 CT images, for training and testing. Since deep learning algorithms need a large amount of data for training and testing, we augment the training data set: the spinal CT images are rotated by 90, 180 and 270 degrees and flipped horizontally and vertically.

Because spinal CT images and fracture lesions differ from case to case, and preprocessing may introduce interference, we randomly select 3 series of CT images to check that data enhancement does not affect the final prediction results. Fig. 3 to Fig. 5 show selected samples of spinal CT images after data augmentation.
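
The five transforms listed above can be sketched in plain Python on an image stored as nested lists. The function names are hypothetical, and the rotation direction (counter-clockwise) is an assumption, since the text does not state it. In a segmentation setting the label mask would normally receive the same transform as its image.

```python
def rot90(img):
    """Rotate a 2D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(row[::-1]) for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def augment(img):
    """Return the five augmented variants named in the text."""
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    return {
        "rot90": r90,
        "rot180": r180,
        "rot270": r270,
        "hflip": flip_horizontal(img),
        "vflip": flip_vertical(img),
    }


if __name__ == "__main__":
    tiny = [[1, 2],
            [3, 4]]
    for name, variant in augment(tiny).items():
        print(name, variant)
```

None of these transforms resamples pixel values, so lesion size and shape are preserved exactly and only orientation changes.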

Fig. 3

The augmentation of cfracture lesions. (a) Original CT image. (b) Rotated by 90 degrees. (c) Rotated by 180 degrees. (d) Rotated by 270 degrees. (e) Horizontal flip. (f) Vertical flip.

Fig. 4

The augmentation of tfracture lesions. (a) Original CT image. (b) Rotated by 90 degrees. (c) Rotated by 180 degrees. (d) Rotated by 270 degrees. (e) Horizontal flip. (f) Vertical flip.

Fig. 5

The augmentation of lfracture lesions. (a) Original CT image. (b) Rotated by 90 degrees. (c) Rotated by 180 degrees. (d) Rotated by 270 degrees. (e) Horizontal flip. (f) Vertical flip.

As Fig. 3(a), Fig. 4(a) and Fig. 5(a) show, the original spinal fracture lesions (cfracture, tfracture, lfracture) are not obvious and are hard to segment correctly. After data augmentation, the fracture lesions can be observed from different orientations, which provides more useful and more accurate lesion feature information.

In Fig. 3 to Fig. 5, rotating and flipping the CT images does not change the sizes and shapes of the spinal fracture lesions; it changes only their orientation, which is critical to network training. Spinal fracture lesions with different orientations are treated as different samples by the training network, so we can expand the training data set by rotating and flipping the CT images.

3.1.3 Labelling spinal lesions in CT images

With the help of orthopedic residents at Xijing Hospital, we segment the spinal fracture lesions in every CT image using the software Labelme. Fig. 6 to Fig. 8 show segmentation samples of spinal lesions classified as cfracture, tfracture and lfracture. In each figure, (a) is the original CT image and (b) is the CT image labelled with Labelme. The cfracture lesion is shown in red in Fig. 6(b), the tfracture lesion in green in Fig. 7(b), and the lfracture lesion in yellow in Fig. 8(b).

Fig. 6

cfracture lesion: (a) original CT image, (b) labelled by Labelme.

Fig. 7

tfracture lesion: (a) original CT image, (b) labelled by Labelme.

Fig. 8

lfracture lesion: (a) original CT image, (b) labelled by Labelme.

3.2 Methodology

U-net plays a very important role in medical image segmentation, but it has some problems, such as insufficient network depth and errors and biases in lesion segmentation. To address these problems, we propose a spinal fracture lesion segmentation framework that combines dense blocks, dilated convolution and U-net for accurate spinal lesion segmentation.

3.2.1 U-net

Ronneberger et al. [1] proposed U-net in 2015, a novel convolutional neural network architecture suited to medical image segmentation. U-net is based on the FCN (Fully Convolutional Network) of J. Long et al. [24] and has been shown by experiments to achieve better medical image segmentation performance than FCN. The architecture of U-net is shown in Fig. 9.

U-net is divided into a contracting path (the encoder, on the left) and an expanding path (the decoder, on the right). The contracting path has four down-sampling blocks, each consisting of two 3*3 convolution layers (without padding) and one 2*2 max pooling layer (stride = 2). The down-sampling blocks extract the feature information of medical images; after each down-sampling operation, the number of feature channels doubles and the size of the feature maps is halved.

The expanding path has four corresponding up-sampling blocks, each including two 3*3 convolution layers (without padding) and one de-convolution layer. The copy and crop operation connects the feature maps from the contracting path to the corresponding up-sampling blocks. Correspondingly, the number of feature channels is halved and the size of the feature maps doubles in each up-sampling block. Finally, the last 1*1 convolution layer maps the 64-channel feature vectors to the classification results.

Each convolution is followed by a ReLU (Rectified Linear Unit) activation, and the output feature maps of shallow and deep layers are concatenated, in order to obtain more accurate spinal lesion segmentation in medical images. Because U-net contains no FC (Fully Connected) layer and uses only valid convolutions in training, the input and output sizes need not be the same, which suits very large medical images.
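
To make the effect of unpadded convolutions concrete, the following sketch tracks the side length of a square feature map through a U-net of the kind described above. It assumes the layout of the original U-net [1]: four down-sampling blocks, a bottom block of two 3*3 convolutions, and four up-sampling blocks. The function name is hypothetical.

```python
def unet_output_size(input_size, depth=4, convs_per_block=2, kernel=3):
    """Side length of the output map of a U-net built from unpadded convolutions.

    Each unpadded k*k convolution removes (k - 1) pixels per side length,
    each 2*2 max pooling halves it, and each up-convolution doubles it.
    """
    shrink = convs_per_block * (kernel - 1)
    size = input_size

    # Contracting path: two convolutions, then 2*2 max pooling.
    for _ in range(depth):
        size -= shrink
        if size <= 0 or size % 2 != 0:
            raise ValueError("input size does not fit this architecture")
        size //= 2

    # Bottom block between the two paths (assumed, as in the original U-net).
    size -= shrink

    # Expanding path: up-convolution, then two convolutions.
    for _ in range(depth):
        size = size * 2 - shrink

    if size <= 0:
        raise ValueError("input size is too small")
    return size


if __name__ == "__main__":
    print(unet_output_size(572))  # 388
```

The 572 to 388 case matches the tile sizes used in the original U-net paper; the output is smaller than the input because every valid convolution trims the border.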

3.2.2 Dense block

The segmentation performance of a training network can be improved either by expanding the training data set, as described above, or by deepening the network. The number of layers plays a significant role in network performance, but blindly increasing it may cause vanishing or exploding gradients, which reduces segmentation performance. In this paper, we introduce the dense blocks of G. Huang et al. [23] into U-net to increase the depth of the training network. The dense block is shown in Fig. 10: (a) is the conventional 3*3 convolution in U-net, and (b) is a dense block with 5 layers and growth rate k=4.

Fig. 9

The U-net architecture. Zoomed-in for more details.

Fig. 10

Dense block. (a) The U-Net 3*3 convolution. (b) Dense block with 5 layers, growth rate k=4.

Dense blocks have made great progress and are widely used to convert the layer-by-layer connection mode of conventional neural networks into a cross-layer connection mode, as shown in Fig. 10(b). This greatly improves network performance compared with the traditional convolutional neural network and the residual block of K. He et al. [22].

In a conventional neural network, every layer takes its input directly from the previous layer and passes the extracted feature information to the next layer, as shown in Fig. 10(a). In a dense block, as shown in Fig. 10(b), every convolution layer takes the aggregated outputs of all previous layers as input and is directly connected to every later layer in the block. Each convolution layer therefore extracts only a few features and passes them to all subsequent convolution layers, which reduces redundant features; without dense blocks, each convolution layer may need to extract a large amount of feature information. The dense block design also defines a bottleneck layer and a transition layer, which reduce the computation load of each convolution layer and promote feature reuse.

3.2.3 DenseUnet

To avoid network degradation while increasing the depth and improving the performance of the training network for better spinal fracture lesion segmentation, we introduce dense blocks into U-net to build a novel lesion segmentation scheme, DenseU-net.

We place dense blocks in the contracting path of the conventional U-net to extract feature information of spinal lesions from CT images; correspondingly, the expanding path, which also contains dense blocks, outputs images of the same size as the input CT images. In DenseU-net, both the contracting path and the expanding path have four dense blocks with the corresponding convolutional layers. Lesion feature information lost in the contracting path can usually be recovered in the expanding path by transmitting feature maps from the contracting path to the expanding path [25]. Therefore, the feature maps of the expanding path can be regarded as being extracted in the same way as those of the contracting path. The copy and crop operation between the two paths has been shown to achieve better lesion segmentation performance and to reduce the computation and resource usage of the training network [26]. Hence, the feature maps of the contracting path correspond to those of the expanding path.

We suppose the output of the i-th layer in a traditional CNN (Convolutional Neural Network) is expressed as follows:

$X_{i} = H_{i} (X_{i - 1}) .$ (1)

Here, $X_{i}$ is the output of the i-th layer, $X_{i - 1}$ is the output of the (i - 1)-th layer, and $H_{i}$ is a convolution followed by ReLU (Rectified Linear Unit). The output of the i-th layer in DenseNet is then defined as:

$X_{i} = H_{i} ([X_{0}, X_{1}, \ldots, X_{i - 1}]) .$ (2)

Here, [...] is the concatenation (the copy and crop operation) of the feature maps of all previous layers, and $H_{i}$ consists of BN (Batch Normalization), one ReLU and one 3*3 convolution layer. Every layer in a dense block produces k feature maps, where k is the growth rate. If the input layer has $k_{0}$ channels, the number of feature maps input to the i-th layer is:

$k_{i} = k_{0} + k \times (i - 1) .$ (3)
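
The channel bookkeeping of Eqs. (2) and (3) can be checked with a short sketch that tracks only channel counts, not the feature maps themselves. The function names and the input channel count of 16 are hypothetical; the 5 layers and growth rate 4 follow the dense block of Fig. 10(b).

```python
def layer_input_channels(k0, k, i):
    """Eq. (3): number of feature maps entering the i-th layer (1-based)."""
    return k0 + k * (i - 1)


def simulate_dense_block(k0, k, num_layers):
    """Track channel counts by explicit concatenation, as in Eq. (2).

    Returns the input channel count of every layer and the channel count
    of the concatenated block output.
    """
    produced = [k0]      # channels of X_0, the block input
    inputs = []
    for _ in range(num_layers):
        inputs.append(sum(produced))  # channels of [X_0, ..., X_{i-1}]
        produced.append(k)            # each layer emits k new feature maps
    return inputs, sum(produced)


if __name__ == "__main__":
    inputs, out_channels = simulate_dense_block(k0=16, k=4, num_layers=5)
    print(inputs, out_channels)  # [16, 20, 24, 28, 32] 36
```

The input width grows linearly with depth while each layer contributes only k new maps, which is why a small growth rate keeps a dense block inexpensive.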

In this paper, we introduce dense blocks and transition layers into both the contracting path and the expanding path of U-net. The concrete structure of DenseUnet is shown in Table 1.

Table 1

The concrete structure of our DenseUnet

Layers	Contents	Output
input		256*256
dense block1	A*6	256*256
transition layer1	1*1conv	256*256
	3*3max pooling1	128*128
dense block2	A*12	128*128
transition layer2	1*1conv	128*128
	3*3max pooling2	64*64
dense block3	A*24	64*64
transition layer3	1*1conv	64*64
	3*3max pooling3	32*32
dense block4	A*16	32*32
transition layer4	1*1conv	32*32
	3*3max pooling4	16*16
dense block5	A*16	16*16
dense block4	3*3upconv1, dense block4	32*32
dense block3	3*3upconv2, dense block3	64*64
dense block2	3*3upconv3, dense block2	128*128
dense block1	3*3upconv4, dense block1	256*256
output	1*1conv	256*256

In the contracting path, every dense block is built from the convolution block A in Table 1, which consists of BN (batch normalization), ReLU, 1*1 convolution, BN, ReLU and 3*3 convolution, with growth rate k = 32. The growth rate is kept the same from layer to layer, so that the convolution blocks within the dense blocks are uniform. From dense block 1 to dense block 4 in the contracting path there are 6, 12, 24 and 16 convolution blocks respectively, as shown in Table 1. The feature maps are then transmitted to the expanding path through the copy and crop operations. In our DenseUnet, the transition layer, which includes BN, ReLU, a 1*1 convolution layer and one max pooling layer (stride = 2), reduces the size of the feature maps. Dense block 5 (at 16*16 resolution) can be regarded as the bridge from the contracting path to the expanding path, where the feature maps begin to expand. In the expanding path, we use upsampling layers to enlarge the feature maps. Correspondingly, the expanding path also uses the convolution block A, consisting of BN, ReLU, 1*1 convolution, BN, ReLU and 3*3 convolution with growth rate k = 32, and then upsamples to recover the CT image size. Finally, we use the last 1*1 convolution for binary prediction. The convolution block A is:

$A = \begin{bmatrix} 1 \times 1 \ \mathrm{conv} \\ 3 \times 3 \ \mathrm{conv} \end{bmatrix} .$ (4)

3.2.4 Dilated convolution

Dilated convolution was presented by Yu and Koltun [27] and is widely used in semantic segmentation and object detection. It enlarges the receptive field without adding parameters or complexity to the training network, and makes fuller use of the information in the original CT image for more accurate lesion segmentation. The basic dilated convolution is shown in Fig. 11. For a 3*3 convolution kernel, dilation inserts zeros between the kernel elements to enlarge the receptive field: with dilation rate 1 no zeros are inserted and the 1-dilated kernel remains 3*3, while with dilation rate 2 the 2-dilated kernel covers 5*5. In this paper, we use 2-dilated convolution to build our proposed network.

For the conventional 3*3 convolution, the kernel size $K_{d}$ after dilation is:

$K_{d} = k + (k - 1) (r - 1) .$ (5)

Here, k is the original kernel size, which is 3 in Fig. 11, and r is the dilation rate.
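
Eq. (5) can be illustrated by building the dilated kernel explicitly. The sketch below is for illustration only (deep learning frameworks skip the zero positions rather than materializing them), and the function names are hypothetical.

```python
def dilated_kernel_size(k, r):
    """Eq. (5): effective side length of a k*k kernel with dilation rate r."""
    return k + (k - 1) * (r - 1)


def dilate_kernel(kernel, r):
    """Spread a square kernel out by placing r - 1 zeros between its elements."""
    k = len(kernel)
    kd = dilated_kernel_size(k, r)
    dilated = [[0.0] * kd for _ in range(kd)]
    for i in range(k):
        for j in range(k):
            dilated[i * r][j * r] = kernel[i][j]
    return dilated


if __name__ == "__main__":
    ones = [[1.0] * 3 for _ in range(3)]
    for row in dilate_kernel(ones, r=2):
        print(row)
    # The 5*5 result still holds only 9 non-zero weights.
```

The number of learnable weights stays at k*k for any dilation rate, which is the sense in which the receptive field grows without extra parameters.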

Fig. 11

Dilated convolution: (a) r = 1, conv kernel 3 * 3 (b) r = 2, conv kernel 5 * 5.

3.2.5 DDU-net(Dilated Dense U-net)

In our DenseUnet network, each unpadded 3*3 convolution loses boundary pixels. Dilated convolution gains more useful feature information about spinal lesions, because it enlarges the receptive field without adding parameters or complexity to the training network. Therefore, we propose the spinal fracture lesion segmentation framework DDU-net, which is shown in Fig. 12. We replace every conventional 3*3 convolution in DenseUnet with a 2-dilated convolution, leave the 1*1 convolution of the final output layer unchanged, and keep the pooling and upsampling operations.

Fig. 12

Our proposed DDU-net(Dilated Dense U-net) architecture.

3.2.6 Loss function

We use binary cross entropy as the loss function. The output of the training network is $M_{i} \in [0, 1]$, and $X_{i} \in [0, 1]$ is the ground-truth lesion segmentation. The loss function is:

$loss = - \sum_{i} [ X_{i} \log M_{i} + (1 - X_{i}) \log (1 - M_{i}) ] .$ (6)

We use Nadam (Nesterov-accelerated Adaptive Moment Estimation) to train the network on spinal CT images, because Nadam affects both the learning rate and the gradient update. In general, Nadam can be used in place of RMSprop (root mean square propagation) for better segmentation performance.
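
Binary cross entropy is conventionally written with a leading minus sign so that the loss is non-negative and decreases as predictions improve. A minimal sketch under that convention follows; the function name and the clamping constant `eps` are assumptions added to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Summed binary cross entropy over paired targets and predictions.

    targets:     ground-truth values X_i
    predictions: network outputs M_i in [0, 1]
    eps:         clamp so that log(0) is never evaluated (an assumption)
    """
    total = 0.0
    for x, m in zip(targets, predictions):
        m = min(max(m, eps), 1.0 - eps)
        total += x * math.log(m) + (1.0 - x) * math.log(1.0 - m)
    return -total


if __name__ == "__main__":
    ground_truth = [1.0, 0.0, 1.0, 0.0]
    confident = [0.9, 0.1, 0.8, 0.2]
    uncertain = [0.5, 0.5, 0.5, 0.5]
    print(binary_cross_entropy(ground_truth, confident))
    print(binary_cross_entropy(ground_truth, uncertain))
```

Frameworks usually report the mean rather than the sum over pixels; the two differ only by a constant factor for a fixed image size.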

4 Experiment and analysis

The experimental environment is as follows: CPU Intel Core i7-6700 @ 3.40GHz * 8, 32G memory, GTX1070 graphics card with 8G memory, 128G + 1T hard disk, and Win10 operating system. We use Tensorflow 1.8.0, keras 2.1.5 and numpy 1.13.3 to train and test. The learning rate is 1e-4 and the number of epochs is 10000.

4.1 Dataset

We use spinal CT images from Xijing Hospital and other hospitals for training and testing, consisting of 40 cases and 5134 CT images. After data augmentation, the data set is expanded to 10268 CT images of 256*256 pixels. With the help of doctors from Xijing Hospital, we label the spinal fracture lesions (cfracture, tfracture, lfracture) in every CT image.

We randomly split the data at a ratio of 4:1 between the training set and the held-out set, and divide the held-out set equally between testing and verification, so that 80% of the images are used for training, 10% for testing and 10% for verification. Thus 8216 CT images are used for training the network, 1026 for testing, and 1026 for verification.
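
A random split with these proportions might look like the sketch below. The rounding rule (10% rounded down for each held-out subset, with the remainder going to training) is an assumption chosen because it reproduces the reported counts; the function name and the seed are hypothetical.

```python
import random


def split_indices(num_images, seed=0):
    """Randomly split image indices into training, testing and verification.

    Assumption: each held-out subset gets floor(10%) of the images and the
    remainder is used for training.
    """
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)

    n_test = num_images // 10
    n_verify = num_images // 10
    n_train = num_images - n_test - n_verify

    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    verify = indices[n_train + n_test:]
    return train, test, verify


if __name__ == "__main__":
    train, test, verify = split_indices(10268)
    print(len(train), len(test), len(verify))  # 8216 1026 1026
```

If augmented copies of the same slice can land in different subsets, the test score may be optimistic; splitting by patient series before augmentation is a common way to avoid that.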

4.2 Evaluation metrics

In this paper, we use the following formulas to analyze the spinal fracture lesion segmentation of our DDU-net:

$sensitivity = \frac{TP}{TP + FN} .$ (7)

$specificity = \frac{TN}{TN + FP} .$ (8)

$accuracy = \frac{TP + TN}{TP + TN + FP + FN} .$ (9)

Here, TP (true positive) is the number of lesions correctly detected. TN (true negative) is the number of non-lesion regions correctly identified as such. FP (false positive) is the number of non-lesion regions wrongly detected as lesions. FN (false negative) is the number of lesions that are missed. Sensitivity is the proportion of actual spinal fracture lesions that are correctly predicted. Specificity is the proportion of non-lesion regions that are correctly predicted. Accuracy is the proportion of correct predictions over the whole CT image.

The f1 value (Dice similarity coefficient) is the harmonic mean of precision and recall and represents the accuracy of spinal lesion segmentation. It is defined as follows:

$f1 = \frac{2 TP}{2 TP + FP + FN} .$ (10)

The nice2 value is an error score obtained by averaging the FPR (False Positive Rate) and the FNR (False Negative Rate). FNR and FPR are defined as follows:

$FNR = \frac{FN}{FN + TP} .$ (11)

$FPR = \frac{FP}{TN + FP} .$ (12)

The formula of nice2 is as follows:

$nice 2 = \frac{1}{2} (FPR + FNR) .$ (13)
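
Eqs. (7) to (13) can be computed together from the four confusion counts, as in the sketch below. The function name is hypothetical and the counts in the example are made-up numbers for illustration, not results from the experiments.

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (7)-(13) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    fnr = fn / (fn + tp)
    fpr = fp / (tn + fp)
    nice2 = 0.5 * (fpr + fnr)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "FNR": fnr,
        "FPR": fpr,
        "nice2": nice2,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    for name, value in segmentation_metrics(tp=80, tn=900, fp=10, fn=10).items():
        print(f"{name}: {value:.4f}")
```

Since FNR = 1 - sensitivity and FPR = 1 - specificity, nice2 equals 1 minus the mean of sensitivity and specificity, so as defined it lies between 0 and 1 and a lower value is better.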

4.3 Experimental results

Figs. 13 to 15 show the segmentation results of spinal fracture lesions predicted by the proposed network. Fig. 13 is the predicted segmentation of cfracture lesions, which gives relatively accurate contours of the cervical fracture lesions. Fig. 14 is the predicted segmentation of tfracture lesions, which correctly segments the thoracic fracture lesions in the CT images. Fig. 15 is the predicted segmentation of lfracture lesions, which outlines the lumbar fracture lesions relatively correctly.

Fig. 13

Predicted cfracture lesion. (a) input CT image (b) ground truth (c) DDU-net.

Fig. 14

Predicted tfracture lesion. (a) input CT image (b) ground truth (c) DDU-net.

Fig. 15

Predicted lfracture lesion. (a) input CT image (b) ground truth (c) DDU-net.

To evaluate the performance of the proposed DDU-net, we randomly select three sets of spinal CT images from the test set, covering predicted lesions of cervical, thoracic and lumbar fractures. In Figs. 13 to 15, (a) is the original CT image, (b) is the lesion ground truth, and (c) is the lesion segmentation predicted by our proposed DDU-net. DDU-net segments spinal fracture lesions accurately and effectively, yields more detailed information about the lesions, and can assist doctors in diagnosing spinal diseases.

4.4 Discussions

4.4.1 Comparison in data augmentation

Table 2 compares the proposed DDU-net before and after data augmentation. In Table 2, the sensitivity, specificity, f1 and accuracy increase by 0.007, 0.008, 0.008 and 0.005 respectively after data augmentation, while the nice2 error score decreases by 0.04.

Table 2
Comparison in data augmentation

Images No.	Network	Sen	Spe	f1	nice2	Acc
4108	DDU-net	0.823	0.965	0.968	1.22	0.976
8216	DDU-net	0.830	0.973	0.976	1.18	0.981

The reason is that we differentiate spinal fracture lesions according to their sizes and shapes in CT images: lesions with the same sizes and shapes are classified as the same kind of lesion. Data augmentation changes only the directions and positions of spinal lesions in CT images, not their sizes and shapes. We can therefore expand the training data set so that the network learns and extracts the useful feature information of spinal lesions more thoroughly and remains robust, which leads to more accurate spinal lesion segmentation. All the following experiments use data augmentation.

4.4.2 Comparison in dilated convolution

Table 3 compares DenseU-net with DDU-net. After we introduce dilated convolution into DenseU-net, the sensitivity, specificity, f1 and accuracy increase by 0.004, 0.005, 0.019 and 0.004 respectively, while the nice2 error score decreases by 0.02.

Table 3
Comparison in dilated convolution

Network	Sen	Spe	f1	nice2	Acc
DenseU-net	0.826	0.968	0.957	1.20	0.977
DDU-net	0.830	0.973	0.976	1.18	0.981

The reason is that combining dilated convolution with DenseU-net enlarges the receptive field over the input spinal lesion CT images, so that the network learns and extracts sufficient useful feature information about spinal lesions for accurate lesion segmentation.

4.4.3 Comparisons to other algorithms

Fig. 16 compares the spinal lesion segmentation results predicted by our proposed DDU-net and by other algorithms: (a) input CT images, (b) ground truth, (c) U-net, (d) Lu, Jentang, et al. [19], (e) our proposed DDU-net. The spinal lesion segmentations of our proposed DDU-net are clearer and closer to the lesion ground truth than those of Lu, Jentang, et al. [19] and U-net, which can assist doctors in diagnosing spinal diseases.

Fig. 16

The experimental results: (a) input CT image (b) ground truth (c) U-net (d) Lu, Jentang, et al. [19] (e) DDU-net. Zoomed-in view for more details.

Moreover, Table 4 presents a quantitative comparison of DDU-net and other algorithms in sensitivity, specificity, f1 score, nice2 score and accuracy to verify the performance of our proposed network. DDU-net achieves the highest sensitivity, specificity, f1 and accuracy and the lowest nice2 error score among the compared methods. The proposed DDU-net can therefore provide a firm foundation for diagnosing spinal fracture diseases.

Table 4

Lesion Segmentation Comparison of algorithms

Network	Sen	Spe	f1	nice2	Acc
Chuang et al. [15]	0.825	0.968	0.968	1.21	0.975
Lu et al. [19]	0.817	0.961	0.957	1.23	0.970
U-net	0.798	0.952	0.937	1.28	0.965
DDU-net	0.830	0.973	0.976	1.18	0.981

The reason is that introducing dense blocks and dilated convolution into the conventional U-net yields sufficient and thorough feature information about spinal lesions: the dense blocks improve lesion segmentation performance by deepening the training network, and the dilated convolution provides a large receptive field over spinal CT images, giving clearer and more useful lesion feature information.

5 Conclusion

In this paper, we propose DDU-net network, which combines U-net, dense blocks and dilated convolution, to deepen the training network and increase the receptive field of CT images, so as to get more effective and accurate segmentation of spinal fracture lesions. The DDU-net achieves a better performance than other algorithms after experiments, which can provide assistance for diagnosing spinal diseases.

Some weaknesses remain to be overcome: 1. Preprocessing the CT images and labeling the lesions currently involve a heavy manual workload, which we will try to automate in the next step. 2. The proposed method is based on two-dimensional CT images, which may introduce errors and biases, so in future work we will study how to segment lesions with a three-dimensional convolutional neural network.

6 Declarations

6.1 Ethical approval

Not applicable.

6.2 Consent to participate

Not applicable.

6.3 Consent to publish

Not applicable.

6.4 Authors contributions

Junsheng Wu contributed to the conception of the study.

Gang Sha performed the experiments, contributed significantly to the data analysis and manuscript preparation, and wrote the manuscript.

Bin Yu helped perform the analysis with constructive discussions.

6.5 Funding

The project in this paper is supported by "Biomechanical Modeling of Lumbosacral Spine and Surgical Evaluation System", Fund Nos. 61172147 and 61502365.

6.6 Conflicts of interest/Competing interests

The authors declare that they have no conflicts of interest.

References

1. Ronneberger, Fischer and Brox, U-Net: Convolutional networks for biomedical image segmentation, in Proc. MICCAI (2015), 234–241.

2. Cheng et al., Attention V-Net: A Residual U-Net with Attention Gate Block for Lung Organs At Risk Segmentation, CSAE 2020: The 4th International Conference on Computer Science and Application Engineering (2020).

3. Kaul, Manandhar and Pears, FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation, IEEE (2019).

4. et al., ANU-Net: Attention-based Nested U-Net to exploit full resolution features for medical image segmentation, Computers & Graphics 90 (2020).

5. et al., CA-Net: Comprehensive Attention Convolutional Neural Networks for Explainable Medical Image Segmentation (2020).

6. Kolak et al., Optimized High Resolution 3D Dense-U-Net Network for Brain and Spine Segmentation, Applied Sciences 9(3) (2019).

7. Hassanzadeh, Essam D.L. and Sarker R.A., EvoU-Net: an evolutionary deep fully convolutional neural network for medical image segmentation, ACM (2020).

8. Cai and Wang, MA-Unet: An improved version of Unet based on multi-scale and attention mechanism for medical image segmentation (2020).

9. Yang and Qiu, An improved segmentation algorithm of CT image based on U-Net network and attention mechanism, Multimedia Tools and Applications, 1–24.

10. et al., MAD-UNet: A deep U-shaped network combined with an attention mechanism for pancreas segmentation in CT images, Medical Physics (2020).

11. et al., HF-UNet: Learning Hierarchically Inter-Task Relevance in Multi-Task U-Net for Accurate Prostate Segmentation in CT images, IEEE Transactions on Medical Imaging 99 (2021), 1–1.

12. et al., TripletUNet: Multi-Task U-Net with Online Voxel-Wise Learning for Precise CT Prostate Segmentation (2020).

13. et al., MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise Prostate Segmentation via Online Sampling, Medical Image Analysis 4 (2021), 102039.

14. Janssens, Zeng and Zheng, Fully automatic segmentation of lumbar vertebrae from CT images using cascaded 3D fully convolutional networks, 2018, 893–897.

15. Chuang et al., Efficient Triple Output Network for Vertebral Segmentation and Identification, IEEE Access (2019), 117978–117985.

16. Liu et al., Muscle segmentation of L3 slice in abdomen CT images based on fully convolutional networks, 2019 Ninth International Conference on Image Processing Theory, Tools and Applications (IPTA) (2019).

17. Ebrahimi, Contribution to automatic adjustments of vertebrae landmarks on x-ray images for 3D reconstruction and quantification of clinical indices (2017).

18. Chen et al., Vertebrae Identification and Localization Utilizing Fully Convolutional Networks and a Hidden Markov Model, IEEE Transactions on Medical Imaging 99 (2019).

19. Lu J.T. et al., Deep SPINE: Automated Lumbar Vertebral Segmentation, Disc-level Designation, and Spinal Stenosis Grading Using Deep Learning (2018).

20. Shi et al., Automatic Localization and Segmentation of Vertebral Bodies in 3D CT Volumes with Deep Learning, the 2nd International Symposium (2018).

21. Buerger et al., Combining deep learning and model-based segmentation for labeled spine CT segmentation, Image Processing (2020).

22. He, Zhang, Ren and Sun, Deep residual learning for image recognition, in Proc. CVPR (2016), 770–778.

23. Huang, Liu, Maaten L.V.d. and Weinberger K.Q., Densely connected convolutional networks, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269.

24. Long, Shelhamer and Darrell, Fully convolutional networks for semantic segmentation, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA (2015), 3431–3440. doi: 10.1109/CVPR.2015.7298965

25. Drozdzal, Vorontsov, Chartrand, Kadoury and Pal, The importance of skip connections in biomedical image segmentation, arXiv:1608.04117 (2016).

26. Zhai, Liu, Zhang, Liu, Li and Cao, Multiscale feature fusion single shot object detector based on DenseNet, in Intelligent Robotics and Applications, H. Yu, J. Liu, L. Liu, Z. Ju, Y. Liu, and D. Zhou, Eds. (2019), 450–460.

27. Yu and Koltun, Multi-Scale Context Aggregation by Dilated Convolutions (2016).

Table 2

Comparison in data augmentation

Images No.	Network	Sen	Spe	f1	nice2	Acc
4108	DDU-net	0.823	0.965	0.968	1.22	0.976
8216	DDU-net	0.830	0.973	0.976	1.18	0.981

Table 3

Comparison in dilated convolution

Network	Sen	Spe	f1	nice2	Acc
DenseU-net	0.826	0.968	0.957	1.20	0.977
DDU-net	0.830	0.973	0.976	1.18	0.981