Abstract
With the development of computer technology, more and more deep learning algorithms are being used in medical image processing. Viewing CT images is a common and important way of diagnosing spinal fracture diseases, but whether CT images are read correctly and spinal lesions are segmented effectively depends heavily on doctors' clinical experience. In this paper, we present a method that combines U-net, dense blocks and dilated convolution to segment lesions objectively, in order to help diagnose spinal diseases and provide a clinical reference. First, we preprocess and augment CT images of spinal lesions. Second, we present the DenseU-net model, which combines dense blocks with U-net to increase the depth of the network. Third, we introduce dilated convolution into DenseU-net to construct the proposed DDU-net (Dilated Dense U-net), which enlarges the receptive field on CT images to capture more lesion information. Experiments show that DDU-net segments spinal lesions well, which can provide a solid foundation for both doctors and patients.
Introduction
The conventional way of reading spinal CT images is laborious and time-consuming for doctors, and the results are subject to errors and biases arising from differences in doctors' clinical experience and diagnostic methods. Spinal CT images are also difficult to inspect because vertebral boundaries are irregular and the contrast of CT images is low. Reading CT images on a computer gives an objective overview of a whole series of spinal CT slices, which is a popular approach in diagnosis and convenient for both doctors and patients. Owing to the characteristics of spine CT images, current spine CT segmentation algorithms either target only specific types of CT images or fail to balance segmentation accuracy against speed, and some steps are still performed manually. Errors and biases in spinal lesion segmentation increase patients' suffering and doctors' burden.
At present, with the development of computer technology, many deep learning algorithms are widely applied in medical image processing and have achieved great progress. In this paper, we propose a framework for segmenting spinal fracture lesions to aid the diagnosis of spinal diseases. The scheme is based on an improved U-net [1], which extracts more feature information from spinal lesions so as to segment them correctly, while keeping segmentation relatively fast. The contributions of this paper are:
1. We preprocess spinal fracture CT images collected from hospitals.
2. We combine dense blocks and U-net to construct DenseUnet, which increases the depth of the network and improves spinal lesion segmentation performance.
3. We present DDU-net, which introduces dilated convolution into DenseUnet to enlarge the receptive field on spinal lesion CT images for accurate lesion segmentation.
The rest of this paper is organized as follows. In part 2, we give an overview of deep learning algorithms applied in medical image processing. In part 3, we process spinal CT images collected from hospitals and describe the proposed DDU-net in detail. In part 4, we evaluate the lesion segmentation performance of DDU-net experimentally. Finally, we conclude our work and discuss weaknesses to address in future work.
Related work
In recent years, more and more deep learning methods have been applied in medical image processing.
Deep learning algorithms applied in medical images
Cheng, Z., et al. [2] incorporated residual connections and an Attention Gate block into a medical image segmentation network for lung segmentation. Kaul, C., et al. [3] incorporated attention into convolutional neural networks, using feature maps generated by a separate convolutional autoencoder, for skin cancer segmentation and lung lesion segmentation. Li, C., et al. [4] proposed an attention-based nested segmentation network, named ANU-Net, which has a deeply supervised encoder-decoder architecture and redesigned dense skip connections, for organ cancer lesion segmentation. Gu, R., et al. [5] proposed a comprehensive attention-based CNN (CA-Net) for more accurate and explainable medical image segmentation that is aware of the most important spatial positions, channels and scales at the same time. Kolak, M., et al. [6] presented a fully automatic method for high-resolution 3D volumetric segmentation of medical image data using a modern supervised deep learning approach. Hassanzadeh, T. [7] proposed an evolutionary method to find a precise and small network for medical image segmentation from MRI (Magnetic Resonance Imaging) data. Cai, Y. [8] tried to eliminate semantic ambiguity in skip connections by adding attention gates (AGs), and used attention mechanisms to combine local features with their corresponding global dependencies. Yang, Jin [9] proposed a deep-learning-based CT image segmentation algorithm to address the poor robustness, weak noise resistance and low segmentation accuracy of existing image segmentation algorithms. Li, W., et al. [10] proposed an improved segmentation model called the multi-scale attention dense residual U-shaped network (MAD-UNet) for pancreas segmentation. He, Kelei, et al. [11] tackled the challenging task of prostate segmentation in CT images with a two-stage network, in which the first stage quickly localizes the prostate and the second stage accurately segments it. He, K., et al. [12] proposed a two-stage framework for prostate segmentation: the first stage quickly localized the prostate region, and the second stage precisely segmented the prostate with a multi-task FCN based on the U-Net architecture. He, K., et al. [13] proposed a framework whose first stage quickly localized the prostate region and whose second stage precisely segmented the prostate with a multi-task U-Net architecture. The methods above are applications of deep learning algorithms in medical image processing.
Deep learning algorithms applied in spinal diseases
We now focus on deep learning algorithms applied to spinal diseases. Janssens, R. [14] presented a method for the challenging problem of segmenting lumbar vertebrae from CT images acquired with varying fields of view. Chuang [15] proposed an iterative vertebra instance segmentation model with good generalization ability for segmenting all types of vertebrae, including cervical, thoracic and lumbar vertebrae. Liu, Y., et al. [16] applied a fully convolutional network (FCN) to muscle segmentation at the L3 slice in abdominal CT images. Ebrahimi, S. [17] worked toward automated detection of specific vertebral landmarks in spine radiographs, enabling automated adjustments. Chen, Y. [18] proposed a scheme for automated identification and localization of vertebrae in CT (computed tomography) images. Lu [19] developed an efficient methodology that leverages the subject-matter expertise stored in large-scale archival reports and image data for a deep learning approach to fully automated lumbar spinal stenosis grading. Shi, D., et al. [20] developed a two-step algorithm that localizes and segments only the vertebral bodies, taking advantage of the intensity pattern along the front spinal region as well as GPU acceleration with convolutional neural networks. Buerger, C., et al. [21] proposed using deep learning (DL) to initialize MBS and to guide MBS robustly during segmentation, generating 24 instance segmentations, one for each vertebra. U-net is now widely used in medical image processing; in this paper, we define a novel framework for spinal fracture lesion segmentation based on an improved U-net.
Materials and methods
The proposed scheme, based on an improved U-net for accurate spinal lesion segmentation, is divided into a training part and a testing part, as depicted in Fig. 1.

Overview flowchart of lesion segmentation with the improved U-net.
We collect spinal fracture CT images from Xijing Hospital (Military Medical University of Air Force) and other hospitals. With the help of orthopedic residents at Xijing Hospital, we use the software Labelme to label the spinal fracture lesions in every CT image, and classify the lesions as cfracture (cervical fracture), tfracture (thoracic fracture) and lfracture (lumbar fracture).
Data preprocessing
CT images of the same spinal fracture lesion may carry different information, because different CT scanners introduce different errors and interference. We therefore preprocess the spinal lesion CT images obtained from hospitals before training and testing. The preprocessing flow is shown in Fig. 2.
The spinal CT images obtained from hospitals are in DICOM (Digital Imaging and Communications in Medicine) format, which contains a great deal of information about the patient and the spinal lesions. As shown in Fig. 2, we first input the original CT images and equalize them, because the pixel spacing differs between images scanned by CT equipment with different slice thicknesses. Next, we convert the CT images to HU (Hounsfield Unit) values; the HU is the unit that measures the density of local tissues or organs of the human body. Typically, the HU value of air is −1000, that of pure water is 0, and that of dense bone is +1000. HU values and the pixel values of spinal CT images are linearly related, so each can be converted into the other. In this paper, we keep HU values above 400 and normalize HU values in [400, 2000] to [0, 1] to represent the spinal bones in CT images. Third, we apply a Gaussian filter to remove noise from the CT images used for training and testing. Finally, we extract the ROI (Region of Interest) of the spinal fracture lesions by binary segmentation and mathematical morphology so as to focus on the lesions.
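A minimal sketch of the HU conversion and [400, 2000] windowing step follows, written in plain Python so that it runs without extra libraries. The `slope` and `intercept` arguments stand for the DICOM Rescale Slope and Rescale Intercept attributes that define the linear pixel-to-HU mapping. The function names are hypothetical, and clipping values above 2000 HU to 1 is our assumption, since the text only states that HU values in [400, 2000] are mapped to [0, 1].

```python
from typing import List

# Window from the text: HU values in [400, 2000] are mapped to [0, 1].
HU_MIN, HU_MAX = 400.0, 2000.0


def pixels_to_hu(pixels: List[float], slope: float, intercept: float) -> List[float]:
    """Linear rescale of stored pixel values to HU: HU = slope * pixel + intercept."""
    return [slope * p + intercept for p in pixels]


def window_normalize(hu: List[float], lo: float = HU_MIN, hi: float = HU_MAX) -> List[float]:
    """Clip HU values to [lo, hi], then rescale linearly to [0, 1].

    Values below 400 HU (air, water, soft tissue) map to 0; values above
    2000 HU are clipped to 1 (an assumption, not stated in the text).
    """
    out = []
    for v in hu:
        v = min(max(v, lo), hi)
        out.append((v - lo) / (hi - lo))
    return out


if __name__ == "__main__":
    # Hypothetical stored values with a common CT intercept of -1024.
    raw = [0, 1024, 1424, 2224, 3024, 4000]
    hu = pixels_to_hu(raw, slope=1.0, intercept=-1024.0)
    print(hu)                    # [-1024.0, 0.0, 400.0, 1200.0, 2000.0, 2976.0]
    print(window_normalize(hu))  # [0.0, 0.0, 0.0, 0.5, 1.0, 1.0]
```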

Preprocessing of the original CT images.
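The last preprocessing step, ROI extraction by binary segmentation and mathematical morphology, might look like the plain-Python sketch below. The threshold value, the 3×3 square structuring element and the choice of a morphological opening followed by a bounding box are all assumptions for illustration; the text does not specify which morphological operations were used, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]


def threshold(img: List[List[float]], t: float) -> Mask:
    """Binary segmentation: 1 where the normalized intensity exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in img]


def _neighbours(mask: Mask, r: int, c: int):
    """Yield the 3x3 neighbourhood of (r, c); positions outside the image yield 0."""
    h, w = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0


def erode(mask: Mask) -> Mask:
    """Binary erosion with a 3x3 square: keep a pixel only if its whole neighbourhood is 1."""
    return [[int(all(_neighbours(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask: Mask) -> Mask:
    """Binary dilation with a 3x3 square: set a pixel if any neighbour is 1."""
    return [[int(any(_neighbours(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask: Mask) -> Mask:
    """Morphological opening (erosion, then dilation) removes specks smaller than 3x3."""
    return dilate(erode(mask))


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_min, col_min, row_max, col_max) of the foreground, or None if empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    img = [[0.0] * 7 for _ in range(7)]
    for r in range(1, 4):
        for c in range(1, 4):
            img[r][c] = 0.8      # bone-like 3x3 block
    img[5][5] = 0.9              # isolated noise pixel
    roi = opening(threshold(img, 0.5))
    print(bounding_box(roi))     # (1, 1, 3, 3)
```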
We randomly select 40 series of spinal fracture CT images, comprising 5134 CT images, for training and testing. Because these series cannot include every kind of spinal fracture lesion, and deep learning algorithms need a large amount of data for training and testing, we augment the training set: the spinal CT images are rotated by 90, 180 and 270 degrees and flipped horizontally and vertically.
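The five augmentations listed above (rotations by 90, 180 and 270 degrees, horizontal flip and vertical flip) can be sketched in plain Python on images stored as nested lists. The helper names are hypothetical. Each transform is applied identically to the CT image and to its label mask so that the annotation stays aligned.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]


def rot90(img: Image) -> Image:
    """Rotate 90 degrees counter-clockwise: transpose, then reverse the row order."""
    return [list(col) for col in zip(*img)][::-1]


def hflip(img: Image) -> Image:
    """Horizontal flip: mirror each row left-to-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Vertical flip: reverse the order of the rows."""
    return [list(row) for row in img[::-1]]


TRANSFORMS: Dict[str, Callable[[Image], Image]] = {
    "original": lambda im: [list(row) for row in im],
    "rot90": rot90,
    "rot180": lambda im: rot90(rot90(im)),
    "rot270": lambda im: rot90(rot90(rot90(im))),
    "hflip": hflip,
    "vflip": vflip,
}


def augment_pair(image: Image, mask: Image) -> List[Tuple[Image, Image]]:
    """Return the original pair plus five augmented pairs, transforming image and mask together."""
    return [(f(image), f(mask)) for f in TRANSFORMS.values()]


if __name__ == "__main__":
    img = [[1, 2], [3, 4]]
    mask = [[0, 1], [0, 0]]
    for (name, _), (aug_img, aug_mask) in zip(TRANSFORMS.items(), augment_pair(img, mask)):
        # Rotations and flips only move pixels, so the lesion area is unchanged.
        print(name, aug_img, sum(map(sum, aug_mask)))
```

Because every transform only permutes pixel positions, the number of lesion pixels in each augmented mask equals that of the original, which is consistent with the observation that augmentation changes a lesion's orientation but not its size or shape.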
Because spinal CT images and spinal fracture lesions vary, and preprocessing introduces some interference, we randomly select 3 series of CT images after data augmentation so as not to affect the final prediction results. Figs. 3 to 5 show selected samples of spinal CT images after data augmentation.

Augmentation of cfracture lesions. (a) Original CT image. (b) Rotated by 90 degrees. (c) Rotated by 180 degrees. (d) Rotated by 270 degrees. (e) Horizontal flip. (f) Vertical flip.

Augmentation of tfracture lesions. (a) Original CT image. (b) Rotated by 90 degrees. (c) Rotated by 180 degrees. (d) Rotated by 270 degrees. (e) Horizontal flip. (f) Vertical flip.

Augmentation of lfracture lesions. (a) Original CT image. (b) Rotated by 90 degrees. (c) Rotated by 180 degrees. (d) Rotated by 270 degrees. (e) Horizontal flip. (f) Vertical flip.
As Figs. 3(a), 4(a) and 5(a) show, the original spinal fracture lesions (cfracture, tfracture, lfracture) are not obvious and are hard to segment correctly. Data augmentation yields more useful information about the lesions, because the fracture lesions can be observed from different views, which provides more accurate lesion feature information.
As Figs. 3 to 5 show, rotating and flipping the CT images changes only the orientation of the spinal fracture lesions, not their sizes or shapes, which is critical for training the network. Because the network treats lesions in different orientations as different samples, rotating and flipping the CT images expands the training set.
With the help of orthopedic residents at Xijing Hospital, we segment the spinal fracture lesions in every CT image using Labelme. Figs. 6 to 8 show segmentation samples of spinal lesions, classified as cfracture, tfracture and lfracture. In each of Figs. 6 to 8, (a) is the original CT image and (b) is the CT image labelled with Labelme. The cfracture lesion is shown in red in Fig. 6(b), the tfracture lesion in green in Fig. 7(b), and the lfracture lesion in yellow in Fig. 8(b).

cfracture lesion: (a) original CT image, (b) labelled by Labelme.

tfracture lesion: (a) original CT image, (b) labelled by Labelme.

lfracture lesion: (a) original CT image, (b) labelled by Labelme.
U-net plays a very important role in medical image segmentation but has some problems, such as insufficient network depth and errors and biases in lesion segmentation. To address these problems, we propose a spinal fracture lesion segmentation framework that combines dense blocks, dilated convolution and U-net for accurate spinal lesion segmentation.
U-net
Ronneberger et al. [1] proposed U-net in 2015, a novel convolutional neural network architecture suited to medical image segmentation. U-net builds on the FCN (Fully Convolutional Network) of J. Long et al. [24], and experiments showed that it segments medical images better than FCN. The architecture of U-net is shown in Fig. 9.
U-net consists of a contracting path (the encoder, on the left) and an expanding path (the decoder, on the right). The contracting path has four down-sampling blocks, each consisting of two 3*3 convolution layers (without padding) followed by one 2*2 max pooling layer with stride 2. The down-sampling blocks extract feature information from medical images; after each down-sampling step, the number of feature channels doubles and the size of the feature maps is halved.
The expanding path has four corresponding up-sampling blocks, each containing one de-convolution (up-convolution) layer and two 3*3 convolution layers (without padding). The copy and crop operation concatenates the feature maps of the corresponding contracting-path layer into each up-sampling block. Correspondingly, in the up-sampling blocks the number of feature channels is halved and the size of the feature maps doubles. Finally, a 1*1 convolution layer maps the 64-channel feature vectors to the different classes.
ReLU (Rectified Linear Unit) is used as the activation function, and the skip connections concatenate the output feature maps of shallow and deep layers to obtain more accurate lesion segmentation in medical images. Because U-net has no FC (Fully Connected) layer and uses only valid (unpadded) convolutions, its input size is not fixed and its output map is smaller than its input, which allows very large medical images to be segmented.
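The size bookkeeping of the contracting and expanding paths can be checked with a short sketch. It assumes the original U-net layout: two unpadded 3×3 convolutions per block, 2×2 max pooling with stride 2, 2×2 up-convolutions, and 64 channels in the first block. The function name is hypothetical. With the 572×572 input used in the original U-net paper, it reproduces that paper's 388×388 output map, which illustrates why input and output sizes differ.

```python
from typing import List, Tuple


def unet_shapes(input_size: int, depth: int = 4, base_channels: int = 64
                ) -> Tuple[List[Tuple[int, int]], Tuple[int, int], int]:
    """Trace (size, channels) through a U-net built from unpadded 3x3 convolutions.

    Returns the skip-connection shapes of the contracting path, the bottleneck
    shape, and the spatial size of the final output map.
    """
    size, channels = input_size, base_channels
    skips = []
    for _ in range(depth):
        size -= 4                      # two unpadded 3x3 convs, each trims 1 px per side
        skips.append((size, channels))
        if size % 2:
            raise ValueError("2x2 max pooling needs an even feature-map size")
        size //= 2                     # 2x2 max pooling, stride 2
        channels *= 2                  # channels double after each down-sampling step
    size -= 4                          # two convs at the bottom of the "U"
    bottleneck = (size, channels)
    for skip_size, _ in reversed(skips):
        size *= 2                      # 2x2 up-convolution doubles the size ...
        channels //= 2                 # ... and halves the channels
        assert skip_size >= size       # skip map is center-cropped to `size`, then concatenated
        size -= 4                      # two unpadded 3x3 convs
    return skips, bottleneck, size


if __name__ == "__main__":
    skips, bottleneck, out = unet_shapes(572)
    print(skips)       # [(568, 64), (280, 128), (136, 256), (64, 512)]
    print(bottleneck)  # (28, 1024)
    print(out)         # 388
```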
Dense block
There are two ways to improve the segmentation performance of a network: expanding the training set, as described above, or deepening the network. The number of layers plays a significant role in network performance, but blindly increasing it may cause vanishing or exploding gradients, which degrades segmentation performance. In this paper, we introduce the dense blocks of G. Huang et al. [23] into U-net to increase the depth of the network. The dense block is shown in Fig. 10: (a) is the conventional 3*3 convolution in U-net, and (b) is a dense block with 5 layers and growth rate k=4.

The U-net architecture. Zoomed-in for more details.

Dense block. (a) The U-Net 3*3 convolution. (b) Dense block with 5 layers, growth rate k=4.
Dense blocks have achieved great success and are widely applied; they replace the layer-to-layer connections of conventional neural networks with cross-layer connections, as shown in Fig. 10(b). This greatly improves network performance compared with traditional convolutional neural networks and the residual blocks of K. He et al. [22].
In a conventional neural network, each layer takes its input directly from the previous layer and passes the extracted features to the next layer, as shown in Fig. 10(a). In a dense block, as shown in Fig. 10(b), each convolution layer receives the aggregated outputs of all preceding layers and is directly connected to every subsequent layer in the block. Each convolution layer therefore only needs to extract a small number of new features, which it passes to all subsequent layers, reducing redundant features; without dense blocks, each convolution layer may need to extract a great many features. DenseNet also defines bottleneck layers and transition layers, which reduce the computational load of each convolution layer and improve feature reuse.
To avoid network degradation while increasing the depth of the network and improving spinal fracture lesion segmentation, we introduce dense blocks into U-net to build a novel lesion segmentation scheme, DenseU-net.
We place dense blocks in the contracting path of the conventional U-net to extract feature information of spinal lesions from CT images; correspondingly, the expanding path, which also contains dense blocks, outputs images of the same size as the input CT images. In DenseU-net, both the contracting path and the expanding path have four dense blocks with corresponding convolution layers. Lesion feature information lost in the contracting path can usually be recovered in the expanding path by transmitting feature maps from the contracting path to the expanding path [25], so the expanding path can reuse the features extracted in the contracting path. It has been shown that the copy and crop operation between the contracting path and the expanding path achieves better segmentation performance and reduces the computation and resources required by the network [26]. Hence, the feature maps of the contracting path correspond to, and are related to, those of the expanding path.
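The copy and crop connection can be sketched as a center crop of the contracting-path feature map followed by concatenation along the channel axis. Feature maps are represented here as nested lists indexed [channel][row][col]; the function names are hypothetical, and the sketch assumes square feature maps.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def center_crop(fmap: FeatureMap, size: int) -> FeatureMap:
    """Crop every channel of a square feature map to size x size around its center."""
    h = len(fmap[0])
    if size > h:
        raise ValueError("cannot crop to a larger size")
    top = (h - size) // 2
    return [[row[top:top + size] for row in ch[top:top + size]] for ch in fmap]


def copy_and_crop(skip: FeatureMap, decoder: FeatureMap) -> FeatureMap:
    """Crop the encoder (skip) map to the decoder map's size and stack their channels."""
    cropped = center_crop(skip, len(decoder[0]))
    return cropped + decoder          # list concatenation = channel concatenation


if __name__ == "__main__":
    skip = [[[r * 4 + c for c in range(4)] for r in range(4)] for _ in range(2)]  # 2 x 4 x 4
    decoder = [[[0, 0], [0, 0]]]                                                  # 1 x 2 x 2
    merged = copy_and_crop(skip, decoder)
    print(len(merged), merged[0])  # 3 [[5, 6], [9, 10]]
```

The merged map has as many channels as the skip and decoder maps combined, which is why the convolutions that follow a copy and crop see more input channels than the up-sampling layer produced.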
We suppose the output of the $i$-th layer in a traditional CNN (Convolutional Neural Network) is expressed as follows:
$$X_i = H(X_{i-1})$$
Here, $X_i$ is the output of the $i$-th layer, $X_{i-1}$ is the output of the $(i-1)$-th layer, and $H$ is a convolution followed by ReLU (Rectified Linear Unit). The output of the $i$-th layer in DenseNet is defined as:
$$X_i = H([X_0, X_1, \ldots, X_{i-1}])$$
Here, $[\ldots]$ denotes the concatenation of the feature maps of all preceding layers, and $H$ consists of BN (Batch Normalization), one ReLU and one 3*3 convolution layer. Each layer produces $k$ feature maps, where $k$ is the growth rate. If the input layer has $i_0$ channels, the number of feature maps input to the $i$-th layer is:
$$i_0 + k(i - 1)$$
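The channel count $i_0 + k(i-1)$ can be checked by tracking the concatenation $[X_0, X_1, \ldots, X_{i-1}]$ explicitly. This plain-Python sketch (function name hypothetical) tracks only channel counts, not actual tensors, and assumes each layer $H$ emits exactly $k$ feature maps, as the growth-rate definition states.

```python
from typing import List, Tuple


def dense_block_channels(i0: int, k: int, num_layers: int) -> Tuple[List[int], int]:
    """Return the input channel count of each layer and the block's output channels.

    Layer i receives the concatenation [X_0, X_1, ..., X_{i-1}], i.e. the block
    input (i0 channels) plus k channels from every earlier layer.
    """
    outputs = [i0]                      # channel counts of X_0, X_1, ...
    layer_inputs = []
    for i in range(1, num_layers + 1):
        in_channels = sum(outputs)      # channels of [X_0, ..., X_{i-1}]
        assert in_channels == i0 + k * (i - 1)
        layer_inputs.append(in_channels)
        outputs.append(k)               # H_i = BN -> ReLU -> 3x3 conv emits k maps
    return layer_inputs, sum(outputs)   # block output = i0 + k * num_layers


if __name__ == "__main__":
    print(dense_block_channels(64, 32, 6))  # ([64, 96, 128, 160, 192, 224], 256)
```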
In this paper, we introduce dense blocks and transition layers into both the contracting path and the expanding path of U-net. The detailed structure of DenseUnet is shown in Table 1.
The detailed structure of our DenseUnet
In the contracting path, each dense block is built from convolution blocks (matrix A in Table 1), each consisting of BN (batch normalization), ReLU, a 1*1 convolution, BN, ReLU and a 3*3 convolution, with growth rate k = 32. The growth rate is the same for every layer, so each convolution block adds the same number of feature maps. Dense blocks 1 to 4 in the contracting path contain 6, 12, 24 and 16 convolution blocks, respectively, as shown in Table 1. The feature maps are then transmitted to the expanding path through copy and crop operations. In our DenseUnet, transition layers, each comprising BN, ReLU, a 1*1 convolution layer and a max pooling layer (stride = 2), reduce the size of the feature maps. Dense block 5 (16*16) serves as the bridge from the contracting path to the expanding path, where the feature maps begin to expand. In the expanding path, up-sampling layers enlarge the feature maps; correspondingly, the expanding path also uses matrix A, consisting of BN, ReLU, a 1*1 convolution, BN, ReLU and a 3*3 convolution with growth rate k = 32, and then up-samples to recover the CT image size. Finally, a 1*1 convolution produces the binary prediction.
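As a rough check of how feature maps evolve along the contracting path, the sketch below applies k = 32 and the 6, 12, 24 and 16 convolution blocks given above to a 256*256 input. Table 1 is not reproduced here, so two values are assumptions borrowed from DenseNet-121: 64 channels entering dense block 1, and transition layers that halve the channel count (compression 0.5). It also assumes no down-sampling before dense block 1. Each transition's stride-2 max pooling halves the spatial size, which brings a 256*256 input down to the 16*16 of dense block 5. The function name is hypothetical.

```python
from typing import List, Tuple


def contracting_path(input_size: int = 256, init_channels: int = 64, k: int = 32,
                     blocks: Tuple[int, ...] = (6, 12, 24, 16),
                     compression: float = 0.5) -> List[Tuple[str, int, int]]:
    """Trace (stage, channels, spatial size) through dense blocks and transition layers.

    init_channels and compression are assumptions (DenseNet-121 defaults),
    not values taken from Table 1.
    """
    channels, size = init_channels, input_size
    trace = []
    for idx, n_blocks in enumerate(blocks, start=1):
        channels += k * n_blocks                  # every convolution block adds k maps
        trace.append((f"dense{idx}", channels, size))
        channels = int(channels * compression)    # 1x1 conv in the transition layer
        size //= 2                                # max pooling, stride 2
        trace.append((f"transition{idx}", channels, size))
    return trace


if __name__ == "__main__":
    for stage in contracting_path():
        print(stage)
    # The last transition outputs 16x16 maps, the input size of dense block 5.
```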
Dilated convolution was presented by Yu and Koltun [27] and is widely used in semantic segmentation and object detection. It enlarges the receptive field without adding parameters or complexity to the network, and makes fuller use of the information in the original CT image for more accurate lesion segmentation. The basic dilated convolution with a 3*3 kernel is shown in Fig. 11: with dilation rate 1, it is the ordinary 3*3 convolution; with dilation rate 2, zeros are inserted between the kernel taps, so the effective kernel size becomes 5*5 and the receptive field grows. In this paper, we use 2-dilated convolution to build the proposed network.
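For a conventional 3*3 convolution (kernel size $k = 3$) with dilation rate $r$, the effective kernel size $K_d$ after dilation is:
$$K_d = k + (k - 1)(r - 1)$$
which gives $K_d = 3$ for $r = 1$ and $K_d = 5$ for $r = 2$.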

Dilated convolution: (a) r = 1, conv kernel 3 * 3 (b) r = 2, conv kernel 5 * 5.
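The effect of the dilation rate can be illustrated in one dimension: a 3-tap kernel with rate r reads inputs r positions apart, so its effective size is K_d = k + (k − 1)(r − 1) while it still has only k weights. The plain-Python sketch below (function names hypothetical) computes an unpadded dilated convolution in the cross-correlation form used by CNN layers; a 2-D 3×3 kernel behaves the same way along each axis.

```python
from typing import List


def effective_kernel_size(k: int, r: int) -> int:
    """Effective size of a k-tap kernel with dilation rate r: k + (k - 1)(r - 1)."""
    return k + (k - 1) * (r - 1)


def dilated_conv1d(x: List[float], w: List[float], r: int) -> List[float]:
    """Unpadded ('valid') 1-D dilated convolution, computed as a cross-correlation.

    Output position s combines x[s], x[s + r], ..., x[s + (k - 1) r].
    """
    span = effective_kernel_size(len(w), r)
    return [sum(w[j] * x[s + j * r] for j in range(len(w)))
            for s in range(len(x) - span + 1)]


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5, 6]
    w = [1, 1, 1]                              # 3 weights for both rates
    print(effective_kernel_size(3, 1), dilated_conv1d(x, w, 1))  # 3 [3, 6, 9, 12, 15]
    print(effective_kernel_size(3, 2), dilated_conv1d(x, w, 2))  # 5 [6, 9, 12]
```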
In our DenseUnet, each unpadded 3*3 convolution loses boundary pixels. Dilated convolution captures more useful feature information of spinal lesions by enlarging the receptive field without adding parameters or network complexity. We therefore propose the spinal fracture lesion segmentation framework DDU-net, shown in Fig. 12. We replace the conventional 3*3 convolutions in DenseUnet with 2-dilated convolutions, except for the 1*1 convolution of the final output layer, and keep the pooling and up-sampling layers.
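To quantify the receptive-field gain of replacing 3*3 convolutions with 2-dilated ones, the standard recurrence RF ← RF + (K_d − 1)·J, J ← J·s (J is the cumulative stride and s the layer stride) can be evaluated on a toy stack. The stack below (two convolutions, a 2×2 pooling, two more convolutions) is a hypothetical example, not the exact DDU-net configuration, and the function names are hypothetical.

```python
from typing import List, Tuple

Layer = Tuple[int, int, int]  # (kernel size, dilation rate, stride)


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field (in input pixels, per axis) of the last layer of a conv stack."""
    rf, jump = 1, 1
    for k, r, s in layers:
        k_eff = k + (k - 1) * (r - 1)   # effective (dilated) kernel size
        rf += (k_eff - 1) * jump        # each layer widens the field by (k_eff - 1) input steps
        jump *= s                       # later layers see the input at a coarser step
    return rf


def stack(rate: int) -> List[Layer]:
    """Two 3x3 convs, 2x2 max pooling (stride 2), two 3x3 convs; convs use `rate`."""
    conv = (3, rate, 1)
    return [conv, conv, (2, 1, 2), conv, conv]


if __name__ == "__main__":
    print(receptive_field(stack(1)))  # 14
    print(receptive_field(stack(2)))  # 26
```

Both stacks have the same number of weights (9 per 3×3 kernel and channel pair); only the spacing of the kernel taps changes, which is why dilation enlarges the receptive field without adding parameters.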

Our proposed DDU-net(Dilated Dense U-net) architecture.
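We use binary cross-entropy as the loss function. Let $M_i \in [0, 1]$ be the network output for pixel $i$ and $X_i \in [0, 1]$ the ground-truth lesion segmentation; over $N$ pixels, the loss function is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[X_i \log M_i + (1 - X_i)\log(1 - M_i)\right]$$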
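A plain-Python sketch of this loss over a flattened image follows. Clipping the predictions to [ε, 1 − ε] before taking logarithms is an assumption added for numerical safety; the function name and ε = 1e-7 are illustrative.

```python
import math
from typing import Sequence


def binary_cross_entropy(pred: Sequence[float], target: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predictions M_i and ground truth X_i."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for m, x in zip(pred, target):
        m = min(max(m, eps), 1.0 - eps)   # keep log() finite at 0 and 1
        total += x * math.log(m) + (1.0 - x) * math.log(1.0 - m)
    return -total / len(pred)


if __name__ == "__main__":
    print(binary_cross_entropy([0.5, 0.5], [1, 0]))   # log 2, about 0.693
    print(binary_cross_entropy([0.9, 0.1], [1, 0]))   # -log 0.9, about 0.105
```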
The experimental environment is as follows: an Intel Core i7-6700 CPU @ 3.40 GHz (8 threads), 32 GB of memory, a GTX 1070 graphics card with 8 GB, a 128 GB + 1 TB hard disk and Windows 10, with TensorFlow 1.8.0, Keras 2.1.5 and NumPy 1.13.3 used to train and test on the data. The learning rate is 1e-4 and the number of epochs is 10000.
Dataset
We use spinal CT images from Xijing Hospital and other hospitals for training and testing, comprising 40 cases and 5134 CT images. After data augmentation, the data set is expanded to 10268 CT images of 256*256 pixels. With the help of doctors from Xijing Hospital, we label the spinal fracture lesions (cfracture, tfracture, lfracture) in every CT image.
We randomly split the data at a ratio of 4:1 between training and held-out data: 80% is used for training, 10% for testing and the remaining 10% for validation. Thus 8216 CT images are used to train the network, 1026 for testing and 1026 for validation.
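The 80/10/10 split can be reproduced with a seeded shuffle of image indices. Since 10% of 10268 is 1026.8, rounding the test and validation sizes down gives 1026 images each and leaves 8216 for training, matching the counts above. The function name and the seed are illustrative assumptions.

```python
import random
from typing import List, Tuple


def split_indices(n: int, test_frac: float = 0.1, val_frac: float = 0.1,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into train / test / validation lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # seeded, so the split is reproducible
    n_test = int(n * test_frac)             # floor: 10268 * 0.1 -> 1026
    n_val = int(n * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, test, val


if __name__ == "__main__":
    train, test, val = split_indices(10268)
    print(len(train), len(test), len(val))  # 8216 1026 1026
```

This sketch splits by image index, mirroring the description above; grouping indices by CT series before shuffling would instead keep all slices of one patient in a single subset.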
Evaluation metrics
In this paper, we use the following formulas to analyze the spinal fracture lesion segmentation of DDU-net:
$$\text{sensitivity} = \frac{TP}{TP + FN}, \quad \text{specificity} = \frac{TN}{TN + FP}, \quad \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Here, TP (true positive) is the number of lesion pixels correctly predicted as lesion; TN (true negative) is the number of non-lesion pixels correctly predicted as non-lesion; FP (false positive) is the number of non-lesion pixels wrongly predicted as lesion; and FN (false negative) is the number of lesion pixels wrongly predicted as non-lesion. Sensitivity is the proportion of spinal fracture lesions that are correctly predicted as lesions; specificity is the proportion of non-lesion regions that are correctly predicted as non-lesion; and accuracy is the proportion of correct predictions over the whole CT image.
The f1 value (Dice similarity coefficient) is the harmonic mean of precision and recall and reflects the accuracy of spinal lesion segmentation. It is defined as follows:
$$\text{precision} = \frac{TP}{TP + FP}, \quad \text{recall} = \frac{TP}{TP + FN}, \quad f1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{2TP}{2TP + FP + FN}$$
The nice2 score is an error score obtained by averaging the FPR (False Positive Rate) and FNR (False Negative Rate), which are defined as follows:
$$FNR = \frac{FN}{FN + TP}, \quad FPR = \frac{FP}{FP + TN}$$
The formula of nice2 is as follows:
$$nice2 = \frac{FPR + FNR}{2}$$
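These definitions translate directly into code. The sketch below counts TP, TN, FP and FN over flattened binary masks (1 = lesion, 0 = background) and then evaluates each metric. The function names are hypothetical, and returning 0.0 when a denominator is zero is our assumption.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Pixel-wise (TP, TN, FP, FN) for binary masks (1 = lesion, 0 = background)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn


def _ratio(num: float, den: float) -> float:
    """Safe division: 0.0 when the denominator is zero (an assumption)."""
    return num / den if den else 0.0


def segmentation_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    sensitivity = _ratio(tp, tp + fn)            # recall
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)        # equals 2PR / (P + R), i.e. Dice
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    fnr = _ratio(fn, fn + tp)
    fpr = _ratio(fp, fp + tn)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "precision": precision, "f1": f1, "accuracy": accuracy,
        "fnr": fnr, "fpr": fpr, "nice2": (fpr + fnr) / 2,
    }


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 1, 0, 1, 0]
    counts = confusion_counts(pred, truth)
    print(counts)                         # (2, 2, 1, 1)
    print(segmentation_metrics(*counts))
```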
Figs. 13 to 15 show the spinal fracture lesion segmentations predicted by the proposed network. Fig. 13 shows the predicted segmentation of cfracture lesions, which outlines the contours of the cervical fracture lesions relatively accurately. Fig. 14 shows the predicted segmentation of tfracture lesions, which correctly segments the thoracic fracture lesions in the CT images. Fig. 15 shows the predicted segmentation of lfracture lesions, which reconstructs the outlines of the lumbar fracture lesions relatively correctly.

cfracture predicted lesion. (a) Input CT image. (b) Ground truth. (c) DDU-net.

tfracture predicted lesion. (a) Input CT image. (b) Ground truth. (c) DDU-net.

lfracture predicted lesion. (a) Input CT image. (b) Ground truth. (c) DDU-net.
To evaluate the performance of DDU-net, we randomly select three sets of spinal CT images from the test set, containing cervical, thoracic and lumbar fracture lesions. In Figs. 13 to 15, (a) is the original CT image, (b) is the lesion ground truth, and (c) is the lesion segmentation predicted by DDU-net. DDU-net segments spinal fracture lesions accurately and effectively, captures more detailed information about them, and can assist doctors in diagnosing spinal diseases.
Comparison in data augmentation
Table 2 compares the results of DDU-net before and after data augmentation. As Table 2 shows, sensitivity, specificity, f1, nice2 and accuracy increase by 0.007, 0.008, 0.008, 0.004 and 0.005, respectively, after data augmentation.
Comparison in data augmentation
The reason is that spinal fracture lesions are differentiated by their sizes and shapes in CT images, and lesions of the same kind have the same sizes and shapes. Data augmentation changes only the orientations and positions of spinal lesions in CT images, not their sizes or shapes. Expanding the training set in this way lets the network thoroughly learn and extract useful lesion features while maintaining its robustness, which leads to more accurate spinal lesion segmentation. All subsequent experiments use data augmentation.
Table 3 compares DenseU-net with DDU-net: after dilated convolution is introduced into DenseU-net, sensitivity, specificity, f1, nice2 and accuracy increase by 0.004, 0.005, 0.009, 0.002 and 0.004, respectively.
Comparison in dilated convolution
The reason is that combining dilated convolution with DenseU-net enlarges the receptive field on the input spinal lesion CT images, so the network can learn and extract sufficient useful lesion features for accurate segmentation.
Fig. 16 compares the spinal lesion segmentations predicted by DDU-net and other algorithms: (a) input CT images, (b) ground truth, (c) U-net, (d) Lu, Jentang, et al. [19], (e) DDU-net. The segmentations of DDU-net are clearer and closer to the lesion ground truth than those of U-net and of Lu, Jentang, et al. [19], and can assist doctors in diagnosing spinal diseases.

The experiment results, (a) input CT image (b) ground truth (c)U-net (d) Lu, Jentang, et al. [19] (e) DDU-net. Zoomed-in view for more details.
Moreover, Table 4 presents a quantitative comparison of DDU-net and other algorithms in sensitivity, specificity, f1 score, nice2 score and accuracy. DDU-net achieves relatively better spinal lesion segmentation performance, so it can provide a firm foundation for diagnosing spinal fracture diseases.
Lesion Segmentation Comparison of algorithms
The reason is that introducing dense blocks and dilated convolution into the conventional U-net yields richer feature information about spinal lesions: the dense blocks deepen the network, which improves segmentation performance, and the dilated convolutions enlarge the receptive field on spinal CT images, which captures clearer and more useful lesion features.
In this paper, we propose DDU-net, which combines U-net, dense blocks and dilated convolution to deepen the network and enlarge the receptive field on CT images, yielding more effective and accurate segmentation of spinal fracture lesions. In our experiments, DDU-net outperforms the other algorithms compared, and it can assist in diagnosing spinal diseases.
Some weaknesses remain to be addressed: 1. Preprocessing CT images and labelling lesions still involve a heavy manual workload, which we will try to automate next. 2. The proposed method works on two-dimensional CT images, which may introduce errors and biases, so we will study lesion segmentation with three-dimensional convolutional neural networks in future work.
Declarations
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent to publish
Not applicable.
Authors contributions
Junsheng Wu contributed to the conception of the study; Gang Sha performed the experiment, contributed significantly to the analysis and manuscript preparation, performed the data analyses and wrote the manuscript; Bin Yu helped perform the analysis with constructive discussions.
Funding
The project in this paper is supported by "Biomechanical Modeling of Lumbosacral Spine and Surgical Evaluation System", Fund Nos. 61172147 and 61502365.
Conflicts of interest/Competing interests
The authors declare that they have no conflicts of interest.
