Abstract
Precise segmentation of the lung parenchyma is essential for effective analysis of the lung. Owing to its obvious contrast with, and large area relative to, other tissues in the chest, lung tissue is comparatively easy to segment, but the details of the segmentation still require special attention. To improve the quality and speed of lung parenchyma segmentation based on computed tomography (CT) or computed tomography angiography (CTA) images, the 4th International Symposium on Image Computing and Digital Medicine (ISICDM 2020) provided interesting and valuable research ideas and approaches. For lung parenchyma segmentation, 9 of the 12 participating teams used the U-Net network or a modified form of it, and other methods used to improve segmentation accuracy include attention mechanisms and multi-scale feature fusion. Among them, U-Net achieved the best results, with a final Dice coefficient of 0.991 for CT segmentation and 0.984 for CTA segmentation. Attention U-Net and nnU-Net also performed well. In this paper, the methods chosen by the 12 teams from different research groups are evaluated and their segmentation results are analyzed, for study and reference by those working in the field.
Introduction
The lungs are the respiratory organs of the human body. They consist of the lung parenchyma and the interstitium. The parenchyma, which comprises the alveoli and alveolar walls, carries out the gas exchange of the lungs. The 2018 Global Cancer Statistics [1] study showed that lung cancer, which accounts for 11.6% of all cancer cases, is the most commonly diagnosed cancer and the leading cause of cancer death (18.4% of total cancer deaths). Currently, nearly all pre-processing of medical images for lung disease includes segmentation of the lung parenchyma. Lung lesions are located within the lung parenchyma [2]. The lung parenchyma is filled with air and has a relatively low density, so it can be separated from the rib cage and surrounding tissues on the basis of density. Tissues outside the lung parenchyma should be removed before lung disease is detected. The initial segmentation of the lung parenchyma removes these tissues from the original images, which reduces the amount of computation, speeds up processing and minimizes interference from regions outside the lung parenchyma. Because of the heterogeneity of lung lobes [3], the similar gray levels of different soft tissues, anatomical variability, and differences in scanners, scanning protocols and radiation doses, improving the quality and speed of lung parenchyma segmentation in medical imaging remains a difficult task.
With the rapid development of medical imaging technology, accurate segmentation of the lung parenchyma from computed tomography (CT) images [4–6] has become a key step in computer-aided diagnosis technology [7]. Computer-aided diagnosis (CAD) [8] refers to the use of computer technology to compute and analyze images, pathology and other data to assist in the discovery of lesions, which can improve the accuracy of diagnosis. Accurate lung segmentation can improve the efficiency of the overall CAD system and reduce misdiagnosis, so lung parenchyma segmentation is of great significance. Current lung segmentation methods can be divided into three categories: conventional methods, machine learning methods and deep learning methods.
Conventional methods are, in general, of three broad types: threshold methods [9], region growing methods [10] and edge methods [11]. Threshold methods are simple and fast, and are a common technique for segmenting regions in parallel. The grayscale characteristics of the image are used to select one or several thresholds that divide the grayscale histogram into two or more classes, which yields the segmentation. However, thresholding cannot include nodules that are attached to the marginal regions of the lung parenchyma. The region growing method is a semi-automatic segmentation algorithm in which the process starts from a given pixel and compares it with its neighboring pixels [12]. When a neighbor matches, both pixels are considered to belong to the same region; otherwise, they become part of two different regions. Although this method can segment high-density regions attached to the edges of the lung parenchyma, it fails when pathological regions are adjacent to neighboring structures. Another important region-based method is the watershed transform [13]. Lassen et al. [14] proposed a fully automated method of lung parenchyma segmentation based on the watershed algorithm, which improves performance by using customized features that analyze the lung texture through the Hessian matrix. In addition, edge detection algorithms are usually based on difference operators, such as the Sobel, Prewitt and Roberts operators.
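The region growing procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the method of any cited work: the function name `region_grow`, the 4-neighbor rule, the matching criterion (intensity within a fixed `tolerance` of the seed intensity) and the toy intensity values are all assumptions made for the example.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed` over 4-connected neighbors.

    A neighbor is accepted when its intensity differs from the seed
    intensity by at most `tolerance` (an assumed matching rule).
    `image` is a list of rows; `seed` is a (row, col) tuple.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy slice (hypothetical values): dark, air-filled lung pixels near -800
# surrounded by brighter soft tissue near 40.
toy_slice = [
    [40,   40,   40, 40],
    [40, -800, -790, 40],
    [40, -810,   30, 40],
    [40,   40,   40, 40],
]
lung = region_grow(toy_slice, seed=(1, 1), tolerance=50)
print(sorted(lung))  # [(1, 1), (1, 2), (2, 1)]
```

The bright pixel at (2, 2) is excluded even though it lies inside the lung outline, which mirrors the limitation noted above for dense structures attached to the lung border.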
Machine learning (ML) is dedicated to the study of how experience can be used, by means of computation, to improve the performance of a system. Machine learning techniques have provided decision support for a wide range of pulmonary anomalies [15]. Support vector machines [16], clustering methods [17] and artificial neural network-based segmentation algorithms [18] are typical machine learning methods. Clustering is an unsupervised learning method that groups similar elements into the same cluster. Image segmentation based on artificial neural networks has strong learning ability, adaptability and good noise immunity. It can also effectively overcome difficulties such as the uneven grayscale and complex tissue morphology of medical images, and thus provide highly accurate segmentation results. However, it is difficult to obtain sufficient training samples for neural networks, which also have high algorithmic complexity, heavy computational cost and poor real-time performance. Overfitting may also occur.
The concept of deep learning [19] was introduced by Hinton et al. in 2006 and originated from the study of artificial neural networks [20, 21]. A multilayer perceptron with multiple hidden layers is one example of a deep learning structure. Deep learning interprets and processes data by mimicking the mechanisms of the human brain [22]. The convolutional neural network (CNN) [23] is an important application of deep learning to images and has been very successful in classification. Soliman et al. [24] proposed a 3D CNN structure for lung parenchyma segmentation and demonstrated the feasibility of 3D CNNs for medical image segmentation, although the work also exposed their drawbacks of heavy computation and long run times. Long et al. [25] introduced fully convolutional networks (FCN), which replace the fully connected layers of convolutional networks with convolutional layers in order to classify images at the pixel level, and which are much faster than classical image segmentation methods. U-Net, proposed by Ronneberger et al. [26] as an improvement on FCN, is the most influential neural network in medical image segmentation and one of the most commonly used segmentation architectures in deep learning [27]. Many researchers have made improvements based on U-Net [28, 29]. Kumar et al. [30] proposed a deep convolutional neural network based on U-Net that enables automatic segmentation of the lung parenchyma. The model uses data augmentation and a skip-connection structure, which allow a deep network to perform pixel classification from a small number of available medical images. The work in [31] changed the two-dimensional convolutions of U-Net to three-dimensional convolutions to better learn spatial information.
Some other researchers have tried to use semi-supervised learning to address the difficulty of training with a small amount of data. To unify the processing of 2D and 3D imaging modalities, the 2D attention mechanism has been extended to 3D [32]. He et al. [33, 34] proposed the Residual Network (ResNet), which allows the original input information to be passed directly to later layers and provides new ideas for applying neural networks to image segmentation. In 2018, researchers at the University of Heidelberg and Heidelberg University Hospital applied their nnU-Net [35] to the automatic processing of medical datasets and achieved good results in the Medical Segmentation Decathlon.
In summary, deep learning-based segmentation techniques have achieved remarkable results in lung tissue segmentation. Based on the final lung parenchyma segmentation results submitted by 12 teams at the 4th International Symposium on Image Computing and Digital Medicine (ISICDM 2020), this paper briefly introduces the methods that produced good results and analyzes the reasons for them. Finally, the shortcomings of current segmentation methods and possible future directions are discussed.
Methods
This section introduces the image segmentation algorithms used by the participating teams in this competition for lung parenchyma segmentation. The details of the algorithm are shown in the Discussion section. Table 1 shows the methods used by each team and their main characteristics.
The methods used by each team and their main characteristics
U-Net is based on FCN and consists of two parts: feature extraction and up-sampling. The former captures contextual information, and the latter enables precise localization. The two parts can be regarded as an encoder and a decoder, respectively. In the feature extraction part, each pooling layer produces a new scale, giving a total of 5 scales including the original image scale. In the up-sampling part, each up-sampled feature map is fused with the feature map of the corresponding scale from the feature extraction part, which has the same number of channels and must be cropped before fusion. Fusion here means concatenation (splicing). Because the network is U-shaped, it is called U-Net; its structure is shown in Fig. 1.

U-Net structure diagram.
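The cropping step in the U-Net description arises when the convolutions are unpadded, so that each 3×3 convolution trims one pixel from every border. The sketch below traces the spatial sizes through such a network. It is an illustration under assumptions taken from the original U-Net design rather than from the challenge entries: two unpadded 3×3 convolutions per scale, 2×2 pooling and up-convolution, and a 572×572 input; the function name `unet_valid_sizes` is hypothetical.

```python
def unet_valid_sizes(input_size, depth=4):
    """Trace feature-map sizes through a U-Net with unpadded 3x3 convolutions.

    Returns (skip_sizes, crops, output_size), where each entry of `crops`
    is (encoder_size, decoder_size): the encoder map is center-cropped
    from encoder_size to decoder_size before concatenation.
    """
    skip_sizes = []
    size = input_size
    for _ in range(depth):
        size -= 4                  # two 3x3 unpadded convolutions, 2 pixels each
        skip_sizes.append(size)    # feature map kept for the skip connection
        if size % 2:
            raise ValueError("feature map size must be even before 2x2 pooling")
        size //= 2                 # 2x2 max pooling opens the next scale
    size -= 4                      # two convolutions at the coarsest (5th) scale
    crops = []
    for encoder_size in reversed(skip_sizes):
        size *= 2                              # 2x2 up-convolution
        crops.append((encoder_size, size))     # crop encoder map to decoder size
        size -= 4                              # two convolutions after fusion
    return skip_sizes, crops, size


skip_sizes, crops, output_size = unet_valid_sizes(572)
print(skip_sizes)   # [568, 280, 136, 64]
print(crops)        # [(64, 56), (136, 104), (280, 200), (568, 392)]
print(output_size)  # 388
```

Four pooling steps plus the original resolution give the five scales mentioned above. With padded convolutions, which many implementations use, the sizes simply halve at each pooling step (for a 512×512 input: 512, 256, 128, 64, 32) and no cropping is needed.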
In the last two years, attention models have been widely used in many types of deep learning tasks, such as natural language processing, image recognition and speech recognition, and attention is one of the core deep learning techniques that most deserves close study. In computer vision, the attention mechanism focuses on the salient features that are useful for the target task and suppresses irrelevant regions of the input image. For medical image segmentation, this means focusing attention on the tissue or organ to be segmented.
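As one concrete instance of this idea, the additive attention gate used in the Attention U-Net of Oktay et al. can be written as below. The notation is theirs and does not appear elsewhere in this paper: $x_i$ is the encoder (skip-connection) feature vector at pixel $i$, $g_i$ is the gating signal from the coarser decoder scale, $\sigma_1$ is a ReLU, $\sigma_2$ is a sigmoid, and $W_x$, $W_g$, $\psi$, $b_g$ and $b_\psi$ are learned parameters. Whether the participating teams used exactly this gate is not stated in their descriptions.

```latex
\begin{aligned}
q_i &= \psi^{\top}\,\sigma_1\!\left(W_x^{\top} x_i + W_g^{\top} g_i + b_g\right) + b_\psi,\\
\alpha_i &= \sigma_2(q_i),\\
\hat{x}_i &= \alpha_i\, x_i .
\end{aligned}
```

The coefficient $\alpha_i \in (0, 1)$ scales each skip-connection feature, so regions judged irrelevant to the target organ contribute little to the decoder.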
U-Net-WGAN
U-Net-WGAN is a combined model of the U-Net and WGAN networks. The generative adversarial network (GAN) was proposed by Ian J. Goodfellow et al. in 2014 and is a relatively new technology compared with CNNs and RNNs. GANs have been widely used in image generation, image synthesis and image enhancement. In actual training, GANs have many problems, such as training difficulties, generator and discriminator losses that do not indicate training progress, and a lack of diversity in the generated samples. WGAN, or Wasserstein GAN, was proposed in 2017. Its main improvement is a deeper theoretical analysis of the GAN objective function, which it reformulates in terms of the Wasserstein (Earth-Mover) distance between the real and generated distributions. WGAN largely resolves the problem of unstable training in GANs and no longer requires a careful balance between the training of the generator and the discriminator.
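For reference, the distance from which WGAN takes its name can be written as follows, using the notation of the WGAN paper (Arjovsky et al., 2017) rather than of this challenge: $P_r$ is the distribution of real samples, $P_g$ the distribution of generated samples, $\Pi(P_r, P_g)$ the set of joint distributions with those marginals, and $f$ a 1-Lipschitz function played by the critic network. How Team 5 coupled this objective with the U-Net segmentation loss is not specified in the available description.

```latex
W(P_r, P_g)
  = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big]
  = \sup_{\lVert f \rVert_L \le 1} \Big( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Big)
```

The second form is the one optimized in practice: the critic maximizes the difference of expectations, and the generator minimizes it.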
nnU-Net
nnU-Net is a deep learning semantic segmentation framework for training on 3D medical images without manual experimental design or parameter tuning. It automatically configures its data preprocessing operations (known as the pipeline fingerprint) based on the properties of the images themselves (known as the data fingerprint).
Other algorithms
Scale-aware pyramid segmentation network (CPFNet)
Scale-aware Pyramid Segmentation Network (CPFNet) [36] can capture multi-scale contextual information and adapt to each image to highlight the scale-specific features of the target. The network uses two pyramid modules, the global pyramid guidance (GPG) module and the scale-aware pyramid fusion (SAPF) module. The GPG module sits in the skip connections between the encoder and decoder and is designed to provide contextual information at different levels. The SAPF module dynamically fuses multi-scale contextual information in the high-level features. CPFNet is better able to handle targets whose scale, size and shape vary across cross-sectional slices in pulmonary medical image datasets.
Multi-scale fusion net
A single network model often fails to achieve the expected results when segmenting medical images. Multi-scale Fusion Net extracts multi-scale image features in parallel. At the same time, the loss function of the network is adjusted to address sample imbalance, that is, the small proportion of the image occupied by the target area. In other words, the influence of the small foreground target is increased by taking a weighted sum of multiple loss functions and adjusting the weights of the different losses.
Inf-Net
Inf-Net [37] is a semi-supervised framework that can improve learning ability and achieve higher performance. First, CT or CTA images are fed into two convolutional layers to extract high-resolution, semantically weak (low-level) features. On this basis, an edge-attention module is added to explicitly improve the representation of the target region boundaries. Then, the low-level features produced by the two convolutional layers are fed into three further convolutional layers to extract high-level features.
Results
Study data
The challenge provided each team with 10 sets of CT images and 10 sets of CTA images for model training. The image size is 512×512, with a total of 7331 slices. The original images are in DICOM format and the ground truth is in JPG format. In addition, the image data used for this challenge are anonymized, and teams are required not to use the data outside this competition.
The segmentation results for the CT and CTA images, together with the final Dice coefficient, over-segmentation rate (OR) and under-segmentation rate (UR) of the 12 teams, were recorded in the final contest. The Dice, OR and UR are defined as follows, and the segmentation results are then introduced by method category.
$$\mathrm{Dice} = \frac{2\,(V_{seg} \cap V_{gt})}{V_{seg} + V_{gt}}$$
where $V_{seg} \cap V_{gt}$ is the number of correctly segmented pixels, that is, the pixels shared by the segmented image and the ground truth, and $V_{seg} + V_{gt}$ is the sum of the numbers of pixels in the ground truth image and the segmented image.
For OR and UR, $R_s$ is the number of pixels in the labeled (ground truth) images, $O_s$ is the number of pixels that appear in the segmentation results but should not be included, and $U_s$ is the number of pixels that should be in the segmentation results but are missing from them.
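The three metrics can be computed from a pair of binary masks as in the following sketch. The Dice coefficient follows the definition given above. For OR and UR the code assumes the commonly used forms OR = O_s / (R_s + O_s) and UR = U_s / (R_s + O_s); these are an assumption of this example and may differ from the formulas used to score the challenge. The function name `segmentation_metrics` and the toy masks are hypothetical.

```python
def segmentation_metrics(pred, gt):
    """Return (dice, over_rate, under_rate) for two binary masks.

    `pred` and `gt` are lists of rows containing 0/1 values.
    OR and UR use the assumed definitions O_s/(R_s+O_s) and U_s/(R_s+O_s).
    """
    overlap = over = under = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                overlap += 1   # correctly segmented pixel
            elif p:
                over += 1      # O_s: segmented but not in the ground truth
            elif g:
                under += 1     # U_s: in the ground truth but missed
    v_seg = overlap + over     # pixels in the segmentation result
    v_gt = overlap + under     # R_s: pixels in the ground truth
    dice = 2 * overlap / (v_seg + v_gt) if (v_seg + v_gt) else 1.0
    denom = v_gt + over
    over_rate = over / denom if denom else 0.0
    under_rate = under / denom if denom else 0.0
    return dice, over_rate, under_rate


# Toy masks: 4 ground-truth pixels, 3 found, 1 missed, 1 spurious.
gt_mask = [[0, 1, 1, 0],
           [0, 1, 1, 0]]
pred_mask = [[0, 1, 1, 1],
             [0, 1, 0, 0]]
dice, over_rate, under_rate = segmentation_metrics(pred_mask, gt_mask)
print(dice, over_rate, under_rate)  # 0.75 0.2 0.2
```

Under these definitions a higher Dice and lower OR and UR indicate a better segmentation.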
We present the final 2D segmentation results of all teams in a three-dimensional format using MATLAB tools, as shown in Fig. 2.

Three-dimensional display of the lung parenchyma segmentation results of each team. GT denotes the ground truth, T denotes a team, CT_1∼CT_4 denote the four sets of CT image data, CTA_1∼CTA_4 denote the four sets of CTA image data, and N/A indicates that CTA_2 of T9 is left blank because of abnormal results.
Table 2 shows the Dice coefficients and the OR and UR results for the CT images of each team, and Fig. 3 shows the corresponding histograms. The Dice coefficients of the CT segmentation results range from about 0.7 to above 0.9. Results around 0.7 are mainly due to unstable networks: low generalization performance causes the results on one or more sets of test data to fail completely, which pulls down the final average. The teams above 0.9 mostly removed noise effectively, which increases the Dice coefficient. Among them, the Dice of U-Net & Postprocessing (Team 3) is the best, at 0.991. The OR of Attention & U-Net (Team 10) is the best, at 0.182. The UR of Attention & U-Net (Team 4) is the best, at 0.037.
Dice coefficients and OR, UR results of CT image segmentation by each team

Dice coefficients and OR, UR results of CT image segmentation by each team.
Table 3 shows the Dice coefficients and the OR and UR results for the CTA images of each team, and the corresponding histogram is shown in Fig. 4. The Dice coefficients of most CTA segmentation results also range from about 0.7 to above 0.9, and only one team scores around 0.7. Among them, the Dice of Attention & U-Net (Team 10) is the best, at 0.985. The OR of Attention & U-Net (Team 10) is the best, at 0.129. The UR of Attention & U-Net (Team 4) is the best, at 0.040.
Dice coefficients and OR, UR results of CTA image segmentation by each team

Dice coefficients and OR, UR results of CTA image segmentation by each team.
Combining the values in Tables 2 and 3 with the differences in Dice between the CT and CTA segmentation results shown in the histogram in Fig. 5, we find that each team's Dice for CT image segmentation differs little from its Dice for CTA. The difference between CTA and CT images is that a CTA image is acquired with a contrast agent, whose role is to increase the brightness of the pulmonary blood vessels. Figure 6 shows the differences between CT and CTA images.

Difference in Dice between the CT and CTA segmentation results of each team.

CT and CTA image.
With reference to the segmentation results of CT images and CTA images, the models used in this competition are analyzed here as follows.
U-Net
Both Team 2 and Team 9 used a 2D U-Net network structure. Team 2 did not modify the 2D U-Net structure; it used the basic 2D U-Net with an input size of 512×512 and a batch size of 10, and augmented the data using scaling, translation, rotation, gamma transform, flipping, elastic deformation and Gaussian noise. The loss function is $L = 0.5\,L_{bce} + 0.5\,L_{dice}$, the equally weighted sum of the binary cross-entropy and Dice losses, and the Adam optimizer is used. The training process for lung parenchyma segmentation is shown in Fig. 7. Team 9's performance was not as good as Team 2's.

Flow chart of the training process of Team 2.
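Team 2's loss, an equally weighted sum of a binary cross-entropy term and a Dice term, can be sketched in plain Python as below. The sketch operates on flattened lists of predicted foreground probabilities and 0/1 labels. The soft Dice form with a `smooth` constant and the clipping constant `eps` are assumptions of this example, since the team's exact formulation is not given, and a real training pipeline would use a tensor library rather than Python lists.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over per-pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice_loss(probs, targets, smooth=1.0):
    """1 - soft Dice; `smooth` keeps the ratio defined for empty masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)


def combined_loss(probs, targets):
    """L = 0.5 * L_bce + 0.5 * L_dice, the weighting reported for Team 2."""
    return 0.5 * bce_loss(probs, targets) + 0.5 * soft_dice_loss(probs, targets)


labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly correct prediction
confused = [0.4, 0.6, 0.3, 0.7]    # mostly wrong prediction
print(combined_loss(confident, labels) < combined_loss(confused, labels))  # True
```

The cross-entropy term scores every pixel independently, while the Dice term scores the overlap as a whole, which is one common motivation for combining the two.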
Team 3 also segmented the lung parenchyma using the 2D U-Net model. To improve the accuracy of lung parenchyma segmentation, the team conducted extensive work on data pre-processing and on post-processing of the segmentation results. In the post-processing part, the maximum connected domain approach is used to remove the noise present in the segmented lung parenchyma. A connected domain is a set of adjacent pixels with the same pixel value. Connectivity can be defined by either the 8-neighbor method or the 4-neighbor method; the two differ in which adjacent pixel positions count as connected.
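The maximum connected domain step can be illustrated with a small breadth-first labeling routine that keeps only the largest foreground component of a binary mask. This is a sketch of the general technique, not Team 3's code: the names `largest_component`, `NEIGHBORS_4` and `NEIGHBORS_8` and the toy mask are hypothetical.

```python
from collections import deque

NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
NEIGHBORS_8 = NEIGHBORS_4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def largest_component(mask, neighbors=NEIGHBORS_8):
    """Return a copy of `mask` containing only its largest connected component."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Collect one component by breadth-first search.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned


noisy = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],   # isolated noise pixel
]
kept_8 = largest_component(noisy, NEIGHBORS_8)
kept_4 = largest_component(noisy, NEIGHBORS_4)
print(sum(map(sum, kept_8)), sum(map(sum, kept_4)))  # 5 4
```

The pixel at (2, 2) touches the main block only diagonally, so it is retained under 8-connectivity and discarded under 4-connectivity, which is the difference between the two definitions described above. Because the left and right lungs can form separate components, a practical variant might keep the two largest components; whether Team 3 did so is not stated.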
As Fig. 2 shows, the results of Team 2 and Team 3 are generally satisfactory, although a simple 2D U-Net still has some limitations in segmenting the lung parenchyma. Team 9 had obvious over-segmentation and under-segmentation, which suggests that its model parameter settings differed or that its pre-processing and post-processing were not very effective.
The Attention U-Net model is based on the U-Net model. Team 4, Team 10 and Team 12 combined the attention mechanism with the U-Net network to segment the lung parenchyma. For data preprocessing, Team 4 first applied rotation and Gaussian blur and changed the image contrast to half or twice the original, and then extracted the lung parenchyma using the Attention U-Net network. Finally, a conditional random field (CRF) was used to fill holes and refine the segmentation results. For the loss function, the team chose a weighted sum of the cross-entropy loss and the mean squared error loss.
Team 10 used the U-Net network as the backbone of its model and made improvements on that basis, including a post-processing smoothing operation. The workflow of Team 10 is shown in Fig. 8.

Flowchart of Team 10's segmentation of lung parenchyma.
Team 12 used the Attention U-Net model and adopted a two-stage segmentation strategy to address the problem of small and scattered foreground areas. The CT/CTA image is first coarsely segmented to obtain the approximate location of the target region. On this basis, attention is used to weight the features, and fine segmentation is then performed to obtain the final result.
As Fig. 2 shows, the results of Team 10 are generally satisfactory, whereas the results of Team 4 show obvious under-segmentation and over-segmentation. Team 12 shows slight under-segmentation on the CT/CTA images, and the surface of its segmentation results is relatively rough, possibly because its post-processing did not include smoothing. Since the preprocessing operations differ from team to team, it is not rigorous to judge the networks themselves directly from these results. However, it is worth noting that for the segmentation of lung and trachea, where the numbers of foreground and background pixels differ greatly, adjusting the loss function to give more weight to the foreground may increase the accuracy of the result to a certain extent.
Team 5 used the U-Net-WGAN network structure. As Fig. 2 shows, the U-Net-WGAN network is effective on CTA images but over-segments CT images, which suggests that the U-Net-WGAN structure may be more suitable for segmenting high-resolution medical images.
nnU-Net
The two teams that used nnU-Net in this competition were Team 1 and Team 8, both of which took advantage of the network's ability to adapt to the dataset. Team 1 trained the nnU-Net network with low-resolution data and then extracted the centerline. Team 8 adjusted the number of training epochs and the number of batches per epoch: the 1000 epochs in the original program were changed to 200, with 250 batches per epoch and a batch size of 2. Various data augmentation methods were used, including rotation, scaling, Gaussian noise, Gaussian blur, brightness and contrast adjustment, low-resolution simulation, gamma enhancement and mirroring. As Fig. 2 shows, the lung parenchyma segmentation results of Team 1 and Team 8 are both very good and similar. On closer inspection, Team 8's results are better than those of Team 1, whose segmentation results contain some noise.
Other algorithms
Scale-aware pyramid segmentation network (CPFNet)
Among all the participating teams, Team 11 used the scale-aware pyramid segmentation network (CPFNet). The team's data preprocessing included image normalization, which maps all pixel values of the DICOM images to (0, 1), and label generation. Data augmentation used random horizontal flips, vertical flips and random rotations of -90 to 90 degrees. For training, the training set was divided into training and validation groups (9:1), and the model that performed best on the validation set was saved for testing. All networks were implemented on a single Titan V GPU, and the Adam optimizer was used. In addition, the image size was uniformly adjusted to 256×256 and the batch size was set to 8. The initial learning rate was 1×10^-4 and was multiplied by 0.1 every 40 training cycles. As Fig. 2 shows, Team 11's results show over-segmentation.
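Two of the settings reported for Team 11, intensity normalization and the stepwise learning rate decay, can be made concrete with the following sketch. Min-max scaling is only one way to map pixel values into the unit range and is an assumption here, as is reading "training cycles" as epochs; the function names `min_max_normalize` and `step_lr` are hypothetical.

```python
def min_max_normalize(pixels):
    """Linearly rescale a flat list of pixel values to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]   # constant image: avoid division by zero
    return [(p - lo) / (hi - lo) for p in pixels]


def step_lr(epoch, base_lr=1e-4, step=40, gamma=0.1):
    """Learning rate after multiplying by `gamma` once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


print(min_max_normalize([-1000, 0, 1000]))  # [0.0, 0.5, 1.0]
for epoch in (0, 39, 40, 80):
    print(epoch, step_lr(epoch))
```

With these defaults the rate stays at 1×10^-4 for epochs 0 to 39, drops to about 1×10^-5 at epoch 40 and to about 1×10^-6 at epoch 80.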
Multi-scale Fusion Net
As Fig. 2 shows, Team 6, which used the Multi-scale Fusion Net, has severe over-segmentation. The algorithm uses only a single model, without ensemble learning or post-processing, and has few parameters and good reproducibility. The unsatisfactory segmentation results are probably related to the model and to the lack of post-processing.
Inf-Net
Inf-Net did not perform particularly well at the 4th International Symposium on Image Computing and Digital Medicine (ISICDM 2020). As Fig. 2 shows, Team 7's results have severe under-segmentation and over-segmentation at the same time. Inf-Net may not be suitable for lung parenchyma segmentation.
In general, all teams performed well. The most outstanding performance was that of Team 3, which can be attributed to its post-processing: removing noise through the maximum connected domain operation significantly reduces the over-segmentation rate. In this competition, Team 3 and Team 2 used U-Net and obtained the best results. Another outstanding performance was that of Team 10, which used the Attention U-Net network. In addition, Team 8 and Team 1, which used the nnU-Net network, also achieved good lung parenchyma segmentation results. It can therefore be seen that U-Net, Attention U-Net and nnU-Net, when applied to lung parenchyma segmentation, are able to produce good results even when the amount of data is small.
Conclusion
Judging from the algorithms used in the ISICDM lung tissue segmentation challenge, deep learning methods have become the first choice of many participating teams, which reflects the great advantages of deep learning in medical image segmentation. Judging from the results, U-Net, Attention U-Net and nnU-Net performed well, which shows that the U-shaped structure handles medical image segmentation particularly well. In addition, with continuing breakthroughs in computer technology, computer-aided diagnosis systems have become an important research topic in medical imaging and diagnostic radiology. Segmentation of the lung parenchyma is crucial for early screening and diagnosis of lung diseases, especially lung cancer. Improving the accuracy and speed of lung parenchyma segmentation, enhancing the robustness of the models, and achieving automated segmentation are the main research directions of many scholars. As medical imaging big data continues to improve, innovative algorithms represented by deep learning will bring new breakthroughs in lung parenchyma segmentation. Deep learning algorithms have the advantages of high accuracy, high speed, robustness and transferability. Deep learning-based image segmentation algorithms have already outperformed traditional algorithms in accuracy and efficiency, but they have not completely replaced them; there is still room for improvement, and further in-depth research is needed.
Acknowledgments
During the experiments and the writing of this paper, the contributors received many writing suggestions and much guidance from editors and readers, which helped us to make the content more rigorous and easier to understand. We would like to express our most sincere thanks to them. In addition, this work was supported by the National Natural Science Foundation of China (61971118) and the Fundamental Research Funds for the Central Universities (N182410001, N2104008).
