Abstract
BACKGROUND:
Automatic segmentation of the pancreas and its tumor region is a prerequisite for computer-aided diagnosis.
OBJECTIVE:
In this study, we focus on the segmentation of pancreatic cysts in abdominal computed tomography (CT) scans, which is clinically significant for auxiliary diagnosis but challenging because the location and shape of pancreatic cysts vary widely.
METHODS:
We propose a convolutional neural network architecture for the segmentation of pancreatic cysts, called pyramid attention and pooling on convolutional neural network (PAPNet). In PAPNet, we propose a new atrous pyramid attention module to extract high-level features at different scales and a spatial pyramid pooling module to fuse contextual spatial information, which together improve the segmentation performance.
RESULTS:
The model was trained and tested using 1,346 CT slice images obtained from 107 patients with pathologically confirmed pancreatic cancer. The mean dice similarity coefficient (DSC) and mean Jaccard index (JI) achieved using 5-fold cross-validation are 84.53% and 75.81%, respectively.
CONCLUSIONS:
The experimental results demonstrate that the proposed method achieves effective segmentation of pancreatic cysts.
Introduction
Pancreatic cancer has one of the highest mortality rates of any disease. Identifying and removing malignant cysts in the early stages of pancreatic cancer can improve the patient’s chances of survival. In contrast to the brain, the pancreas is not protected by the skull and is surrounded by various organs and fatty tissues in the abdomen, which makes it difficult to detect and segment. The majority of abdominal CT images contain noise as well as visceral fat near the pancreas, which makes early detection extremely challenging [1]. The segmentation of the pancreatic tissue and its tumor is a critical input to computer-aided diagnostic systems, facilitating surgical planning and navigation as well as the automatic detection and evaluation of cancer. Accurate and efficient segmentation of medical images remains a challenging task in radiation treatment planning, and the segmentation of pancreatic tumors is one of the open challenges in medical image segmentation.
The use of medical images such as MRI, CT, PET, and PET-CT in diagnosing diseases and treatment planning is gradually increasing. In recent years, limited research work has been done to segment pancreatic tumors. Most previous pancreatic tumor detection was based on patient history and symptoms, with little application of relevant image processing techniques. The increased use of high-resolution computed tomography and improvements in related imaging techniques have greatly influenced the accuracy of pancreatic imaging, while also improving the potential for cyst identification and classification.
Most previous studies have focused on the segmentation of healthy tissues, with less research on tumor segmentation. Dmitriev et al [2] combined a new random walker algorithm with a region growing method to segment the borders of the pancreas and pancreatic cysts. Hagos et al [3] implemented a fast algorithm to segment tumors in PET images: they performed principal component analysis on superpixels to reduce the dimensionality of the problem and used K-means clustering to perform the segmentation, achieving satisfactory accuracy and speed.
The active contour model (ACM) is an image segmentation technique that combines high-level and prior knowledge for robust segmentation, introducing the grayscale and boundary information of the image into the optimization process [4]. The optimization can therefore obtain smooth closed contours, with the advantages of diverse forms and flexible structures. The ACM segments organs with clear boundaries and relatively fixed positions well, and it shows superior performance for robust left ventricular segmentation in cardiac MRI [5]. However, it is very time consuming and sensitive to the initial contour, which is even more problematic in the case of pancreatic cysts.
In recent years, medical image segmentation algorithms based on deep networks have been developed and are expected to help physicians achieve various clinical purposes. Ronneberger et al [6] proposed U-Net based on the fully convolutional network (FCN) [7], and the segmentation networks proposed later [8–12] are largely based on the U-Net architecture, with steadily improving segmentation accuracy. Jiang et al [13] proposed DLU-Net for the segmentation of pancreatic cancer, adding a deformable convolution module to the network architecture to enhance the ability to model irregular objects and designing a BConvLSTM module to incorporate features at different scales in the decoding stage of the network, which achieves more accurate segmentation of pancreatic cancer edges. Liang et al [14] proposed a convolutional neural network-based algorithm for segmenting the pancreatic gross tumor volume (GTV) in multiparametric MRI, using a sliding window approach to extract the region of interest. Zhu et al [15] designed a multiscale classification network that accounts for the variability of pancreatic tumor size to detect pancreatic ductal adenocarcinoma (PDAC) and used a nonparametric post-processing method to remove outliers. Zhou et al [16] developed a multi-phase segmentation network to enhance the automatic segmentation of PDAC by integrating multi-phase information (arterial and venous phases) and added an additional loss function to eliminate view discrepancies. Zhou et al [17] introduced an additional deep supervision approach to the segmentation network and implemented a coarse-to-fine segmentation algorithm that first determines the location of the pancreas and then localizes the cyst for segmentation.
Segmentation of pancreatic tumors using image processing algorithms helps to further evaluate the disease and is particularly important for the examination of pancreatic cancer in its early stages. However, existing segmentation networks struggle to segment cystic tumors of different sizes accurately. In this study, we aim to implement a deep learning-based algorithm for the automatic segmentation of pancreatic cysts in abdominal CT scans. The main contributions of this work are as follows. We propose a new segmentation network, PAPNet, for pancreatic cyst CT image segmentation. Our proposed atrous pyramid attention module (APAM) and spatial pyramid pooling module (SPPM) can effectively extract high-level features at different scales and fuse contextual spatial information. We perform a detailed evaluation of the proposed model. The experimental results show that the proposed method obtains high segmentation accuracy.
Methods
Network architecture
PAPNet is an extended U-Net network, and the overall network architecture is designed based on the Encoder-Decoder architecture of U-Net. Our proposed network consists of four main components: feature encoding module, atrous pyramid attention module, spatial pyramid pooling module and feature decoding module. The overall architecture of PAPNet is shown in Fig. 1.

PAPNet segmentation network architecture.
The feature encoding module and feature decoding module are shown in Fig. 2. The feature encoding module uses the residual block of ResNet34 [18] with pre-trained weights. Each block consists of two 3×3 convolutional layers with a shortcut connection between the input and the output, so that the input features are added to the output features. This allows the stacked layers to learn new features on top of the original input features, avoids the vanishing gradient problem, and makes the network easier to train. We use a decoder block to expand the feature size instead of the original upsampling operation. The decoder block contains two 1×1 convolutions and one 3×3 deconvolution. The first 1×1 convolution reduces the number of feature map channels to lower the number of parameters and the amount of computation; the 3×3 deconvolution enlarges the feature map size; the second 1×1 convolution raises the number of output feature map channels to double the number of input feature map channels. Based on the skip connections and decoder blocks, the final segmentation prediction results can be obtained.

Residual blocks and decoder blocks.
In this paper, an atrous pyramid attention module and a spatial pyramid pooling module are added at the bottom of the network for extracting high-level features at different scales and fusing contextual spatial information.
The motivation of this work is to build a module that can extract multi-scale features more efficiently. Inspired by EPSANet [19], we propose the atrous pyramid attention module (APAM), which improves the pyramid squeeze attention (PSA) module proposed in EPSANet. First, we replace its grouped convolutions of different kernel sizes with atrous convolutions, which have a larger receptive field without increasing the number of parameters. Second, we replace its squeeze attention module with the channel attention module of CBAM [20] and remove the final product operation.
Figure 3 illustrates the structural composition of the channel attention weight module (CAWM). The channel attention mechanism allows the network to selectively weight the importance of each channel to produce a more informative output. Firstly, the input feature map

Channel attention weight module.
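As a rough illustration of how channel attention weights can be computed, the following minimal sketch follows the channel attention of CBAM [20]: each channel is squeezed by global average pooling and global max pooling, both descriptors pass through a shared two-layer perceptron, and the summed outputs go through a sigmoid. The function name `channel_attention_weights` and the weight matrices `w1` and `w2` are hypothetical and chosen for illustration only; this is not the authors' CAWM implementation.

```python
import math


def channel_attention_weights(feature_map, w1, w2):
    """CBAM-style channel attention: sigmoid(MLP(avg) + MLP(max)).

    feature_map: list of C channels, each a 2D list (H x W) of floats.
    w1: hidden x C weight matrix, w2: C x hidden weight matrix of the
        shared MLP (hypothetical values, normally learned).
    Returns one weight in (0, 1) per channel.
    """
    # Squeeze every channel to one scalar by global average and max pooling.
    flat = [[v for row in channel for v in row] for channel in feature_map]
    avg_desc = [sum(values) / len(values) for values in flat]
    max_desc = [max(values) for values in flat]

    def shared_mlp(vec):
        # First layer with ReLU, second layer linear.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    logits = [a + m for a, m in zip(shared_mlp(avg_desc), shared_mlp(max_desc))]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


# Toy input: two channels of size 2x2 and a hidden layer of size 1.
toy_features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
toy_w1 = [[0.5, 0.5]]
toy_w2 = [[1.0], [-1.0]]
weights = channel_attention_weights(toy_features, toy_w1, toy_w2)
print(weights)
```

Because the final product operation is removed in APAM, the sketch returns only the weights rather than the reweighted feature map.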
Atrous convolution has shown good performance in many semantic segmentation and object detection tasks [19]. Atrous convolutions with different dilation rates have different receptive fields and thus capture feature information at different scales. An example of atrous convolution is shown in Fig. 4.

Atrous convolution.
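To make the relation between dilation rate and receptive field concrete, the sketch below computes the effective kernel size k + (k − 1)(d − 1) of a kernel of size k with dilation rate d, and applies a one-dimensional atrous convolution without padding. It is a simplified 1D illustration under these assumptions, not the 2D layers used in the network; the function names are hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, kernel, dilation):
    """1D atrous convolution (cross-correlation form), no padding, stride 1."""
    span = effective_kernel_size(len(kernel), dilation)
    output = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        output.append(
            sum(k * signal[start + j * dilation] for j, k in enumerate(kernel))
        )
    return output


# A 3-tap kernel covers 3, 5 and 9 samples for dilation rates 1, 2 and 4,
# while the number of weights stays at 3.
print([effective_kernel_size(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
print(atrous_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], 2))  # [9, 12, 15]
```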
Figure 5 shows the structural composition of the atrous pyramid attention module. In the atrous pyramid attention module, firstly, the input feature map

Atrous pyramid attention module.
Secondly, we concatenate the four feature maps to obtain
Finally, the output weight distribution
APAM obtains feature information at different scales through atrous convolutions with different dilation rates and uses CAWM to select among these scales, which enables effective segmentation of targets of different sizes.
To improve the interaction of global contextual information, we design a spatial pyramid pooling module (SPPM). It is inspired by the residual multi-kernel pooling (RMP) in CENet [22] and the pyramid pooling module (PPM) in PSPNet [23], both of which use pooling operations at different scales to extract global contextual prior information. To improve the identification of location information, this paper uses max pooling layers for the pooling operation.
As shown in Fig. 6, four max pooling layers of different sizes (2×2, 3×3, 5×5, 6×6) are used to extract global context information, and a 1×1 convolution is then used to reduce the dimensionality: the number of channels of each pooled feature is reduced to 1/8 of the original channel number, and that of the original feature is reduced to 1/2 of the original channel number. Each 1×1 convolution is followed by Batch Normalization and ReLU layers to improve the generalization capability of the network. To keep the output feature map size consistent with the input, each reduced-dimensional pooled feature map is upsampled by bilinear interpolation to the same size as the original feature map. Finally, the five feature maps are concatenated, so that the final output feature map has the same size as the input feature map.

Spatial pyramid pooling module.
SPPM implements contextual information interaction by aggregating pooled features at different scales, which improves the recognition of targets at different locations by the network and reduces false segmentation.
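The channel bookkeeping of SPPM can be checked with a short sketch: four pooled branches at 1/8 of the channels plus the original feature at 1/2 of the channels add up to the input channel number. The pooled spatial sizes below additionally assume that the stride equals the kernel size with no padding, and the 512-channel, 16×16 input is a hypothetical example; neither is stated in the text.

```python
POOL_KERNELS = (2, 3, 5, 6)  # max pooling sizes named in the text


def sppm_output_channels(in_channels, num_pool_branches=len(POOL_KERNELS)):
    """Channels after concatenating the pooled branches and the original feature."""
    pooled_channels = in_channels // 8    # each pooled branch after its 1x1 conv
    identity_channels = in_channels // 2  # original feature after its 1x1 conv
    return num_pool_branches * pooled_channels + identity_channels


def pooled_size(size, kernel):
    """Output size of max pooling, ASSUMING stride == kernel and no padding."""
    return (size - kernel) // kernel + 1


# Hypothetical bottleneck feature map: 512 channels, 16 x 16 pixels.
print(sppm_output_channels(512))                      # 512
print([pooled_size(16, k) for k in POOL_KERNELS])     # [8, 5, 3, 2]
```

Each pooled map would then be upsampled back to the input resolution before concatenation, which is why the output matches the input in both channel number and spatial size.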
Dataset and evaluation metrics
Data for this study were obtained from patients with pathologically confirmed cystic tumors of the pancreas admitted to Shanghai Changhai Hospital from January 2017 to June 2022. A total of 107 patients were included in the study, comprising 56 patients with serous cystic neoplasm (SCN) and 51 patients with mucinous cystic neoplasm (MCN). The CT data were acquired as parenchymal phase (30–35 s after contrast injection) images of the pancreas with a slice thickness of 3 mm, a slice spacing of 3 mm, and a resolution of 512×512. The tumor boundaries on each CT slice involving the tumor region were confirmed, annotated, and segmented by expert radiologists with ten years of experience. A clinician and a radiologist examined the final segmentation results together.
Before the data are fed into the network, they are preprocessed. A total of 1,346 2D tumor slice images are extracted from the CT images of the 107 patients. To enhance the image contrast, we set the window width to 300 and the window level to 100, and all data are normalized to the range 0–1 before being input to the network. Image normalization is the process of centering the data by removing the mean value. According to convex optimization theory and the probability distribution of the data, centered data conform better to the data distribution, which makes it easier to obtain good generalization after training. To verify the stability of the model, a 5-fold cross-validation strategy is used for the experiments in this paper. The training set and test set are randomly divided in a ratio of 4 : 1. During training, 20% of the training set is used as the validation set.
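The windowing step can be sketched as follows. A window width of 300 and a window level of 100 correspond to the intensity range [−50, 250]; the sketch assumes that values outside this range are clipped and that the result is scaled linearly to 0–1, which is a common convention but is not spelled out in the text. The function name is hypothetical.

```python
def window_and_normalize(ct_values, width=300.0, level=100.0):
    """Clip CT intensities to the display window and scale them to [0, 1].

    Assumes clipping followed by linear min-max scaling over the window.
    """
    low = level - width / 2.0   # -50 for width 300, level 100
    high = level + width / 2.0  # 250 for width 300, level 100
    return [(min(max(v, low), high) - low) / (high - low) for v in ct_values]


# Air, window floor, window center, window ceiling, dense bone.
print(window_and_normalize([-1000, -50, 100, 250, 1000]))
# [0.0, 0.0, 0.5, 1.0, 1.0]
```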
To assess the quality of segmentation, this paper quantifies the discrepancy between the segmented images and the gold standard. Two metrics, DSC and JI, are used to evaluate the segmentation results.
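Defining A and B as the ground truth and the prediction result, respectively, the DSC is calculated as follows:
$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}$$
The JI is calculated as follows:
$$\mathrm{JI}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$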
In order to facilitate the statistics of the experimental results, all metrics for this experiment are calculated based on study cases (patients), i.e., tumor slices from individual patients are predicted and merged into one 3D label image.
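A minimal sketch of the patient-level evaluation is given below: the slices of one patient are treated as a single 3D label volume, and DSC = 2|A∩B| / (|A| + |B|) and JI = |A∩B| / |A∪B| are computed over all voxels. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_and_jaccard(truth_slices, pred_slices):
    """DSC and JI for one patient from per-slice binary masks (values 0 or 1)."""
    intersection = truth_sum = pred_sum = 0
    # Accumulating over all slices is equivalent to merging them into one volume.
    for truth_slice, pred_slice in zip(truth_slices, pred_slices):
        for truth_row, pred_row in zip(truth_slice, pred_slice):
            for t, p in zip(truth_row, pred_row):
                intersection += t * p
                truth_sum += t
                pred_sum += p
    union = truth_sum + pred_sum - intersection
    if union == 0:
        return 1.0, 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / (truth_sum + pred_sum), intersection / union


# Toy patient with two 2x2 slices.
truth = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
pred = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]
dsc, ji = dice_and_jaccard(truth, pred)
print(dsc, ji)
```

For a single case the two metrics are linked by JI = DSC / (2 − DSC), so they rank individual segmentations identically even though their averages over patients differ.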
The number of training epochs is set to 100, the batch size is set to 16, the optimizer is Adam, and the initial learning rate is set to 4e-4 and decays by a factor of 0.99 every epoch. The model with the minimum loss value in the training period is selected as the final model to generate the segmentation results. All experiments are implemented in the PyTorch framework and trained on an NVIDIA RTX 3060 with 12 GB of memory.
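Read as a multiplicative decay, the learning rate schedule can be sketched as lr(e) = 4e-4 × 0.99^e for epoch e. This reading is an assumption; in PyTorch it would correspond to an exponential scheduler such as `torch.optim.lr_scheduler.ExponentialLR` with `gamma=0.99`. The function name below is hypothetical.

```python
def learning_rate(epoch, initial_lr=4e-4, decay=0.99):
    """Learning rate at a given epoch, ASSUMING multiplicative decay per epoch."""
    return initial_lr * decay ** epoch


# Learning rate at the first, second and last of the 100 epochs.
for epoch in (0, 1, 99):
    print(epoch, learning_rate(epoch))
```

Under this reading, the learning rate at the end of training is a little more than a third of its initial value.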
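The Dice loss function $\mathrm{Loss}_{Dice}$ and the binary cross entropy (BCE) loss function $\mathrm{Loss}_{BCE}$ are commonly used loss functions in medical image segmentation to measure the difference between the model prediction and the true value. $\mathrm{Loss}_{Dice}$ is calculated as follows:
$$\mathrm{Loss}_{Dice} = 1 - \frac{2\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} y_i + \sum_{i=1}^{N} \hat{y}_i}$$
$\mathrm{Loss}_{BCE}$ is calculated as follows:
$$\mathrm{Loss}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right]$$
where N is the number of pixels of the image, and $y_i$ and $\hat{y}_i$ are the ground-truth label and the predicted probability of pixel i, respectively.
This network uses a combination of $\mathrm{Loss}_{Dice}$ and $\mathrm{Loss}_{BCE}$ as the loss function, with the weighting coefficient α set to 0.5 for the experiment.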
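As a rough sketch of how such a combined loss could be computed, the code below assumes a convex combination α·Loss_Dice + (1 − α)·Loss_BCE, with the Dice loss taken as one minus the soft Dice coefficient and a small constant `eps` added for numerical stability. The exact weighting form, the smoothing constant, and the function names are assumptions for illustration rather than the authors' implementation.

```python
import math


def dice_loss(y_true, y_pred, eps=1e-7):
    """1 - soft Dice coefficient over flattened pixels (eps avoids division by zero)."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * intersection + eps) / (sum(y_true) + sum(y_pred) + eps)


def bce_loss(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy; predictions are clamped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return -total / len(y_true)


def combined_loss(y_true, y_pred, alpha=0.5):
    """ASSUMED form: alpha * Dice loss + (1 - alpha) * BCE loss."""
    return alpha * dice_loss(y_true, y_pred) + (1.0 - alpha) * bce_loss(y_true, y_pred)


labels = [1, 0, 1, 0]
print(combined_loss(labels, [1.0, 0.0, 1.0, 0.0]))  # close to 0 for a perfect prediction
print(combined_loss(labels, [0.5, 0.5, 0.5, 0.5]))  # larger for an uninformative prediction
```

With α = 0.5 both terms contribute equally, so the overlap-based Dice term and the pixel-wise BCE term are balanced.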
Comparison of results between different networks
To compare the performance of different network architectures on the pancreatic tumor segmentation task, PAPNet is compared with ResU-Net, U-Net [6], CE-Net [22] and Attention U-Net [24]. The experimental results obtained for each algorithm are shown in Table 1. PAPNet achieved the best segmentation results, with an average DSC of 84.48% and an average JI of 75.81%, an improvement of 4.73% and 4.94% compared with the baseline network ResU-Net and of 3.39% and 3.73% compared with CE-Net. Both DSC and JI show a high standard deviation for every network, indicating that the difficulty of segmenting pancreatic cysts varies between individuals, i.e., segmentation can fail on some patient cases and yield a low DSC. The stability of the model was verified by 5-fold cross-validation, and an analysis of variance (ANOVA) was performed on the DSCs between labels and predictions in each trained model. The difference in DSCs among the five fold partitions was not statistically significant (p = 0.592).
Experimental results of different networks
Figure 7 illustrates the cumulative frequency of DSC for the experimental results of the different networks, which reflects the distribution of DSC. The number of cases with 0≤DSC≤0.5 is significantly lower for PAPNet than for the other networks, indicating that it produces fewer wrong segmentations and performs more stably. For all networks, the DSCs are concentrated in the range 0.9 < DSC≤1.0, indicating that most cystic tumors can be well segmented.

Cumulative frequency of DSC for different networks.
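A cumulative frequency curve of this kind can be sketched as the fraction of patients whose DSC does not exceed each threshold. The thresholds 0.5, 0.6, 0.9 and 1.0 follow the intervals mentioned in the text, while 0.7 and 0.8 and the per-patient DSC values below are hypothetical and serve only to illustrate the computation.

```python
def cumulative_frequency(dsc_values, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Fraction of cases with DSC <= each threshold."""
    count = len(dsc_values)
    return [sum(1 for d in dsc_values if d <= t) / count for t in thresholds]


# Hypothetical per-patient DSC values (not taken from the experiments).
example_dsc = [0.2, 0.55, 0.85, 0.92, 0.95]
print(cumulative_frequency(example_dsc))
# [0.2, 0.4, 0.4, 0.4, 0.6, 1.0]
```

A lower curve at the small thresholds means fewer failed segmentations, which is the property compared across networks in the figure.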
Figure 8 shows randomly selected segmentation results of pancreatic cystic tumor slices for the different networks. In the first column, Attention U-Net shows over-segmentation, ResU-Net, U-Net and CE-Net all show under-segmentation, and PAPNet achieves a good segmentation. In the second column, Attention U-Net, CE-Net and ResU-Net all show over-segmentation, while PAPNet and U-Net both obtain good segmentation results. In the third column, U-Net, Attention U-Net and ResU-Net all show minor mis-segmentation, although all networks segment the main body of the tumor well. Overall, PAPNet alleviates the over-segmentation and under-segmentation problems of pancreatic cystic tumor segmentation in some cases and shows superior segmentation performance compared with the other networks. PAPNet can also fail in some cases: as shown in Fig. 9, when the target area is too small or the grayscale of other soft tissue is close to that of the target, the network tends to miss the target or segment it incorrectly.

Visualization of segmentation results for different networks.

Two cases of PAPNet segmentation failure.
In this paper, the effectiveness of each proposed module is verified by ablation experiments, and the experimental results are shown in Table 2. First, the validity of APAM is verified. Since APAM is an improvement of PSAM, a comparative analysis of APAM and PSAM is performed first. The experimental results show that the average DSC of ResU-Net+PSAM and ResU-Net+APAM is improved over the baseline network ResU-Net by 3.15% and 3.3%, respectively. When combined with SPPM, the average DSC of PAPNet (ResU-Net+APAM+SPPM) is 0.93% higher than that of ResU-Net+PSAM+SPPM. The APAM proposed in this paper therefore performs better than PSAM.
Results of ablation experiments with different modules
Next, the effectiveness of SPPM is verified. With SPPM only, ResU-Net+SPPM showed a small performance improvement compared to the baseline network ResU-Net. When SPPM was combined with PSAM or APAM, the average DSC of ResU-Net+PSAM+SPPM and PAPNet improved by 0.7% and 1.48% compared to ResU-Net+PSAM and ResU-Net+APAM, respectively. SPPM thus improves the segmentation accuracy to different degrees. The average DSC of PAPNet improves by 4.58% compared to the baseline network ResU-Net, which demonstrates the effectiveness of the proposed modules.
Figure 10 shows the cumulative frequency of DSC for the results of the ablation experiments. Compared with the other configurations, PAPNet has the lowest number of cases for both 0≤DSC≤0.5 and 0.5 < DSC≤0.6, i.e., the fewest instances of poor segmentation quality or segmentation failure. This indicates that PAPNet achieves better segmentation results in most cases.

Cumulative frequency of DSC for ablation experiments.
Figure 11 shows randomly selected segmentation results of pancreatic cystic tumor slices for the different module combinations. In the first column, only PAPNet did not segment a wrong region. In the second column, ResU-Net+APAM and ResU-Net+SPPM both failed to segment the target, whereas combining the two modules segmented the target region accurately. In the third column, both ResU-Net+PSAM+SPPM and ResU-Net+SPPM incorrectly segment a larger region, indicating that the SPPM module tends to favor large target regions. Overall, PAPNet perceives global and local information better and thus achieves better segmentation of both large and small targets.

Visualization of segmentation results of ablation experiments.
To improve the use of neural network-based image segmentation algorithms in assisting radiologists with the diagnosis and treatment of pancreatic cancer, this paper proposes PAPNet, an end-to-end segmentation algorithm for pancreatic cancer. In PAPNet, residual blocks with pre-trained weights are used, which makes the network easier to train and substantially improves the performance. The proposed atrous pyramid attention module and spatial pyramid pooling module extract high-level features at different scales and fuse contextual spatial information.
In comparison with other networks, PAPNet can more effectively aggregate multi-scale feature information to identify pancreatic cysts of different locations and sizes and segment them precisely. Due to the variability of the samples, good segmentation cannot be guaranteed for every test sample. For example, when the pancreatic cysts are too small or the surrounding tissues have a similar grayscale, segmentation errors may occur. However, PAPNet reduces the occurrence of such cases compared to other networks.
The experimental results demonstrate that, despite the diversity of locations and shapes of pancreatic cysts, the proposed model segments them successfully, with an average DSC of 84.53±15.86% and an average JI of 75.81±19.39%. Only two kinds of cystic tumor data, MCN and SCN, are segmented in this paper; other pancreatic tumor data, such as intraductal papillary mucinous neoplasm (IPMN) and pancreatic ductal adenocarcinoma (PDAC), can be added subsequently for validation.
