ASU-Net: U-shape adaptive scale network for mass segmentation in mammograms

Abstract

U-Net is a commonly used deep learning model for mammogram segmentation. Despite its outstanding overall segmentation performance, U-Net still faces two challenges: (1) its skip-connections are limited and may not effectively extract multi-scale features of breast masses with diverse shapes and sizes; (2) U-Net merges low-level spatial information and high-level semantic information only through concatenation, which neglects the interdependencies between channels. To address these two problems, we propose the U-shape adaptive scale network (ASU-Net), which contains two modules: the adaptive scale module (ASM) and the feature refinement module (FRM). At each level of the skip-connections, ASM adaptively adjusts the receptive fields according to the scale of the mass, which allows the network to capture multi-scale features adaptively. In addition, FRM allows the decoder to capture channel-wise dependencies, so that the network can selectively emphasize the feature representations of useful channels. Two commonly used mammogram databases, DDSM-BCRP and INbreast, are used to evaluate the segmentation performance of ASU-Net. ASU-Net obtains a Dice Index (DI) of 91.41% on the DDSM-BCRP database and 93.55% on the INbreast database.

Keywords

Mammograms; mass segmentation; convolutional neural network; adaptive scale module; feature refinement module

1 Introduction

Globally, breast cancer is the most common cancer among women and the leading cause of cancer death in women. According to statistics [10], there were 2.09 million new breast cancer cases and 630,000 breast cancer deaths worldwide in 2018. Breast cancer has therefore become a major public health problem. Currently, reducing breast cancer mortality depends on early detection and diagnosis [35].

Participating in regular, high-quality mammography screening is one of the best ways to reduce the risk of premature death from breast cancer [25]. Breast masses are one of the main signs of breast cancer [2]. Mammography is one of the imaging methods proven to reduce breast cancer mortality [7]. It can reveal smaller masses than can be found by clinical breast examination or by patients themselves. Specifically, the median size of masses detected by mammography is 1.0–1.5 cm, whereas the median size of malignant masses detected by self-examination or clinical examination is 2.0–2.5 cm [4]. Consequently, early detection by mammography has a significant survival benefit [45].

Clinical experts can often judge whether a mass is malignant by observing its shape [9]. The characteristics of masses vary from benign to malignant. Benign masses, including cysts, fibroadenomas, and breast hematomas, are usually round or oval, whereas malignant masses, which grow in an abnormal and uncontrolled manner, may be roughly round, irregular, or spiculated. A malignant mass usually appears brighter than the surrounding tissue [37]. In general, the more complex the shape of a mass, the more likely it is to be malignant. In addition, because mammograms can be affected by noise and distortion during acquisition, they have relatively low contrast and the edges of breast masses are blurred [42].

In traditional diagnosis, mass segmentation is usually performed manually. Manual segmentation has the advantage of being easy to adopt in clinical diagnosis, but its quality and efficiency are usually restricted by the following two factors:

Manual segmentation is a long and arduous task. Radiologists must spend considerable time and effort to assess each mass, and misjudgments caused by fatigue inevitably occur. Therefore, the accuracy of the segmentation results is largely affected by human factors.

Even for the same mass with clear features, radiologists with different prior knowledge may reach different diagnoses [31].

Therefore, clinical experts can use computer-aided detection (CAD) technology to improve the overall accuracy of mass detection, segmentation, and classification. By analyzing the visual details of mammograms in depth, CAD reduces false positives and assists radiologists in the early detection of cancer [43]. The identification task is challenging because masses differ in texture, shape, size, and location within the surrounding tissue. Most traditional CAD systems rely on manual identification of breast masses and manual extraction of their features to perform segmentation [11]. In some cases, the high similarity between mass and non-mass tissue, and between benign and malignant tissue, may degrade the segmentation performance of traditional CAD systems based on manual feature extraction [34]. Later, several CAD-based segmentation methods were proposed, including semi-automatic [19, 33] and fully automatic [14, 50] segmentation methods. These methods usually rely heavily on features produced by a series of pre-processing and post-processing steps, which makes them not only sensitive to subjective operations but also inefficient.

In recent years, with the continuous development of artificial intelligence, new algorithms based on deep learning have achieved more objective and accurate breast cancer diagnosis [8]. Deep learning allows a network composed of multiple processing layers to learn data representations with multiple levels of abstraction, and it uses backpropagation to update the network's internal parameters. It is very good at discovering intricate structure in large data sets and has greatly advanced the state of the art in computer vision. Compared with traditional methods, deep learning algorithms can extract high-level features directly from the raw input, which opens up more possibilities for breast cancer diagnosis.

Convolutional Neural Networks (CNNs) are among the earliest successful deep learning methods built on a multi-layer network structure. They perform well in image classification, with examples such as VGGNet [24], GoogLeNet [5], and ResNet [23]. A CNN classifier is an end-to-end model: the image is fed directly into the model, which outputs the classification result. It does not rely on complex pre-processing and post-processing; instead, its parameters are updated through continuous optimization. In 2015, Jonathan Long et al. proposed the Fully Convolutional Network (FCN) for semantic segmentation [21]. FCN classifies images at the pixel level and was the first to introduce deep learning into image semantic segmentation. With the rapid development of deep learning, more semantic segmentation networks have been designed, such as SegNet [46], PSPNet [15], the DeepLab series [27–30], and DANet [17].

SegNet [46] is a deep fully convolutional architecture for scene understanding. It uses a simple encoder-decoder structure and skip-connections to achieve pixel-wise segmentation, with a one-to-one correspondence between encoder and decoder layers. Unlike FCN, SegNet computes and stores the max-pooling indices during max pooling in the encoder and then uses these indices to perform non-linear upsampling of the input in the decoder. This upsampling requires no learning and only takes up some storage space, so SegNet is efficient in memory and computation time during training. Considering that FCN lacks a suitable strategy for exploiting global context, Zhao et al. [15] proposed the Pyramid Scene Parsing Network (PSPNet), which includes a Pyramid Pooling Module. The Pyramid Pooling Module provides more contextual information to the segmentation network: it integrates features at four pyramid scales to fuse the contextual information of different regions, providing a good global prior that helps the network avoid mis-segmentation.

DeepLabV3 [27] improves the atrous spatial pyramid pooling (ASPP) module introduced in DeepLabV2 [28]. The improved ASPP is composed of BN layers and dilated convolutions with different dilation ratios, which capture multi-scale information in cascaded and parallel arrangements. In addition, DeepLabV3 drops the DenseCRF post-processing used in DeepLabV2. DeepLabV3+ [30] adopts an encoder-decoder structure with ResNet or Xception as the backbone network. Its ASPP consists of three dilated convolutions with different dilation ratios, a 1 × 1 convolution layer, and an image-level feature, which together capture contextual information at different scales. The recently proposed DANet [17] is a dual attention network for scene segmentation. Fu et al. designed a position attention module and a channel attention module, both based on the self-attention mechanism, to capture rich contextual dependencies. The position attention module relates similar features regardless of their distance, and the channel attention module enhances the semantic interdependencies along the channel dimension. Finally, the outputs of the two attention modules are summed to enhance the feature representation of the network.

With the continuous development of deep learning, medical image segmentation, a branch of image segmentation, is also constantly evolving, and many deep-learning-based medical image segmentation networks have demonstrated outstanding performance in recent years. U-Net [41] is a classic, fully symmetric fully convolutional network and one of the earliest networks to use multi-scale features for semantic segmentation. U-Net inherits the idea of FCN and is mainly composed of a contracting path, an expansive path, and skip-connections. The contracting path is a conventional convolutional network that uses downsampling to extract feature information. The expansive path uses upsampling and combines the low-level texture information from each layer of the contracting path with the upsampled high-level semantic information to recover detailed information, so that the output gradually returns to the resolution of the original image. U-Net can thus effectively integrate low-level texture information and high-level semantic information and output segmentation results that carry both spatial and semantic information. Considering the uncertainty of the optimal depth of U-Net and the limitations of its skip-connections, U-Net++ [52] embeds an ensemble of U-Nets of varying depths, which share the same encoder, into redesigned skip-connections. This removes U-Net's restriction that only encoder and decoder features at the same level can be fused, which increases the flexibility of feature fusion. In addition, a model pruning strategy is designed to make full use of features at different levels and to improve the flexibility of the network at test time. Deep supervision is also applied to the network outputs to further improve segmentation performance. Recently, Wang et al. proposed Non-local U-Net [51], based on the self-attention mechanism, for biomedical image segmentation. Non-local U-Nets introduce a global aggregation block to overcome the limitation of relying solely on local operators. This block allows the network to fuse global information from feature maps of any size during upsampling or downsampling, which improves the efficiency and effectiveness of segmentation.

In this work, we propose a U-shape adaptive scale network (ASU-Net) for mass segmentation. Our main contributions are the adaptive scale module (ASM) and the feature refinement module (FRM). Generally, the deeper the encoder, the lower the resolution of the feature map, and the apparent size of the mass changes accordingly. Therefore, the feature maps generated by the intermediate layers contain a large amount of multi-scale contextual information. We consider that simply adding a conventional multi-scale extraction module at the multiple feature levels of the encoder cannot make effective use of this multi-scale information. We therefore develop ASM, which allows the network to selectively extract and effectively use the multi-scale features of masses in feature maps of different resolutions. Specifically, ASM uses a softmax function to learn three weights for three convolutions of different scales and uses these weights to recombine the branch outputs, thereby dynamically adjusting the receptive field. In addition, in the U-Net decoder, fusing low-level spatial information and high-level semantic information by concatenation alone may not represent the features of the mass sufficiently. To address this, we propose FRM, which fuses low-level spatial information with high-level semantic information and then adaptively selects features from the two branches. Specifically, FRM computes a weight vector for the concatenated features and uses it to reweight the high-level and low-level features, which adaptively enhances useful feature channels and suppresses interference from irrelevant noise. As a result, the decoder retains more important semantic information.

In summary, the main contributions in this article are the following three points:

We propose a novel ASM to adaptively select the appropriate scale of convolution kernel to extract feature information based on the semantic perception of the input features. In addition, the multi-scale information is aggregated in a non-linear manner, which allows the network to obtain more powerful multi-scale adaptability.

We propose a novel FRM to construct the channel-wise dependencies in a computationally efficient manner, so as to guide the discriminative fusion of low-level and high-level features, which allows the network to adaptively enhance the useful feature channel representation and suppress the interference of irrelevant noise for a more refined segmentation.

We propose a novel ASU-Net for mass segmentation in mammograms, which effectively reduces false positives in mass segmentation. ASU-Net achieves a DI of 91.41% on the DDSM-BCRP database and 93.55% on the INbreast database.

The rest of the paper is organized as follows. Section 2 reviews the literature on mass segmentation in mammograms. Section 3 describes the proposed method. Section 4 presents the experimental procedure, results, and analysis. Section 5 concludes the paper.

2 Related work

In recent years, the development of deep-learning-based semantic segmentation of medical images has benefited research on breast mass segmentation. Dhungel et al. developed a series of CNN-based breast mass segmentation algorithms [38–40]. Their first work [38] proposes a method for breast mass segmentation in mammograms whose training follows the structured support vector machine (SSVM), with a cutting-plane algorithm used to optimize learning. This method combines three potential functions, including one based on a Deep Belief Network (DBN), to improve segmentation accuracy. Its experiments show that applying structured learning and deep learning to breast mass segmentation can produce competitive results. However, the method relies heavily on pre-processing to improve the contrast of the input image. Their second method [39] explores statistical learning with a Conditional Random Field (CRF) that combines several potential functions based on deep learning, a Gaussian Mixture Model (GMM), and shape priors. Its main innovation is the use of tree re-weighted belief propagation (TRW) for inference with supervised-learning features, which demonstrates the applicability of TRW to breast mass segmentation. These inference methods and potential functions improve segmentation performance and efficiency, but the results still depend on pre-processing. In their last work [40], in addition to the GMM and shape prior, CNN and DBN models are used as deep learning models to segment breast masses in mammograms, and their segmentation results are fed into a structured prediction model. This model uses CRF and SSVM as loss-minimization parameter learning algorithms to regularize the deep learning models and outputs the final segmentation through inference and training. The work shows that, of the two structured learning methods, CRF is faster in both inference and training. However, these two types of post-processing do not allow end-to-end training and increase the number of model parameters. Moreover, the results show that image pre-processing is very important and that improvements in DI still rely on it.

Zhu et al. [48] proposed a semantic segmentation model based on FCN that integrates a position prior of the mass, and then used CRF for structured learning on the model output. This method also uses adversarial training as a strong regularizer to reduce overfitting on small training sets. However, the algorithm relies to a large extent on cumbersome pre-processing and post-processing, which makes the model less robust. Recently, Li et al. proposed CRU-Net [13], which integrates residual learning into U-Net by using layers that explicitly learn residual mappings to alleviate vanishing and exploding gradients. In addition, CRF is employed to enforce label consistency among similar pixels, exploiting probabilistic graphical modeling to achieve a more refined segmentation. However, CRF has a high computational cost and increases the complexity of the network.

3 Method

3.1 Adaptive scale module (ASM)

The encoder-decoder structure of U-Net combines shallow spatial information and deep semantic information through skip-connections. In the early stages of the encoder, the receptive field is small, so U-Net captures fine spatial information but limited semantic information. As the network deepens, the receptive field grows and the network obtains more high-level semantic information, which indicates the position of the segmentation target in the entire image and reflects the semantic relationships between contexts. With skip-connections, the network fuses low-level spatial information with high-level contextual information to obtain high-resolution feature representations with high-level semantics.

Because breast masses vary in size and shape, they usually exhibit multi-scale features. Moreover, as the network deepens, each downsampling step produces feature representations with a different spatial resolution, and the scale of the segmentation target changes accordingly. However, directly fusing low-level spatial features with high-level semantic features ignores multi-scale contextual information. Inspired by the multi-level strategy in [6], we adopt a multi-level feature extraction strategy to better capture low-level texture information. In particular, the edge information of masses is critical for accurate mass segmentation.

Therefore, conventional multi-scale feature extraction modules, such as ASPP, could be embedded at different feature levels to extract multi-scale information from feature maps of different resolutions. However, because the feature maps at different levels have different spatial resolutions, the network needs convolution kernels of different sizes at each level. ASPP does not adjust the kernel size to the scale information of the actual input: every convolution branch in ASPP contributes equally, which prevents the network from capturing more discriminative features. For example, when a relatively large-scale convolution kernel samples a feature map containing small masses, ASPP cannot adaptively select a receptive field of appropriate scale, so useless information may be introduced into the feature map and interfere with the segmentation results. Furthermore, aggregating multi-scale information from different branches with linear fusion, such as concatenation or addition, may not give the network sufficiently strong multi-scale adaptability [49].

Motivated by these problems, we design an adaptive scale module (ASM), whose structure is shown in Fig. 1. Unlike U-Net, ASM selects convolution kernels of a suitable scale according to the scale of the masses in feature maps of different resolutions, which extracts feature information at different scales more effectively, and it aggregates multi-scale information in a non-linear manner. ASM is embedded into each level of the skip-connections in ASU-Net to improve segmentation performance. ASM consists of a Multi-scale Feature Selection Branch and a Global Feature Extraction Branch.

Fig. 1

Adaptive scale module (ASM) contains Multi-scale Feature Selection Branch and Global Feature Extraction Branch.

Multi-scale Feature Selection Branch: This branch contains three convolution kernels with different effective sizes, which provide multiple receptive fields and extract multi-scale contextual information. Compared with ASPP, the Multi-scale Feature Selection Branch can cope with the changes in feature map resolution caused by downsampling. According to the input, it adaptively selects among three branches whose convolution kernels have different dilation ratios and extracts the contextual information of the corresponding feature map. The network can therefore adaptively weight features of different scales to adjust the receptive field during inference, instead of linearly aggregating three fixed convolution kernels to extract multi-scale features.

We use three 3 × 3 kernels with different dilation ratios to convolve the input $X \in \mathbb{R}^{C \times H \times W}$. The dilation ratios are set to 6, 12, and 18 [27, 30]. The three convolution operations $\hat{T}: X \rightarrow A \in \mathbb{R}^{C \times H \times W}$, $\tilde{T}: X \rightarrow B \in \mathbb{R}^{C \times H \times W}$, and $\bar{T}: X \rightarrow C \in \mathbb{R}^{C \times H \times W}$ generate A, B, and C, which carry information at different scales. Each of $\hat{T}$, $\tilde{T}$, and $\bar{T}$ includes BN [44] and a ReLU [26] activation. To use the multi-scale feature information to guide the subsequent adaptive selection of the receptive field, we fuse the three branch outputs by element-wise summation to obtain $X' \in \mathbb{R}^{C \times H \times W}$, as expressed in Equation (1):

$X' = A + B + C$ (1)
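To make the branch construction concrete, the following NumPy sketch implements a 3 × 3 dilated convolution with zero "same" padding and the element-wise sum of Equation (1). It illustrates the idea under simplifying assumptions and is not the authors' implementation: each branch applies a single 3 × 3 kernel to every channel independently (depthwise) instead of a full C → C convolution, BN is omitted, and all function names are hypothetical.

```python
import numpy as np


def effective_kernel_size(k, dilation):
    """Span in pixels covered by a k x k kernel with the given dilation ratio."""
    return dilation * (k - 1) + 1


def dilated_conv2d_same(x, w, dilation):
    """3x3 dilated cross-correlation of one (H, W) map with zero 'same' padding."""
    k = w.shape[0]
    pad = dilation * (k - 1) // 2              # keeps the output at H x W
    xp = np.pad(x, pad)                        # zero padding on all sides
    h, wd = x.shape
    out = np.zeros((h, wd), dtype=float)
    for i in range(k):
        for j in range(k):
            # each kernel tap reads the input `dilation` pixels away from its neighbours
            out += w[i, j] * xp[i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
    return out


def multi_scale_branches(x, kernels, dilations=(6, 12, 18)):
    """x: (C, H, W). Returns the branch outputs A, B, C and X' = A + B + C (Eq. 1).

    Simplification (assumption): one 3x3 kernel per branch, applied to every
    channel independently; BN omitted, ReLU kept.
    """
    outs = []
    for w, d in zip(kernels, dilations):
        y = np.stack([dilated_conv2d_same(ch, w, d) for ch in x])
        outs.append(np.maximum(y, 0.0))        # ReLU
    a_map, b_map, c_map = outs
    return a_map, b_map, c_map, a_map + b_map + c_map
```

With dilation ratios 6, 12, and 18, each 3 × 3 kernel spans 13, 25, and 37 pixels, respectively, while keeping the parameter count of a 3 × 3 kernel.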

Then X′ is passed through a global average pooling layer to obtain the global information $G \in \mathbb{R}^{C \times 1 \times 1}$; that is, the H × W spatial dimensions of each of the C channels are compressed to 1 × 1. G is then fed into a fully connected layer that prepares the guidance for adaptive selection. This layer reduces the dimensionality to lower the parameter overhead, producing the compressed feature $S \in \mathbb{R}^{C/d \times 1 \times 1}$, where the hyperparameter d is the reduction ratio. S aggregates the feature information into fewer channels, which not only reduces the number of network parameters but also makes training and feature extraction more efficient. In all our experiments, d is set to 16 [18]. See Section 4.3.3 for a discussion of the value of d.
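The squeeze step can be sketched in a few lines of NumPy. The sketch assumes a ReLU after the dimension-reducing fully connected layer (the text does not name the activation) and uses hypothetical names; it mainly shows the shapes involved and why the reduction ratio d cuts the weight count of this layer by a factor of d.

```python
import numpy as np


def squeeze(x_prime, w_fc, b_fc):
    """x_prime: (C, H, W) -> G: (C,) by global average pooling, S: (C // d,) by one FC layer."""
    g = x_prime.mean(axis=(1, 2))          # G: each channel's H x W map averaged to one value
    s = np.maximum(w_fc @ g + b_fc, 0.0)   # FC (C -> C/d) followed by an assumed ReLU
    return g, s


def fc_weight_count(c, d):
    """Weights of the reducing FC layer (C -> C/d), biases excluded."""
    return c * (c // d)
```

For C = 64 and d = 16, the layer has 64 × 4 = 256 weights instead of 64 × 64 = 4096 without reduction.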

Next, S guides the adaptive selection of the convolution kernel size. Specifically, three parallel fully connected layers map S to $a, b, c \in \mathbb{R}^{C \times 1 \times 1}$, which correspond to the three receptive field sizes at this feature level. A softmax function is then applied across a, b, and c to compute their relative weights, producing the attention vectors a′, b′, and c′, which serve as the soft attention vectors of A, B, and C, respectively, and are used to dynamically adjust the receptive field. They are computed as in Equations (2), (3), and (4):

$a_i' = \frac{e^{a_i}}{e^{a_i} + e^{b_i} + e^{c_i}}$ (2)

$b_i' = \frac{e^{b_i}}{e^{a_i} + e^{b_i} + e^{c_i}}$ (3)

$c_i' = \frac{e^{c_i}}{e^{a_i} + e^{b_i} + e^{c_i}}$ (4)

Where $a_i$ is the ith element of a and $a_i'$ is the ith element of a′; $b_i$, $c_i$, $b_i'$, and $c_i'$ are defined likewise.

Then a′, b′, and c′ weight A, B, and C, respectively, to obtain A′, B′, and C′. Finally, the weighted features are summed element-wise, producing the feature map X″ after adaptive scale selection, as expressed in Equation (5):

$X_i'' = A_i' + B_i' + C_i' = A_i \cdot a_i' + B_i \cdot b_i' + C_i \cdot c_i'$ (5)

Where $a_i' + b_i' + c_i' = 1$; $A_i$, $B_i$, and $C_i$ are the ith channels of A, B, and C; $X'' = [X_1'', X_2'', \dots, X_i'', \dots, X_C'']$; and $X_i'' \in \mathbb{R}^{H \times W}$ denotes the ith channel of the feature map after adaptive scale selection.
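Equations (2)–(5) amount to a per-channel softmax over the three branches followed by a weighted sum. The NumPy sketch below assumes S has already been computed, uses hypothetical weight matrices for the three parallel fully connected layers (biases omitted), and subtracts the per-channel maximum before exponentiating, a standard numerical-stability step that leaves the softmax unchanged.

```python
import numpy as np


def branch_softmax(a, b, c):
    """Eqs. (2)-(4): softmax over the three branch logits, separately for each channel i."""
    logits = np.stack([a, b, c])                # (3, C)
    logits = logits - logits.max(axis=0)        # stability shift; does not change the result
    e = np.exp(logits)
    w = e / e.sum(axis=0)                       # each column sums to 1
    return w[0], w[1], w[2]


def select_and_fuse(A, B, C, s, W_a, W_b, W_c):
    """A, B, C: (C, H, W) branch outputs; s: (C/d,); W_*: (C, C/d) FC weights."""
    a, b, c = W_a @ s, W_b @ s, W_c @ s         # three parallel FC layers -> (C,) each
    a_w, b_w, c_w = branch_softmax(a, b, c)
    # Eq. (5): broadcast each channel weight over H x W and sum the weighted branches
    return A * a_w[:, None, None] + B * b_w[:, None, None] + C * c_w[:, None, None]
```

Because the three weights of each channel sum to 1, every output channel is a convex combination of the corresponding branch channels: identical branches pass through unchanged, and the module redistributes rather than amplifies the branch responses.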

Global Feature Extraction Branch: This branch includes a global average pooling layer, a 1 × 1 convolutional layer, and bilinear interpolation. In the output features of each encoder stage, every learned filter operates on a local receptive field, so each unit of the output cannot exploit contextual information outside that region. Inspired by ParseNet [47] and PSPNet [15], we capture the global contextual semantic information of the feature with a global average pooling layer. A 1 × 1 convolutional layer then refines the global features, and bilinear interpolation restores the feature map carrying global context to the original resolution, producing the feature map Y with global information. This process can be expressed as Equation (6):

$Y = \eta(W(F_{GAP}(X)))$ (6)

Where F_GAP denotes a global average pooling layer, W denotes a 1 × 1 convolutional layer, and η denotes the bilinear interpolation.
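A NumPy sketch of Equation (6) follows, with hypothetical names. After global average pooling the map is 1 × 1, and bilinear interpolation of a single pixel to H × W yields that value at every position, so the upsampling step reduces to broadcasting. The sketch uses that shortcut and treats the 1 × 1 convolution on a 1 × 1 map as a matrix-vector product; any BN or activation in this branch is omitted.

```python
import numpy as np


def global_branch(x, w_1x1, b_1x1):
    """Eq. (6): Y = eta(W(F_GAP(X))). x: (C, H, W); w_1x1: (C, C); b_1x1: (C,)."""
    g = x.mean(axis=(1, 2))                    # F_GAP: (C,)
    z = w_1x1 @ g + b_1x1                      # 1x1 conv on a 1x1 map = matrix-vector product
    # eta: bilinear upsampling of a 1x1 map gives a constant H x W map per channel
    return np.broadcast_to(z[:, None, None], x.shape).copy()
```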

In ASM, the Multi-scale Feature Selection Branch and the Global Feature Extraction Branch are parallel. Their outputs are concatenated with the input X to enhance its global contextual representation. The overall process can be summarized as Equations (7), (8), and (9):

$X'' = B_s(X)$ (7)

$Y = B_e(X)$ (8)

$\hat{X} = \mathrm{CONCAT}[X'', Y, X]$ (9)

Where B_s (X) denotes the process that the input X obtains the feature map X″ ∈ R^C×H×W via the Multi-scale Feature Selection Branch, B_e (X) denotes the process that the input X obtains the feature map Y ∈ R^C×H×W via the Global Feature Extraction Branch, and CONCAT[X″, Y, X] denotes the concatenation of X″, Y and the input X to obtain $\hat{X} \in R^{3 C \times H \times W}$ .

Finally, a 1 × 1 convolutional layer restores the number of channels of $\hat{X}$ to that of the input X, as shown in Equation (10):

$\tilde{X} = \tilde{W}(\hat{X})$ (10)

Where $\tilde{W} \in R^{C \times 3 C}$ refers to a 1 × 1 convolution layer.
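Equations (9) and (10) can be sketched as a channel-wise concatenation followed by a 1 × 1 convolution, which is a per-pixel linear map over channels and can be written with `np.einsum`. The names are hypothetical and the bias is omitted.

```python
import numpy as np


def concat_and_project(x_sel, y_glob, x, w_tilde):
    """x_sel, y_glob, x: (C, H, W); w_tilde: (C, 3C) weights of the 1x1 convolution."""
    x_hat = np.concatenate([x_sel, y_glob, x], axis=0)     # Eq. (9): (3C, H, W)
    return np.einsum("oc,chw->ohw", w_tilde, x_hat)        # Eq. (10): back to (C, H, W)
```

In this sketch, a $\tilde{W}$ that selects only the last C input channels returns X unchanged, which shows that the concatenation keeps an identity path to the original input alongside the two branch outputs.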

In summary, the feature map $\tilde{X}$ contains multi-scale information weighted by adaptive scale selection as well as spatial information with global context. Embedding ASM not only allows the network to capture the multi-scale features of masses but also gives it strong multi-scale adaptability. This enhances the flexibility and adaptability of the network in extracting multi-scale features and improves its segmentation performance for breast masses. Notably, ASM is a plug-in module that can be inserted after any feature map to enhance its representation.

3.2 Feature refinement module (FRM)

In the U-Net decoder, repeated upsampling gradually restores the feature maps containing high-level semantic information to the original resolution. After each upsampling, the upsampled high-level features are concatenated directly with the corresponding low-level features from the encoder, and two 3 × 3 convolutional layers then learn from the fused features. This simple encoder-decoder structure fuses low-level spatial information with high-level semantic information to achieve segmentation.

However, using only convolution operations to refine the fusion of the two types of features may not be sufficient. During feature fusion, suppressing channels that carry useless information and enhancing channels that carry useful information can strengthen the feature representation.

Inspired by the channel attention module [18], we design the feature refinement module (FRM) and embed it at each feature level of the ASU-Net decoder. The structure of FRM is shown in Fig. 2. First, FRM concatenates the low-level spatial features generated by the encoder with the high-level features generated by the decoder. Then, a reweighting mechanism models, in a computationally efficient manner, the relationships between low-level feature channels carrying discriminative features and high-level feature channels with strong semantic consistency. Through FRM, the decoder captures channel-wise dependencies and applies attention to selected channels of the feature map, enhancing useful features and suppressing useless ones, so that the network pays more attention to useful feature information.

Fig. 2

The structure of the feature refinement module (FRM). The low-level feature map and the high-level feature map are used as input. The channel-wise relationships of the features in the two branches are captured through the FRM, which strengthens the feature representation of the useful channel information.

Because the features of the two branches differ in their level of representation, after concatenating them we refine the result with a 3 × 3 convolutional layer followed by BN. The network then aggregates global contextual information via a global average pooling layer, producing the global feature map $X \in \mathbb{R}^{2C \times 1 \times 1}$. This process can be expressed as Equation (11):

$x_c = F_{GAP}(M_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} M_c(i, j), \quad c \in [1, 2C]$ (11)

Where the concatenated feature map has a total of 2C channels, M_c ∈ R^1×H×W denotes the cth element of the feature map M ∈ R^2C×H×W obtained after 3 × 3 convolutional layers, F_GAP denotes the global average pooling layer, and x_c ∈ R^1×1×1 denotes the cth element of the global feature map X ∈ R^2C×1×1.

The global feature map X is refined by two fully connected layers, producing the feature map $S \in \mathbb{R}^{2C \times 1 \times 1}$. The Sigmoid function then maps S to a channel attention vector containing one weight in (0, 1) per channel. Finally, the feature map M is multiplied channel-wise by this attention vector to obtain the reweighted feature map $Y \in \mathbb{R}^{2C \times H \times W}$. This process can be expressed as Equation (12):

$Y = M \cdot \delta(W_{2} \gamma(W_{1} X))$ (12)

where $W_{1} \in \mathbb{R}^{2C \times 2C}$ and $W_{2} \in \mathbb{R}^{2C \times 2C}$ denote the weights of the two fully connected layers, γ denotes the ReLU function, δ denotes the Sigmoid function, and · denotes channel-wise multiplication, in which each channel of M is scaled by its attention weight.
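The reweighting of Equations (11) and (12) can be illustrated with a minimal NumPy sketch. It assumes the 3 × 3 convolution and BN have already been applied (the toy input M simply stands in for their output), omits biases in the fully connected layers, and uses random placeholder values for W₁ and W₂; it shows the mechanics of the channel attention, not the authors' trained module.

```python
import numpy as np


def frm_channel_attention(M, W1, W2):
    """Channel reweighting of Eqs. (11)-(12).

    M  : (2C, H, W) feature map; stands in for the output of the
         3x3 conv + BN applied to the concatenated features.
    W1 : (2C, 2C) weights of the first fully connected layer (no bias).
    W2 : (2C, 2C) weights of the second fully connected layer (no bias).
    """
    # Eq. (11): global average pooling over H and W -> X with shape (2C,)
    X = M.mean(axis=(1, 2))
    # first FC layer followed by ReLU (gamma)
    hidden = np.maximum(W1 @ X, 0.0)
    # second FC layer gives S; Sigmoid (delta) maps it to weights in (0, 1)
    weights = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))
    # Eq. (12): scale each channel of M by its attention weight
    Y = M * weights[:, None, None]
    return Y, weights


rng = np.random.default_rng(0)
C, H, W = 4, 16, 16  # toy sizes, so 2C = 8 concatenated channels
low_level = rng.standard_normal((C, H, W))   # encoder (skip) features
high_level = rng.standard_normal((C, H, W))  # upsampled decoder features
M = np.concatenate([low_level, high_level], axis=0)  # conv + BN omitted
W1 = 0.1 * rng.standard_normal((2 * C, 2 * C))
W2 = 0.1 * rng.standard_normal((2 * C, 2 * C))
Y, w = frm_channel_attention(M, W1, W2)
print(Y.shape, np.round(w, 3))
```

Because the weights are random, the printed values only demonstrate the data flow; in a trained network W₁ and W₂ are learned, so channels with informative content receive weights closer to 1.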

3.3 Network architecture

Based on the adaptive scale module (ASM) and the feature refinement module (FRM), we propose the U-shape adaptive scale network (ASU-Net) for mass segmentation. The network architecture is shown in Fig. 3.

Fig. 3

The structure of ASU-Net. At each skip-connection level, ASM adaptively selects encoder features of the appropriate scale for masses of different sizes, which improves the network's ability to extract multi-scale information. In the decoder, FRM adds channel attention to the useful channels, which enhances the feature representation ability of the network.

ASU-Net adopts an encoder-decoder structure that extends the standard U-Net model, and its two main components are ASM and FRM. Based on extensive experiments (Section 4.3.1), we use a pre-trained ResNet34 [23] as the backbone. ASM is embedded in the multi-level skip-connections to guide the aggregation of multi-scale information in a non-linear manner, so that the network can make more effective use of the multi-scale features of masses. FRM replaces the feature fusion method of the standard U-Net, which enhances the feature representation capability of the network.
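To make the resolutions in this encoder-decoder concrete, the sketch below traces the output shapes of a standard ResNet34 encoder (stage names follow the torchvision implementation) for the 256 × 256 ROIs described in Section 4.1. Which stages feed ASM and FRM, and the assumption that a decoder level concatenates C skip channels with C upsampled channels, are hypothetical choices for illustration rather than details confirmed for ASU-Net.

```python
# Hypothetical shape trace for a 256x256 ROI through a torchvision-style
# ResNet34 encoder. Which of these stages feed ASM/FRM is an assumption.
RESNET34_STAGES = [
    # (stage name, output channels, total downsampling factor)
    ("conv1", 64, 2),    # 7x7 conv, stride 2
    ("layer1", 64, 4),   # after the stride-2 max-pool
    ("layer2", 128, 8),
    ("layer3", 256, 16),
    ("layer4", 512, 32),
]


def encoder_shapes(input_size=256):
    """Return (channels, height, width) of each encoder stage output."""
    return {
        name: (channels, input_size // stride, input_size // stride)
        for name, channels, stride in RESNET34_STAGES
    }


def frm_input_channels(skip_channels, decoder_channels):
    """Channels entering FRM after concatenating skip and decoder features.

    If both inputs carry C channels, this is the 2C of Eq. (11).
    """
    return skip_channels + decoder_channels


for name, shape in encoder_shapes().items():
    print(name, shape)
```

Under these assumptions, the deepest encoder feature for a 256 × 256 ROI is 512 × 8 × 8, and a decoder level that merges two 256-channel inputs passes 512 channels into FRM.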

4 Experiment

4.1 Database

In this work, two commonly used public mammogram databases, the DDSM-BCRP database [36] and the INbreast database [16], are used to evaluate the performance of ASU-Net. The DDSM-BCRP database contains two subsets, mass and calcification. The mass subset contains 316 mammograms from 79 cases, of which 160 mammograms contain annotated breast masses. The annotations provided by the DDSM-BCRP database are generally regarded as inaccurate [1, 16], which may make the segmentation results of a model unstable. Therefore, we select the mass subset of the DDSM-BCRP database to tackle this problem, using 80 mammograms for training and 80 for testing. The INbreast database contains 410 mammograms from 115 cases, of which 116 mammograms contain annotated breast masses. To facilitate comparison with previous methods, we extract regions of interest (ROIs) as patches centered on the masses. Considering the GPU memory and the input size of the model, we set the ROI resolution to 256 × 256 pixels. Finally, following the settings in [12, 48], we divide the ROIs into a training set and a testing set. The division is shown in Table 1.
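The ROI preparation can be illustrated with a minimal NumPy sketch, assuming a binary mass mask is available for each mammogram. The square crop around the mass bounding box, the 20% context `margin`, and nearest-neighbour resizing are hypothetical choices; the text above does not specify how the patches were cropped or resized.

```python
import numpy as np


def extract_roi(image, mask, out_size=256, margin=0.2):
    """Crop a square patch centred on the mass and resize it to out_size x out_size.

    `margin` (extra context around the mass bounding box) is a hypothetical choice.
    """
    ys, xs = np.nonzero(mask)
    # centre of the mass bounding box
    cy = (ys.min() + ys.max()) / 2.0
    cx = (xs.min() + xs.max()) / 2.0
    # square side: longest bounding-box edge plus a margin, capped by the image size
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    side = int(np.ceil(max(height, width) * (1.0 + margin)))
    side = min(side, image.shape[0], image.shape[1])
    # top-left corner, clamped so the square stays inside the image
    y0 = int(round(cy - side / 2.0))
    x0 = int(round(cx - side / 2.0))
    y0 = min(max(y0, 0), image.shape[0] - side)
    x0 = min(max(x0, 0), image.shape[1] - side)
    img_patch = image[y0:y0 + side, x0:x0 + side]
    mask_patch = mask[y0:y0 + side, x0:x0 + side]
    # nearest-neighbour resize by index sampling (keeps mask labels binary)
    idx = (np.arange(out_size) * side) // out_size
    return img_patch[np.ix_(idx, idx)], mask_patch[np.ix_(idx, idx)]


image = np.random.default_rng(0).random((1000, 800))
mask = np.zeros((1000, 800), dtype=np.uint8)
mask[400:500, 300:380] = 1  # toy rectangular "mass"
roi, roi_mask = extract_roi(image, mask)
print(roi.shape, roi_mask.mean())
```

Nearest-neighbour sampling is used here only because it keeps the mask strictly binary; an interpolating resize for the image combined with nearest-neighbour for the mask would be an equally reasonable choice.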

Table 1
The results of the division of training set and testing set

Database	Training set	Testing set
DDSM-BCRP	80 ROIs	80 ROIs
INbreast	58 ROIs	58 ROIs

4.2 Experimental configurations

ASU-Net is implemented in the PyTorch deep learning framework and run on a machine with an Intel Core i5 CPU, 8 GB of RAM, and an NVIDIA GTX 1080 GPU with 8 GB of memory. The network is trained with the cross-entropy loss between the network output and the ground truth. It is optimized with stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001; the initial learning rate is 0.05 and is decayed by a factor of 10 every 5 epochs.

The DI is a common evaluation metric in medical image segmentation that measures the degree of overlap between two samples; it is used, for example, in the related works [12, 48]. We also use the DI to evaluate our method. The DI is defined as Equation (13):

$DI = \frac{2 \times TP}{2 \times TP + FP + FN}$ (13)

where TP is the number of mass pixels correctly predicted as mass, FP is the number of surrounding-tissue pixels incorrectly predicted as mass, and FN is the number of mass pixels incorrectly predicted as surrounding tissue.
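A minimal sketch of the learning-rate schedule and of Equation (13) is given below in plain Python and NumPy. In PyTorch, the described configuration would presumably correspond to `torch.optim.SGD(params, lr=0.05, momentum=0.9, weight_decay=1e-4)` combined with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)`; counting epochs from 0 and scoring two empty masks as a perfect match are our assumptions.

```python
import numpy as np


def step_lr(epoch, base_lr=0.05, step_size=5, gamma=0.1):
    """Learning rate decayed by a factor of 10 every 5 epochs (epochs counted from 0)."""
    return base_lr * gamma ** (epoch // step_size)


def dice_index(pred, gt):
    """Dice Index of Eq. (13) for binary masks (1 = mass, 0 = surrounding tissue)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()    # mass predicted as mass
    fp = np.logical_and(pred, ~gt).sum()   # tissue predicted as mass
    fn = np.logical_and(~pred, gt).sum()   # mass predicted as tissue
    denom = 2 * tp + fp + fn
    # assumption: two empty masks count as perfect agreement
    return 1.0 if denom == 0 else 2.0 * tp / denom


print([step_lr(e) for e in (0, 4, 5, 10)])
print(dice_index([1, 1, 0, 0], [1, 0, 1, 0]))  # TP=1, FP=1, FN=1 -> 0.5
```

The DI values reported in the tables are percentages, i.e. this ratio multiplied by 100.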

4.3 Ablation studies

4.3.1 Ablation study of backbone network

In segmentation networks, the encoder extracts image features. Encoders are structurally similar, and most are derived from image classification networks such as VGGNet [24] and the ResNet [23] series. A classification network is used because it can be pre-trained on a large database and its weights then transferred to the segmentation network, so that better segmentation results can be achieved through transfer learning. The ResNet series alleviates the vanishing and exploding gradient problems and the degradation problem that arise as the number of network layers increases, so we use ResNet models to replace the encoder of U-Net. Because it is crucial to choose an encoder suited to the segmentation task, we use each network in the ResNet series as the backbone and build the corresponding segmentation network based on ASU-Net. The experimental results are shown in Table 2.

Table 2
The DI (%) of ASU-Net with different ResNet backbones

Method	Backbone	DDSM-BCRP	INbreast
ASU-Net	ResNet18	91.27 ± 0.0833	93.06 ± 0.0361
	ResNet34	91.41 ± 0.0416	93.55 ± 0.0351
	ResNet50	91.22 ± 0.1266	93.27 ± 0.0917
	ResNet101	91.19 ± 0.0854	93.51 ± 0.0115
	ResNet152	91.21 ± 0.0777	93.43 ± 0.1222

As shown in Table 2, the segmentation network using ResNet34 as the backbone obtains the highest DI on both the DDSM-BCRP database and the INbreast database, and the networks using ResNet18, ResNet50, ResNet101, and ResNet152 all perform worse. The comparison in Table 2 thus reveals that the segmentation performance of ASU-Net does not improve as the number of network layers increases. In general, the depth of a CNN affects its learning ability, and deeper CNNs tend to learn more powerful representations. However, a larger training set is usually required for a deeper CNN to generalize well [3], and large amounts of accurately labeled data are lacking for deep-learning-based breast mass segmentation. Therefore, an appropriately sized CNN must be selected as the encoder for breast mass segmentation, and we adopt ResNet34 as the encoder of ASU-Net.

4.3.2 Ablation study of the multiscale feature extraction module

This section discusses how multi-scale information can be embedded most effectively at the feature levels and evaluates the ASM. As introduced in Section 3.1, following the multi-level strategy, the multi-scale feature extraction module is embedded at multiple feature levels of ASU-Net. We compare two embedding strategies to verify how well each module extracts multi-scale information from feature maps of different resolutions. The first embeds the conventional multi-scale feature extraction module ASPP into each layer of the skip-connections, forming RU-Net (ASPP); the second embeds ASM into each layer of the skip-connections, forming ASU-Net*. The experimental results are shown in Table 3.

Table 3
The DI (%) achieved by using two different multi-scale feature extraction modules, ASPP and ASM

Module	DDSM-BCRP	INbreast
RU-Net(ASPP)	91.21 ± 0.0379	92.94 ± 0.2991
ASU-Net*	91.34 ± 0.0346	93.32 ± 0.0985

Table 3 shows that ASU-Net* outperforms RU-Net (ASPP) on both the DDSM-BCRP database and the INbreast database. On the INbreast database, ASM increases the DI from 92.94% to 93.32%, a clearer improvement, whereas on the DDSM-BCRP database ASM is only 0.13% better than ASPP. We speculate that the smaller gain on DDSM-BCRP may be due to the inaccurate annotations of that database. The results in Table 3 indicate that using ASM to extract multi-scale features achieves better segmentation performance. We attribute this to ASM adaptively guiding the aggregation of multi-scale features in a non-linear manner, which allows it to extract features at different scales effectively.

4.3.3 Ablation study for hyperparameter d

In this section, we discuss the hyperparameter d introduced in Section 3.1. The hyperparameter d controls the parameter capacity and computational cost of ASM. Following the experimental settings of SENet [18], we set d in ASM to four values: 4, 8, 16, and 32. To study the trade-off between capacity and performance, we conduct a series of ablation experiments on ASU-Net with these values of d.

As illustrated in Table 4, the segmentation performance of the network does not improve as the parameter capacity increases (that is, as d decreases). We speculate that this may be because ASM overfits the inter-channel correlations in the training set. Notably, d = 16 provides a good compromise between segmentation performance and computational complexity: it reduces the number of parameters of the network without reducing performance, thereby saving computational cost. Therefore, we set d to 16.
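To illustrate how d trades capacity for cost, the sketch below counts the weights of an SE-style bottleneck (C → C/d → C, two bias-free fully connected layers, as in SENet [18]). Modelling the attention branch of ASM this way, and using the ResNet34 stage widths 64, 128, 256, and 512 as the channel counts, are assumptions; the exact layout of ASM is given in Section 3.1 and may differ.

```python
# Assumption: the attention branch of ASM is modelled as an SE-style
# bottleneck C -> C/d -> C built from two bias-free fully connected layers.
def bottleneck_params(channels, d):
    hidden = max(channels // d, 1)
    return channels * hidden + hidden * channels


# Hypothetical skip-level widths (the ResNet34 stage widths).
SKIP_CHANNELS = [64, 128, 256, 512]


def total_params(d):
    """Attention weights summed over all assumed skip levels."""
    return sum(bottleneck_params(c, d) for c in SKIP_CHANNELS)


for d in (4, 8, 16, 32):
    print(d, total_params(d))
```

Under these assumptions the count is inversely proportional to d, so halving d doubles the attention weights; this is why a larger d is cheaper, and why d = 16 can save parameters relative to d = 4 or 8.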

Table 4
The DI (%) for different values of d in ASM

Reduction ratio d	DDSM-BCRP	INbreast
4	91.39 ± 0.0002	93.20 ± 0.0529
8	91.33 ± 0.0404	93.26 ± 0.1137
16	91.41 ± 0.0416	93.55 ± 0.0351
32	91.35 ± 0.0709	93.13 ± 0.0808

4.3.4 Ablation study for FRM

FRM is embedded after each feature level of the decoder of ASU-Net. As described in Section 3.2, after the low-level and high-level features are concatenated, a channel attention mechanism refines the features and enhances the representation of useful information. To verify the effectiveness of FRM, we use ASU-Net*, which retains only the ASM, as the baseline for the FRM ablation. The experimental results are shown in Table 5.

Table 5
The DI (%) of the two networks in the DDSM-BCRP database and the INbreast database

Module	DDSM-BCRP	INbreast
ASU-Net*	91.34 ± 0.0346	93.32 ± 0.0985
ASU-Net	91.41 ± 0.0416	93.55 ± 0.0351

Table 5 compares the segmentation performance of ASU-Net* and ASU-Net. ASU-Net achieves a higher DI than ASU-Net* on both databases. In particular, on the INbreast database, ASU-Net outperforms ASU-Net* by 0.23% in terms of the DI, which suggests that FRM helps the network focus on useful feature information. We also count the parameters of each network. As shown in Table 7, embedding FRM increases the parameter count by only about 3.93 M (from 38.14 M for ASU-Net* to 42.07 M for ASU-Net). In exchange for this slight increase in parameters, FRM improves the DI by 0.07% on the DDSM-BCRP database and by 0.23% on the INbreast database, which indicates that FRM is effective.

Table 6

The DI (%) of ASU-Net and recent segmentation methods

Reference	DDSM-BCRP	INbreast
Cardoso et al. [22]	88.00	N/A
Beller et al. [32]	N/A	70.00
U-Net [41]	90.82 ± 0.0709	91.48 ± 0.2476
Dhungel et al. [38]	90.00	90.00
Zhu et al. [48]	89.36 ± 0.3700	90.62 ± 0.1600
Li et al. [13]	90.95 ± 0.2600	93.32 ± 0.1200
ASU-Net	91.41 ± 0.0416	93.55 ± 0.0351

Table 7

Params (M) and FLOPs (G) of each network

Module	U-Net	RU-Net	RU-Net (ASPP)	ASU-Net*	ASU-Net
Params (M)	31.04	24.53	36.09	38.14	42.07
FLOPs (G)	54.69	10.92	18.92	21.51	25.75

4.4 Comparison with other methods

To demonstrate the advantages of ASU-Net, we quantitatively compare it with several recent breast mass segmentation methods. The comparison results are shown in Table 6.

As shown in Table 6, ASU-Net achieves the best DI on both databases: 91.41% on the DDSM-BCRP database and 93.55% on the INbreast database. Compared with the U-Net baseline, ASU-Net increases the DI from 90.82% to 91.41% on the DDSM-BCRP database and from 91.48% to 93.55% on the INbreast database. This result illustrates the performance improvement brought by ASM and FRM: ASM enhances the ability of the network to adaptively extract multi-scale features of masses, and FRM helps the network focus on useful feature information. ASU-Net also outperforms the other compared state-of-the-art methods on both databases, which supports its effectiveness.

4.5 Computational complexity analysis

In this section, we analyze the computational complexity and the number of parameters of the baseline, the backbone, and the networks used in the ablation experiments. Table 7 lists the FLOPs and the number of parameters of each network: FLOPs reflect the computational complexity of a network, and the parameter count reflects its number of learnable weights.

As shown in Table 7, U-Net has 31.04 M parameters and requires 54.69 GFLOPs. RU-Net, the backbone model we chose, has only 24.53 M parameters and 10.92 GFLOPs, which shows that using RU-Net as the backbone both saves computational cost and reduces network complexity. Among the multi-scale feature extraction modules, ASM has slightly more parameters and a slightly higher computational cost than ASPP. ASU-Net* therefore increases computation within a reasonable range while also improving segmentation performance (see Section 4.3.2). Although the proposed ASU-Net has more parameters than U-Net (42.07 M vs. 31.04 M), building it on the RU-Net backbone both improves segmentation performance and greatly reduces computational complexity: ASU-Net requires 28.94 GFLOPs fewer than U-Net. This reduction in computational complexity is expected to increase the inference speed of the network.
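The differences quoted above can be recomputed directly from Table 7. The short script below does only this arithmetic; the dictionary copies the table values, and the helper name `relative_to` is ours.

```python
# Values copied from Table 7: (Params in M, FLOPs in G).
TABLE7 = {
    "U-Net": (31.04, 54.69),
    "RU-Net": (24.53, 10.92),
    "RU-Net (ASPP)": (36.09, 18.92),
    "ASU-Net*": (38.14, 21.51),
    "ASU-Net": (42.07, 25.75),
}


def relative_to(baseline, model):
    """Parameter/FLOP differences of `model` with respect to `baseline`."""
    (p0, f0), (p1, f1) = TABLE7[baseline], TABLE7[model]
    return {
        "params_diff_M": round(p1 - p0, 2),
        "flops_diff_G": round(f1 - f0, 2),
        "flops_ratio": round(f1 / f0, 3),
    }


print(relative_to("U-Net", "ASU-Net"))            # ASU-Net vs. the baseline
print(relative_to("ASU-Net*", "ASU-Net"))         # overhead of FRM
print(relative_to("RU-Net (ASPP)", "ASU-Net*"))   # ASM vs. ASPP
```

The first line shows that ASU-Net needs about 47% of the FLOPs of U-Net while having about 11 M more parameters; the second gives the FRM overhead of 3.93 M parameters and 4.24 GFLOPs.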

4.6 Qualitative analysis

To evaluate the segmentation results of ASU-Net more thoroughly, we visualize them. Figure 4 shows some visual examples of segmentation results.

Fig. 4

Visual examples of segmentation results on the INbreast database (rows 1 to 3) and the DDSM-BCRP database (rows 4 to 6). From left to right, the columns show the input image and the results of U-Net, RU-Net, and ASU-Net. The red outline is the ground truth, and the green outline is the prediction of the corresponding model.

For images with relatively regular mass shapes, U-Net obtains good segmentation results, and ASU-Net is only slightly better, as in the third row. For images with irregular mass shapes, however, U-Net and RU-Net produce more false-negative and false-positive pixels, whereas ASU-Net segments the masses more accurately, as in the first row. We speculate that the conventional skip-connections in the standard U-Net and RU-Net simply pass the low-level spatial information of the encoder directly to the decoder. As a result, the network ignores the multi-scale information at the different feature levels and cannot capture the features of masses with different shapes and sizes.

When ASM is embedded at the multiple feature levels, multi-scale features are introduced into the network, which improves its extraction of multi-scale information. Specifically, for feature maps of different resolutions, ASM learns the nonlinear interaction between convolution kernels of different sizes and adaptively selects a kernel of appropriate size for feature extraction. The resulting features are then merged with the high-level semantic features in the decoder through the skip-connections. In addition, FRM integrates channel-wise relationships to improve feature fusion in the decoder. By improving the fusion at each decoder level, FRM selectively strengthens the representation of low-level spatial features and high-level semantic features, which effectively enhances the feature representation of the network.
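The kernel-selection behaviour described above can be sketched in NumPy, assuming that ASM follows the selective-kernel fusion of SKNet [49]: branch outputs from different kernel sizes are summed, pooled, compressed by the reduction ratio d, and a per-channel softmax across branches decides how much each kernel size contributes. The branch outputs are taken as given (the convolutions are omitted), batch normalization is omitted, and all weights are random placeholders, so this illustrates the mechanism rather than the authors' ASM.

```python
import numpy as np


def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def selective_fusion(branches, W_reduce, W_select):
    """SK-style soft selection across K kernel-size branches (assumed ASM-like).

    branches : (K, C, H, W) outputs of convolutions with different kernel sizes
    W_reduce : (C // d, C) shared FC layer that compresses the channel descriptor
    W_select : (K, C, C // d) one FC layer per branch producing selection logits
    """
    U = branches.sum(axis=0)                       # fuse branches element-wise
    s = U.mean(axis=(1, 2))                        # global average pooling -> (C,)
    z = np.maximum(W_reduce @ s, 0.0)              # compact descriptor, ReLU
    logits = np.einsum("kcz,z->kc", W_select, z)   # (K, C)
    attn = softmax(logits, axis=0)                 # branches compete per channel
    V = (attn[:, :, None, None] * branches).sum(axis=0)
    return V, attn


rng = np.random.default_rng(0)
K, C, H, W, d = 2, 32, 8, 8, 16  # e.g. two kernel sizes; d = 16 as in Table 4
branches = rng.standard_normal((K, C, H, W))
W_reduce = 0.1 * rng.standard_normal((C // d, C))
W_select = 0.1 * rng.standard_normal((K, C, C // d))
V, attn = selective_fusion(branches, W_reduce, W_select)
print(V.shape, np.round(attn[:, :4], 3))
```

Because the softmax is taken across branches, each channel's weights sum to 1, so the output is a per-channel convex combination of the kernel-size branches; learned weights would shift this combination toward the receptive field that suits the mass scale.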

5 Conclusion

In this article, we propose the adaptive scale module (ASM) and the feature refinement module (FRM). ASM adaptively extracts multi-scale information according to the scale of each mass. FRM integrates the relationships among channels, which allows the decoder to capture channel dependencies. With ASM and FRM, ASU-Net makes full use of the multi-scale feature information extracted by the encoder, and the decoder selectively enhances the feature representation of useful channels to achieve effective feature fusion. Our proposed ASU-Net obtains DIs of 91.41% and 93.55% on the DDSM-BCRP database and the INbreast database, respectively. The experimental results show that ASU-Net reduces false-positive predictions in mass segmentation and improves segmentation performance.

Like most studies, the current study has certain limitations. To compare with other related algorithms, we extract ROIs from the databases in advance. Under this setting, ASU-Net shows good segmentation performance, which suggests that the method is promising. However, our method has so far been shown to perform well only on small ROIs. Enhancing the robustness of the model will be the focus of our future work.

Conflict of interest

The authors declare that they have no conflict of interest.


Acknowledgments

We would like to thank the Breast Research Group, INESC Porto, Portugal for the INbreast database. This work is jointly supported by the National Natural Science Foundation of China (No. 61662062) and a sub-project of the Qinghai Province Major Science and Technology Project (No. 2019-ZJ-A10).

References

Horsch

, Hapfelmeier

and Elter

, Needs assessment for next generation computer-aided mammography reference image databases and evaluation studies, International Journal of Computer Assisted Radiology and Surgery 6(6) (2011), 749–767. doi: 10.1007/s11548-011-0553-9. PubMed PMID: WOS:000295680700003

Kabel

A.M.

and Baali

F.H.

, Breast Cancer: Insights into Risk Factors, Pathgenesis, Diagnosis and Management, Journal of Cancer Research and Treatment 3(2) (2015), 28–33. doi: 10.12691/jcrt-3-2-3.

Sahiner

, Petrick

, Chan

H.P.

, Hadjiiski

L.M.

, Paramagul

, Helvie

M.A.

, et al., Computer-aided characterization of mammographicmasses: Accuracy of mass segmentation and its effects oncharacterization, Ieee Transactions on Medical Imaging 20(12) (2001), 1275–1284. doi: 10.1109/42.974922. PubMed PMID: WOS:000173296700008

Dilaveri

, Klassen

, Fazzio

and Ghosh

, Breast Cancer Screening for Women at Average Risk, Current Breast Cancer Reports 11(3) (2019), 123–128. doi: 10.1007/s12609-019-00324-4

Szegedy

, Liu

, Jia

, Sermanet

, Reed

, Anguelov

, et al., Going Deeper with Convolutions. 2015 Ieee Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition (2015), 1–9.

, Wang

, Peng

, Gao

, Yu

, Sang

, et al., Learning a Discriminative Feature Network for Semantic Segmentation, 2018 Ieee/Cvf Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition (2018), 1857–1866.

Lee

C.H.

, Dershaw

D.D.

, Kopans

, Evans

, Monsees

, Monticciolo

, et al., Breast cancer screening with imaging: recommendations from the Society of Breast Imaging and the ACR on the use of mammography, breast MRI, breast ultrasound, and other technologies for the detection of clinically occult breast cancer, J Am Coll Radiol 7(1) (2010), 18–27. doi: 10.1016/j.jacr.2009.09.022. PubMed PMID: 20129267

Ghieh

, Saade

, Najem

, El Zeghondi

, Rawashdeh

M.A.

and Berjawi

, Staying abreast of imaging - Current status of breast cancer detection in high density breast, Radiography (London, England: 1995) 27(1) (2021), 229–235. doi: 10.1016/j.radi.2020.06.003. PubMed PMID: MEDLINE:32611494

Guliato

, Rangayyan

R.M.

, Carvalho

J.D.

and Santiago

S.A.

, Polygonal modeling of contours of breast tumors with the preservation of spicule’s, Ieee Transactions on Biomedical Engineering 55(1) (2008), 14–20. doi: 10.1109/tbme.2007.899310. PubMed PMID: WOS:000251908300002

10.

Chan

D.S.M.

, Abar

, Cariolou

, Nanu

, Greenwood

D.C.

, Bandera

E.V.

, et al., World Cancer Research Fund International: Continuous Update Project-systematic literature review and meta-analysis of observational cohort studies on physical activity, sedentary behavior, adiposity, and weight change and breast cancer risk, Cancer Causes Control 30(11) (2019), 1183–1200. doi: 10.1007/s10552-019-01223-w. PubMed PMID: 31471762

11.

Muralidhar

G.S.

, Haygood

T.M.

, Stephens

T.W.

, Whitman

G.J.

, Bovik

A.C.

and Markey

M.K.

, Computer-aided detection of breast cancer - have all bases been covered? Breast Cancer: Basic and Clinical Research 2 (2008), 5–9. PubMed PMID: MEDLINE:21655364

12.

, Chen

, Nailon

W.H.

, Davies

M.E.

and Laurenson

, Ieee. A Deep dual-path network for improved mammogram image processing, 2019 Ieee International Conference on Acoustics, Speech and Signal Processing. International Conference on Acoustics Speech and Signal Processing ICASSP (2019), 1224–1228.

13.

, Chen

, Nailon

W.H.

, Davies

M.E.

and Laurenson

, Improved Breast Mass Segmentation in Mammograms with Conditional Residual U-Net, In: D. Stoyanov, Z. Taylor, B. Kainz, G. Maicas, R.R. Beichel, editors. Image Analysis for Moving Organ, Breast, and Thoracic Images, Lecture Notes in Computer Science 11040 (2018), 81–89.

14.

Min

, Wilson

, Huang

, Liu

, Crozier

, Bradley

A.P.

, et al., Fully Automatic Computer-aided Mass Detection and Segmentation via Pseudo-color Mammograms and Mask R-CNN. 2020 Ieee 17th International Symposium on Biomedical Imaging, IEEE International Symposium on Biomedical Imaging (2020), 1111–1115.

15.

Zhao

, Shi

, Qi

, Wang

and Jia

, Pyramid Scene Parsing Network. 30th Ieee Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition (2017), 6230–6239.

16.

Moreira

I.C.

, Amaral

, Domingues

, Cardoso

M.J.

and Cardoso.

J.S.

, INbreast: Toward a Full-field Digital Mammographic Database, Academic Radiology 19(2) (2012), 236–248. doi: 10.1016/j.acra.2011.09.014. PubMed PMID: WOS:000299245100016

17.

, Liu

, Tian

, Li

, Bao

, Fang

, et al., Dual Attention Network for Scene Segmentation. 2019 Ieee/Cvf Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition (2019), 3141–3149.

18.

, Shen

and Sun

, Squeeze-and-Excitation Networks, IEEE Conference on Computer Vision and Pattern Recognition (2018).

19.

Levman

, Warner

, Causer

and Martel

, Semi-Automatic Region-of-Interest Segmentation Based Computer-Aided Diagnosis of Mass Lesions from Dynamic Contrast-Enhanced Magnetic Resonance Imaging Based Breast Cancer Screening, Journal of Digital Imaging 27(5) (2014), 670–678. doi: 10.1007/s10278-014-9723-y. PubMed PMID: WOS:000342432200014

20.

, Wu

, Shen

, Zhang

, Chen

, Sun

, et al., A fully automatic computer-aided diagnosis system for hepatocellular carcinoma using convolutional neural networks, Biocybernetics and Biomedical Engineering 40(1) (2020), 238–248. doi: 10.1016/j.bbe.2019.05.008. PubMed PMID: WOS:000528800500019

21.

Long

, Shelhamer

and Darrell

, Ieee. Fully Convolutional Networks for Semantic Segmentation. 2015 Ieee Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition (2015), 3431–3440.

22.

Cardoso

J.S.

, Domingues

and Oliveira.

H.P.

, Closed Shortest Path in the Original Coordinates with an Application to Breast Cancer, International Journal of Pattern Recognition and Artificial Intelligence 29(1) (2015). doi:10.1142/s0218001415550022. PubMed PMID: WOS:000347966200010

23.

, Zhang

, Ren

and Sun

, Ieee. Deep Residual Learning for Image Recognition. 2016 Ieee Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition (2016), 770–778.

24.

Simonyan

and Zisserman

, Very deep convolutional networks for large scale image recognition, International Conference on Learning Represetations (2015).

25.

Tabar

, Dean

P.B.

, Chen

T.H.

, Yen

A.M.

, Chen

S.L.

, Fann

J.C.

, et al., The incidence of fatal breast cancer measures the increased effectiveness of therapy in women participating in mammography screening, Cancer 125(4) (2019), 515–523. doi: 10.1002/cncr.31840. PubMed PMID: 30411328; PubMed Central PMCID: PMCPMC6588008

26.

, Choy

C.-S.

and Li

Y.-W.

, Ieee, Deep sparse rectifier neural networks for speech denoising (2016).

27.

Chen

L.-C.

, Papandreou

, Schroff

and Adam

, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv: 1706.05587, (2017).

28.

Chen

L.-C.

, Papandreou

, Kokkinos

, Murphy

and Yuille

A.L.

, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Ieee Transactions on Pattern Analysis and Machine Intelligence 40(4) (2018), 834–848. doi: 10.1109/tpami.2017.2699184. PubMed PMID: WOS:000426687100005

29.

Chen

L.-C.

, Papandreou

, Kokkinos

, Murphy

and Yuille

A.L.

, Semantic image segmentation with deep convolutional nets and fully connected crfs, International Conference on Learning Representations (2015).

30.

Chen

L.-C.

, Zhu

, Papandreou

, Schroff

and Adam

, Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss, editors, Computer Vision - Eccv 2018, Pt Vii. Lecture Notes in Computer Science 11211 (2018), 833–851.

31.

Grimm

L.J.

and Mazurowski

M.A.

, Breast Cancer Radiogenomics: Current Status and Future Directions, Academic Radiology 27(1) (2020), 39–46. doi: 10.1016/j.acra.2019.09.012. PubMed PMID: WOS:000503916400006

32.

Beller

, Stotzka

, Müller

T.O.

and Gemmeke

, An example-based system to support the segmentation of stellate lesions, Bildverarbeitung für die Medizin 2005, Algorithmen - Systeme - Anwendungen, Proceedings des Workshops vom 13.-15. Marz 2005 in Heidelberg. (2005), 475–479.

33.

Mohamed Saleck

, El Moutaouakkil

, Rmili

and M. Assoc Comp, Semi-Automatic Segmentation of Breast Masses in Mammogram Images (2018), 59–62.

34.

Al-Antari

M.A.

, Al-Masni

M.A.

, Choi

M.T.

, Han

S.M.

and Kim

T.S.

, A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification, Int J Med Inform 117 (2018), 44–54. doi: 10.1016/j.ijmedinf.2018.06.003. PubMed PMID: 30032964

35.

Gemignani

M.L.

, Breast Cancer Screening: Why, When, and How Many? Clinical Obstetrics and Gynecology 54(1) (2011), 125–132. doi: 10.1097/GRF.0b013e318208020d. PubMed PMID: WOS:000286656300018

36.

Sallam

M.Y.

, Bowyer

K.W.

, Woods

, Kopans

D.B.

, Moore

R.H.

and Kegelmeyer

W.P.

, The digital database for screening mammography (DDSM): Lessons learned, Radiology 205 (1997), 323. PubMed PMID: WOS:A1997YD97100322

37.

Al-Najdawi

, Biltawi

and Tedmori

, Mammogram image visual enhancement, mass segmentation and classification, Applied Soft Computing 35 (2015), 175–185. doi: 10.1016/j.asoc.2015.06.029

38.

Dhungel

, Carneiro

and Bradley

A.P.

, Ieee. Deep structured learning for mass segmentation from mammograms, 2015 Ieee International Conference on Image Processing. IEEE International Conference on Image Processing ICIP (2015), 2950–2954.

39.

Dhungel

, Carneiro

and Bradley

A.P.

, Ieee. Tree re-weighted belief propagation using deep learning potentials for mass segmentation from mammograms, 2015 Ieee 12th International Symposium on Biomedical Imaging. IEEE International Symposium on Biomedical Imaging (2015), 760–763.

40.

Dhungel

, Carneiro

and Bradley

A.P.

, Deep Learning and Structured Prediction for the Segmentation of Mass in Mammograms. In: N. Navab, J. Hornegger, W.M. Wells, A.F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention -Miccai 2015, Pt I. Lecture Notes in Computer Science 9349 (2015), 605–612.

41.

Ronneberger

, Fischer

and Brox

, U-Net: Convolutional Networks for Biomedical Image Segmentation. In: N. Navab, J. Hornegger, W.M. Wells, A.F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention, Pt Iii. Lecture Notes in Computer Science 9351 (2015), 234–241.

42.

Guzman-Cabrera

, Guzman-Sepulveda

J.R.

, Torres-Cisneros

, May-Arrioja

D.A.

, Ruiz-Pinales

, Ibarra-Manzano

O.G.

, et al., Digital Image Processing Technique for Breast Cancer Detection, International Journal of Thermophysics 34(8-9) (2013), 1519–1531. doi: 10.1007/s10765-012-1328-4. PubMed PMID: WOS:000325815500021

43.

Nithya

and Santhi

, Computer Aided Diagnosis System for Mammogram Analysis: A Survey, Journal of Medical Imaging and Health Informatics 5(4) (2015), 653–674. doi: 10.1166/jmihi.2015.1441. PubMed PMID: WOS:000358625600001

44.

Loffe

and Szegedy

, Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning (2015).

45.

Njor

S.H.

, Vejborg

and Larsen

M.B.

, Breast cancer survivors’ riskof interval cancers and false positive results in organizedmammography screening, Cancer Medicine 9(16) (2020), 6042–6050. doi: 10.1002/cam4.3182. PubMed PMID: WOS:000544122000001

46.

Badrinarayanan

, Kendall

and Cipolla

, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, IEEE Trans Pattern Anal Mach Intell 39(12) (2017), 2481–2495. doi: 10.1109/TPAMI.2016.2644615. PubMed PMID: 28060704

47.

Liu

, Rabinovich

and Berg

A.C.

, Parsenet: looking wider to see better, International Conference on Learning Representations (2016).

48. Zhu, Xiang, Tran T.D., Hager G.D. and Xie, Adversarial deep structured nets for mass segmentation from mammograms, 2018 IEEE 15th International Symposium on Biomedical Imaging (2018), 847–850.

49. , Wang, Hu and Yang, Selective Kernel Networks, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), 510–519.

50. Zhu, Wei, Gao, Ding, Zhang, Wang, et al., Fully automatic segmentation on prostate MR images based on cascaded fully convolution network, Journal of Magnetic Resonance Imaging 49(4) (2019), 1149–1156. doi: 10.1002/jmri.26337. PubMed PMID: WOS:000461233600025

51. Wang, Zou, Shen and Ji, Non-local U-Nets for biomedical image segmentation, AAAI (2020), 6315–6322.

52. Zhou, Siddiquee M.M.R., Tajbakhsh and Liang, UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation, IEEE Trans Med Imaging 39(6) (2020), 1856–1867. doi: 10.1109/TMI.2019.2959609. PubMed PMID: 31841402; PubMed Central PMCID: PMC7357299.


Table 1
Division of the DDSM-BCRP and INbreast databases into training and testing sets

Database  | Training set | Testing set
DDSM-BCRP | 80 ROIs      | 80 ROIs
INbreast  | 58 ROIs      | 58 ROIs

Table 3
DI (%) achieved with two different multi-scale feature extraction modules, ASPP and ASM

Network      | DDSM-BCRP      | INbreast
RU-Net(ASPP) | 91.21 ± 0.0379 | 92.94 ± 0.2991
ASU-Net*     | 91.34 ± 0.0346 | 93.32 ± 0.0985
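The DI values in Tables 3, 4 and 5 are reported in percent. Assuming the usual Dice definition, DI = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a ground-truth mask G, a minimal NumPy sketch for a single pair of binary masks looks as follows. The function name `dice_index` and the `eps` smoothing term are illustrative assumptions, and how per-ROI scores are aggregated into the table entries is not specified here.

```python
import numpy as np


def dice_index(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice index DI = 2|P ∩ G| / (|P| + |G|) for binary masks, in percent."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps (an assumption, not from the paper) avoids 0/0 when both masks are empty;
    # in that case the score is 100.
    return float(100.0 * (2.0 * intersection + eps) / (total + eps))


if __name__ == "__main__":
    pred = np.array([[1, 1, 0], [0, 1, 0]])
    gt = np.array([[1, 0, 0], [0, 1, 1]])
    print(round(dice_index(pred, gt), 2))  # 66.67: overlap 2, mask sizes 3 and 3
```

Because DI counts the overlap twice and divides by the summed mask sizes, it equals 1 only for identical masks and penalizes missed mass pixels and false positives symmetrically.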

Table 4
DI (%) for different values of the reduction ratio d in ASM

Reduction ratio d | DDSM-BCRP      | INbreast
4                 | 91.39 ± 0.0002 | 93.20 ± 0.0529
8                 | 91.33 ± 0.0404 | 93.26 ± 0.1137
16                | 91.41 ± 0.0416 | 93.55 ± 0.0351
32                | 91.35 ± 0.0709 | 93.13 ± 0.0808
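To illustrate what the reduction ratio d in Table 4 controls, here is a minimal NumPy sketch. It assumes that ASM uses a selective-kernel-style attention in the spirit of Selective Kernel Networks (reference 49), with d as the channel-reduction ratio of the attention bottleneck (hidden width C // d; SKNet itself uses max(C/r, L)). The branch inputs, the weight shapes, and the helper names `sk_fuse` and `bottleneck_params` are hypothetical. Biases and batch normalization are omitted.

```python
import numpy as np


def bottleneck_params(channels: int, d: int, num_branches: int = 2) -> int:
    """Weights in the squeeze (C -> C // d) and per-branch select (C // d -> C) layers."""
    hidden = channels // d
    return channels * hidden + num_branches * hidden * channels


def sk_fuse(branches: np.ndarray, w_reduce: np.ndarray, w_select: np.ndarray):
    """Selective-kernel-style fusion of K branch feature maps (hypothetical sketch).

    branches: (K, C, H, W) outputs of K convolutions with different receptive fields
    w_reduce: (C // d, C) bottleneck weights, d = channel-reduction ratio
    w_select: (K, C, C // d) per-branch weights mapping z back to C channels
    """
    fused = branches.sum(axis=0)                          # U = sum_k U_k -> (C, H, W)
    s = fused.mean(axis=(1, 2))                           # global average pooling -> (C,)
    z = np.maximum(w_reduce @ s, 0.0)                     # squeeze to C // d, then ReLU
    logits = w_select @ z                                 # (K, C)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)         # softmax over branches, per channel
    out = (attn[:, :, None, None] * branches).sum(axis=0) # V = sum_k a_k * U_k
    return out, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, C, H, W = 2, 64, 16, 16
    for d in (4, 8, 16, 32):
        print(d, bottleneck_params(C, d))  # 3072, 1536, 768, 384 for d = 4, 8, 16, 32
    d = 16
    branches = rng.standard_normal((K, C, H, W))
    w_reduce = rng.standard_normal((C // d, C)) * 0.1
    w_select = rng.standard_normal((K, C, C // d)) * 0.1
    out, attn = sk_fuse(branches, w_reduce, w_select)
    print(out.shape, np.allclose(attn.sum(axis=0), 1.0))  # (64, 16, 16) True
```

Under these assumptions, a larger d shrinks the attention bottleneck roughly in proportion to 1/d, trading capacity of the scale-selection weights for fewer parameters.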

Table 5
DI (%) of the two networks on the DDSM-BCRP and INbreast databases

Network  | DDSM-BCRP      | INbreast
ASU-Net* | 91.34 ± 0.0346 | 93.32 ± 0.0985
ASU-Net  | 91.41 ± 0.0416 | 93.55 ± 0.0351