Abstract
BACKGROUND:
Automatic detection of tumor in breast ultrasound (BUS) images is important for the subsequent image processing and has been researched for decades. However, there still lacks a robust method due to poor quality of BUS images.
OBJECTIVE:
To propose and test a salient object detection method for BUS images.
METHODS:
BUS image is preprocessed by an adaptively selective replacement and speckle reducing anisotropic diffusion (SRAD) algorithm. Then, the preprocessed image is segmented into super pixels by a simple linear iterative clustering (SLIC) algorithm to form a graph model, and the saliency of the nodes in the graph is calculated by using the absorbed time of absorbing Markov chain (AMC). Finally, the initial saliency map is optimized by the recurrent time of ergodic Markov chain (EMC) and a distance weighting formula.
RESULTS:
Results of the proposed method were compared both qualitatively and quantitatively with two saliency detection models. It was observed that the proposed method outperformed the comparison models and yielded the highest Accuracy value (97.49% vs. 86.63% and 90.33%) using a dataset of 1000 BUS images.
CONCLUSIONS:
After the adaptively selective replacement, AMC can effectively distinguish tumors from background by random walks.
Keywords
Introduction
Breast cancer is the most common form of cancer among women [1], which alone accounts for 30% all new cancer diagnoses in 2019 [2]. Ultrasound imaging is a widely employed modality for early diagnosis of breast cancer [3], due to its high sensitivity. Computer-aided diagnosis (CAD) system for breast ultrasound (BUS) is utilized as a second reader to improve radiologists’ accuracy in diagnosis [4]. Automatic detection of tumor is one of the most important components in CAD system [5], which can be helpful during screening and breast imaging, reducing processing time by enabling clinicians to focus on images where a lesion is already known to be present. The detecting results not only help to discard the complex unwanted regions, but also serve as an initial step for the subsequent segmentation and classification of tumor.
Many automatic breast tumor detection methods have been proposed in the last few years [5–10]. These approaches often depend on artificially defined features or improve the traditional segmentation methods to locate tumors, which may result in performance degradation in clinical practice due to the various of tumors and the poor quality of BUS images.
Visual saliency estimation (VSE) is a kind of method to find the most salient object on complex scenes based on visual attention mechanism. VSE based detection methods are widely used in natural images, and have achieved great success. Existing VSE based detection methods can be roughly classified into top-down and bottom-up schemes. The top-down methods [11–13] usually based on supervised learning with class labels, thus it is expensive to generate training data [14]. In contrast, the more widely used bottom-up methods [14–20] often explore various heuristic prior cues to guide the saliency detection process, which makes them involuntary and fast. Recently, some VSE based bottom-up detection methods for breast tumor have emerged [21–23], which has attracted increasing attention.
Shao et al. [21] proposed a saliency estimation model, which utilized the prior knowledge of BUS images to segment it into three layers of fat, mammary, and muscle. Based on the located mammary layer, two saliency maps were calculated from the two aspects of anatomy cue and contrast cue, and then the two saliency maps were normalized and multiplied to generate final saliency map. Shao’s method yielded high average precision rate (APR) and average recall rate (ARR) values using their dataset. However, it has three main drawbacks: (i) the use of prior knowledge to automatically segment the image into three layers may fail due to the diversity of BUS images. (ii) It fails to handle the images with large tumors, shadows, and low contrast [23]. (iii) It misses tumors when there is more one lesion in the image.
Xie et al. [22] provided a Itti based model for tumor saliency detection. They applied the Gaussian pyramid to calculate the intensity and blackness multi-scale feature maps of the BUS image, and then used the center-surround operator to fuse the intensity and blackness feature maps, respectively. They combined the intensity and blackness feature maps with a super pixel feature map to obtain the final saliency map, and adopted Winner-take-all (WTA) network to locate tumor as the most salient region. They achieved high accuracy on their dataset, but they also failed when handle the BUS images with multiple tumors due to the strategy of WTA.
Xu et al. [23] put forwarded a hybrid framework for tumor saliency estimation. They formulated the saliency estimation problem as a quadratic programming problem. In their method, the Neutro-Connectedness map, which was used as a measurement of the confidence of the connectedness, the adaptive center-bias, which was used as a visual clue, the regions’ correlation, and the layer structure-based weighted map were incorporated to model both the domain-related knowledge and low-level visual clue. This model can handle images without tumors, however, it shares the drawbacks with [22].
Based on the previous study, a saliency model combined with absorbing Markov chain (AMC) for tumor detection of BUS was proposed, which not only can correctly locate a single tumor, but also accurately locate multiple tumors. In this method, we investigate the relationship between tumor saliency estimation and AMC to develop a graph-based saliency model. The main contributions of this work can be concluded as follows: (i) employed AMC to locate tumors in BUS images and achieved better performance. Previously, AMC was only used to natural images [24–28]. (ii) Proposed a preprocessing method, which improved the accuracy of tumor detection. (iii) The proposed method does not rely on manually designed features, which can overcome the poor quality of ultrasound images and achieve robustness. The rest of the paper is organized as follows. Section 2 describes the proposed method and material used in the method. Section 3 chooses the parameter and gives the experimental results of the proposed method. Section 4 discusses the comparison results of three saliency models and shows the failure case of the present method. The conclusions are presented in Section 5.
Method
The flowchart of the proposed method is illustrated in Fig. 1. The model is composed of three parts: preprocessing, saliency estimation and final saliency map generation. In the preprocessing step, the “white stripes” in the BUS images is removed by adaptive selective replacement, for reducing the difference between all the background regions. Then the speckle reducing anisotropic diffusion (SRAD) [29] algorithm is employed on the replaced image. In the saliency estimation step, the filtered image is segmented into super pixels by simple linear iterative clustering (SLIC) algorithm [30] to construct a graph model. The graph is used to represent the AMC, with all the nodes serving as transient nodes and the boundary nodes duplicated as absorbing nodes. The expected absorbed time of the transient nodes is computed to generate initial saliency map. To obtain the final saliency map, the initial map is updated by the normalized recurrent time of ergodic Markov chain (EMC). In addition, the updated saliency map is refined by a distance weighting formula to obtain the final saliency map, which can be segmented into a binary saliency map by adaptive thresholding.

Flowchart of the proposed method.
Adaptive selective replacement
The “white stripes” in BUS images have high contrast with other background region, which pointed out by red arrows in Fig. 2(a). They may gain high saliency value to lead to errors.
In this step, in order to remove the “white stripes”, an adaptive iterative thresholding [31] is used to find them, which is calculated by:

(a) BUS image with “white stripes” (red arrow pointing), (b) Segmentation result, (c) Replaced image.
Where μ1 and μ2 are the mean intensity of the segmented object and background regions, and the σ1 and σ2 are the corresponding standard deviation, θ is the probability of object. Assuming that σ1 = σ2 = σ, then the threshold T is obtained by:
The T is initialized by:
Where gmax and gmin are the maximum and minimum gray values of the BUS image, respectively. Then the initialized T0 is used to segment image into object and background. Then μ1, μ2, σ1, σ2 and θ are calculated, and Equation 2 update threshold T. Using the updated T to segment image again. Repeat the segmentation and updating until the T value no longer changes, that is, the optimal threshold is obtained.
The segmentation result shows as Fig. 2(b), in which the white corresponds to the “white stripes”. Then values of “white stripes” are replaced with the average intensity of the rows in which “white stripes” are located. As is shown in Fig. 2(c), the replaced image removes all bright spots causing interference, as a result, the complexity of the background area is reduced.
In this step, the replaced image is filtered by SRAD algorithm. SRAD can preserve and even enhance edges by processing input image with adaptive weighted filters. In addition, SRAD allows the background area to be more homogeneous, which can depress the saliency of background more effective. Given the replaced image I0 (x, y), the filtered image I(x, y; t) is obtained by SRAD partial differential equation:
Where t is iteration time, which is set as 10 in our experiment, div is the divergence operator, ∇ is the gradient operator, Ω is a 2-D coordinate grid, ∂Ω denotes the border of Ω,
Where q (x, y ; t) is the instantaneous coefficient of variation, which serves as the edge detector and is defined by:
The q0 (t) is the speckle scale function, which is used to control the amount of smoothing applied to the image by SRAD algorithm and is estimated by:
Where var [z (t)] and z (t) are the intensity variance and mean over the homogeneous area at iteration t, respectively. Figure 3 shows the effect of preprocessing. Figure 3(a-c) is the original image, the replaced image, the filtered image, respectively, and their corresponding final saliency maps are displayed in Fig. 3(d-f). It can be seen from Fig. 3 that the selective replacement eliminates interference with “white stripes” on the detecting process, and the SRAD filtering helps to maintain the integrity of the tumor area and suppress the saliency of the background area.

(a) Original image, (b) Replaced image, (c) Filtered image, (d) Final saliency map of the original image, (e) Final saliency map of the replaced image, (f) Final saliency map of the filtered image.
Graph construction
In this step, the filtered image is segmented into super pixels through SLIC algorithm to construct a graph model. Due to the variety of tumor shapes and sizes in the image, the fixed super pixel number SLIC segmentation is difficult to balance both detail and region integrity for all images. In our study, an adaptive super pixel segmentation method [32] is adopted to determine the number of super pixels. Supposing that the gray level of the filtered images consists of is N, then the distribution of the gray value can be computed as:
Where x = 1, 2, …, N indexes the gray level, and C (x) is the number of pixels in level x, and S is the size of the filtered image. Then, the distribution is used to calculate the entropy value H of the filtered image by:
The maximum value of entropy H is obtained by:
Finally, the number of super pixels K is computed as:
The filtered image is segmented into K super pixels to construct the graph model G = (V, E). In model G, the V is set of nodes, which consists of all the super pixels; and the E is set of edges, which are defined in the same way as Zhang et al. [28]. The weights of edges encode nodal affinity between nodes, which means if the two nodes are similar to each other, the affinity between them is large, and the weight of the edge between them is also large, and vice versa. The edge weight between a pair of nodes is defined as:
Where i and j index the nodes, x i and x j are donated by the mean intensity value of the corresponding super pixel node, the σ is fixed to control the strength of edge weight.
The AMC is represented by the graph model G as follow steps: (i) all the nodes in G are served as transient nodes of AMC; (ii) the boundary nodes in G are duplicated as virtual absorbing nodes of AMC. Since the tumor is hardly connected to the image boundary [9, 22], it is reasonable to select the nodes at the four boundaries of G as the absorbing nodes.
In this section, the absorbed time of the transient node in AMC is used to compute the initial saliency map of BUS image. For the tumor transient nodes having higher contrast to absorbing nodes the absorbed time can be long, and for non-tumor transient nodes similar to the absorbing nodes, the absorbed can be short. Therefore, computing the absorbed time can separate the tumor nodes from the background.
The AMC can be specified by a transition matrix P as the following canonical form:
The entry p
i
j of P is the transition probability between each pair of nodes. The Q consists of the transition probability between transient nodes, while R denotes the probability from any transient node to an absorbing one. 0 is a zero matrix, I is an identity matrix. The computation of saliency value of each super pixel based on the absorbed time of transient nodes relies on the solution of the matrix P, which can be computed as:
Where the affinity matrix A representing the correlation of nodes is calculated as:
Where the δ (i) denotes the nodes connected to node i, and the t means the number of transient nodes, and the r means the number of absorbing nodes.
The degree matrix D is used to normalize A, which is written as a diagonal matrix:
With transition matrix P, the fundamental matrix F of AMC can be obtained by:
The entry f
i
j of F is the expected number of times the chain spends from node i to node j, and the sum ∑
j
f
ij
gives the expected number of times until node i is absorbed. Thus, the absorbed time for each transient node is given by:
Where c is a column vector whose elements are 1. The y is a column vector whose length is the number of transient nodes. The initial saliency map can be obtained by assigning the value of y to the corresponding super pixel as transient node in the graph model G. Figure 4 shows some of the process results at saliency estimation stage. Figure 4(a) is SLIC segmentation of the filtered image, in which all the super pixels represent the transient nodes of AMC. Figure 4(b) illustrates duplicated absorbing nodes, i.e., the super pixels at four boundaries of the BUS image. Figure 4(c) exhibits the initial saliency map, in which the tumor is correctly highlighted, while the saliency of background is not depressed effectively.

(a) Transient nodes (all the super pixels), (b) Absorbing nodes (the super pixels at four boundaries of the image), (c) Initial saliency map, (d) Updated saliency map, (e) Final saliency map, (f) Binary image of final saliency map.
Recurrent time updating
The absorbed time is influenced by edge weight and distance, which means the background nodes far from boundary can also have long absorbed time. In addition, the big number of similar background nodes leading to large transition probabilities between each other, that is, the random walk may transfer many times among similar nodes before absorbed. Inspired by [26], we adopted the normalized recurrent time of EMC to update saliency map, the difference is that we do not have to calculate the “score” because the tumor regions barely have “great” global contrast to background regions. An EMC always reaches equilibrium, which is characterized by the equilibrium distribution π. Equilibrium distribution π can be calculated by the affinity matrix A as:
where i and j are indices of the transient nodes. The recurrent time of a node can be obtained by the reciprocal of π as h
i
= 1/π
i
. The average recurrent time h
i
is used to weight the corresponding f
i
j (element of fundamental matrix F) because the nodes in homogeneous region have small recurrent time, and the weighting column vector u can be computed as:
where i and k are indices of the transient nodes. After the weighting column vector u is obtained, it is used to replace the vector c in the Equation 18 to obtain the updated absorbed time, and then the updated saliency map can be generated by the same way.
A distance weighting formula is defined to further improve the saliency map. A threshold (set as 0.8 based on experimental experience) is chosen to measure the saliency of pixels after the updated saliency map has been normalized to [0, 1]. The pixels of updated map that have a higher value than the threshold form a point set named salient pixels (SP). For a pixel p (x, y) in map, the distance between p (x, y) and every point of SP is calculated and the minimum distance is picked as the value of the distance weighting map (DWM) at position (x, y). The final saliency map is obtained as:
Where S w represents the 2-D updated saliency map, DWM donates the 2-D distance weighting map, i and j index the pixel in corresponding map, and the final saliency map is generated by point multiplication of the two maps. Figure 5 displays some process results of refinement step. Figure 4(d) is the result of recurrent time updating. Figure 4(e) is the result of distance weighting, i.e., the final saliency map of our model. The final saliency map can be segmented into binary image to locate tumor, as the white area shown in Fig. 4(f).

Sketch map of evaluation indices. (a) Detection result (red) and ground truth (green), (b) Area corresponding to TP, FP, TN, and FN.
Data acquisition
In this study, BUS images were collected from the Ultrasound Department of China-Japan Friendship Hospital and West China Hospital of Sichuan University. There were total 1000 images containing 500 benign and 500 malignant tumors as confirmed by biopsy. These images have been previously removed the information of hospitals and patients, etc. These tumors have various sizes and locations, and they were not specifically selected for a certain size and location. For each case, the tumor area is marked manually by the experienced radiologists and its bounding box is obtained to serve as the ground truth (GT).
Evaluation indices
The saliency map is segmented into a binary image, where the region with white pixels represents the located tumor region. The bounding box of the located tumor is used as the detection result (also referred to as region of interest (ROI)) to carry out the quantitative comparison. The relationship between ROI and GT can be seen in Fig. 5(a), in which the ROI (red) and GT (green) are marked in different colors.
The performance of the proposed method is evaluated by sensitivity, specificity, area under the curve (AUC), and Accuracy. Sensitivity measures the degree of the fraction in the GT that is also enclosed by the ROI. Specificity is the degree of the fraction excluded by the GT that is also outside the ROI. AUC captures a single point on the receiver operating characteristic (ROC) curve that measures the method’s ability to avoid false detection [8]. Accuracy reflects the difference between the ROI and GT. Similar to [8, 10], the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) are applied to define the evaluation indices (shown in different colors in Fig. 5(b)). These evaluation indices are defined as:
The experimentation and performance evaluation are carried out on 1000 BUS images qualitatively and quantitatively.
New gray value
In the preprocessing stage, the new gray value of “white stripes” also makes difference to the performance. In our work, six kinds of new gray values are tested: average intensity of the row (AR), average intensity of the column (AC), average intensity of the whole image (AW), the intensity with largest value in histogram (LH), random intensity (RI), intensity of neighbor pixels (NP). The segmentation result of adaptive thresholding is dilated to find the neighbor pixels of the specified pixels. Figure 6 displayed the replaced images and the corresponding final saliency maps by using different new gray value. The indices of all the schemes are calculated on 1000 BUS images and the result is stored in the Table 1. From the final saliency maps and indices values, it can be concluded that when chose AR as the new gray value to replace the specified pixels intensity, the proposed method yields the best performance. Thus, we choose AR as the new gray value to replace the “white stripes” in adaptive selective replacement stage.

Comparison of different new gray values. BUS: BUS image, GT: ground truth, AR, AC, AW, LH, RI, NP: six kinds of new gray values.
The detection results of the present method are shown in Fig. 7. Figure 7(a) shows two benign and two malignant tumors from top to bottom. Figure 7(b) exhibits the tumor area manually outlined by radiologists. Figure 7(c) displays the saliency maps of our method and Fig. 7(d) shows the comparison of the ROI (marked in red) and GT (marked in green). The first BUS image in Fig. 7(a) contains two lesions and the contrast between lesions and background is low. The second benign tumor has blurred boundary and its contrast with background is also not high. The third malignant tumor has irregular shape and significant posterior acoustic shadows below it, which may be wrongly located as tumor in other methods. The last BUS image exhibits a large tumor with irregular shape and inhomogeneity inside the tumor region. These BUS images demonstrate that the proposed method can locate multiple tumors and can be successfully applied to the tumors with low contrast, blurred boundary, shadows, and to the tumors of various shapes, sizes, and positions.
Performance of different new gray values
Performance of different new gray values

(a) BUS images, (b) Manually marked tumor area, (c) Saliency maps of the proposed method, (d) GT (green) and ROI (red).
The performance of the proposed method is further quantified by evaluation indices on benign and malignant tumors, respectively. The calculated results of the two categories are listed in Table 2. It can be observed that the performance of our method on benign tumors is better than that on malignant tumor. From the Accuracy it can be concluded that the proposed method can locate tumors accurately. More specifically, the present method has high sensitivity and specificity, which are widely used in clinical practice.
Performance of the proposed method
Performance of the proposed method
The detection results of the proposed method are compared both qualitatively and quantitatively with the saliency models of Xie [22] and Shao [21].
Qualitative comparison results
For comparison of qualitatively results, the proposed method and two saliency detection models are applied to the same dataset and their detection results demonstrated in Fig. 8. Six different tumors are shown in Fig. 8(a). Figure 8(b-d) displays the saliency maps of the three models, which are marked with ROI in different colors. The comparisons of GT and ROI are given in Fig. 8(e).

In Fig. 8(a), the first two rows show the BUS images containing more than one lesion. It can be seen that the proposed method accurately highlights all lesions, while Shao’s [21] method can only locate one tumor, and the Xie’s [22] method locates in wrong positions. The BUS images of the last four rows contain only one tumor, and these tumors have different characteristics. For the third blurred and irregularly shaped tumor, the proposed method generated the ROI closest to the GT, due to the difference in appearance and spatial distance between the tumor and the absorbing nodes. In the fourth BUS image, the fat and the tumor are close in intensity and spatial distance, which is unfavorable for Shao [21] who needs to perform fat layer separation. It can be seen that Shao’s [21] corresponding ROI does not completely contain the tumor region. In the fifth BUS image, there is a fat region attached to the tumor in the upper left corner, which may cause the Xie’s [22] method based on intensity and blackness features to mistake the tumor region as the background, thus wrongly locating the brighter pixels under the tumor. There are also some tumor-like shadow structures in the lower right corner of the fourth BUS images, which interfere with the detecting process of Shao’s [21] method.
It can be seen that the most salient area in the corresponding saliency map is the tumor-like structure. When the background area is complicated, as shown in the last BUS image, the fat layer above the tumor and the hypoechoic area below the tumor both have a grayscale distribution similar to that of the tumor. This is not only difficult for Xie [22] to exploit the difference in features between the tumor and the background, but also hard for Shao [21] to automatically locate mammary, and it is inconvenient to use anatomy and contrast features. But for the proposed method, these regions can be quickly absorbed by the boundary, for some of its pixels are replicated as the absorbing nodes.
Table 3 demonstrates the quantitative comparison results of the proposed method and the comparison methods. It can be observed from the data in Table 3 that the proposed method has high sensitivity and specificity values. More specifically, the AUC of our method is 19.28% and 7.31% higher than Xie [22] and Shao [21], respectively. In addition, the Accuracy of the proposed method is also the highest, which indicates the adequate region matching between ROI and GT. It can be concluded that the proposed method outperform others in terms of all evaluation indices.
Performance of the proposed method and the comparison methods
Performance of the proposed method and the comparison methods
The difference between tumor and image borders is measured by the absorbed time of absorbing Markov chain, which is influenced by the appearance and spatial distance between transient nodes and absorbing nodes. Thus the present approach doesn’t work well if the lesion and border are very close both in appearance and distance, which also happened in [26, 28]. Figure 9 shows the failure case. In Fig. 9(a), the lesion is near the top border and their gray value is extremely close, which means the tumor can be absorbed quickly. As shown in Fig. 9(b) the final saliency map obtained by the proposed method highlights the wrong position. Figure 9(c) gives the comparison of GT (green) and the ROI (red) obtained by our method.

(a) BUS image, (b) Final saliency map; (c) GT (green) and ROI (red).
In this paper, an automatic tumor saliency detection model for breast ultrasound images has been proposed. In the present method, the BUS image is segmented into super pixels to construct a graph model, and then the saliency of the nodes in the graph is estimated by using the absorbed time of the AMC. The final saliency map which locate tumor accurately is obtained by EMC based updating and distance weighting. Considering the “white stripes” in the BUS image is an important factor to interfere the tumor detection, we proposed the adaptive selective replacement, through which the accuracy of tumor detection is extremely improved. The AMC based saliency detection model can handle with the BUS images with multiple tumors and overcome the poor quality of BUS images with blurred tumors, confusing fat layers, tumor-like structures and complicated background areas. In addition, the proposed method outperforms the comparison methods with high robustness and accuracy on 1000 BUS images. The saliency detection method that we described can be used as an effective tool to detect breast lesions in breast imaging, accelerating the process of reading BUS images, and improve the diagnostic accuracy of clinicians.
In the future studies, we will apply the new algorithm to the negative BUS images without tumors, and then focus on improving the algorithm to be able to deal with all BUS images without tumors, and with single and multiple tumors.
Footnotes
Acknowledgments
This paper is supported by the National Science Foundation of China Grant (No. 81301286), the Science and Technology Support Project of Sichuan Province (No. 2014GZ0005-7), the Achievement Conversion and Guidance Project of Chengdu Science and Technology Bureau (No.2017-CY02-00027-GX), and the Enterprise Commissioned Technology Development Project of Sichuan University (No.18H0832). Our images are supported by the Department of Ultrasound, China-Japan Friendship Hospital (Beijing 100029) and West China Hospital of Sichuan University (Chengdu 610000).
