Abstract
Because breast cells vary widely in shape and often adhere to one another, fast and accurate segmentation of breast cells remains a challenging task. This paper proposes an automatic segmentation algorithm for breast cell images that focuses on the segmentation of adherent cells. First, the breast cell image is enhanced by staining regularization. Then, cells and background are separated by a Multi-scale Convolutional Neural Network (CNN) to obtain the initial segmentation. Finally, Curvature Scale Space (CSS) corner detection is used to segment adherent cells. Experimental results show that the proposed algorithm achieves 93.01% accuracy, 93.93% sensitivity and 95.69% specificity. Compared with other breast cell segmentation algorithms, the proposed algorithm not only handles adherent cells but also improves their segmentation accuracy.
Introduction
Breast cancer (BC) is currently the most prevalent type of cancer in women [1]. Despite considerable recent advances in the analysis of BC progression and the discovery of related molecular markers, histopathological analysis remains the most widely used method for BC diagnosis [2]. Traditional manual pathological analysis is highly subjective: the diagnostic result depends on the doctor’s visual recognition and experience, which can easily lead to misdiagnosis and makes diagnostic accuracy unstable.
Fig. 1. Total flow chart of the algorithm.
Accurate segmentation of breast cells is an important step towards pathological analysis of microscopic images. Breast cell images are characterized by varied cell shapes, variations in staining technique and adherent cells, all of which make accurate segmentation difficult. Existing cell segmentation algorithms perform poorly on cell images with indistinct contours and low color contrast [3, 4]. In particular, extracting the cell region from a complex background and segmenting adherent cells remain the main focus and difficulty of current research.
To improve segmentation accuracy, Computer-Aided Diagnosis (CAD) systems are increasingly used to help analyze images of breast histological sections [5]. Breast cell image segmentation algorithms in recent years mainly include the threshold algorithm (OTSU) [6], the Watershed segmentation algorithm (WATERSHED) [7] and the Fuzzy C-means clustering algorithm (FCM) [8]. OTSU is simple to implement and stable, but it ignores the spatial characteristics of the image and is sensitive to noise. WATERSHED is the most widely used algorithm and responds well to weak cell edges, but it is prone to over-segmentation. FCM segments cells according to the degree of membership of each data point in each cluster, but it is sensitive to initialization and has difficulty converging to the cluster centers.
These studies show that the accuracy of existing algorithms is low mainly because they are sensitive to noise and prone to over-segmentation, which leads to inaccurate dividing lines between adherent cells. In recent years, Convolutional Neural Networks (CNN) [9] have been widely used in various fields of medical image analysis, such as cell counting [10], cell segmentation [11], and cell detection [12]. Compared with traditional algorithms, a CNN can automatically extract deeper image features through multiple nonlinear transformations, and local connections and weight sharing make its feature extraction more robust [13].
This paper combines a Multi-scale CNN with CSS corner detection to build a fully automatic segmentation algorithm for breast cell images. The total flow chart of the algorithm is shown in Fig. 1. Staining regularization is used to enhance the cell image before segmentation. Two modules then segment the enhanced cell image. The first module uses the Multi-scale CNN to extract the cell region from the complex background of the image, which completes the initial segmentation. The second module uses CSS corner detection to segment the adherent cells that remain after the initial segmentation. The rest of this paper is organized as follows. An overview of cell image staining regularization is given in Section 2. The initial segmentation based on the Multi-scale CNN and the segmentation refinement based on CSS corner detection are described in detail in Sections 3 and 4, respectively. Section 5 presents the experimental results. The paper ends with our conclusions in Section 6.
Cell image staining regularization.
Fig. 3. Multi-Scale CNN model.
When histological cell sections are prepared, slice images produced by different methods or under different conditions are highly heterogeneous, which makes quantitative analysis of the slice images difficult. Therefore, this paper uses the staining regularization algorithm proposed by Ruifrok [14] to enhance Hematoxylin and Eosin (H&E) stained breast cell images. With staining regularization, the slice image can be reconstructed separately for each stain color, which facilitates quantitative analysis of the slice image.
To separate the stains, an orthogonal transformation of the RGB information is performed to obtain independent information about each stain’s contribution. The transformation has to be normalized to balance the absorption factor of each stain correctly. For normalization, each Optical Density (OD) vector is divided by its total length, resulting in a normalized OD matrix. Ref. [14] defines a normalized OD matrix for staining with hematoxylin, eosin and DAB. In this paper, H&E stained breast cell images are processed, and the zero components are regularized. The normalized OD matrix
Here, the first row represents the hematoxylin stain, the second row represents the eosin stain, and the third row is empty. The columns give the optical density detected by the red, green and blue channels for each stain.
If
The single-stain images of H&E are separated by the above method. The separated single-stain images are then added to obtain a regularized image
To cope with the diversity of breast cell shapes, this section uses convolution kernels of different scales to extract image features at different scales, so that the filters extract and learn richer image information. In this work, we make full use of the CNN to learn discriminative features, and build three CNNs in parallel to extract multi-scale features from cell images of different sizes. The detailed structure of the Multi-scale CNN model is shown in Fig. 3. Our model is divided into three parts: multi-scale decomposition of the cell image, multi-scale feature extraction, and pixel classification labeling.
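The stain separation described above can be sketched as follows. This is a minimal illustration, not the implementation used in this paper: the hematoxylin and eosin OD vectors are commonly quoted values assumed for the example, and filling the empty third row with a vector orthogonal to the two stains is only one possible way to regularize the zero components. The function names are hypothetical.

```python
import numpy as np

# Assumed stain OD vectors (columns: R, G, B). Commonly quoted values,
# NOT the normalized OD matrix used in this paper.
HEMATOXYLIN = np.array([0.65, 0.70, 0.29])
EOSIN = np.array([0.07, 0.99, 0.11])


def stain_matrix(h=HEMATOXYLIN, e=EOSIN):
    """Build a normalized 3x3 OD matrix (rows: H, E, residual).

    The empty third row is filled with a vector orthogonal to both stains
    (an assumption) so that the matrix can be inverted.
    """
    residual = np.cross(h, e)
    m = np.stack([h, e, residual]).astype(np.float64)
    # Divide each OD vector by its length, as described in the text.
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def separate_stains(rgb, m):
    """Return the per-pixel amount of each stain, shape (H, W, 3)."""
    rgb = np.clip(np.asarray(rgb, dtype=np.float64), 1.0, 255.0)
    od = -np.log10(rgb / 255.0)  # optical density per channel
    return (od.reshape(-1, 3) @ np.linalg.inv(m)).reshape(rgb.shape)


def recombine(amounts, m, keep=(0, 1)):
    """Rebuild an RGB image from the selected stains only."""
    kept = np.zeros_like(amounts)
    kept[..., list(keep)] = amounts[..., list(keep)]
    od = kept.reshape(-1, 3) @ m
    return (255.0 * 10.0 ** (-od)).reshape(amounts.shape)
```

Calling `recombine(amounts, m, keep=(0, 1))` adds the hematoxylin and eosin contributions and drops the residual channel, which corresponds to adding the separated single-stain images.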
Multi-scale decomposition of cell image
In order to extract a set of more representative features in the image, the Laplacian pyramid is used to decompose the cell image into three scales of 256
The staining regularized cell image
Where
In this paper,
Then the Laplace Pyramid decomposition of cell image can be expressed as:
Therefore, the original cell image with size 256
The position of each cell remains unchanged across the cell images of different scales. We use convolutional neural networks of the same structure to extract features from the cell images of different scales. Weights are shared among the networks at the different scales, which alleviates the over-fitting problem.
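As an illustration of the multi-scale decomposition, the sketch below builds a three-level Laplacian pyramid with NumPy. The 5-tap binomial filter, the nearest-neighbor upsampling, and the halving of the image size at each level are assumptions made for the example; the function names are hypothetical.

```python
import numpy as np

# Assumed 5-tap binomial approximation of a Gaussian filter.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img):
    """Separable low-pass filter with reflected borders (size preserved)."""
    padded = np.pad(img, 2, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, KERNEL, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, KERNEL, mode="valid")


def downsample(img):
    """Blur, then keep every second row and column."""
    return blur(img)[::2, ::2]


def upsample(img):
    """Double the size by pixel repetition, then blur."""
    return blur(np.repeat(np.repeat(img, 2, axis=0), 2, axis=1))


def laplacian_pyramid(img, levels=3):
    """Return [band_0, ..., band_{levels-2}, coarsest low-pass image]."""
    pyramid, current = [], np.asarray(img, dtype=np.float64)
    for _ in range(levels - 1):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the decomposition by adding the bands back, coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = band + upsample(current)
    return current
```

Because every level is obtained by the same downsampling, a cell keeps the same relative position at each scale, which is what allows networks of identical structure to be applied to all levels.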
Each convolutional neural network
for all
Finally, the outputs of the 3 networks are up-sampled and concatenated so as to produce
where
Ideally a linear classifier should produce the correct categorization for all pixel locations (
Where
The resulting labeling
Judgment of adherent cells
After the cell regions have been located, single cells must be distinguished from adherent cells so that only the adherent cells are segmented. The specific steps for distinguishing single cells from adherent cells are as follows:
Calculate the circularity of each region Preserve the area of circularity The average area of Calculate the area of each region in
After the analysis of sample images and experimental results, the values of parameters
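The judgment of adherent cells can be sketched as below. The circularity formula 4πA/P² is the common definition and is assumed here, as are the decision rule (a region is treated as adherent when it is not round and is larger than a multiple of the mean area of the round regions) and the values of `min_circularity` and `area_factor`, which are hypothetical placeholders rather than the parameters chosen in this paper.

```python
import math


def polygon_area_perimeter(points):
    """Area (shoelace formula) and perimeter of a closed polygon.

    points: list of (x, y) tuples in boundary order.
    """
    area2, perimeter = 0.0, 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(area2) / 2.0, perimeter


def circularity(area, perimeter):
    """Common definition: 1.0 for a perfect circle, smaller otherwise."""
    return 4.0 * math.pi * area / perimeter ** 2


def split_single_and_adherent(regions, min_circularity=0.8, area_factor=1.5):
    """regions: list of (area, perimeter). Returns (single, adherent) index lists.

    Thresholds are hypothetical and would need tuning on sample images.
    """
    round_idx = [i for i, (a, p) in enumerate(regions)
                 if circularity(a, p) >= min_circularity]
    if not round_idx:
        return [], list(range(len(regions)))
    mean_area = sum(regions[i][0] for i in round_idx) / len(round_idx)
    single, adherent = list(round_idx), []
    for i, (a, _) in enumerate(regions):
        if i in round_idx:
            continue
        # Irregular and clearly larger than a typical single cell.
        (adherent if a > area_factor * mean_area else single).append(i)
    return sorted(single), adherent
```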
After single cells have been distinguished from adherent cells, the adherent cells are segmented by CSS corner detection. The CSS corner detection algorithm is well suited to corner detection in multi-scale, complex images. Its main principle is to locate corners from the change in curvature along a contour, which gives very accurate corner positions. Hence, this section uses CSS corner detection to segment adherent cells, which effectively overcomes the influence of the complex background and yields accurate cell dividing lines.
The definition of curvature
The curve
Where
The curvature
Where
The following steps are used by the CSS algorithm to detect the corners of a cell image:
(1) Apply the Canny edge detector to the gray-level image to obtain a binary edge map.
(2) Extract the edge contours from the edge map, fill the gaps in the contours and find the T-junctions.
(3) Compute the curvature at a fixed low scale for each contour to retain all true corners.
(4) Treat all curvature local maxima as corner candidates, including the false corners.
(5) Classify the false corners as rounded corners or boundary noise [16] and remove them to obtain the correct corners.
(6) Track the corners from the highest scale to the lowest scale to improve localization accuracy.
(7) Compare the T-junctions with the other corners and remove one of any two corners that are very close to each other.
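The curvature computation and the selection of corner candidates in the steps above can be sketched as follows. The sketch assumes the standard CSS formulation, in which the contour coordinates are smoothed with a Gaussian of width `sigma` and the curvature is (x'y'' - x''y') / (x'² + y'²)^1.5; it handles a single closed contour only, and `min_curvature` is a hypothetical parameter used to suppress numerical noise on straight segments.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized Gaussian kernel truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()


def smooth_closed(values, sigma):
    """Gaussian smoothing of a coordinate sequence on a closed contour."""
    g = gaussian_kernel(sigma)
    r = len(g) // 2
    wrapped = np.concatenate([values[-r:], values, values[:r]])
    return np.convolve(wrapped, g, mode="valid")


def curvature(x, y, sigma):
    """Curvature of the smoothed closed contour at every sample."""
    xs, ys = smooth_closed(x, sigma), smooth_closed(y, sigma)
    dx = (np.roll(xs, -1) - np.roll(xs, 1)) / 2.0      # first derivatives
    dy = (np.roll(ys, -1) - np.roll(ys, 1)) / 2.0
    ddx = np.roll(xs, -1) - 2.0 * xs + np.roll(xs, 1)  # second derivatives
    ddy = np.roll(ys, -1) - 2.0 * ys + np.roll(ys, 1)
    return (dx * ddy - ddx * dy) / (dx ** 2 + dy ** 2) ** 1.5


def corner_candidates(kappa, min_curvature):
    """Indices where |curvature| is a local maximum above min_curvature."""
    k = np.abs(kappa)
    is_peak = (k > np.roll(k, 1)) & (k > np.roll(k, -1)) & (k > min_curvature)
    return np.flatnonzero(is_peak)
```

On a square contour the candidates fall at the four vertices, while a circle of radius R gives a nearly constant curvature of about 1/R and therefore no pronounced corner.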
Fig. 4. Workflow of this algorithm.
Some corner candidates are local maxima only numerically: their curvature differs very little from that of their neighbors in the region of support. Such candidates can be removed with an adaptive local curvature threshold. In principle, the threshold for a candidate is set according to the curvature of its neighborhood region, and a local curvature maximum that is smaller than its local threshold is eliminated. This adaptive threshold is given by:
Where
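A minimal sketch of such an adaptive threshold is given below. It assumes that the threshold is a coefficient times the mean curvature over the candidate's region of support, and that the region of support extends from the candidate to the nearest curvature local minimum on each side. The coefficient value of 1.5 is a hypothetical choice, and the contour is treated as open for simplicity.

```python
import numpy as np


def region_of_support(kappa, idx):
    """Extend left and right from idx while the curvature keeps decreasing."""
    left = idx
    while left > 0 and kappa[left - 1] < kappa[left]:
        left -= 1
    right = idx
    while right < len(kappa) - 1 and kappa[right + 1] < kappa[right]:
        right += 1
    return left, right


def filter_round_corners(kappa, candidates, coefficient=1.5):
    """Keep candidates whose curvature exceeds their local adaptive threshold."""
    kappa = np.abs(np.asarray(kappa, dtype=np.float64))
    kept = []
    for idx in candidates:
        left, right = region_of_support(kappa, idx)
        threshold = coefficient * kappa[left:right + 1].mean()
        if kappa[idx] > threshold:
            kept.append(idx)
    return kept
```

A sharp corner stands well above the mean curvature of its neighborhood and is kept, whereas a rounded corner whose neighbors have similar curvature falls below the threshold and is removed.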
After the corners are detected, the dividing line can be obtained by connecting the corners. The selection of the best dividing line is as follows:
(1) Calculate the circularity of the cell region before segmentation, and the circularity and area of the two regions after segmentation.
(2) If the circularity and area of both regions are greater than their thresholds and the average circularity of the two regions is greater than the circularity of the cell region before segmentation, take the corresponding dividing line as a candidate dividing line.
(3) Select the shortest candidate as the best dividing line.
Overall, CSS corner detection is well suited to cell image segmentation: it detects corners effectively and yields a more accurate cell dividing line.
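The selection of the best dividing line can be sketched as follows. The contour is assumed to be a closed polygon given as an (N, 2) array, each candidate dividing line is a pair of non-adjacent corner indices on that contour, and circularity is taken as 4πA/P². The threshold values passed in are hypothetical.

```python
import numpy as np


def area_perimeter(poly):
    """Area (shoelace formula) and perimeter of a closed polygon, shape (N, 2)."""
    x, y = poly[:, 0], poly[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    area = abs(np.sum(x * yn - xn * y)) / 2.0
    perimeter = np.sum(np.hypot(xn - x, yn - y))
    return area, perimeter


def circularity(poly):
    area, perimeter = area_perimeter(poly)
    return 4.0 * np.pi * area / perimeter ** 2


def best_dividing_line(contour, corner_pairs, min_circularity, min_area):
    """Return the (i, j) corner pair giving the shortest valid dividing line."""
    before = circularity(contour)
    best, best_length = None, np.inf
    for i, j in corner_pairs:
        i, j = min(i, j), max(i, j)
        # Cutting along the chord i-j splits the contour into two polygons.
        part_a = contour[i:j + 1]
        part_b = np.concatenate([contour[j:], contour[:i + 1]])
        stats = [(circularity(p), area_perimeter(p)[0]) for p in (part_a, part_b)]
        if any(c <= min_circularity or a <= min_area for c, a in stats):
            continue  # one of the two regions is too irregular or too small
        if np.mean([c for c, _ in stats]) <= before:
            continue  # the split does not make the regions rounder
        length = np.hypot(*(contour[i] - contour[j]))
        if length < best_length:
            best, best_length = (i, j), length
    return best
```

For a contour formed by two overlapping discs, the pair of concave points at the neck gives two nearly circular regions and the shortest chord, so it is the pair that is returned.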
Data set extension
The data set contains 58 H&E histopathology images of breast tissue from the Yale Tissue Microarray Facility, of which 26 are malignant and 32 are benign. Each image is represented in analog image as 8 bit with 896
Fig. 5. Segmentation result comparison of different algorithms.
During the experiment, 30000 images are randomly selected from the whole data set as training samples, and the remaining 14544 images are used as test samples to evaluate the performance of the algorithm. The Multi-scale CNN model is implemented with the open source software library TensorFlow 1.6 and trained end-to-end on an NVIDIA GeForce GTX 1080Ti GPU. In the Multi-scale CNN, Rectified Linear Units (ReLU) are used as the nonlinear activation function in the convolution layers, and max-pooling is used in the pooling layers. The entire structure of the algorithm is shown in Fig. 4. First, the stain-regularized breast cell image is input into the Multi-scale CNN model for parameter training and model testing. The output shows that, although the cell regions are located accurately, adherent cells remain unseparated. CSS corner detection is then used to detect the cell corners and obtain the optimal dividing line, thereby segmenting the adherent cells. Throughout the segmentation process, the proposed algorithm selects its parameters adaptively, which gives it good generality.
Cell segmentation results
To evaluate the segmentation results, sensitivity and specificity are selected as the evaluation indexes for the various segmentation algorithms. They are calculated as follows:
Here,
In addition, to evaluate the proposed algorithm, several algorithms that perform well in cell segmentation are selected and their results are compared. The segmentation result of each algorithm is shown in Fig. 5. The first row is a benign breast cell image, and the second row is a malignant breast cell image. The markers in the figure indicate the regions to be compared.
As can be seen from the figure, for images with uniform cell size and simple background, OTSU and FCM segment well but cannot effectively separate adherent cells, whereas the proposed algorithm separates the adherent cells effectively and with almost no segmentation errors, as indicated by the red circle marks in Fig. 5. For images with uneven gray scale and complex background, WATERSHED produces more over-segmentation, while the proposed algorithm still gives more accurate results, as indicated by the red rectangle marks. The CNN uses the same data set as this paper, but it does not segment adherent cells well, as indicated by the green circle marks. The advantage of the proposed algorithm is that convolution kernels of different scales extract features at different scales, which makes its localization of the cell area more precise. Combined with CSS corner detection for the segmentation of adherent cells, its overall segmentation is better than that of the other algorithms. The results of each algorithm are shown in Table 1.
Table 1. Segmentation results comparison of different methods
Table 1 shows the quantitative statistical analysis of the segmentation results of different algorithms. It can be seen that in the segmentation result of FCM,
Table 2. Comparison of segmentation accuracy
Table 2 shows that the proposed algorithm outperforms the other algorithms on all three measures: accuracy, sensitivity and specificity. The mean segmentation accuracy is 93.01%. Sensitivity indicates the recall rate of cells; because the proposed algorithm produces a small amount of over-segmentation and under-segmentation, its sensitivity is 93.91%. Specificity indicates the recall rate of the background; because over-segmentation of cells is slightly more frequent than under-segmentation and there is a small amount of mis-segmentation, the specificity, 95.69%, is slightly higher than the sensitivity. Overall, the results show that, compared with the other algorithms, the proposed algorithm has clear advantages in identifying adherent cells and suppressing over-segmentation of single cells.
In this paper, we propose an automatic segmentation algorithm for breast cell images. The algorithm combines a Multi-scale CNN with CSS corner detection to make full use of the image information, which effectively overcomes the influence of the complex background and gives high segmentation accuracy. The Multi-scale CNN extracts features from images at different scales, which improves the accuracy of cell localization and completes the initial segmentation of breast cells. CSS corner detection then yields more accurate dividing lines between adherent cells, which improves their segmentation accuracy. The experimental results demonstrate that the proposed algorithm effectively solves the problem of adherent cell segmentation, with a segmentation accuracy of 93.01%, which is higher than that of the other segmentation algorithms compared.
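The three evaluation indexes can be computed from a predicted mask and a reference mask as in the sketch below. It assumes a pixel-level evaluation with the usual confusion-matrix definitions, where cell pixels are the positive class and background pixels the negative class; the function name is hypothetical.

```python
import numpy as np


def segmentation_metrics(predicted, truth):
    """Accuracy, sensitivity and specificity of a binary segmentation mask."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(predicted & truth)     # cell pixels labeled as cell
    tn = np.sum(~predicted & ~truth)   # background labeled as background
    fp = np.sum(predicted & ~truth)    # background labeled as cell
    fn = np.sum(~predicted & truth)    # cell pixels labeled as background
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # recall of cells
        "specificity": tn / (tn + fp),   # recall of background
    }
```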
Acknowledgments
This work is supported by Shaanxi Provincial International Science and Technology Cooperation Program (No.2018kw-026), Xi’an Science and Technology Plan Program (No.201805040 YD18CG24).
