An improved YOLOv3 model for detecting location information of ovarian cancer from CT images

Abstract

Ovarian cancer is a malignant tumor that poses a serious threat to women’s lives. Computer-aided diagnosis (CAD) systems can classify the type of ovarian tumors, but few of them can provide exactly the location information of ovarian cancer cells. Recently, deep learning technology becomes hot for automatic detection of cancer cells, particularly for detecting their locations. In this work, we propose a novel end-to-end network YOLO-OC (Ovarian cancer) model, which can extract the characteristics of ovarian cancer more efficiently. In our method, deformable convolution is used to enhance the model’s ability to learn geometric deformation in space. Squeeze-and-Excitation (SE) module is proposed to automatically learn the importance of different channel features. Data experiments are conducted on datasets collected from The Affiliated Hospital of Qingdao University Medical College, China. Experimental results show that our YOLO-OC model achieves 91.83%, 85.66% and 73.82% on mean average precision mAP@.5, mAP@.75 and mAP@[.5,.95], respectively, which performs better than Faster R-CNN, SSD and RetinaNet on both accuracy and efficiency.

Keywords

Ovarian cancer deep learning CT image object detection YOLOv3

1. Introduction

Ovarian cancer is the second most common cause of gynecologic cancer death in women around the world [1]. The ovary is located deeply in the pelvic cavity, and it is difficult to detect the disease in the early stage and the disease develops rapidly. Serous cystadenoma carcinoma is the most common ovarian cancer, accounting for 70% of epithelial ovarian cancer, and its 5-year survival rate is as low as 40% [2]. In addition, common ovarian cancers include endometrioid carcinoma, clear cell carcinoma, and mucinous cystadenoma carcinoma. Diagnosis is the prerequisite for treatment, and timely diagnosis of ovarian cancer type is extremely significant for improving ovarian cancer patients’ survival rate.

Recently, convolutional neural networks (CNNs) have developed rapidly in the fields of image classification, object detection, and semantic segmentation [3, 4, 5]. The task of image classification is to divide the image into a category, which corresponds to the most prominent object on the image with smaller object to be ignored. Since almost all practical images contain multiple objects, it is crude to use such models to assign a single label to an image [6, 7]. In CT images, there are organs and lesions of a certain part of the body. Detecting the positions and types of lesions on CT images exactly is important for disease diagnosis.

So far, many researchers have applied deep learning to medical images [8, 9, 10, 11]. Similarly, there are many researches on the combination of deep learning and ovarian cancer images. Wang et al. explored an SVM-based automatic recognition algorithm for ovarian cancer diagnosis using photoacoustic imaging [12]. It is proposed deep convolutional neural network (DCNN) based on AlexNet to automatically classify different types of ovarian cancer from cytological images achieving accuracy 78.20% [13]. In [14], four mainstream classification models: VGG-16, ResNet-50, DenseNet and GoogleNet (Inception V3) are developed to classify cancer cell subtypes on ovarian ultrasound images, where GoogleNet (Inception V3) ranks first with 92.50% accuracy. After that, an early diagnosis system for ovarian cancer detection with improved deep learning network and cost-sensitive learning is obtained in [15].

Until now, most CAD systems are used for ovarian cancer diagnosis classification, and cannot provide the specific location of the tumor. Furthermore, most of the existing models are trained on ultrasound images, so it is urgent to develop a CAD system to detect ovarian cancer on CT images. It is shown in Fig. 1 the boundary between tumor and non-tumor, where the cancer regions is not clear and the shape of the tumor is irregular. This undoubtedly puts forward higher requirements on the ability of the detection network to extract features.

Figure 1.

(a) and (b) are two different types of tumors, in which the red borders are marked by professional radiologist. The boundary between the tumor area and the non-tumor area is not obvious in (a), and the shape of the tumor is irregular in (b).

In the work, we propose a novel YOLO-OC model by improving on the basis of YOLOv3, which can detect location information of ovarian cancer cells on CT images quickly with acceptable accuracy. Specifically, in YOLO-OC, we introduced deformable convolution from [16, 17] to enhance the model’s ability to learn geometric deformation in space. Squeeze-and-Excitation (SE) module [18] is proposed to automatically learn the importance of different channel features. We add SE blocks to the Darknet53 to suppress inaccurate semantic information in low-level features. It is used kmeans++ to generate priori anchor boxes to obtained a higher value to average IOU (Intersection over Union).

Data experiments are conducted on datasets collected from The Affiliated Hospital of Qingdao University Medical College, China. Experimental results show that our YOLO-OC model achieves 91.83%, 85.66% and 73.82% on mean average precision mAP@.5, mAP@.75 and mAP@[.5,.95], respectively, which performs better than Faster R-CNN, SSD and RetinaNet on both accuracy and efficiency.

2. Related work

There are two types of current mainstream object detection algorithms: two-stage method and one-stage method. In two-stage method, it firstly extracts candidate regions from the input image, and then classifies each candidate region. The common two-stage networks are R-CNN series models, including R-CNN, Fast R-CNN, Faster R-CNN, Cascade R-CNN and Dynamic R-CNN. One-stage method is an end-to-end detection network, which doesn’t need to find candidate regions separately, like SSD (Single Shot MultiBox Detector) and YOLO series models [19]. Most of the above-mentioned networks are based on anchor boxes. In recent years, anchor-free method without anchor boxes is also a research highlight of object detection. Its representative networks include CornerNet and CenterNet.

2.1 Two-stage detectors

One-stage detection network runs rapidly, while two-stage detection network is with higher accuracy. For example, R-CNN is adopted for selective search to obtain about 2000 regions of interest (ROI) [20]. These regions are converted into fixed-size images and sent to convolutional neural networks, respectively. Fast R-CNN [21] applied CNN to extract the features of the entire image, and created candidate regions on the feature map, which improved the detection speed. In [22], Faster R-CNN adopted the Region Proposal Network (RPN) to generate ROI, which greatly improved the detection speed. Cascade R-CNN used cascade regression as a resampling mechanism to increase the IOU value of proposal by stage, so that the proposals resampled by the previous stage can adapt to the next stage with higher threshold [23]. Dynamic R-CNN not only adjusted the IOU threshold but also Smooth L1 loss parameters according to the sample distribution during training [24].

2.2 One-stage detectors

YOLOv3 divided the feature map into ${S}\times{S}$ grid cells, each grid cell predicted 3 prior bounding boxes. In addition, Feature Pyramid Network (FPN) is designed for multi-scale detection [25, 26, 27]. RetinaNet introduced focal loss to solve the problem of serious imbalance between positive and negative samples in one stage object detection [28]. CornerNet detected objects by detecting the top-left corner and bottom-right corner of the bounding box, eliminating the anchor mechanism [29]. Similar anchor-free method was proposed in CenterNet, which uses key point estimation to find the center point and returns to other object attributes such as size, 3D position, orientation and pose [30].

3. Materials and methods

3.1 Materials

Dataset of CT images used in data experiments are collected from The Affiliated Hospital of Qingdao University, China. The study are approved by the ethics committee. All patients are anonymous and personal information has been filtered. After filtering out invalid data, we collected pelvic CT images with manual annotations and corresponding pathological reports for 223 patients. We use a graphical image annotation tool, named LabelImg, to label the CT images according to the manual marking of professional doctors and submitted them for verification. After verification by medical experts, it is obtained more than 5100 CT images of ovarian cancer with lesions. There are five types of image labels, namely endometrioid carcinoma, clear cell carcinoma, mucinous cystadenoma carcinoma, serous cystadenoma carcinoma and others. They are shown in Fig. 2.

Table 1
Number of data used for training and testing

Class	Training	Testing
Endometrioid	347	62
Clear cell	375	66
Mucinous	390	68
Serous	2901	513
Others	367	64

Figure 2.

The five types of CT images of ovarian cancer: (a) Endometrioid carcinoma; (b) clear cell carcinoma; (c) mucinous cystadenoma carcinoma; (d) serous cystadenoma carcinoma; (e) others.

Figure 3.

The amount of CT images after data augmentation.

As illustrated in Table 1 and Fig. 3, since the amount of data of serous cystadenoma cancer was much larger than the other four types, we employed the data augmentation method of horizontal rotation and zoom deformation to expand the number of other four types of images (the images obtained using data augmentation are only used for training).

3.2 Methods

In this section, we introduce the topological architecture of the model, and then the detection process of YOLO-OC with specific improvements including Kmeans++, Deformable Convolution, Attention Mechanism and the SSP Module.

3.2.1 The YOLO-OC network architecture

In this subsection, we propose a novel YOLO-OC model to detect ovarian cancer by characteristics of fuzzy boundary and unobvious features of ovarian tumors. The network architecture of YOLO-OC is shown in Fig. 4.

Figure 4.

YOLO-OC network architecture.

YOLO-OC adopts the improved Darknet53 as the backbone network to extract features. The improvements made by YOLO-OC are as follows.

(1)

Introducing deformable convolution in the residual block of the backbone network to increase the flexibility of the convolution operation.

(2)

Adding attention channels after each short-cut layer to distinguish the importance of feature maps.

(3)

Introducing SPP modules in front of the three detection ports to promote feature fusion.

In object detection, the low-level feature semantic information is relatively small, but the object location information is accurate, while the high-level feature is just the opposite. YOLO-OC applies Feature Pyramid Networks (FPN) to fuse feature maps of three scales (13 $\times$ 13, 26 $\times$ 26, and 52 $\times$ 52), and independently detect the fusion feature maps at multiple scales.

It is shown in Fig. 5 that YOLO-OC first extracts features from the input image through the improved Darknet53 to obtain a feature map of a certain size, such as 13 $\times$ 13. After that, it divides the input image into 13 $\times$ 13 grid cells. If the central coordinate of the bounding box corresponding to a certain ground truth in the training set is exactly in a grid cell of the input image, the grid cell is responsible for predicting the bounding box of the object. Each grid cell is given 3 prior bounding boxes of different sizes. The dimensions of the output feature map predicted by YOLO-OC are [S, S, B $\times$ (C $+$ Conf $+$ Cls)]. By S $\times$ S, we denote the number of grid cells of the image. Symbol B is the number of bounding boxes predicted by each grid cell; C is the coordinate of the bounding box; Conf is the objectness score; Cls is the class of the dataset.

3.2.2 The Kmeans++ algorithm

In YOLO series models, prior anchor box is used to detect the object, which has a significant influence on the positioning accuracy. The default 9 prior anchor boxes of three scales in YOLOv3 are the result of clustering the COCO dataset using the Kmeans algorithm [31]. The defect of Kmeans algorithm is that it randomly initializes the cluster centers. In YOLO-OC, we adopt Kmeans++ algorithm to initialize the clustering center to improve the average Intersection over Union (IOU) of the prior anchor box and ground truth. The process of Kmeans++ algorithm is shown in Table 2.

Table 2
The process of Kmeans++ algorithm

Kmeans++ algorithm

Step1: Randomly select a sample from the dataset as the initial clustering center c1;

Step2: First calculate the shortest distance

D(x)

between each sample and the existing cluster center; then calculate the

probability of each sample being selected as the next cluster center

\frac{D(x)^{2}}{\sum_{x\in X}{D(x)^{2}}}

; finally, use the roulette method to select

the next cluster center.

Step3: Repeat the second step until K cluster centers are selected.

Step4: Iteratively update cluster centers using Kmeans algorithm.

Figure 5.

The detection principle of YOLO-OC model.

3.2.3 Deformable convolution

Some ovarian tumors invade other tissues, resulting in irregular shapes. Since the geometric structure of the modules used to construct the convolutional neural network (CNN) is fixed, its geometric transformation modeling capability is essentially limited. This reduces the recognition rate. In Deformable ConvNets v1 (DCNv1), an offset variable is added to each sampling point of the convolution kernel, and the offset variable is learned by the network. As shown in Fig. 6, the convolution kernel with added offset variables is no longer limited to regular sampling. It is shown in Fig. 7 the framework of deformable convolution.

Figure 6.

The sampling method of normal convolution and deformable convolution with a convolution kernel size of 3 $\times$ 3.

Figure 7.

Illustration of deformable convolution.

The above convolution is used to output the offset, the length and width of the output are the same as the input feature map, and the dimension is twice the input feature map (the offsets in the x direction and y direction are stored separately).

We define the 3 $\times$ 3 convolution kernel $K=\{(-1,-1),(-1,0),\ldots,(0,1),(1,1)\}$ . The forward operation of traditional convolution is

$\displaystyle y(p_{0})=\sum_{p_{n}\in K}w(p_{n})\cdot x(p_{0}+p_{n}),$ (1)

where $w$ is the weight of the convolution kernel, $x$ is the input feature map, $p_{0}$ is the point of the output feature map $y$ , and $p_{n}$ is all the sampling points in $K$ . Deformable convolution displaces every point of $K$ by learning an offset $\{\Delta p_{n}|n=1,2,\ldots,N\}$ , whose mathematical expression is as follows:

$\displaystyle y(p_{0})=\sum_{p_{n}\in K}w(p_{n})\cdot x(p_{0}+p_{n}+\Delta p_{% n}),$ (2) $\displaystyle x(p)=\sum_{q}G(q,p)\cdot x(q),$ (3) $\displaystyle G(q,p)=g(q_{x},p_{x})\cdot g(q_{y},p_{y}),$ (4) $\displaystyle g(a,b)=\max(0,1-|a-b|),$ (5)

Since $\Delta p$ is usually a decimal, the position of the feature point of the deviation layer is discontinuous. We adopt the bilinear interpolation of Eqs (3)–(5) to calculate the value of the offset position, where $p$ represents a discrete sampling point and $q$ is all points on the feature map $x$ .

In YOLO-OC, we measure the improvement of network performance by DCNv1 and DCNv2. Experimental results show that DCNv2 brought greater precision improvement. The biggest improvement achieved by DCNv2 over DCNv1 is the addition of weights for each sampling point. It gives the convolution kernel a larger change space. For some unwanted sampling points, the weight can be learned as 0.

$\displaystyle y(p_{0})=\sum_{p_{n}\in K}w(p_{n})\cdot x(p_{0}+p_{n}+\Delta p_{% n})\cdot\Delta m_{n},$ (6)

where $\Delta m_{n}$ is the weight of each sampling point obtained through learning.

3.2.4 Attention mechanism and spatial pyramid pooling

Squeeze-and-Excitation (SE block) is a channel attention module, which can automatically obtain the importance of each feature channel through learning. We add SE blocks between each residual block of Darknet53 to increase the weight of important feature maps. As shown in Fig. 8, the SE block is composed of Squeeze, Excitation, and Reweight. Squeeze corresponds to global average pooling, which converts the input of H $\times$ W $\times$ C (H: height, W: width, C: channel) into the output of 1 $\times$ 1 $\times$ C.

Figure 8.

The process of SE block.

Excitation is similar to the gate mechanism in Recurrent Neural Network (RNN), which generates weights for each feature channel through parameters. Excitation contains two fully connected layers (respectively used for reducing dimensions and ascending dimensions), which is a RELU (Rectified Linear Unit) activation layer and a Sigmoid gate. This design can reduce the amount of calculations and parameters as well as increase the nonlinearity of the module. We regard the weight of the Sigmoid gate output as the importance of each feature channel after feature selection. The weight is multiplied channel by channel to the previous feature to complete the Reweight operation.

YOLO-OC draws lessons from the idea of spatial pyramid pooling to realize the fusion of local features and global features through SPP module. As shown in Fig. 9, SPP module is composed of four parallel branches, which are the maximum pooling of kernel size 1 $\times$ 1, 5 $\times$ 5, 9 $\times$ 9, 13 $\times$ 13. YOLO-OC add SPP modules in front of the three scale detection ports to extract effective features.

Figure 9.

Internal details of the spatial pyramid pooling module.

4. Results

4.1 Parameters and evaluation metrics

Data experiments are run on Ubuntu 16.04 OS with 6 $\times$ Intel Xeon (R) CPUs, using a NVIDIA GeForce RTX 2080 Ti GPU for training. The test experiment was run on a laptop with Intel (R) Cor e(TM) i7-8750H CPU and NVIDIA GeForce GTX 1050 Ti GPU. The network architecture is built by Pytorch1.5 launched by Facebook AI Research (FAIR). The input images are uniformly scaled to 416 $\times$ 416 pixels. And the initial learning rate is set to be 0.01, which decayed with the number of trainings. The SGD momentum is set to be 0.937, weight decay is with value 0.0005, and batch size is set to be 8. Due to the limitation of GPU memory, the maximum batchsize of YOLO-OC is set to 8. In order to unify the experimental parameters, all experiments adopt this value. And the number of training for all experiments is 150 epochs.

The accuracy evaluation indicators of all models are in mAP@.5, mAP@.75 and mAP@[.5,.95]. mAP@.5 and mAP@.75 corresponded to the average detection precision of the IOU threshold of 0.5 and 0.75, respectively. We denoted mAP@[.5,.95] by the average mAP at different IOU thresholds (0.5 to 0.95, step size: 0.05). IOU is defined as the intersection of the detection result and ground truth divided by their union. TP, FP, FN and TN represent true positive, false positive, false negative and true negative respectively. Precision is the ratio of true positives in the identified samples. Recall is the proportion of all positive samples that are correctly identified as positive samples. Changing the recognition threshold will cause the precision and recall values to change, resulting in a P-R curve. AP is the area under the P-R curve. Generally speaking, the better the classifier, the higher the AP value. mAP is the average of AP for multiple categories.

$\displaystyle\textit{IOU}=\frac{\textit{Area of Overlap}}{\textit{Area of % Union}}$ (7) $\displaystyle\textit{precision}=\frac{TP}{TP+FP}$ (8) $\displaystyle\textit{recall}=\frac{TP}{TP+FN}$ (9) $\displaystyle AP=\int_{0}^{1}{p(r)dr}$ (10)

4.2 Ablation experiment

4.2.1 Kmeans++

Choosing the appropriate prior anchor box can speed up the regression of the bounding box during the training process. We employed Kmeans++ to get anchor boxes with a higher average IOU. The comparison experiment was repeated 10 times to obtain the highest results. It is achieved avg-IOU by 82.33%, which is better than the highest one of the original method with avg-IOU 81.87%. The three scale anchor boxes in the experiment are set as {[71,73], [98,109], [118,179]}, {[136,116], [144,151], [173,227]} and {[175,172], [193,132], [249,246]}.

4.2.2 Deformable convolution

Compared with the conventional convolution, deformable convolution has an additional step of calculating the offset, which causes the back propagation speed to slow down. To balance the efficiency and effectiveness, we use 3 $\times$ 3 deformable convolutions in the last 4 residual units of Darknet53. Both versions of deformable convolution can improve the mAP value of the detection network. It is shown in Table 3 that the effect of DCNv2 is better. After the backbone joined DCNv2, the network performance improved significantly, among which mAP@[.5,.95] increased by 1.21%.

Table 3
Performance comparison between conventional convolution and deformable convolution. ‘G.A. Box’ means ‘Generate Anchor Box’

G.A. Box	Backbone	Params	mAP@.5	mAP@.75	mAP@[.5,.95]
Kmeans++	Darknet53	61.54M	87.66%	82.33%	70.53%
Kmeans++	Darknet53+DCNv1	62.39M	88.69%	83.17%	71.26%
Kmeans++	Darknet53+DCNv2	62.55M	89.35%	83.73%	71.74%

4.2.3 The SE block and spatial pyramid pooling

It is shown in Table 4 the improvement in detection performance after adding the SE block and SPP modules. The SE block is added after each shortcut layer of the residual block. SPP1 means to add SPP module only at the first detection port, while SPP3 means to add SPP module at all three detection ports. Attention mechanism and spatial pyramid pooling are two different ways to enhance feature extraction. After joining the SE block, the mAP of the network is increased by 0.75%, and the addition of three SPP modules simultaneously is improved by 1.13%.

Table 4
Performance improvements brought by SE block and SPP modules

Network	Params	mAP@.5	mAP@.75	mAP@[.5,.95]
YOLOv3	61.54M	87.03%	81.81%	70.23%
YOLOv3+SE	62.40M	88.05%	82.66%	70.98%
YOLOv3+SPP1	62.59M	88.04%	82.50%	70.74%
YOLOv3+SPP3	62.92M	88.93%	83.23%	71.36%

4.3 Comprehensive comparison

In Table 5, it demonstrates the impact of using different network improvement strategies on network parameters and detection results. By combining these four strategies, we can improve the mAP value of the detection network. There will be no performance degradation due to module conflicts. The last row of Table 5 is the structure of YOLO-OC. Compared with YOLOv3, which increases the value of mAP by 3.59% under the condition of only increasing the parameter amount by 3.27M. Table 6 illustrates the performance of YOLO-OC and several common object detection networks (Faster R-CNN, SSD and RetinaNet) on our test set.

Table 5
Impact of improvements on YOLO-OC

Backbone	Kmeans++	DCNv2	SE	SPP3	Params	mAP@[.5,.95]
Darknet53	$\surd$				61.54M	70.53%
Darknet53	$\surd$	$\surd$			62.55M	71.74%
Darknet53	$\surd$	$\surd$	$\surd$		63.42M	72.60%
Darknet53	$\surd$	$\surd$	$\surd$	$\surd$	64.81M	73.82%

Table 6

Performance comparison between YOLO-OC and other models

Model	mAP@.5	mAP@.75	mAP@[.5,.95]	Infer time	FPS
Faster R-CNN	89.64%	82.84%	71.56%	1264ms	0.8
SSD	88.30%	79.32%	68.79%	192.3ms	5.2
RetinaNet	90.81%	83.40%	71.97%	231.9ms	4.3
YOLO-OC	91.83%	85.66%	73.82%	91.7ms	10.9

Experimental results show that compared with other networks, the proposed YOLO-OC network achieves the best detection effect on ovarian cancer. The detection results of YOLO-OC are shown in Fig. 10. YOLO-OC can accurately locate and classify different types of ovarian tumors, which indicates that the model proposed in this paper can assist radiologists in making accurate diagnosis.

Figure 10.

Examples of YOLO-OC detection effects: (a) Endometrioid carcinoma; (b) mucinous cystadenoma carcinoma; (c) serous cystadenoma carcinoma; (d) clear cell carcinoma.

5. Conclusions

This study is dedicated to applying deep learning to the accurate detection of ovarian cancer. For this reason, we proposed YOLO-OC on the basis of YOLOv3 to enhance its ability to extract features of ovarian malignant tumors. The ablation experiment verified the effectiveness of the above modules, and comparison experiments with other object detection networks indicated that YOLO-OC had the best detection effect, whose mAP value reached 73.82%.

In future work, it is worthy to investigate the manner to reduce the amount of network parameters and improve the detection speed without losing the detection accuracy. As well, the detection of ovarian cancer will learn from some special methods [32, 33].

Footnotes

Acknowledgments

This work was supported by National Natural Science Foundation of China (Grant Nos. 61873280, 61672033, 61672248, 61972416, 61772376), Taishan Scholarship (tsqn201812029), Major projects of the National Natural Science Foundation of China (Grant No. 41890851), Natural Science Foundation of Shandong Province (No. ZR2019MF012), Fundamental Research Funds for the Central Universities (18CX02152A, 19CX05003A-6).

References

Lheureux

Braunstein

and Oza

A.M.

, Epithelial ovarian cancer: Evolution of management in the era of precision medicine, Ca A Cancer Journal for Clinicians 69(10) (2019).

Mccloskey

C.W.

Goldberg

R.L.

Carter

L.E.

et al., A New Spontaneously Transformed Syngeneic Model of High-Grade Serous Ovarian Cancer with a Tumor-Initiating Cell Population, Frontiers in Oncology, 2014.

Sultana

Sufian

and Dutta

, Advancements in Image Classification using Convolutional Neural Network, in: International Conference on Research in Computational Intelligence and Communication Networks, 2018, pp. 122–129.

Long

Shelhamer

and Darrell

, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(4) (2015), 640–651.

Zhou

Wei

W.L.

et al., Application of deep learning in object detection, in: IEEE/ACIS International Conference on Computer & Information Science, 2017, pp. 631–634.

Pang

S.C.

Ding

Qiao

S.B.

et al., A novel YOLOv3-arch model for identifying cholelithiasis and classifying gallstones on CT images, PloS One 14(6) (2019).

Qiao

S.B.

Pang

S.C.

Luo

et al., Automatic Detection of Cardiac Chambers Using an Attention-based YOLOv4 Framework from Four-chamber View of Fetal Echocardiography, 2020.

Pang

S.C.

Meng

Wang

et al., VGG16-T: A Novel Deep Convolutional Neural Network with Boosting to Identify Pathological Type of Lung Cancer in Early Stage by CT Images, International Journal of Computational Intelligence Systems, 2020.

Wang

S.D.

Dong

L.Y.

Wang

et al., Classification of Pathological Types of Lung Cancer from CT Images by Deep Residual Neural Networks with Transfer Learning Strategy, Open Medicine, 2020.

10.

Song

Meng

and Rodríguez-Patón

, U-Next: A Novel Convolution Neural Network With an Aggregation U-Net Architecture for Gallstone Segmentation in CT Images, IEEE Access, 2019.

11.

Pang

S.C.

Zhang

Ding

et al., A deep model for lung cancer type identification by densely connected convolutional networks and adaptive boosting, IEEE Access, 2019, PP(99). 1-1.

12.

Wang

Lei

et al., Tuning to optimize SVM approach for assisting ovarian cancer diagnosis with photoacoustic imaging, Bio-medical materials and engineering, 2015.

13.

Yan

C.B.

Liu

et al., Automatic Classification of Ovarian Cancer Types from Cytological Images Using Deep Convolutional Neural Networks, Bioscience Reports, 2018.

14.

C.Z.

Wang

Y.M.

and Wang

, Deep Learning for Ovarian Tumor Classification with Ultrasound Images, Springer, Cham, 2018.

15.

Zhang

Huang

and Liu

, Improved deep learning network based in combination with cost-sensitive learning for early detection of ovarian cancer in color ultrasound detecting system, Journal of Medical Systems 43(8) (2019), 251–.

16.

Dai

Xiong

et al., Deformable Convolutional Networks, in: IEEE International Conference on Computer Vision, 2017, pp. 764–773.

17.

Zhu

Lin

et al., Deformable ConvNets v2: More Deformable, Better Results, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9308–9316.

18.

Jie

Gang

et al., Squeeze-and-Excitation Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 7132–7141.

19.

Liu

Anguelov

Erhan

et al., SSD: Single Shot MultiBox Detector, in: European Conference on Computer Vision, 2016.

20.

Girshick

Donahue

and Darrell

, Rich feature hierarchies for accurate object detection and semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.

21.

Girshick

, Fast R-CNN, in: IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.

22.

Ren

S.Q.

K.M.

Girshick

et al., Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis & Machine Intelligence 39(6) (2017), 1137–1149.

23.

Cai

and Vasconcelos

, Cascade r-cnn: Delving into high quality object detection, in: IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6154–6162.

24.

Zhang

Chang

et al., Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training, in: European Conference on Computer Vision, 2020, pp. 260–275.

25.

Redmon

Divvala

Girshick

et al., You Only Look Once: Unified, Real-Time Object Detection, in: IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.

26.

Redmon

and Farhadi

, YOLO9000: Better, Faster, Stronger, in: IEEE Conference on Computer Vision & Pattern Recognition, 2017, pp. 6517–6525.

27.

Redmon

and Farhadi

, YOLOv3: An Incremental Improvement, in: IEEE Conference on Computer Vision & Pattern Recognition, 2018.

28.

Lin

T.Y.

Goyal

Girshick

et al., Focal Loss for Dense Object Detection, IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 2999–3007.

29.

Law

and Deng

, Cornernet: Detecting objects as paired keypoints, in: the European Conference on Computer Vision, 2018, pp. 734–750.

30.

Zhou

Wang

and Krähenbühl

, Objects as points, 2019.

31.

Lin

T.Y.

Maire

Belongie

et al., Microsoft COCO: Common Objects in Context, in: European Conference on Computer Vision, 2014, pp. 740–755.

32.

Song

Rodríguez-Patón

Zheng

et al., Spiking Neural P Systems With Colored Spikes, IEEE Transactions on Cognitive and Developmental Systems 10(4) (2018), 1106–1115.

33.

Song

Pan

et al., Spiking Neural P Systems with Learning Functions, IEEE Transactions on NanoBioence, 2019, PP(2): 1-1.

An improved YOLOv3 model for detecting location information of ovarian cancer from CT images

Abstract

Keywords

1. Introduction

2.1 Two-stage detectors

2.2 One-stage detectors

3. Materials and methods

3.1 Materials

Table 1 Number of data used for training and testing

3.2.1 The YOLO-OC network architecture

Table 2 The process of Kmeans++ algorithm

4.1 Parameters and evaluation metrics

4.2.1 Kmeans++

4.2.2 Deformable convolution

Table 3 Performance comparison between conventional convolution and deformable convolution. ‘G.A. Box’ means ‘Generate Anchor Box’

Table 4 Performance improvements brought by SE block and SPP modules

Table 5 Impact of improvements on YOLO-OC

Footnotes

Acknowledgments

References

Table 1
Number of data used for training and testing

Table 2
The process of Kmeans++ algorithm

Table 3
Performance comparison between conventional convolution and deformable convolution. ‘G.A. Box’ means ‘Generate Anchor Box’

Table 4
Performance improvements brought by SE block and SPP modules

Table 5
Impact of improvements on YOLO-OC