Abstract
Recently, lung cancer has been paid more and more attention. People have reached a consensus that early detection and early treatment can improve the survival rate of patients. Among them, pulmonary nodules are the important reference for doctors to determine the lung health. With the continuous improvement of CT image resolution, more suspected pulmonary nodule information appears from the impact of chest CT. How to relatively and accurately locate the suspected nodule location from a large number of CT images has brought challenges to the doctor’s daily diagnosis. To solve the problem that the original DBSCAN clustering algorithm needs manual setting of the threshold, this paper proposes a region growing algorithm and an adaptive DBSCAN clustering algorithm to improve the accuracy of pulmonary nodule detection. The image is roughly processed and ROI (Regions of Interest) region is roughly extracted by CLAHE transform. The region growing algorithm is used to roughly process the adjacent region’s expansibility and the suspected region in ROI, and mark the center point in the region and the boundary point of its point set. The mean value of region range is taken as the threshold value of DBSCAN clustering algorithm. The center of the point domain is used as the starting point of clustering, and the rough set of points is used as the MinPts threshold. Finally, the clustering results are labeled in the initial CT image. Experiments show that the pulmonary nodule detection method proposed in this paper effectively improves the accuracy of the detection results.
Introduction
According to the data provided by the latest edition of the Global Cancer Statistics Report published by the International Cancer Research Institute in 2019 [1], as of 2018, the number of new cancer cases worldwide has reached 18.1 million, while the number of deaths has reached 9.6 million. The incidence and mortality rates in Asia accounted for 48.4% and 57.3% respectively. Among them, the number of new cases of lung cancer and breast cancer both topped 11.6%, while among the deaths, lung cancer ranked first with 18.4%.
Compared with other cancers, lung cancer has the characteristics of shorter onset time, higher malignancy and less accurate diagnosis at the initial stage. Therefore, many patients have reached the advanced stage at the time of onset, as a result of missing the best time for treatment. The 5-year survival rate of patients with advanced lung cancer is only less than 15%, while the 5-year survival rate of patients with early-stage lung cancer can reach 40%∼ 70%. Therefore, early detection, early diagnosis and early treatment of lung cancer is an important means to improve the survival rate of patients. The clinical manifestations of early symptoms of lung cancer are pulmonary nodules, the detection of pulmonary nodules is an important method for early detection of lung cancer. CT tomography is a powerful method to analyze pulmonary nodules. With the rapid development of CT imaging technology, the shape and location of lung lesions can be visually and clearly observed through CT images, which improves the detection rate of pulmonary nodules. However, each patients CT scan can produce hundreds or even thousands of slices, and the radiologist must look through each slice individually. Each slice contains local information of different tissues in the chest, and the radiologist must carefully analyze the information to exclude the trachea. With the interference of blood vessels and other tissues, it is a serious challenge for radiologists to identify the lesions and make a diagnosis. At the same time, the tremendous workload and long-term work will inevitably cause fatigue and distraction of doctors, resulting in missed and misdetected nodules. And the diagnosis of pulmonary nodules by radiologists is a subjective judgment, which is easily limited by doctors’ own knowledge level and diagnostic experience. With the rapid increase in the amount of CT image data, the serious shortage of experienced radiologists is becoming increasingly apparent. In order to improve the accuracy and objectivity of lung cancer diagnosis, and at the same time reduce the burden on doctors to read films, computer-aided detection systems have become hot research issues. It can help radiologists detect pulmonary nodules from a large number of CT images quickly and effectively.
Contribution points of this paper: First, an adaptive threshold DBSCAN clustering method is proposed to detect nodules in CT images of the lung, which reduces the error caused by artificial threshold settings and significantly improves the segmentation of lung nodules PrecisionSecond, by comparing the algorithm proposed in this paper with five common image enhancement methods, the CLAHE algorithm has a better auxiliary effect on the method proposed in this paper and has played an indelible role in improving the accuracy of the detection results of this algorithm
Related work
Since pulmonary nodules are the early clinical manifestations of lung cancer, accurate detection of pulmonary nodules is very important for the early detection and treatment of lung cancer, and computer-aided detection system plays a very important role in the detection rate of pulmonary nodules and improves the work efficiency of doctors. Therefore, scholars at home and abroad have done a lot of explorations and researches on the detection algorithm of pulmonary nodules.
In early studies of pulmonary nodule detection, researchers mainly used methods based on image processing and traditional machine learning. Ye et al. [2] first segmented the lung parenchyma using the fuzzy threshold method, then extracted the local shape features and local divergence information to describe the pulmonary nodules, and finally used the weighted support vector machine for classification. This method has a good detection effect on both ground glass nodules and solid nodules. Yoshida et al. [3] first extracted the candidate nodule area and corresponded to the same position as the image of the normal lung, and then registered it with a wavelet multi-resolution image, performed subtraction, and finally judged whether there were nodules in the candidate area according to its difference. Choi WJ et al. [4] first performed three-dimensional segmentation of the lung parenchyma, then used a multi-scale point enhancement filtering algorithm to detect candidate nodules in the segmented lung parenchyma, and finally extracted a new three-dimensional shape feature descriptor, and used the Support Vector Machine(SVM) classifier to classify the candidate nodules. Messay et al. [5] first used morphological operations and strong threshold method to perform segmentation and detection of candidate nodules at the same time, then extracted features from candidate nodules. Each nodule extracted 245 morphological features, and finally used Fisher classifier to classify candidate nodules. Tan et al. [6] first used an enhancement filter to locate the candidate nodules, then extracted the features of the candidate nodules, and finally used the classifier combined with artificial neural networks and genetic algorithms to classify the candidate nodules. Jacobs et al. [7] proposed a computer-aided detection system mainly for subnodules, because subnodules have a high probability of being malignant. It defines 128 features including shape, texture,intensity and a new contextual feature for subnodules to improve the classification effect. Krishnamurthy et al. [8] adopted a K-Means clustering algorithm, which can automatically find seed points, obtain histograms, and apply them to candidate nodule segmentation, extract texture and shape features of candidate nodule after segmentation to filter false-positive nodule. Finally, artificial neural network was used as a classifier to classify the candidate nodules. Setio et al. [9] proposed a computer-aided detection system specifically for solid nodules with a diameter of more than 10mm. Firstly, an optimized three-dimensional segmentation algorithm was used to segment candidate nodules, and then the candidate nodules after segmentation were segmented. The features including spots, shape, intensity, and spatial context were calculated to obtain 24 features. Finally, Support Vector Machine based on radial basis kernel functions were used to classify candidate nodules. All of the above methods are computer-aided detection methods based on image processing and traditional machine learning. All of them first obtain candidate nodules, then design and manually extract features. Finally, the classifiers in machine learning are used to classify the candidate nodules. Compared to artificially designed and extracted features, convolutional neural networks in deep learning can automatically and efficiently learn features from CT images. Features are automatically learned by these neural networks are also more discriminative than artificially designed features.
In recent years, convolutional neural networks have been playing a cornerstone role in the field of imaging. There have been many researches [10–12] who apply deep learning to the field of medical image. Tajbakhsh et al. [13] performed a comparative test on convolutional neural network and artificial neural network respectively for pulmonary nodule detection and benign and malignant classification tasks that received much attention in CT images of the lungs. The classification performance of the method is based on convolutional neural network is much better than the method based on artificial neural network. Ginneken et al. [14] extracted the coronal, sagittal, and axial faces for each candidate nodule detected by the candidate nodule detection module of the OverFeat model, then used a convolutional neural network to extract features and extract the resulting. The 4096 feature vectors were used as the input value of the support vector machine to classify the candidate nodules, and finally achieved a sensitivity of 76% at the level of four false positive nodules per scanning. Setio et al. [15] proposed a multi-view convolutional neural network, which collected 2D slices in 9 directions for each candidate nodule, and used multiple 2D convolutional neural networks to extract features and classify them. Zhao Pengfei et al. [16] first used the two-dimensional Gaussian probability density function and the fast edge detection algorithm to construct a candidate nodule region template, then obtained the candidate nodule region from the CT image, and used it as the input data. Finally, used the judgment threshold to realize the labeling of the nodule suspected area.
CT images are essentially three-dimensional, and the 2D model-based detection results generally have a high false positive rate. Recently, lung nodule detection models based on 3D convolutional neural networks have become the focus of researchers’ attention. Ding et al. [17] based on a deep convolutional neural network for pulmonary nodule detection, adding a deconvolution layer to the Faster R-CNN’s convolutional neural network for candidate nodule detection, and then using a 9-layer 3D convolutional neural networks to perform false positive filtering. Dou Q et al. [18] proposed a new 3D convolutional neural network framework. Firstly, a 3D fully convolutional neural network is established to filter candidate pulmonary nodules using online sample filtering. Then, a mixed residual network is used to improve the accuracy of nodule recognition using nodules size and location information.
Preprocessing and detection algorithms
In image analysis, the quality of the image directly affects the design of the recognition algorithm and the accuracy of the effect, eliminates irrelevant information in the image, restores useful real information, enhances the detectability of the information, minimizes the data, and improves the features Reliability of extraction, image segmentation, matching and recognition. In addition to lung nodules, there are also factors that interfere with the accuracy of detection in the chest CT image. Although the traditional DBSCAN clustering algorithm has the advantages of being insensitive to noise points and flexible in cluster shape, it has the disadvantages of manually adjusting the threshold. This chapter discusses common image preprocessing methods and the improved algorithms proposed in this paper. The experimental process of this paper is shown in Fig. 1.

Ad-DBSCAN algorithm experimental flowchart.
In image analysis, the quality of the image directly affects the design of the recognition algorithm and the accuracy of the effect. Therefore, preprocessing is required before the image is analyzed and detected (feature extraction, segmentation, matching, and recognition). The main purpose of image preprocessing is to eliminate irrelevant information in the image, restore useful real information, enhance the detectability of the relevant information, and simplify the data as much as possible, so as to improve the reliability of feature extraction, image segmentation, matching and recognition. There are a large number of elements unrelated to the detection point in the lung CT image itself, and there are many interference factors in the area to be detected, such as muscle tissue, skeleton, spine, trachea, etc. So it is necessary to enhance the image of regions other than ROI (regions of interest) to reduce the impact of interference factors on the detection efficiency and results. Generally, the basic flow of image preprocessing is: a. Picture graying; b. Geometric transformation; c. Image enhancement.
When processing a color image, it is necessary to process three channels of the color image in sequence, and the time cost of this process will be relatively large. In order to improve the processing efficiency of the whole image detection, the first process of image preprocessing is to use the image graying process. In reality, medical images are usually based on the principle of X-ray imaging, and computer X-ray photography (CR) technology is used to record the information of X-rays through the human body on the imaging board. After reading by the reading device,it is stored in the form of digitized image information, and then converted to digital tissue density (gray scale) information by a Digital/Analog (D/A) converter, and finally the final image is obtained on the display. Therefore,we can skip the step of image graying directly.
The image geometric transformation is also the transformation of the picture space, which uses geometric transformations such as translation, transposition, mirroring, rotation, and scaling to process the acquired images, and is used to correct system errors and instrument positions (imaging angle, perspective relationship and even the reason of the lens itself) of the image acquisition system. CT images we got at the end have been completed by our professional medical imaging specialists for us, and we can get high-quality images to the greatest extent.
To enhance the useful information in the image, it can be a process of distortion, and its purpose is to improve the visual effect of the image. For a given image usage scenario, to focus on the overall or local characteristics of the image, to make the original image which does not meet the expected clarity clear or highlight some interested features, and to enhance the difference between objects with different features in the image, so as to improve the image quality, enhance the effect of image interpretation and detection, and meet the needs of analysis. So the image enhancement is very important. At present, the commonly used image enhancement methods include Adaptive histogram equalization(AHE) [21], Contrast Limited Adaptive histogram equalization(CLAHE) [22], Laplace operator enhancement (Laplace), logarithmic transform and gamma transform. In this paper, the above algorithms are implemented and compared.
Detection of algorithm
The traditional DBSCAN is widely used in various research fields because of its advantages of fast clustering speed, no restrictions on shape, and no need to preset the number of clusters. Owing to the problems of manually setting the distance threshold Eps and the minimum reachable point MinPts of a core point, this algorithm is controversial. The region growing algorithm is to merge the points similar to the seed points, and finally form the growing region. After the preprocessing of the growth algorithm, the growth set and the boundary point set are obtained. The Euclidean mean of the center point and the boundary point of each subset in the growth set, and the Euclidean distance mean of all the boundary points in the boundary point subset corresponding to each subset are selected. Then the mean value of the two means is calculated as the key parameter of the clustering distance threshold Eps. The number of elements in the subset is used as the minimum number of reachable points MinPts to achieve the adaptive effect of the threshold corresponding to the center points of the different subsets. The parameters used in the original DBSCAN clustering algorithm and their corresponding meanings are shown in Table 1, and the parameters used in the algorithm proposed in this paper and their corresponding meanings are shown in Table 2.
Frequently Used Symbols
Frequently Used Symbols
Frequently Used Symbols
DBSCAN [20] is a spatial clustering algorithm based on density. This algorithm usually uses the density of the point set in the sample as the basis to determine the category of the clustering result. Generally, the position relationship between the samples of the same category should be closely connected. When the preset density threshold is reached, the algorithm divides the point block composed of this point set into multiple regions. In the final divided region, it can be found that there are more clustering blocks with uncertain shape besides the required clustering blocks. The greatest strength of this algorithm is that all points in the data set are treated equally, and there is no clustering result affected by the sensitivity to outliers. In DBSCAN algorithm, neighborhood Eps is the radius of clustering in the execution of the algorithm. The core point (object) refers to the point objects in the spatial data set that meet the MinPts points in their neighborhood Eps. These point objects are called Eps neighbors of the core point, which can be expressed as Neps (p) (Formula 1). Direct density-reachability refers to any two points in the neighborhood of spatial data set Eps, only one of which is the core object. If object q and object p are direct density-reachable, the condition it needs to satisfy is Formula 1. Density- reachability is that if two adjacents points in the point set, which is an ordered set of points in the spatial data set, are direct density reachable, the starting point and the end point are density reachable. Density-connection is that in the spatial data set, if the point o has been determined and the other two points in the set are density reachable, the two points are density connected. Two parameters are needed to implement the algorithm: neighborhood Eps and minimum neighborhood MinPts. While, q ∈ NEps(p),|NEps(p)|≥ Minpts.
The traditional DBSCAN algorithm is described as Algorithm 1.
1: Mark all objects in N D as unvisited;
2: Do;
3: Random select unvisited P i in N D ;
4: Mark P i as visited;
5: Set N as the number of points in ɛ-neighborhood of P i ;
6: If(N ≥ MinPts)
7: Create new cluster C,P i → C;
8: Set N P to the set of points in ɛ-neighborhood of P i ;
9: For each unvisited point P x in N P ;
10: If (P x is not visited)
11: Mark P x as visited;
12: Set n as the number of points in the ɛ-neighborhood of P x ;
13: If(n ≥MinPts)
14: If(P x ∉ C)
15: P x → C;
16: End if
17: End if
18: End if
19: End for;
20: Output C;
21: Else
22: Mark P i as Noise point;
23: End if
24: Until All points in N D have been visited;
After the CLAHE transform of the CT image, the original organs with interference factors have been weakened to the maximum extent. We can intuitively observe the area of the main points to be detected in the picture, but how to accurately determine the node cluster is still a problem to be solved. Although the original DBSCAN algorithm does not need to input the category number k, it can find not only the convex shape of the sample clusters, but also the arbitrary shape of the sample clusters, it has been criticized that it needs to preset the distance threshold (neighborhood) Eps and the minimum neighborhood MinPts artificially. The Clustering results will change with the different preset results.Therefore, this paper proposes that rough processing of samples by region growing as the basis for dynamic adjustment of neighborhood Eps and minimum neighborhood MinPts, so as to improve the accuracy of DBSCAN clustering results and avoid the impact of human intervention on experimental results.
Region growing algorithm usually uses the traditional 8-neighborhood to combine pixels with similar properties as shown in the Fig. 2. For each region, first specify a seed point as the starting point of growth, that is, the origin in the figure, and then compare the pixel points and seed points in the area around the seed point (as shown in the figure below). The points with similar properties are combined into a set and stored in the processed set C. The points that are not used as seed points in the set continue to grow outwards until pixels that do not meet the constraints are included. The growth of such an area is completed. Repeat the process until all the areas have been traversed. The algorithm is as Algorithm 2 and Algorithm 3.

The model of 8-neighbourhood.
1: Initialize N R =ø, N C =ø, N B =ø;
2: Do
3: Random select unvisited P i in N D ;
4: P i → N R ; //GenerateP i set N R that considers a P i center point;
5: Mark P i as visited;
6: Do
7: Random select unvisited P x inN R ;
8: If | P i .gray-P x .gray| ≤ T then
9: (P i , P x )→N Ci ; //Put P i and P x into the setN Ci
10: Else
11: Mark P x as visited;
12: P x → N Bi ; //Save P x to N Bi corresponding to N Ci
13: End If
14: Until All points in N R have been visited;
15: Extend Region Growing(N Ci , N R , N Bi );
16: Mark the center and boundary points in N Ci ;
17: Until All points in N D have been visited;
1: Do
2: Random select unvisited P nx in N Ci as new P i ;
3: P i →N R ; //Generate P i setN R that considers a P i center point;
4: Mark P i as visited;
5: Do
6: Random select unvisited P x inN R ;
7: If | P i .gray-P x .gray| ≤ T then;
8: (P i , P x )→ N Ci ; //Put P i and P x into the setN Ci ;
9: Else
10: Mark P x as visited;
11: P x → N Bi ; //Save P x to N Bi corresponding to N Ci ;
12: End If
13: Until All points in N R have been visited;
14: Until All points in N Ci have been visited;
In this process, three key problems need to be solved: the selection of seed points, the admission principle of adjacent pixels in the growing process and the suspensive condition of growth. Generally, the selection of seed points is based on the method of human interaction to implement or find objects and extract the internal points of objects as seed points. In this paper, since the image scanning process has completed the classification of each pixel point in the CT image based on the gray difference and stored in the corresponding set, the selection of seed points is selected from the set in the way of random number. The admission rule is set as the Euclidean distance and pixel difference of coordinate points. Referring to the rules of the Suzuki [19] contour tracking algorithm, each contour point of each connected domain is found as the suspensive condition of growth. Finally, the center point and boundary point of each subset of the growth result set are marked one by one.
1: Initialize N T =ø, Make all points in N C and N B unvisited;
2: For each N Ci in N C do;
3: Select the P c (x c , y c ) of N Ci ;
4: For each P bi (x bi , y bi ) in N Bi corresponding to N Ci do;
5:
6:
7: MinEpts
i
=Dist=
8: Mark P c (x c , y c ) and P bi (x bi , y bi ) as visited;
9: End for
10: Set Cou ci as the number of points in N Ci , Cou ci → N Ti ;
11: Gray
i
=
12: Mark N Ci as visited;
13: End for
1: N B , N C ⇐ Improved Region Growing(N D , T);
2: N T ⇐ Determination of threshold value(N C , N B );
3: N S ⇐N C ;
4: Initialize N SVisited =ø, N PVisited =ø, N Nighbor =ø, N ROI =ø;
5: Do
6: Random select unvisited N Si in N S ;
7: Take the subset N Ti corresponding to N Si from the threshold N T ;
8: Use the P c of the N Si as P st of the clustering;
9: Do
10: Random select unvisited P Si in N Si ;
11: P Si →N PVisited ;
12: Dist
p
=
13: Sub
p
=
14: If(Dist p ≥Cou Ci && Sub p ≤Gray i ) then
15: P Si →N ROIi ;
16; End if
17: Set N as the number of points in N ROIi ;
18: If(N≥ MinEpts i ) then
19: N ROIi →N Tem , N ROIi →N ROI ;
20: End if
21: Until all point in N Si have been visited
22: N Ext ⇐Extended set processing(P st , N W );
23: N Nighbori ⇐N Tem ∪N Ext ;
24: N Ext ⇐Neighborhood Set Processing(N Nighbori , N W );
25: N Ext →N Nighbori ;
26: N PVisited ⇐{p i |p i ∈ N Nighbori , i ≤ n };
27: Until All Point in N Nighbori have been putted in N PVisited
28: N Nighbori →N Nighbor ;
29: P S →N PVisited , P S ∈N Si ,N Si →N SVisited ;
30: Until all subsets in the startingN S have been putted in N SVisited ;
1: Do
2: Set P st as P s ; //Let the core point P st be the sub-center point P s
3: select unvisited P Si in N D ;
4: Dist
p
=
5: Sub
p
=
6: If(Dist p ≥Cou Ci && Sub p ≤Gray i ) then
7: P Si →N spi
8: End if
9: Set N as the number of points in N spi ;
10: If(N≥ MinEpts i ) then
11: N spi →N Ext
12: End if
13: Until All Subset in N D have been traversed;
1: Do
2: Random select unvisited P x in N Nighbori ;
3: Set P x as P st ; //Use this data point P x as the starting point P st of the cluster
4: Do
5: Set P st as P s ; //Let the core point P st be the sub-center point P s
6: select unvisited P Si in N D ;
7: Dist
p
=
8: Sub
p
=
9: If(Dist p ≥Cou Ci && Sub p ≤Gray i ) then
10: P Si →N spi
11: End if
12: Set N as the number of points in N ROIi ;
13: If(N≥ MinEpts i ) then
14: N spi →N Ext
15: End if
16: Until All Subset in N D have been traversed;
A pre-processed growth set is obtained as the clustering starting set by implementing the Improved Region Growing algorithm, and then a Determination of threshold value algorithm is executed to obtain a threshold set, which consists of the cluster adaptive distance threshold, the minimum reachable number of core points and mean gray difference. The visited set, the visited point set and the neighborhood set are set null. An unvisited subset is randomly taken from the starting set, and the corresponding subset is taken from the threshold set. The center of the initial subset is used as the starting point of clustering. The starting point is taken as the center point, the corresponding distance threshold Eps and the minimum reachable point threshold MinPts as the clustering parameters, and clustered. In this subset, the data points are selected, which satisfy the distance threshold Eps and the minimum reachable point threshold MinPts, to form a cluster ROI. The points in the cluster range ROI that meet the absolute gray value in the initial subset are put into the temporary domain set. The starting point is taken as the sub-center point. Selecting the points that satisfy not only the distance threshold Eps and the minimum reachable point threshold MinPts from all the points set including the starting subset, but also the absolute value of gray difference with the center point is less than the average gray difference point Gray, putting them into the neighborhood set corresponding to the set where the starting point is located, and merging the neighborhood set and neighborhood set to get the final neighborhood set. Taking out any unvisited data point in the neighborhood set, using the data point as the starting point of clustering, and using the same method as the center point of the subset T i to process the starting point of the neighborhood set. The obtained qualified data point are stored in the current neighborhood set, and this point is put into the visited point set. The loop can be ended until all data points in the current neighborhood set have been put into visited set, and the current neighborhood set are stored in the neighborhood set. The center point of the initial subset is put into the set of visited points, and the subset is put into the set of visited points. The algorithm ends until all the subsets in the initial set have been put into the visited set. The pulmonary nodule detection region is obtained.
Compared with the traditional DBSCAN clustering algorithm, the advantages of the improved DBSCAN image clustering algorithm are as follows: First, the starting point of the clustering algorithm is no longer the processing of a single pixel point. Instead, the region growing algorithm is used to process the pixel point of images, which are processed by gamma transform. Second, the center point of each subset preliminarily arranged is as the starting point of clustering, so as to achieve clustering only for nodular regions, which significantly improves the efficiency of clustering operation and avoids the impact of the remaining clustering regions on the clustering results of nodular regions. Third, the maximum value of the Euclidean distance from the center point to the boundary point of each subset was taken as the distance threshold of the algorithm. It can achieve the effect of self-adaptive clustering threshold, reduce the error caused by artificial threshold, and significantly improve the accuracy of pulmonary nodule segmentation.Through analysis of the algorithm, we finally conclude that the time complexity of the algorithm proposed in this paper is N2.
Experiment
Experimental environment
The main software and hardware environment used in this experiment is: Win10 (x64), IntelliJ IDEA 2018.2.4 x64 environment, Inter(R) Core(TM) i7-7700HQ 2.80GHz CPU, 16.0GB of memory. The data source used in this paper is the Lung Image Database Consortium image collection (LIDC-IDRI). This data set was initiated by the National Cancer Institute (NCI) and further promoted by the National Institutes of Health (FNIH) Foundation. The collection was completed with the active participation of the Food and Drug Administration (FDA). According to the official, the data set contains a total of 244,617 (125 GB) of 1010 patients, including diagnostic and lung cancer screening chest tomography (CT) scan images. In this paper, CT images of 90 patients (16,338 in total, of which 10,441 are pictures with nodules and 5,808 are pictures without nodules) are selected for training and inspection algorithms. CT images of 50 patients are used for training algorithms and collection of nodule distribution positions (a total of 8,675 pictures, of which 5,544 are pictures of nodules and 3,131 are pictures of nodules). We chose to use the CT images of the remaining 40 patients to detect the effectiveness of the algorithm (a total of 7,663 images, of which 4,897 were nodules and 2,677 were non-nodules).
Data preprocessing
(1) Contrast Limited Adaptive histgram equalization(CLAHE)
CLAHE differs from ordinary adaptive histogram equalization mainly in its contrast limit. This feature can also be applied to global histogram equalization, which constitutes the so-called limited contrast histogram equalization (CLAHE), but this is rarely used in practice.
In CLAHE, contrast clipping must be used for each small area. CLAHE is mainly used to overcome the problem of excessive amplification noise of AHE. This is mainly achieved by limiting the degree of contrast improvement of the AHE algorithm. The contrast magnification around a specified pixel value is mainly determined by the slope of the transformation function. This slope is proportional to the slope of the cumulative histogram of the domain. CLAHE cuts the histogram by using a predefined threshold before calculating the CDF to limit the magnification. This limits the slope of the CDF. Therefore, the slope of the transformation function is also limited. The value that the histogram is clipped, which is called the clipping limit, depends on the distribution of the histogram and therefore on the value of the field size.
In general, it is not good to directly ignore those parts that exceed the histogram clipping limit, but you should evenly distribute these cropped parts to other parts of the histogram. See Fig. 3.

CLAHE sample diagram.
To sum up, images with different enhancement effects can be obtained by changing the maximum slope S max of mapping function and the corresponding maximum histogram height H max . By limiting the height of the local histogram, CLAHE can limit the enhancement of the local contrast, so as to limit the amplification of noise and the over enhancement of the local contrast.
(2) Image enhancement contrast
The comparison of the processing results of the five image enhancement methods is as follows:
In Fig. 4, (a) the raw CT image without processing; (b) the CT image after Adaptive histogram equalization (AHE) processing; (c) the CT image after Contrast Limited Adaptive histgram equalization (CLAHE) processing; (d) the CT image after Laplace operator enhancement; (e) the CT image after Laplace operator enhancement; (f) the CT image after gamma transform enhancement processing. By comparing the above five experimental images, the contrast of the image (c) is obviously better than several other methods, so this paper uses the CLAHE transform method in the image preprocessing part. When the value of CLAHE is different, the enhancement effect of the image is also different.

Comparison of the processing results.
After processing the CT image with CLAHE taking different parameters, we can find that CLAHE taking 4 can better achieve the effect of enhancing the detection area, so CLAHE = 4 in the experiment in this paper.
(3) Fixed Threshold Segmentation and Connected Region Analysis
CLAHE image enhancement processing using fixed threshold segmentation, the results are shown in Fig. 6.

Comparison of image processing results of CLAHE algorithm under different values.

Fixed Threshold Segmentation.
Then we perform connected area analysis on the obtained image,the results are shown in Fig. 7.

Fixed Threshold Segmentation.
From Figs. 6 and 7, we can clearly get that after fixed threshold segmentation and connection area analysis, we have basically obtained CT images for nodule detection.
First, a data model for each pixel is established, including position information in the image, pixel color information, whether it is accessed, the collection where the pixel is first classified, gray value, boundary point status, center point marker bit, and the final collection. The pixels in the loaded CT image are read one by one, and the relevant information of the pixels is stored in the established model. As shown in Fig. 8.

Data model definition for a single pixel.
Secondly, all scanned pixels are classified for the first time according to the gray value. The scanned pixel access flag bit (the default is 0) becomes 1, and the corresponding tag of the classified collection (the default is 0) becomes 1. The data storage of the pixel models in a single collection is shown in the following Fig. 9.

Rough classification result set.
Finally, the coarsely classified data is detected and classified using the algorithm proposed in this paper, and a subset of the clustering results is stored as a set to facilitate the prediction of the locations of knot nodes in subsequent detections. The clustering result subset storage is shown in the following Fig. 10. And the clustering result subset is labeled on the CT image, as shown in the Fig. 11.

Data in the subset of cluster results after single CT image training.

Data in the subset of cluster results after single CT image training.
Experimental results and analysis in this paper For independent and vascular conglutinated nodules, this paper compares the original DBSCAN detection results in the same region with the detection results of the Ad-DBSCAN algorithm proposed in this paper, as shown in Figs. 12 and 13.

Segmentation results of solitary nodule by this method.

Segmentation results of vascular adhesion nodules by this method.
Among Fig. 12, column (a) is the raw CT image without processing; column (b) is the result of (a) local magnification; column (c) is the result of detection using traditional DBSCAN algorithm; column (d) is the result of detection using the Ad-DBSCAN algorithm proposed in this paper; column (e) is the result of manual segmentation by relevant experts.
Among Fig. 13, column (a) is the raw CT image without processing; column (b) is the result of (a) local magnification; column (c) is the result of detection using ttable raditional DBSCAN algorithm; column (d) is the result of detection using the Ad-DBSCAN algorithm proposed in this paper; column (e) is the result of manual segmentation by relevant experts.
By using the algorithm proposed in this paper, 247,607 nodal nodes included in 4,897 CT images containing pulmonary nodules in 40 patients were tested, of which 167 were nodules with a diameter greater than or equal to 20 mm. There are 44,898 between 10 and 20mm, 113,595 between 5 and 10mm in diameter, and 88,683 between 3 and 5mm in diameter. The test results we obtained are shown in Table 3.
Examination of pulmonary nodules
Further observation of undetected lung nodules can be seen. First, small diameter lung nodules are easy to miss, especially those small nodules that adhere to the alveolar wall; second, those with higher transparency, It is also called ground glass density nodule. Because it has a small density in the image and the image is blurred, it is easy to miss detection. The Fig. 14 below shows some undetected lung nodules. These undetected lung nodules are about 5 mm in diameter.

Fixed Threshold Segmentation.
The indicators for evaluating the pulmonary nodule CAD system in this section are: sensitivity and average false positives.
The calculation formula for sensitivity is:
TP stands for true positive, FP stands for false positive, TN stands for true negative, and FN stands for false negative. Sensitivity is usually used to measure the proportion of detected true positives among all positive samples. The average number of false positives (FP / scan) is the total number of false positives detected divided by the number of pictures, which is used to measure the number of falsely positive lung nodules in a unit picture.
Based on the above two evaluation indicators, we compared the experimental results of the detection method proposed in this paper with the pulmonary nodule detection methods proposed in other literatures. The comparison results are shown in Table 4. These methods are based on the LIDC-IDRI dataset to study lung nodules. They are the same as the dataset used in this paper, so they are relatively comparable to each other.
Comparison of different lung nodule detection algorithms on the LIDC-IDEI dataset
Comparison of different lung nodule detection algorithms on the LIDC-IDEI dataset
Literatures [23–25] have adopted neural network methods to automatically extract features and automatically detect pulmonary nodules. [23] designed a multi-view convolutional neural network to detect lung nodules. The sensitivity of this method is 90.5%, and the average number of false positives is 4.0. Literature [24] designed a method based on geometric and statistical features to extract candidate lung nodules, and then used multiple artificial neural networks to detect it. Using bagging method on multiple neural networks, the resulting model sensitivity was finally obtained. It is 89.4%, and the average number of false positives is 2.0. The literature [25] has improved the classic AlexNet to detect pulmonary nodules, and the sensitivity obtained is 95.0%, and the average number of false positives is 5.62. These neural networks are used Based on a large amount of data in the data set for training and prediction, about 1000 CT images were generally selected for training and testing. Therefore, the obtained results are more reliable and suitable for wide application in clinical medicine.
The sensitivity of the proposed method is 93.075%, and the average number of false positives is 5.73. Compared with other methods that use neural networks for lung nodule detection, the Ad-DBSCAN used in this paper can store and use the location information of previously detected nodule nodes. For new CT pictures that have not been detected, priority is given to the previous There are regions with high frequency for detection, and then the remaining undetected regions are detected, which improves the detection efficiency. Therefore, better results can be obtained in sensitivity or average false positives, which indicates that the lung nodule detection method proposed in this paper has achieved the expected results.
In this paper, the research content is the detection of pulmonary nodules in chest CT images. The region growing algorithm and the adaptive DBSCAN clustering algorithm are combined to improve the accuracy of pulmonary nodule detection. The image is roughly processed and ROI region is roughly extracted by CLAHE transform. The region growing algorithm is used to roughen the expansion of adjacent regions and suspected points in ROI. The center points of the region and the boundary point of its point set are marked. The mean value of the region range is taken as the growth range threshold of DBSCAN clustering algorithm,the center point of the region is taken as the starting point of clustering, and the rough set is used as MinPts threshold. Finally, the clustering results are labeled in the initial CT image. The analysis shows that the algorithm proposed in this paper is relatively accurate and reliable through the establishment of comparative experiment of the detection of tracheal adhesions and solitary nodules.
Footnotes
Acknowledgments
This work was supported by National Natural Science Foundation of China (No. 61602323, 61873174), National Postdoctoral Fundation of China (No. 2016M591455), Youth Seedling Foundation of Liaoning Province (No. lnqn201913) and Natural Science Funds of Liaoning Province (No. 2019MS264, 20180550019).
