Improved neighborhood covering algorithm and its lung cancer staging prediction

Abstract

Considering the complexity of diagnosing lung cancer, a novel neighborhood covering algorithm for lung cancer staging and diagnosis is put forward. The data set consists of 1,074 cases of lung cancer diagnosed from 2013 to 2017 in the Second Hospital of Anhui Medical University. The data set is modeled and processed with the improved neighborhood covering algorithm so as to mine and use the potential information in the data as fully as possible. The objective is to predict the stage of lung cancer patients reasonably. Experimental results show that the improved algorithm is effective in lung cancer staging prediction, with faster training speed and better performance. It provides a new method for the staging diagnosis of lung cancer and a new reference for research on the clinical treatment of lung cancer.

Keywords

Neighborhood covering algorithm, stages of lung cancer, lung cancer staging prediction

1. Introduction

Lung cancer is a malignant tumor that severely threatens human health and life worldwide, and its morbidity is still on the rise in many countries. In recent years, in many large and medium-sized cities of China, lung cancer morbidity has ranked first among all malignant tumors, owing to industrial development, environmental pollution and haze. The long-term survival rate of lung cancer is extremely low: the reported five-year survival rate in China was about 8% in 2002. Moreover, lung cancer has become the leading cause of death from malignant tumors in China, and its morbidity and mortality keep rising quickly. In the past three decades, lung cancer mortality has increased by 465% and the morbidity rises by 26.9% every year. According to the prediction of WHO, by 2025 the number of Chinese lung cancer patients will reach one million and China will become the country with the most lung cancer patients in the world [1, 2].

Lung cancer is divided into small cell lung cancer and non-small cell lung cancer, which are staged in completely different ways. Staging diagnosis is key to successful treatment. From the perspective of clinical diagnosis, two staging methods are used: for non-small cell lung cancer, the TNM staging method [3] is adopted worldwide; for small cell lung cancer, the disease is divided into a “limited stage” and an “extensive stage”. In [4], the authors proposed a 2D motion prediction algorithm for lung tumors, applied to dynamic MRI images, that takes tumor deformations into account. The algorithm evaluates uniform and patient-specific margins around the gross tumor volume to optimize tumor coverage. O’Connell et al. [5] put forward a model that uses PET-CT N stage, patient age, location of the tumor (central vs. peripheral) and histology to accurately predict the probability of N2 or N3 disease being identified by EBUS-TBNA in patients with NSCLC. Xu et al. [6] and Wu et al. [7] analyzed MRI features of lung cancer and explored the application value of MRI in lung cancer diagnosis and TNM staging.

The above methods judge and predict staging on the basis of clinical experience, without considering the numerous factors that influence lung cancer diagnosis or their mutual influences. In addition, there are many studies on computer-aided diagnosis systems, using approaches such as decision trees, neural networks, KDD technology, logistic regression analysis, Bayesian theory, KNN classification and Fisher linear discriminant analysis [8, 9]. All these methods apply data mining technology to lung cancer diagnosis, but they fail to take into account the influence of multiple and hidden factors on clinical diagnosis.

In recent years, the kernel function method has become a research focus of machine learning. It is a relatively complete, systematic method for finding rules from small sample data, and it is mainly used to solve pattern recognition and classification problems with limited samples. However, for the SVM kernel function method it is difficult to determine the kernel function parameters, and no breakthrough has been made on its high computational complexity. In this paper, an improved neighborhood covering algorithm (Constructive Kernel Covering Algorithm) is applied to predicting lung cancer staging. The kernel function method is combined with the constructive-learning covering algorithm. This paper proposes a kernel covering method for lung cancer staging prediction on heterogeneous data sets, which overcomes the shortcomings of the SVM kernel function method. The method ensures that the algorithm retains good classification accuracy and low computational cost when prior knowledge is insufficient and samples are few. The aim is to improve the prediction of lung cancer stage. Finally, the paper sets up a lung cancer staging diagnosis prediction model and tests its prediction effect, providing a valuable reference for lung cancer clinical diagnosis.

Figure 1.

Sphere domains and projection map of sample points.

2. Covering algorithm and its improvement

2.1 Covering algorithm

The covering algorithm (CA) was proposed by Professors L. Zhang and B. Zhang and provided a new method for machine learning classification [10, 11]. The algorithm and a series of improved variants have been successfully applied in many areas, such as intrusion detection [12, 13], email filtering [14, 15], signal type identification [16], financial prediction [17] and vehicle license plate recognition [18]. In [19], a neuron is put in correspondence with a sphere field on the spherical surface, and such neurons constitute a neural network that realizes the design of a classifier [11], as shown in Fig. 1. Based on covering algorithm theory, this paper proposes to project samples from a low-dimensional space onto a sphere in a higher-dimensional space, to separate samples of the same kind with hyperplanes, and thereby to realize lung cancer staging prediction and diagnosis, so as to open up a new research direction in lung cancer clinical diagnosis.

2.2 Algorithm improvement measures

In [11], two classes of learning samples are taken as an example and classified by the general domain covering algorithm. The formulas are as follows:

$\displaystyle d^{1}(k)=\max_{x\notin X_{k}}\{\langle a^{i},x\rangle\}$ (1)

$\displaystyle d^{2}(k)=\min_{x\in X_{k}}\{\langle a^{i},x\rangle\,|\,\langle a^{i},x\rangle>d^{1}(k)\}$ (2)

$\displaystyle d(k)=\frac{1}{2}(d^{1}(k)+d^{2}(k))$ (3)

These formulas give the threshold $\theta=d(k)$, where $\langle x,y\rangle$ is the dot product of $x$ and $y$ and $a^{i}$ is a sample point. Since all samples lie on the same sphere, a larger dot product means a smaller distance: $d^{1}(k)$ corresponds to the minimum distance between $a^{i}$ and the samples $x$ of a different class, and $d^{2}(k)$ to the maximum distance between $a^{i}$ and the samples $x$ of the same class. Taking the covering of sample points in two-dimensional space as an example, two classes of learning samples are illustrated (represented by $\Box$ and $\Delta$). The result of separating two classes of samples with the general covering algorithm is shown in Fig. 2a. In practice, multiple classes of learning samples must be classified. If the above algorithm is still used, the samples cannot be separated well: Fig. 2b shows the classification of three classes of samples, where the effect is unstable and some test samples cannot be classified. To resolve this problem, the threshold $\theta$ is redefined as

$\displaystyle\theta=\min_{x\in X_{k}}\{\langle a^{i},x\rangle\}$ (4)

Compared with the algorithm in reference [17], the threshold is reduced. The sample points in each spherical field remain unchanged, but the radius is reduced to the minimum, as shown in Fig. 2c. This improvement avoids the situation in Fig. 2b, and each training sample corresponds to a unique spherical field.
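As an illustration of Eqs. (1)–(4), the following minimal Python sketch computes the thresholds for one sphere center. It assumes that the maximum in Eq. (1) runs over samples of the other classes, and it reads Eq. (4) as $\theta=d^{2}(k)$, that is, the smallest inner product among the same-class samples that are closer to the center than every different-class sample. This reading is an assumption, based on the statement that the covered points stay unchanged while the radius shrinks. The function and argument names are hypothetical.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def thresholds(center, same, other):
    """Return (d1, d2, d, theta) for one cover centered at `center`.

    same  -- samples of the same class as the center (center included)
    other -- samples of all other classes
    All points are assumed to lie on a common sphere, so a larger
    inner product means a smaller distance.
    """
    d1 = max(dot(center, x) for x in other)                          # Eq. (1)
    d2 = min(dot(center, x) for x in same if dot(center, x) > d1)    # Eq. (2)
    d = (d1 + d2) / 2                                                # Eq. (3)
    theta = d2   # assumed reading of Eq. (4): the tightest admissible threshold
    return d1, d2, d, theta


# Four points on a circle of radius 5; the center (5, 0) and (4, 3) share a class.
print(thresholds((5, 0), [(5, 0), (4, 3)], [(3, 4), (0, 5)]))  # (15, 20, 17.5, 20)
```

In this toy case both thresholds cover the same two points, but the improved threshold (20) leaves no empty margin between the cover and the nearest different-class sample, whereas the general threshold (17.5) does.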

Figure 2.

(a) The general coverage algorithm separates two types of samples; (b) The general covering algorithm separates three types of samples; (c) The improved neighborhood coverage algorithm separates three types of samples.

2.3 Improved neighborhood covering algorithm (ICA) based on lung cancer staging diagnosis

The improved covering algorithm projects a large amount of sample data onto a spherical surface in a higher-dimensional space, then obtains samples of the same kind in the spherical fields cut out by hyperplanes, and thereby realizes lung cancer staging prediction. The detailed algorithm is as follows:

Initialization. Suppose the input set is $K=\left\{\left(x^{1},y^{1}\right),\left(x^{2},y^{2}\right),\left(x^{3},y^{3}\right),\ldots,\left(x^{m},y^{m}\right)\right\}$. The set $K$ has $m$ patient samples, and each input patient sample $x^{i}\left(i=1,2,\ldots,m\right)$ has $n$ reference index attributes (dimensions). $y^{i}$ is the lung cancer staging diagnosis output corresponding to the input sample $x^{i}$. Suppose that the outputs $y^{i}$ of the $m$ samples take $s$ different values, that is, there are $s$ categories.

Let $I\left(t\right)=\left\{i\,|\,y^{i}=y^{t},\ i=1,2,\ldots,m\right\}\left(t=1,2,\ldots,s\right)$ denote the set of indices of all samples whose output is $y^{t}$, and let the corresponding input set be $M\left(t\right)=\left\{x^{i}\,|\,i\in I\left(t\right)\right\}$.

Step 1.
Suppose the input patient training sample set is $X=\left\{x^{1},x^{2},\ldots,x^{m}\right\}$, which is divided into $s$ categories.
Step 2.
Let $r=\max\left\{\left|x^{k}\right|\,:\,x^{k}\in X\right\}$ be the maximum norm of the samples in the sample set. All sample points in $X$ are projected onto the spherical surface centered at the origin with radius $R\left(R>r\right)$ by the transformation $T\left(x^{k}\right)=\left(x^{k},\sqrt{R^{2}-\left|x^{k}\right|^{2}}\right)$. The projected sample set is still denoted by $X$.
Step 3.
After projection, the patient sample set $X$ is described by $I\left(t\right)$ and $M\left(t\right)\left(t=1,2,\ldots,s\right)$. Set the initial values $i=1,j=1,t=1$.
Step 4.
A sample point $x^{k}$ in $M\left(t\right)$ that is not yet covered is randomly selected, and the following quantities are computed:

Minimum value of heterogeneous distance: $d^{1}\left(k\right)=\max_{m\notin I\left(t\right)}\left\{\langle x^{k},x^{m}\rangle\right\}$

Maximum value of homogeneous distance: $d^{2}\left(k\right)=\min_{m\in I\left(t\right)}\left\{\langle x^{k},x^{m}\rangle\,|\,\langle x^{k},x^{m}\rangle>d^{1}\left(k\right)\right\}$

Center of the optimal classification facet: $d\left(k\right)=\frac{d^{1}\left(k\right)+d^{2}\left(k\right)}{2}$

Radius of the optimal classification facet (the threshold $\theta$): $\theta=\min_{x\in X_{k}}\{\langle a^{i},x\rangle\}$
Step 5.
Construct the spherical field $C_{j}^{i}(i=1,2,\ldots,s;j=1,2,\ldots,n_{i})$ with $x^{k}$ as its center and $\theta$ as its threshold value. $C_{j}^{i}$ denotes the $j$-th cover of the $i$-th kind of sample points.
Step 6.
Covered points in $M\left(t\right)$ are marked first, and it is then checked whether all points in the training set $X$ are marked. If all are marked, the algorithm ends; otherwise, it is checked whether all sample points in $M\left(t\right)$ are covered and marked. If yes, then $i\leftarrow i+1,j=1,t\leftarrow t+1$ and go back to Step 4; otherwise, set $j\leftarrow j+1$ and go back to Step 4.

The algorithm proceeds until all samples are covered. Here $d\left(k\right)$ is the center of the optimal classification facet, $\theta=\min_{x\in X_{k}}\{\langle a^{i},x\rangle\}$ is the radius (threshold) of each spherical field $C_{j}^{i}$, and $\langle x^{k},x^{m}\rangle$ denotes the inner product.
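The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the function names, the default choice of $R$ and the fixed random seed are hypothetical, the threshold is taken as $\theta=d^{2}(k)$ (one reading of the threshold formula), and no two samples with different labels are assumed to be identical.

```python
import math
import random


def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def project(x, R):
    """Step 2: T(x) = (x, sqrt(R^2 - |x|^2)) lifts x onto the sphere of radius R."""
    return tuple(x) + (math.sqrt(R * R - dot(x, x)),)


def train_covers(X, y, R=None, seed=0):
    """Return (R, covers); each cover is a tuple (center, theta, label)."""
    rng = random.Random(seed)
    r = max(math.sqrt(dot(x, x)) for x in X)      # maximum norm of the samples
    if R is None:
        R = 1.1 * r + 1.0                         # assumption: any R > r is allowed
    P = [project(x, R) for x in X]
    covers = []
    for label in sorted(set(y)):                  # Step 3: handle one class at a time
        same = [p for p, c in zip(P, y) if c == label]
        other = [p for p, c in zip(P, y) if c != label]
        uncovered = list(range(len(same)))
        while uncovered:
            center = same[rng.choice(uncovered)]  # Step 4: random uncovered point
            d1 = max((dot(center, q) for q in other), default=-math.inf)
            theta = min(dot(center, p) for p in same if dot(center, p) > d1)
            covers.append((center, theta, label))  # Step 5: new spherical field
            # Step 6: keep only the points that this cover does not contain
            uncovered = [i for i in uncovered if dot(center, same[i]) < theta]
    return R, covers


# Toy data with two classes (hypothetical values, not patient data).
X = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.5, 0.5)]
y = [1, 1, 2, 2]
R, covers = train_covers(X, y)
print(len(covers), "covers built")
```

By construction, every training sample lies in at least one cover of its own class and in no cover of another class, because each threshold is strictly larger than the largest inner product with a different-class sample.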

In the lung cancer staging prediction experiment, the training and test sample sets were each randomly selected from the sample data on 1,074 cases. If a test sample fell within the spherical field covering a certain class of training samples, the test sample was assigned to that class; otherwise, the test sample could not be assigned to any spherical field and was marked as “rejection” [20, 21].
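The decision rule with rejection can be illustrated with a short Python sketch. It assumes that each cover is stored as a hypothetical tuple (center, threshold, label) and that a sample lies inside a cover when its inner product with the center is at least the threshold. Returning the first matching cover is an assumption made here for simplicity; the text does not say how a sample falling into several covers is handled.

```python
def classify(p, covers):
    """Return the label of the first cover containing p, else "rejection"."""
    for center, theta, label in covers:
        inner = sum(a * b for a, b in zip(center, p))
        if inner >= theta:          # p lies inside this spherical field
            return label
    return "rejection"              # p lies outside every spherical field


# Two hypothetical covers on a circle of radius 5.
covers = [((5, 0), 20, "I"), ((0, 5), 20, "II")]
print(classify((4, 3), covers))    # I
print(classify((3, 4), covers))    # II
print(classify((-5, 0), covers))   # rejection
```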

Compared with the SVM algorithm, this algorithm has the following features:

(1)
For any given sample set, the algorithm can construct a kernel function that accurately separates the sample set in a single pass.
(2)
The number of covers is generally much smaller than the number of sample points, whereas the computational cost of SVM is very large; the algorithm therefore requires less computation than SVM.
(3)
After the first covering, the optimal solution can be sought starting from it, which is much faster than starting from an arbitrary initial point.

3. Application of algorithm in clinical diagnosis analysis

A classifier is easy to build with the covering algorithm, so the prediction of lung cancer staging diagnosis proceeds mainly as follows. First, the algorithm learns from a data set of lung cancer diagnosis indexes, factors and symptoms, applies the lung cancer classification levels used in the medical field, correctly categorizes the results and thus forms a data classifier. Second, it analyzes the feature attributes of later data, compares them with the data that have been correctly classified, categorizes them correspondingly and completes the prediction [22, 23]. Through the covering algorithm, the prediction of a target result from high-dimensional attributes, for which the lung cancer level is difficult to confirm, is transformed into data with low-dimensional feature attributes that are easy to predict, so that they can be compared with the data in the classifier to predict the result.

Table 1
The raw data of characterizing attribute in stages and diagnosis of lung cancer

Patients	1 (Gender)	2 (Age)	3 (Occupation)	4 (Smoking/year)	5 (Family medical history)	…	56 (CEA)	Diagnosis
1	M	58	Teacher	20	Yes	…	27.5	Lung cancer
2	F	71	Farming	No	No	…	97.3	Lung cancer
3	M	37	Painter	18	No	…	130.6	Lung cancer
4	M	47	Miner	26	Yes	…	56	Lung cancer
5	M	59	Farming	35	No	…	113.7	Lung cancer
…	…	…	…	…	…	…	…	…

3.1 Data set preprocessing

This paper uses the lung cancer diagnosis results from December 2013 to December 2017 provided by the Department of Respiratory Medicine of a Grade III, Class A hospital in Anhui as its experimental data, in order to test the diagnostic ability and effect of the covering algorithm when applied to lung cancer diagnosis.

The experimental data are the “56 Lung Cancer Characteristic Quantity Data” (see Table 1), preprocessed data on lung cancer patients sorted out by the Department of Respiratory Medicine of a hospital in Anhui. They contain 56 indexes for 1,074 cases diagnosed from December 2013 to December 2017. The data are used in the covering algorithm to obtain the feature attributes. The 56 indexes include the following items: gender, age, occupational hazard, smoking history, family medical history, air pollution, indoor environmental pollution, nutrition status, genetic factors, chronic lung disease, sixteen clinical features (cough, hemoptysis, stridor, chest tightness, shortness of breath, chest pain, superior vena cava obstruction, esophageal compression, etc.), four diagnostic imaging items (chest X-ray, chest CT, MRI and hepatobiliary scintigraphy), fifteen pathological examination items (sputum exfoliative cytology, transbronchial lung biopsy, bronchoalveolar lavage, percutaneous lung biopsy, pleural biopsy, ultrasound-guided needle aspiration of pulmonary lesions or metastases, etc.), eight tumor marker examinations (tissue polypeptide antigen, carcinoembryonic antigen, squamous cell carcinoma antigen, CYFRA21-1, neuron-specific enolase, etc.) and three lung cancer examination items. The decision-making attributes for lung cancer staging diagnosis comprise primary tumor maximum diameter, lymphatic metastasis and distant metastasis; see Table 2.

Table 2
The raw data of decision attribute in stages and diagnosis of lung cancer

Patients	Tumor size/cm (T)	Lymphatic metastasis (N)	Distant metastasis (M)
1	1.45	N0	M0
2	4.76	N1	M0
3	8.53	N2	M1
4	4.55	N0	M0
5	7.62	N2	M1
…	…	…	…

See Fig. 3 for the application of covering algorithm for data processing in lung cancer diagnosis analysis.

Figure 3.

Data processing of covering algorithm.

Feature extraction is conducted on the general records of the 56 indexes of patients, and data belonging to the same lung cancer stage are covered together. At recognition time, the network extracts features from the general records of the 56 indexes of a given patient, then identifies and categorizes these features to obtain the predicted lung cancer stage. The detailed processing of the 56 indexes is as follows. Step 1: confirm the feature attributes of the samples, that is, the input vectors of the covering algorithm, and the decision-making attributes, that is, the categories of the covering algorithm. In the experiment, the 56 indexes of 1,074 patients that influence lung cancer staging (1,074 $\times$ 56 data) are selected as the input vector set of the covering algorithm, and the lung cancer stage of the corresponding patient is taken as the decision-making attribute. There are 1,074 samples in total, from which the training and test samples are drawn. Step 2: preprocess the original data. The 56 indexes used for judging the factors that influence lung cancer staging serve as the original data on feature attributes, and patients differ in these 56 indexes. When professional medical staff predict a diagnosis, they select the major factors that influence lung cancer staging according to clinical experience.

In the experiment, in order to predict the lung cancer diagnosis of patients, the feature attributes that determine lung cancer staging need to be sorted out: the 1,074 $\times$ 56 data are merged to form the required sequence of feature attributes, 1,074 items in total.

Because of individual differences, patients differ in primary tumor size and status T (staging: T$_{\text{x}}$, T$_{0}$, T$_{\text{is}}$, T$_{\text{1a}}$, T$_{\text{1b}}$, T$_{\text{2a}}$, T$_{\text{2b}}$, T$_{3}$ and T$_{4}$), regional lymphatic metastasis status N (staging: N$_{\text{x}}$, N$_{0}$, N$_{1}$, N$_{2}$ and N$_{3}$) and distant metastasis status M (staging: M$_{\text{x}}$, M$_{0}$, M$_{\text{1a}}$ and M$_{\text{1b}}$). Clinical diagnosis follows the seventh edition of the lung cancer TNM staging standards issued in 2009 [1]. After processing, Table 3 gives the levels used for lung cancer staging.

Table 3

The standard of stages and diagnosis of lung cancer

Determining criterion	Category description	Indicator
T$_{\text{x}}$N$_{0}$M$_{0}$	Concealed phase	$-1$
T$_{\text{is}}$N$_{0}$M$_{0}$	0	0
T$_{\text{2a}}$N$_{0}$M$_{0}$, T$_{1}$N$_{0}$M$_{0}$	I	1
T$_{3}$N$_{0}$M$_{0}$, T$_{\text{2b}}$N$_{1}$M$_{0}$, T$_{\text{2a}}$N$_{1}$M$_{0}$, T$_{1}$N$_{1}$M$_{0}$, T$_{\text{2b}}$N$_{0}$M$_{0}$	II	2
T$_{\text{any}}$N$_{3}$M$_{0}$, T$_{4}$N$_{2}$M$_{0}$, T$_{4}$N$_{0-1}$M$_{0}$, T$_{3}$N$_{1-2}$M$_{0}$, T$_{1-2}$N$_{2}$M$_{0}$	III	3
T$_{\text{any}}$N$_{\text{any}}$M$_{1}$	IV	4

Here, the stage of the primary tumor size and status T, the regional lymphatic metastasis status N and the distant metastasis status M is judged with reference to the lung cancer TNM staging standards. The original data on decision-making attributes in Table 2 are processed with the division standards in Table 3 to obtain the decision-making attributes in Table 4. After processing, the distribution of the sample data is given in Table 5.
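The division standards of Table 3 can be written as a small lookup, sketched below in Python. The function name and the string encoding of the categories are hypothetical, and the sketch assumes that the T, N and M categories have already been determined clinically; the conversion of tumor size into a T category is not covered. Combinations that Table 3 does not list return `None`.

```python
def stage_indicator(t, n, m):
    """Map TNM categories to the stage indicator of Table 3 (None if not listed)."""
    t, n, m = t.strip().lower(), n.strip().lower(), m.strip().lower()
    if m.startswith("m1"):                 # T_any N_any M1 -> stage IV
        return 4
    if m != "m0":
        return None                        # e.g. Mx is not listed in Table 3
    if n == "n3":                          # T_any N3 M0 -> stage III
        return 3
    if t == "tx":                          # concealed phase
        return -1 if n == "n0" else None
    if t == "tis":                         # stage 0
        return 0 if n == "n0" else None
    if t in ("t1", "t1a", "t1b", "t2a"):
        return {"n0": 1, "n1": 2, "n2": 3}.get(n)
    if t == "t2b":
        return {"n0": 2, "n1": 2, "n2": 3}.get(n)
    if t == "t3":
        return {"n0": 2, "n1": 3, "n2": 3}.get(n)
    if t == "t4":
        return {"n0": 3, "n1": 3, "n2": 3}.get(n)
    return None


print(stage_indicator("T2a", "N1", "M0"))   # 2
print(stage_indicator("T4", "N0", "M0"))    # 3
print(stage_indicator("T3", "N2", "M1b"))   # 4
```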

Table 4

The decision attribute

Patients	Tumor size/cm (T)	Lymphatic metastasis (N)	Distant metastasis (M)	Staging (decision attribute)
1	1.45	N0	M0	1
2	4.76	N1	M0	2
3	8.53	N2	M1	4
4	4.55	N0	M0	1
5	7.62	N2	M1	4
…	…	…	…	…

Table 5

The distribution of training samples

Indicator	Sample quantity	Staging
$-1$	15	Concealed phase
0	53	0
1	77	I
2	157	II
3	287	III
4	185	IV

3.2 Experiment results

Lung cancer patients usually come for clinical examination and diagnosis only after having obvious chest pain, by which time the disease is usually in the middle or late period, that is, stage II, III or IV, so the clinical sample data concentrate on these three stages. In the experiments of this paper, three different combinations are selected for multiple tests. Combination I: 799 samples in the concealed phase, stage 0, stage I, stage II and stage III are selected as training samples, and 275 samples in stage IV are chosen as test samples. Combination II: 594 samples in stage II and stage III are selected as training samples, and 480 samples in the concealed phase, stage 0, stage I and stage IV are selected as test samples. Combination III: 662 samples in stage III and stage IV are selected as training samples, and 412 samples in the concealed phase, stage 0, stage I and stage II are selected as test samples. The network is built with the covering algorithm for lung cancer staging diagnosis, and repeated tests are conducted on the prediction samples. See Table 6 for the test results and Table 7 for a performance comparison with other classifiers. All experiments were conducted in MATLAB 2017 on an Intel i7 3.6 GHz CPU with 8.0 GB of memory.

Table 6
The test results of improved neighborhood covering algorithm (ICA)

Combination	Training samples	Testing samples	Correct recognition amount	Covering numbers	Average training time/s	Average testing time/s	Average accuracy/%
1	589	185	171	114	3.378	2.590	92.432
2	444	330	298	102	2.546	4.581	91.603
3	472	302	276	107	2.707	4.228	91.391
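The training and testing sample counts listed in Table 6 can be cross-checked against the per-stage counts of Table 5. The short Python sketch below does this, assuming the stage composition of the three combinations described in the text; the variable and function names are hypothetical.

```python
# Per-stage sample counts from Table 5, keyed by the stage indicator.
counts = {-1: 15, 0: 53, 1: 77, 2: 157, 3: 287, 4: 185}

# Stages used for training in each combination; the remaining stages are tested.
train_stages = {1: [-1, 0, 1, 2, 3], 2: [2, 3], 3: [3, 4]}


def split_sizes(stages):
    """Return (training size, testing size) for the given training stages."""
    train = sum(counts[s] for s in stages)
    return train, sum(counts.values()) - train


for combo, stages in train_stages.items():
    print(combo, split_sizes(stages))
# 1 (589, 185)
# 2 (444, 330)
# 3 (472, 302)
```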

Table 7

Recognition results of the different classifiers

Combination	Training samples	Testing samples	Average accuracy/%
			NB	CBA	C4.5	SVM	CA	ICA
1	589	185	87.1	79.5	79.3	89.6	90.4	92.432
2	444	330	73.2	71.4	75.8	83.4	90.3	91.603
3	472	302	75.4	68.3	77.6	87.2	91.2	91.391

Figure 4.

Recognition rate of different classifiers.

For the performance comparison in Table 7, four typical classification algorithms are selected: NB, the naive Bayesian classifier [24]; CBA, a classifier based on association rules [25]; C4.5, the decision tree classifier proposed by Quinlan [26]; and SVM, the support vector machine [27]. CA denotes the general covering algorithm and ICA the improved neighborhood covering algorithm. All results in Table 7 and Fig. 4 are average accuracies.

The experimental results demonstrate the feasibility, effectiveness and superiority of the improved neighborhood covering algorithm in lung cancer staging prediction: after repeated experiments, all prediction results have an accuracy of over 90%, which is of high practical value for the clinical analysis and diagnosis of lung cancer.

3.3 Experiment analysis

With the principles of the improved neighborhood covering algorithm as theoretical guidance and the lung cancer diagnosis data from December 2013 to December 2017 of a hospital in Anhui as the experimental case, modeling and analysis are conducted to fully explore and use the hidden decision-making information in the mass of data in order to analyze and predict lung cancer staging. The experimental results show that applying the improved neighborhood covering algorithm to lung cancer staging diagnosis yields more satisfactory prediction results and can help lung cancer patients with further clinical treatment. It provides a new method for lung cancer staging diagnosis and a new research model for clinical research, and is of high practical clinical value.

Compared with other classification algorithms in terms of efficiency and precision, the improved neighborhood covering algorithm has obvious advantages. However, the experimental results also show that the high classification precision comes at the cost of increased testing time.

4. Conclusion

Lung cancer is one of the malignant tumors that cause the most severe harm to human health and life. With the accelerated industrialization of China and the increase in the smoking rate, lung cancer morbidity is rising quickly in China, and lung cancer has become the leading cause of malignant tumor death among the Chinese urban population. In clinical practice, the diagnosis of lung cancer staging is crucial to treatment. Given the great individual differences among lung cancer patients and the numerous factors influencing diagnosis, a certain error exists in clinical diagnosis. This paper goes beyond the conventional approach to TNM staging prediction and pioneers the application of the improved neighborhood covering algorithm to lung cancer staging prediction. The experimental results demonstrate the value of the feature attributes and decision-making attributes used in the algorithm and provide a new reference for research on the clinical treatment of lung cancer.

Acknowledgments

The authors would like to thank the National Natural Science Foundation of China (Grant No. 61372137), Key Disciplines of Hefei University (2018xk03), the Natural Science Foundation of Anhui Provincial Education Department (No. KJ2015A164, KJ2016A608) and the Quality Engineering Project of Anhui Province (No. 2015sxzx018, No. 2015ckjh062, No. 2016jyxm0884, No. 2016jyxm0873, No. 2017jxtd035, No. 2018hfjyxm05, No. 2018hfmooc05) for financial support.

References

1. Cai P.Q. and Li L.Y., PUMC Respirology, Peking Union Medical College Press, Beijing, 2012, 171–195.

2. Bai, Zhao J.H., Zhao Y.X. et al., Analysis of gender differences of clinical and pathological characteristics in 568 lung cancer patients, Practical Oncology Journal 26 (2012), 490–494.

3. Ramiporta et al., Lung cancer staging: a concise update, European Respiratory Journal 51 (2018), 1800190.

4. Bourque A.E., Carrier J.F., Filion É. et al., A particle filter motion prediction algorithm based on an autoregressive model for real-time MRI-guided radiotherapy of lung cancer, Biomedical Physics & Engineering Express 3 (2017), 035001.

5. O’Connell O.J., Almeida F.A., Simoff M.J. et al., A Prediction model to help with the assessment of adenopathy in lung cancer (HAL), American Journal of Respiratory & Critical Care Medicine 195(12) (2016), 1651.

6. Q.X., Zhan H.H. and Yang, Diagnostic value of MRI for TNM staging in lung cancer, Journal of Henan University of Science & Technology (Medical Science) 32(1) (2014), 20–22. (in Chinese)

7. G.W., Tao and Wang, Kernel covering algorithm and a design principle for feed-forward neural networks, in: Proceedings of the 9th International Conference on Neural Information Processing, Singapore, 2002, pp. 1064–1068.

8. Tao, Wang J.Q., G.W. et al., The theoretical analysis of kernel technique and its applications, in: Proceedings of the 2002 International Joint Conference on Neural Networks, Honolulu, Hawaii, 2002, pp. 571–576.

9. Liu, Hsu and Ma, Integrating Classification and Association Rule Mining, in: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York: AAAI Press, 1998, pp. 80–86.

10. Zhang and Zhang, A geometrical representation of McCulloch-neural model and its application, IEEE Transactions on Neural Networks 10(4) (1999), 925–929.

11. Zhang and Zhang, A geometrical representation of M-P neural model and its applications, Journal of Software 9 (1998), 334–338.

12. Zhao, Zhang Y.P., Zhang et al., The intrusion detection based on the alternative covering algorithm, Computer Engineering and Applications 41 (2005), 141–143. (in Chinese)

13. Wen Z.C. and Tang, Quantitative assessment for network security situation based on weighted factors, Journal of Computational Methods in Sciences & Engineering 16(4) (2016), 1–13.

14. Duan, Wang Q.Q., Zhang Y.P. et al., Spam filtering based on covering algorithm, Computer Science 36 (2009), 217–219. (in Chinese)

15. Liu, Yan et al., A frequent itemset mining algorithm based on composite granular computing, Journal of Computational Methods in Sciences & Engineering 18(1) (2018), 1–11.

16. Friedman, Dan and Goldszmidt, Bayesian network classifiers, Machine Learning 29(2-3) (1997), 131–163.

17. Zhang Y.P., Zhang et al., A structural learning algorithm based on covering algorithm and its application in stock forecasting, Journal of Computer Research and Development 6(41) (2004), 979–984. (in Chinese)

18. Zhang Y.P., Zhang and Duan, A constructive kernel covering algorithm and applying it to image recognition, Journal of Image and Graphics 9(11) (2004), 1304–1308. (in Chinese)

19. McCulloch W.S. and Pitts, A logic calculus of the ideas immanent in nervous activity, MIT Press 5(4) (1988), 115–133.

20. Zhang and Zhang, Study on the method of knowledge discover based on the structured covering algorithm, Journal of Electronics & Information Technology 28(7) (2006), 1322–1326. (in Chinese)

21. Zhang Y.P., Zhang et al., Machine Learning Theory and Algorithm, Science Press, Beijing, 2012, 212–268. (in Chinese)

22. J.B. and He F.G., Improved covering algorithm based on LVQ neural network, Computer Engineering and Applications 48 (2012), 165–169. (in Chinese)

23. Zhao, Shi, Yang X.J. et al., Application of covering algorithm to prediction of precipitation, Computer Engineering and Applications 44(9) (2008), 232–234. (in Chinese)

24. Zaidi N.A., Webb G.I., Carman M.J. et al., Efficient parameter learning of bayesian network classifiers, Machine Learning (2017), 1–41.

25. Kargarfard, Sami and Ebrahimie, Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm, Journal of Biomedical Informatics 57(C) (2015), 181–188.

26. Nakajima and Hai N.B., Dataset Coverage for Testing Machine Learning Computer Programs, in: Software Engineering Conference IEEE, 2017, pp. 297–304.

27. Xiao, Wang and Xu, Parameter selection of gaussian kernel for one-class SVM, IEEE Transactions on Cybernetics 45(5) (2017), 941–953.
