Abstract
This paper proposes a particle swarm K-means optimization (PSKO)-based granular computing (GrC) model to preprocess skewed class distributions in order to enhance the classification accuracy for the class imbalance problem. The GrC model obtains knowledge from information granules rather than from numerical data. It also processes multi-dimensional and sparse data by using singular value decomposition and latent semantic indexing (LSI). Data that are high-dimensional and sparse can be preprocessed using LSI in order to reduce the number of data dimensions as well as records. Ten benchmark data sets are employed to demonstrate the effectiveness of the proposed model. Experimental results indicate that the proposed model has better classification performance with both imbalanced and balanced data. In addition, the computational result for prostate cancer prognosis reveals that the proposed model can support physicians in judging the condition of prostate cancer patients with a more accurate survival rate estimation.
Introduction
When learning imbalanced or skewed data, in which almost all instances are labeled as one class, while a few instances are labeled as other classes, traditional data mining approaches such as neural networks (NN), decision trees (DT), and support vector machines (SVM) tend to produce high accuracy over the majority class but poor predictive accuracy over the minority class. However, this minority class is usually the important one, like medical diagnoses examples or abnormal products of finished-goods inspection data. A variety of methods have been proposed to cope with imbalanced data problems, such as methods of sampling [1, 2], adjusting the cost-matrices, and moving the decision thresholds [2]. However, these techniques have some disadvantages. For example, the computational load is increased and overtraining may occur owing to the replicated samples in the case of over-sampling. Under-sampling does not take into account all available training data, which results in a loss of available information.
Another approach to dealing with imbalanced data is the use of granular computing. Granular computing represents the data pattern in some subsets called information granules (IG). An IG is a group of objects which have similar functions and are indistinguishable. In imbalanced data, normal data or the majority data have more similar functions, while the minority class has a more unique condition. By constructing the IGs based on data similarity, the number of IGs in the majority class is reduced to less than those in the minority class. By considering the IGs, the proportion of the minority class can be increased. This addresses the imbalance condition and helps classifiers.
The main purpose of this paper is to develop a novel granular computing methodology for tackling imbalanced data. This proposed method extracts information from IGs for discrete or continuous data by employing a metaheuristic-based method and sub-attributes. Then, a singular value decomposition and Latent Semantic Indexing (LSI) are applied to process multi-dimensional and sparse data. These procedures help the classifier to classify the data.
Furthermore, the proposed algorithm is applied to prostate cancer data. Prostate cancer is one of the most common causes of death in men in most industrialized countries [3]. Many studies have considered the introduction of diagnosis for prostate cancer, but only few studies have examined prognosis methods for prostate cancer. Some studies have evaluated the use of artificial NN to increase prostate cancer detection rate and reduce unnecessary biopsies by using a neuro-fuzzy system based on both serum data (total prostate-specific antigen; tPSA, percent free PSA), and clinical data (age) to enhance the performance of tPSA to discriminate prostate cancer. However, none of these studies explore prognosis for prostate cancer. In addition, these methods cannot handle data with imbalanced characteristics. Therefore, this paper proposes an algorithm that can perform better with imbalanced datasets like prostate cancer data.
In this paper, the proposed algorithms are compared with the cluster-based GrC model and the improved cluster-based GrC model of the IG-based method. Finally, real medical inspection data of prostate cancer in Taiwan is used to evaluate the effectiveness of the proposed method.
The remainder of this paper is arranged as follows. Section two presents a survey of literature related to this paper. Section three proposes the developed model, while the experimental results are provided in Section four. Section five shows the case study for the prostate cancer prognosis system. Finally, concluding remarks are given in Section six.
Literature survey
This section briefly discusses the related background of this paper, including prostate cancer, classification, granular computing, class imbalance problems and the particle swarm optimization algorithm.
Prostate cancer
Prostate cancer is a significant cause of morbidity and mortality in western countries. According to autopsy data, approximately 42% of men with prostate cancer over the age of 50 die of other causes [4]. In the United States, prostate cancer is the most common cancer and the second most common cause of cancer-related death [5]. In 2011, an estimated 240,890 new cases and 33,720 deaths caused by prostate cancer were recorded. Similar data have been reported in Europe and Canada. In Taiwan, the morbidity of prostate cancer is one fifth of male cancer patients, and the mortality is one seventh. According to the data published by the Department of Health, Executive Yuan, there were 3,603 cases and 1,052 deaths caused by prostate cancer in 2008, and the mean age of prostate cancer patients was 75. This indicates that older men have a higher risk of developing prostate cancer. Miller et al. [6] identified 24,405 men with lower-risk prostate cancer using data from 13 Surveillance, and discovered that 55% of these men were potentially overtreated where initial expectant management was appropriate. The survival rate of patients after surgical treatment has been previously evaluated using competing-risk analysis. However, the diagnosis of prostate cancer (PCa) does not assess the combined effect of age and comorbidities in patients with high-risk PCa.
In the initial screening process, doctors usually use digital rectal examination (DRE) and test for prostate specific antigen (PSA). If the DRE or PSA result is abnormal, the doctor will advise the patient to undergo biopsy testing. If adenocarcinoma of the prostate is confirmed by microscopic examination of the biopsy specimen, the next treatment will be decided. Treatment options include definitive, curative, systemic, palliative, or salvage therapy.
Due to the high number of prostate cancer patients, many studies have tried to develop an early diagnosis system without biopsy testing. Proposed expert systems have been developed based on patient data including weight, height, body mass index (BMI), PSA, free PSA, age, prostate volume, density, smoking, systolic, diastolic, pulse, and Gleason score [7]. Classification or forecasting algorithms were then proposed to analyze these data. Keles et al. [3] proposed a neuro-fuzzy classification algorithm (NEFCLASS). This algorithm aims to determine whether a patient has prostate cancer or benign prostatic hyperplasia (BPH). Chen and Lin [8] focused on identifying principal genes and then used these genes to classify cancers with classifiers such as support vector machines (SVM) or back-propagation neural networks (BPNN) to extract significant samples. Furthermore, an artificial neural network algorithm was also proposed [9]. Although these algorithms do not give an absolutely correct diagnosis, their results can provide more information for doctors before they undertake further medical treatment.
Classification
Classification aims to build a model of a class label by training on a data set such that the model can be used to classify new data whose class labels are unknown [10]. In recent years, many approaches such as artificial intelligence, neural networks, rough sets, fuzzy sets, and many others have been applied to classification [11]. Classification problems have numerous practical applications. For instance, patients can be classified into disease groups on the basis of their symptoms for medical diagnosis [12–15]. Objects or individuals can be classified into appropriate classes through examination of their physical characteristics [16, 17]; letter recognition is one of the best examples in this field. In human resources management, enterprises use classification tools to assign personnel to appropriate occupation groups according to their qualifications [18]. In production systems management and technical diagnosis, the operation of complex production systems is monitored for fault diagnosis purposes [19]. Marketing research is aimed at customer satisfaction measurement, analysis of the characteristics of different groups of customers, development of market penetration strategies, etc. [20]. The field of financial management and economics is mainly focused on business failure prediction, credit risk assessment for firms and consumers, stock evaluation and classification, country risk assessment, bond rating, etc. [11]. Other research areas, such as environmental and energy management and ecology, analyze and measure the environmental impact of different energy policies and investigate the efficiency of energy policies at a national level [21].
Class imbalance problems
Imbalanced data can be found in many applications, and a number of approaches have been proposed to cope with imbalanced data sets. A basic approach is sampling. Sampling methods reduce the imbalance in the data set by removing instances from the majority class (down-sampling) or duplicating instances from the minority class (up-sampling) until the ratio between the major and minor classes is balanced [1, 2]. The second approach is to adjust the cost (weight) of each class [1]. The third approach adapts the decision threshold to impose a bias toward the minority class. These approaches have been applied in many studies; however, they also have some drawbacks. For instance, replicating instances increases the computational load, while removing instances might also remove important information contained in the data.
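The two basic sampling strategies described above can be illustrated with a minimal Python sketch. The function names `random_oversample` and `random_undersample` and the uniform random selection are illustrative assumptions, not the procedures used in [1, 2].

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen records of each smaller class until
    every class has as many records as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(records, labels) if y == cls]
        for _ in range(target - count):
            out_records.append(rng.choice(pool))
            out_labels.append(cls)
    return out_records, out_labels


def random_undersample(records, labels, seed=0):
    """Keep a random subset of each class so that every class has as
    many records as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_records, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(records, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_records.append(r)
            out_labels.append(cls)
    return out_records, out_labels
```

The sketch makes both drawbacks visible: over-sampling enlarges the training set with exact copies, and under-sampling discards majority-class records that may carry useful information.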
Further studies have improved these basic approaches. Sáez et al. [22] improved the sampling method by combining an iterative ensemble-based noise filter with the Synthetic Minority Over-sampling Technique (SMOTE-IPF). Krawczyk et al. [23] employed evolutionary under-sampling with a boosting method to improve the sampling method. The proposed algorithm was also applied to the classification of cancer data. Further extensions of the sampling method were studied by Charte et al. [24]. Ramentol et al. [25] applied fuzzy rough set theory to develop a fuzzy-rough ordered weighted average approach for imbalanced classification. Fuzzy theory has also been embedded in classification algorithms in some previous studies [26]. In addition, some improvements were made based on Granular Computing (GrC) [27].
Granular computing
Granular computing (GrC) is an innovative information processing computing model. It is a collective term referring to theories, methodologies, techniques, and tools for the analysis of information granules (IGs), e.g., groups, classes, intervals or clusters, encountered in problem solving [28–31]. Generally speaking, GrC is a process of complex information entities called information granules which arise in the process of data abstraction and derivation of knowledge from information. The idea of information granularity has been explored in a number of fields such as rough sets, fuzzy sets, cluster analysis, databases, machine learning and data mining [32].
The main issues in granular computing are how to construct the IGs and to describe IGs [28]. The process of constructing IGs was first proposed by Zadeh [30]. In order to construct IGs more efficiently and more feasibly, some approaches such as the Self Organizing Map (SOM) network, Fuzzy C-means (FCM), rough sets, and Fuzzy Adaptive Resonance Theory (ART) have been proposed [28, 32]. They aim to divide IGs into different levels of granularity [28, 32]. Level of granularity is correlated to the data variants. More detailed information requires smaller IGs. In the process of representing IGs and determining the level of granularity, Bargiela and Pedrycz [32] proposed a “hyperbox” and “inclusion and compatibility” to measure IGs. The Granular computing model copies the human instinct in information processing. Thus, it can improve classification performance in imbalanced data [33]. In classifying a dataset, IGs represent a collection of objects arranged based on their similarity, functional adjacency and indistinguishability [34].
Particle swarm optimization
Particle swarm optimization (PSO) is a modern evolutionary algorithm based on swarm behavior [35]. The algorithm searches for the optimal solution by moving its particles in the solution space [36]. Compared with other artificial intelligence techniques such as the genetic algorithm (GA), tabu search (TS), or simulated annealing (SA), PSO is much faster [36]. Kennedy [35] investigated the performance of particle swarm optimization with various neighborhood topologies on different types of problems. The results showed that PSO performance is significantly influenced by the parameter settings. In some cases it is also trapped in local optima [37]. However, it can still obtain high-quality solutions within a shorter calculation time and with more stable convergence characteristics than other stochastic methods [37, 38].
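To make the particle movement concrete, the following Python sketch shows the standard global-best PSO update, in which each velocity combines an inertia term, a pull toward the particle's own best position, and a pull toward the swarm's best position. It is a generic illustration, not the PSKO algorithm of this paper; the function name `pso_minimize` and the default values of `w`, `c1`, `c2` and the search bounds are assumptions chosen only for the example.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=100,
                 w=0.5, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` over a box using global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm best so far

    for _ in range(iterations):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive (own best) + social (swarm best)
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # move and clamp to the search bounds
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val
```

The inertia weight `w` and the learning factors `c1` and `c2` are the settings whose influence on performance is noted above.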
Latent semantic indexing
High dimensional data consists of a large number of features which usually contain redundant and irrelevant information. Therefore, some dimensional reduction techniques have been proposed to reduce the dimensions without changing the data pattern. Latent Semantic Indexing (LSI) is one information retrieval method that can automatically model term-term inter-relationships to improve the retrieval outcome. LSI examines the similarity of the “contexts” in which words appear, and creates a reduced-dimension feature-space where words that occur in similar contexts are near each other. LSI applies a method, singular value decomposition (SVD), from linear algebra, to discover important associative relationships.
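The reduction step can be written compactly. The notation below is an assumption introduced for illustration: A is a data matrix with one row per record and one column per feature, and k is the number of retained dimensions. It states the standard truncated SVD that underlies LSI.

```latex
% Full SVD of the r-by-c data matrix A
A = U \Sigma V^{\top}
% Rank-k approximation: keep only the k largest singular values
A_k = U_k \Sigma_k V_k^{\top}, \qquad k < \min(r, c)
% Each record (row of A) is represented by k coordinates instead of c
Z = A V_k = U_k \Sigma_k
```

Among all matrices of rank at most k, A_k is the closest to A in the Frobenius norm, which is why discarding the smallest singular values removes the least informative directions first.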
Methodology
This section presents the proposed algorithm in four parts: (1) data preprocessing, (2) granular construction, (3) feature extraction and knowledge discovery, and (4) classification for imbalanced data. The first part, data preprocessing, includes the missing value processing and data normalization. The majority class data is then reduced using granular construction in the second part. The output of this part is a balanced dataset. Dimension reduction and feature extraction are then implemented. In this paper, the dimension reduction is performed by LSI algorithm. The last part is the data classification. Figure 1 illustrates the proposed algorithm.
A concise procedure is explained as follows:
Part 1. Data preprocessing
Step 1. Data collection: Select benchmark data sets with unbalanced characteristics and collect cogent prostate cancer data.
Step 2. Data preprocessing: Delete missing values and implement normalization for raw data.
Part 2. Granular construction
Step 3. Granularity selection criteria: Determine the thresholds of H-index and U-ratio.
Step 4. Determine the level of granularity for IGs: The number of IGs is determined by H-index as well as U-ratio.
Step 5. Execute granular construction: The construction of IGs is achieved by clustering techniques.
Step 6. Compute H-index and U-ratio of IGs: H-index is used to measure the consistency of the class within the IGs, and is defined as follows:
$$H\text{-index} = \frac{1}{i}\sum_{j=1}^{i}\frac{n_j}{m_j}$$
where $m_j$ represents the number of all objects in granule $j$, $n_j$ is the number of objects in granule $j$ possessing the majority class, and $i$ is the number of all IGs. U-ratio means the proportion of undistinguishable granules to all IGs. In this paper, U-ratio is defined as follows:
$$U\text{-ratio} = \frac{u}{i}$$
where $u$ represents the number of undistinguishable granules and $i$ represents the quantity of all IGs.
Step 7. Check whether the criteria are satisfied: If the H-index is larger than or equal to the threshold of H-index and the U-ratio is smaller than or equal to the threshold of U-ratio, the answer is “Yes”; go to Step 8. Otherwise the answer is “No”; repeat Steps 5–7 until the criteria are satisfied.
Step 8. Rewrite attributes: Divide the value interval of attributes into overlapping and non-overlapping areas and sub-attributes.
Part 3. Feature extraction and knowledge acquisition
Step 9. Analysis of sub-attributes: Implement singular value decomposition (SVD) to transform the original feature space into a smaller feature space in order to reduce the dimensionality.
Step 10. Feature extraction: Determine the optimal number of features by evaluating efficiency and accuracy.
Step 11. Sub-attribute reduction: Reduce the number of dimensions of the sub-attributes to the optimal number.
Step 12. Classification: Implement classifiers and calculate the classification accuracy.
Step 13. Validation: Validate the classification performance. If the performance is acceptable, terminate the procedure. Otherwise, repeat Steps 9–13.
Part 4. Classification
Apply a classification method, such as a neural network or decision tree, to classify the data after granulation.
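The granularity check in the procedure above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: each granule is given as a list of class labels, the H-index is read as the average over granules of the share held by the granule's most frequent class, and a granule is treated as undistinguishable when two or more classes tie for most frequent. The function names are hypothetical.

```python
from collections import Counter


def h_index(granules):
    """Average, over all granules, of n / m, where m is the granule size
    and n is the count of its most frequent class (assumed reading)."""
    shares = []
    for labels in granules:
        m = len(labels)
        n = Counter(labels).most_common(1)[0][1]
        shares.append(n / m)
    return sum(shares) / len(shares)


def u_ratio(granules):
    """Share of granules that are undistinguishable. Assumption: a granule
    is undistinguishable when its top class count is tied."""
    undistinguishable = 0
    for labels in granules:
        counts = sorted(Counter(labels).values(), reverse=True)
        if len(counts) > 1 and counts[0] == counts[1]:
            undistinguishable += 1
    return undistinguishable / len(granules)


def criteria_satisfied(granules, h_threshold, u_threshold):
    """Accept the granularity level when the H-index is high enough and
    the U-ratio is low enough."""
    return (h_index(granules) >= h_threshold
            and u_ratio(granules) <= u_threshold)
```

For example, the granules `[["a", "a", "a", "b"], ["b", "b"], ["a", "b"]]` give an H-index of (0.75 + 1 + 0.5) / 3 = 0.75 and a U-ratio of 1/3 under these assumptions.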
In this paper, the IG construction process is conducted using a data mining-based approach proposed by Chen [39]. In order to obtain a better result, in this paper, Chen’s original algorithm is combined with a particle swarm K-means optimization (PSKO) algorithm to build IGs. The PSKO algorithm is an improvement of the K-means algorithm [40].
Selection of granularity
The level of granularity in the proposed algorithm is determined based on the H-index and U-ratio [39]. During IG construction using the PSKO algorithm, similar IGs are grouped in a single layer. The similarity is calculated using Euclidean distance, and patterns with the same degree of similarity are assigned to the same cluster. After the clusters are built, the H-index and U-ratio are calculated based on the clustering result. The level of granularity is then adjusted by the PSKO algorithm until the H-index and U-ratio criteria are satisfied.
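The distance-based assignment described above corresponds to the basic K-means step, sketched below in Python. This shows plain K-means only and omits the particle swarm search that PSKO adds on top of it; the function names are hypothetical.

```python
import math


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of the centroid with the
    smallest Euclidean distance."""
    assignment = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        assignment.append(distances.index(min(distances)))
    return assignment


def update_centroids(points, assignment, centroids):
    """Move each centroid to the mean of the points assigned to it.
    A centroid with no assigned points is left where it is."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == k]
        if not members:
            updated.append(list(old))
            continue
        dim = len(old)
        updated.append([sum(p[d] for p in members) / len(members)
                        for d in range(dim)])
    return updated
```

Alternating the two functions until the assignment stops changing yields the clusters that serve as candidate IGs.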
Representation of information granules
This paper employs the concept of sub-attributes, called hyperboxes, to represent IGs. Let a hyperbox [b] defined in R^n be fully described by its lower bound (b–) and upper bound (b+), where R^n denotes the set of all points in the n-dimensional space, an important and frequently used universal set. Through b– and b+, the hyperbox can be expressed as [b] = [b–, b+]. Part 1 in Fig. 2 gives an illustrative example of the implementation procedure of sub-attributes.
In Fig. 2, there are two IGs, A and B, which have only one attribute, X_i. In sub-attributes, IGs are represented by the lower and upper limits of the objects. The IGs A and B can be described as [a–, a+] and [b–, b+], respectively. Part 2 in Fig. 2 shows that there are overlaps between granules A and B. This makes the IGs difficult to handle with knowledge acquisition tools: data mining cannot discover knowledge from these constructed IGs directly because most knowledge acquisition algorithms are designed to deal with numeric attributes. In this paper, this problem is tackled by “sub-attributes”, which divide the value interval of attributes into overlapping and non-overlapping areas. Next, a Boolean variable, 0 or 1, is used to represent whether the IG contains these intervals or not.
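The sub-attribute encoding can be illustrated for a single attribute with a short Python sketch. The interval values and the function name `sub_attributes` are hypothetical; the sketch assumes that the sub-intervals are obtained by cutting the attribute range at every lower and upper bound of the IGs.

```python
def sub_attributes(intervals):
    """Split one attribute into sub-intervals at every IG bound and encode
    each IG as a 0/1 vector marking the sub-intervals it covers.

    intervals: dict mapping an IG name to its (lower, upper) bounds.
    """
    # every bound of every IG becomes a cut point
    cuts = sorted({bound for lo, hi in intervals.values()
                   for bound in (lo, hi)})
    segments = list(zip(cuts[:-1], cuts[1:]))
    encoding = {
        name: [1 if lo <= seg_lo and seg_hi <= hi else 0
               for seg_lo, seg_hi in segments]
        for name, (lo, hi) in intervals.items()
    }
    return segments, encoding


# Hypothetical example: A = [1, 5] and B = [3, 8] overlap on [3, 5]
segments, encoding = sub_attributes({"A": (1, 5), "B": (3, 8)})
```

Here the attribute is split into three sub-attributes: one covered only by A, one shared by A and B, and one covered only by B, so each IG becomes a Boolean vector that standard knowledge acquisition tools can process.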
Experimental results and analysis
The proposed PSKO-based GrC model is verified using both balanced and imbalanced datasets. Table 1 lists all the tested datasets. By applying the balanced datasets, the proposed algorithm can be verified for its capability in extracting data from a variety of datasets, such as a large amount of data, data with multiple categories or high-dimensional data. On the other hand, testing using imbalanced dataset aims to verify the performance of the proposed algorithm in extracting data from a variety of skewed datasets.
The validation process is conducted using 10-fold cross validation. Before implementing the algorithm, all benchmark data sets are preprocessed, and in each fold the data set is divided into a 90% training set and a 10% testing set. Each 10-fold experiment is replicated 30 times.
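The validation scheme can be sketched in Python as follows. The generator name `repeated_kfold`, the reshuffling before each replication, and the seed are assumptions made for illustration; the source does not state how records are assigned to folds.

```python
import random


def repeated_kfold(n_records, k=10, replications=30, seed=0):
    """Yield (replication, fold, train_indices, test_indices) for repeated
    k-fold cross validation, reshuffling before each replication."""
    rng = random.Random(seed)
    for rep in range(replications):
        indices = list(range(n_records))
        rng.shuffle(indices)
        # fold f takes every k-th shuffled index, starting at position f
        folds = [indices[f::k] for f in range(k)]
        for f in range(k):
            test = folds[f]
            train = [idx for g, fold in enumerate(folds) if g != f
                     for idx in fold]
            yield rep, f, train, test
```

With k = 10, each split uses nine folds (about 90% of the records) for training and the remaining fold for testing, and 30 replications give 300 train/test splits per data set.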
Furthermore, the fourth part, which is the classification part, is conducted by three different classifiers, namely a feed-forward neural network with back-propagation learning algorithm (BPN), a decision tree (C4.5), and a support vector machine (SVM), since they are the most widely applied classification methods. Finally, computational results obtained by the proposed PSKO-based GrC with classifiers are compared with PSKO-based GrC without classifiers, K-means-based GrC, and a numerical computing model [36]. All of these algorithms are built in C language, while the classifiers are programmed via WEKA in Windows 7 using a Core 2 Quad 2.50 GHz CPU with 4 GB RAM.
Parameter setting
The proposed algorithm involves some parameters. Different parameter settings may give different results. In this paper, the parameter setting is determined using the Taguchi method. The parameters evaluated using the Taguchi method are the number of particles, learning factors c1 and c2, and inertia weight. Let each parameter be treated as a factor in the Taguchi method, while each factor has three levels. Table 2 shows the tested level for each factor.
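For four factors at three levels each, a full factorial design would need 3^4 = 81 combinations. The source does not state which orthogonal array was used; the Python sketch below assumes the standard L9 array, which covers four three-level factors in nine runs. The level labels are placeholders, not the values of Table 2.

```python
# Standard Taguchi L9 orthogonal array (levels coded 0, 1, 2).
# Any two columns contain each of the nine level pairs exactly once.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

FACTORS = ("particles", "c1", "c2", "inertia_weight")


def design(levels):
    """Map the coded array onto the level values of each factor.

    levels: dict mapping a factor name to its three candidate values.
    """
    return [{name: levels[name][row[col]]
             for col, name in enumerate(FACTORS)}
            for row in L9]


# Placeholder labels only; the actual tested values are listed in Table 2.
placeholder_levels = {name: ["level 1", "level 2", "level 3"]
                      for name in FACTORS}
runs = design(placeholder_levels)
```

Each of the nine runs would then be executed with its parameter combination, and the factor levels with the best average response would be selected.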
In order to reduce computation time, each parameter combination is run using 150 iterations and 10 repetitions. The Taguchi analysis is conducted using MINITAB. The best parameter setting according to the Taguchi results is: 40 particles, 1.47 c1, 0.5 1.47 c2, and 0.5 inertia weight.
In this paper, the proposed algorithm is also compared with a basic classification algorithm called back propagation neural network (BPN) without GrC. This comparison is made in order to evaluate the efficacy of using GrC in classification. The parameter setting for the BPN algorithm is 0.2 learning rate, 0.8 momentum and 5000 iterations. The network structure for each dataset is listed in Table 3. This parameter setting follows the parameter setting in [39]. For some datasets which are not evaluated in [39], the network structure is subjectively determined based on the dataset.
Computational results
The computational results are summarized in Tables 4–7. The results obtained by BPN are used to analyze the effectiveness of using granular computing in the classification problem. The BPN results for balanced datasets listed in Table 4 are no better than the results of the other algorithms using granular computing. The same holds for the imbalanced datasets, where the results of the GrC-based algorithms are also better than the BPN result. This suggests that IGs help classifiers to obtain more information to distinguish different classes.
Further comparison is made within the GrC-based algorithms. Table 5 summarizes the computational results for balanced datasets. According to this result, the GrC-based algorithm using BPN as the classifier is relatively better than those with C4.5 and SVM as the classifier. The results also show that among PSKO-based GrC with classified data, PSKO-based GrC without classified data, K-means-based GrC and the numerical method, the results obtained by the PSKO-based GrC with classified data are better than those of the other algorithms. However, if the algorithms use C4.5 or SVM as the classifier, K-means-based GrC and PSKO-based GrC without classified data have relatively better performance.
For the imbalanced datasets, using BPN as the classifier is also better than using C4.5 and SVM as the classifier. This is shown by the results summarized in Table 7. Compared with C4.5 and SVM, the proposed algorithm using BPN as the classifier can obtain higher accuracy for both training and testing.
The comparison between algorithms using BPN as the classifier for imbalanced data indicates that PSKO-based GrC, both with and without classified data, is better than K-means-based GrC and the numerical method. However, when C4.5 and SVM are used as the classifiers, K-means-based GrC can obtain relatively better results.
Statistical hypothesis
The computational result for balanced datasets indicates that the PSKO-based GrC with classified data has a relatively better result. Therefore, a further evaluation using non-parametric statistical tests is conducted to determine whether the differences between the PSKO-based GrC with classified data and the other algorithms are significant. Herein, the hypothesis is as follows:
where μ_PSKO GrC* is the mean of PSKO-based GrC with classified data using the BPN classifier, and μ_i is the mean of the other algorithms. The statistical test is conducted using SPSS. Table 8 summarizes the p-values at the 95% confidence level. This shows that the results obtained by PSKO-based GrC with classified data do not significantly differ from those obtained by PSKO-based GrC and K-means-based GrC for some datasets. However, compared with other numerical algorithms using the BPN classifier and algorithms using C4.5 and SVM classifiers, the results are significantly different.
Furthermore, for the imbalanced dataset, the algorithms using BPN as the classifier have better results than those using C4.5 and SVM. However, the performances of algorithms using BPN are relatively similar. Therefore, in order to analyze the differences between algorithms using BPN classification, a non-parametric statistical test is conducted with the following hypotheses:
Test 1
Test 2
Test 3
Model evaluation results and discussion
The computational results presented in Section 4 indicate that the proposed classification using granular computing, with the PSKO algorithm constructing the IGs, yields better classification results. Therefore, in this section, the proposed algorithm is applied to real cancer prognosis data.
Prognosis data overview
The prostate cancer related data were collected from a well-known teaching-oriented hospital located in Taipei. The data set consists of 176 records, which can be divided into two classes, 1 or 0. Class 1 indicates that the patients have died due to prostate cancer. There are 61 patients belonging to this class, and each record is assigned to category 1, 2, 3 or 4 according to the patient's survival length. Class 0 indicates that the patients are still alive after more than 5 years. There are 115 patients in this class, and they are noted as category 5. Figure 3 shows the class distribution.
The original data collected from the hospital contains many features since it includes patient biographies and medical records. This paper does not include all of these features in the calculation. Therefore, two feature selection steps are applied. In the first step, the important features are chosen by an expert (doctor) in prostate cancer. From this step, there are six chosen features as listed in Table 10. In the second step, the feature selection is conducted using stepwise regression. As a result, there are three models suggested by the stepwise regression. Table 11 shows all the suggested models. Herein, the last model with three features is chosen for further evaluation using the proposed algorithm. These three features are biopsy Gleason score, initial prostate specific antigen (iPSA), and digital rectal examination (DRE).
Computational results
The processed prostate cancer data is now evaluated using the proposed algorithm. In this experiment, the parameter setting for the proposed algorithms is also determined using the Taguchi method. Herein, the tested parameter settings are the same as those in Section 4. The best parameter setting for the real dataset based on the Taguchi method is 40 particles, with c1, c2, and learning rate of 1.47, 0.5, and 0.5, respectively. For the BPN algorithm, the learning rate is 0.2, the momentum is 0.8, and 5000 iterations are run. The network structure is 3-7-5.
Table 12 shows the BPN result. Classification using the BPN algorithm without GrC yields 67.11% accuracy for the training data and 59.33% accuracy for the testing data. The proposed algorithms using three different classifiers, BPN, C4.5 and SVM, are then applied to evaluate the prostate cancer data. There are 30 replications for each 10-fold validation of each tested algorithm. The results are summarized in Table 13. In general, PSKO-based GrC algorithms have better performance than K-means-based GrC and numerical methods. The results also show that although the PSKO-based GrC algorithm without classified data can obtain better accuracy for the training set, its accuracy for the testing set is significantly reduced. On the other hand, the gap between training and testing accuracy obtained by PSKO-based GrC with classified data is not as large as the gap for PSKO-based GrC without classified data. This indicates that the PSKO-based GrC without classified data might have an overfitting problem; one example is the result obtained by PSKO-based GrC with C4.5. Therefore, this paper prefers the result obtained by the PSKO-based GrC with BPN classifier as the best result. Although it does not yield the best training accuracy, it obtains the best testing accuracy. In addition, the gap between testing and training accuracy is quite small.
Statistical analysis
In order to analyze the differences between each algorithm, a statistical test is conducted. Herein, a non-parametric Mann-Whitney statistic test is applied for each pair of algorithms. The hypothesis is as follows,
Conclusions
This paper proposes novel classification algorithms for imbalanced datasets. The proposed algorithms improve the sampling method with granular computation (GrC). The GrC groups similar objects in clusters called information granules (IGs). Since “normal” instances have greater similarity than minority classes, the number of IGs for the majority class will be smaller than the minority class. Thus, by considering the IGs, classifier algorithms can more easily see data patterns. In constructing the IGs, this paper employed the PSKO algorithm. After the IGs were constructed, a dimensional reduction using LSI was applied to eliminate unnecessary information in the data and reduce the computation load. Then, three different classifiers were employed to classify the processed data. This paper applies BPN, C4.5 and SVM as the classifiers.
The proposed algorithms were verified using balanced and imbalanced datasets. The results show that in general, the proposed PSKO-based GrC with and without classified data using BPN classifier has better results than the other algorithms. It also reveals that the proposed algorithms with BPN as the classifiers can perform better than those using C4.5 and SVM as the classifiers. The comparison between algorithms with GrC and without GrC also proves that GrC with IGs successfully improves the classification algorithm since it can provide more information about the data patterns. The IGs constructed by PSKO algorithm help to reduce the imbalance in the dataset.
Furthermore, the proposed algorithm is applied to prostate cancer data. This data consists of different patient conditions. The aim of this classification is to provide early detection of prostate cancer based on patients’ medical records. Therefore, finding potential prostate cancer, which makes up the smaller portion of the patient datasets, is very important. This paper applies the proposed imbalanced data classification to analyze this dataset. The experiment results show that the proposed algorithm has high accuracy, greater than 70%. The experiment results also show that the best algorithm is PSKO-based GrC without classified data using BPN and C4.5 as the classifiers.
In the future, other soft computing techniques should be integrated into granular computing in order to extract more representative granular information. In order to construct a better IG set, other granular selection techniques should be evaluated. Different criteria for granularity selection should also be tested. The proposed algorithm can also be applied to different prognostic problems, such as liver cancer, which is very common in Taiwan. In addition, it is necessary to find criteria which are more correlated to prognosis problems.
Acknowledgments
This paper is partially supported by the Ministry of Science and Technology of Taiwan under contract number NSC102-2410-H-011-017-MY3. This support is much appreciated.
