Abstract
Selecting model between recognition rate of “large” class and recognition rate of “small” class in imbalanced data is often a serious trade-off. Most approaches emphasize the accuracy of “large” class. The drawback is that potentially informative “small” class may be overlooked and even make an overfitting model. In this paper, we propose an alternative approach based on fuzzy system for classification problems with imbalanced data, called receive feedback model (RFM). It works by starting with a maximal attribution ratio probability that includes all observations for each class, and then gradually reclassify “unlabeled” samples if they succeed in minimal risk evaluation of a certain class. To exploit the RFM of classification problems, we further introduce probably approximately correct of the model and the convergence of our procedure. Extensive experiments using public data sets and the results of statistical tests have shown that the proposed RFM significantly outperforms other approaches in term of the appropriate trade-off both recognition rates of “large” class and “small” class.
Introduction
Classifying objects into different categories is one of the key problems in pattern recognition and data mining, which has received much attention over the past decades [1–8]. For classification problems, some uncertain approaches such as the fuzzy set [9–11], rough set [13] and bayes [14–16] etc., are widely used. Kuncheva propose that fuzzy rule based classifiers show the higher classification accuracy and fully comprehensible in his paper on How Good Are Fuzzy If-Then Classifiers? [17]. For example, they can be built into based on tree classifiers or fuzzy rule based regression model. Fuzzy rule-based classifiers are a popular counterpart of fuzzy control systems, there are numerous studies discussing the practical design of such classifiers, among which are neuro-fuzzy models [18, 19] and fuzzy systems constructed using genetic algorithms [20] etc. The fuzzy rule also may be of interest in their own right, as it can characterise distinctions between models in a simple and interpretable way. Instead of traditional decision trees, in which only a feature is taken account at each node, based on trees fuzzy classifier presented decision tree involves a fuzzy rule which involves multiple features.
A typical tree-based fuzzy classifier such as FRDT [21] based on decision tree [22] works by building a decision tree based on fuzzy rules. But a precondition has been proposed to class distribution of relative balance data by decision tree. Although the FRDT approach exhibits the better performance in term of between accuracy and the size of the produced tree, its drawback is that the lower recognition rate for dealing with the imbalanced data and even make an overfitting model. Recently, there is a hot research topic, in which input space may be pervaded with imbalanced data [23, 24] and people are interested in classification tasks as a tool for the analysis of this type of objects in several fields. Undoubtedly, it is widely used in several aspects. For example, inspection data to check the patient’s disease [25], automatic text classification [26] and identify malicious call [27] etc.
In the paper, we consider classification problems with imbalanced data, our aim is to enhance recognition rate for the “small” class of imbalanced sample and can not make an overfitting model. Extensive experiments using public data sets and the results of statistical tests have shown that the proposed RFM significantly outperforms other approaches in term of the appropriate trade-off both recognition rates of “large” class and “small” class. For this purpose, a fuzzy rule (to calculate a attribute ratio probability) defined on the input space is computed from training samples. The maximum value of attribute ratio probabilities is used for predicting a label. And then gradually reclassify “unlabeled” samples if they succeed in minimal risk evaluation of a certain class. Our method is used for classification based on the principle of the “maximum attribute ratio and minimum risk evaluation”.
The paper looks at a novel method for imbalanced data, which called receive feedback model. Rather than learning a model through samples directly, our method works by looking for fuzzy rules including finite fuzzy numbers to parallel computing the attribute ratio for each class. In Section 2, we propose a version of receive feedback model and exploit the model of data classification. Further more, we introduce probably approximately correct of the model and the convergence of our procedure. Some numerical examples are given in Section 3. Finally, we conclude with a brief discussion and future work in Section 4.
Receive feedback model
An overall architecture of the RFM
In this section, we consider an overall architecture of the proposed approach for classification problems with imbalanced data. We suppose that the data can be written in the form (X i , Y i ) for observation i = 1, 2,. . . , n; X i = {xi1, xi2,. . . , x im } is the set of active predictors for observation i (out of a total of m predictors) and Y i is the class label. An important example of this type of problem is that of automatic text classification, where X i is a set of frequently appearing words for document i, and Y i indicates whether the document belongs to a certain class. In this case, although the size of a text can be of the order of several thousand or more, useful or interesting information accounts for only a small part and its size significantly less than the samples. More generally, if the samples with imbalance distribution of class are available, they can be divided into “large” class which includes the majority of patterns (data points), on the contrary, called “small” class.
Our aim here is to develop classification model for imbalanced data. More precisely, we are interested in discovering attribution ratio probability and risk evaluation without requiring any of hypothesis that balance distribution of class.
Let us summarize the notations to be used throughout the paper. Let X being an n × m matrix, x
ij
be the j-th feature value of the pattern X
i
. The size of this sample is n, corresponds to each row. The number of the feature is m, corresponds to each column. We have
A pure brute force search examines each fuzzy rule of a finite length to check whether it satisfies (3). Restricting the number of a fuzzy rule , in order to control appropriately a trade-off between recognition rates of “large” class and “small” class.
In this section, we present an alternative approach based on fuzzy system for classification problems with imbalanced data in Fig. 1. This type of data classification process involves three steps: it works by starting with a maximal attribution ratio probability that includes all samples for each class (at “Receive” stage); Gradually, reclassify “unlabeled” samples if they succeed in minimal risk evaluation of a certain class (at “Feedback” stage); Finally, hierarchical thinking is proposed at “Update” stage.
Above all, the model arises as a novel architecture of the fuzzy classifier based on imbalanced data. In Fig. 1, the root node “Data” denoted an interesting sample, we can classify the sample by attribute ratio probability based on fuzzy rules to form from “Class 1” node to “Class n” node for each class. The samples which are not classified (also called “unlabel” samples) by the attribute ratio probability of first layer are arranged in a “Risk Evaluation” node (including a mixture of data are from different classes), then “reclassify” to the “unlabel” samples. If the “Risk Evaluation” node is not empty, the growth of the RFM (the process of update in Fig. 1) is realized by expanding the “Risk Evaluation” node as illustrated in Fig. 2.
Let AR t be the attribute ratio of the t-th layer and is the p-th fuzzy number of the feature f j for the class C k . Let indicate that the probabilities are attribution ratio probability for the class C k . More formally, we describe the sample asfollow
According to the “Receive” stage, given samples will be classified into a class of interest. But some samples are not classified or assigned the several different classes. That is to say, the samples which are not classified (also called “unlabel” samples) by the attribute ratio probability of this layer are arranged in a “Risk Evaluation” node, then “reclassify” to the “unlabel” samples.
With regard to the Equation (7) is satisfied, the “unlabel” samples should be classified to a class of interest. In general, this process will be terminated until each pattern will be classified. Therefore, the two stage of “Receive” and “Feedback” are fundamental steps what we should do.
Update-stage
This method is proposed to modify the structural model (RFM) of a dynamic finite process. Updating stage is crucial for obtaining accurate models while the measured structural responses (receive feedback model) are available. Inevitably, it has a disadvantages that result to the size of the model is enlarged.
The algorithm of RFM
The algorithm of RFM
Our method searches for the “maximum attribute ratio and minimum risk evaluation” by looking at fuzzy rules of given classes. We start with the samples of each class as a block and then respectively parameters to corresponding fuzzy rules. Each process of finding a fuzzy rule, we just keep fuzzy rule that are present in a different with others. Then we repeat with the last process until all fuzzy rules will be founded. If a pattern X i (i = 1, 2,. . . , n) has high attribute ratio probability for a certain class in the first layer (t = 1) of the RFM, that is is the maximum. And then the “unlabel” sample X i has the lower risk evaluation for a certain class in the first layer of the model, it also will be reclassified in the given a class of interest with the minimum risk evaluation.
To describe the details of our algorithm, we redefine a parameter associated with the model will be needed later. Let RE(T) mean the T-th update of the model.
1 RE(0) ← X, T ← 0, i ← 1;
2
3
4
5 ▹ Or continue to perform the following procedure
6 transfer AREA algorithm [21]
7 parallel compute (k from 1 to c) corresponding
8 the Equation (4)
9 find the label of X i is empty or classified different classes
10 ▹ that is the “unlabel” samples
11 get risk evaluation on the basis of the Equation (7)
12
13
14 i ← i + 1, T ← T + 1 (means update above process)
15
Algorithm 1 describes a generalization version of receive feedback model procedure. This process is repeated until one of the following three conditions have been satisfied: The number of the “unlabel” samples is equal to a set of empty, that is all of the samples can be classified by Algorithm 1; Remain “unlabel” samples belong to the same class of interest; Remain each “unlabel” sample can not classified by attribute ratio of the same layer and these samples are discarded.
And we first analyze the time complexity of our algorithm. The maximum length of the AREA algorithm [21] which generated fuzzy rule for each class is MaxL, therefore, the time complexity of AREA algorithm is O (MaxL * c * n). RFM algorithm is a non-iterative process, the number of layers is n in the worst situation, what is each layer of the learned tree for the training patterns can classify only a single pattern. Thus, the total time complexity of the RFM method is O (MaxL * c * n * n).
The theoretical results of RFM
In supervised learning, we define 𝒳 as the input space and {1, 2,. . . , c} as a set of labels. Suppose that the training samples are randomly drawn from an underlying probability distribution P. We have a hypothesis that training data are independent identically distributed (i.i.d) sample from P: S n = {(X i , Y i )} ∈ {𝒳 × {1, 2,. . . , c}} n , X i ∈ 𝒳 ⊂ R n . It is to estimate a decision function such that the sign of f (X) provides an accurate prediction of the unknown labels associated with the input X i under the probability distribution P.
Our goal is to construct prediction function loss over D: E(X,Y)∼D (f (X) ≠ Y). Let represents a collection of all RFM learning algorithms. Let be learned prediction rule from training sample S n = (X i , Y i ) i=1,2,...,n. Then, the training error of a classification algorithm as follow
The average prediction performance of the decision function f is evaluate performance of a learning algorithm on test sample by loss over D, that is
To confirm our approach is how efficient a learning algorithm and to theoretically justify it, the minimize training error as follow
According to the above Equation (17) the training error of can be observed, we will discuss to give probably approximately correct by how to estimate test error.
≤1 - ɛ, or
For any random selected S n holds with probability . In order to estimate the value of ɛ, just to make
The only need to take
In this section, we give an example and several numerical public data sets in Table 1 and Fig. 4 to provide further insight into the performance of our method. The first is about an example to analysis a procedure that we should be step-by-step about our method.
Step 1. “Receive stage”: Due to the number of classes k = 2, there are two fuzzy numbers for each feature. Respectively, feature f1 corresponding f1,1 and f2,1, feature f2 corresponding f1,2 and f2,2, feature f3 corresponding f1,3 and f2,3. Then, we consider the total number of the parameters mk,j from the Equation (2).
In this step, the values of the parameters mk,j are taken as the mean values of all patterns falling within the class C k while considering the feature f j . From Equation (4), we can compute attribute ratio of each class. Finally, patterns will be classified by the maximum attribute ratio probability in first level.
Step 2. “Feedback stage”: According to the first stage, haberman data set will be classified. But some patterns are not classified or assigned the several different classes. That is to say, the ones which are not classified (“unlabel” samples) by the attribute ratio probability of this layer are arranged in a “Risk Evaluation” node. In step 2, we should “reclassify” to the “unlabel” samples. Finally, each pattern will be found a class of interest.
Specifically, we prepare to learning the winning combinations for some well-known classifiers and Fuzzy rule based on decision tree [21]. We test the performance of the algorithm on data sets being from the UCI Repository of machine learning. We can see the Tables 2, 3 and Figs. 5 and 6.
Tables 2 and 3 show the test errors of BFT, C4.5, LAD, SC, NBT, FRDT and the proposed learning method using the mean coming with a ten times 10- fold cross validation. The tables show that the RFM has an higher recognition ratio for imbalanced samples, that is, the case of red font. Overall, the proposed method is better than the other learning algorithm. Indeed, the difference between it and RFM is statistically significant. On the other hand, the balanced data does not significantly affect the experimental results. Hence, the revision of fuzzy rule based decision tree can improved the prediction accuracy of imbalanced-data-based learning.
In this experiment, we propose a comparative analysis while using above traditional classifiers. The result presented in Figs. 5 and 6 offer several insight into the performance of the algorithms, the best scores are shown in “red” dotted line. The misclassified testing data and standard deviation for the experiments on several different samples are given in Tables 2 and 3. The fuzzy classifier based on imbalanced data sets, that is RFM produces high accuracy and outperforms the other approaches in test data.
In Section 2, we proved the probably approximately correct of the model and the convergence of our procedure. The numerical experiments described in this section indicate that learning algorithm derived from revised fuzzy rule based decision tree are an alternative for solving classification problems involving imbalanced data. Finally, we will extend the current results for fuzzy dynamic models with distributed parameters framework [29, 30], and then for the underlying systems under the network-based environment with time-delays, packet dropouts, and/or quantization [31, 32] for this issue.
In the paper, we consider classification problems with imbalanced data, our aim is to enhance recognition rate for the “small” class of imbalanced sample and can not make an overfitting model. Extensive experiments using public data sets and the results of statistical tests have shown that the proposed algorithm significantly outperforms other approaches in term of the appropriate trade-off both recognition rates of “large” class and “small”class.
Conclusion
We studied an improved fuzzy classifier for imbalanced data in classification problems. As we all known, most approaches emphasize the accuracy of “large” class. The drawback is that potentially informative “small” class may be overlooked and even make an overfitting model. In the paper, we proposed a receive feedback model, which exists a superiority to enhance the recognition rate of imbalanced data based on the principle of the “maximum attribute ratio and minimum risk evaluation”. The algorithm is proposed to modify the structural model (RFM) of a dynamic finite process. Updating stage is crucial for obtaining accurate models while the measured structural responses (receive feedback model) are available. Inevitably, it has a disadvantages that result to the size of the model is enlarged. Numerical experiments showed that learning algorithm derived from revised fuzzy rule based decision tree are alternatives methods for solving classification problems involving imbalanced data. We believe developments along these lines (for imbalanced data) will prove to be fruitful directions for future research. Further, we also plan to generalise the idea to categorical and sparse predictor variables, such as computer vision area (image classification, text classification).
Footnotes
Acknowledgments
The author thank the editors and the anonymous reviewers for helpful comments and suggestions. The research was supported by the National Natural Science Foundation of China (Grant Nos. 61573266 and 11301408).
