Abstract
In this paper, we propose NPC c , a new Naïve Possibilistic Classifier for categorical data. The proposed classifier relies on the Bayesian structure of the Naïve Bayes Classifier for categorical data (NBC c ) which stands for an interesting pattern when dealing with discrete attributes. However, unlike NBC c , the proposed NPC c is based on the possibilistic formalism as an efficient fuzzy-sets-based alternative to the probabilistic one when handling uncertain data. Distinctively, we use the possibilistic approach to estimate beliefs from categorical data and a Generalized Minimum-based classification algorithm (G-Min) as a novel algorithm to make decision from possibilistic beliefs. Experimental evaluations on 12 datasets taken from University of California Irvine (UCI) and containing all categorical data, confirm the effectiveness of the proposed new G-Min-based NPC c . With the used datasets, the proposed classifier outperforms the commonly-used classifiers for categorical data including NBC c , C4.5-based decision tree and RIPPER-based classifier. Moreover, it outperforms the two versions of NPC c using commonly-used possibilistic classification algorithms which are based on respectively, the product and the minimum operators. Consequently, we prove the efficiency of the possibilistic approach together with the G-Min algorithm for the classification of categorical data.
Keywords
Introduction
Specifications of input data are among main criteria which are used to select the appropriate classification technique for a decision-making problem in real-world applications. Among these specifications, one can find the type of input data which may be either categorical, numerical or mixed. Categorical data, as the name implies, reflects data which are grouped into meaningful categories [1].
Bayesian-like classifiers, namely naïve Bayes and naïve possibilistic classifiers are simple classifiers that assume the independence of input attributes. Despite their strong assumption, Bayesian-like classifiers can often outperform more sophisticated classification models [2]. Moreover, two versions may be found for each Bayesian-like classifier: one for discrete attributes and another for numerical ones [3]. In particular, it has been claimed in [4] that the Bayesian-like classification structure is suitable to make decision from discrete variables.
In recent years, naïve possibilistic classifier begins to attract attention among Bayesian-like classification models. In fact, this classifier is based on the possibility theory which rests on fuzzy sets theory [5, 6]. Therefore, the possibilistic procedure used in this classifier reflects a strong fuzzy-sets-based alternative to the probabilistic approach when handling data pervaded with uncertainty [7–9]. Different versions of naïve possibilistic classifier which are based on the probability-possibility transformation rule of Dubois et al. [10] have been recently proposed. Indeed, two versions based on this transformation in the continuous case and dealing with numerical features, namely, naïve possibilistic for numerical data and flexible naïve possibilistic classifier for numerical data, have been suggested in [11] and [12]. Experimental results in [11] and [12] have shown that the flexible naïve possibilistic classifier outperforms all Bayesian-like classifiers for numerical data. Further, a version of naïve possibilistic classifier based on the transformation rule of Dubois et al. in the discrete case [10] and devoted to categorical data has been successfully used in [13] to make decision from the medical data of the UCI lymphography dataset.
This study is motivated on the one hand, by the proven efficiency of the Bayesian-like classification pattern when dealing with categorical data; on the other hand, by the recent emergence of new probability-to-possibility transform-based possibilistic classifiers which have shown good performance when dealing with uncertainty from either numerical data [11, 12] or categorical lymphatic data [13]. In this context, we propose NPC c , a Naïve Possibilistic Classifier for categorical data which combines the naïve Bayes structure with the possibilistic classification. In particular, this classifier relies on the probability-possibility transformation rule of Dubois et al. [10] to estimate beliefs. Then, it makes use of a novel G-Min-based classification algorithm for decision-making from possibilistic estimates.
The rest of the paper is structured as follows. Related works are discussed in Section 2. Then, in Section 3, we restate the theoretical background of the possibilistic classification. Next, Section 4 introduces the proposed G-Min-based NPC c . The experimentation results are reported in Section 5. Finally, Section 6 concludes and proposes some directions for future research.
Related work
Three main approaches may be found in the literature with regard to the use of possibilistic data representation in classification methods. These approaches include decision trees, case-based and Bayesian-like approaches. If the two first approaches deal with discrete attributes, the third one is appropriate for both categorical and numerical data. In the following, we start with decision trees and case-based approaches. Afterward, we discuss Bayesian-like possibilistic classifiers in more detail.
For possibilistic classification based on decision trees, we can mention the work of Denoeux and Zouhal [14]. In this work, the authors model uncertain labels in the training set using the possibility theory. To perform this, the authors allocate a possibility belief to each possible class label in order to express possibility that the given instance belongs to this class. Moreover, Ben Amor et al. [15] have proposed a qualitative method based on decision trees in order to classify instances having uncertain attribute values. To do this, the authors use possibility distributions given by an expert in order to represent uncertainty on attribute values. In [16], the authors developed three approaches for possibilistic decision trees in order to deal with certain categorical attributes and vague classes. The first one, based on possibility distributions at each step of the tree construction, define an attribute selection measure by means of a measure of non-specificity in possibility theory. The two remaining approaches support data uncertainty by extending the C4.5 algorithm through the use of the notion of similarity between possibility distributions.
For case-based classification techniques which make use of possibility theory, we can particularly focus on the possibilistic instance-based learning approach which has been proposed in [17]. In this work, the author establishes a possibilistic version of the classical instance-based learning model using similarity measures. Indeed, the method rests on a general possibilistic extrapolation principle which classifies a new instance which is similar to a known training example to the same class of the example. In [17], the same idea is refined by evaluating the plausibility that a new instance belongs to the same class through an interval whose lower bound stands for the guaranteed possibility of the class, and the upper bound the extent to which this class is not impossible. Later, in the work of [18], the authors propose a bipolar possibilistic approach for case-based learning and prediction. In a more recent work [19], the authors use the similarity basis to define the plausibility of a particular attribute value for a given class, and then apply a Bayesian-like pattern for classification.
As mentioned earlier, proposed Bayesian-like possibilistic classifiers are either suitable for categorical or numerical data. A first version of Bayesian-like possibilistic classifier for categorical data was proposed in [20] in order to deal with imprecise training data. However, no efficient method was proposed in this work for the elicitation of possibilistic distributions. Indeed, it was stated in [20] that the procedure proposed to estimate possibility distributions which is based on the computation of the maximum-based projection [21], may construct pathological cases [20]. Later, Haouari et al. [22] have proposed a naïve possibilistic network classifier which models imperfect attribute values when no prior knowledge is available. In this particular context, possibility distributions are built using the partial ignorance which is expressed by an expert. In [23], authors propose an efficient algorithm which makes use of the Jeffrey’s rule in order to reconsider possibilistic knowledge included by a naïve product-based possibilistic network classifier on the basis of uncertain inputs. The proposed algorithm presents the main advantage of being able to process the classification task in polynomial time with respect to the number of features. In [24], the authors propose a new Bayesian classifier for uncertain categorical or continuous attributes which integrates uncertainty in the Bayesian theorem and makes use of a new parameter estimation method. Moreover, in [25], authors developed a classification algorithm which can generate rules from uncertain continuous data. For the two works [25] and [24], authors use intervals in order to model uncertainty over continuous attribute values. Recently, in [11], the authors propose two kinds of possibilistic classifiers for numerical data. In the first one, authors extend the classical and flexible Bayesian classifiers by applying a probability-possibility transformation to Gaussian distributions. In the second one, authors directly express data in possibilistic formats using the idea of proximity between data values. Experiments in [11] have shown the good performance of the flexible possibilistic classifier when dealing with numerical data. In a more recent work [12], the authors extend the probability-to-possibility transform based classifiers which were proposed in [11] in order to deal with uncertainty and imprecision in data modeling. Experimental results reported in [12] have confirmed the robust behavior of the probability-to-possibility transform-based classifiers when they treat imperfect data.
Possibilistic classification: Theoretical background
In order to recall some basics of the possibilistic framework, the following notations are considered in this section as well as in the remainder of thispaper:
C = {c1, c2, . . . , c j , . . . , c C }: an exhaustive and exclusive universe of discourse of classes.
A = {a1, a2, . . . , a i , . . . , a A }: a set of attributes.
V i = {vi1, vi2, . . . , v ik , . . . , v iz }: the set of possible categorical values of a given attribute a i .
Possibility theory
Possibility theory, introduced by Zadeh [5] and then developed by Dubois and Prade [6] is a fusion theory based on fuzzy sets theory and devoted to represent and combine imperfect information in a qualitative or quantitative way. Information imperfections treated by possibility theory may represent the uncertainty due to variability of observations, the uncertainty due to incomplete information, the information ambiguity, the information imprecision, etc. [8].
At the semantic level, the basic function in possibility theory is a possibility distribution denoted as π which assigns to each possible class c j from C a value in [0, 1]. The possibility value assigned to a class c j stands for plausibility i.e. the belief degree that this class is the right one.
By convention, π (c j ) =1 means that c j is totally possible and if π (c j ) =0, c j is considered as impossible. Intermediary values distinguish values which are more possible than others. Finally, note that in the normalization version of a possibility distribution, we require that at least one class of C is totally possible.
From the possibility distribution, two dual measures, namely possibility Π and necessity N, may be assigned to each subset E in the power set 2 C as follows:
Between these two measures there is a duality relation defined as:
Π (E) =1 corresponds to the situation where at least one singleton c
j
∈ E has a possibility degree equal to 1 and it is likely to be the right class but without being necessary. N (E) =1 corresponds to the situation where
On the other hand, as being based on fuzzy sets theory, possibility theory may recall all the panoply of fuzzy combination rules in order to fuse possibility estimates of a given class.
Possibilistic conditioning corresponds to revising an initial possibility distribution when a new information becomes available. In possibility theory, conditioning may be defined through a counterpart of the Bayes rule. Indeed, it can be stated for two subsets E and F in C by:
Similarly to Bayesian classification (see the “Appendix” for a reminder), possibilistic classification is based on the aforementioned possibilistic version of the Bayes rule and can be defined by:
By assuming that there is no a priori knowledge about classes and the input vector to be classified, we can take π (c j ) =1 and π (a1 = v1k, . . . , a A = v Ak ) =1. Moreover, as in naïve Bayesian combination, naïve possibilistic classification assumes that attributes {a i } (∀ 1 ≤ i ≤ A) are independent. In this case, the conditional joint possibility π (a1 = v1k, . . . , a A = v Ak |c j ) is equal to the fusion of conditional possibility estimations stemming from each single attribute a i . Therefore, in a context characterized by no a priori knowledge about classes and input vector as well as independent attributes, equation 4 becomes [20]:
As stated for the conditioning rule, * may be taken as either the product or the minimum [27, 28].
In practice, given a new instantiation {a1 = v1k, . . . , a
A
= v
Ak
}, we must establish a matrix Π of possibilistic estimates in order to perform the product-based or the minimum-based classification. This matrix is defined as follows:
Lastly, the final decision stands for the class c∗ for which Equation 5 yields the highest degree of possibility:
In this section, we first explain how to establish a possibility distribution from categorical data. Later, we present a novel algorithm for possibilistic classification which improves the classical minimum-based classification method by finding a more reliable final decision.
Elicitation of the possibility distribution
In order to construct possibility distributions from categorical data, we start by computing the conditional probability measures {p (a
i
= v
ik
|c
j
)} (∀ 1 ≤ k ≤ z) for each attribute a
i
with respect to each class c
j
using the maximum likelihood estimation (see the “Appendix”). Later, we use the probability-possibility transformation of Dubois et al. in the discrete case [10] in order to express possibilistic estimates. This transformation is definedby [10]:
Two reasons are likely to make the transformation of Dubois et al. highly interesting. First, it is based on a view of possibilities as upper bounds of probabilities [10] and hence it permits to converge from the classical frequency based modeling of uncertain events using probability to the more realistic representation of uncertainty as plausibility [29] using possibility. Second, being based on fuzzy sets theory, possibility theory offers more flexibility for decision-making when compared to probability theory and its “strict” Bayes rule.
As mentioned in Section 3, the minimum-based classification algorithm allocates the final decision to the class which satisfies:
In many cases, the final class issued from the minimum-based possibilistic classification may have very close possibility estimate to other alternatives. In such situation, the final class may not be the right one and therefore the quality of decision is likely to be seriously altered. In this context, we propose the G-Min algorithm in order to avoid the ambiguity between the final decision and the rest of hypotheses and hence to find a decision with a possibility estimate widely away from other alternatives.
The G-Min algorithm requires the matrix Π of possibilistic estimates and it is based on two main steps. The first aims to establish a set of possible decisions whereas the second aims to filter this set in order to find a reliable final class.
In the first step, columns of the matrix of possibilistic estimations Π are sorted increasingly and links with attributes are omitted. Afterward, rows of the resulting matrix are sorted decreasingly and the argument (the class) related to each value is kept. Figure 1 illustrates a numerical example for the application of these two sorting operations on a 3 x 3 matrix Π of possibilistic estimations. Later, a decision set dset is established using the first row of the resulting matrix Π after the two sorting steps. In the case where π (1, 1) is far from all other possibility beliefs with a value higher or equal to α. (α is a threshold close to zero), the decision set is a singleton formed of the argument of π (1, 1). For this case, the argument of π (1, 1) is taken as the final decision and the algorithm falls in the classical minimum-based possibilistic classification method. However, in the case where some of possibilistic estimates of the first row are close to π (1, 1) with a value lower than α, arguments of these values are taken as elements of the decision set dset and we have to carry on with the second step.

A numerical example of the two sorting operations required by the G-Min algorithm.
In the second step, we treat the next row of Π by picking possibility values of respectively each of the selected classes on dset. Based on the new possibility values, we remove each class of the decision set dset which have not a possibility value within the threshold α with respect to the maximum of the updated possibility values. Each time we find dset different from a singleton, we deal with the next row of Π in order to try to further filter the decision set. We repeat the second step until the decision set dset is a singleton or we are in the last row of the matrix Π. The final class is then given by the decision having the maximum value of possibility in the decision set dset.
More details about the proposed algorithm may be found in Algorithm 1.
Π ← sort (Π, ′rows′, ′increase′)
Π ← sort (Π, ′columns′, ′decrease′)
t ← 1
dset [t] ← arg (π (1, 1))
c∗ ← arg (π (1, 1))
t ← t + 1
dset [t] ← arg (π (1, j))
card ← cardinality (dset)
i ← 2
poss [t] ← arg-1 (dset [t] , i)
max ← maximum (poss)
dset ← dset . remove (arg (poss [t])
c∗ ← max
card ← cardinality (dset)
i ← i + 1
In order to demonstrate the effectiveness of the proposed G-Min-based NPC c , we present comparison results between this classifier and commonly-used classifiers for categorical data. Compared classifiers include NBC c (see the “Appendix”), C4.5-based decision tree [30] and RIPPER-based classifier [31] (for more details about adequacy of these classifiers to handle categorical data, please refer to [4]). Moreover, comparison is carried out with the two versions of NPC c which are based on the commonly-used classification algorithms using respectively, the product and the minimum as combination operators.
Experiments are conducted on 12 datasets from UCI machine learning repository [32] containing all categorical data. Characteristics of these datasets are illustrated in Table 1. As shown in Table 1, the number of cases ranges from 90 to 8124, the number of attributes from 4 to 36 and the number of classes from 2 to 22. Moreover, some datasets contain missing values and others no. Therefore, a wide variety of problems is represented.
Distribution of cloud droplet contribution range
Distribution of cloud droplet contribution range
During experiments, we have carried a 10-cross validation and as in [12] and [11], we have used the standard Percent of Correct Classification (PCC) to assess performances of different Bayesian-like classifiers for categorical data. PCC is defined as:
Moreover, we have taken α = 0.1 in order to apply the G-Min classification algorithm.
Experimental results obtained with various classifiers for categorical data are shown in Table 2.
Experimental results obtained with various classifiers for categorical data
From Table 2, we can see that: The proposed classifier is the best in terms of average rank among all the commonly-used classifiers for categorical data. The proposed classifier is the best-ranked on six among the twelve benchmarks. The proposed classifier outperforms the NBC
c
on ten among the twelve datasets. The proposed classifier outperforms the C4.5-based decision tree on ten among the twelve benchmarks. The proposed classifier outperforms the RIPPER-based classifier on ten among the twelve datasets. The proposed classifier is better than the product-based NPC
c
on eleven among the twelve datasets. The proposed classifier is more accurate than the minimum-based NPC
c
on eleven among the twelve datasets. In particular, these comparison results without and with the use of the G-Min classification algorithm demonstrate the efficiency of the proposed novel algorithm with respect to the classical minimum method. The product-based NPC
c
and the minimum-based NPC
c
are the worst in terms of average rank among all the classifiers for categorical data. These results reveal that the efficiency of the proposed G-Min-based NPC
c
is not due to the only use of the possibilitic estimation approach but rather to the simultaneous use of this approach with the G-Min algorithm.
On the other side, for a deeper comparison between the proposed classifier and the rest of Bayesian-like classifiers in terms of PCC, we have used the Wilcoxon Matched-Pairs Signed-Ranks Test as detailed in [33]. It is a non-parametric test which is devoted to compare two classifiers over multiple data sets. Comparison results based on this test are given in Table 3 and they show that the proposed G-Min-based NPC c always outperforms the five compared classifiers (p - value < 0.05).
Results for the Wilcoxon Matched-Pairs Signed-Ranks Test
The appeal of the proposed work is related to the need to involve categorical data in real-world applications and hence the necessity to find techniques which perform well when they cope with this type of data.
In this context, we have proposed the G-Min-based NPC c as a proposed technique to deal with uncertainty stemming from categorical data. Based on experimentation and comparison results with each one of commonly-used classifiers for categorical data, we have proven the efficiency of different choices which are made in the proposed classifier namely, the possibilistic estimation method and the G-Min possibilistic classification algorithm. This can be explained by the fact that the possibilistic approach in both estimation and classification steps rests on fuzzy logic which is a strong framework to deal with decision-making under uncertainty in real world settings [34].
On the other side, the good behavior of the proposed G-Min-based NPC c may be promising in the sense that this classifier can be joined to one of the recently proposed possibilistic classifiers for numerical data in order to treat uncertainty from mixed categorical and numerical data.
Footnotes
Appendix: Naïve Bayesian classifier for categorical data (NBC c )
Given C = {c1, c2, . . . , c j , . . . , c C } an exhaustive and exclusive universe of discourse of classes, A = {a1, a2, . . . , a i , . . . , a A } a set of attributes and V i = {vi1, vi2, . . . , v ik , . . . , v iz } the set of possible values of each attribute a i (∀ 1 ≤ i ≤ A), naïve Bayes classifier rests on the Bayes rule in order to compute the posterior probability for each class c j in the presence of a new instantiation {a1 = v1k, . . . , a A = v Ak }. Bayes rule is defined by:
The term P (a1 = v1k, . . . , a A = v Ak ) is a normalization factor which can be ignored. Moreover, in naïve Bayesian classification settings, attributes are assumed to be independent so that the Bayes rule becomes:
When dealing with categorical data, naïve Bayes classifier computes probability distributions from training data using the maximum likelihood estimation. Indeed, in the Naïve Bayesian Classifier for categorical data (NBC
c
), the probability of each discrete value v
ik
of a given attribute a
i
is defined by:
Lastly, the final decision stands for the class c∗ with the highest posterior probability:
Acknowledgments
The authors would like to acknowledge the financial support of this work by grants from General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.
