Evidential classification of incomplete instance based on K-nearest centroid neighbor

Abstract

Classification of incomplete instances is a challenging problem because missing features generally cause uncertainty in the classification result. We propose a new evidential classification method for incomplete instances based on adaptive imputation within the framework of evidence theory. Specifically, the missing values of different incomplete instances in the test set are adaptively estimated based on Shannon entropy and the K-nearest centroid neighbors (KNCNs) technique. The single or multiple edited instances (with estimations) are then classified by the chosen classifier to get single or multiple classification results with different discounting (weighting) factors, and a new adaptive global fusion method is finally proposed to unify the different discounted results. The proposed method can capture the imprecision degree of classification by submitting the instances that are difficult to classify into a specific class to the associated meta-class, which effectively reduces the classification error rate. The effectiveness and robustness of the proposed method have been tested through four experiments with artificial and real datasets.

Keywords

Incomplete instance, evidence theory, classification, missing data, uncertainty

1 Introduction

Classification is one of the most important tasks in the statistics and machine learning communities [1, 2]. Many algorithms have emerged to deal with classification problems, including decision trees, support vector machines, K-nearest neighbors and neural networks. However, most of them require complete data and cannot directly and effectively analyze missing data. Unfortunately, incomplete instances, also called missing data, are a common phenomenon in many real-world domains [3]. The causes of missing values range from human errors to equipment failures. For example, the data collected from a questionnaire are often incomplete because respondents decline to answer some privacy-related questions. Because of the confidentiality of medical data, not all clinical trial data of patients can be obtained [4]. As such, a number of methods [5, 6], based on three missing mechanisms: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR), have been developed to address classification problems with missing data. The majority of the existing methods for treating missing data depend on the assumption of MCAR, i.e., missingness does not depend on the observed instances, which is also the focus of our work.

In general, methods for treating incomplete instances can be divided into two types: discarding and imputation methods. Discarding incomplete instances is a very simple and convenient technique, but it is inefficient, especially when the number of incomplete instances is high.

In many scenarios, the imputation strategy is an attractive way of solving incomplete instance classification [7–15]. For example, in mean imputation (MI) [7], the missing values are substituted by the average of the observed values of the same feature. However, the main limitation of MI is that replacing missing values with the average values may be imprecise, which causes a distorted estimate of the distribution function. K-nearest neighbor imputation (KNNI) [8, 9], which uses the K-nearest neighbors (KNNs) of an instance to estimate its missing values, is arguably the most popular method for handling missing values, largely due to its ease of use and effectiveness. Fuzzy c-means imputation (FCMI) [10], an imputation method based on machine learning, exploits the clustering centers generated by iteration and the distances between the centers and the instance to impute the missing values. In [11], an interesting estimation method, called support vector regression imputation (SVRI), is proposed to estimate the missing values by minimizing the observed training error. In [12], an imputation method based on self-organizing mapping (SOM) is presented and successfully applied to the prediction of physicochemical parameters of water patterns. Recently, a new method, called linear local approximation (LLA) [13], uses the optimal weights of KNNs obtained by local linear reconstruction to estimate the missing values. In extreme learning machine imputation (ELMI) [14], the missing value is filled by the output of an ELM network. Although the above imputation methods are effective to some extent in classifying incomplete patterns, they are all single imputation strategies. In some specific scenarios, however, estimating only one value is unreliable and does not take into account the uncertainty caused by missing values. Interestingly, some recent research works have been dedicated to multiple imputation of missing values [16–18]. Multiple imputation is generally better than single imputation, but it is often more computationally expensive because it takes time to estimate a set of values for each missing value.

To address these challenges, in this paper, we propose a new adaptive imputation method for incomplete instance evidential classification, which will minimize the negative impact of imputation strategy and adopt a more cautious strategy to rationally characterize the uncertainty and imprecision caused by imputation strategy thanks to the evidence theory.

Evidence theory [19] has gained widespread attention as a powerful tool for modeling uncertain information, and it has been successfully applied to data classification [20–23], data clustering [24–27], fault diagnosis [28, 29] and information fusion [30–32]. In evidence theory, an instance may belong to multiple classes with different degrees of belief; such sets of classes, called meta-classes, can represent the imprecision of classification for an uncertain instance. On the basis of evidence theory, some methods for the classification of missing data have been developed [22, 23]. For example, in the incomplete pattern belief classification (PBC) method [22], the incomplete pattern is either directly submitted to a singleton class or estimated by KNNs, and a possibility distance method is then used to partition the uncertain incomplete pattern under the framework of evidence theory. More recently, a new transfer classification method for missing data, named credal transfer learning (CTL), is proposed in [23] to model the uncertainty of estimations for missing values with multi-mapping and the imprecision of classification results. However, these methods use K-nearest neighbors to estimate missing values, so their performance is affected when the training set is small. The nearest centroid neighbor (NCN) [33] is an effective alternative to the nearest neighbor method, and it can perform better than KNN.

Based on the above discussions, we attempt to apply the K-nearest centroid neighbor to the estimation of missing data and characterize the uncertainty caused by missing values in the framework of evidence theory. Specifically, we first propose K-nearest centroid neighbor imputation (KNCNI) to adaptively fill the instance with missing values, which considers not only the proximity of the neighbors but also their geometric distribution. For a query (incomplete) instance, a more rigorous imputation strategy based on Shannon entropy [34] and K-nearest centroid neighbors (KNCNs) is used to estimate the missing values adaptively, which considers both the weights of different features and the similarity between instances as a whole. After that, we select a standard classifier for complete instances (such as SVM [35], K-NN [36], EK-NN [37]) and use the labeled instances in the training set to classify each instance with estimation. Finally, different discounting (weighting) factors of the multiple classification results are obtained depending on the distance between the corresponding neighbors and the query instance, and then an adaptive fusion method is designed to obtain the final decision. By doing this, an instance that is difficult to classify correctly into a specific class will be automatically submitted to a reasonable meta-class, which can effectively reduce the error rate and reveal the imprecision of classification.

The rest of this paper is organized as follows. The basics of evidence theory and the K-nearest centroid neighbor are briefly reviewed in Section 2. The proposed method is introduced in Section 3. The performance of the proposed method is then tested and compared with several other classical methods in Section 4. The conclusion of this paper is given in Section 5.

2 Background

2.1 Basic of evidence theory

Evidence theory (also referred to as belief function theory or D-S theory) [19] was originally introduced by Dempster and developed by Shafer. It is considered an extension of Bayesian probability theory [38] and has permeated a wide range of fields [20–32]. In evidence theory, the frame of discernment Ω = {ω₁, . . . , ω_c} is extended to the power set 2^Ω, which contains all the possible subsets of Ω. For example, for a frame of discernment Ω = {ω₁, ω₂}, the corresponding power set is 2^Ω = {∅ , ω₁, ω₂, ω₁ ∪ ω₂}.

1) Mass function

The basic belief assignment (BBA) on the frame of discernment Ω is a function m (.) from 2^Ω to [0, 1] satisfying: $\begin{cases} \sum_{A \in 2^{Ω}} m (A) = 1 \\ m (\emptyset) = 0 \end{cases}$ (1) All the elements A ∈ 2^Ω such that m (A) > 0 are called the focal elements of m (.).

2) Dempster-Shafer rule of combination

The DS combination of two distinct sources of evidence characterized by the BBAs m₁ (.) and m₂ (.) over 2^Ω is denoted m = m₁ ⊕ m₂, and it is mathematically defined by m_DS (∅) = 0 and, for A ≠ ∅ and B, C ∈ 2^Ω, by: $m_{DS} (A) = [m_{1} \oplus m_{2}] (A) = \frac{\sum_{B \cap C = A} m_{1} (B) m_{2} (C)}{1 - K}$ (2) with $K = \sum_{B \cap C = \emptyset} m_{1} (B) m_{2} (C)$ (3) where $K$ is the conflict coefficient between m₁ (.) and m₂ (.). $K$ is a non-negative parameter in [0, 1] associated with the degree of conflict. However, the DS rule may produce unreasonable results in highly conflicting cases. Many alternative rules of combination have been proposed to overcome the limitations of the DS rule, such as Yager’s rule [39] and the Dubois-Prade (DP) rule [40]. In particular, the DP rule, which inspires the fusion method introduced in this paper, is briefly reviewed here. The DP rule of the BBAs m₁ (.) and m₂ (.) over 2^Ω is defined by m_DP (∅) = 0 and, for A ≠ ∅, by: $m_{DP} (A) = \sum_{B \cap C = A} m_{1} (B) m_{2} (C) + \sum_{B \cap C = \emptyset, B \cup C = A} m_{1} (B) m_{2} (C)$ (4) In the DP rule, the partial conflicting beliefs are all submitted to the corresponding meta-class.
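As a small numerical illustration of how the two rules differ, consider Ω = {ω₁, ω₂} and two BBAs whose masses are made up here for illustration only (they do not come from the experiments of this paper):

```latex
\begin{aligned}
& m_{1}(\omega_{1}) = 0.8,\quad m_{1}(\omega_{2}) = 0.2,\qquad
  m_{2}(\omega_{1}) = 0.3,\quad m_{2}(\omega_{2}) = 0.7 \\
& K = m_{1}(\omega_{1})\, m_{2}(\omega_{2}) + m_{1}(\omega_{2})\, m_{2}(\omega_{1})
    = 0.56 + 0.06 = 0.62 \\
& \text{DS rule:}\quad
  m_{DS}(\omega_{1}) = \frac{0.8 \times 0.3}{1 - 0.62} \approx 0.632,\quad
  m_{DS}(\omega_{2}) = \frac{0.2 \times 0.7}{1 - 0.62} \approx 0.368 \\
& \text{DP rule:}\quad
  m_{DP}(\omega_{1}) = 0.24,\quad
  m_{DP}(\omega_{2}) = 0.14,\quad
  m_{DP}(\omega_{1} \cup \omega_{2}) = 0.62
\end{aligned}
```

With the DS rule the conflicting mass 0.62 is normalized away and the result commits to singleton classes, whereas with the DP rule the same mass is kept on ω₁ ∪ ω₂.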

2.2 Overview of K-nearest centroid neighbor

1) Nearest Centroid Neighbor (NCN)

As an alternative to the nearest neighbor (NN), NCN [33] displays very good performance. The basic idea of NCN is to find the nearest neighbors by means of the centroid. For a query instance, NCN therefore requires not only that the centroid neighbors be as close as possible to the instance in distance, but also that they be distributed as evenly as possible around the instance. For a set of instances X = {x₁, x₂, . . . , x_n}, the centroid $\bar{x}$ is first calculated as follows: $\bar{x} = \frac{1}{n} (x_{1} + x_{2} + \dots + x_{n})$ (5)

The nearest centroid neighbor of the query instance x in X can be obtained by the following steps:

a) Calculate the distance between the query instance x and other ones in X.

b) Obtain the nearest centroid neighbor of x (i.e., its nearest neighbor), denoted as $x_{1}^{NCN}$ .

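c) For i ≥ 2, select the i-th nearest centroid neighbor $x_{i}^{NCN}$ as the remaining instance for which the centroid of $x_{i}^{NCN}$ and all previously selected nearest centroid neighbors $x_{1}^{NCN}, \dots, x_{i-1}^{NCN}$ is closest to the instance x.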

2) K-Nearest Centroid Neighbor (KNCN)

As an extension of NCN, KNCN [33] is introduced into pattern classification. Different from KNN, KNCN considers both the proximity and the spatial distribution of the neighbors to classify the query instance. The classification proceeds by the following steps:

a) Find the K-nearest centroid neighbors of the query instance x in X, denoted as $X^{NCN} = \{x_{1}^{NCN}, x_{2}^{NCN}, \dots, x_{K}^{NCN}\}$ .

b) Assign the instance x to the class with a majority of votes from its K-nearest centroid neighbors in the set X^NCN.
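The NCN search and the KNCN vote described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [33]: the function names `kncn_indices` and `kncn_classify` are hypothetical, the Euclidean distance is used, and ties are broken by the first candidate found.

```python
import numpy as np
from collections import Counter

def kncn_indices(x, X, k):
    """Indices of the k nearest centroid neighbors of x among the rows of X."""
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    chosen = []
    remaining = list(range(len(X)))
    running_sum = np.zeros_like(x)  # sum of the neighbors chosen so far
    for _ in range(min(k, len(X))):
        # Centroid of (chosen neighbors + one candidate), for every candidate.
        centroids = (running_sum + X[remaining]) / (len(chosen) + 1)
        dists = np.linalg.norm(centroids - x, axis=1)
        idx = remaining.pop(int(np.argmin(dists)))
        chosen.append(idx)
        running_sum += X[idx]
    return chosen

def kncn_classify(x, X, labels, k):
    """Majority vote among the k nearest centroid neighbors."""
    votes = Counter(labels[i] for i in kncn_indices(x, X, k))
    return votes.most_common(1)[0][0]

# The 2nd centroid neighbor is the point on the opposite side of x,
# not the 2nd closest point, because it pulls the centroid toward x.
X_demo = [[1.0, 0.0], [1.2, 0.0], [-2.0, 0.0], [0.0, 3.0]]
print(kncn_indices([0.0, 0.0], X_demo, 2))
```

For the first neighbor the centroid is the candidate itself, so step b) of the NCN search (the first centroid neighbor is the nearest neighbor) falls out of the same loop.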

3 Evidential classification of incomplete instance

The proposed method is developed for the classification of incomplete instances based on evidence theory, and it addresses the case in which the test set contains missing values. For each incomplete instance, KNNs are found in the training set depending on the observed features, and the missing values are estimated by a more rigorous strategy based on KNCNs and Shannon entropy, which fully considers the weights of different features and the similarity between instances. Then, these incomplete instances with estimations are classified by the chosen classifier to obtain one or multiple classification results. Finally, the classification results with different weights (reliability) are submitted to an adaptive fusion system to obtain the final decision. An instance that is difficult to classify correctly into a specific class is submitted to a corresponding meta-class in order to reduce the error rate. The class of an uncertain instance in a meta-class can eventually be identified using some other (costly) techniques or extra information sources. The proposed method thus helps avoid erroneous, potentially fatal decisions by carefully partitioning the classification result.

3.1 Adaptive imputation of incomplete instance

Let us consider a dataset with p-dimensional features including the training set Y = {y₁, y₂, . . . , y_m} and the test set X = {x₁, x₂, . . . , x_n} with the frame of discernment Ω = {ω₁, ω₂, . . . , ω_c}. The incomplete instances with missing values in X still contain some observed features, which are assumed to be true and reliable. For a query (incomplete) instance x_i, we adopt an imputation strategy to fill the missing values in the test set based on KNCNs and Shannon entropy, which considers not only the difference of importance between different features of one instance but also the difference of similarity between different instances. Specifically, Shannon entropy is used to obtain the weights of different features. The Shannon entropy of the s-th feature is defined as follows: $E_{s} = - \sum_{j = 1}^{m} p_{js} \log_{b} (p_{js})$ (6) where m is the number of instances in the training set Y, the base b is 2 by default (the natural constant e can also be used), and p_js is the proportion of the value of the j-th instance in the s-th feature.

One can thus get the Shannon entropy of each feature by Eq. (6). Shannon entropy is a measure of uncertainty, so a small value of entropy means that the uncertainty is small and the corresponding amount of information is large. The weight of each feature can be obtained from the Shannon entropy as follows: $λ_{s} = \frac{1 - E_{s}}{p - \sum_{s = 1}^{p} E_{s}}$ (7) where p denotes the number of features.
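The entropy-based feature weighting can be sketched as follows. The text does not fully specify how the proportions p_js are computed, so this sketch follows the common entropy weight method and makes three assumptions that are not stated in the paper: each feature is min-max scaled to [0, 1] first, the proportions are the scaled values divided by their column sum, and the entropy is divided by log m so that it lies in [0, 1]. The function name `entropy_feature_weights` is hypothetical.

```python
import numpy as np

def entropy_feature_weights(Y):
    """Feature weights from Shannon entropy; larger weight = lower entropy."""
    Y = np.asarray(Y, dtype=float)
    m, p = Y.shape
    # Assumption: min-max scaling so that proportions are non-negative.
    span = Y.max(axis=0) - Y.min(axis=0)
    Z = (Y - Y.min(axis=0)) / np.where(span > 0, span, 1.0)
    col_sum = Z.sum(axis=0)
    # Proportion of each instance within each feature column;
    # a constant column is treated as uniform (maximum entropy).
    P = np.where(col_sum > 0, Z / np.where(col_sum > 0, col_sum, 1.0), 1.0 / m)
    # Entropy per feature, with 0 * log(0) taken as 0.
    # Assumption: divide by log(m) so the entropy lies in [0, 1].
    E = -(P * np.log(np.where(P > 0, P, 1.0))).sum(axis=0) / np.log(m)
    w = np.clip(1.0 - E, 0.0, None)
    if w.sum() <= 0:                 # all features uninformative
        return np.full(p, 1.0 / p)
    return w / w.sum()               # weights sum to one

weights = entropy_feature_weights([[0, 0], [0, 1], [0, 1], [1, 1]])
print(weights)
```

Normalizing by the sum of (1 - E_s) makes the weights sum to one over the p features.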

After obtaining the weights of the different features, one can calculate the distance between the observed features of x_i and the corresponding features of the training instance y_j in Y, as follows: $\| x_{i} - y_{j} \| = \sqrt{\sum_{s = 1}^{t} {\tilde{λ}}_{s} (x_{is} - y_{js})^{2}}$ (8) where x_is and y_js denote the s-th observed attribute values of x_i and y_j, respectively, and t is the number of observed features in x_i.
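A minimal sketch of the weighted distance of Eq. (8) is given below. It assumes that missing values are encoded as NaN and that the weights of the observed features are renormalized to sum to one, which is one possible reading of the tilde on the weights in Eq. (8); the paper does not state this explicitly. The function name `observed_weighted_distance` is hypothetical.

```python
import numpy as np

def observed_weighted_distance(x, Y, weights):
    """Weighted Euclidean distance from x to each row of Y,
    computed only over the features that are observed (non-NaN) in x."""
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    observed = ~np.isnan(x)
    lam = weights[observed]
    lam = lam / lam.sum()  # assumption: renormalize over observed features
    diff = Y[:, observed] - x[observed]
    return np.sqrt((lam * diff ** 2).sum(axis=1))

d = observed_weighted_distance(
    [1.0, np.nan, 3.0],
    [[1.0, 9.0, 3.0], [4.0, 0.0, 7.0]],
    [0.25, 0.5, 0.25],
)
print(d)
```

The KNNs of an incomplete instance are then the training instances with the K smallest distances, for example via `np.argsort(d)[:K]`.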

Thus, one can obtain the KNNs of the incomplete instance x_i in Y, where the KNNs may belong to a single class or to multiple classes. In the process of imputation, we adopt K-nearest centroid neighbor adaptive estimation for the incomplete instance x_i, i.e., if the KNNs come from q (1 < q ≤ c) classes, q versions of the incomplete instance are estimated by using the KNCNs of these q classes respectively; otherwise the KNCNs in the single class are used to estimate one version of the incomplete instance. Note that the distances from the incomplete instance x_i to the centroid neighbors are usually different, so it is necessary to use some discounting techniques to weigh the influence of the KNCNs when estimating missing values. In general, a smaller distance from the incomplete instance to the centroid neighbor corresponds to a more reliable estimation.

The aforementioned K-nearest centroid neighbor imputation (KNCNI) method can be performed in four steps:

a) The KNCN set is found in each class according to the observed features, denoted as $Y^{NCN} = \{y_{1}^{NCN}, y_{2}^{NCN}, \dots, y_{K}^{NCN}\}$ .

b) Calculate the distance between the incomplete instance x_i and each centroid neighbor $y_{k}^{NCN} (k = 1, \dots, K)$ by the Euclidean distance $\| x_{i} - y_{k}^{NCN} \|$ according to Eqs. (6)-(8).

c) Calculate the weight factor $α_{i}^{k}$ of each centroid neighbor $y_{k}^{NCN}$ as follows: $α_{i}^{k} = \frac{e^{- \| x_{i} - y_{k}^{NCN} \|}}{\sum_{l = 1}^{K} e^{- \| x_{i} - y_{l}^{NCN} \|}}$ (9)

d) Estimate the missing values ${\tilde{x}}_{io}$ of incomplete instance x_i in the same dimension according to $y_{k}^{NCN}$ as follows: ${\tilde{x}}_{io} = \sum_{k = 1}^{K} α_{i}^{k} \cdot y_{ko}^{NCN}$ (10)
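Steps c) and d) can be sketched as follows, assuming the K centroid neighbors of one class and their distances to x_i have already been found in steps a) and b), and that missing values are encoded as NaN. The function name `kncni_fill` is hypothetical.

```python
import numpy as np

def kncni_fill(x, neighbors, dists):
    """Fill the NaN entries of x with a distance-weighted average of the
    corresponding entries of its centroid neighbors (Eqs. (9)-(10))."""
    x = np.asarray(x, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    # Eq. (9): alpha_k proportional to exp(-d_k). Subtracting the minimum
    # distance avoids underflow and leaves the normalized weights unchanged.
    a = np.exp(-(dists - dists.min()))
    alpha = a / a.sum()
    missing = np.isnan(x)
    filled = x.copy()
    # Eq. (10): weighted sum of the neighbors' values in the missing dimensions.
    filled[missing] = alpha @ neighbors[:, missing]
    return filled, alpha

filled, alpha = kncni_fill(
    [1.0, np.nan],
    [[1.0, 2.0], [1.0, 4.0]],
    [0.0, np.log(3.0)],
)
print(filled, alpha)
```

When the KNNs of x_i come from several classes, the same function would be called once per class, producing one estimated version of x_i for each class.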

Analysis of KNCNI and KNNI : Both the KNCNI and KNNI methods use the NN mode. KNNI estimates the missing values directly by using the neighbors with different weights after obtaining the KNNs in each class depending on the observed features, whereas KNCNI estimates the missing values by using the centroid neighbors with different weights after obtaining the KNCNs in each class. Compared with KNNI, KNCNI exploits more of the hidden information in the geometric distribution of the neighborhood to estimate the missing values. Note that KNCNI is an extension of KNNI and is therefore similar to it. When K is set to 1, the results of the KNNI and KNCNI methods are the same, because only the single nearest instance in each class is taken. In this case the nearest neighbor is the nearest centroid neighbor, so the two methods obtain the same estimations.

Example 1 : We give a simple example to explain the process of adaptive estimation using KNCNI. Let us consider an incomplete instance x₁ = {x₁₁, ? , ? , x₁₄} in the framework Ω = {ω₁, ω₂, ω₃}, where ? denotes the unobserved features. Suppose that the three nearest neighbors of x₁ come from the classes ω₁ and ω₂, as follows: $y_{1}^{1} = [y_{11}, y_{12}, y_{13}, y_{14}]$

$y_{2}^{2} = [y_{21}, y_{22}, y_{23}, y_{24}]$

$y_{3}^{1} = [y_{31}, y_{32}, y_{33}, y_{34}]$

where $y_{j}^{g}$ is the j-th instance of the class ω_g.

Therefore, we find KNCN set in the first and second classes depending on the corresponding observed features, and then, estimate the missing value of x₁ according to Eqs. (6)-(10). We obtain: $x_{1}^{1} = [x_{11}, x_{12}^{1}, x_{13}^{1}, x_{14}]$

$x_{1}^{2} = [x_{11}, x_{12}^{2}, x_{13}^{2}, x_{14}]$

By doing this, one can get estimations for each incomplete instance in test set X, and complete estimated instances will be classified by standard classifier and submitted to the fusion method to determine the final class in the next part.

3.2 Classification and fusion based on evidence theory

After obtaining each instance with estimation from Section 3.1, each instance in the test set X can obtain single or multiple classification results with different weights (reliability) using the basic classifiers, and the single or multiple pieces of classification results for $x_{i}^{g} \in X$ are given by: $P_{i}^{g} = Φ (x_{i}^{g} | X), g = 1, \dots, c .$ (11) where Φ (.) denotes the standard classifier (such as K-NN, SVM, Adaboost, EK-NN, etc.) used to classify the estimated version $x_{i}^{g}$ of the incomplete instance in the test set, and $P_{i}^{g}$ denotes the output classification result of $x_{i}^{g}$ . Here, the classification result is directly used as the final output if the instance has only a single estimated version. Otherwise, the instance has multiple estimated versions and the corresponding results must be fused. As mentioned earlier, the distances between the instance and the KNCNs are usually different, so different classification results are considered to have different reliability. Some discounting techniques should be adopted before fusion, and a rational method is adopted here to define the discounting factor, expressed as follows: $β_{i}^{g} = \frac{ρ_{i}^{g}}{ρ_{i}^{\max}}$ (12) with $ρ_{i}^{g} = e^{- \| x_{i}^{g} - y_{g}^{NCN} \|}$ (13) where $ρ_{i}^{\max} = \max \{ρ_{i}^{1}, \dots, ρ_{i}^{c}\}$ .

Then, the well-known discounting rule introduced by Shafer in [19] is employed here. More precisely, the discounted masses of belief are obtained as follows: $\begin{cases} m_{i}^{g} (A) = β_{i}^{g} P_{i}^{g} (A), & A \neq Ω \\ m_{i}^{g} (Ω) = 1 - \sum_{A \neq Ω} m_{i}^{g} (A) \end{cases}$ (14) where $m_{i}^{g} (\cdot)$ denotes the basic belief assignments (BBAs) of different classes (focal elements) after discounting the classification results of the estimated instance $x_{i}^{g}$ by the discounting factor $β_{i}^{g}$ . By doing this, one can obtain multiple mass functions ( $m_{i}^{g} (\cdot)$ ) for the instance x_i in the test set X, and they will be fused by the method we designed to get the final class information of x_i.
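The discounting of Eqs. (12)-(14) can be sketched as follows. The sketch assumes that the classifier output for each estimated version is a probability vector over the c singleton classes, so that the only non-singleton focal element after discounting is Ω; the function name `discount` and the array layout (last column holds the mass of Ω) are choices made here for illustration.

```python
import numpy as np

def discount(P, dists):
    """Discount the classification results of the estimated versions.

    P     : array (G, c), one probability vector per estimated version
    dists : array (G,), distance associated with each version
    Returns masses of shape (G, c + 1), last column = m(Omega), and beta.
    """
    P = np.asarray(P, dtype=float)
    dists = np.asarray(dists, dtype=float)
    # Eqs. (12)-(13): beta = rho / rho_max with rho = exp(-dist).
    # exp(-(d - d_min)) is the same ratio, computed without underflow.
    beta = np.exp(-(dists - dists.min()))
    # Eq. (14): scale the singleton masses, move the remainder to Omega.
    m_single = beta[:, None] * P
    m_omega = 1.0 - m_single.sum(axis=1, keepdims=True)
    return np.hstack([m_single, m_omega]), beta

masses, beta = discount([[0.9, 0.1], [0.2, 0.8]], [0.0, np.log(2.0)])
print(masses)
```

The most reliable version (smallest distance) gets a discounting factor of 1 and is left unchanged, while less reliable versions transfer part of their belief to total ignorance Ω.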

It is well known that the DS rule converges well when the sources of evidence are in low conflict, which helps decision makers to make a correct judgment in time. When the sources of evidence are in high conflict, however, if the DS rule is directly used to fuse these mass functions, all the conflicting beliefs are proportionally redistributed back to the focal elements, so that most of the instances that are difficult to classify accurately will be misclassified. As a derivative of the DS rule, the Dubois-Prade (DP) rule can avoid such unreasonable behavior by increasing the uncertainty of the fusion results when the sources of evidence are in high conflict. However, if the DP rule is directly employed for fusion, all conflicting beliefs are simply preserved and assigned to the meta-class. This causes a large number of instances to be submitted to meta-classes, so that the result of credal classification becomes very imprecise, which is not an effective classification method.

Based on the above analysis, combined with the characteristics of DS rule and DP rule, we design an adaptive dynamic fusion method, which can select the conflict information conditionally and submit to the corresponding meta-class. When multiple sources of evidence are fused, it can not only avoid the unreasonable fusion results of high conflict sources of evidence, but also ensure the fast convergence of fusion result.

In the proposed method, we use Eq. (3) to calculate the conflict factor $K$ between the sources of evidence after discounting, and set a meta-class threshold ɛ. If the conflict factor is less than the meta-class threshold ɛ, the degree of conflict between the sources of evidence is small, so we can adopt the classical DS rule and ensure fast convergence. Conversely, if the conflict factor is greater than or equal to the meta-class threshold ɛ, the conflict generated by the fusion of the sources of evidence is large. In this case, we adopt the DP rule to submit the conflicting information to the uncertain focal element (i.e., meta-class), which can effectively avoid the unreasonable fusion results that may be caused by the DS rule.

In the proposed method, for the instance x_i, the adaptive fusion method is defined for B, C ∈ 2^Ω as follows: $m_{i} (A) = \begin{cases} \dfrac{\sum_{B \cap C = A} m_{i}^{g - 1} (B) m_{i}^{g} (C)}{1 - \sum_{B \cap C = \emptyset} m_{i}^{g - 1} (B) m_{i}^{g} (C)}, & K < ɛ \\ \sum_{B \cap C = A} m_{i}^{g - 1} (B) m_{i}^{g} (C) + \sum_{B \cap C = \emptyset, B \cup C = A} m_{i}^{g - 1} (B) m_{i}^{g} (C), & K \geq ɛ \end{cases}$ (15)
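The adaptive rule of Eq. (15) can be sketched for two mass functions as follows. In this illustration a BBA is represented as a dictionary that maps a `frozenset` of class labels to its mass; this representation and the names `adaptive_fuse` and `fuse_all` are assumptions of the sketch. Fusing more than two results sequentially, as done in `fuse_all`, is one reading of the indices g - 1 and g in Eq. (15); since the DP-style branch is not associative, the result may depend on the order.

```python
from itertools import product

def adaptive_fuse(m1, m2, eps):
    """Fuse two BBAs with the DS rule if the conflict K < eps,
    otherwise with the DP-style rule. Returns (fused BBA, K)."""
    conjunctive = {}   # masses of the non-empty intersections
    conflicts = []     # (B union C, mass) for pairs with empty intersection
    K = 0.0
    for (B, b), (C, c) in product(m1.items(), m2.items()):
        w = b * c
        inter = B & C
        if inter:
            conjunctive[inter] = conjunctive.get(inter, 0.0) + w
        else:
            K += w
            conflicts.append((B | C, w))
    if K < eps:
        # Low conflict: normalize the conjunctive masses (DS rule).
        return {A: v / (1.0 - K) for A, v in conjunctive.items()}, K
    # High conflict: keep each partial conflict on the union B ∪ C.
    fused = dict(conjunctive)
    for A, w in conflicts:
        fused[A] = fused.get(A, 0.0) + w
    return fused, K

def fuse_all(masses, eps):
    """Sequentially fuse a list of BBAs (assumed reading of Eq. (15))."""
    fused = masses[0]
    for m in masses[1:]:
        fused, _ = adaptive_fuse(fused, m, eps)
    return fused

w1, w2 = frozenset({"w1"}), frozenset({"w2"})
m_a = {w1: 0.8, w2: 0.2}
m_b = {w1: 0.3, w2: 0.7}
print(adaptive_fuse(m_a, m_b, eps=0.15))  # K is about 0.62 >= eps: DP-style branch
```

The final class of x_i is then the focal element with the maximum mass, for example `max(fused, key=fused.get)`, which may be a singleton class or a meta-class.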

1) Guideline for Choosing the Parameter ɛ : In applications, the threshold ɛ of the proposed method should be tuned according to the number of instances in the meta-classes. In general, ɛ can also be tuned by a grid search in [0, 1]. By Eq. (15), a large ɛ value means that the DS rule is applied more often, which generally leads to fewer instances in the meta-classes, but it may cause more misclassification of uncertain instances with missing values. A small ɛ value yields more instances in the meta-classes and leads to a high imprecision degree, which is not an efficient solution for the classification problem. So ɛ should be tuned according to the imprecision degree that one can accept.

For the convenience of implementation, the proposed method is outlined in Table 1.

Table 1

Evidential classification of incomplete instance

Input:

Training set: $Y = \{y_{1}, \dots, y_{m}\} \subset ℝ^{p}$

Test set: $X = \{x_{1}, \dots, x_{n}\} \subset ℝ^{p}$

Parameters:

K: the number of neighbors (default K = 11)

ɛ: threshold of meta-class (ɛ ∈ [0, 1])

for i = 1 to n

Calculate single or multiple versions of estimations of the incomplete instance by KNCNI by Eqs. (6)-(10);

Classify single or multiple edited instances with estimations by Eq. (11);

Discount multiple classification results by Eqs. (12)-(14);

Calculate the conflict factors by Eq. (3);

Fuse multiple results adaptively by Eq. (15).

end

4 Experiments

In this section, two artificial datasets and nine well-known datasets from the UCI machine learning repository are used to test and demonstrate the effectiveness of the proposed method with respect to the mean imputation (MI) [7], K-nearest neighbor imputation (KNNI) [8], fuzzy c-means imputation (FCMI) [10], linear local approximation (LLA) [13] and fuzzy c-means centroid based imputation (FCMCI) [15] methods. For convenience, we denote ω_j,...,k ≜ {ω_j, . . . , ω_k}, $ω_{j}^{tr} ≜ ω_{j}^{training}$ and $ω_{j}^{te} ≜ ω_{j}^{test}$ .

In this paper, the Support Vector Machine (SVM) [35], K-Nearest Neighbor (K-NN) [36], Evidential K-Nearest Neighbor (EK-NN) [37], Adaboost [41] and Decision Tree (DT) [42] classifiers are employed as basic classifiers, respectively. K = 11 is the default for the KNNs, K-NN and EK-NN. In the proposed method, the meta-class parameter ɛ is optimized by grid search using the training instances. Generally, the optimized value is the one that yields an acceptable imprecision rate. In the proposed method, the class of each instance is decided by the maximum mass of belief. Thereby, the classification results include singleton classes and meta-classes. To fully evaluate the performance of the proposed method, the error rate (Re) and the imprecision rate (Ri) (both in %) [23], Precision (P), Recall (R), F-Measure (FM) [43] and Rand Index (RI) [44] are used as indexes. These indexes are defined as: $Re = \frac{Ne}{N}, \quad Ri = \frac{Ni}{N}$

$P = \frac{A}{A + C}, \quad R = \frac{A}{A + D}$

$FM = \frac{2 \times P \times R}{P + R}, \quad RI = \frac{2 \times (A + B)}{N (N - 1)}$

where Ne (respectively, Ni) is the number of instances submitted to error classes (respectively, meta-classes), and N is the number of instances under test. A (respectively, B) denotes the number of instance pairs that are submitted to the same class (respectively, different classes) by both the ground truth and the classification results. C (respectively, D) denotes the number of dissimilar (respectively, similar) instance pairs that are submitted to the same class (respectively, different classes) by the classification results. Note that a smaller value of Re corresponds to better classification performance, whereas larger values of P, R, FM, and RI correspond to better classification performance.
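The pair-counting indexes P, R, FM and RI can be computed as in the following sketch. It assumes that every instance carries a single predicted label (how predictions in meta-classes enter the pair counts is not specified in this section, so that case is left out), and the function name `pair_metrics` is hypothetical.

```python
from itertools import combinations

def pair_metrics(y_true, y_pred):
    """Precision, recall, F-measure and Rand index from instance pairs."""
    A = B = C = D = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_true and same_pred:
            A += 1      # similar pair kept together
        elif not same_true and not same_pred:
            B += 1      # dissimilar pair kept apart
        elif same_pred:
            C += 1      # dissimilar pair put in the same class
        else:
            D += 1      # similar pair put in different classes
    n = len(y_true)
    P = A / (A + C) if A + C else 0.0
    R = A / (A + D) if A + D else 0.0
    FM = 2 * P * R / (P + R) if P + R else 0.0
    RI = 2 * (A + B) / (n * (n - 1))
    return {"P": P, "R": R, "FM": FM, "RI": RI}

print(pair_metrics([0, 0, 1, 1], [0, 0, 1, 0]))
```

In this toy case A = 1, B = 2, C = 2 and D = 1, so P = 1/3, R = 1/2, FM = 0.4 and RI = 0.5. The double loop is quadratic in N, which is acceptable for datasets of the size used here.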

4.1 Experiment 1

Let us consider a 3-class square dataset to test the performance of the proposed method with respect to other methods, as shown in Fig. 1. Each class contains 300 training instances and 300 test instances. The distribution intervals of the three classes are as follows:

	x-label interval	y-label interval
ω₁	(0, 3)	(4.5, 7.5)
ω₂	(2.25, 5.25)	(2.25, 5.25)
ω₃	(4.5, 7.5)	(0, 3)

Fig. 1

Classification results on square data set via different methods: (a) Training and test instances. (b) Classification result via MI (Re = 32.56). (c) Classification result via KNNI (Re = 18.00). (d) Classification result via FCMI (Re = 16.44). (e) Classification result via LLA (Re = 21.33). (f) Classification result via FCMCI (Re = 16.33). (g) Classification result via proposed method with ɛ = 0.1 (Re = 0.44, Ri = 32.89). (h) Classification result via proposed method with ɛ = 0.15 (Re = 1.89, Ri = 29.11). (i) Classification result via proposed method with ɛ = 0.2 (Re = 4.56, Ri = 22.78).

We assume that the values in the first feature (i.e., x-axis) of the test instances are all missing, so the classification of the test instances depends only on the values in the second feature (i.e., y-axis). Here, SVM is employed to classify the test instances. In the proposed method, three different meta-class thresholds ɛ = 0.1, ɛ = 0.15 and ɛ = 0.2 are selected to test their effects on the classification results. The classification results of the different methods are shown in Fig. 1(b)-(i).

As can be seen from Fig. 1(a), when the feature values on the x-axis are missing, the classes ω₁ and ω₂, as well as ω₂ and ω₃, partially overlap at their edges if only the feature values on the y-axis are considered, and the instances in these overlapping zones are difficult to submit accurately to a specific class. MI, KNNI, FCMI, LLA and FCMCI provide only one estimation for the missing values. Their classification results submit all instances to singleton classes, which leads to a higher misclassification rate for the instances in the overlapping zones. With the proposed method, most instances in the overlapping zones are submitted to the corresponding meta-classes ω_1,2 and ω_2,3, which characterize the uncertainty and imprecision of classification caused by missing values. In the proposed method, when ɛ increases from ɛ = 0.1 to ɛ = 0.2 as shown in Fig. 1(g)-(i), the degree of imprecision is reduced but the classification error rate increases. In applications, the meta-class parameter ɛ should be tuned depending on the imprecision rate the user can accept in the classification.

4.2 Experiment 2

In this experiment, we consider a 3-class parabolic dataset as shown in Fig. 2 with 300 training instances and test instances in each class. K-NN is employed to evaluate the proposed method with respect to KNNI, FCMI, LLA and FCMCI. In the proposed method, the meta-class threshold is ɛ = 0.15. Let us assume that the values in the second feature (i.e., y-axis) of the test instances are missing, so that each test instance has only one observed value, in the first feature (i.e., x-axis). The error rate Re and imprecision rate Ri (for the proposed method) of these methods are given in the caption of each subfigure.

Fig. 2

Classification results on parabolic data set via different methods: (a) Training and test instances. (b) Classification result via KNNI (Re = 3.70). (c) Classification result via FCMI (Re = 6.50). (d) Classification result via LLA (Re = 9.07). (e) Classification result via FCMCI (Re = 3.92). (f) Classification result via proposed method (Re = 0.56, Ri = 8.40).

In Fig. 2(a), one can clearly see that different classes overlap in the edge zones. Because the y-axis values are missing, when only the x-axis is considered, the classes ω₁ and ω₂ (respectively, ω₂ and ω₃) overlap in the interval [0.5, 1.5] (respectively, [3.5, 4.5]), and the instances in the overlapping zones are difficult to assign to a specific class.

In Fig. 2(b)-(e), KNNI, FCMI, LLA and FCMCI submit instances to singleton classes without considering meta-classes, as they work in the probabilistic framework, which leads to relatively high error rates. In contrast, the proposed method can cautiously submit the instances in the overlapping zones to appropriate meta-classes under the framework of evidence theory. The error rate of the classification result is thereby greatly reduced at the cost of some imprecision, which reflects the uncertainty caused by the insufficient feature information of the instances.

4.3 Experiment 3

Nine real datasets are used to evaluate the performance of the proposed method against the other methods, with K-NN again selected as the standard classifier. Basic information on the datasets, including the number of classes, features and instances, is shown in Table 2.

Table 2
Basic information of the used datasets

Data	Classes	Features	Instances
Seeds (Se)	3	7	210
Wine (Wi)	3	13	178
Vehicle (Ve)	4	18	846
Statlog (St)	2	13	270
Red Wine (RW)	6	11	1599
White Wine (WW)	7	11	4898
Segment (Seg)	7	19	2310
Spambase (Sp)	2	57	4597
Contraceptive (Co)	3	9	1473

Each dataset is randomly divided into a training set and a test set of equal size. For the test set, the MCAR mechanism is used to simulate missing values, with τ feature values missing in each test instance. The average error rate Re and imprecision rate Ri (for the proposed method) of the different methods are reported in Table 3.
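The protocol described above (a random half split, then τ missing values per test instance under MCAR) can be sketched as follows. This is an illustrative sketch only. It assumes that feature vectors are lists of floats kept separately from their labels, that a missing value is marked with NaN, and that the τ missing positions are drawn uniformly at random and independently for each test instance. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_half(
    rows: Sequence[Sequence[float]], rng: random.Random
) -> Tuple[List[Sequence[float]], List[Sequence[float]]]:
    """Randomly split the rows into two halves (training, test)."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    half = len(rows) // 2
    train = [rows[i] for i in indices[:half]]
    test = [rows[i] for i in indices[half:]]
    return train, test


def mask_mcar(row: Sequence[float], tau: int, rng: random.Random) -> List[float]:
    """Return a copy of `row` with `tau` uniformly chosen features set to NaN.

    The positions do not depend on any feature value, which is the MCAR
    assumption; NaN as the missing marker is a choice made for this sketch.
    """
    if not 0 <= tau <= len(row):
        raise ValueError("tau must be between 0 and the number of features")
    missing = set(rng.sample(range(len(row)), tau))
    return [math.nan if j in missing else value for j, value in enumerate(row)]


# Toy usage with made-up data: 10 instances, 3 features, tau = 2.
rng = random.Random(0)
data = [[float(i), float(i + 1), float(i + 2)] for i in range(10)]
train_rows, test_rows = split_half(data, rng)
incomplete_test = [mask_mcar(row, tau=2, rng=rng) for row in test_rows]
print(len(train_rows), len(incomplete_test))
```

Only the test rows are masked here, which matches the assumption in this paper that the training set is complete.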

Table 3

Classification results of different methods (In %)

Data	τ	MI	KNNI	FCMI	LLA	FCMCI	Proposed method
		Re	Re	Re	Re	Re	[Re, Ri]
	1	15.24	5.71	5.71	6.67	5.71	[4.76, 0]
Se	3	26.67	10.48	12.38	12.38	10.48	[8.57, 2.86]
	5	50.48	20.00	24.76	22.86	25.71	[13.33, 9.52]
	3	40.00	27.78	34.44	31.11	31.11	[24.44, 0]
Wi	6	47.78	36.67	41.11	42.22	35.56	[28.89, 4.44]
	9	50.00	36.67	42.22	45.56	38.89	[31.11, 6.66]
	4	50.47	43.63	48.11	43.87	44.58	[41.04, 1.42]
Ve	8	59.91	44.34	52.59	45.75	47.17	[41.51, 2.12]
	12	66.04	49.53	59.67	50.94	53.30	[46.70, 3.30]
	3	31.11	31.11	30.37	28.89	29.63	[26.67, 1.48]
St	6	38.52	34.81	35.56	32.59	35.56	[28.89, 5.93]
	9	38.52	37.04	34.07	37.04	38.52	[37.78, 5.93]
	2	49.43	48.79	48.92	49.05	50.44	[45.87, 3.43]
RW	5	54.76	50.19	52.22	52.10	52.22	[48.28, 5.72]
	8	57.18	55.40	58.20	55.78	53.49	[48.79, 6.48]
	2	54.47	53.53	53.86	54.02	52.87	[50.49, 3.53]
WW	5	58.45	57.59	57.05	58.24	59.52	[50.49, 8.78]
	8	63.62	59.31	59.23	59.35	64.19	[50.21, 10.38]
	4	27.45	15.06	27.71	15.32	23.12	[12.73, 1.56]
Seg	7	44.24	20.52	44.59	20.09	28.92	[14.55, 3.64]
	10	56.45	20.78	56.62	21.13	41.65	[14.11, 7.97]
	10	26.10	24.40	29.88	24.58	27.58	[23.31, 1.00]
Sp	20	38.41	26.05	32.97	28.06	29.32	[23.92, 2.17]
	30	39.02	27.23	39.58	32.84	31.75	[25.27, 4.83]
	2	49.46	51.76	50.14	49.05	49.32	[47.43, 2.98]
Co	4	51.22	53.25	52.30	52.30	54.20	[47.56, 4.34]
	6	58.27	59.35	60.03	60.16	56.10	[52.72, 6.37]
Ave		46.05	36.95	42.38	38.27	39.50	[32.94, 4.33]

The results in Table 3 clearly show that the proposed method generally produces a lower error rate than the MI, KNNI, FCMI, LLA and FCMCI methods under different numbers of missing features. The proposed method introduces meta-classes in the framework of evidence theory to characterize the uncertainty caused by missing values, so that instances that are difficult to classify into a specific class are assigned to the appropriate meta-classes. Note that increasing the number of missing values (i.e., τ) in the test instances generally increases both the error rate and the imprecision rate of the proposed method, since the more values are missing, the greater the uncertainty of classification.

Figure 3 shows the influence of the K value in K-NN on the classification results of the different methods. The x-axis corresponds to the K value, ranging from 7 to 17, and the y-axis corresponds to the average classification results (error rate and imprecision rate), expressed in [0, 1]. In each of the nine datasets shown in Fig. 3, different feature dimensions are randomly missing. One can observe that the error rate of the proposed method is much lower than that of the other methods. The classification results of the proposed method change little with K, which shows that its classification performance is insensitive to the setting of K. The proposed method is therefore robust to the choice of K, and one can take K from 7 to 17 in practical applications.

Fig. 3

Classification results on UCI data set via different K values.
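A sensitivity check of this kind amounts to repeating the evaluation for each K and comparing the resulting error rates. The sketch below illustrates the loop with a plain majority-vote K-NN on complete, synthetic two-class data. It is not the proposed method and does not use the UCI datasets; the toy data generator, the function names and the step of 1 between K = 7 and K = 17 are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def make_toy_data(
    rng: random.Random, n_per_class: int
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical two-class Gaussian blobs, used only to exercise the sweep."""
    points: List[List[float]] = []
    labels: List[int] = []
    for label, center in enumerate((0.0, 3.0)):
        for _ in range(n_per_class):
            points.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            labels.append(label)
    return points, labels


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    query: Sequence[float],
    k: int,
) -> int:
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate_by_k(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    test_x: Sequence[Sequence[float]],
    test_y: Sequence[int],
    k_values: Iterable[int],
) -> Dict[int, float]:
    """Error rate in [0, 1] for every K in `k_values`."""
    rates: Dict[int, float] = {}
    for k in k_values:
        wrong = sum(
            knn_predict(train_x, train_y, q, k) != y for q, y in zip(test_x, test_y)
        )
        rates[k] = wrong / len(test_y)
    return rates


rng = random.Random(0)
train_x, train_y = make_toy_data(rng, 60)
test_x, test_y = make_toy_data(rng, 30)
rates = error_rate_by_k(train_x, train_y, test_x, test_y, range(7, 18))
# A small spread between the best and worst K suggests low sensitivity to K.
print(rates, max(rates.values()) - min(rates.values()))
```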

To further analyze the performance of the proposed method, Fig. 4 displays the mean values of P, R, FM, and RI of the proposed method against those of KNNI, LLA and FCMCI. Here, we use the pignistic transformation to convert the credal partition generated by the proposed method into a crisp partition. Figure 4(a)–(c) compares the P values obtained by the proposed method with those obtained by KNNI, LLA and FCMCI. One can find that the proposed method outperforms the other methods on the nine real-world datasets. Similarly, Fig. 4(d)–(f) and (g)–(i) display the R and FM values of the proposed method against the others, and it can be seen that the proposed method is better than KNNI, LLA and FCMCI. Finally, it can be seen from Fig. 4(j)–(l) that the RI values of the proposed method are better than those of KNNI, LLA and FCMCI in most cases.

Fig. 4

The mean P, R, FM, and RI values via different methods: (a), (b), (c) P via Proposed method vs. P via KNNI, LLA, FCMCI. (d), (e), (f) R via Proposed method vs. R via KNNI, LLA, FCMCI. (g), (h), (i) FM via Proposed method vs. FM via KNNI, LLA, FCMCI. (j), (k), (l) RI via Proposed method vs. RI via KNNI, LLA, FCMCI.
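The pignistic transformation used to obtain the crisp partition spreads the mass of each focal set equally over its elements. A minimal sketch is given below, assuming that a mass function is stored as a dictionary from frozensets of class labels to masses, and that the crisp decision is taken as the class with the largest pignistic probability (a common choice, stated here as an assumption). The function names and the example masses are made up for illustration.

```python
from typing import Dict, FrozenSet, Hashable


def pignistic(masses: Dict[FrozenSet[Hashable], float]) -> Dict[Hashable, float]:
    """Pignistic probability BetP(w) = sum over focal sets A containing w
    of m(A) / |A|, normalized by 1 - m(empty set)."""
    empty_mass = masses.get(frozenset(), 0.0)
    normalizer = 1.0 - empty_mass
    if normalizer <= 0.0:
        raise ValueError("all mass is on the empty set; BetP is undefined")

    betp: Dict[Hashable, float] = {}
    for focal_set, mass in masses.items():
        if not focal_set:
            continue  # the empty set only contributes to the normalizer
        share = mass / (len(focal_set) * normalizer)
        for label in focal_set:
            betp[label] = betp.get(label, 0.0) + share
    return betp


def crisp_decision(masses: Dict[FrozenSet[Hashable], float]) -> Hashable:
    """Hard decision: the class with the largest pignistic probability."""
    betp = pignistic(masses)
    return max(betp, key=betp.get)


# Made-up mass function with one meta-class {w1, w2}.
m = {
    frozenset({"w1"}): 0.5,
    frozenset({"w2"}): 0.2,
    frozenset({"w1", "w2"}): 0.3,
}
print(pignistic(m), crisp_decision(m))
```

In this example the meta-class mass 0.3 is shared equally, so w1 receives 0.5 + 0.15 and w2 receives 0.2 + 0.15, and the instance is assigned to w1 in the crisp partition.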

4.4 Experiment 4

In this experiment, we again use the nine real datasets to test the performance of the proposed method against the other methods. EK-NN, Adaboost and DT are selected here as basic classifiers. The average error rate Re and imprecision rate Ri (for the proposed method) of the different methods with each basic classifier (i.e., EK-NN, Adaboost, DT) are given in Table 4.

Table 4
Classification results of different classifiers (In %)

Data	τ	Classifier	MI	KNNI	FCMI	LLA	FCMCI	Proposed method
			Re	Re	Re	Re	Re	[Re, Ri]
		EK-NN	25.71	8.57	12.38	12.38	10.48	[7.62, 0]
Se	3	Adaboost	27.62	6.67	9.52	10.48	8.57	[7.62, 9.52]
		DT	18.10	5.71	8.57	7.62	6.67	[5.71, 6.67]
		EK-NN	46.67	38.89	41.11	42.22	35.56	[28.89, 4.44]
Wi	6	Adaboost	24.44	22.22	20.00	18.89	17.78	[11.11, 4.44]
		DT	24.44	16.67	23.33	18.89	20.00	[15.56, 4.44]
		EK-NN	58.96	45.75	54.72	46.46	47.17	[41.75, 3.54]
Ve	8	Adaboost	64.86	59.91	59.20	60.61	63.68	[52.35, 8.73]
		DT	53.07	39.62	51.18	41.27	46.23	[34.67, 4.95]
		EK-NN	39.26	34.81	36.30	33.33	34.07	[29.63, 5.93]
St	6	Adaboost	29.63	25.93	30.37	25.19	30.37	[22.96, 2.96]
		DT	29.63	28.89	29.63	28.15	25.19	[23.70, 9.63]
		EK-NN	55.91	51.33	53.24	53.11	52.73	[47.90, 6.73]
RW	5	Adaboost	50.32	47.27	49.05	47.78	48.16	[43.84, 6.23]
		DT	50.32	51.08	48.28	50.19	47.65	[44.47, 5.72]
		EK-NN	58.45	57.42	57.18	58.37	59.76	[51.97, 6.11]
WW	5	Adaboost	54.68	54.51	54.68	54.80	54.59	[52.99, 2.71]
		DT	55.05	55.41	55.58	53.57	56.19	[51.03, 4.51]
		EK-NN	56.71	21.30	56.80	21.21	37.75	[13.85, 7.71]
Seg	7	Adaboost	60.26	22.08	60.26	22.60	44.33	[15.73, 6.32]
		DT	54.37	17.84	54.37	18.70	39.05	[8.23, 6.32]
		EK-NN	38.32	25.92	32.93	28.01	29.06	[21.66, 6.96]
Sp	20	Adaboost	20.70	12.66	20.88	14.22	20.36	[8.83, 5.57]
		DT	28.75	18.88	31.06	19.75	30.88	[10.44, 9.96]
		EK-NN	51.63	51.63	52.85	53.39	53.79	[46.88, 4.20]
Co	4	Adaboost	54.07	54.07	54.20	52.98	55.01	[51.36, 4.47]
		DT	54.47	54.61	55.28	54.34	53.66	[47.98, 4.61]
Ave			43.94	34.43	41.22	35.13	38.10	[29.58, 5.68]

In Table 4, one can see that the error rates of the proposed method with EK-NN, Adaboost and DT are generally smaller than those of the other methods. Meanwhile, some incomplete instances that are very difficult to classify into a specific class are assigned to meta-classes. The meta-classes are thus useful for characterizing the imprecision caused by missing values, and they help to reduce the classification error rate. The proposed method also adapts well to the three basic classifiers, EK-NN, Adaboost and DT; that is, it is robust and can be applied with various basic classifiers. However, we find that the DT classifier consumes less time than EK-NN and Adaboost when the number of instances is large, since EK-NN and Adaboost bring a heavy computational burden in this case.

Figure 5 displays the error rates of various methods with different classifiers (i.e., EK-NN, Adaboost, DT) in the case of hard partitioning, where the x-axis denotes different classifiers, and the y-axis represents the error rate. One can intuitively find that the proposed method has better classification performance than other methods in most cases.

Fig. 5

Classification error rates on UCI data set via different classifiers.

5 Conclusion

A new method based on evidence theory is proposed to classify incomplete instances, which can effectively address the classification problem with missing values. In the proposed method, we first propose a new K-nearest centroid neighborhood imputation (KNCNI) method, which considers the weights of different features. For query (incomplete) instances, a stricter imputation strategy based on Shannon entropy and KNCNs is proposed to adaptively estimate the missing values. Then, we select a standard classifier to process the completed instances, and use the labeled instances in the training set to estimate and classify each instance. Finally, an adaptive method is designed according to the distance between the discount factor and the final decision. In this way, patterns that are difficult to classify correctly into specific classes are automatically assigned to reasonable meta-classes, which effectively reduces the error rate and reveals the imprecision and uncertainty of classification. Four experiments with artificial and real datasets have been carried out to evaluate the performance of the proposed method with respect to other classical and state-of-the-art methods. The results show that the proposed method is able to reduce misclassification rates and to capture and represent the imprecision of classification caused by missing values. In the future, we intend to extend this method to the classification of incomplete multi-view data.

Footnotes

In practical application, if there are incomplete instances in the training set, complete instances of the same class can be used to estimate the missing values. In this paper, we focus on the classification of incomplete instance in the test set, so we assume that the training set is complete.

For the proposed method, we can employ pignistic transformation [] to make the credal partition into fuzzy or hard partition.

Acknowledgments

This work has been partially supported by the National Key Research and Development Program of China (2019YFC1907105), the Key Research and Development Project of Shaanxi Province (No. 2020GY-186, No. 2020SF-367).

References

1. Jordan M.I. and Mitchell T.M., Machine learning: Trends, perspectives, and prospects, Science 349(6245) (2015), 255–260.
2. Schoot, Depaoli, King, Kramer, Martens, Tadesse M.G., Vannucci, Gelman, Veen, Willemsen, Yau, Bayesian statistics and modelling, Nature Reviews Methods Primers, (2021). doi: 10.1007/978-1-4614-1800-9-17.
3. Rocher, Hendrickx J.M. and Montjoye, Estimating the success of re-identifications in incomplete datasets using generative models, Nature Communications 10(3069) (2019). doi: 10.1038/s41467-019-10933-3.
4. Chang, Deng, Jiang and Long, Multiple imputation for analysis of incomplete data in distributed health data networks, Nature Communications 11(1) (2020). doi: 10.1038/s41467-020-19270-2.
5. García-Laencina, Sancho-Gómez and Figueiras-Vidal, Pattern classification with missing data: a review, Neural Computing and Applications 19(2) (2010), 263–282.
6. Little R.J., Rubin D.B., Statistical analysis with missing data, Hoboken, NJ, USA: Wiley, (2014).
7. Mundfrom D.J. and Whitcomb, Imputing missing values: the effect on the accuracy of classification, MLRV 25 (1998), 13–19.
8. Batista, Monard M.C., A study of k-nearest neighbour as an imputation method, International Conference on Hybrid Intelligent Systems, (2002), 251–260.
9. Cheng C.H., Chan C.P. and Sheu Y.J., A novel purity-based k nearest neighbors imputation method and its application in financial distress prediction, Engineering Applications of Artificial Intelligence 81(5) (2019), 283–299.
10. Luengo, Saez J.A. and Herrera, Missing data imputation for fuzzy rule-based classification systems, Soft Computing 16(5) (2012), 83–881.
11. Aydilek I.B. and Arslan, A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm, Information Sciences 233 (2013), 25–35.
12. Folguera, Zupan, Cicerone and Magallanes J.F., Self-organizing maps for imputation of missing data in incomplete data matrices, Chemometrics and Intelligent Laboratory System 143 (2015), 146–151.
13. Dai J.H., Hu Q.H., Huang, Zheng N.G. and Liu, Locally linear approximation approach for incomplete data, IEEE Transactions on Cybernetics 48 (2018), 1720–1732.
14. Huang G.B., Zhou, Ding and Zhang, Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems Man Cybernetics: Systems 42(2) (2012), 513–529.
15. Raja P.S. and Thangavel, Missing value imputation using unsupervised machine learning techniques, Soft Computing 24(3) (2020), 4361–4392.
16. Liu and Brown S.D., Comparison of five iterative imputation methods for multivariate classification, Chemometrics and Intelligent Laboratory System 120 (2013), 106–115.
17. Bodt, Mulders, Verleysen and Lee J.A., Nonlinear dimensionality reduction with missing data using parametric multiple imputations, IEEE Transactions on Neural Networks and Learning Systems 30 (2019), 1166–1179.
18. Liu, Incomplete big data imputation mining algorithm based on BP neural network, Journal of Intelligent & Fuzzy Systems 37(13) (2019), 1–10.
19. Shafer, A Mathematical Theory of Evidence, Princeton, NJ, USA: Princeton Univ. Press, (1976).
20. Zheng, Yang J.B., Xu D.L. and Chen Y.W., Data classification using evidence reasoning rule, Knowledge-Based Systems 116 (2017), 144–151.
21. Zhang, Liu, Chao, Zhang Z.J. and Zhang Z.Y., Classification of incomplete data based on evidence theory and an extreme learning machine in wireless sensor networks, Sensors 18(4) (2018), 1046–1061.
22. Z.F., Tian H.P., Liu Z.C. and Zhang Z.W., A new incomplete pattern belief classification method with multiple estimations based on KNN, Applied Soft Computing 90(4) (2020). doi: 10.1016/j.asoc.2020.106175.
23. Z.F., Liu, Zhang Y.R., Song and He J.H., Credal transfer learning with multi-estimation for missing data, IEEE Access 8 (2020), 70316–70328.
24. Z.G. and Denœux, BPEC: Belief-peaks evidential clustering, IEEE Transactions on Fuzzy Systems 27(1) (2018), 111–123.
25. Masson M.H. and Denœux, ECM: An evidential version of the fuzzy c-means algorithm, Pattern Recognition 41(4) (2008), 1384–1397.
26. Zhang Z.W., Liu, Martin, Liu Z.G., Zhou, Dynamic evidential clustering algorithm, Knowledge-Based Systems, (2021). doi: 10.1016/j.knosys.2020.106643.
27. Zhang Z.W., Liu, Ma Z.F., Zhang Y.R., Wang, A new belief-based incomplete pattern unsupervised classification method, IEEE Transactions on Knowledge and Data Engineering, (2021). doi: 10.1109/TKDE.2021.3049511.
28. Lin, Li, Yin and Dou, Multisensor fault diagnosis modeling based on the evidence theory, IEEE Transactions on Reliability 67 (2018), 513–521.
29. Wang, Cao and Ding, A genetic-algorithm support vector machine and D-S evidence theory based fault diagnostic model for transmission line, IEEE Transactions on Power Systems 34(99) (2019), 4186–4194.
30. Zhu Y.G., Duan H.Y., Wang X.H., Zhou B.K., Wang G.D. and Grosu, Gaussian convex evidence theory for ordered and fuzzy evidence fusion, Journal of Intelligent & Fuzzy Systems 33(5) (2017), 2843–2849.
31. Floria, Leon and Logofatu, A model of information diffusion in dynamic social networks based on evidence theory, Journal of Intelligent & Fuzzy Systems 37(6) (2019), 7369–7381.
32. Xiao F.Y., Generalized belief function in complex evidence theory, Journal of Intelligent & Fuzzy Systems 38(4) (2020), 3665–3673.
33. Sánchez J.S., Pla and Ferri, On the use of neighbourhood-based non-parametric classifiers, Pattern Recognition Letters 18(11) (1997), 1179–1186.
34. Yin and Deng, Toward uncertainty of weighted networks: An entropy-based model, Physica A: Statistical Mechanics and its Applications 508 (2018), 176–186.
35. Mathur and Foody G.M., Multiclass and binary SVM classification: Implications for training and classification users, IEEE Geoscience and Remote Sensing Letters 5(2) (2008), 241–245.
36. Zhang, Li and Zong, Efficient kNN classification with different numbers of nearest neighbors, IEEE Transactions on Neural Networks and Learning Systems 29 (2018), 1774–1785.
37. Zouhal L.M. and Denœux, An evidence-theoretic k-NN rule with parameter optimization, IEEE Transactions on Systems Man and Cybernetics Part C Applications & Reviews 28(2) (1998), 263–271.
38. Ghahramani, Probabilistic machine learning and artificial intelligence, Nature 521 (2015), 452–459.
39. Yager R.R., On the Dempster-Shafer framework and new combination rules, Information Sciences 41(2) (1987), 93–137.
40. Dubois and Prade, Representation and combination of uncertainty with belief functions and possibility measures, Computational Intelligence 4(3) (1988), 244–264.
41. Freund and Schapire R.E., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1997), 119–139.
42. Safavian S.R. and Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems Man and Cybernetics 21 (2002), 660–674.
43. Yang, An evaluation of statistical approaches to text categorization, Information Retrieval 1 (1999), 67–88.
44. Rand W.M., Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66 (1971), 846–850.
