A multi-average based pseudo nearest neighbor classifier

Abstract

The conventional k nearest neighbor (KNN) rule is a simple yet effective classification method, but its performance degrades easily when the training set is small and contains outliers. To address this issue, a multi-average based pseudo nearest neighbor (MAPNN) rule is proposed. In the MAPNN rule, $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors are obtained for each class by averaging pairs of points taken from the k nearest neighbors in that class, and k pseudo nearest neighbors are then chosen from these $k (k - 1) / 2$ local mean vectors of every class to determine the category of a query point. The selected k pseudo nearest neighbors reduce the negative impact of outliers to some degree. Extensive experiments are carried out on twenty-one real numerical data sets and four artificial data sets, comparing MAPNN with five other KNN-based methods. The experimental results demonstrate that MAPNN is effective for classification and, in small-sample cases, achieves better classification results than the five related KNN-based classifiers.

Keywords

Multi-average small size training samples

1. Introduction

The k-nearest neighbor (KNN) rule is a simple yet effective classification algorithm. It was first proposed for classification in [3]. Under suitable conditions, its asymptotic classification performance approaches that of the Bayes method [16]. According to this algorithm, a query sample is assigned to the class represented by the majority of its k nearest neighbors in the training set. KNN is one of the top ten algorithms in data mining [23] and has therefore attracted wide attention. Many KNN-based classification methods, such as [16,18,22,24], have been reported.

As a nonparametric classifier, KNN is easily affected by outliers, especially when the training set is small [26]. To deal with this problem, several methods have been proposed to improve classification performance. A local mean-based nonparametric classifier (LMKNN) was proposed in [19]; its classification accuracy is better than that of ordinary KNN for small samples with outliers. In the pseudo nearest neighbor (PNN) rule [25], which is based on the distance-weighted k-nearest neighbor (WKNN) rule [6], the category of a query point is determined by the distance-weighted sum of the k nearest points chosen from each class. Its classification performance is better than that of KNN in the small-sample case with outliers. As an extension of both PNN and LMKNN, a local mean-based pseudo nearest neighbor classifier (LMPNN) was proposed in [14]. In the LMPNN rule, the class label of a query point is determined by the weighted distance sum between the k pseudo nearest neighbors and the query point. Its classification accuracy is generally higher than that of KNN, LMKNN, and PNN. Building on the ideas of LMKNN, PNN, and LMPNN, further methods have been proposed [9–13,15,17,21,22,27] to improve classification performance.

Building on the KNN-based classification methods above, a new multi-average pseudo nearest neighbor classifier (MAPNN) is introduced in this paper. MAPNN calculates $k (k - 1) / 2$ local mean vectors from the k nearest neighbors in each class. The test sample is then classified according to the distances between it and the first k pseudo local nearest vectors. Unlike LMPNN, MAPNN selects the k pseudo nearest neighbors from the $k (k - 1) / 2$ local mean vectors in each class and uses them to calculate the weighted distance sum to the query point. The selected k pseudo local nearest neighbors can further reduce the negative impact of outliers. Compared with KNN, WKNN, LMKNN, PNN, and LMPNN, MAPNN can further improve classification performance. The effectiveness and superiority of the proposed method are verified by experiments on real numerical data sets and synthetic data sets.

The key work in this paper is summarized as follows:

1. A new multi-average pseudo nearest neighbor classifier is proposed. In this method, $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors are obtained by averaging pairs of points taken from the k nearest neighbors of the query sample in each class. Then, in the MAPNN rule, the first k pseudo local nearest neighbors are selected from the $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors of each class. The selected k pseudo nearest neighbors are less sensitive to outliers. Hence, compared to conventional KNN, MAPNN can to some degree improve classification performance for small samples with outliers.

2. Experiments were conducted to verify the effectiveness and superiority of MAPNN.

The rest of this article is organized as follows. Section 2 briefly summarizes the related work. Section 3 introduces the proposed MAPNN method. Section 4 describes the differences between MAPNN and LMPNN. Section 5 presents extensive experiments on real numerical data sets and synthetic data sets. Finally, conclusions are given in Section 6.

2. Related work

In this section, we briefly review some related typical KNN-based classifiers.

2.1. LMKNN classifier

As a classical extension of KNN, LMKNN [19] is a simple yet effective classification algorithm. It uses averaging to reduce the influence of outliers. It is less sensitive to k than KNN, especially for small training sets with outliers.

In the LMKNN rule, a given query sample $y \in R^{D}$ in a D dimensional feature space is classified into class $ω_{i}$ by the following steps:

Step1. Calculate the Euclidean distances between the query sample y and the training samples of each class $ω_{i}$ . Then, the k nearest neighbors $x_{1}^{i}, x_{2}^{i}, \dots, x_{k}^{i}$ , where i is the class index, are selected in ascending order of the calculated distances.

Step2. A categorical local mean vector $y_{i}$ for every class is calculated as: $\begin{matrix} (1) & y_{i} = \frac{1}{k} \sum_{j = 1}^{k} x_{j}^{i} . \end{matrix}$

Step3. Calculate the distance between the local mean vector $y_{i}$ and the query point y for each class $ω_{i}$ . The query sample y is then assigned to the class whose local mean vector has the minimum distance to y among all classes.
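The three LMKNN steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names (`euclidean`, `mean_vector`, `lmknn_predict`), the dictionary layout of the training data, and the toy points are hypothetical and do not come from [19].

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def lmknn_predict(train_by_class, y, k):
    """train_by_class maps a class label to a list of feature vectors (assumed layout)."""
    best_label, best_dist = None, math.inf
    for label, samples in train_by_class.items():
        # Step 1: the k nearest neighbors of y within this class.
        neighbors = sorted(samples, key=lambda x: euclidean(y, x))[:k]
        # Step 2: the local mean vector of Eq. (1).
        local_mean = mean_vector(neighbors)
        # Step 3: keep the class whose local mean is closest to y.
        dist = euclidean(y, local_mean)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical toy data with two classes in two dimensions.
train = {
    "a": [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]],
    "b": [[2.0, 2.0], [2.2, 1.9], [1.8, 2.1]],
}
print(lmknn_predict(train, [0.3, 0.3], k=2))
```

If a class holds fewer than k samples, the slice simply uses all of them; how such cases should be handled is a design choice that the description above leaves open.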

2.2. PNN classifier

As an improvement of KNN, the pseudo nearest neighbor (PNN) classifier shows good performance. The algorithm is described as follows:

1. Calculate the Euclidean distance between a query sample y and each training sample $x_{j}^{i}$ of each class: $\begin{matrix} (2) & d (y, x_{j}^{i}) = \sqrt{{(y - x_{j}^{i})}^{T} (y - x_{j}^{i})} \end{matrix}$ where $x_{j}^{i}$ represents the jth sample of class $ω_{i}$ , $i = 1, 2, \dots, M$ , and M is the number of classes. Then, the k nearest neighbors $\overline{x_{1}^{i}}$ , $\overline{x_{2}^{i}}, \dots, \overline{x_{k}^{i}}$ for class $ω_{i}$ are obtained by sorting $d (y, x_{j}^{i})$ in ascending order.

2. Let $y_{i}^{pnn}$ denote the pseudo nearest neighbor from class $ω_{i}$ . The distance $d (y_{i}^{pnn}, y)$ is obtained by $\begin{matrix} (3) & d (y_{i}^{pnn}, y) = w_{1} d (y, \overline{x_{1}^{i}}) + w_{2} d (y, \overline{x_{2}^{i}}) + \dots + w_{k} d (y, \overline{x_{k}^{i}}) \end{matrix}$ where the weights are defined as $w_{j} = 1 / j$ , $j = 1, 2, \dots, k$ , that is, $1, 1 / 2, \dots, 1 / k$ , according to the PNN rule.

3. Finally, the query sample y is classified into the class with the minimum distance of $d (y_{i}^{pnn}, y)$ among all classes.
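As a rough illustration of the PNN steps, the following Python sketch computes the weighted distance of Eq. (3) with weights $1 / j$ and picks the class that minimizes it. The function names and the data layout are hypothetical choices made for this example only.

```python
import math

def euclidean(a, b):
    """Euclidean distance of Eq. (2)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def pnn_distance(samples, y, k):
    """Weighted distance of Eq. (3) for one class, using weights w_j = 1/j."""
    # The k smallest distances, already in ascending order.
    dists = sorted(euclidean(y, x) for x in samples)[:k]
    return sum(d / j for j, d in enumerate(dists, start=1))

def pnn_predict(train_by_class, y, k):
    """Return the label whose pseudo nearest neighbor distance is smallest."""
    return min(train_by_class,
               key=lambda label: pnn_distance(train_by_class[label], y, k))
```

For example, with class samples at distances 0, 5 and 10 from the query and k = 3, the pseudo distance is 0 + 5/2 + 10/3.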

2.3. LMPNN classifier

Multiple local mean-based pseudo nearest neighbors are utilized to predict the query sample pattern in LMPNN rule. As an extension of the PNN, LMPNN can further improve classification performance in the case of small size training samples with outliers. In the LMPNN rule, the category of an unclassified sample $y \in R^{D}$ is assigned by the following steps:

Step1. Compute the distances between training samples in each class i and point y. $\begin{matrix} (4) & d (y, x_{j}^{i}) = \sqrt{{(y - x_{j}^{i})}^{T} (y - x_{j}^{i})} \end{matrix}$ where $x_{j}^{i}$ represents the jth training sample in class i. Then, the k nearest neighbors ( $x_{1}^{i}, x_{2}^{i}, \dots, x_{k}^{i}$ ) of y for each class can be found.

Step2. Calculate k pseudo local mean vectors for each class by the following formula: $\begin{matrix} (5) & \overline{y_{j}^{i}} = \frac{1}{j} \sum_{f = 1}^{j} x_{f}^{i}, j = 1, 2, \dots, k \end{matrix}$ where $\overline{y_{j}^{i}}$ denotes the jth pseudo local mean vector of class i.

Step3. Let $\overline{y_{i}^{PNN}}$ represent the categorical pseudo nearest neighbor of y from class i. The weighted distance sum between the local mean-based pseudo nearest neighbor $\overline{y_{i}^{PNN}}$ and the unclassified point y is computed for each class as: $\begin{matrix} (6) & d (y, \overline{y_{i}^{PNN}}) = (d (y, \overline{y_{1}^{i}})) + (\frac{1}{2} d (y, \overline{y_{2}^{i}})) + \dots + (\frac{1}{k} d (y, \overline{y_{k}^{i}})) . \end{matrix}$

Step4. The query point y is assigned to the class w for which the distance $d (y, \overline{y_{w}^{PNN}})$ is minimum among all classes. It should be noted that LMPNN, PNN, and LMKNN are equivalent when $k = 1$ .
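The LMPNN steps above can be sketched as follows. The code is an illustrative reading of Eqs. (4)–(6), not the implementation of [14]; all function names and the dictionary layout are assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance of Eq. (4)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def lmpnn_distance(samples, y, k):
    """Weighted distance sum of Eq. (6) for one class."""
    neighbors = sorted(samples, key=lambda x: euclidean(y, x))[:k]
    running_sum = [0.0] * len(y)
    total = 0.0
    for j, x in enumerate(neighbors, start=1):
        # Eq. (5): the j-th local mean is the mean of the first j neighbors.
        running_sum = [s + v for s, v in zip(running_sum, x)]
        local_mean = [s / j for s in running_sum]
        # Eq. (6): the distance to the j-th local mean is weighted by 1/j.
        total += euclidean(y, local_mean) / j
    return total

def lmpnn_predict(train_by_class, y, k):
    """Return the label with the smallest weighted distance sum."""
    return min(train_by_class,
               key=lambda label: lmpnn_distance(train_by_class[label], y, k))
```

With k = 1 the single local mean is the nearest neighbor itself, so the result coincides with the nearest-neighbor distance, in line with the equivalence noted in Step4.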

3. The proposed MAPNN classifier

In this section, we introduce the multi-average pseudo nearest neighbor (MAPNN) rule, which builds on the ideas of the KNN-based rules above. The purpose of the MAPNN rule is to improve KNN-based classification performance for small training sets with outliers.

3.1. The basic idea

In the KNN rule, the classification accuracy is easily affected by outliers, especially for small training sets. The choice of k has a great impact on KNN classification. If k is too small, the nearest neighbors may be outliers, which degrades classification performance [12]. Conversely, with a large k the classification accuracy may be degraded because the k nearest neighbors contain too many points from other classes [12]. Furthermore, farther neighbors are given less importance in traditional KNN methods [25], although they may contribute more to classification [14].

PNN, LMKNN, and LMPNN have been reported to improve classification performance for small training sets with outliers. PNN assigns larger weight coefficients to nearer neighbors, which reduces the negative impact of outliers to a certain extent. However, its classification performance can still be degraded by outliers when k is very large. LMKNN can overcome the negative influence of outliers to some degree. However, using the same neighborhood size for every class and the same weight for all k nearest neighbors may affect its classification performance [12,22]. LMPNN uses multiple pseudo local mean vectors based on the k nearest neighbors of each class to classify a query point. It captures more class information than LMKNN [14]; hence, its classification performance is better than that of LMKNN.

Building on the KNN-based algorithms above, a new multi-average based pseudo nearest neighbor classifier is proposed to further reduce the negative influence of outliers for small training sets.

3.2. The MAPNN classification rule

Let $T = {(y_{j}, c_{i})}_{j = 1}^{N}$ be a training set with N training samples for M classes, where $i = 1, 2, \dots, M$ , $y_{j} \in R^{d}$ , d is the feature dimension. Suppose $T_{i} = {(y_{j}^{i}, c_{i})}_{j = 1}^{N_{i}}$ denotes the training sample set of class $c_{i}$ , where $N_{i}$ is the number of the training samples in class $c_{i}$ . In the proposed MAPNN rule, the class label of a query point y can be obtained by the following steps:

1. Compute the distances between the training samples in each class $c_{i}$ and y: $\begin{matrix} (7) & d (y, y_{j}^{i}) = \sqrt{{(y - y_{j}^{i})}^{T} (y - y_{j}^{i})} \end{matrix}$ where $y_{j}^{i}$ represents the jth sample in the class $c_{i}$ . Then, the k nearest neighbors ( $\overline{y_{1}^{i}}$ , $\overline{y_{2}^{i}}, \dots, \overline{y_{k}^{i}}$ ) of y from each class are selected by sorting $d (y, y_{j}^{i})$ in ascending order.

2. Obtain $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors ( $m_{q}^{i}$ , $q = 1, 2, \dots, k (k - 1) / 2$ ) for each class by averaging pairs of points taken from the k nearest neighbors of that class. Note that $m_{1}^{i} = \overline{y_{1}^{i}}$ when $k = 1$ .

3. Calculate the distances between each $m_{q}^{i}$ and the test sample y in each class. Sorting these distances in ascending order yields the first k pseudo nearest neighbors $\overline{m_{1}^{i}}, \overline{m_{2}^{i}}, \dots, \overline{m_{k}^{i}}$ together with their distances $d (y, \overline{m_{1}^{i}}), d (y, \overline{m_{2}^{i}}), \dots, d (y, \overline{m_{k}^{i}})$ .

4. Let $m_{i}^{PNN}$ denote the categorical pseudo nearest neighbor of y from class $c_{i}$ . The weighted distance sum between the local mean-based pseudo nearest neighbors $\overline{m_{j}^{i}}$ and the query point y is computed for each class as: $\begin{matrix} (8) & d (y, m_{i}^{PNN}) = \sum_{j = 1}^{k} \frac{1}{j} d (y, \overline{m_{j}^{i}}) . \end{matrix}$

5. Assign y to the class c with the smallest pseudo-distance: $\begin{matrix} (9) & c = \arg \min_{c_{i}} d (y, m_{i}^{PNN}) . \end{matrix}$ It should be noted that MAPNN, LMPNN, PNN, and LMKNN are equivalent when $k = 1$ .
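The five steps can be sketched in Python as below. This is a minimal sketch and not the authors' implementation. It assumes that the $k (k - 1) / 2$ local mean vectors are the midpoints of all unordered pairs of the k nearest neighbors, which is the reading suggested by the count $k (k - 1) / 2$; the function names and data layout are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance of Eq. (7)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mapnn_distance(samples, y, k):
    """Weighted distance sum of Eq. (8) for one class."""
    # Step 1: the k nearest neighbors of y within the class.
    neighbors = sorted(samples, key=lambda x: euclidean(y, x))[:k]
    if len(neighbors) < 2:
        # With k = 1 the only local mean is the nearest neighbor itself.
        candidates = neighbors
    else:
        # Step 2 (assumed reading): one midpoint per unordered pair of neighbors.
        candidates = [[(p + q) / 2 for p, q in zip(a, b)]
                      for a, b in combinations(neighbors, 2)]
    # Step 3: distances to the k closest local means, in ascending order.
    dists = sorted(euclidean(y, m) for m in candidates)[:k]
    # Step 4: weight the j-th smallest distance by 1/j.
    return sum(d / j for j, d in enumerate(dists, start=1))

def mapnn_predict(train_by_class, y, k):
    """Step 5: return the label with the smallest pseudo-distance."""
    return min(train_by_class,
               key=lambda label: mapnn_distance(train_by_class[label], y, k))
```

Note that for k = 2 only one midpoint exists, so fewer than k candidates are available and the slice simply uses what is there; from k = 3 onward at least k candidates exist.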

As described above, the proposed MAPNN method is summarized as shown in Algorithm 1.

Algorithm 1

Outline of the Proposed Algorithm

Fig. 1.

An example of a two-class classification problem with $k = 7$ .

Fig. 2.

The differences between LMPNN and MAPNN.

4. Difference between MAPNN and LMPNN

As described in Sections 2.1–2.3 and 3.2, LMKNN, PNN, LMPNN, and MAPNN are all local classifiers. LMKNN uses one local mean vector per class for classification. As an extension of LMKNN and PNN, the LMPNN rule uses k local mean vectors from each class, so it can represent more class information than the LMKNN and PNN rules [14]. MAPNN selects the k pseudo nearest neighbors from $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors in each class to determine the class of the query point. These k pseudo nearest vectors can further reduce the negative impact of outliers to some degree. Hence, MAPNN can further improve classification performance.

To compare MAPNN with LMPNN intuitively, a classification task on the real numerical data set ‘Hill’ is illustrated. For visualization, the dimensionality of the samples is reduced from 100 to 2 by the Fisher criterion [14,19]. The Fisher criterion $F (f)$ is defined as: $\begin{matrix} (10) & \begin{array}{r} F (f) = \frac{\sum_{j = 1}^{M} \sum_{i = j + 1}^{M} P (c_{j}) P (c_{i}) {(μ_{f j} - μ_{f i})}^{2}}{\sum_{i = 1}^{M} P (c_{i}) σ_{f i}^{2}} \end{array} \end{matrix}$ where $P (c_{i}) = n_{i} / \sum_{j = 1}^{M} n_{j}$ , $n_{i}$ is the number of training samples from class $c_{i}$ among the M classes, and $μ_{f i}$ and $σ_{f i}^{2}$ are the mean and variance of feature f for class $c_{i}$ , respectively. The features are then sorted in descending order of $F (f)$ and the first two features are selected for visualization.
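The feature ranking by Eq. (10) can be illustrated with a short Python sketch. The function names and the data layout are hypothetical, and the use of the population variance (dividing by $n_{i}$) is an assumption, since the text does not state which variance estimator is used.

```python
def fisher_score(values_by_class):
    """Eq. (10) for one feature; values_by_class maps a label to that feature's values."""
    total = sum(len(v) for v in values_by_class.values())
    stats = []
    for values in values_by_class.values():
        n = len(values)
        mean = sum(values) / n
        # Population variance (assumption: the estimator is not specified in the text).
        var = sum((v - mean) ** 2 for v in values) / n
        stats.append((n / total, mean, var))
    # Numerator: prior-weighted squared mean differences over all class pairs.
    between = sum(
        stats[j][0] * stats[i][0] * (stats[j][1] - stats[i][1]) ** 2
        for j in range(len(stats))
        for i in range(j + 1, len(stats))
    )
    # Denominator: prior-weighted within-class variance (must be non-zero).
    within = sum(prior * var for prior, _, var in stats)
    return between / within

def top_features(data_by_class, n_keep=2):
    """Indices of the n_keep features with the largest Fisher score."""
    n_features = len(next(iter(data_by_class.values()))[0])
    scores = [
        fisher_score({label: [row[f] for row in rows]
                      for label, rows in data_by_class.items()})
        for f in range(n_features)
    ]
    return sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:n_keep]
```

A feature whose values are constant within every class has zero within-class variance and would raise a division error in this sketch; a real implementation would need to treat that case explicitly.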

A two-class classification problem obtained through the Fisher criterion for a query pattern x with $k = 7$ is shown in Fig. 1. The black square denotes a query point x from class 2; green asterisks and blue crosses mark the training samples from class 1 and class 2, respectively. For visualization, the ranks of the nearest neighbors are marked by red and black numbers. The classification results for the query sample x are displayed in Fig. 2(a) and Fig. 2(b).

Table 1

Distance values between x of class 2 and seven pseudo nearest neighbors in class 2 by using LMPNN and MAPNN

Method	dist1	dist2	dist3	dist4	dist5	dist6	dist7
LMPNN	$6.0586 * 10^{- 5}$	$1.1077 * 10^{- 4}$	$1.1699 * 10^{- 4}$	$2.4291 * 10^{- 4}$	$5.9442 * 10^{- 4}$	$2.4527 * 10^{- 4}$	$2.5013 * 10^{- 4}$
MAPNN	$4.7795 * 10^{- 5}$	$4.9135 * 10^{- 5}$	$4.9381 * 10^{- 5}$	$5.8242 * 10^{- 5}$	$5.9442 * 10^{- 5}$	$6.8083 * 10^{- 5}$	$7.0208 * 10^{- 5}$

Table 2

The data sets used in the experiments

Data	Samples	Attributes	Classes	Source
Vehicle	846	18	4	UCI
Balance	625	4	3	UCI
Blood	748	4	2	UCI
Bupa	345	6	2	UCI
Ionosphere	351	34	2	UCI
Hill	1210	100	2	UCI
Musk-1	476	166	2	UCI
Climate	540	18	2	UCI
Sonar	208	60	2	UCI
Wine	178	13	3	UCI
Seeds	210	7	3	UCI
Cardiotocography	2126	21	10	UCI
Band	365	19	2	KEEL
Wine(keel)	178	13	3	KEEL
Thyroid	7200	21	3	UCI
Segment	2310	19	7	UCI
Arrhythmia	7400	20	2	UCI
Robot	5456	24	4	UCI
Opt	5620	64	10	UCI
Image	2310	19	7	UCI
Page	5473	10	5	UCI

In Fig. 2(a), ‘LMPNN_distance1’ and ‘LMPNN_distance2’ denote the distances between the query point x and the seven pseudo nearest neighbors for the two classes. As shown in Fig. 2(a), the query pattern x is wrongly assigned to class 1 by LMPNN. In the MAPNN rule, $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors are calculated for each class. Then, k pseudo local nearest neighbors are selected from these $k (k - 1) / 2$ ( $k > 1$ ) local mean vectors. The k pseudo nearest neighbors can further reduce the negative influence of outliers. In Fig. 2(b), ‘MAPNN_distance1’ and ‘MAPNN_distance2’ are the distances between the query point x and the pseudo nearest neighbors for the two classes. Fig. 2(b) shows that the classification result is correct.

Table 1 lists the distances between the query point x of class 2 and the seven selected nearest neighbors of class 2 under MAPNN and LMPNN, respectively. The distance values of class 2 in MAPNN are smaller than those in LMPNN. This implies that, compared to LMPNN, MAPNN can choose nearer pseudo nearest neighbors because it selects them from a larger set of local mean vectors. Hence, MAPNN may further reduce the impact of outliers to some degree.

5. Experiments

To validate the classification performance of the MAPNN rule, we conduct extensive experiments on real numerical data sets and synthetic data sets.

5.1. Data sets

In this subsection, we briefly describe the data sets used in our experiments.

5.1.1. Real data sets

Table 3

Synthetic data set: I

Parameter	Value
$u_{1}$	0
$\Sigma_{1}$	$I_{8}$
$u_{2}$	${[3.86, 3.10, 0.84, 0.84, 1.64, 1.08, 0.26, 0.01]}^{T}$
$\Sigma_{2}$	$\mathrm{diag} [8.41, 12.06, 0.12, 0.22, 1.49, 1.77, 0.35, 2.73]$
$u_{3}$	${[3.86, 3.10, 0.84, 0.84, 1.64, 1.08, 0.26, 0.01]}^{T}$
$\Sigma_{3}$	$I_{8}$
$u_{4}$	0
$\Sigma_{4}$	$\mathrm{diag} [8.41, 12.06, 0.12, 0.22, 1.49, 1.77, 0.35, 2.73]$

Table 4

Synthetic data set: II

Parameter	Value
$u_{1}$	0
$\Sigma_{1}$	$I_{8}$
$u_{2}$	0
$\Sigma_{2}$	$4 I_{8}$
$u_{3}$	0
$\Sigma_{3}$	$6 I_{8}$
$u_{4}$	0
$\Sigma_{4}$	$8 I_{8}$

Table 5

Synthetic data set: I-I

Parameter	Value
$u_{1}$	0
$\Sigma_{1}$	$I_{8}$
$u_{2}$	${[2.56, 0, 0, 0, 0, 0, 0, 0]}^{T}$
$\Sigma_{2}$	$I_{8}$

The twenty-one real numerical data sets are taken from the UCI Machine Learning Repository [2] and the KEEL Repository [1]: Vehicle, Balance, Blood, Bupa, Ionosphere, Hill, Musk-1, Climate, Sonar, Wine, Seeds, Cardiotocography, Band, Wine(keel), Thyroid, Segment, Arrhythmia, Robot, Opt, Image, and Page. The number of samples, attributes, and classes of each data set, together with its source, are listed in Table 2. Table 2 shows that the data sets differ in their numbers of attributes, samples, and classes. Most of the selected data sets contain a small number of samples, so they are well suited to verifying the classification performance of the proposed method in small-training-sample cases.

Table 6

Synthetic data set: Ness

Parameter	Value
$u_{1}$	0
$\Sigma_{1}$	$I_{8}$
$u_{2}$	${[1 / 2, 0, 0, 0, 0, 0, 0, 1 / 2]}^{T}$
$\Sigma_{2}$	$\mathrm{diag} [0.5, 0, 0, 0, 0, 0, 0, 0.5]$

Table 7

The average error rate of each classifier with the corresponding standard deviations (stds) on twenty one real-world datasets

Data	KNN	WKNN	LMKNN	PNN	LMPNN	MAPNN
Vehicle	0.2972 ± 0.0131	0.2854 ± 0.0068	0.2676 ± 0.0138	0.2848 ± 0.0055	0.2526 ± 0.0146	0.2437 ± 0.0179
Balance	0.1295 ± 0.0391	0.1722 ± 0.0216	0.1 ± 0.0237	0.1068 ± 0.0219	0.0946 ± 0.0284	0.0741 ± 0.0323
Blood	0.2366 ± 0.0329	0.2586 ± 0.0283	0.2390 ± 0.0199	0.2432 ± 0.0164	0.2472 ± 0.0138	0.2330 ± 0.0210
Bupa	0.3592 ± 0.0190	0.3655 ± 0.0109	0.3420 ± 0.0241	0.3527 ± 0.0140	0.3594 ± 0.0153	0.3434 ± 0.0169
Ionosphere	0.1575 ± 0.0117	0.1514 ± 0.0090	0.1039 ± 0.0132	0.1437 ± 0.0080	0.1047 ± 0.0122	0.0938 ± 0.0153
Hill	0.4694 ± 0.0245	0.4251 ± 0.0055	0.3925 ± 0.0271	0.4357 ± 0.0117	0.3339 ± 0.0251	0.2641 ± 0.0636
Musk-1	0.1927 ± 0.0320	0.1524 ± 0.0034	0.1276 ± 0.0113	0.1448 ± 0.0049	0.1061 ± 0.0122	0.1012 ± 0.0134
Climate	0.0824 ± 0.0115	0.0800 ± 0.0133	0.0812 ± 0.0076	0.0795 ± 0.0087	0.0787 ± 0.0089	0.0746 ± 0.0091
Sonar	0.2384 ± 0.0681	0.1463 ± 0.0063	0.1562 ± 0.0287	0.1399 ± 0.0042	0.1098 ± 0.0113	0.1105 ± 0.0100
Wine	0.0373 ± 0.0092	0.0415 ± 0.0057	0.0233 ± 0.0087	0.0359 ± 0.0082	0.0216 ± 0.0077	0.0224 ± 0.0079
Seeds	0.0735 ± 0.0072	0.0669 ± 0.0038	0.0716 ± 0.0041	0.0738 ± 0.0046	0.0690 ± 0.0046	0.0640 ± 0.0035
Cardiotocography	0.2488 ± 0.0202	0.2182 ± 0.0024	0.2109 ± 0.0046	0.2086 ± 0.0030	0.1910 ± 0.0065	0.1925 ± 0.0071
Band	0.3208 ± 0.0292	0.2632 ± 0.0058	0.2917 ± 0.0133	0.2590 ± 0.0104	0.2620 ± 0.0058	0.2652 ± 0.0090
Wine(keel)	0.0337 ± 0.0064	0.0373 ± 0.0054	0.0286 ± 0.0063	0.0320 ± 0.0071	0.0292 ± 0.0028	0.0272 ± 0.0051
Thyroid	0.0618 ± 0.0043	0.0597 ± 0.0060	0.0579 ± 0.0049	0.0594 ± 0.0035	0.0545 ± 0.0057	0.0546 ± 0.0057
Segment	0.0471 ± 0.0087	0.0339 ± 0.0024	0.0376 ± 0.0061	0.0340 ± 0.0038	0.0253 ± 0.0014	0.0259 ± 0.0007
Arrhythmia	0.3351 ± 0.0416	0.2927 ± 0.0210	0.1919 ± 0.0577	0.2971 ± 0.0206	0.1395 ± 0.0497	0.0558 ± 0.0476
Robot	0.1490 ± 0.0188	0.1158 ± 0.0016	0.1249 ± 0.0086	0.1165 ± 0.0029	0.1038 ± 0.0050	0.1055 ± 0.0037
Opt	0.0143 ± 0.0019	0.0118 ± 0.0005	0.0098 ± 0.0012	0.0111 ± 0.0005	0.0082 ± 0.0011	0.0085 ± 0.0011
Image	0.0469 ± 0.0092	0.0306 ± 0.0020	0.0369 ± 0.0065	0.0320 ± 0.0039	0.0241 ± 0.0008	0.0243 ± 0.0008
Page	0.0458 ± 0.0042	0.0394 ± 0.0008	0.0401 ± 0.0023	0.0406 ± 0.0024	0.0340 ± 0.0022	0.0338 ± 0.0025
Average	0.1704 ± 0.0197	0.1547 ± 0.0078	0.1398 ± 0.0140	0.1491 ± 0.0080	0.1262 ± 0.0112	0.1152 ± 0.0141

Fig. 3.

The classification error rates of each method versus k on each numerical data set.

Fig. 4.

The classification error rates of each method versus k on each numerical data set.

5.1.2. Synthetic data sets

In addition to the real data sets, Gaussian synthetic data sets are used in the experiments. The data sets I [25], II [25], I-I [7,25], and Ness [19,20] are adopted; all four consist of 8-dimensional Gaussian data. Their settings are shown in Tables 3–6, where $u_{i}$ is the mean vector and $\Sigma_{i}$ is the covariance matrix of the ith class. Each class of the four synthetic data sets contains 100 samples. The advantage of these four data sets is that the number of training and testing samples can be easily adjusted.
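Because every covariance matrix in Tables 3–6 is diagonal, such data can be drawn one coordinate at a time. The sketch below generates the I-I data set of Table 5 with 100 samples per class; the helper name `sample_diag_gaussian` and the fixed seed are arbitrary choices for illustration, and the actual sampling procedure used in the experiments is not described in the text.

```python
import math
import random

def sample_diag_gaussian(mean, variances, n, rng):
    """Draw n points from a Gaussian with the given mean and diagonal covariance."""
    return [
        [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, variances)]
        for _ in range(n)
    ]

rng = random.Random(0)  # fixed seed, chosen only to make the sketch reproducible
dim = 8

# Data set I-I (Table 5): both classes use the identity covariance matrix.
class1 = sample_diag_gaussian([0.0] * dim, [1.0] * dim, 100, rng)
class2 = sample_diag_gaussian([2.56] + [0.0] * (dim - 1), [1.0] * dim, 100, rng)

print(len(class1), len(class2), len(class1[0]))
```

Changing the third argument adjusts the number of generated samples per class, which is the flexibility mentioned above.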

5.2. Experiments on the real data sets

The experiments are carried out by 10-fold cross validation. The neighborhood parameter k is varied from 1 to 20 with a step of 1. Classification performance is measured by averaging the 10 classification error rates.

The average error rates of each classifier are compared in Table 7. Note that the best classification results for each data set are shown in bold. It can be observed that the proposed MAPNN method achieves better overall performance than the other five classifiers. The experimental results illustrate that: 1) MAPNN is effective for small-sample data sets such as Vehicle, Balance, Blood, Ionosphere, and Seeds; 2) overall, the MAPNN rule performs better than or comparably to the other five methods.

To further illustrate the effectiveness of the proposed MAPNN, the experimental comparisons on each real numerical data set at different values of k are displayed in Figs. 3 and 4. Figs. 3 and 4 show that, in most cases and at relatively large values of k, the classification error rate of MAPNN is lower than that of the other five classification methods, and that MAPNN is less sensitive to k.

The experimental results illustrate that the proposed MAPNN method is effective for classification.

However, comparing the six classification methods by error rate alone is not statistically convincing. Hence, a nonparametric statistical test, the Friedman test [4,5,8], is conducted to compare the experimental results of the proposed MAPNN method with those of the other five classifiers. The Friedman test ranks the classification results of the six classifiers in Table 7 on each data set. Let $R_{m}^{i}$ be the rank of the mth of n methods on the ith of t data sets. The average rank of the mth method is $R_{m} = \frac{1}{t} \sum_{i = 1}^{t} R_{m}^{i}$ . The Friedman statistic is defined in Equation (11): $\begin{matrix} (11) & χ_{F}^{2} = \frac{12 t}{n (n + 1)} [\sum_{m} R_{m}^{2} - \frac{n {(n + 1)}^{2}}{4}] . \end{matrix}$ When $t > 10$ and $n > 5$ , the Friedman statistic follows a $χ^{2}$ distribution with $n - 1$ degrees of freedom.

Table 8

The average ranks of six methods using Friedman test on twenty-one real-world datasets

Method	KNN	WKNN	LMKNN	PNN	LMPNN	MAPNN
Average rank $R_{i}$	5.523	4.428	3.523	3.904	2.047	1.571

Fig. 5.

The classification error rates of each method versus k on each synthetic data set.

The Friedman test is adopted to compare the classification performance of the proposed MAPNN with that of the other five classification methods. The average ranks of the six methods, computed from the classification results in Table 7, are listed in Table 8. According to Equation (11), $χ_{F}^{2} = 65.70$ is obtained from the average ranks. Under the null hypothesis that the six methods perform similarly, every method would have the average rank $\bar{R} = 3.499$ , and the critical value of the $χ^{2}$ distribution with $6 - 1$ degrees of freedom at $α = 0.05$ is ${(χ_{F}^{2})}_{0.05} = 11.07$ . As shown in Table 8, the average ranks $R_{i}$ of the six methods differ from $\bar{R}$ , and the average rank of the proposed MAPNN differs from those of the other methods. Moreover, $χ_{F}^{2}$ for the six methods is significantly larger than ${(χ_{F}^{2})}_{0.05}$ . Thus, the nonparametric statistical test demonstrates that the six KNN-based classifiers perform differently and that the proposed MAPNN is effective.
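As a check on the arithmetic, Equation (11) can be evaluated directly from the average ranks in Table 8. The sketch below is illustrative; the function name `friedman_statistic` is hypothetical, and the critical value is simply the one quoted in the text rather than computed. With the three-decimal ranks of Table 8 the statistic comes out at roughly 65.5; the small gap to the reported 65.70 presumably stems from the rounding of the listed ranks.

```python
def friedman_statistic(avg_ranks, n_datasets):
    """Eq. (11): avg_ranks holds one average rank per method."""
    n = len(avg_ranks)   # number of methods
    t = n_datasets       # number of data sets
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * t / (n * (n + 1)) * (sum_sq - n * (n + 1) ** 2 / 4)

ranks = [5.523, 4.428, 3.523, 3.904, 2.047, 1.571]  # Table 8
chi2 = friedman_statistic(ranks, n_datasets=21)
critical = 11.07  # value quoted in the text for 5 degrees of freedom, alpha = 0.05

print(round(chi2, 2), chi2 > critical)
```

If all six methods had the same average rank of 3.5, the bracket in Equation (11) would vanish and the statistic would be zero, which is the situation described by the null hypothesis.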

Table 9

The average error rate of each classifier with the corresponding standard deviations (stds) on the four synthetic data sets

Data sets	KNN	WKNN	LMKNN	PNN	LMPNN	MAPNN
I	0.2711 ± 0.0258	0.2932 ± 0.0172	0.2297 ± 0.0326	0.2605 ± 0.0212	0.2511 ± 0.0295	0.2277 ± 0.0356
II	0.2742 ± 0.0207	0.2831 ± 0.0230	0.2447 ± 0.0317	0.2538 ± 0.0244	0.2591 ± 0.0230	0.236 ± 0.0340
I-I	0.3895 ± 0.0716	0.3082 ± 0.0267	0.279 ± 0.0573	0.3282 ± 0.0411	0.2432 ± 0.0308	0.1575 ± 0.0306
Ness	0.398 ± 0.0731	0.3232 ± 0.0253	0.2497 ± 0.0419	0.3385 ± 0.0395	0.1992 ± 0.0261	0.1502 ± 0.0355
Average	0.3332 ± 0.0478	0.3019 ± 0.0230	0.2508 ± 0.0409	0.2952 ± 0.0315	0.2381 ± 0.0274	0.1928 ± 0.0339

Fig. 6.

The error rates of MAPNN, KNN, WKNN, LMKNN, PNN and LMPNN as the training sample size N varies on each synthetic data set.

5.3. Experiments on synthetic data sets

In this section, to further verify the performance of the proposed MAPNN, experiments are conducted on four synthetic data sets, comparing its classification error with that of KNN, WKNN, LMKNN, PNN, and LMPNN. The value of k is varied from 1 to 20 with a step of 1. The classification results of each method as a function of k are shown in Fig. 5.

Fig. 5 shows that the classification error rate of the proposed MAPNN method is mostly lower than that of the other five methods at relatively large values of the neighborhood size k. We can also observe that the error rates of KNN, WKNN, LMKNN, PNN, and LMPNN increase gradually with k on the I-I and Ness data sets. Unlike these five methods, the error rate of MAPNN decreases initially when k is small and then remains stable at large values of k on the four artificial data sets. This illustrates that the proposed MAPNN method is more robust over a large interval of k values. The average error rates of the six methods are listed in Table 9. The lowest classification error of the six methods on each synthetic data set is highlighted in boldface. The proposed MAPNN method achieves the lowest average error rate on all four synthetic data sets.

Furthermore, experiments with varying numbers of training samples are conducted on these artificial data sets to verify the classification performance of the proposed MAPNN method. The methods are compared by classification error. For the I and II data sets, training sets of 60 to 240 samples, in steps of 20, are randomly generated, and the number of testing samples is 400. For the I-I and Ness data sets, training sets of 120 to 480 samples, in steps of 40, are randomly generated, and the number of test samples is also 400. The neighborhood size k is preset to 11 and 13 on I and II, and to 13 and 15 on I-I and Ness. The experimental results are shown in Fig. 6. Fig. 6 shows that, on the whole, the classification performance of the proposed MAPNN method is superior to that of KNN, WKNN, LMKNN, PNN, and LMPNN as the number of training samples varies. The experimental results demonstrate that MAPNN has good classification performance.

6. Conclusions

In this paper, a multi-average based pseudo nearest neighbor classifier (MAPNN) is proposed to improve classification performance for small training sets with outliers. In the MAPNN rule, $k (k - 1) / 2$ ( $k > 1$ ) pseudo local mean vectors are obtained from the k nearest neighbors of each class. Then, k pseudo local vectors of each class are selected to determine the class label of a query point. Extensive experiments on twenty-one real numerical data sets and four synthetic data sets are conducted, comparing MAPNN with five other KNN-based methods. The experimental results illustrate that the new classification method is effective and, to some degree, less sensitive to the neighborhood size k.

Method	Distance
	dist1	dist2	dist3	dist4	dist5	dist6	dist7
LMPNN	$6.0586 \times 10^{-5}$	$1.1077 \times 10^{-4}$	$1.1699 \times 10^{-4}$	$2.4291 \times 10^{-4}$	$5.9442 \times 10^{-4}$	$2.4527 \times 10^{-4}$	$2.5013 \times 10^{-4}$
MAPNN	$4.7795 \times 10^{-5}$	$4.9135 \times 10^{-5}$	$4.9381 \times 10^{-5}$	$5.8242 \times 10^{-5}$	$5.9442 \times 10^{-5}$	$6.8083 \times 10^{-5}$	$7.0208 \times 10^{-5}$

Table 8
The average ranks of six methods using the Friedman test on twenty-one real-world data sets

Method	KNN	WKNN	LMKNN	PNN	LMPNN	MAPNN
Average rank ($R_{i}$)	5.523	4.428	3.523	3.904	2.047	1.571
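As a worked example, the average ranks in Table 8 can be turned into the Friedman statistic with the standard formula $\chi_F^2 = \frac{12N}{m(m+1)}\left[\sum_i R_i^2 - \frac{m(m+1)^2}{4}\right]$, where $N$ is the number of data sets and $m$ the number of methods. The sketch below also computes the Iman-Davenport correction $F_F = \frac{(N-1)\chi_F^2}{N(m-1) - \chi_F^2}$. The function name is hypothetical, and the values are recomputed here from the rounded ranks, so they may differ slightly from any figures reported elsewhere in the paper.

```python
def friedman_from_avg_ranks(avg_ranks, n_datasets):
    """Friedman chi-square and Iman-Davenport F from average ranks."""
    m = len(avg_ranks)                       # number of compared methods
    sum_sq = sum(r * r for r in avg_ranks)   # sum of squared average ranks
    chi2 = (12.0 * n_datasets / (m * (m + 1))
            * (sum_sq - m * (m + 1) ** 2 / 4.0))
    # Iman-Davenport correction, F-distributed with
    # (m - 1) and (m - 1) * (n_datasets - 1) degrees of freedom.
    f_stat = (n_datasets - 1) * chi2 / (n_datasets * (m - 1) - chi2)
    return chi2, f_stat


# Average ranks of KNN, WKNN, LMKNN, PNN, LMPNN, MAPNN from Table 8.
ranks = [5.523, 4.428, 3.523, 3.904, 2.047, 1.571]
chi2, f_stat = friedman_from_avg_ranks(ranks, n_datasets=21)
```

With these rounded ranks, `chi2` evaluates to approximately 65.5 and `f_stat` to approximately 33.2. Both are far above the usual critical values for six methods and twenty-one data sets, which is consistent with the methods not performing equivalently.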