A novel adaptive k -NN classifier for handling imbalance: Application to brain MRI

Abstract

The problem of efficiently classifying imbalanced data has become one of the most challenging tasks in machine learning. Real-world examples include medical image analysis, fraud detection, fault diagnosis, and anomaly detection. Although several data-level algorithms have been developed to address imbalance, they are typically subject to some restrictions. We propose a novel variant of the $k$ -NN family of classifiers, named Density-based Adaptive-distance $k$ NN (DA $k$ NN). It can effectively handle data with skewed distributions and varying class densities using the concept of adaptive distance. Comparative superiority over related data-level algorithms (SMOTE, ADASYN) is experimentally established on ten sets of two-class data, in terms of geometric mean (of the true positive and negative rates) and accuracy. Additionally, DA $k$ NN is compared with currently popular variants of $k$ -NN on five sets of multi-class data. Finally, DA $k$ NN is successfully applied to the highly imbalanced Lower Grade Glioma (LGG) MR images, with an Average-Dice score of 0.9082 for delineating the tumor regions. The results demonstrate clear superiority over state-of-the-art algorithms.

Keywords

k-NN classifier adaptive distance skewed data brain tumors medical imagery

1. Introduction

The problem of imbalanced data arises in a variety of applications, and continues to attract the machine learning community around the world, which explores it from various perspectives. A dataset is termed imbalanced or skewed when the number of patterns in one or more class(es) is considerably higher than that of the other class(es). Classifying such an imbalanced dataset is difficult due to the inherent bias of training towards the majority class(es). Usually the error rate increases when the minority classes do not have sufficient patterns to represent them in the dataset [1], so that they often get treated as noise. In 2009, He and Garcia [2] published an article on learning from imbalanced data. This was followed by several research works demonstrating the necessity of different approaches, and an associated variety of methodologies, for handling varied applications of imbalanced data.

Designing a classifier for imbalanced datasets is a challenging problem in machine learning. It is caused by the skewed distribution of data among the classes. Most standard algorithms assume balanced class distribution with equal misclassification cost. Therefore, when presented with large imbalanced datasets, most of these algorithms fail to properly represent their distributive characteristics. There are two main approaches to be considered, viz. data-level and algorithm-level. Most of the studies in the first category apply resampling techniques to get a balanced distribution. One can either under-sample the majority class [3] or over-sample the minority class [4], or pursue both strategies concurrently; the Synthetic Minority Over-sampling TEchnique (SMOTE) [5] being a typical example. At algorithmic level, one can apply boosting or adjust the misclassification cost. A boosting algorithm constructs a strong classifier from some baseline classifiers, like SVM or $k$ -NN [2].

1.1 Contribution

The contribution is threefold. Firstly, a new algorithm based on the $k$ -NN classifier, using a modified adaptive distance measure, is proposed for handling skewed distributions involving different densities (number of points in unit area/volume). Secondly, it is validated against state-of-the-art algorithms and found to be superior for both two-class and multi-class problems. Thirdly, it is evaluated on medical image segmentation, which typically involves imbalanced data, and found to conform to the ground truth. The main advantage of the proposed method is that it can be applied to any type of data, whether small or large, balanced or imbalanced.

The rest of the paper is organized as follows. Related works are reported in Section 2. Section 3 provides an overview of the existing variants of nearest neighbor classifiers, as reported in literature. Section 4 presents the proposed algorithm Density-based Adaptive distance $k$ -NN (DA $k$ NN). This is followed by the experimental results in Section 5, on 15 publicly available datasets along with an application in medical image analysis, to establish its effectiveness in different scenarios. Some state-of-the-art methods, described in the following section, are compared with our algorithm DA $k$ NN. Finally Section 6 concludes the article.

2. Related works

In recent years, classifying imbalanced data has become one of the focal tasks in machine learning [6]. This situation is extremely common in many real world classification problems, like fault diagnosis [7, 8, 9, 10], anomaly detection [11, 12, 13], fraud detection [14], medical diagnosis [15, 16, 17, 18, 19, 20], image segmentation [20], face recognition [21], spam detection [22], vehicle classification [23], land cover classification [24, 25], string space [26], face detection [27, 28], train management [29], transportation system [30], etc. To overcome this problem, many algorithms have been developed. Japkowicz [31] provided a comparison between performances of different classifiers on imbalanced data. Hulse et al. [32] presented a comprehensive and systematic experimental analysis of learning from imbalanced data. Batista et al. [33] studied the behavior of several methods for balancing training data in machine learning. Different sampling methods were developed [2, 34, 35] for imbalanced data, using Synthetic Minority Over-sampling TEchnique (SMOTE) [36, 5, 37, 38], cluster-based under-sampling [39], feature selection [40] and suitable classifiers [41, 42, 43, 44, 45, 46, 47]. However, most of them were implemented in the binary classification scenario, and are not suited for handling multi-class problems. Good review articles on handling imbalanced data are also available [48, 49, 50, 51]. Many of the recent publications demonstrate the importance, necessity and difficulty of solving problems with imbalanced data, along with possible solutions [41, 42, 48, 7, 22, 15, 16, 29, 36, 8, 20, 24, 23, 11, 17, 25, 30, 26, 18, 40, 27, 43, 44, 28, 34, 45, 46, 52, 53].

All these methods, although successful in several applications, may not be suitable for all problems, particularly those involving medical images, where simply changing the data statistics by augmentation may not be an appropriate strategy. In medical data, the diseased (or abnormal) region(s) in images are typically very small as compared to the larger normal regions, making this an important application of imbalanced data. Detection of the diseased class, which is normally smaller in size, and its accurate delineation in a medical image is typically more important for proper diagnosis and prognosis. Since most treatment and cure decisions are based on this analysis, the problem becomes even more challenging in the framework of machine learning.

Although many algorithms have been developed over the years, finding a better (or improved) solution remains a major issue in medical image analysis. It is to be noted that data-level approaches (like over-sampling, under-sampling) are not suitable for this kind of problem. Although some variations of SMOTE have been reported in medical image segmentation [20], it is more appropriate to use an algorithm-level method such that the data statistics remain unchanged. In this scenario we propose the Density-based Adaptive distance $k$ -NN (DA $k$ NN) algorithm.

3. Nearest neighbour classification and some variants of k-NN

Nearest neighbor (NN) classifier [54] constitutes a popular family of non-parametric supervised learning models which can effectively identify an unknown sample $x_{0}$ with one of the predefined classes $\mathcal{C}_{j}$ . It is one of the fundamental and simplest classifiers in literature, and requires little or no prior domain knowledge about the data. It was developed from the need to perform discriminant analysis in the absence of reliable parametric estimates of probability densities. Initially a set of distances is computed from the query instance $x_{0}$ to each of the training patterns $x_{i}$ . The decision is based on the class label of its nearest neighbor from the training patterns.

Let $X=\{x_{i}\},i=1,\ldots,N$ , be the training set with each $x_{i}$ belonging to a class $\mathcal{C}_{j},j=1,\ldots,c$ . The distance of $x_{0}$ from $x_{i}$ is computed as

$\displaystyle d_{i0}=||x_{i}-x_{0}||\mbox{ for }i=1,2,\ldots,N,$ (1)

where $||.||$ indicates the distance norm (like absolute, Euclidean, etc.). The nearest neighbor $x_{i^{\prime}}$ is selected as

$\displaystyle x_{i^{\prime}}=\text{argmin}_{x_{i}}{\{d_{i0}\}},$ (2)

with $x_{0}$ being assigned the class label $\mathcal{C}_{j^{\prime}}$ corresponding to $x_{i^{\prime}}$ .

It can be shown mathematically, for $k=1$ and $N\rightarrow\infty$ , that the $k$ -NN classification error is bounded above by twice the Bayes error rate [54]. The well-known $k$ -NN classifier [54] applies majority voting to determine the class label of $x_{0}$ , based on those of its $k$ nearest neighbors $\mathcal{N}_{k}(x_{0})$ . This is computed by sorting the distances $d_{i0}$ of Eq. (1) in ascending order and considering the first $k$ such training patterns; the class label assigned to $x_{0}$ is the one having the maximum votes, $\mathcal{C}_{j^{\prime}}$ , amongst the $\mathcal{N}_{k}(x_{0})$ . Here $k$ can be 1, 2, 3, etc.
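As an illustration of Eqs (1) and (2) and of the voting rule, a minimal Python sketch is given below. It assumes Euclidean distance and patterns stored as tuples of numbers; the name `knn_predict` and the tie-breaking rule (the class seen first in distance order wins) are choices made here for illustration and are not prescribed by [54].

```python
import math
from collections import Counter

def knn_predict(X, y, x0, k=3):
    """Assign x0 the majority label among its k nearest training patterns."""
    # Eq. (1): distance from the query x0 to every training pattern x_i
    d = [math.dist(xi, x0) for xi in X]
    # Sort pattern indices by ascending distance and keep the first k
    nearest = sorted(range(len(X)), key=lambda i: d[i])[:k]
    # Majority vote; ties go to the class seen first in distance order
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]

X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
y = ["a", "a", "a", "b", "b"]
print(knn_predict(X, y, (0.2, 0.2), k=3))  # the three nearest patterns are all "a"
```

With `k=1` the function reduces to the nearest neighbor rule of Eq. (2).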

This was followed by a series of investigations encompassing new approaches with a reject option [55], refinements with respect to Bayes error rate [56], distance weighted approaches [57, 58], soft computing [59], and fuzzy sets [60, 61]. Subsequently a simple adaptive distance measure [62] and threshold on voting between two classes [63] were introduced. However in situations involving skewed data, where the cardinality of a class is very high as compared to that of the other class(es), all these algorithms are found to present a bias favoring the class with high cardinality.

The basic $k$ -NN, as defined by Cover and Hart [54], is described above. We consider here some of its variants, popularly used for avoiding the dangers of improperly modeling the underlying data distribution in various applications.

A simple and intuitively appealing extension to the $k$ -NN classifier uses a neighbor weighting function for assigning a class to a query pattern [57]. The $i$ th nearest neighbor of pattern ${x}_{0}$ is provided with a weight $w_{i}$ , for $i=1,2,\ldots,k$ , defined as

$\displaystyle w_{i}=\left\{\begin{array}{ll}\frac{d_{k0}-d_{i0}}{d_{k0}-d_{10}}&\textrm{if }d_{k0}\neq d_{10},\\ 1&\textrm{otherwise},\end{array}\right.$ (3)

where $d_{10}$ , $d_{k0}$ are the distances from its nearest and farthest ( $k$ th) neighbors, respectively. The algorithm assigns $x_{0}$ to that class, for which the sum of weights of all its representatives (amongst the $k$ nearest neighbors) is the maximum. However, this algorithm was shown to be not necessarily better than $k$ -NN for small sample sizes, when ties among classes are resolved judiciously [58]. In the infinite sample case $k$ -NN is found to be the best [58].
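The weighting of Eq. (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Euclidean distance, patterns as tuples, and the hypothetical function name `weighted_knn_predict`. The toy data are chosen so that the weighted rule and plain majority voting disagree.

```python
import math

def weighted_knn_predict(X, y, x0, k=3):
    """Distance-weighted k-NN vote using the weights of Eq. (3)."""
    # k nearest patterns as (distance, index) pairs, nearest first
    ranked = sorted((math.dist(xi, x0), i) for i, xi in enumerate(X))[:k]
    d1, dk = ranked[0][0], ranked[-1][0]
    totals = {}
    for di, i in ranked:
        # Eq. (3): weight 1 for the nearest neighbor, 0 for the k-th one
        w = (dk - di) / (dk - d1) if dk != d1 else 1.0
        totals[y[i]] = totals.get(y[i], 0.0) + w
    # Class with the largest sum of weights
    return max(totals, key=totals.get)

X = [(1.0,), (2.0,), (3.0,)]
y = ["a", "b", "b"]
# Weights are 1.0, 0.5 and 0.0, so "a" wins although "b" has two of three votes
print(weighted_knn_predict(X, y, (0.0,), k=3))  # a
```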

The Pseudo nearest neighbor (P $k$ NN) classifier [64] considers local learning in each class, with $N$ training patterns from the “ $c$ ” classes being weighted class-wise on the basis of their distances from pattern $x_{0}$ . This helps incorporate not only the information about its nearest pattern, but also that of its $(k-1)$ neighbors from every class. We have $w^{\prime}_{ij}=\frac{1}{i}\textrm{ for }i=1,2,\ldots,k,j=1,2,\ldots,c$ , such that $w^{\prime}_{ij}$ decreases with increasing $i$ (because $i$ is the rank, proportional to distance, of the $k$ nearest neighbourhood). The weighted distance of $x_{0}$ , from its $k$ -nearest neighbors, is computed as

$\displaystyle d_{0}^{j}=\sum_{i=1}^{k}d_{i0}^{j}*w^{\prime}_{ij},\textrm{ for }j=1,2,\ldots,c,$ (4)

with its class assignment $j^{\prime}$ for $x_{0}$ being made by

$\displaystyle j^{\prime}=\text{argmin}_{j}{\{d_{0}^{j}\}}.$ (5)
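A minimal Python sketch of Eqs (4) and (5) follows. It assumes Euclidean distance and that every class contains at least $k$ training patterns (otherwise the sum in Eq. (4) would run over fewer terms); the name `pknn_predict` is hypothetical.

```python
import math

def pknn_predict(X, y, x0, k=3):
    """Pseudo nearest neighbor rule: class-wise rank-weighted distances."""
    # Collect the distances from x0 separately for each class
    by_class = {}
    for xi, label in zip(X, y):
        by_class.setdefault(label, []).append(math.dist(xi, x0))
    score = {}
    for label, dists in by_class.items():
        dists.sort()
        # Eq. (4): the i-th nearest pattern of the class gets weight 1/i
        score[label] = sum(d / rank for rank, d in enumerate(dists[:k], start=1))
    # Eq. (5): the class with the smallest weighted distance wins
    return min(score, key=score.get)

X = [(1.0,), (4.0,), (5.0,), (2.0,), (2.5,), (3.0,)]
y = ["a", "a", "a", "b", "b", "b"]
# Class "a" owns the single nearest pattern, but "b" is closer as a group
print(pknn_predict(X, y, (0.0,), k=3))  # b
```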

Algorithm T $k$ NN [63] was developed to handle high-dimensional document data involving sparsity. Here $k$ is not fixed, but is dependent on the threshold between the highest and second-highest cardinalities of neighboring pattern classes. Authors consider the distance of the $N$ (normally taken as a suitably large value) patterns from ${x}_{0}$ . For a choice of $k>\beta$ , where $\beta$ is a user-defined parameter, let the class having the highest count be $\mathcal{C}_{j^{\prime}}$ and that with the second-highest count be $\mathcal{C}_{j^{\prime\prime}}$ .

The algorithm is outlined as follows.
1. Initialization: set $k=\beta$ .
2. While the number of nearest neighbors $k<N$ : if $|\mathcal{C}_{j^{\prime}}|-|\mathcal{C}_{j^{\prime\prime}}|=\beta$ , assign $x_{0}$ to class $\mathcal{C}_{j^{\prime}}$ and exit; otherwise increment $k=k+1$ .
3. If no class has been assigned, $x_{0}$ remains unassigned; exit.
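The loop can be sketched in Python as below. This is an interpretation of the outline rather than the implementation of [63]: it assumes Euclidean distance and $\beta\geq 1$, returns `None` for an unassigned pattern, and tests the threshold with `>=` as a defensive choice (since $k$ grows one neighbor at a time, the gap between the two leading counts changes by at most one per step, so it reaches $\beta$ before it can exceed it). The name `tknn_predict` is hypothetical.

```python
import math
from collections import Counter

def tknn_predict(X, y, x0, beta=2):
    """Grow k from beta until the top class leads the runner-up by beta votes."""
    # Training patterns ordered by ascending distance from the query
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))
    k = beta
    counts = Counter(y[i] for i in order[:k])
    while k < len(X):
        ranked = counts.most_common(2)
        top = ranked[0][1]
        second = ranked[1][1] if len(ranked) > 1 else 0
        if top - second >= beta:
            return ranked[0][0]      # threshold reached: assign the leading class
        counts[y[order[k]]] += 1     # otherwise include the next nearest pattern
        k += 1
    return None                      # x0 remains unassigned

X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = ["a", "b", "a", "a", "b"]
print(tknn_predict(X, y, (0.0,), beta=2))  # a (decided at k = 4)
```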

The $k$ -NN family of classifiers has also been fuzzified to design the fuzzy $k$ -NN (F $k$ NN) model [61]. This is another approach to handle overlapped classes. Here each pattern is assigned a finite membership value $\mu_{j}$ to each of the $c$ classes, with ${x}_{0}$ being labeled with that class $\mathcal{C}_{j^{\prime}}$ having the maximum value of $\mu_{j^{\prime}}$ . While in $k$ -NN every point of a class is considered to be equally important for identifying the class label of an unknown query pattern, the F $k$ NN associates each such pattern ${x}_{i}\in X$ with a membership value based on the local overlapping amongst different classes. Besides, here each NN pattern is assigned a finite membership value inversely proportional to its distance from the query pattern.

Initially the $k$ nearest neighbors $\mathcal{N}_{k}({x}_{0})$ of ${x}_{i}\in X$ are identified, and sorted in ascending order of their distance $d_{i0}$ by Eq. (1). Let the number of patterns in $\mathcal{N}_{k}({x}_{0})$ belonging to class $\mathcal{C}_{j}$ be $n_{j}$ . Then all $x_{i}\in\mathcal{N}_{k}({x}_{0})$ are assigned membership $\mu_{j}({x}_{i})$ to each class $\mathcal{C}_{j}$ using

$\displaystyle\mu_{j}({x}_{i})=\left\{\begin{array}{ll}0.51+\frac{n_{j^{\prime}}}{k}*0.49,&\textrm{if }j=j^{\prime}\\ \frac{n_{j}}{k}*0.49,&\textrm{for }j=1,2,\ldots,c,\ j\neq j^{\prime}.\end{array}\right.$ (6)

The test pattern ${x}_{0}$ is assigned membership $\mu_{j}({x}_{0})$ to class $\mathcal{C}_{j}$ , $j=1,2,\ldots,c$ , as

$\displaystyle\mu_{j}({x}_{0})=\frac{\sum_{i=1}^{k}\mu_{j}({x}_{i})*\omega_{i0}}{\sum_{i=1}^{k}\omega_{i0}},$ (7)

where $\omega_{i0}=\frac{1}{(d_{i0})^{\frac{2}{p-1}}}$ and $p>1$ is a fuzzifier which determines the weighting of the distance component. The class $\mathcal{C}_{j^{\prime}}$ , for which $\mu_{j^{\prime}}({x}_{0})$ is the maximum, is assigned as the class label of ${x}_{0}$ . As $p\rightarrow 1$ , the nearer neighbors get weighted more heavily than those farther away, so that the number of training patterns influencing the classification of query pattern ${x}_{0}$ gets reduced.
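The two membership computations of Eqs (6) and (7) can be sketched as follows. The sketch makes several assumptions for illustration: memberships are stored as dictionaries keyed by class label, $j^{\prime}$ in Eq. (6) is taken to be the labelled class of the pattern, the neighborhood used for the counts $n_{j}$ is supplied by the caller, and all distances are strictly positive. The function names are hypothetical.

```python
def init_membership(own_label, neighbor_labels, classes):
    """Eq. (6): membership of a labelled pattern from its neighborhood labels."""
    k = len(neighbor_labels)
    mu = {}
    for c in classes:
        share = neighbor_labels.count(c) / k      # n_j / k
        mu[c] = 0.51 + 0.49 * share if c == own_label else 0.49 * share
    return mu

def query_membership(neighbor_mu, neighbor_dist, p=2.0):
    """Eq. (7): distance-weighted average of the neighbors' memberships."""
    # omega_i0 = 1 / d_i0^(2 / (p - 1)); assumes d_i0 > 0 and p > 1
    w = [1.0 / d ** (2.0 / (p - 1.0)) for d in neighbor_dist]
    total = sum(w)
    return {c: sum(m[c] * wi for m, wi in zip(neighbor_mu, w)) / total
            for c in neighbor_mu[0]}

mu = query_membership([{"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0}], [1.0, 2.0])
print(max(mu, key=mu.get))  # a: the nearer neighbor carries four times the weight
```

The memberships produced by `init_membership` sum to one over the classes, so the output of `query_membership` does as well.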

Apart from the algorithmic variants mentioned above, there are also some popular data-level over-sampling approaches. SMOTE [5] is a simple and intuitively appealing algorithm which over-samples the minority class(es) in a systematic manner. It is very effective in handling imbalanced data, and generates data points in a minority class by operating in the feature-space instead of the data-space.

Let us consider the $k$ nearest points $x_{i_{k}}$ of a minority class sample $x_{i}$ . The sample is over-sampled by introducing synthetic points along the straight lines connecting $x_{i}$ with all or any of its nearest points $x_{i_{k}}$ . Each straight line can contain only one synthetic point. The number of synthetic points for $x_{i}$ is selected using the imbalance ratio between the classes, with the selection of the straight lines (from among the $k$ lines) being purely random. The algorithm can thereby generate imaginary data points in the feature-space to be used to train any classifier. As the reality of such imaginary points in the data-space remains unknown, this algorithm is often unsuitable for many applications (like image segmentation).
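The generation of a single synthetic point can be sketched in Python as below. The sketch assumes Euclidean distance, minority samples stored as tuples, and a position on the line drawn uniformly at random from $[0,1)$; the function name `smote_point` and the `rng` parameter are hypothetical, and the bookkeeping that decides how many points to generate per sample is omitted.

```python
import math
import random

def smote_point(xi, minority, k=3, rng=random):
    """Create one synthetic point on the line from xi to a random near neighbor."""
    # k nearest minority-class neighbors of xi (xi itself excluded)
    neighbors = sorted((p for p in minority if p != xi),
                       key=lambda p: math.dist(p, xi))[:k]
    xn = rng.choice(neighbors)   # random selection of one of the k lines
    gap = rng.random()           # position along the chosen line
    return tuple(a + gap * (b - a) for a, b in zip(xi, xn))

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
s = smote_point((0.0, 0.0), minority, k=2, rng=random.Random(0))
print(s)  # lies between (0, 0) and either (1, 0) or (0, 1)
```

The synthetic point is a feature-space interpolation, which is why it need not correspond to any real observation.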

A major disadvantage of SMOTE is that the algorithm gives the same priority to all points, whether they lie in overlapped or non-overlapped regions, thereby generating some unnecessary data points in non-overlapped regions. The objective of the ADAptive SYNthetic (ADASYN) [38] sampling approach for imbalanced learning is to minimize this problem. ADASYN is a variant of SMOTE, and handles the skewed distribution in imbalanced data by assigning different priorities (weights) to the minority class points. First, the number of minority points $G$ required to balance the dataset is calculated. Using the density distribution of the dataset, it then chooses the minority class regions which need to be over-sampled. Finally, it randomly over-samples the chosen region(s) by inserting data point(s) between randomly chosen pair(s) of samples belonging to the minority class.

Although both SMOTE and ADASYN perform well in a class imbalanced scenario, there exist some problems. Both the algorithms use pre-processing (over-sampling), thereby increasing the computational cost by generating these (imaginary) data points. The physical significance of these imaginary points remains unknown. Moreover both the algorithms operate mainly on two-class problems, and extending them to the multi-class scenario is not straight-forward. Research has considered One vs one [65] or One-against-all [66] strategies, to extend this to the multi-class domain.

In order to circumvent this situation, we propose DA $k$ NN. The algorithm works on both the two-class as well as multi-class scenarios, without requiring any modification of data statistics. It is elaborated in the next section.

4. Incorporating adaptive distance and data density: Algorithm DAkNN

Our algorithm considers the density of a class and the adaptive distance with respect to neighboring points. Both these concepts, separately, have been used in machine learning in various ways, towards achieving better estimation.

One such adaptive distance based $k$ -NN is the Simple Adaptive Distance Measure (SADM) [62]. It introduces a new distance measure to circumvent some of the problems in a two-class scenario. Let $R(x_{i})$ be the radius of the training pattern $x_{i}$ , such that

$\displaystyle R({x}_{i})=\{\arrowvert{x}_{i}-{x}_{r}\arrowvert-\varepsilon\},$ (8)

where ${x}_{r}$ is the nearest pattern of ${x}_{i}$ in $X$ which belongs to a different class $\mathcal{C}_{j^{\prime}}$ from the class $\mathcal{C}_{j}$ of ${x}_{i}$ , $r\in\{1,\ldots,N\}\ \forall\ r\neq i$ , and $j=1,2,\ldots,c\ \forall\ j\neq j^{\prime}$ . This corresponds to the radius of the largest circle, drawn with center ${x}_{i}$ , which excludes training patterns in $X$ from any other class. In a binary framework this reduces to $\{\mathcal{C}_{j},\neg\mathcal{C}_{j}\}$ for $c=2$ , where $\neg\mathcal{C}$ is the complement of $\mathcal{C}$ . Now the distance of Eq. (1) is scaled by the radius of Eq. (8) as

$\displaystyle d^{\prime}_{i0}=\frac{d_{i0}}{R({x}_{i})}\textrm{ for }i=1,2,\ldots,N.$ (9)

Therefore, if ${x}_{i}$ belongs to a region of high overlap then $R({x}_{i})$ is small and $d^{\prime}_{i0}$ is correspondingly large, and vice versa. The user-defined parameter $\varepsilon$ in Eq. (8), where $\varepsilon>0$ is an arbitrarily small real value, ensures that $R({x}_{i})<\arrowvert{x}_{i}-{x}_{r}\arrowvert$ . The distances of the query pattern ${x}_{0}$ from $\{{x}_{i}\}$ , computed using Eq. (9), are sorted in ascending order to determine the set of $k$ nearest neighbors $\mathcal{N}^{\prime}_{k}({x}_{0})$ in this modified situation. The class label of ${x}_{0}$ is assigned by majority vote over the class labels in $\mathcal{N}^{\prime}_{k}({x}_{0})$ . Patterns ${x}_{i}$ lying nearer the center of a class $\mathcal{C}_{j}$ thus have a larger $R({x}_{i})$ than those towards the class boundaries and hence, for the same physical distance, a smaller scaled distance.
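A minimal Python sketch of Eqs (8) and (9) is given below. It assumes Euclidean distance, at least two classes in the training set, and that no two patterns of different classes lie within $\varepsilon$ of each other (so every radius is positive); the names `sadm_radii` and `sadm_predict` are hypothetical.

```python
import math
from collections import Counter

def sadm_radii(X, y, eps=1e-6):
    """Eq. (8): distance to the nearest pattern of another class, minus eps."""
    return [min(math.dist(xi, xr) for xr, yr in zip(X, y) if yr != yi) - eps
            for xi, yi in zip(X, y)]

def sadm_predict(X, y, x0, k=3, eps=1e-6):
    """k-NN vote on the radius-scaled distances of Eq. (9)."""
    radii = sadm_radii(X, y, eps)
    scaled = [math.dist(xi, x0) / r for xi, r in zip(X, radii)]
    nearest = sorted(range(len(X)), key=lambda i: scaled[i])[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
y = ["a", "b", "b"]
# The physically nearest pattern is "a", but the large radius of (4, 0)
# shrinks its scaled distance below that of the others
print(sadm_predict(X, y, (0.0, 2.0), k=1))  # b
```

The toy example also hints at the domination by large-radius patterns that motivates the density correction introduced later in this section.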

Let us consider the two-class scenario of Fig. 1a, where the symbols “ $+$ ” and “*” indicate classes $\mathcal{C}_{j}$ and $\neg\mathcal{C}_{j}$ , respectively. It is evident that with $k=5$ , $k$ -NN would always assign ${x}_{0}$ to class $\mathcal{C}_{j}$ . Figure 1b depicts the radii $R({x}_{i})$ of these five neighbors using Eq. (8). Applying the scaled distance of Eq. (9), the five new nearest neighbors are shown in Fig. 1c. This is because the radii of all points in class $\mathcal{C}_{j}$ are less than their physical distance from ${x}_{0}$ . On the other hand, the modified distances computed by the training patterns of class $\neg\mathcal{C}_{j}$ are less than “one” because their corresponding radii are greater than their physical distances from ${x}_{0}$ . Hence SADM assigns ${x}_{0}$ to class $\neg\mathcal{C}_{j}$ .

However, problems arise in SADM when the pattern distribution in the different classes is largely skewed and the class densities vary. This causes uneven domination by a class with a large radius and/or density and leads to increased misclassification; sometimes the smaller class may even go undetected. Thus a query pattern ${x}_{0}$ from an overlapped region experiences greater influence from training patterns lying towards the class centers.

Figure 1.

Two-class discrimination. (a) Using $k$ -NN classifier, for $k=$ 5 with (b) radii of five nearest neighbors, marked on surrounding circle, and (c) using SADM classifier.

Figure 2.

Two-class skewed distribution.

The proposed algorithm is a new approach to handle skewed distributions by using the concept of adaptive distance modified by the class density factor. Let us explain the intuition with an example. Consider the two classes $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ of Fig. 2, which have different densities. The training patterns $x_{i}$ are numbered in ascending order of their distance from the test pattern $x_{0}$ . It is obvious that $k$ -NN would assign $x_{0}$ to class $\mathcal{C}_{1}$ . Algorithm SADM defines a radius around every training pattern $x_{i}$ to compute the scaled distance $d^{\prime}_{i0}$ from $x_{0}$ using Eqs (8) and (9). Thereby $x_{0}$ would again be assigned to class $\mathcal{C}_{1}$ (as $x_{7}$ , $x_{14}$ , $x_{15}$ would become closest). But in reality $x_{0}$ belongs to class $\mathcal{C}_{2}$ . In order to circumvent this problem in skewed distributions of data involving widely varying class cardinalities, we introduce algorithm DA $k$ NN, which uses the concept of density to estimate an adaptive distance for classifying the data point, thereby providing balanced weights to larger and smaller classes.

It is evident from the figure that $x_{1}$ and $x_{2}$ are the two (closest) training patterns from the two classes $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ , respectively. We incorporate the notion of density while computing the modified distance of $x_{i}$ $\in$ $\mathcal{C}_{j}$ as

$\displaystyle d^{\prime\prime}_{i0}=d^{\prime}_{i0}*d^{\prime}_{{\mathcal{C}_{j}}0},$ (10)

where $d^{\prime}_{i0}$ is defined by Eq. (9) and $d^{\prime}_{{\mathcal{C}_{j}}0}$ is the corresponding distance of the closest pattern (belonging to $\mathcal{C}_{j}$ ) to $x_{0}$ . Note that only the training patterns belonging to $\mathcal{C}_{j}$ are considered when evaluating $d^{\prime\prime}_{i0}\ \forall\ x_{i}\in\mathcal{C}_{j}$ . Thereby all patterns from $\mathcal{C}_{j}$ will come closer if $d^{\prime}_{{\mathcal{C}_{j}}0}<1$ [that is, if $R(x_{i})>d_{i0}$ in Eq. (9) for the closest pattern], and vice versa. The central tendency of $\mathcal{C}_{j}$ is calculated as

$\displaystyle\textit{Avg}(\mathcal{C}_{j})=\sum_{x_{i}\in\mathcal{C}_{j}}\frac{d_{i0}}{|\mathcal{C}_{j}|},$ (11)

where $d_{i0}$ is given by Eq. (1) and $|\mathcal{C}_{j}|$ is the cardinality of the class $\mathcal{C}_{j}$ . Finally, using Eqs (10) and (11), we express the modified distance as

$\displaystyle d^{\prime\prime\prime}_{i0}=d^{\prime\prime}_{i0}*\textit{Avg}(\mathcal{C}_{j}),\ \forall\ x_{i}\in\mathcal{C}_{j},\ \ j=1,2,\cdots,c.$ (12)

This helps emphasize the impact of the rare class on the computation of distance. The steps of the algorithm are given below.

Algorithm 1: Algorithm DA $k$ NN:
1. At first for a given point $x_{0}$ , compute $d^{\prime}_{i0}$ using Eq. (9).
2. Find the nearest pattern, belonging to $\mathcal{C}_{j}$ , to compute $d^{\prime}_{{\mathcal{C}_{j}}0}$ from it.
3. Compute the modified distance $d^{\prime\prime}_{i0}$ using Eq. (10).
4. Compute the final modified distance $d^{\prime\prime\prime}_{i0}$ using Eq. (12).
5. Using $d^{\prime\prime\prime}_{i0}$ as the final modified distance, select the $k$ nearest neighbors of the query point and determine its class label by voting among them.
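The five steps can be sketched in Python as follows. This is an illustrative reading of Eqs (8) to (12), not a reference implementation: it assumes Euclidean distance, takes the "nearest pattern" of each class to be the one with the smallest unscaled distance $d_{i0}$ and uses its scaled distance as $d^{\prime}_{{\mathcal{C}_{j}}0}$, and settles the final label by majority vote. The function name `daknn_predict` is hypothetical.

```python
import math
from collections import Counter

def daknn_predict(X, y, x0, k=3, eps=1e-6):
    """Density-based adaptive-distance k-NN, following Algorithm 1."""
    n = len(X)
    d = [math.dist(xi, x0) for xi in X]                          # Eq. (1)
    radius = [min(math.dist(X[i], X[r]) for r in range(n) if y[r] != y[i]) - eps
              for i in range(n)]                                 # Eq. (8)
    d1 = [d[i] / radius[i] for i in range(n)]                    # Step 1, Eq. (9)
    members = {}
    for i, label in enumerate(y):
        members.setdefault(label, []).append(i)
    d3 = [0.0] * n
    for idx in members.values():
        nearest = min(idx, key=lambda i: d[i])                   # Step 2
        class_factor = d1[nearest]                               # d'_{C_j 0}
        avg = sum(d[i] for i in idx) / len(idx)                  # Eq. (11)
        for i in idx:
            d3[i] = d1[i] * class_factor * avg                   # Steps 3-4, Eqs (10), (12)
    order = sorted(range(n), key=lambda i: d3[i])[:k]            # Step 5
    return Counter(y[i] for i in order).most_common(1)[0][0]

X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
y = ["a", "b", "b"]
print(daknn_predict(X, y, (0.0, 2.0), k=1))  # a
```

In this toy example, ranking by the radius-scaled distance of Eq. (9) alone would place the "b" pattern at (4, 0) first; after the class-wise factors of Eqs (10) and (12) are applied, the "a" pattern becomes the nearest one.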

4.1 Observations

• Nearest neighbors may change after this distance modification.
• The $k$ nearest neighbors may be completely different.
• Voting can result in a different class label for $x_{0}$ .

The proposed algorithm may completely change the final results depending on the distribution of patterns from the different classes. It is observed from our experiments, in Section 5, that the results of our algorithm are generally better than those of existing $k$ -NN versions, as well as those employing over-sampling strategies like SMOTE and ADASYN.

5. Experimental results

The performance of the proposed algorithm DA $k$ NN was exhaustively evaluated and compared with state-of-the-art related methods, for classification and image segmentation in the framework of imbalanced data. These are elucidated in this section.

5.1 Classification

The performance of the algorithm DA $k$ NN was evaluated on 15 benchmark datasets, from which ten were two-class and the remaining five were multi-class. Among the ten two-class datasets, five were taken from KEEL [67], three from UCI [68] (Sonar, Heart, Liver) and two were synthetically generated (SYNTH_DATA1, SYNTH_BIG). Among the five multi-class datasets, three were taken from the publicly available UCI [68], one was a publicly available highly overlapped Vowel data [69], and the remaining one was synthetically generated (SYNTH_DATA2). The details of these datasets are shown in Table 1.

Table 1
Imbalanced datasets used

S. no.	Name	Source	Classes	Features	Instances	IR
1	glass-0-6_vs_5	KEEL	2	9	108	11.0
2	Ecoli-0_vs_1	KEEL	2	7	220	1.86
3	Pima	KEEL	2	8	768	1.87
4	Glass-0-1-2-3_vs_4-5-6	KEEL	2	9	214	3.20
5	Ecoli2	KEEL	2	7	336	5.46
6	Sonar	UCI	2	60	208	1.14
7	Heart	UCI	2	13	270	1.25
8	Liver	UCI	2	6	345	1.38
9	SYNTH_DATA1	Synthetic	2	2	400	1.67
10	SYNTH_BIG	Synthetic	2	2	20000	1.50
11	Glass	UCI	6	10	214	7.60
12	Ecoli	UCI	8	7	336	28.6
13	Wine	UCI	3	13	178	1.48
14	Vowel	Ref. [69]	6	3	871	2.86
15	SYNTH_DATA2	Synthetic	3	3	800	2.67

${}^{*}$ IR for multi-class datasets is measured as the ratio of the largest to the smallest class cardinality.

All the datasets chosen for our experiments are imbalanced. The Imbalance Ratio (IR) was computed as the ratio between the majority and minority class cardinalities [40]. In this scenario, classification accuracy alone is not sufficient to evaluate the performance of a classifier. Many authors suggest the use of the geometric mean of the true positive and negative rates (GM) [70] to evaluate a classifier over imbalanced data. However, an increased GM with decreased overall classification accuracy becomes meaningless. Therefore we have used both the GM and the classification accuracy to evaluate the performance of a classifier. GM is defined as follows:

$\displaystyle GM=\sqrt{\frac{TP}{TP+FN}*\frac{TN}{FP+TN}},$ (13)

where TP is True Positive, TN is True Negative, FN is False Negative and FP is False Positive.
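The following short Python sketch evaluates Eq. (13) alongside accuracy on made-up confusion counts (10 positive and 90 negative patterns, chosen here purely for illustration). It shows why accuracy alone can mislead: a classifier that labels everything as the majority class still scores 0.9 in accuracy while its GM is zero.

```python
import math

def accuracy(tp, fn, tn, fp):
    """Fraction of all patterns that are classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

def geometric_mean(tp, fn, tn, fp):
    """Eq. (13): geometric mean of the true positive and true negative rates."""
    tpr = tp / (tp + fn)
    tnr = tn / (fp + tn)
    return math.sqrt(tpr * tnr)

# Every pattern predicted as negative (the majority class)
print(accuracy(0, 10, 90, 0), geometric_mean(0, 10, 90, 0))  # 0.9 0.0
```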

The performance of DA $k$ NN was first compared with SMOTE [5] and one of its variants, ADASYN [38]. To evaluate SMOTE and ADASYN we used $k$ -NN as the base classifier. The results in Table 2 (with highest scores marked in bold), generated using the Leave-One-Out (LOO) technique and averaged over $k$ (where $k=$ 1, 3, and 5), show that our DA $k$ NN (with an $\textit{Average-GM}(S.D.)=$ 0.8044 (0.1356)) generally outperforms SMOTE and ADASYN in the two-class scenario. Moreover, it is also found to be superior to these methods in terms of classification accuracy. This leads to the conclusion that, unlike SMOTE, DA $k$ NN does not compromise overall classification accuracy to achieve a better GM. It is also observed that ADASYN and DA $k$ NN both consistently have a higher Accuracy than GM. Interestingly, in the case of data “Glass-0-6_vs_5”, ADASYN exhibits an adverse effect; this is perhaps due to the presence of overlapped data. On the whole, our algorithm DA $k$ NN was found to perform consistently better than SMOTE and ADASYN over all the datasets used.

Table 2

Comparative accuracy and GM averaged over $k$ , on two-class imbalanced data, using Leave-One-Out strategy

S. no.	Datasets	Accuracy			GM
		SMOTE	ADASYN	DA $k$ NN	SMOTE	ADASYN	DA $k$ NN
1	Glass-0-6_vs_5	0.9475	0.9259	0.9753	0.7613	0.1925	0.8976
2	Ecoli-0_vs_1	0.9788	0.9318	0.9909	0.9692	0.9000	0.9869
3	Pima	0.6007	0.7053	0.7352	0.6289	0.6507	0.6671
4	Glass-0-1-2-3_vs_4-5-6	0.9502	0.9424	0.9720	0.9603	0.9183	0.9681
5	Ecoli2	0.9137	0.9474	0.9633	0.9123	0.9040	0.9243
6	Liver	0.5072	0.6280	0.6512	0.4509	0.6131	0.6421
7	Sonar	0.8029	0.8365	0.8638	0.8033	0.8297	0.8578
8	Heart	0.6259	0.6802	0.6963	0.4545	0.6636	0.6840
9	SYNTH_DATA1	0.5900	0.7675	0.7700	0.6082	0.7242	0.7345
10	SYNTH_BIG	0.5148	0.6977	0.7257	0.5611	0.6692	0.6820
Mean		0.7432	0.8063	0.8344	0.7110	0.7065	0.8044
Standard deviation		0.1938	0.1250	0.1330	0.1982	0.2150	0.1356

The three synthetic datasets are depicted in Fig. 3, with Table 3 indicating the skewed class distribution and corresponding density differences (as evidenced from the Radius of circle encompassing the data and the number of points within it) between the classes.

Table 3

Synthetic data description

S. no.	Dataset	No. of classes	Class 1		Class 2		Class 3
			Radius or axes	Points	Radius or axes	Points	Radius or axes	Points
1	SYNTH_DATA1	2	120	150	80	250	–	–
2	SYNTH_BIG	2	120	12000	[550 and 100] ${}^{*}$	8000	–	–
3	SYNTH_DATA2	3	120	251	80	150	50	400

${}^{*}$ SYNTH_BIG: class 2 is an elliptic region with axes as 550 and 100.

Figure 3.

Synthetic data (a) SYNTH_DATA1, (b) SYNTH_DATA2, and (c) SYNTH_BIG.

In Table 4, the average GM of DA $k$ NN is compared with that of $k$ -NN, Fuzzy- $k$ NN (F $k$ NN) [61] and Pseudo- $k$ NN (P $k$ NN) [64] over $k=$ 1, 3 and 5, whereas for T $k$ NN [63] $\beta$ is taken as 1, 2 and 3, using the Leave-One-Out (LOO) strategy. The best performance is marked in bold. As SADM was developed mainly for binary classification [62], we did not consider it here. Our algorithm DA $k$ NN performed best in most cases.

Table 4

Comparative GM averaged over $k$ ( $\beta$ for T $k$ NN), on multi-class imbalanced data using Leave-One-Out strategy

S. no.	Datasets	$k$ -NN	F $k$ NN	T $k$ NN	P $k$ NN	DA $k$ NN
1	Glass	0.970	0.970	0.940	0.970	0.980
2	Ecoli	0.800	0.810	0.740	0.810	0.830
3	Wine	0.790	0.810	0.780	0.820	0.790
4	Vowel	0.840	0.830	0.840	0.850	0.860
5	SYNTH_DATA2	0.920	0.920	0.920	0.930	0.940
Mean		0.864	0.866	0.842	0.874	0.882
Standard deviation		0.0783	0.0729	0.0865	0.0706	0.0784

Of particular interest is the Ecoli dataset, which contains some very small minority classes. As a result, GM becomes zero for all the algorithms compared. To circumvent this problem, the two classes having 2 points each were removed before computing GM. Except for the Wine data, where P $k$ NN performed better than the others, our algorithm DA $k$ NN performed the best on the remaining datasets and resulted in an Average-GM(S.D.) value of 0.882 (0.0784).

5.2 MR image segmentation

Next, DA $k$ NN was employed for medical image segmentation on $T2$ sequence slices (one per patient) from the MRIs of ten sample patients in The Cancer Genome Atlas Low Grade Glioma (TCGA-LGG) image dataset [71]. A 13 $\times$ 13 window was defined around each pixel of every MRI slice. Features such as pixel intensity, Histogram of Oriented Gradients (HOG), standard deviation and mean value were then extracted from each window. A Leave-One-Out (LOO) strategy was used, with one image for testing and the remaining nine images for training.
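The window-based feature extraction can be sketched as follows. This is a simplified illustration and not the original implementation: it computes only the centre intensity, window mean and window standard deviation (HOG is omitted), and it assumes that windows overlapping the image border are filled by replicating the nearest edge pixel, a detail the text does not specify. The name `window_features` is hypothetical.

```python
from statistics import fmean, pstdev

def window_features(image, row, col, size=13):
    """Features of the size x size window centred at (row, col).

    `image` is a list of rows of gray values. Pixels outside the image are
    replaced by the nearest edge pixel (assumed border handling).
    Returns [centre intensity, window mean, window standard deviation].
    """
    half = size // 2
    height, width = len(image), len(image[0])
    values = [
        image[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]
        for r in range(row - half, row + half + 1)
        for c in range(col - half, col + half + 1)
    ]
    return [image[row][col], fmean(values), pstdev(values)]

# Toy 3 x 3 image with one bright pixel, using a 3 x 3 window for brevity.
toy = [[0, 0, 0],
       [0, 9, 0],
       [0, 0, 0]]
features = window_features(toy, 1, 1, size=3)
```

Each pixel of a slice yields one such feature vector, so a single image contributes as many training patterns as it has pixels.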

Table 5 presents the accuracy, GM, and Dice score of the segmentation, as compared to the ground truth (manually segmented by a radiologist). The Dice score is intended for binary data. Using the standard definitions of true positives (TP), false positives (FP), and false negatives (FN), we have

$\displaystyle\textit{Dice}={\frac{2TP}{2TP+FP+FN}}.$ (14)
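As an illustration of Eq. (14), the sketch below counts TP, FP and FN over two binary masks and evaluates the Dice score. The masks are toy data, `dice_score` is a hypothetical helper, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the text.

```python
def dice_score(truth, pred):
    """Dice = 2TP / (2TP + FP + FN) for two binary masks given as lists of rows."""
    tp = fp = fn = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                tp += 1      # tumor pixel correctly detected
            elif p:
                fp += 1      # non-tumor pixel labelled as tumor
            elif t:
                fn += 1      # tumor pixel missed
    denominator = 2 * tp + fp + fn
    # Both masks empty: treated as perfect agreement (assumed convention).
    return 2 * tp / denominator if denominator else 1.0

truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 0],
        [1, 1, 0]]
score = dice_score(truth, pred)   # TP = 2, FP = 1, FN = 1
```

True negatives do not enter the formula, so the score is not inflated by the large non-tumor background that dominates an imbalanced image.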

It is evident that DA $k$ NN segments the LGG brain tumor MRI well. SMOTE and ADASYN generate synthetic points in the feature space corresponding to an image, and the validity of such samples remains unknown; we therefore did not compare DA $k$ NN with these algorithms on the image data. It is observed from Table 5 that our algorithm performed very well on serial nos. 2, 3, 4, 6 and 8, with Dice greater than 0.9. The overall Average-Dice (S.D.) score was found to be 0.8830 (0.0513). A choice of $k=$ 3 was made after several experiments. The overall Average-Accuracy (S.D.) was 0.9706 (0.0133) and the Average-GM (S.D.) over all images was 0.9080 (0.0555). This demonstrates that the smaller class, consisting of the tumor region, was detected and delineated well.

The enhanced (histogram equalized) images were then considered for segmentation, and exhibited a slight improvement, as evident from Table 6. With a high IR of 5.0182, in serial no. 4, the algorithm achieves the highest accuracy, GM and Dice. Even with a small IR, in serial no. 1, the results are quite good. For the very small tumor regions in serial nos. 2 and 9, the value of Dice is relatively low but GM is very high. Although IR is very high in serial nos. 8 and 10, the detection rates are found to be very good. It can be concluded from Table 6 that IR has little influence on the accuracy, GM and Dice, thereby establishing the robustness of our algorithm to IR.
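For reference, a minimal sketch of global histogram equalization on an 8-bit grayscale image is given below. The text does not state which equalization variant was used, so the standard CDF-based mapping is assumed; `equalize_histogram` is a hypothetical name, and the input is assumed to be a non-empty list of rows of integer gray levels.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)   # CDF at the darkest occupied level
    if total == cdf_min:
        return [list(row) for row in image]   # single gray level: nothing to stretch

    # Map the occupied range of the CDF linearly onto [0, levels - 1].
    scale = (levels - 1) / (total - cdf_min)
    lookup = [round(max(c - cdf_min, 0) * scale) for c in cdf]
    return [[lookup[value] for value in row] for row in image]

toy = [[50, 50],
       [100, 150]]
enhanced = equalize_histogram(toy)
```

The mapping spreads the occupied gray levels over the full intensity range while preserving their order, which increases the contrast between tumor and surrounding tissue before the window features are extracted.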

The choice of DA $k$ NN, which performed better in most of the cases, can be justified from Tables 2 and 4–6. Tables 5 and 6 also demonstrate that the segmentation improves after enhancing the MR images with histogram equalization. Enhancing the images with other techniques, such as filtering, noise removal or bias field correction, may further improve the output.

Table 5

Accuracy, GM, and Dice score of tumor segmentation using DA $k$ NN on TCGA-LGG dataset

S. no.	Patient ID	IR	Accuracy	GM	Dice
1	393	4.2726	0.9399	0.8265	0.8117
2	298	4.3688	0.9779	0.9511	0.9388
3	109	4.8453	0.9694	0.9226	0.9056
4	402	5.0182	0.9792	0.9462	0.9349
5	2	5.0868	0.9641	0.9634	0.8979
6	152	5.9482	0.9748	0.9457	0.9119
7	249	8.1351	0.9631	0.8200	0.7997
8	11	10.7733	0.9838	0.9342	0.9020
9	15	15.3012	0.9831	0.8627	0.8441
10	8	17.8853	0.9798	0.9613	0.8314
Mean		8.1635	0.9706	0.9080	0.8830
Standard deviation			0.0133	0.0555	0.0513

Figure 4.

TCGA-LGG T2 MR images. (a) Original, and (b) Enhanced (histogram equalized) views. (c) Ground truth. Segmented output using: (d) original, and (e) enhanced images.

Table 6

Accuracy, GM, and Dice of tumor segmentation using DA $k$ NN on enhanced (histogram-equalized) TCGA-LGG dataset

S. no.	Patient ID	IR	Accuracy	GM	Dice
1	393	4.2726	0.9832	0.9693	0.9553
2	298	4.3688	0.9658	0.9072	0.8998
3	109	4.8453	0.9773	0.9476	0.9318
4	402	5.0182	0.9914	0.9806	0.9740
5	2	5.0868	0.9696	0.9490	0.9085
6	152	5.9482	0.9848	0.9582	0.9458
7	249	8.1351	0.9771	0.8943	0.8845
8	11	10.7733	0.9894	0.9544	0.9360
9	15	15.3012	0.9609	0.9530	0.7478
10	8	17.8853	0.9896	0.9312	0.8986
Mean		8.1635	0.9789	0.9445	0.9082
Standard deviation			0.0107	0.0267	0.0630

Finally, the segmented images were visually compared with the ground truth in Fig. 4. The ground truth depicts the affected tumor region in white in part (c). Part (a) corresponds to the original brain MRI in $T2$ , while part (b) represents the enhanced (histogram equalized) image. The segmented image, using DA $k$ NN on the original image, is presented in part (d). Analogously, the output of DA $k$ NN on the histogram equalized image is illustrated in part (e). The red and green pixels, in the segmented output of parts (d)-(e), correspond to the misclassified tumor and non-tumor regions, respectively. This corroborates the results of Tables 5 and 6, that DA $k$ NN performs very well in LGG tumor segmentation of brain images.
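The colour coding of parts (d)-(e) can be reproduced with a small sketch such as the one below. It assumes that red marks tumor pixels labelled as non-tumor and green marks non-tumor pixels labelled as tumor; showing correctly detected tumor in white and background in black is a further assumption, and `error_overlay` is a hypothetical helper.

```python
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
RED, GREEN = (255, 0, 0), (0, 255, 0)

def error_overlay(truth, pred):
    """Build an RGB image (rows of tuples) highlighting segmentation errors."""
    overlay = []
    for truth_row, pred_row in zip(truth, pred):
        row = []
        for t, p in zip(truth_row, pred_row):
            if t and p:
                row.append(WHITE)   # tumor correctly detected
            elif t:
                row.append(RED)     # tumor pixel misclassified as non-tumor
            elif p:
                row.append(GREEN)   # non-tumor pixel misclassified as tumor
            else:
                row.append(BLACK)   # background correctly classified
        overlay.append(row)
    return overlay

truth = [[1, 1],
         [0, 0]]
pred = [[1, 0],
        [1, 0]]
overlay = error_overlay(truth, pred)
```

An overlay of this kind makes it easy to see whether the errors cluster along the tumor boundary or form isolated regions away from it.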

6. Conclusions

Handling imbalanced data is a major problem in pattern recognition and machine learning. Although several algorithms have been developed for this purpose, none of them considers the concept of data density, which constitutes an inherent problem in imbalanced data. Principles of over-sampling and under-sampling have also been employed to solve the problem of imbalanced data, but are unfortunately not suitable in many domains. Therefore we have introduced a novel algorithm, DA $k$ NN, which takes care of density and adaptive distance while classifying the pattern points, without employing any sampling technique. DA $k$ NN was tested and compared on fifteen datasets, comprising two-class and multi-class data, to establish its superiority. Its effectiveness on MR image segmentation was also successfully demonstrated, both qualitatively and quantitatively.

References

1. Japkowicz and Stephen, The class imbalance problem: a systematic study, Intelligent Data Analysis 6(5) (2002), 429–449.

2. He and Garcia, E.A., Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering 21(9) (2009), 1263–1284.

3. Guo, Zhou, C.-A., She and Xu, Ensemble based on feature projection and under-sampling for imbalanced learning, Intelligent Data Analysis 22(5) (2018), 959–980.

4. Zou, Feng and Jiang, Improved over-sampling techniques based on sparse representation for imbalance problem, Intelligent Data Analysis 22(5) (2018), 939–958.

5. Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P., SMOTE: synthetic minority over-sampling TEchnique, Journal of Artificial Intelligence Research 16 (2002), 321–357.

6. Yang and Wu, 10 challenging problems in data mining research, International Journal of Information Technology & Decision Making 5(4) (2006), 597–604.

7. Liu, Dou and Liu, Helical fault diagnosis model based on data-driven incremental mergence, Computers & Industrial Engineering (In Press) (2018), 1–16.

8. Santos, Maudes and Bustillo, Identifying maximum imbalance in datasets for fault diagnosis of gearboxes, Journal of Intelligent Manufacturing 29(2) (2018), 333–351.

9. Yang, Tang, Shintemirov and Wu, Association rule mining-based dissolved gas analysis for fault diagnosis of power transformers, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39(6) (2009), 597–610.

10. Zhu, Z.-B. and Song, Z.-H., Fault diagnosis based on imbalance modified kernel fisher discriminant analysis, Chemical Engineering Research and Design 88(8) (2010), 936–951.

11. Buzau, M.-M., Tejedor-Aguilera, Cruz-Romero and Gómez-Expósito, Detection of non-technical losses using smart meter data and supervised learning, IEEE Transactions on Smart Grid (accepted) (2018).

12. Khreich, Granger, Miri and Sabourin, Iterative boolean combination of classifiers in the ROC space: an application to anomaly detection with HMMs, Pattern Recognition 43(8) (2010), 2732–2752.

13. Tavallaee, Stakhanova and Ghorbani, A.A., Toward credible evaluation of anomaly-based intrusion-detection methods, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40(5) (2010), 516–524.

14. Fawcett and Provost, Adaptive fraud detection, Data Mining and Knowledge Discovery 1(3) (1997), 291–316.

15. Molinari, Raghavendra, Gudigar, Meiburger, K.M. and Acharya, U.R., An efficient data mining framework for the characterization of symptomatic and asymptomatic carotid plaque using bidimensional empirical mode decomposition technique, Medical & Biological Engineering & Computing (2018), 1–15.

16. Gao, Hao, Zhang and Han, Predicting pathological response to neoadjuvant chemotherapy in breast cancer patients based on imbalanced clinical data, Personal and Ubiquitous Computing 22(5–6) (2018), 1039–1047.

17. Pliakos and Vens, Mining features for biomedical data using clustering tree ensembles, Journal of Biomedical Informatics 85 (2018), 40–48.

18. Nejatian, Parvin and Faraji, Using sub-sampling and ensemble clustering techniques to improve performance of imbalanced classification, Neurocomputing 276 (2018), 55–66.

19. Mazurowski, M.A., Habas, P.A., Zurada, J.M., J.Y., Baker, J.A. and Tourassi, G.D., Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance, Neural Networks 21(2) (2008), 427–436.

20. Liu, Hall, L.O., Bowyer, K.W., Goldgof, D.B., Gatenby and Ahmed, K.B., Synthetic minority image over-sampling technique: how to improve AUC for glioblastoma patient survival prediction, in: Systems, Man, and Cybernetics (SMC), 2017 IEEE International Conference on, 2017, pp. 1357–1362.

21. Liu, Y.-H. and Chen, Y.-T., Total margin based adaptive fuzzy support vector machines for multiview face recognition, in: Proceedings of IEEE International Conference on Systems, Man and Cybernetics, IEEE, Vol. 2, 2005, pp. 1704–1711.

22. and Liu, A comparative study of the class imbalance problem in Twitter spam detection, Concurrency and Computation: Practice and Experience 30(5) (2018), e4281.

23. Wang, Bao and Li, Vehicle classification using an imbalanced dataset based on a single magnetic sensor, Sensors 18(6) (2018), 1690.

24. Heydari, S.S. and Mountrakis, Effect of classifier selection, reference sample size, reference class distribution and scene heterogeneity in per-pixel classification accuracy using 26 Landsat sites, Remote Sensing of Environment 204 (2018), 648–658.

25. Thanh Noi and Kappas, Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery, Sensors 18(1) (2017), 18.

26. Castellanos, F.J., Valero-Mas, J.J., Calvo-Zaragoza and Rico-Juan, J.R., Oversampling imbalanced data in the string space, Pattern Recognition Letters 103 (2018), 32–38.

27. Ryu, H.J., Mitchell and Adam, Improving smiling detection with race and gender diversity, arXiv preprint arXiv:1712.00193 (2017).

28. Huang, Loy, C.C. and Tang, Deep imbalanced learning for face recognition and attribute prediction, arXiv preprint arXiv:1806.00194 (2018).

29. Wang, Tang, Wang and Xun, Intelligent operation of heavy haul train with data imbalance: a machine learning method, Knowledge-Based Systems (accepted) (2018).

30. Liu and Zio, A scalable fuzzy support vector machine for fault detection in transportation systems, Expert Systems with Applications 102 (2018), 36–43.

31. Japkowicz et al., Learning from imbalanced data sets: A comparison of various strategies, in: Proceedings of AAAI Workshop on Learning from Imbalanced Data Sets, Menlo Park, CA, Vol. 68, 2000, pp. 10–15.

32. Van Hulse, Khoshgoftaar, T.M. and Napolitano, Experimental perspectives on learning from imbalanced data, in: Proceedings of the 24th International Conference on Machine Learning, ACM, 2007, pp. 935–942.

33. Batista, G.E., Prati, R.C. and Monard, M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter 6(1) (2004), 20–29.

34. Douzas and Bacao, Effective data generation for imbalanced learning using conditional generative adversarial networks, Expert Systems with Applications 91 (2018), 464–471.

35. Chawla, N.V., Japkowicz and Kotcz, Special issue on learning from imbalanced data sets, ACM SIGKDD Explorations Newsletter 6(1) (2004), 1–6.

36. Koziarski and Woźniak, CCR: a combined cleaning and resampling algorithm for imbalanced data classification, International Journal of Applied Mathematics and Computer Science 27(4) (2017), 727–736.

37. Han, Wang, W.-Y. and Mao, B.-H., Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning, Advances in Intelligent Computing (2005), 878–887.

38. Bai, Garcia, E.A. and Li, ADASYN: ADAptive SYNthetic sampling approach for imbalanced learning, in: Proceedings of IEEE International Joint Conference on Neural Networks (IJCNN), 2008, pp. 1322–1328.

39. Yen, S.-J. and Lee, Y.-S., Cluster-based under-sampling approaches for imbalanced data distributions, Expert Systems with Applications 36(3) (2009), 5718–5727.

40. Nikpour and Nezamabadi-pour, HTSS: a hyper-heuristic training set selection method for imbalanced data sets, Iran Journal of Computer Science 1(2) (2018), 109–128.

41. García, Zhang, Z.-L., Altalhi, Alshomrani and Herrera, Dynamic ensemble selection for multi-class imbalanced datasets, Information Sciences 445 (2018), 22–37.

42. Feng, Huang and Ren, Class imbalance ensemble learning based on the margin theory, Applied Sciences 8(5) (2018), 815.

43. Mullick, S.S., Datta and Das, Adaptive learning-based k-nearest neighbor classifiers with resilience to class imbalance, IEEE Transactions on Neural Networks and Learning Systems (accepted) (2018).

44. Cadenas, J.M., Garrido, M.C., Martínez, Muñoz and Bonissone, P.P., A fuzzy k-nearest neighbor classifier to deal with imperfect data, Soft Computing 22(10) (2018), 3313–3330.

45. Zhang, Zong, Zhu and Wang, Efficient kNN classification with different numbers of nearest neighbors, IEEE Transactions on Neural Networks and Learning Systems 29(5) (2018), 1774–1785.

46. Zhao, Zhang and Qin, kNN-DP: handling data skewness in kNN joins using MapReduce, IEEE Transactions on Parallel and Distributed Systems 29(3) (2018), 600–613.

47. Zhang, Kotagiri, Tari and Cheriet, kRNN: k rare-class nearest neighbour classification, Pattern Recognition 62 (2017), 33–44.

48. Lin, W.-J. and Chen, J.J., Class-imbalanced classifiers for high-dimensional data, Briefings in Bioinformatics 14(1) (2012), 13–26.

49. Sun, Wong, A.K. and Kamel, M.S., Classification of imbalanced data: a review, International Journal of Pattern Recognition and Artificial Intelligence 23(4) (2009), 687–719.

50. Galar, Fernandez, Barrenechea, Bustince and Herrera, A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(4) (2012), 463–484.

51. Haixiang, Yijing, Shang, Mingyun, Yuanyue and Bing, Learning from class-imbalanced data: review of methods and applications, Expert Systems with Applications 73 (2017), 220–239.

52. Stefanowski and Wilk, Combining rough sets and rule based classifiers for handling imbalanced data, Fundamenta Informaticae 72(1–3) (2006).

53. Napierala and Stefanowski, Post-processing of BRACID rules induced from imbalanced data, Fundamenta Informaticae 148(1–2) (2016), 51–64.

54. Cover and Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory 13(1) (1967), 21–27.

55. Hellman, M.E., The nearest neighbor classification rule with a reject option, IEEE Transactions on Systems Science and Cybernetics 6(3) (1970), 179–185.

56. Fukunaga and Hostetler, K-nearest-neighbor bayes-risk estimation, IEEE Transactions on Information Theory 21(3) (1975), 285–293.

57. Dudani, S.A., The distance-weighted k-nearest-neighbor rule, IEEE Transactions on Systems, Man, and Cybernetics 6(4) (1976), 325–327.

58. Bailey and Jain, A.K., A note on distance-weighted k-nearest neighbor rules, IEEE Transactions on Systems, Man, and Cybernetics 8(4) (1978), 311–313.

59. Bermejo and Cabestany, Adaptive soft k-nearest-neighbour classifiers, Pattern Recognition 33(12) (2000), 1999–2005.

60. Jóźwik, A learning scheme for a fuzzy k-NN rule, Pattern Recognition Letters 1(5–6) (1983), 287–289.

61. Keller, J.M., Gray, M.R. and Givens, J.A., A fuzzy k-nearest neighbor algorithm, IEEE Transactions on Systems, Man, and Cybernetics 15(4) (1985), 580–585.

62. Wang, Neskovic and Cooper, L.N., Improving nearest neighbor rule with a simple adaptive distance measure, Pattern Recognition Letters 28(2) (2007), 207–213.

63. Basu and Murthy, C.A., Towards enriching the quality of k-nearest neighbor rule for document classification, International Journal of Machine Learning and Cybernetics 5(6) (2014), 897–905.

64. Zeng, Yang and Zhao, Pseudo nearest neighbor rule for pattern classification, Expert Systems with Applications 36(2) (2009), 3587–3595.

65. Fernández, Del Jesus, M.J. and Herrera, Multi-class imbalanced data-sets with linguistic fuzzy rule based classification systems based on pairwise learning, in: Proceeding of International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Springer, 2010, pp. 89–98.

66. Wang and Yao, Multiclass imbalance problems: Analysis and potential solutions, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42(4) (2012), 1119–1130.

67. Alcalá-Fdez, Fernández, Luengo, Derrac, García, Sánchez and Herrera, KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework, Journal of Multiple-Valued Logic and Soft Computing 17(2–3) (2011), 255–287.

68. Dheeru and Karra Taniskidou, UCI Machine Learning Repository, 2017. http://archive.ics.uci.edu/ml.

69. Pal, S.K. and Dutta Majumder, Fuzzy sets and decision making approaches in vowel and speaker recognition, IEEE Transactions on Systems, Man, and Cybernetics 7(8) (1977), 625–629.

70. Barandela, Sánchez, J.S., García and Rangel, Strategies for learning in class imbalance problems, Pattern Recognition 36(3) (2003), 849–851.

71. Clark, Vendt, Smith, Freymann, Kirby, Koppel, Moore, Phillips, Maffitt, Pringle et al., The cancer imaging archive (TCIA): maintaining and operating a public information repository, Journal of Digital Imaging 26(6) (2013), 1045–1057.
